    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

Articles: 32

Last Updated: 2025-02-11 23:58:24 (+00:00)

Predicting Cellular Responses with Variational Causal Inference and Refined Relational Information

Predicting the responses of a cell under perturbations may bring important benefits to drug discovery and personalized therapeutics. In this work, we propose a novel graph variational Bayesian causal inference framework to predict a cell's gene expressions under counterfactual perturbations (perturbations that this cell did not factually receive), leveraging information representing biological knowledge in the form of gene regulatory networks (GRNs) to aid individualized cellular response predictions. Aiming at a data-adaptive GRN, we also developed an adjacency matrix updating technique for graph convolutional networks and used it to refine GRNs during pre-training, which yielded additional insights into gene relations and enhanced model performance. Additionally, we propose a robust estimator within our framework for the asymptotically efficient estimation of the marginal perturbation effect, an estimation that has not been carried out in previous works. In extensive experiments, we demonstrate the advantage of our approach over state-of-the-art deep learning models for individual response prediction.
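
A rough numpy sketch of the adjacency-updating idea: treat the GRN adjacency matrix as a trainable parameter of a one-layer graph convolution and refine it with gradient steps on a pre-training loss. The layer, the reconstruction-style loss, and the finite-difference gradient (a stand-in for autodiff) are illustrative assumptions, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(0)
n_genes, d = 5, 8
A = (rng.random((n_genes, n_genes)) < 0.3).astype(float)  # prior GRN adjacency
W = 0.1 * rng.normal(size=(d, d))                         # GCN weights
X = rng.normal(size=(n_genes, d))                         # gene features

def gcn_layer(A, X, W):
    deg = A.sum(axis=1, keepdims=True) + 1e-8
    return np.tanh((A / deg) @ X @ W)  # row-normalized message passing

def pretrain_loss(A):
    # Toy reconstruction-style objective used only to drive the refinement.
    return float(((gcn_layer(A, X, W) - X) ** 2).mean())

eps, lr = 1e-4, 0.5
for _ in range(3):  # a few refinement steps on the adjacency itself
    G = np.zeros_like(A)
    for i in range(n_genes):
        for j in range(n_genes):
            Ap = A.copy()
            Ap[i, j] += eps
            G[i, j] = (pretrain_loss(Ap) - pretrain_loss(A)) / eps
    A = np.clip(A - lr * G, 0.0, 1.0)  # keep refined edge weights in [0, 1]

print(round(pretrain_loss(A), 4))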

Updated: 2025-02-11 23:58:24

Domains: cs.LG,cs.AI,q-bio.GN,stat.ME,stat.ML

Download: http://arxiv.org/abs/2210.00116v3

Counterfactual Generative Modeling with Variational Causal Inference

Estimating an individual's counterfactual outcomes under interventions is a challenging task for traditional causal inference and supervised learning approaches when the outcome is high-dimensional (e.g., gene expressions, facial images) and covariates are relatively limited. In this case, to predict one's outcomes under counterfactual treatments, it is crucial to leverage the individual information contained in the observed outcome in addition to the covariates. Prior works using variational inference in counterfactual generative modeling have focused on neural adaptations and model variants within the conditional variational autoencoder formulation, which we argue is fundamentally ill-suited to the notion of counterfactuals in causal inference. In this work, we present a novel variational Bayesian causal inference framework and its theoretical backing to properly handle counterfactual generative modeling tasks, through which we are able to conduct counterfactual supervision end-to-end during training without any counterfactual samples, and encourage disentangled exogenous noise abduction that aids the correct identification of causal effects in counterfactual generations. In experiments, we demonstrate the advantage of our framework over state-of-the-art models in counterfactual generative modeling on multiple benchmarks.

Updated: 2025-02-11 23:56:02

Domains: cs.LG,cs.AI,math.ST,stat.ML,stat.TH

Download: http://arxiv.org/abs/2410.12730v2

Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning

Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a federated learning (FL) setting, where the downstream training is performed at the edge clients with heterogeneous data distribution. These works show that starting from a pre-trained model can substantially reduce the adverse impact of data heterogeneity on the test performance of a model trained in a federated setting, with no changes to the standard FedAvg training algorithm. In this work, we provide a deeper theoretical understanding of this phenomenon. To do so, we study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg. We introduce the notion of aligned and misaligned filters at initialization and show that the data heterogeneity only affects learning on misaligned filters. Starting with a pre-trained model typically results in fewer misaligned filters at initialization, thus producing a lower test error even when the model is trained in a federated setting with data heterogeneity. Experiments in synthetic settings and practical FL training on CNNs verify our theoretical findings.

Updated: 2025-02-11 23:53:16

Domains: cs.LG,cs.DC

Download: http://arxiv.org/abs/2502.08024v1

A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods have a 4-16x better scaling rate than their deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1-level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information are available at https://probabilistic-inference-scaling.github.io.
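
The core loop can be pictured as a particle filter over partial generations: propagate each particle one step, weight particles by an approximate reward-as-likelihood, and resample, so compute concentrates on the typical set rather than a single mode. A minimal numpy sketch follows; expand and reward are hypothetical stand-ins for LLM step sampling and a process reward model.

import numpy as np

rng = np.random.default_rng(0)

def expand(state):
    # Hypothetical stand-in for sampling one more reasoning step from an LLM.
    return state + [int(rng.integers(0, 10))]

def reward(state):
    # Hypothetical stand-in for an approximate reward model; higher is better.
    return -abs(sum(state) - 20.0)

def particle_scaling(n_particles=8, n_steps=5, temperature=1.0):
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [expand(p) for p in particles]
        logw = np.array([reward(p) for p in particles]) / temperature
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling: explore the typical set of the state
        # distribution instead of greedily committing to the current best.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = [list(particles[i]) for i in idx]
    return max(particles, key=reward)

print(particle_scaling())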

Updated: 2025-02-11 23:52:26

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.01618v3

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Holdout validation and hyperparameter tuning from data are long-standing problems in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either incurs exponential variance (e.g., importance sampling) or has hyperparameters of its own (e.g., FQE and model-based methods). In this work we focus on hyperparameter tuning for OPE itself, which is even more under-investigated. Concretely, we select among candidate value functions ("model-free") or dynamics models ("model-based") to best assess the performance of a target policy. Our contributions are twofold. We develop: (1) new model-free and model-based selectors with theoretical guarantees, and (2) a new experimental protocol for empirically evaluating them. Compared to the model-free protocol in prior works, our new protocol allows for more stable generation of candidate value functions, better control of misspecification, and evaluation of model-free and model-based methods alike. We exemplify the protocol on a Gym environment, and find that our new model-free selector, LSTD-Tournament, demonstrates promising empirical performance.

Updated: 2025-02-11 23:40:55

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2502.08021v1

Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding

Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. CoSD employs a draft model to generate initial sequences and an easy-to-learn rule or decision tree to decide when to invoke an assistant model to improve these drafts. CoSD not only enhances knowledge fusion but also improves inference efficiency, is transferable across domains and models, and offers greater explainability. Experimental results demonstrate that CoSD improves accuracy by up to 10% across benchmarks compared to existing methods, providing a scalable and effective solution for LLM-based applications.
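
A schematic of the decode-time loop, with toy stand-ins for both models: the draft model proposes each token and a simple rule decides when to defer to the assistant. The confidence threshold below is an assumption; the paper uses an easy-to-learn rule or a decision tree.

import random

random.seed(0)

def draft_model(prefix):
    # Hypothetical stand-in: next token plus the draft's confidence in it.
    return random.choice("abcde"), random.random()

def assistant_model(prefix):
    # Hypothetical stand-in for the complementary-knowledge model.
    return random.choice("ABCDE")

def cosd_decode(prompt, max_new_tokens=10, threshold=0.5):
    out = list(prompt)
    for _ in range(max_new_tokens):
        token, conf = draft_model(out)
        if conf < threshold:          # the learned rule goes here
            token = assistant_model(out)
        out.append(token)
    return "".join(out)

print(cosd_decode("x"))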

Updated: 2025-02-11 23:40:53

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2502.08020v1

Hierarchical Manifold Projection for Ransomware Detection: A Novel Geometric Approach to Identifying Malicious Encryption Patterns

Encryption-based cyber threats continue to evolve, employing increasingly sophisticated techniques to bypass traditional detection mechanisms. Many existing classification strategies depend on static rule sets, signature-based matching, or machine learning models that require extensive labeled datasets, making them ineffective against emerging ransomware families that exhibit polymorphic and adversarial behaviors. A novel classification framework structured through hierarchical manifold projection introduces a mathematical approach to detecting malicious encryption workflows, preserving geometric consistencies that differentiate ransomware-induced modifications from benign cryptographic operations. The proposed methodology transforms encryption sequences into structured manifold embeddings, ensuring classification robustness through non-Euclidean feature separability rather than reliance on static indicators. Generalization capabilities remain stable across diverse ransomware variants, as hierarchical decomposition techniques capture multi-scale encryption characteristics while maintaining resilience against code obfuscation and execution flow modifications. Empirical analysis demonstrates that detection accuracy remains high even when encryption key variability, delayed execution tactics, or API call obfuscation strategies are introduced, reinforcing the reliability of manifold-based classification. Real-time scalability assessments confirm that the proposed approach maintains computational efficiency across increasing dataset volumes, validating its applicability to large-scale threat detection scenarios.

Updated: 2025-02-11 23:20:58

Domains: cs.CR

Download: http://arxiv.org/abs/2502.08013v1

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing

The evolution of biological neural systems has led to both modularity and sparse coding, which enables energy efficiency and robustness across the diversity of tasks in the lifespan. In contrast, standard neural networks rely on dense, non-specialized architectures, where all model parameters are simultaneously updated to learn multiple tasks, leading to interference. Current sparse neural network approaches aim to alleviate this issue but are hindered by limitations such as 1) trainable gating functions that cause representation collapse, 2) disjoint experts that result in redundant computation and slow learning, and 3) reliance on explicit input or task IDs that limit flexibility and scalability. In this paper we propose Conditionally Overlapping Mixture of ExperTs (COMET), a general deep learning method that addresses these challenges by inducing a modular, sparse architecture with an exponential number of overlapping experts. COMET replaces the trainable gating function used in Sparse Mixture of Experts with a fixed, biologically inspired random projection applied to individual input representations. This design causes the degree of expert overlap to depend on input similarity, so that similar inputs tend to share more parameters. This results in faster learning per update step and improved out-of-sample generalization. We demonstrate the effectiveness of COMET on a range of tasks, including image classification, language modeling, and regression, using several popular deep learning architectures.
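
A numpy sketch of the fixed-routing mechanism: a frozen random projection of the input representation selects a sparse top-k mask over hidden units, so similar inputs activate overlapping parameter subsets ("experts") without any trainable gate. The sizes and the top-k rule are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, k = 16, 64, 8  # illustrative sizes

W = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)  # trainable layer
P = rng.normal(size=(d_in, d_hidden))                  # fixed, never trained

def comet_forward(x):
    # The fixed random projection of the input picks which hidden units are
    # active; nearby inputs get similar scores, hence overlapping "experts".
    scores = x @ P
    mask = np.zeros(d_hidden)
    mask[np.argsort(scores)[-k:]] = 1.0  # top-k gating, no trainable gate
    return (x @ W) * mask

x1 = rng.normal(size=d_in)
x2 = x1 + 0.01 * rng.normal(size=d_in)  # a similar input
m1 = comet_forward(x1) != 0
m2 = comet_forward(x2) != 0
print("shared active units:", int((m1 & m2).sum()), "of", k)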

Updated: 2025-02-11 23:18:12

Domains: cs.LG

Download: http://arxiv.org/abs/2410.08003v6

The Role of Randomness in Stability

Stability is a central property in learning and statistics, promising that the output of an algorithm $A$ does not change substantially when applied to similar datasets $S$ and $S'$. It is an elementary fact that any sufficiently stable algorithm (e.g., one returning the same result with high probability, satisfying privacy guarantees, etc.) must be randomized. This raises a natural question: can we quantify how much randomness is needed for algorithmic stability? We study the randomness complexity of two influential notions of stability in learning: replicability, which promises $A$ usually outputs the same result when run over samples from the same distribution (and shared random coins), and differential privacy, which promises the output distribution of $A$ remains similar under neighboring datasets. The randomness complexity of these notions was studied recently in (Dixon et al. ICML 2024) and (Cannone et al. ITCS 2024) for basic $d$-dimensional tasks (e.g. estimating the bias of $d$ coins), but little is known about the measures more generally or in complex settings like classification. Toward this end, we prove a "weak-to-strong" boosting theorem for stability: the randomness complexity of a task $M$ (either under replicability or DP) is tightly controlled by the best replication probability of any deterministic algorithm solving the task, a weak measure called "global stability" that is universally capped at $\frac{1}{2}$ (Chase et al. FOCS 2023). Using this, we characterize the randomness complexity of PAC Learning: a class has bounded randomness complexity iff it has finite Littlestone dimension, and moreover it scales at worst logarithmically in the excess error of the learner. This resolves a question of (Chase et al. STOC 2024), who asked for such a characterization in the equivalent language of (error-dependent) "list-replicability".

Updated: 2025-02-11 23:06:43

Domains: cs.LG,stat.ML,68Q32

Download: http://arxiv.org/abs/2502.08007v1

GMem: A Modular Approach for Ultra-Efficient Generative Models

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn become the primary bottleneck in both training and inference of diffusion models. To this end, we introduce GMem: A Modular Approach for Ultra-Efficient Generative Models. Our approach, GMem, decouples the memory capacity from the model and implements it as a separate, immutable memory set that preserves the essential semantic information in the data. The results are significant: GMem enhances training efficiency, sampling efficiency, and generation diversity. This design reduces the reliance on the network to memorize complex data distributions, thus enhancing both training and sampling efficiency. On ImageNet at $256 \times 256$ resolution, GMem achieves a $50\times$ training speedup compared to SiT, reaching FID $=7.66$ in fewer than $28$ epochs ($\sim 4$ hours training time), while SiT requires $1400$ epochs. Without classifier-free guidance, GMem achieves state-of-the-art (SoTA) performance FID $=1.53$ in $160$ epochs with only $\sim 20$ hours of training, outperforming LightningDiT, which requires $800$ epochs and $\sim 95$ hours to attain FID $=2.17$.

Updated: 2025-02-11 23:05:30

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2412.08781v2

Greed is Good: Guided Generation from a Greedy Perspective

Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of diffusion models. In this work, we explore the guided generation from the perspective of optimizing the solution trajectory of a neural differential equation in a greedy manner. We present such a strategy as a unifying view on training-free guidance by showing that the greedy strategy is a first-order discretization of end-to-end optimization techniques. We show that a greedy guidance strategy makes good decisions and compare it to a guidance strategy using the ideal gradients found via the continuous adjoint equations. We then show how other popular training-free guidance strategies can be viewed in a unified manner from this perspective.
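
A toy numpy sketch of the greedy, first-order view: at each sampling step, take one gradient step on a guidance loss evaluated at the current one-step denoised estimate, rather than differentiating through the whole solution trajectory. The denoiser and update rule are toy stand-ins, not a real diffusion sampler.

import numpy as np

rng = np.random.default_rng(0)

def x0_estimate(x, t):
    # Toy stand-in for a model's one-step denoised estimate at time t.
    return x / (1.0 + t)

def greedy_guided_sample(target, steps=50, guide_scale=0.2):
    x = rng.normal(size=target.shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x0 = x0_estimate(x, t)
        # Greedy step: descend the guidance loss ||x0 - target||^2 locally.
        x0 = x0 - guide_scale * (x0 - target)
        x = x0 + t_next * (x - x0)  # move toward the (guided) estimate
    return x

print(greedy_guided_sample(np.ones(4)).round(2))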

Updated: 2025-02-11 23:05:16

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2502.08006v1

Towards Training One-Step Diffusion Models Without Distillation

Recent advances in one-step generative models typically follow a two-stage process: first training a teacher diffusion model and then distilling it into a one-step student model. This distillation process traditionally relies on both the teacher model's score function to compute the distillation loss and its weights for student initialization. In this paper, we explore whether one-step generative models can be trained directly without this distillation process. First, we show that the teacher's score function is not essential and propose a family of distillation methods that achieve competitive results without relying on score estimation. Next, we demonstrate that initialization from teacher weights is indispensable in successful training. Surprisingly, we find that this benefit is not due to improved "input-output" mapping but rather the learned feature representations, which dominate distillation quality. Our findings provide a better understanding of the role of initialization in one-step model training and its impact on distillation quality.

Updated: 2025-02-11 23:02:14

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2502.08005v1

Optimizing Likelihoods via Mutual Information: Bridging Simulation-Based Inference and Bayesian Optimal Experimental Design

Simulation-based inference (SBI) is a method to perform inference on a variety of complex scientific models with challenging inference (inverse) problems. Bayesian Optimal Experimental Design (BOED) aims to efficiently use experimental resources to make better inferences. Various stochastic gradient-based BOED methods have been proposed as an alternative to Bayesian optimization and other experimental design heuristics to maximize information gain from an experiment. We demonstrate a link, via mutual information bounds, between SBI and stochastic gradient-based variational inference methods that permits BOED to be used in SBI applications as SBI-BOED. This link allows simultaneous optimization of experimental designs and of amortized inference functions. We evaluate the pitfalls of naive design optimization using this method in a standard SBI task and demonstrate the utility of a well-chosen design distribution in BOED. We compare this approach on SBI-based models of real-world simulators in epidemiology and biology, showing notable improvements in inference.

Updated: 2025-02-11 22:58:18

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2502.08004v1

Heterogeneous Multi-agent Multi-armed Bandits on Stochastic Block Models

We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models, influencing not only graph topology, but also reward heterogeneity. Specifically, agents are distributed on random graphs based on stochastic block models, a generalized Erdos-Renyi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and edge probabilities for agents within the same cluster differ from those across clusters. In addition, the cluster structure in the stochastic block model also determines our heterogeneous rewards. Reward distributions of the same arm vary across agents in different clusters but remain consistent within a cluster, unifying homogeneous and heterogeneous settings and varying degrees of heterogeneity, and rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To address this, we propose a novel algorithm applicable to both known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for unknown cluster settings. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities. In contrast, prior works have not accounted for this refined problem complexity, rely on more stringent assumptions, and exhibit limited scalability.
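
The intra-cluster part of the algorithm, under known clusters, can be sketched as UCB indices built from each cluster's pooled statistics, since rewards are homogeneous within a cluster. The full method additionally weights inter-cluster information, handles graph randomness, and detects unknown clusters, all omitted in this numpy baseline; the arm means and sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon = 3, 2000
clusters = {0: [0, 1], 1: [2]}                     # agents grouped by cluster
means = {0: [0.2, 0.5, 0.7], 1: [0.7, 0.5, 0.2]}   # per-cluster Bernoulli means

counts = {c: np.zeros(n_arms) for c in clusters}
sums = {c: np.zeros(n_arms) for c in clusters}

for t in range(1, horizon + 1):
    for c, agents in clusters.items():
        for _ in agents:
            if counts[c].min() == 0:               # pull each arm once first
                arm = int(counts[c].argmin())
            else:
                n = counts[c]
                ucb = sums[c] / n + np.sqrt(2 * np.log(t) / n)
                arm = int(np.argmax(ucb))
            counts[c][arm] += 1
            sums[c][arm] += rng.binomial(1, means[c][arm])

for c in clusters:
    print("cluster", c, "arm pulls:", counts[c].astype(int))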

Updated: 2025-02-11 22:57:19

Domains: cs.LG

Download: http://arxiv.org/abs/2502.08003v1

Unveiling Client Privacy Leakage from Public Dataset Usage in Federated Distillation

Federated Distillation (FD) has emerged as a popular federated training framework, enabling clients to collaboratively train models without sharing private data. Public Dataset-Assisted Federated Distillation (PDA-FD), which leverages public datasets for knowledge sharing, has become widely adopted. Although PDA-FD enhances privacy compared to traditional Federated Learning, we demonstrate that the use of public datasets still poses significant privacy risks to clients' private training data. This paper presents the first comprehensive privacy analysis of PDA-FD in presence of an honest-but-curious server. We show that the server can exploit clients' inference results on public datasets to extract two critical types of private information: label distributions and membership information of the private training dataset. To quantify these vulnerabilities, we introduce two novel attacks specifically designed for the PDA-FD setting: a label distribution inference attack and innovative membership inference methods based on Likelihood Ratio Attack (LiRA). Through extensive evaluation of three representative PDA-FD frameworks (FedMD, DS-FL, and Cronus), our attacks achieve state-of-the-art performance, with label distribution attacks reaching minimal KL-divergence and membership inference attacks maintaining high True Positive Rates under low False Positive Rate constraints. Our findings reveal significant privacy risks in current PDA-FD frameworks and emphasize the need for more robust privacy protection mechanisms in collaborative learning systems.
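
The membership-inference core is the likelihood ratio test of LiRA: fit Gaussians to a statistic (e.g., a confidence score) computed from shadow models trained with and without the example, then score the target's statistic under both. The shadow statistics below are synthetic placeholders; the paper adapts this test to what the server observes in PDA-FD.

import numpy as np

def gauss_logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)

def lira_score(target_stat, in_stats, out_stats):
    # Likelihood ratio of "member" vs "non-member" Gaussians fit on shadows.
    s_in = gauss_logpdf(target_stat, np.mean(in_stats), np.std(in_stats) + 1e-8)
    s_out = gauss_logpdf(target_stat, np.mean(out_stats), np.std(out_stats) + 1e-8)
    return s_in - s_out            # > 0 suggests membership

rng = np.random.default_rng(0)
in_stats = rng.normal(2.0, 0.5, size=64)    # synthetic shadow stats (member)
out_stats = rng.normal(0.5, 0.5, size=64)   # synthetic shadow stats (non-member)
print(lira_score(1.8, in_stats, out_stats) > 0)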

Updated: 2025-02-11 22:48:49

Domains: cs.CR,cs.LG

Download: http://arxiv.org/abs/2502.08001v1

Adaptive kernel predictors from feature-learning infinite limits of neural networks

Previous influential work showed that infinite width limits of neural networks in the lazy training regime are described by kernel machines. Here, we show that neural networks trained in the rich, feature-learning infinite-width regime in two different settings are also described by kernel machines, but with data-dependent kernels. For both cases, we provide explicit expressions for the kernel predictors and prescriptions to numerically calculate them. To derive the first predictor, we study the large-width limit of feature-learning Bayesian networks, showing how feature learning leads to task-relevant adaptation of layer kernels and preactivation densities. The saddle point equations governing this limit result in a min-max optimization problem that defines the kernel predictor. To derive the second predictor, we study gradient flow training of randomly initialized networks trained with weight decay in the infinite-width limit using dynamical mean field theory (DMFT). The fixed point equations of the arising DMFT define the task-adapted internal representations and the kernel predictor. We compare our kernel predictors to kernels derived from the lazy regime and demonstrate that our adaptive kernels achieve lower test loss on benchmark datasets.

Updated: 2025-02-11 22:34:49

Domains: cs.LG,cond-mat.dis-nn,stat.ML

Download: http://arxiv.org/abs/2502.07998v1

Vision Foundation Models in Remote Sensing: A Survey

Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques like contrastive learning and masked autoencoders, markedly enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for the continued development and application of foundation models in remote sensing.

Updated: 2025-02-11 22:29:52

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2408.03464v2

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

Reliable predictions of critical phenomena, such as weather, wildfires and epidemics often rely on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales described by such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations are usually deployed that adopt various heuristics and empirical closure terms to account for the missing information. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using grid-based Reinforcement Learning. This formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by a Fully Convolutional Network (FCN). We demonstrate the capabilities and limitations of our framework through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.

Updated: 2025-02-11 22:20:47

Domains: cs.LG,cs.MA,physics.comp-ph

Download: http://arxiv.org/abs/2402.00972v3

What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?

Randomized sketching accelerates large-scale numerical linear algebra by reducing computational complexity. While the traditional sketch-and-solve approach reduces the problem size directly through sketching, the sketch-and-precondition method leverages sketching to construct a computationally friendly preconditioner. This preconditioner improves the convergence speed of iterative solvers applied to the original problem, maintaining accuracy in the full space. Furthermore, the convergence rate of the solver improves at least linearly with the sketch size. Despite its potential, developing a sketch-and-precondition framework for randomized algorithms in low-rank matrix approximation remains an open challenge. We introduce the Error-Powered Sketched Inverse Iteration (EPSI) method, obtained by running a sketched Newton iteration on the Lagrange form, as a sketch-and-precondition variant for randomized low-rank approximation. Our method achieves theoretical guarantees, including a convergence rate that improves at least linearly with the sketch size.
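
For background on the sketch-and-precondition template the title refers to, here is a minimal numpy sketch for overdetermined least squares: QR-factor a sketched matrix S A, use R as a right preconditioner so A R^{-1} is well conditioned, then iterate cheaply in the full space. The sizes, the Gaussian sketch, and the plain gradient iteration (in place of LSQR/CG) are illustrative choices, not the paper's EPSI method.

import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 50
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

s = 10 * n                                   # sketch size
S = rng.normal(size=(s, m)) / np.sqrt(s)     # Gaussian sketch
_, R = np.linalg.qr(S @ A)                   # A @ inv(R) is well conditioned

# Gradient iteration on min_y ||A inv(R) y - b||^2 with a conservative step;
# conditioning near 1 makes convergence fast and independent of cond(A).
y, step = np.zeros(n), 0.5
for _ in range(80):
    r = b - A @ np.linalg.solve(R, y)
    y = y + step * np.linalg.solve(R.T, A.T @ r)
x = np.linalg.solve(R, y)

print(np.linalg.norm(A.T @ (A @ x - b)) / np.linalg.norm(A.T @ b))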

Updated: 2025-02-11 22:19:56

Domains: math.NA,cs.CC,cs.LG,cs.NA,stat.CO,stat.ML

Download: http://arxiv.org/abs/2502.07993v1

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

Modeling and simulation of complex fluid flows with dynamics that span multiple spatio-temporal scales is a fundamental challenge in many scientific and engineering domains. Full-scale resolving simulations for systems such as highly turbulent flows are not feasible in the foreseeable future, and reduced-order models must capture dynamics that involve interactions across scales. In the present work, we propose a novel framework, Graph-based Learning of Effective Dynamics (Graph-LED), that leverages graph neural networks (GNNs), as well as an attention-based autoregressive model, to extract the effective dynamics from a small amount of simulation data. GNNs represent flow fields on unstructured meshes as graphs and effectively handle complex geometries and non-uniform grids. The proposed method combines a GNN-based dimensionality reduction for variable-size unstructured meshes with an autoregressive temporal attention model that can learn temporal dependencies automatically. We evaluated the proposed approach on a suite of fluid dynamics problems, including flow past a cylinder and flow over a backward-facing step, over a range of Reynolds numbers. The results demonstrate robust and effective forecasting of spatio-temporal physics; in the case of the flow past a cylinder, both small-scale effects that occur close to the cylinder as well as its wake are accurately captured.

Updated: 2025-02-11 22:14:30

Domains: cs.LG,physics.comp-ph,physics.flu-dyn

Download: http://arxiv.org/abs/2502.07990v1

MetaSC: Test-Time Safety Specification Optimization for Language Models

We propose a novel dynamic safety framework that optimizes language model (LM) safety reasoning at inference time without modifying model weights. Building on recent advances in self-critique methods, our approach leverages a meta-critique mechanism that iteratively updates safety prompts, termed specifications, to drive the critique and revision process adaptively. This test-time optimization improves performance not only against adversarial jailbreak requests but also on diverse general safety-related tasks, such as avoiding moral harm or pursuing honest responses. Our empirical evaluations across several language models demonstrate that dynamically optimized safety prompts yield significantly higher safety scores compared to fixed system prompts and static self-critique defenses. Code to be released at https://github.com/vicgalle/meta-self-critique.git .
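
The test-time loop can be sketched as below; call_llm is a hypothetical stand-in for any chat-completion API, and the prompt wording is an assumption (the released repository has the actual prompts). The key point is that only the specification text is optimized, never the weights.

def call_llm(prompt):
    # Hypothetical stand-in for any LLM completion API.
    return "..."

def metasc_respond(request, spec="Avoid harmful or dishonest content.", n_iters=3):
    response = call_llm(f"Spec: {spec}\nRequest: {request}\nRespond:")
    for _ in range(n_iters):
        critique = call_llm(f"Spec: {spec}\nCritique this response:\n{response}")
        # Meta-critique: rewrite the specification itself at test time so the
        # safety prompt adapts to this request; model weights never change.
        spec = call_llm(f"Rewrite the spec to address the critique.\n"
                        f"Spec: {spec}\nCritique: {critique}")
        response = call_llm(f"Spec: {spec}\nRevise the response:\n{response}")
    return response

print(metasc_respond("How do I pick a lock?"))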

Updated: 2025-02-11 22:06:25

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2502.07985v1

Deep Semantic Graph Learning via LLM based Node Enhancement

Graph learning has attracted significant attention due to its widespread real-world applications. Current mainstream approaches rely on text node features and obtain initial node embeddings through shallow embedding learning using GNNs, which shows limitations in capturing deep textual semantics. Recent advances in Large Language Models (LLMs) have demonstrated superior capabilities in understanding text semantics, transforming traditional text feature processing. This paper proposes a novel framework that combines Graph Transformer architecture with LLM-enhanced node features. Specifically, we leverage LLMs to generate rich semantic representations of text nodes, which are then processed by a multi-head self-attention mechanism in the Graph Transformer to capture both local and global graph structural information. Our model utilizes the Transformer's attention mechanism to dynamically aggregate neighborhood information while preserving the semantic richness provided by LLM embeddings. Experimental results demonstrate that the LLM-enhanced node features significantly improve the performance of graph learning models on node classification tasks. This approach shows promising results across multiple graph learning tasks, offering a practical direction for combining graph networks with language models.

Updated: 2025-02-11 21:55:46

Domains: cs.AI

Download: http://arxiv.org/abs/2502.07982v1

CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs

The role of Large Language Models (LLMs) has not been extensively explored in analog circuit design, which could benefit from a reasoning-based approach that transcends traditional optimization techniques. In particular, despite their growing relevance, there are no benchmarks to assess LLMs' reasoning capability about circuits. Therefore, we created the CIRCUIT dataset consisting of 510 question-answer pairs spanning various levels of analog-circuit-related subjects. The best-performing model on our dataset, GPT-4o, achieves 48.04% accuracy when evaluated on the final numerical answer. To evaluate the robustness of LLMs on our dataset, we introduced a unique feature that enables unit-test-like evaluation by grouping questions into unit tests. In this case, GPT-4o can only pass 27.45% of the unit tests, highlighting that the most advanced LLMs still struggle with understanding circuits, which requires multi-level reasoning, particularly when involving circuit topologies. This circuit-specific benchmark highlights LLMs' limitations, offering valuable insights for advancing their application in analog integrated circuit design.

Updated: 2025-02-11 21:53:48

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2502.07980v1

A Survey of In-Context Reinforcement Learning

Reinforcement learning (RL) agents typically optimize their policies by performing expensive backward passes to update their network parameters. However, some agents can solve new tasks without updating any parameters by simply conditioning on additional context such as their action-observation histories. This paper surveys work on such behavior, known as in-context reinforcement learning.

Updated: 2025-02-11 21:52:19

Domains: cs.LG

Download: http://arxiv.org/abs/2502.07978v1

RESIST: Resilient Decentralized Learning Using Consensus Gradient Descent

Empirical risk minimization (ERM) is a cornerstone of modern machine learning (ML), supported by advances in optimization theory that ensure efficient solutions with provable algorithmic convergence rates, which measure the speed at which optimization algorithms approach a solution, and statistical learning rates, which characterize how well the solution generalizes to unseen data. Privacy, memory, computational, and communications constraints increasingly necessitate data collection, processing, and storage across network-connected devices. In many applications, these networks operate in decentralized settings where a central server cannot be assumed, requiring decentralized ML algorithms that are both efficient and resilient. Decentralized learning, however, faces significant challenges, including an increased attack surface for adversarial interference during decentralized learning processes. This paper focuses on the man-in-the-middle (MITM) attack, which can cause models to deviate significantly from their intended ERM solutions. To address this challenge, we propose RESIST (Resilient dEcentralized learning using conSensus gradIent deScenT), an optimization algorithm designed to be robust against adversarially compromised communication links. RESIST achieves algorithmic and statistical convergence for strongly convex, Polyak-Lojasiewicz, and nonconvex ERM problems. Experimental results demonstrate the robustness and scalability of RESIST for real-world decentralized learning in adversarial environments.
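
The consensus-gradient-descent backbone looks as below on a toy ring network with local quadratic losses (a numpy sketch). RESIST's contribution, screening messages on adversarially compromised (MITM) links, is omitted here; with a constant step size this baseline only reaches a neighborhood of the consensus optimum.

import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 6, 4

# Doubly stochastic mixing matrix for a ring topology.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

targets = rng.normal(size=(n_nodes, dim))  # node i minimizes 0.5*||x - t_i||^2
x = np.zeros((n_nodes, dim))

for _ in range(200):
    grads = x - targets
    x = W @ x - 0.1 * grads  # mix neighbors' iterates, then take a local step

print(float(np.abs(x - targets.mean(axis=0)).max()))  # gap to consensus optimum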

Updated: 2025-02-11 21:48:10

Domains: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2502.07977v1

EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen Environments

Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of an audio recording with its recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works have provided supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics or recording position mismatches.
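
The few-shot readout can be sketched as nearest-prototype classification in a learned metric space: average the support embeddings of each case-specific candidate location into a prototype and assign a query to the closest one. The embeddings below are toy vectors; in EnvId they come from the trained representation network.

import numpy as np

def prototype_classify(support_embs, support_labels, query_emb):
    # Average each candidate location's support embeddings into a prototype,
    # then assign the query to the nearest prototype in the metric space.
    labels = sorted(set(support_labels))
    protos = np.stack([
        np.mean([e for e, l in zip(support_embs, support_labels) if l == c], axis=0)
        for c in labels
    ])
    dists = np.linalg.norm(protos - query_emb, axis=1)
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
support = [rng.normal(loc=i, size=8) for i in (0, 0, 1, 1)]  # toy embeddings
labels = ["kitchen", "kitchen", "car", "car"]
print(prototype_classify(support, labels, rng.normal(loc=1, size=8)))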

Updated: 2025-02-11 21:40:27

Domains: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2405.02119v2

Sink equilibria and the attractors of learning in games

Characterizing the limit behavior, that is, the attractors, of learning dynamics is one of the most fundamental open questions in game theory. In recent work on this front, it was conjectured that the attractors of the replicator dynamic are in one-to-one correspondence with the sink equilibria of the game, the sink strongly connected components of a game's preference graph, and it was established that they stand in at least one-to-many correspondence with them. We make threefold progress on the problem of characterizing attractors. First, we show through a topological construction that the one-to-one conjecture is false. Second, we make progress on the attractor characterization problem for two-player games by establishing that the one-to-one conjecture is true in the absence of a local pattern called a weak local source, a pattern that is absent from zero-sum games. Finally, we look, for the first time in this context, at fictitious play, the longest-studied learning dynamic, and examine to what extent the conjecture generalizes there. We establish that under fictitious play, sink equilibria always contain attractors (sometimes strictly), and every attractor corresponds to a strongly connected set of nodes in the preference graph.

Updated: 2025-02-11 21:40:11

Domains: cs.GT,cs.LG

Download: http://arxiv.org/abs/2502.07975v1

Characterization of point-source transient events with a rolling-shutter compressed sensing system

Point-source transient events (PSTEs), optical events that are both extremely fast and extremely small, pose several challenges to an imaging system. Due to their speed, accurately characterizing such events often requires detectors with very high frame rates. Due to their size, accurately detecting such events requires maintaining coverage over an extended field-of-view, often through the use of imaging focal plane arrays (FPA) with a global shutter readout. Traditional imaging systems that meet these requirements are costly in terms of price, size, weight, power consumption, and data bandwidth, and there is a need for cheaper solutions with adequate temporal and spatial coverage. To address these issues, we develop a novel compressed sensing algorithm adapted to the rolling shutter readout of an imaging system. This approach enables reconstruction of a PSTE signature at the sampling rate of the rolling shutter, offering a 1-2 order-of-magnitude temporal speedup and a proportional reduction in data bandwidth. We present empirical results demonstrating accurate recovery of PSTEs using measurements that are spatially undersampled by a factor of 25, and our simulations show that, relative to other compressed sensing algorithms, our algorithm is both faster and yields higher quality reconstructions. We also present theoretical results characterizing our algorithm and corroborating simulations. The potential impact of our work includes the development of much faster, cheaper sensor solutions for PSTE detection and characterization.

Updated: 2025-02-11 21:39:32

Domains: stat.ML,cs.LG,eess.SP,physics.optics,stat.AP,I.4.5

Download: http://arxiv.org/abs/2408.16868v2

From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems

Machine learning (ML) components are increasingly integrated into software products, yet their complexity and inherent uncertainty often lead to unintended and hazardous consequences, both for individuals and society at large. Despite these risks, practitioners seldom adopt proactive approaches to anticipate and mitigate hazards before they occur. Traditional safety engineering approaches, such as Failure Mode and Effects Analysis (FMEA) and System Theoretic Process Analysis (STPA), offer systematic frameworks for early risk identification but are rarely adopted. This position paper advocates for integrating hazard analysis into the development of any ML-powered software product and calls for greater support to make this process accessible to developers. By using large language models (LLMs) to partially automate a modified STPA process with human oversight at critical steps, we expect to address two key challenges: the heavy dependency on highly experienced safety engineering experts, and the time-consuming, labor-intensive nature of traditional hazard analysis, which often impedes its integration into real-world development workflows. We illustrate our approach with a running example, demonstrating that many seemingly unanticipated issues can, in fact, be anticipated.

Updated: 2025-02-11 21:37:19

Domains: cs.SE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2502.07974v1

Faster Algorithms for Structured Linear and Kernel Support Vector Machines

Quadratic programming is a ubiquitous prototype in convex programming. Many machine learning problems can be formulated as quadratic programming, including the famous Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. Generally, a quadratic program has an input size of $\Theta(n^2)$, where $n$ is the number of variables. Assuming the Strong Exponential Time Hypothesis (SETH), it is known that no $O(n^{2-o(1)})$ time algorithm exists when the quadratic objective matrix is positive semidefinite (Backurs, Indyk, and Schmidt, NeurIPS'17). However, problems such as SVMs usually admit much smaller input sizes: one is given $n$ data points, each of dimension $d$, and $d$ is oftentimes much smaller than $n$. Furthermore, the SVM program has only $O(1)$ equality linear constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain structures. In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective admits a low-rank factorization, and the number of linear constraints is small. Consequently, we obtain results for SVMs: * For linear SVM when the input data is $d$-dimensional, our algorithm runs in time $\widetilde O(nd^{(\omega+1)/2}\log(1/\epsilon))$ where $\omega\approx 2.37$ is the fast matrix multiplication exponent; * For Gaussian kernel SVM, when the data dimension $d = O(\log n)$ and the squared dataset radius is sub-logarithmic in $n$, our algorithm runs in time $O(n^{1+o(1)}\log(1/\epsilon))$. We also prove that when the squared dataset radius is at least $\Omega(\log^2 n)$, then $\Omega(n^{2-o(1)})$ time is required. This improves upon the prior best lower bound in both the dimension $d$ and the squared dataset radius.

Updated: 2025-02-11 21:37:03

标题: 结构化线性和核支持向量机的更快算法

摘要: 二次规划是凸规划中一个普遍的原型。许多机器学习问题可以被表述为二次规划,包括著名的支持向量机(SVMs)。在深度学习时代之前的三十年中,线性和核SVMs一直是机器学习中最流行的模型之一。 一般来说,一个二次规划的输入大小为$\Theta(n^2)$,其中$n$是变量的数量。假设强指数时间假设(SETH),已知当二次目标矩阵为正半定时,不存在$O(n^{2-o(1)})$的时间算法(Backurs, Indyk, 和Schmidt, NeurIPS'17)。然而,像SVMs这样的问题通常具有更小的输入大小:给定$n$个数据点,每个维度为$d$,而$d$通常比$n$小得多。此外,SVM程序仅具有$O(1)$个等式线性约束。这表明,只要程序表现出某种结构,更快的算法是可行的。 在这项工作中,我们设计了第一个几乎线性时间算法,用于解决二次规划,只要二次目标具有低秩因式分解,且线性约束的数量较小。因此,我们获得了对SVMs的结果: * 对于线性SVM,当输入数据为$d$维时,我们的算法运行时间为$\widetilde O(nd^{(\omega+1)/2}\log(1/\epsilon))$,其中$\omega\approx 2.37$是快速矩阵乘法指数; * 对于高斯核SVM,当数据维度$d = O(\log n)$且平方数据集半径为亚对数级别时,我们的算法运行时间为$O(n^{1+o(1)}\log(1/\epsilon))$。我们还证明了当平方数据集半径至少为$\Omega(\log^2 n)$时,需要$\Omega(n^{2-o(1)})$的时间。这在维度$d$和平方数据集半径方面均改进了先前的最佳下界。

更新时间: 2025-02-11 21:37:03

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2307.07735v3

ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval

Document retrieval is a core component of question-answering systems, as it enables conditioning answer generation on new and large-scale corpora. While effective, the standard practice of encoding documents into high-dimensional embeddings for similarity search entails large memory and compute footprints, and also makes it hard to inspect the inner workings of the system. In this paper, we propose a tree-based method for organizing and representing reference documents at various granular levels, which offers the flexibility to balance cost and utility, and eases the inspection of the corpus content and retrieval operations. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches, hence directly optimizing for retrieval performance. Our evaluations show that ReTreever generally preserves full representation accuracy. Its hierarchical structure further provides strong coarse representations and enhances transparency by indirectly learning meaningful semantic groupings. Among hierarchical retrieval methods, ReTreever achieves the best retrieval accuracy at the lowest latency, proving that this family of techniques can be viable in practical applications.
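
As a minimal illustration of the routing idea (a hypothetical sketch under our own simplifications, not the paper's implementation), a binary tree with one learned routing function per internal node assigns an embedding to a leaf as follows; training makes these decisions soft and differentiable so that queries and their relevant documents land in the same branches.

    import numpy as np

    class RoutingTree:
        """Complete binary tree; each internal node holds a linear routing function."""
        def __init__(self, depth: int, dim: int, seed: int = 0):
            rng = np.random.default_rng(seed)
            self.depth = depth
            # One (weight, bias) pair per internal node, stored in heap order.
            self.W = rng.normal(size=(2 ** depth - 1, dim))
            self.b = np.zeros(2 ** depth - 1)

        def route(self, x: np.ndarray) -> int:
            """Follow left/right decisions down to a leaf index in [0, 2**depth)."""
            node = 0
            for _ in range(self.depth):
                go_right = self.W[node] @ x + self.b[node] > 0.0
                node = 2 * node + 1 + int(go_right)  # heap children: 2i+1, 2i+2
            return node - (2 ** self.depth - 1)      # heap index -> leaf index

    tree = RoutingTree(depth=3, dim=128)
    doc_embedding = np.random.default_rng(1).normal(size=128)
    print(tree.route(doc_embedding))  # coarse bucket for this document

Truncating the descent at a shallower depth yields the coarse, cheaper representations the abstract refers to.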

Updated: 2025-02-11 21:35:13

标题: ReTreever:基于树的由粗到细的检索表示

摘要: 文档检索是问答系统的核心组件,因为它使答案生成能够以新的大规模语料库为条件。尽管有效,但将文档编码为高维嵌入以进行相似性搜索的标准做法会带来较大的内存和计算开销,同时也使得难以检查系统的内部工作原理。在本文中,我们提出了一种基于树的方法,在不同粒度层级上组织和表示参考文档,这种方法可以灵活地平衡成本与效用,并简化对语料内容和检索操作的检查。我们的方法称为ReTreever,为二叉树的每个内部节点联合学习一个路由函数,使查询和参考文档被分配到相似的树枝上,从而直接优化检索性能。我们的评估结果表明,ReTreever通常可以保持完整表示的准确性。其分层结构进一步提供了强大的粗粒度表示,并通过间接学习有意义的语义分组来增强透明度。在分层检索方法中,ReTreever以最低的延迟实现了最佳的检索准确性,证明了这类技术在实际应用中是可行的。

更新时间: 2025-02-11 21:35:13

领域: cs.IR,cs.AI,cs.LG,I.2; I.7; E.2; H.3

下载: http://arxiv.org/abs/2502.07971v1

A RAG Approach for Generating Competency Questions in Ontology Engineering

Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, the task of crafting these competency questions heavily relies on the effort of domain experts and knowledge engineers, which is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), there arises the possibility to automate and enhance this process. Unlike other similar works which use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs for the automatic generation of CQs given a set of scientific papers considered to be a domain knowledge base. We investigate its performance and specifically, we study the impact of different numbers of papers on the RAG and different temperature settings of the LLM. We conduct experiments using GPT-4 on two domain ontology engineering tasks and compare results against ground-truth CQs constructed by domain experts. Empirical assessments on the results, utilizing evaluation metrics (precision and consistency), reveal that compared to zero-shot prompting, adding relevant domain knowledge to the RAG improves the performance of LLMs in generating CQs for concrete ontology engineering tasks.
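
A minimal sketch of the retrieval step described above (the function, prompt wording, and variable names are our own illustrative stand-ins, not the paper's): given precomputed embeddings for paper chunks, pick the top-k most similar chunks to a task description and splice them into a CQ-generation prompt.

    import numpy as np

    def build_cq_prompt(task: str, task_emb: np.ndarray, chunks: list[str],
                        chunk_embs: np.ndarray, k: int = 5) -> str:
        """Assemble a retrieval-augmented prompt for competency-question generation."""
        # Cosine similarity between the task and every paper chunk.
        sims = (chunk_embs @ task_emb) / (
            np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(task_emb) + 1e-12)
        top = np.argsort(-sims)[:k]
        context = "\n\n".join(chunks[i] for i in top)
        return (f"Domain knowledge:\n{context}\n\n"
                f"Task: {task}\n"
                "Generate competency questions an ontology for this domain must answer.")

    rng = np.random.default_rng(0)
    chunks = [f"paper chunk {i}" for i in range(20)]   # stand-ins for real text
    print(build_cq_prompt("wind energy ontology", rng.normal(size=64),
                          chunks, rng.normal(size=(20, 64)), k=3))

The number of retrieved papers (k here) and the LLM temperature are exactly the knobs whose impact the paper studies.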

Updated: 2025-02-11 21:25:45

标题: 一个用于在本体工程中生成能力问题的RAG(检索增强生成)方法

摘要: 能力问题(CQ)的制定是多种本体开发与评估方法论的核心。传统上,编写这些能力问题的任务在很大程度上依赖领域专家和知识工程师的努力,往往耗时且劳动密集。随着大型语言模型(LLMs)的出现,自动化并增强这一过程成为可能。与其他使用现有本体或知识图谱作为LLM输入的类似工作不同,我们提出了一种检索增强生成(RAG)方法,利用LLM在给定一组被视为领域知识库的科学论文的情况下自动生成CQ。我们考察了其性能,并具体研究了不同论文数量以及LLM不同温度设置对RAG的影响。我们使用GPT-4在两个领域本体工程任务上进行实验,并将结果与领域专家构建的真实CQ进行比较。利用评估指标(精确度和一致性)对结果进行的实证评估表明,与零样本提示相比,向RAG中加入相关领域知识可以提高LLM在具体本体工程任务中生成CQ的性能。

更新时间: 2025-02-11 21:25:45

领域: cs.AI

下载: http://arxiv.org/abs/2409.08820v2

Generative Risk Minimization for Out-of-Distribution Generalization on Graphs

Out-of-distribution (OOD) generalization on graphs aims at dealing with scenarios where the test graph distribution differs from the training graph distributions. Compared to i.i.d. data like images, the OOD generalization problem on graph-structured data remains challenging due to the non-i.i.d. property and complex structural information on graphs. Recently, several works on graph OOD generalization have explored extracting invariant subgraphs that share crucial classification information across different distributions. Nevertheless, such a strategy could be suboptimal for entirely capturing the invariant information, as the extraction of discrete structures could potentially lead to the loss of invariant information or the involvement of spurious information. In this paper, we propose an innovative framework, named Generative Risk Minimization (GRM), designed to generate an invariant subgraph for each input graph to be classified, instead of extraction. To address the challenge of optimization in the absence of optimal invariant subgraphs (i.e., ground truths), we derive a tractable form of the proposed GRM objective by introducing a latent causal variable, and its effectiveness is validated by our theoretical analysis. We further conduct extensive experiments across a variety of real-world graph datasets for both node-level and graph-level OOD generalization, and the results demonstrate the superiority of our framework GRM.

Updated: 2025-02-11 21:24:13

标题: 生成风险最小化用于图上的分布外泛化

摘要: 图上的分布外泛化(OOD)旨在处理测试图分布与训练图分布不同的情况。与像图像这样的i.i.d.数据相比,图结构数据上的OOD泛化问题仍然具有挑战性,因为图上具有非i.i.d.属性和复杂的结构信息。最近,一些关于图OOD泛化的研究探索了提取不变子图,这些子图在不同分布之间共享关键的分类信息。然而,这种策略在完整捕捉不变信息方面可能并非最优,因为离散结构的提取可能会导致不变信息的丢失或引入虚假信息。在本文中,我们提出了一种名为生成风险最小化(GRM)的创新框架,旨在为每个要分类的输入图生成一个不变子图,而不是提取。为了解决在没有最优不变子图(即真值)的情况下的优化挑战,我们通过引入一个潜在因果变量,推导了所提出的GRM目标的可计算形式,并通过我们的理论分析验证了其有效性。我们进一步在各种真实世界图数据集上进行了广泛实验,包括节点级和图级OOD泛化,结果显示了我们的框架GRM的优越性。

更新时间: 2025-02-11 21:24:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07968v1

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.

Updated: 2025-02-11 21:21:05

标题: 被词语之网困住:LLM在医学文献中会受到歪曲信息的影响吗?

摘要: 医学研究在将新疗法转化为临床实践方面面临着有据可查的挑战。出版激励鼓励研究人员呈现“积极”的结果,即使实证结果并不明确。因此,众所周知,作者经常歪曲(spin)研究结果,特别是在文章摘要中。这种歪曲可能影响临床医生对证据的解读,并可能影响患者的护理决策。在这项研究中,我们探讨了大型语言模型(LLMs)对试验结果的解读是否同样受到歪曲的影响。这一点很重要,因为LLMs正越来越多地被用来检索和综合已发表的医学证据。我们评估了22个LLM,发现它们普遍比人类更容易受到歪曲的影响。它们还可能将歪曲传播到自己的输出中:例如,我们发现有证据表明,LLMs会在其生成的通俗语言摘要中隐含地融入歪曲。然而,我们也发现,LLMs通常能够识别歪曲,并且可以通过适当的提示方式来减轻歪曲对LLM输出的影响。

更新时间: 2025-02-11 21:21:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07963v1

ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans

While self-attention has been instrumental in the success of Transformers, it can lead to over-concentration on a few tokens during training, resulting in suboptimal information flow. Enforcing doubly-stochastic constraints in attention matrices has been shown to improve structure and balance in attention distributions. However, existing methods rely on iterative Sinkhorn normalization, which is computationally costly. In this paper, we introduce a novel, fully parallelizable doubly-stochastic attention mechanism based on sliced optimal transport, leveraging Expected Sliced Transport Plans (ESP). Unlike prior approaches, our method enforces double stochasticity without iterative Sinkhorn normalization, significantly enhancing efficiency. To ensure differentiability, we incorporate a temperature-based soft sorting technique, enabling seamless integration into deep learning models. Experiments across multiple benchmark datasets, including image classification, point cloud classification, sentiment analysis, and neural machine translation, demonstrate that our enhanced attention regularization consistently improves performance across diverse applications.
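
For contrast, the iterative Sinkhorn normalization that ESPFormer is designed to avoid can be sketched in a few lines (a toy version; practical implementations work in log space for numerical stability):

    import numpy as np

    def sinkhorn_attention(scores: np.ndarray, n_iters: int = 20, tau: float = 1.0):
        """Approximately doubly-stochastic attention via Sinkhorn iterations."""
        K = np.exp(scores / tau)
        for _ in range(n_iters):
            K /= K.sum(axis=1, keepdims=True)  # normalize rows
            K /= K.sum(axis=0, keepdims=True)  # normalize columns
        return K

    A = sinkhorn_attention(np.random.default_rng(0).normal(size=(6, 6)))
    print(A.sum(axis=0), A.sum(axis=1))  # both close to all-ones

The sequential loop above is the computational bottleneck; expected sliced transport plans replace it with one-dimensional transport, which has a closed form via sorting, hence the temperature-based soft sorting mentioned in the abstract.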

Updated: 2025-02-11 21:20:48

标题: ESPFormer:具有预期切片传输计划的双随机注意力

摘要: 自注意力在Transformer的成功中起到了关键作用,但在训练过程中可能导致对少数标记的过度集中,从而导致次优的信息流。在注意力矩阵中强制实施双随机约束已被证明可以改善注意力分布的结构和平衡。然而,现有方法依赖于计算成本高昂的迭代Sinkhorn规范化。在本文中,我们引入了一种基于切片最优输运的新颖、完全可并行化的双随机注意力机制,利用了期望切片输运计划(ESP)。与先前的方法不同,我们的方法在不需要迭代Sinkhorn规范化的情况下强制双随机性,显著提高了效率。为了确保可微性,我们结合了基于温度的软排序技术,使其能够无缝集成到深度学习模型中。在图像分类、点云分类、情感分析和神经机器翻译等多个基准数据集上的实验表明,我们增强的注意力正则化在各种应用中一致地提升了性能。

更新时间: 2025-02-11 21:20:48

领域: cs.LG

下载: http://arxiv.org/abs/2502.07962v1

Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders

While recent work has found that vision-language models trained under the Contrastive Language Image Pre-training (CLIP) framework contain intrinsic social biases, the extent to which different upstream pre-training features of the framework relate to these biases, and hence how intrinsic bias and downstream performance are connected has been unclear. In this work, we present the largest comprehensive analysis to-date of how the upstream pre-training factors and downstream performance of CLIP models relate to their intrinsic biases. Studying 131 unique CLIP models, trained on 26 datasets, using 55 architectures, and in a variety of sizes, we evaluate bias in each model using 26 well-established unimodal and cross-modal principled Embedding Association Tests. We find that the choice of pre-training dataset is the most significant upstream predictor of bias, whereas architectural variations have minimal impact. Additionally, datasets curated using sophisticated filtering techniques aimed at enhancing downstream model performance tend to be associated with higher levels of intrinsic bias. Finally, we observe that intrinsic bias is often significantly correlated with downstream performance ($0.3 \leq r \leq 0.8$), suggesting that models optimized for performance inadvertently learn to amplify representational biases. Comparisons between unimodal and cross-modal association tests reveal that social group bias depends heavily on the modality. Our findings imply that more sophisticated strategies are needed to address intrinsic model bias for vision-language models across the entire model development pipeline.
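
For readers unfamiliar with these tests, the standard WEAT-style effect size over embedding cosine similarities can be sketched as follows (a generic illustration, not the paper's exact battery of 26 tests):

    import numpy as np

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def weat_effect_size(X, Y, A, B):
        """Association of target sets X, Y with attribute sets A, B.
        Each argument is an array of embeddings, one row per concept."""
        s = lambda w: np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
        sx, sy = [s(x) for x in X], [s(y) for y in Y]
        return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

    rng = np.random.default_rng(0)
    emb = lambda n: rng.normal(size=(n, 32))
    print(weat_effect_size(emb(8), emb(8), emb(8), emb(8)))  # near 0 for random vectors

It is correlations between such effect sizes and downstream accuracy that the $0.3 \leq r \leq 0.8$ range above refers to.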

Updated: 2025-02-11 21:11:47

标题: 内在偏见由预训练数据预测,并与视觉语言编码器的下游性能相关

摘要: 最近的研究发现,在对比语言图像预训练(CLIP)框架下训练的视觉-语言模型中存在固有的社会偏见,不同上游预训练特征与这些偏见之间的关系以及因此固有偏见与下游性能如何相互关联尚不清楚。在这项工作中,我们提出了迄今为止关于CLIP模型的上游预训练因素和下游性能如何与其固有偏见相关的最大的综合分析。我们研究了131个独特的CLIP模型,这些模型基于26个数据集、使用55种架构,并具有各种规模,我们使用26个成熟的单模态和跨模态原则性嵌入关联测试来评估每个模型中的偏见。我们发现,预训练数据集的选择是偏见最显著的上游预测因素,而架构变化对偏见的影响很小。此外,使用复杂过滤技术筛选的数据集,旨在增强下游模型性能的数据集往往与更高水平的固有偏见相关联。最后,我们观察到固有偏见通常与下游性能显著相关($0.3 \leq r \leq 0.8$),这表明为了性能而优化的模型不经意地学习放大了表征偏见。单模态和跨模态关联测试之间的比较显示,社会群体偏见在很大程度上取决于模态。我们的发现暗示,需要更复杂的策略来解决整个模型开发流程中视觉-语言模型的固有偏见。

更新时间: 2025-02-11 21:11:47

领域: cs.AI

下载: http://arxiv.org/abs/2502.07957v1

Personalized Negative Reservoir for Incremental Learning in Recommender Systems

Recommender systems have become an integral part of online platforms. Every day the volume of training data is expanding and the number of user interactions is constantly increasing. The exploration of larger and more expressive models has become a necessary pursuit to improve user experience. However, this progression carries with it an increased computational burden. In commercial settings, once a recommendation system model has been trained and deployed it typically needs to be updated frequently as new client data arrive. Cumulatively, the mounting volume of data is guaranteed to eventually make full batch retraining of the model from scratch computationally infeasible. Naively fine-tuning solely on the new data runs into the well-documented problem of catastrophic forgetting. Despite the fact that negative sampling is a crucial part of training with implicit feedback, no specialized technique exists that is tailored to the incremental learning framework. In this work, we propose a personalized negative reservoir strategy, which is used to obtain negative samples for the standard triplet loss of graph-based recommendation systems. Our technique balances alleviation of forgetting with plasticity by encouraging the model to remember stable user preferences and selectively forget when user interests change. We derive the mathematical formulation of a negative sampler to populate and update the reservoir. We integrate our design in three SOTA and commonly used incremental recommendation models. We show that these concrete realizations of our negative reservoir framework achieve state-of-the-art results for standard benchmarks using multiple top-k evaluation metrics.
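
For concreteness, the reservoir mechanics can be sketched with classical uniform reservoir sampling (the paper's contribution is a personalized criterion for which negatives enter and leave the pool, which this toy version does not reproduce):

    import random

    class NegativeReservoir:
        """Fixed-size pool of candidate negative item ids for one user."""
        def __init__(self, capacity: int, seed: int = 0):
            self.capacity = capacity
            self.pool: list[int] = []
            self.seen = 0
            self.rng = random.Random(seed)

        def offer(self, item_id: int) -> None:
            """Classical reservoir update: candidate kept with prob capacity/seen."""
            self.seen += 1
            if len(self.pool) < self.capacity:
                self.pool.append(item_id)
            else:
                j = self.rng.randrange(self.seen)
                if j < self.capacity:
                    self.pool[j] = item_id

        def sample(self, k: int) -> list[int]:
            """Draw k negatives for the triplet loss."""
            return self.rng.sample(self.pool, min(k, len(self.pool)))

    res = NegativeReservoir(capacity=64)
    for item in range(10_000):   # stream of candidate negatives as new data arrive
        res.offer(item)
    print(res.sample(5))         # negatives for one incremental training step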

Updated: 2025-02-11 21:10:32

标题: 在推荐系统中用于增量学习的个性化负向储备

摘要: 推荐系统已经成为在线平台的一个重要组成部分。每天训练数据的量都在扩大,用户互动的数量也在不断增加。探索更大更具表现力的模型已经成为提高用户体验的必要追求。然而,这种发展带来了增加的计算负担。在商业环境中,一旦推荐系统模型被训练和部署,通常需要经常更新,以适应新的客户数据。累积起来,日益增长的数据量最终将使得从头开始对模型进行完全批量重新训练在计算上变得不可行。仅仅在新数据上进行简单调整会遇到已经有文献证明的灾难性遗忘问题。尽管负采样是使用隐式反馈进行训练的一个关键部分,但目前没有专门针对增量学习框架的技术存在。在这项工作中,我们提出了一种个性化的负样本库策略,用于为基于图的推荐系统的标准三元组损失获取负样本。我们的技术通过鼓励模型记住稳定的用户偏好并在用户兴趣改变时有选择性地遗忘,平衡了减轻遗忘和可塑性。我们推导了一个负采样器的数学公式,用于填充和更新负样本库。我们将我们的设计集成到三种最先进和常用的增量推荐模型中。我们展示了我们的负样本库框架的这些具体实现在使用多个top-k评估指标的标准基准上取得了最先进的结果。

更新时间: 2025-02-11 21:10:32

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2403.03993v2

Federated Self-supervised Domain Generalization for Label-efficient Polyp Segmentation

Employing self-supervised learning (SSL) methodologies assumes paramount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models. However, the intricate privacy dynamics surrounding medical data often preclude seamless data sharing among disparate medical centers. Federated learning (FL) emerges as a formidable solution to this privacy conundrum, yet within the realm of FL, optimizing model generalization stands as a pressing imperative. Robust generalization capabilities are imperative to ensure the model's efficacy across diverse geographical domains post-training on localized client datasets. In this paper, a Federated self-supervised Domain Generalization method is proposed to enhance the generalization capacity of federated and Label-efficient intestinal polyp segmentation, named LFDG. Based on a classical SSL method, DropPos, LFDG proposes an adversarial learning-based data augmentation method (SSADA) to enhance the data diversity. LFDG further proposes a relaxation module based on Source-reconstruction and Augmentation-masking (SRAM) to maintain stability in feature learning. We have validated LFDG on polyp images from six medical centers. The performance of our method achieves 3.80% and 3.92% better than the baseline and other recent FL methods and SSL methods, respectively.

Updated: 2025-02-11 21:00:01

标题: 联邦自监督域泛化用于标签高效的息肉分割

摘要: 使用自监督学习(SSL)方法在构建基于深度学习的自动息肉分割模型时,处理未标记的息肉数据集至关重要。然而,围绕医疗数据的复杂隐私动态通常阻碍了不同医疗中心之间的无缝数据共享。联邦学习(FL)出现作为解决这一隐私难题的强有力方法,然而在FL领域中,优化模型泛化能力成为一个迫切的问题。强大的泛化能力是确保模型在本地客户数据训练后在不同地理领域有效性的重要保证。本文提出了一种联邦自监督领域泛化方法,用于增强联邦和标签高效的肠息肉分割模型,命名为LFDG。基于经典的SSL方法DropPos,LFDG提出了一种基于对抗学习的数据增强方法(SSADA)来增强数据的多样性。LFDG进一步提出了一个基于源重构和增强掩蔽(SRAM)的松弛模块,以保持特征学习的稳定性。我们在六家医疗中心的息肉图像上验证了LFDG。我们的方法的性能分别比基线和其他最近的FL方法和SSL方法提高了3.80%和3.92%。

更新时间: 2025-02-11 21:00:01

领域: cs.CV,cs.DC,cs.LG

下载: http://arxiv.org/abs/2502.07951v1

VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning

State-of-the-art (SOTA) reinforcement learning (RL) methods enable the vision-language agents to learn from interactions with the environment without human supervision. However, they struggle with learning inefficiencies in tackling real-world complex sequential decision-making tasks, especially with sparse reward signals and long-horizon dependencies. To effectively address the issue, we introduce Variational Subgoal-Conditioned RL (VSC-RL), which reformulates the vision-language sequential decision-making task as a variational goal-conditioned RL problem, allowing us to leverage advanced optimization methods to enhance learning efficiency. Specifically, VSC-RL optimizes the SubGoal Evidence Lower BOund (SGC-ELBO), which consists of (a) maximizing the subgoal-conditioned return via RL and (b) minimizing the subgoal-conditioned difference with the reference policy. We theoretically demonstrate that SGC-ELBO is equivalent to the original optimization objective, ensuring improved learning efficiency without sacrificing performance guarantees. Additionally, for real-world complex decision-making tasks, VSC-RL leverages the vision-language model to autonomously decompose the goal into feasible subgoals, enabling efficient learning. Across various benchmarks, including challenging real-world mobile device control tasks, VSC-RL significantly outperforms the SOTA vision-language agents, achieving superior performance and remarkable improvement in learning efficiency.
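
In schematic form (our paraphrase of the description above, not the paper's exact equation), the bound decomposes per subgoal $g$ as

$$\text{SGC-ELBO}(\pi) \;=\; \underbrace{\mathbb{E}_{\pi}\!\left[R \mid g\right]}_{\text{(a) subgoal-conditioned return}} \;-\; \beta\, \underbrace{D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s, g) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid s, g)\big)}_{\text{(b) difference with the reference policy}},$$

with the vision-language model supplying the decomposition of the overall goal into the subgoals $g$.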

Updated: 2025-02-11 20:57:46

标题: VSC-RL: 使用变分子目标条件强化学习推动自主视觉-语言代理的发展

摘要: 最先进的强化学习(RL)方法使视觉语言代理能够在与环境互动中学习,而无需人类监督。然而,在处理现实世界复杂的顺序决策任务时,特别是在稀疏奖励信号和长期依赖性方面,它们往往面临学习效率低下的困难。为了有效解决这个问题,我们引入了变分子目标条件RL(VSC-RL),它将视觉语言顺序决策任务重新制定为一个变分目标条件RL问题,使我们能够利用先进的优化方法来增强学习效率。具体来说,VSC-RL优化了SubGoal Evidence Lower BOund(SGC-ELBO),其中包括(a)通过RL最大化子目标条件回报和(b)最小化与参考策略的子目标条件差异。我们在理论上证明了SGC-ELBO等效于原始优化目标,确保了提高学习效率而不损害性能保证。此外,对于现实世界复杂的决策任务,VSC-RL利用视觉语言模型自主将目标分解为可行的子目标,实现高效学习。在包括具有挑战性的现实世界移动设备控制任务在内的各种基准测试中,VSC-RL明显优于SOTA视觉语言代理,实现了更优越的性能和学习效率显著提高。

更新时间: 2025-02-11 20:57:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07949v1

SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion

Surgical simulation offers a promising addition to conventional surgical training. However, available simulation tools lack photorealism and rely on hardcoded behaviour. Denoising Diffusion Models are a promising alternative for high-fidelity image synthesis, but existing state-of-the-art conditioning methods fall short in providing precise control or interactivity over the generated scenes. We introduce SurGrID, a Scene Graph to Image Diffusion Model, allowing for controllable surgical scene synthesis by leveraging Scene Graphs. These graphs encode a surgical scene's components' spatial and semantic information, which are then translated into an intermediate representation using our novel pre-training step that explicitly captures local and global information. Our proposed method improves the fidelity of generated images and their coherence with the graph input over the state-of-the-art. Further, we demonstrate the simulation's realism and controllability in a user assessment study involving clinical experts. Scene Graphs can be effectively used for precise and interactive conditioning of Denoising Diffusion Models for simulating surgical scenes, enabling high fidelity and interactive control over the generated content.

Updated: 2025-02-11 20:49:13

标题: SurGrID:通过场景图到图像扩散实现可控的手术模拟

摘要: 手术模拟为传统手术培训提供了一个具有前景的补充。然而,现有的模拟工具缺乏照片级真实感,并依赖于硬编码的行为。去噪扩散模型是高保真图像合成的一个有前途的替代方案,但现有最先进的条件控制方法在对生成场景提供精确控制或交互性方面存在不足。 我们引入了SurGrID,一种场景图到图像的扩散模型,通过利用场景图实现可控的手术场景合成。这些场景图编码了手术场景各组成部分的空间和语义信息,随后通过我们新颖的预训练步骤(显式捕捉局部和全局信息)将其转换为中间表示。 我们提出的方法在生成图像的保真度及其与图输入的一致性方面超过了现有最先进水平。此外,我们通过一项涉及临床专家的用户评估研究展示了模拟的逼真性和可控性。 场景图可被有效用于对去噪扩散模型进行精确且交互式的条件控制以模拟手术场景,实现对生成内容的高保真度和交互式控制。

更新时间: 2025-02-11 20:49:13

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.07945v1

SHACL-SKOS Based Knowledge Representation of Material Safety Data Sheet (SDS) for the Pharmaceutical Industry

We report the development of a knowledge representation and reasoning (KRR) system built on hybrid SHACL-SKOS ontologies for globally harmonized system (GHS) material Safety Data Sheets (SDS) to enhance chemical safety communication and regulatory compliance. SDS are comprehensive documents containing safety and handling information for chemical substances. Thus, they are an essential part of workplace safety and risk management. However, the vast number of Safety Data Sheets from multiple organizations, manufacturers, and suppliers that produce and distribute chemicals makes it challenging to centralize and access SDS documents through a single repository. To accomplish the underlying issues of data exchange related to chemical shipping and handling, we construct SDS related controlled vocabulary and conditions validated by SHACL, and knowledge systems of similar domains linked via SKOS. The resulting hybrid ontologies aim to provide standardized yet adaptable representations of SDS information, facilitating better data sharing, retrieval, and integration across various platforms. This paper outlines our SHACL-SKOS system architectural design and showcases our implementation for an industrial application streamlining the generation of a composite shipping cover sheet.

Updated: 2025-02-11 20:44:45

标题: 基于SHACL-SKOS的药品行业安全数据表(SDS)知识表示

摘要: 我们报告了一个基于混合SHACL-SKOS本体论的知识表示和推理(KRR)系统的开发,用于全球协调系统(GHS)材料安全数据表(SDS)以增强化学安全沟通和法规遵从性。SDS是包含化学物质安全和处理信息的全面文档。因此,它们是工作场所安全和风险管理的重要组成部分。然而,来自多个组织、制造商和供应商生产和分发化学品的大量安全数据表使得通过单一存储库集中和访问SDS文档具有挑战性。为了解决与化学品运输和处理相关的数据交换潜在问题,我们构建了经SHACL验证的SDS相关控制词汇和条件,以及通过SKOS链接的类似领域的知识系统。由此产生的混合本体论旨在提供标准化但可适应的SDS信息表示,促进跨各种平台的更好数据共享、检索和集成。本文概述了我们的SHACL-SKOS系统架构设计,并展示了我们用于工业应用的实施,简化了复合运输封面表的生成。

更新时间: 2025-02-11 20:44:45

领域: cs.AI,I.2.4

下载: http://arxiv.org/abs/2502.07944v1

CREDAL: Close Reading of Data Models

Data models are necessary for the birth of data and of any data-driven system. Indeed, every algorithm, every machine learning model, every statistical model, and every database has an underlying data model without which the system would not be usable. Hence, data models are excellent sites for interrogating the (material, social, political, ...) conditions giving rise to a data system. Towards this, drawing inspiration from literary criticism, we propose to closely read data models in the same spirit as we closely read literary artifacts. Close readings of data models reconnect us with, among other things, the materiality, the genealogies, the techne, the closed nature, and the design of technical systems. While recognizing from literary theory that there is no one correct way to read, it is nonetheless critical to have systematic guidance for those unfamiliar with close readings. This is especially true for those trained in the computing and data sciences, who too often are enculturated to set aside the socio-political aspects of data work. A systematic methodology for reading data models currently does not exist. To fill this gap, we present the CREDAL methodology for close readings of data models. We detail our iterative development process and present results of a qualitative evaluation of CREDAL demonstrating its usability, usefulness, and effectiveness in the critical study of data.

Updated: 2025-02-11 20:42:56

标题: CREDAL: 数据模型的仔细阅读

摘要: 数据模型是数据和任何数据驱动系统诞生的必要条件。实际上,每个算法,每个机器学习模型,每个统计模型和每个数据库都有一个基础的数据模型,没有这个模型,系统将无法使用。因此,数据模型是一个很好的地方,可以探讨产生数据系统的各种条件(物质的,社会的,政治的...)。为此,我们借鉴文学批评的灵感,提议以与我们阅读文学作品一样的精神来仔细阅读数据模型。对数据模型的仔细阅读重新连接我们与技术系统的物质性,谱系,技术,封闭性和设计等方面。 虽然从文学理论中认识到没有一种正确的阅读方式,但对于那些不熟悉仔细阅读的人来说,有系统的指导是至关重要的。对于那些受过计算和数据科学训练的人来说尤其如此,他们往往被教导忽视数据工作的社会政治方面。目前还没有一种系统的方法论来阅读数据模型。为了填补这一空白,我们提出了用于对数据模型进行仔细阅读的CREDAL方法论。我们详细介绍了我们的迭代式开发过程,并提出了对CREDAL进行定性评估的结果,证明了它在对数据进行批判性研究中的可用性,有用性和有效性。

更新时间: 2025-02-11 20:42:56

领域: cs.DB,cs.AI,cs.CY

下载: http://arxiv.org/abs/2502.07943v1

Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs

Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task-performance, yielding a "symbiotic improvement" for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs-owing to their distinct reasoning capabilities-often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a Hybrid Mode for Privacy Preservation to address user privacy concerns. Evaluated on the WEBARENA benchmark, AgentSymbiotic achieves SOTA performance with both LLM types. Our best Large LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model demonstrates a competitive 49%, exceeding the prior best of 28%. Code will be released upon acceptance.

Updated: 2025-02-11 20:41:49

标题: 网络代理的共生合作:利用大型和小型LLM的互补优势

摘要: 由大型语言模型(LLMs)驱动的网络浏览代理已经显示出在自动化复杂基于网络的任务方面具有巨大潜力。现有方法通常依赖于大型LLMs(例如GPT-4o)来探索网络环境并生成轨迹数据,这些数据随后要么用于演示检索(针对大型LLMs),要么用于蒸馏小型LLMs(例如Llama3),而这一过程与探索保持解耦。在本文中,我们提出了AgentSymbiotic,这是一个将数据合成与任务性能相结合的迭代框架,为大型和小型LLMs都带来“共生改进”。我们的研究揭示了LLM类型之间的互补动态:尽管大型LLMs擅长生成用于蒸馏的高质量轨迹,但由于其独特的推理能力,蒸馏得到的小型LLMs往往选择与其较大对应物不同的行动。这种分歧推动了对新颖轨迹的探索,从而丰富了合成数据。但是,我们也观察到小型LLMs的性能在这一迭代增强过程中成为瓶颈。为了解决这个问题,我们在LLM蒸馏中提出了两项创新:一种缓解离策略偏差的推测式数据合成策略,以及一种旨在提升学生LLM推理能力的多任务学习方法。此外,我们引入了一种用于隐私保护的混合模式,以解决用户隐私问题。在WEBARENA基准测试上进行评估,AgentSymbiotic在两种LLM类型上均实现了SOTA性能。我们最佳的大型LLM代理达到了52%,超过了之前45%的最佳水平,而我们的8B蒸馏模型也展示出具有竞争力的49%,超过了之前28%的最佳水平。代码将在论文接收后发布。

更新时间: 2025-02-11 20:41:49

领域: cs.MA,cs.LG

下载: http://arxiv.org/abs/2502.07942v1

Discrete Markov Probabilistic Models

This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in the space of bits $\{0,1\}^d$, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process, with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, strengthening its theoretical alignment with score-based generative models while ensuring robustness and efficiency. We further establish convergence bounds for the algorithm under minimal assumptions and demonstrate its effectiveness through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight its strong performance in generating discrete structures. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
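
One concrete reading of the forward noising process (our sketch, matching the description above rather than the paper's code): each bit carries an independent rate-1 Poisson clock, and whenever the clock rings the bit is resampled uniformly from $\{0,1\}$, so marginally each coordinate has been resampled by time $t$ with probability $1 - e^{-t}$.

    import numpy as np

    def forward_noise(x0: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
        """Sample x_t | x_0 exactly for the uniform bit-resampling chain."""
        resampled = rng.random(x0.shape) < 1.0 - np.exp(-t)  # clock rang by time t
        uniform_bits = rng.integers(0, 2, size=x0.shape)     # uniform new labels
        return np.where(resampled, uniform_bits, x0)

    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, size=16)
    print(x0)
    print(forward_noise(x0, t=0.5, rng=rng))  # partially corrupted bit string

The generative model then learns the intensity of the reverse jump process, which the paper identifies with a conditional expectation of a function of the forward chain.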

Updated: 2025-02-11 20:36:23

标题: 离散马尔可夫概率模型

摘要: 本文介绍了离散马尔可夫概率模型(DMPM),这是一种用于离散数据生成的新算法。该算法在位 $\{0,1\}^d$ 的空间中操作,其中噪声过程是一个连续时间马尔可夫链,可以通过泊松时钟精确抽样,该时钟以均匀随机方式翻转标签。时间反演过程,就像前向噪声过程一样,是一个跃迁过程,其强度由经典得分函数的离散类比控制。关键是,该强度被证明是前向过程的一个函数的条件期望,加强了其与基于得分的生成模型的理论对齐,同时确保了稳健性和效率。我们进一步在最小假设下建立了该算法的收敛界,并通过对低维伯努利分布数据集和高维二进制MNIST数据的实验证明了其有效性。结果突显了它在生成离散结构方面的强大性能。这项工作架起了理论基础和实际应用之间的桥梁,推动了有效且理论基础扎实的离散生成建模的发展。

更新时间: 2025-02-11 20:36:23

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2502.07939v1

What Matters in Hierarchical Search for Combinatorial Reasoning Problems?

Efficiently tackling combinatorial reasoning problems, particularly the notorious NP-hard tasks, remains a significant challenge for AI research. Recent efforts have sought to enhance planning by incorporating hierarchical high-level search strategies, known as subgoal methods. While promising, their performance against traditional low-level planners is inconsistent, raising questions about their application contexts. In this study, we conduct an in-depth exploration of subgoal-planning methods for combinatorial reasoning. We identify the attributes pivotal for leveraging the advantages of high-level search: hard-to-learn value functions, complex action spaces, presence of dead ends in the environment, or using data collected from diverse experts. We propose a consistent evaluation methodology to achieve meaningful comparisons between methods and reevaluate the state-of-the-art algorithms.

Updated: 2025-02-11 20:33:14

标题: 在组合推理问题的分层搜索中,什么才是重要的?

摘要: 高效地解决组合推理问题,特别是臭名昭著的NP难题,仍然是人工智能研究的一个重要挑战。最近的努力致力于通过整合分层高水平搜索策略,即子目标方法,来增强规划能力。虽然有所希望,但它们在与传统低水平规划者的性能上存在不一致性,引发了对其应用背景的疑问。在本研究中,我们对用于组合推理的子目标规划方法进行了深入探讨。我们确定了利用高水平搜索优势的关键属性:难以学习的价值函数、复杂的动作空间、环境中存在死胡同,或使用来自不同专家的数据。我们提出了一种一致的评估方法,以实现方法之间的有意义比较,并重新评估最先进的算法。

更新时间: 2025-02-11 20:33:14

领域: cs.LG

下载: http://arxiv.org/abs/2406.03361v3

Active Advantage-Aligned Online Reinforcement Learning with Offline Data

Online reinforcement learning (RL) enhances policies through direct interactions with the environment, but faces challenges related to sample efficiency. In contrast, offline RL leverages extensive pre-collected data to learn policies, but often produces suboptimal results due to limited data coverage. Recent efforts have sought to integrate offline and online RL in order to harness the advantages of both approaches. However, effectively combining online and offline RL remains challenging due to issues that include catastrophic forgetting, lack of robustness and sample efficiency. In an effort to address these challenges, we introduce A3RL, a novel method that actively selects data from combined online and offline sources to optimize policy improvement. We provide a theoretical guarantee that validates the effectiveness of our active sampling strategy and conduct thorough empirical experiments showing that our method outperforms existing state-of-the-art online RL techniques that utilize offline data. Our code will be publicly available at: https://github.com/xuefeng-cs/A3RL.

Updated: 2025-02-11 20:31:59

标题: 主动优势对齐的在线强化学习与离线数据

摘要: 在线强化学习(RL)通过与环境的直接交互来改进策略,但面临样本效率方面的挑战。相比之下,离线RL利用大量预先收集的数据来学习策略,但由于数据覆盖有限,通常会产生次优结果。最近的努力致力于将离线和在线RL相结合,以充分利用两种方法的优势。然而,有效地结合在线和离线RL仍然具有挑战性,原因包括灾难性遗忘、缺乏鲁棒性和样本效率低等问题。为了解决这些挑战,我们引入了A3RL,这是一种新方法,主动地从在线和离线的混合数据源中选择数据以优化策略改进。我们提供了理论保证,验证了我们主动采样策略的有效性,并进行了充分的实证实验,表明我们的方法优于利用离线数据的现有最先进在线RL技术。我们的代码将公开发布在:https://github.com/xuefeng-cs/A3RL。

更新时间: 2025-02-11 20:31:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07937v1

CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories

The increasing complexity of computer science research projects demands more effective tools for deploying code repositories. Large Language Models (LLMs), such as Anthropic Claude and Meta Llama, have demonstrated significant advancements across various fields of computer science research, including the automation of diverse software engineering tasks. To evaluate the effectiveness of LLMs in handling complex code development tasks of research projects, particularly for NLP/CV/AI/ML/DM topics, we introduce CSR-Bench, a benchmark for Computer Science Research projects. This benchmark assesses LLMs from various aspects including accuracy, efficiency, and deployment script quality, aiming to explore their potential in conducting computer science research autonomously. We also introduce a novel framework, CSR-Agents, that utilizes multiple LLM agents to automate the deployment of GitHub code repositories of computer science research projects. Specifically, by checking instructions from markdown files and interpreting repository structures, the model generates and iteratively improves bash commands that set up the experimental environments and deploy the code to conduct research tasks. Preliminary results from CSR-Bench indicate that LLM agents can significantly enhance the workflow of repository deployment, thereby boosting developer productivity and improving the management of developmental workflows.

Updated: 2025-02-11 20:25:11

标题: CSR-Bench:在计算机科学研究仓库部署中对LLM代理进行基准测试

摘要: 计算机科学研究项目的复杂性不断增加,需要更有效的工具来部署代码存储库。大型语言模型(LLMs),如Anthropic Claude和Meta Llama,已经在计算机科学研究的各个领域取得了显著进展,包括自动化各种软件工程任务。为了评估LLMs在处理研究项目的复杂代码开发任务中的有效性,特别是针对NLP/CV/AI/ML/DM主题,我们引入了一个名为CSR-Bench的计算机科学研究项目基准。该基准从准确性、效率和部署脚本质量等各个方面评估LLMs,旨在探索它们在独立进行计算机科学研究方面的潜力。我们还引入了一个新颖的框架,CSR-Agents,利用多个LLM代理自动化部署计算机科学研究项目的GitHub代码存储库。具体来说,通过检查markdown文件中的指令并解释存储库结构,模型生成并迭代改进设置实验环境和部署代码以进行研究任务的bash命令。CSR-Bench的初步结果表明,LLM代理可以显著增强存储库部署的工作流程,从而提高开发人员的生产力并改善开发工作流程管理。

更新时间: 2025-02-11 20:25:11

领域: cs.SE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.06111v2

Emulators for stellar profiles in binary population modeling

Knowledge about the internal physical structure of stars is crucial to understanding their evolution. The novel binary population synthesis code POSYDON includes a module for interpolating the stellar and binary properties of any system at the end of binary MESA evolution based on a pre-computed set of models. In this work, we present a new emulation method for predicting stellar profiles, i.e., the internal stellar structure along the radial axis, using machine learning techniques. We use principal component analysis for dimensionality reduction and fully-connected feed-forward neural networks for making predictions. We find accuracy to be comparable to that of nearest neighbor approximation, with a strong advantage in terms of memory and storage efficiency. By providing a versatile framework for modeling stellar internal structure, the emulation method presented here will enable faster simulations of higher physical fidelity, offering a foundation for a wide range of large-scale population studies of stellar and binary evolution.
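
A minimal sketch of the emulation recipe as described (shapes and hyperparameters here are illustrative assumptions, not values from the paper): compress profiles with PCA, regress the principal-component coefficients from stellar parameters with a feed-forward network, then decode.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    params = rng.normal(size=(500, 4))      # stand-in stellar/binary parameters
    profiles = rng.normal(size=(500, 200))  # stand-in structure along the radial axis

    pca = PCA(n_components=16).fit(profiles)            # dimensionality reduction
    coeffs = pca.transform(profiles)                    # low-dim regression targets
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(params, coeffs)

    # Emulate a new profile: parameters -> PCA coefficients -> reconstructed profile.
    new_profile = pca.inverse_transform(net.predict(params[:1]))
    print(new_profile.shape)  # (1, 200)

Storing only the PCA basis and network weights, rather than the full training grid, is what gives the memory and storage advantage over nearest-neighbor lookup.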

Updated: 2025-02-11 20:18:12

标题: 双星群体建模中恒星轮廓的模拟器

摘要: 对于恒星内部物理结构的了解对于理解它们的演化至关重要。新颖的双星星族合成代码POSYDON包括一个模块,用于在双星MESA演化结束时,基于预先计算的模型集合来插值任何系统的恒星和双星属性。在这项工作中,我们提出了一种新的仿真方法,使用机器学习技术预测恒星轮廓,即沿径向轴的恒星内部结构。我们使用主成分分析进行降维,并使用全连接前馈神经网络进行预测。我们发现其准确性与最近邻逼近相当,但在内存和存储效率方面具有明显优势。通过提供一个多功能框架来建模恒星内部结构,这里提出的仿真方法将使更快速、物理保真度更高的模拟成为可能,为广泛的恒星和双星演化大规模星族研究奠定基础。

更新时间: 2025-02-11 20:18:12

领域: astro-ph.SR,astro-ph.GA,astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2410.11105v2

Educating a Responsible AI Workforce: Piloting a Curricular Module on AI Policy in a Graduate Machine Learning Course

As artificial intelligence (AI) technologies begin to permeate diverse fields-from healthcare to education-consumers, researchers and policymakers are increasingly raising concerns about whether and how AI is regulated. It is therefore reasonable to anticipate that alignment with principles of 'ethical' or 'responsible' AI, as well as compliance with law and policy, will form an increasingly important part of AI development. Yet, for the most part, the conventional computer science curriculum is ill-equipped to prepare students for these challenges. To this end, we seek to explore how new educational content related to AI ethics and AI policy can be integrated into both ethics- and technical-focused courses. This paper describes a two-lecture 'AI policy module' that was piloted in a graduate-level introductory machine learning course in 2024. The module, which includes an in-class active learning game, is evaluated using data from student surveys before and after the lectures, and pedagogical motivations and considerations are discussed. We find that the module is successful in engaging otherwise technically-oriented students on the topic of AI policy, increasing student awareness of the social impacts of a variety of AI technologies and developing student interest in the field of AI regulation.

Updated: 2025-02-11 20:16:56

标题: 培养负责任的人工智能人才:在研究生机器学习课程中试行人工智能政策课程模块

摘要: 随着人工智能(AI)技术开始渗透到各个领域-从医疗到教育-消费者、研究人员和政策制定者越来越担心是否以及如何对AI进行监管。因此,可以合理预期,与“道德”或“负责任”的AI原则一致,以及遵守法律和政策,将逐渐成为AI发展中越来越重要的一部分。然而,就大多数情况而言,传统的计算机科学课程并不足以为学生准备好应对这些挑战。为此,我们试图探讨如何将与AI伦理和AI政策相关的新教育内容整合到伦理和技术课程中。本文描述了在2024年一门研究生级别的入门机器学习课程中试行的两节“AI政策模块”。该模块包括一项课堂内的积极学习游戏,并使用学生在讲座前后的调查数据进行评估,讨论了教学动机和考虑因素。我们发现,该模块成功地吸引了原本以技术为导向的学生对AI政策这一主题的关注,增加了学生对各种AI技术的社会影响的认识,并培养了学生对AI监管领域的兴趣。

更新时间: 2025-02-11 20:16:56

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2502.07931v1

Trustworthy AI on Safety, Bias, and Privacy: A Survey

The capabilities of artificial intelligence systems have been advancing to a great extent, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field, and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates the issues regarding three thrusts: safety, privacy, and bias, which hurt models' trustworthiness. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.

Updated: 2025-02-11 20:08:42

标题: 值得信赖的人工智能:关于安全、偏见和隐私的调查

摘要: 人工智能系统的能力正在不断提高,但这些系统仍然面临着故障模式、漏洞和偏见等问题。本文研究了该领域的当前状况,并提出了有关挑战AI模型可信度的问题的有希望的见解和观点。具体而言,本文调查了涉及三个方面的问题:安全性、隐私性和偏见,这些问题损害了模型的可信度。在安全性方面,我们讨论了在大型语言模型的背景下的安全对齐,防止它们生成有毒或有害内容。对于偏见,我们专注于可能误导网络的虚假偏见。最后,对于隐私性,我们涵盖了深度神经网络中的成员推断攻击。本文讨论的问题反映了我们自己的实验和观察。

更新时间: 2025-02-11 20:08:42

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.10450v1

PIXHELL: When Pixels Learn to Scream

This paper presents a technique for generating sound by leveraging the electrical properties of liquid crystal displays (LCDs). The phenomenon occurs due to vibrational noise produced by capacitors within the LCD panel during rapid pixel state transitions. By modulating these transitions through specially crafted bitmap patterns projected onto the screen, we demonstrate how weak yet audible acoustic signals can be generated directly from the display. We designed, implemented, evaluated, and tested a system that repurposes the LCD as a sound-emitting device. Potential applications for this technique include low-power auditory feedback systems, short-range device communication, air-gap covert channels, secure auditory signaling, and innovative approaches to human-computer interaction.

Updated: 2025-02-11 19:58:17

标题: PIXHELL:当像素学会尖叫时

摘要: 本文介绍了一种利用液晶显示器(LCD)的电特性生成声音的技术。这种现象是由于LCD面板内的电容器在快速像素状态转换期间产生的振动噪音。通过调制这些转换,通过投影到屏幕上的特制位图模式,我们展示了如何直接从显示屏生成微弱但可听到的声学信号。我们设计、实现、评估和测试了一个将LCD重新用作发声设备的系统。这种技术的潜在应用包括低功耗听觉反馈系统、短距离设备通信、空气隙隐蔽通道、安全听觉信号和创新的人机交互方法。

更新时间: 2025-02-11 19:58:17

领域: cs.CR

下载: http://arxiv.org/abs/2502.07925v1

NDAI Agreements

We study a fundamental challenge in the economics of innovation: an inventor must reveal details of a new idea to secure compensation or funding, yet such disclosure risks expropriation. We present a model in which a seller (inventor) and buyer (investor) bargain over an information good under the threat of hold-up. In the classical setting, the seller withholds disclosure to avoid misappropriation, leading to inefficiency. We show that trusted execution environments (TEEs) combined with AI agents can mitigate and even fully eliminate this hold-up problem. By delegating the disclosure and payment decisions to tamper-proof programs, the seller can safely reveal the invention without risking expropriation, achieving full disclosure and an efficient ex post transfer. Moreover, even if the invention's value exceeds a threshold that TEEs can fully secure, partial disclosure still improves outcomes compared to no disclosure. Recognizing that real AI agents are imperfect, we model "agent errors" in payments or disclosures and demonstrate that budget caps and acceptance thresholds suffice to preserve most of the efficiency gains. Our results imply that cryptographic or hardware-based solutions can function as an "ironclad NDA," substantially mitigating the fundamental disclosure-appropriation paradox first identified by Arrow (1962) and Nelson (1959). This has far-reaching policy implications for fostering R&D, technology transfer, and collaboration.

Updated: 2025-02-11 19:56:26

标题: NDAI协议

摘要: 我们研究了创新经济学中的一个基本挑战:发明者必须透露新想法的细节以获得补偿或资金支持,然而这种披露存在被侵占的风险。我们提出了一个模型,在这个模型中,卖方(发明者)和买方(投资者)在敲竹杠(hold-up)威胁下就信息商品进行讨价还价。在传统设置中,卖方为避免被侵占而隐瞒披露,导致效率低下。我们表明,可信执行环境(TEEs)与AI代理相结合可以缓解甚至完全消除这一敲竹杠问题。通过将披露和支付决策委托给防篡改程序,卖方可以安全地披露发明而不必担心被侵占,实现完全披露和有效的事后转移。此外,即使发明的价值超过了TEEs能够完全保障的阈值,部分披露仍然比不披露带来更好的结果。鉴于现实中的AI代理并不完美,我们对支付或披露中的“代理错误”进行建模,并证明预算上限和接受阈值足以保留大部分效率收益。我们的结果表明,加密或基于硬件的解决方案可以充当“坚不可摧的保密协议(NDA)”,大大缓解了Arrow(1962年)和Nelson(1959年)首次指出的基本的披露-侵占悖论。这对促进研发、技术转移与合作具有深远的政策意义。

更新时间: 2025-02-11 19:56:26

领域: econ.TH,cs.AI

下载: http://arxiv.org/abs/2502.07924v1

Rescriber: Smaller-LLM-Powered User-Led Data Minimization for LLM-Based Chatbots

The proliferation of LLM-based conversational agents has resulted in excessive disclosure of identifiable or sensitive information. However, existing technologies fail to offer perceptible control or account for users' personal preferences about privacy-utility tradeoffs due to the lack of user involvement. To bridge this gap, we designed, built, and evaluated Rescriber, a browser extension that supports user-led data minimization in LLM-based conversational agents by helping users detect and sanitize personal information in their prompts. Our studies (N=12) showed that Rescriber helped users reduce unnecessary disclosure and addressed their privacy concerns. Users' subjective perceptions of the system powered by Llama3-8B were on par with that by GPT-4o. The comprehensiveness and consistency of the detection and sanitization emerge as essential factors that affect users' trust and perceived protection. Our findings confirm the viability of smaller-LLM-powered, user-facing, on-device privacy controls, presenting a promising approach to address the privacy and trust challenges of AI.

Updated: 2025-02-11 19:56:20

标题: Rescriber:由较小LLM驱动、面向基于LLM聊天机器人的用户主导数据最小化

摘要: 基于LLM的会话代理的激增导致了可识别或敏感信息的过度披露。然而,由于缺乏用户参与,现有技术未能提供可察觉的控制,也未能考虑用户在隐私-效用权衡上的个人偏好。为了弥补这一差距,我们设计、构建并评估了Rescriber,这是一个浏览器扩展程序,通过帮助用户检测和清除其提示中的个人信息,支持基于LLM的会话代理中用户主导的数据最小化。我们的研究(N=12)表明,Rescriber帮助用户减少不必要的披露,并解决了他们的隐私顾虑。用户对由Llama3-8B驱动的系统的主观感知与由GPT-4o驱动的系统相当。检测和清除的全面性和一致性成为影响用户信任和感知保护的重要因素。我们的发现确认了由较小LLM驱动的、面向用户的、设备端的隐私控制的可行性,为解决人工智能的隐私和信任挑战提供了一条有前途的途径。

更新时间: 2025-02-11 19:56:20

领域: cs.HC,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.11876v3

Robot Instance Segmentation with Few Annotations for Grasping

The ability of robots to manipulate objects relies heavily on their aptitude for visual perception. In domains characterized by cluttered scenes and high object variability, most methods call for vast labeled datasets, laboriously hand-annotated, with the aim of training capable models. Once deployed, the challenge of generalizing to unfamiliar objects implies that the model must evolve alongside its domain. To address this, we propose a novel framework that combines Semi-Supervised Learning (SSL) with Learning Through Interaction (LTI), allowing a model to learn by observing scene alterations and leverage visual consistency despite temporal gaps without requiring curated data of interaction sequences. As a result, our approach exploits partially annotated data through self-supervision and incorporates temporal context using pseudo-sequences generated from unlabeled still images. We validate our method on two common benchmarks, ARMBench mix-object-tote and OCID, where it achieves state-of-the-art performance. Notably, on ARMBench, we attain an $\text{AP}_{50}$ of $86.37$, almost a $20\%$ improvement over existing work, and obtain remarkable results in scenarios with extremely low annotation, achieving an $\text{AP}_{50}$ score of $84.89$ with just $1 \%$ of annotated data compared to $72$ presented in ARMBench on the fully annotated counterpart.

Updated: 2025-02-11 19:56:18

标题: 使用少量标注进行机器人实例分割以进行抓取

摘要: 机器人操纵物体的能力在很大程度上依赖于其视觉感知能力。在场景杂乱且对象高度多变的领域中,大多数方法需要大量费力手工标注的数据集来训练能力强大的模型。一旦部署,泛化到陌生对象的挑战意味着模型必须随其领域一同进化。为了解决这个问题,我们提出了一个新颖的框架,将半监督学习(SSL)与通过交互学习(LTI)相结合,使模型能够通过观察场景变化进行学习,并在存在时间间隔的情况下利用视觉一致性,而无需精心整理的交互序列数据。因此,我们的方法通过自监督利用部分标注的数据,并利用从未标注的静止图像生成的伪序列来引入时间上下文。我们在两个常见基准ARMBench mix-object-tote和OCID上验证了我们的方法,取得了最先进的性能。值得注意的是,在ARMBench上,我们获得了86.37的AP50,比现有工作提高了近20%;在标注极少的场景中也取得了显著结果:仅使用1%的标注数据便获得了84.89的AP50分数,而ARMBench在完全标注数据上报告的结果为72。

更新时间: 2025-02-11 19:56:18

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2407.01302v2

Sign Operator for Coping with Heavy-Tailed Noise: High Probability Convergence Bounds with Extensions to Distributed Optimization and Comparison Oracle

The growing popularity of AI optimization problems involving severely corrupted data has increased the demand for methods capable of handling heavy-tailed noise, i.e., noise with bounded $\kappa$-th moment, $\kappa \in (1,2]$. For the widely used clipping technique, effectiveness heavily depends on the careful tuning of clipping levels throughout training. In this paper, we demonstrate that using only the sign of the input, without introducing additional hyperparameters, is sufficient to cope with heavy-tailed noise effectively. For smooth non-convex functions, we prove that SignSGD achieves optimal sample complexity $\tilde{O}\left(\varepsilon^{-\frac{3\kappa - 2}{\kappa - 1}}\right)$ with high probability for attaining an average gradient norm accuracy of $\varepsilon$. Under the assumption of symmetric noise, we use SignSGD with Majority Voting to extend this bound to the distributed optimization or reduce the sample complexity to $\tilde{O}(\varepsilon^{-4})$ in the case of a single worker with arbitrary parameters. Furthermore, we explore the application of the sign operator in zeroth-order optimization with an oracle that can only compare function values at two different points. We propose a novel method, MajorityVote-CompsSGD, and provide the first-known high-probability bound $\tilde{O}(\varepsilon^{-6})$ for the number of comparisons under symmetric noise assumption. Our theoretical findings are supported by the superior performance of sign-based methods in training Large Language Models.
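
The core update is two lines, sketched here on a toy objective (the paper's analysis concerns general smooth non-convex functions under heavy-tailed gradient noise; everything below is our illustrative setup):

    import numpy as np

    def sign_sgd_majority(grad_fn, x, workers=8, lr=1e-2, steps=1000, seed=0):
        """SignSGD with majority voting: each worker sends only the sign of its
        stochastic gradient; the server takes a coordinate-wise majority vote."""
        rng = np.random.default_rng(seed)
        for _ in range(steps):
            signs = np.stack([np.sign(grad_fn(x, rng)) for _ in range(workers)])
            x = x - lr * np.sign(signs.sum(axis=0))  # ties leave the coordinate unchanged
        return x

    # Toy quadratic with heavy-tailed (Student-t, df=1.5) gradient noise, so the
    # noise has a bounded kappa-th moment only for kappa < 1.5.
    grad = lambda x, rng: 2 * x + rng.standard_t(df=1.5, size=x.shape)
    print(np.linalg.norm(sign_sgd_majority(grad, np.ones(10))))  # should be small

Because only signs are transmitted and aggregated, there is no clipping level to tune, which is the point the abstract makes.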

Updated: 2025-02-11 19:54:11

标题: 用于应对重尾噪声的符号算子:高概率收敛界及其在分布式优化与比较Oracle上的扩展

摘要: 随着涉及严重损坏数据的人工智能优化问题日益普及,对能够处理重尾噪声(即具有有界$\kappa$阶矩,$\kappa \in (1,2]$)的方法的需求不断增加。对于广泛使用的裁剪(clipping)技术,其有效性在很大程度上取决于在整个训练过程中对裁剪水平的仔细调整。在本文中,我们证明仅使用输入的符号,而不引入额外的超参数,就足以有效应对重尾噪声。对于光滑非凸函数,我们证明SignSGD以高概率实现了达到平均梯度范数精度$\varepsilon$的最优样本复杂度$\tilde{O}\left(\varepsilon^{-\frac{3\kappa - 2}{\kappa - 1}}\right)$。在对称噪声假设下,我们使用带多数投票的SignSGD将该界限扩展到分布式优化,或在参数任意的单个工作节点情形下将样本复杂度降低到$\tilde{O}(\varepsilon^{-4})$。此外,我们探讨了在只能比较两个不同点处函数值的预言机(oracle)下,将符号算子应用于零阶优化的方法。我们提出了一种新颖的方法MajorityVote-CompsSGD,并在对称噪声假设下给出了关于比较次数的首个已知高概率界限$\tilde{O}(\varepsilon^{-6})$。我们的理论发现得到了基于符号的方法在训练大型语言模型中卓越表现的支持。

更新时间: 2025-02-11 19:54:11

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2502.07923v1

Machine Unlearning via Information Theoretic Regularization

How can we effectively remove or "unlearn" undesirable information, such as specific features or individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a mathematical framework based on information-theoretic regularization to address both feature and data point unlearning. For feature unlearning, we derive a unified solution that simultaneously optimizes diverse learning objectives, including entropy, conditional entropy, KL-divergence, and the energy of conditional probability. For data point unlearning, we first propose a novel definition that serves as a practical condition for unlearning via retraining, is easy to verify, and aligns with the principles of differential privacy from an inference perspective. Then, we provide provable guarantees for our framework on data point unlearning. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications.
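
Schematically (notation ours), feature unlearning of a sensitive variable $S$ from a prediction $\hat{Y}$ takes the form of a regularized trade-off,

$$\min_{p(\hat{y} \mid x)} \; \mathbb{E}\big[\ell(Y, \hat{Y})\big] \;+\; \lambda \, I(\hat{Y}; S),$$

where the information penalty can be instantiated through the entropy, conditional-entropy, KL-divergence, or conditional-probability-energy objectives listed above, and $\lambda$ trades residual utility against the thoroughness of removal.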

Updated: 2025-02-11 19:45:20

标题: 通过信息论正则化实现机器遗忘

摘要: 我们如何有效地去除或“遗忘”不良信息,比如特定特征或个别数据点,从学习结果中,同时最小化效用损失并确保严格的保证?我们引入了一种基于信息论正则化的数学框架,以解决特征和数据点的遗忘问题。对于特征的遗忘,我们推导了一个统一的解决方案,同时优化不同的学习目标,包括熵、条件熵、KL-散度和条件概率的能量。对于数据点的遗忘,我们首先提出了一个新的定义,作为通过重新训练进行遗忘的实际条件,易于验证,并与差分隐私原则相一致。然后,我们为我们的框架在数据点遗忘上提供了可证明的保证。通过在学习目标的灵活性和正则化设计的简单性相结合,我们的方法在广泛的机器学习和人工智能应用中具有高度的适应性和实用性。

更新时间: 2025-02-11 19:45:20

领域: cs.LG,cs.AI,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2502.05684v2

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR for the full user request. Additionally, by assessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra attackers' efforts can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.

Updated: 2025-02-11 19:42:26

标题: EIA:面向通用网络代理的环境注入攻击,用于隐私泄露

摘要: 广义网络代理在真实网站上展示了自主完成各种任务的显著潜力,显著提升了人类的生产力。然而,网站任务,如预订航班,通常涉及用户的个人身份信息(PII),如果网络代理意外与受损网站互动,可能会面临潜在的隐私风险,这是文献中尚未探讨的情景。在这项工作中,我们通过在对抗环境中进行首次关于广义网络代理隐私风险的研究来缩小这一差距。首先,我们提出了一个针对网站攻击的现实威胁模型,考虑两个对抗性目标:窃取用户特定的PII或整个用户请求。然后,我们提出了一种新的攻击方法,称为环境注入攻击(EIA)。 EIA注入了专门设计的恶意内容,以适应网络代理操作的环境,并且我们的工作将EIA具体实例化为网络环境中的隐私场景。我们从Mind2Web收集了包含各种PII类别的177个动作步骤,然后使用迄今为止最强大的广义网络代理框架之一进行实验。结果表明,EIA在窃取特定PII方面实现了高达70%的ASR,并且对于完整用户请求实现了16%的ASR。此外,通过评估其隐蔽性并试验防御性系统提示,我们表明EIA很难被检测和减轻。值得注意的是,无法很好适应网页的攻击可以通过人工检查来检测,这导致我们对安全性和自主性之间的权衡进行讨论。然而,攻击者付出额外努力即可使EIA无缝适应,使得这种监督无效。因此,我们进一步讨论了在网站的预部署和后部署阶段不依赖人工监督的防御措施,并呼吁更先进的防御策略。

更新时间: 2025-02-11 19:42:26

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2409.11295v4

A Multimodal Automated Interpretability Agent

This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers: for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior. We evaluate applications of MAIA to computer vision models. We first characterize MAIA's ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be mis-classified.

Updated: 2025-02-11 19:35:42

标题: 一个多模态自动可解释性代理

摘要: 本文描述了MAIA,一种多模态自动可解释性代理。MAIA是一个系统,利用神经模型来自动化神经模型理解任务,如特征解释和故障模式发现。它为一个预训练的视觉语言模型配备了一组工具,支持对其他模型的子组件进行迭代实验,以解释它们的行为。这些工具包括人类可解释性研究者常用的工具:用于合成和编辑输入、从真实数据集中计算最大激活示例,以及总结和描述实验结果。MAIA提出的可解释性实验将这些工具组合起来描述和解释系统行为。我们评估了MAIA在计算机视觉模型中的应用。我们首先描述了MAIA在学习表示图像的特征(神经元级别)方面的能力。在几个训练模型和一个新的合成视觉神经元数据集中,MAIA生成的描述与专家人类实验者生成的描述相当。然后,我们展示了MAIA如何在两个额外的可解释性任务中发挥作用:减少对虚假特征的敏感性,自动识别可能被错误分类的输入。

更新时间: 2025-02-11 19:35:42

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2404.14394v2

DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities

Multimodal Large Language Models (MLLMs) represent the cutting edge of AI technology, with DeepSeek models emerging as a leading open-source alternative offering competitive performance to closed-source systems. While these models demonstrate remarkable capabilities, their vision-language integration mechanisms introduce specific vulnerabilities. We implement an adapted embedding manipulation attack on DeepSeek Janus that induces targeted visual hallucinations through systematic optimization of image embeddings. Through extensive experimentation across COCO, DALL-E 3, and SVIT datasets, we achieve hallucination rates of up to 98.0% while maintaining high visual fidelity (SSIM > 0.88) of the manipulated images on open-ended questions. Our analysis demonstrates that both 1B and 7B variants of DeepSeek Janus are susceptible to these attacks, with closed-form evaluation showing consistently higher hallucination rates compared to open-ended questioning. We introduce a novel multi-prompt hallucination detection framework using LLaMA-3.1 8B Instruct for robust evaluation. The implications of these findings are particularly concerning given DeepSeek's open-source nature and widespread deployment potential. This research emphasizes the critical need for embedding-level security measures in MLLM deployment pipelines and contributes to the broader discussion of responsible AI implementation.

Updated: 2025-02-11 19:21:23

标题: DeepSeek 上的旅程:通过表示漏洞诱发有针对性的视觉幻觉

摘要: 多模态大型语言模型(MLLMs)代表了人工智能技术的最前沿,DeepSeek模型作为领先的开源替代方案,性能可与闭源系统相媲美。虽然这些模型展示了卓越的能力,但它们的视觉-语言整合机制引入了特定的漏洞。我们在DeepSeek Janus上实施了一种经过调整的嵌入操纵攻击,通过对图像嵌入的系统性优化诱发有针对性的视觉幻觉。通过在COCO、DALL-E 3和SVIT数据集上进行的广泛实验,我们在开放式问题上实现了高达98.0%的幻觉率,同时保持了被操纵图像的高视觉保真度(SSIM>0.88)。我们的分析表明,DeepSeek Janus的1B和7B变体都容易受到这些攻击,且封闭式评估显示出比开放式提问更高的幻觉率。我们引入了一个使用LLaMA-3.1 8B Instruct的新型多提示幻觉检测框架,用于稳健评估。考虑到DeepSeek的开源性质和广泛的部署潜力,这些发现的影响尤其令人担忧。这项研究强调了在MLLM部署流程中采取嵌入级安全措施的关键需求,并为负责任的人工智能实施这一更广泛的讨论做出贡献。

更新时间: 2025-02-11 19:21:23

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.07905v1

Zero-Shot Learning of Causal Models

With the increasing acquisition of datasets over time, we now have access to precise and varied descriptions of the world, encompassing a broad range of phenomena. These datasets can be seen as observations from unknown causal generative processes, commonly described by Structural Causal Models (SCMs). Recovering SCMs from observations poses formidable challenges, and often requires us to learn a specific generative model for each dataset. In this work, we propose to learn a \emph{single} model capable of inferring the SCMs in a zero-shot manner. Rather than learning a specific SCM for each dataset, we enable the Fixed-Point Approach (FiP)~\citep{scetbon2024fip} to infer the generative SCMs conditionally on their empirical representations. As a by-product, our approach can perform zero-shot generation of new dataset samples and intervened samples. We demonstrate via experiments that our amortized procedure achieves performances on par with SoTA methods trained specifically for each dataset on both in and out-of-distribution problems. To the best of our knowledge, this is the first time that SCMs are inferred in a zero-shot manner from observations, paving the way for a paradigmatic shift toward the assimilation of causal knowledge across datasets. The code is available on Github.

Updated: 2025-02-11 19:21:07

标题: 零样本学习因果模型

摘要: 随着时间的推移,我们对数据集的获取不断增加,现在我们可以精确而多样地描述世界,涵盖了广泛的现象。这些数据集可以被视为来自未知因果生成过程的观察结果,通常由结构因果模型(SCMs)来描述。从观察结果中恢复SCMs面临着巨大的挑战,通常需要我们为每个数据集学习一个特定的生成模型。在这项工作中,我们提出学习一种能够以零样本方式推断SCMs的\emph{单一}模型。与为每个数据集学习特定的SCM不同,我们使固定点方法(FiP)~\citep{scetbon2024fip}能够在其经验表示条件下推断生成的SCMs。作为副产品,我们的方法可以执行零样本生成新数据集样本和干预样本。通过实验证明,我们的摊销程序在内部和超出分布问题上的表现与专门针对每个数据集训练的SoTA方法相媲美。据我们所知,这是首次以零样本方式从观察结果中推断SCMs,为跨数据集间因果知识的同化打开了一种范式转变的道路。代码可在Github上获取。

更新时间: 2025-02-11 19:21:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.06128v2

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

We study how to subvert large language models (LLMs) from following prompt-specified rules. We first formalize rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. Next, we prove that although small transformers can faithfully follow such rules, maliciously crafted prompts can still mislead both theoretical constructions and models learned from data. Furthermore, we demonstrate that popular attack algorithms on LLMs find adversarial prompts and induce attention patterns that align with our theory. Our novel logic-based framework provides a foundation for studying LLMs in rule-based settings, enabling a formal analysis of tasks like logical reasoning and jailbreak attacks.

Updated: 2025-02-11 19:08:08

标题: Logicbreaks:一个用于理解基于规则推理颠覆的框架

摘要: 我们研究如何颠覆大型语言模型(LLMs)对提示中指定规则的遵循。我们首先将规则遵循形式化为命题Horn逻辑中的推理,这是一个数学系统,其中规则对某些命题$P$、$Q$和$R$具有"如果$P$且$Q$,则$R$"的形式。接下来,我们证明尽管小型Transformer模型可以忠实地遵循这些规则,但恶意构造的提示仍然可以误导理论构造和从数据中学习的模型。此外,我们证明了针对LLMs的流行攻击算法能找到对抗性提示并诱发与我们理论相符的注意力模式。我们新颖的基于逻辑的框架为在基于规则的环境中研究LLMs提供了基础,使得对逻辑推理和越狱攻击等任务进行形式化分析成为可能。

更新时间: 2025-02-11 19:08:08

领域: cs.AI,cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.00075v3

EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models

Modern large language models (LLMs), driven by scaling laws, achieve intelligence emergence at large model sizes. Recently, increasing concerns about cloud costs, latency, and privacy have made it an urgent requirement to develop compact edge language models. Distinguished from direct pretraining, which is bounded by the scaling law, this work proposes pruning-aware pretraining, focusing on retaining the performance of much larger optimized models. It features the following characteristics: 1) Data-scalable: we introduce minimal parameter groups in LLM and continuously optimize structural pruning, extending post-training pruning methods like LLM-Pruner and SparseGPT into the pretraining phase. 2) Architecture-agnostic: the LLM architecture is auto-designed using saliency-driven pruning, which, for the first time, exceeds SoTA human-designed LLMs in modern pretraining. We reveal that it achieves top-quality edge language models, termed EfficientLLM, by scaling up LLM compression and extending its boundary. EfficientLLM significantly outperforms SoTA baselines with $100M \sim 1B$ parameters, such as MobileLLM, SmolLM, Qwen2.5-0.5B, OLMo-1B, Llama3.2-1B in common sense benchmarks. As the first attempt, EfficientLLM bridges the performance gap between traditional LLM compression and direct pretraining methods, and we will fully open source at https://github.com/Xingrun-Xing2/EfficientLLM.

Updated: 2025-02-11 19:01:39

标题: EfficientLLM:可扩展的剪枝感知预训练方法,用于与架构无关的边缘语言模型

摘要: 现代大型语言模型(LLMs)受到缩放定律的驱动,在大模型规模上实现了智能涌现。最近,对云成本、延迟和隐私的日益关注使得开发紧凑的边缘语言模型成为一项紧迫需求。与受缩放定律限制的直接预训练不同,本研究提出了修剪感知预训练,重点是保留更大优化模型的性能。其特点包括:1) 数据可扩展性:我们在LLM中引入最小参数组,并持续优化结构修剪,将后训练修剪方法如LLM-Pruner和SparseGPT扩展到预训练阶段。2) 架构不可知性:LLM架构是使用显著性驱动修剪自动设计的,这是首次在现代预训练中超越了SoTA人为设计的LLM。我们发现,通过扩大LLM压缩并扩展其边界,它实现了高质量的边缘语言模型,称为EfficientLLM。EfficientLLM在常识基准测试中明显优于具有$100M \sim 1B$参数的SoTA基线,如MobileLLM、SmolLM、Qwen2.5-0.5B、OLMo-1B、Llama3.2-1B。作为第一次尝试,EfficientLLM弥合了传统LLM压缩和直接预训练方法之间的性能差距,我们将完全开源在https://github.com/Xingrun-Xing2/EfficientLLM。

更新时间: 2025-02-11 19:01:39

领域: cs.LG

下载: http://arxiv.org/abs/2502.06663v2

A unifying account of warm start guarantees for patches of quantum landscapes

Barren plateaus are fundamentally a statement about quantum loss landscapes on average but there can, and generally will, exist patches of barren plateau landscapes with substantial gradients. Previous work has studied certain classes of parameterized quantum circuits and found example regions where gradients vanish at worst polynomially in system size. Here we present a general bound that unifies all these previous cases and that can tackle physically-motivated ansätze that could not be analyzed previously. Concretely, we analytically prove a lower-bound on the variance of the loss that can be used to show that in a non-exponentially narrow region around a point with curvature the loss variance cannot decay exponentially fast. This result is complemented by numerics and an upper-bound that suggest that any loss function with a barren plateau will have exponentially vanishing gradients in any constant radius subregion. Our work thus suggests that while there are hopes to be able to warm-start variational quantum algorithms, any initialization strategy that cannot get increasingly close to the region of attraction with increasing problem size is likely inadequate.

Updated: 2025-02-11 19:00:05

标题: 一个关于量子景观片段热启动保证的统一说明

摘要: 贫瘠高原本质上是对量子损失景观平均性质的陈述,但其中可以且通常会存在具有可观梯度的区域。先前的研究考察了某些类别的参数化量子电路,并找到了梯度最坏情况下仅随系统规模多项式消失的示例区域。在这里,我们提出了一个统一所有这些先前情形的通用界限,并能处理以前无法分析的、具有物理动机的ansatz。具体而言,我们解析地证明了损失方差的一个下界,该下界可用于表明:在曲率点周围的一个非指数狭窄区域内,损失方差不可能以指数速度衰减。数值结果和一个上界补充了这一结论,表明任何具有贫瘠高原的损失函数在任何常数半径子区域中都将具有指数消失的梯度。因此,我们的工作表明,虽然有希望对变分量子算法进行热启动,但任何无法随问题规模增大而愈发接近吸引区域的初始化策略都很可能是不够的。

更新时间: 2025-02-11 19:00:05

领域: quant-ph,cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07889v1

Curvature Tuning: Provable Training-free Model Steering From a Single Parameter

The scaling of model size and data size has reshaped the paradigm of AI. As a result, the common protocol to leverage the latest models is to steer them towards a specific downstream task of interest through {\em fine-tuning}. Despite its importance, the main methods for fine-tuning remain limited to full or low-rank adapters--containing countless hyper-parameters and lacking interpretability. In this paper, we take a step back and demonstrate how novel and explainable post-training steering solutions can be derived theoretically from {\em spline operators}, a rich mathematical framing of Deep Networks that was recently developed. Our method--coined \textbf{Curvature Tuning (CT)}--has a single parameter that provably modulates the curvature of the model's decision boundary henceforth allowing training-free steering. This makes CT both more efficient and interpretable than conventional fine-tuning methods. We empirically validate its effectiveness in improving generalization and robustness of pretrained models. For example, CT improves out-of-distribution transfer performances of ResNet-18/50 by 2.57\%/1.74\% across seventeen downstream datasets, and improves RobustBench robust accuracy by 11.76\%/348.44\%. Additionally, we apply CT to ReLU-based Swin-T/S, improving their generalization on nine downstream datasets by 2.43\%/3.33\%. Our code is available at \href{https://github.com/Leon-Leyang/curvature-tuning}{https://github.com/Leon-Leyang/curvature-tuning}.

Updated: 2025-02-11 18:59:57

标题: 曲率调节:基于单一参数可证明的无需训练的模型调节

摘要: 模型规模和数据规模的扩展已经重塑了人工智能的范式。因此,利用最新模型的常见做法是通过微调将它们引导到特定感兴趣的下游任务。尽管微调很重要,但主要的微调方法仍局限于全量或低秩适配器——包含无数超参数且缺乏可解释性。在本文中,我们退一步,展示了如何从最近发展起来的深度网络的丰富数学框架——样条算子——中,理论上推导出新颖且可解释的训练后引导方案。我们的方法——命名为\textbf{曲率调整(CT)}——只有一个参数,可证明地调节模型决策边界的曲率,从而实现无需训练的引导。这使得CT比传统微调方法更高效、更可解释。我们在实验中验证了其提升预训练模型泛化能力和鲁棒性的有效性。例如,CT将ResNet-18/50在十七个下游数据集上的分布外迁移性能分别提升了2.57\%/1.74\%,并将RobustBench的鲁棒准确率提升了11.76\%/348.44\%。此外,我们将CT应用于基于ReLU的Swin-T/S,将它们在九个下游数据集上的泛化性能分别提升了2.43\%/3.33\%。我们的代码可在\href{https://github.com/Leon-Leyang/curvature-tuning}{https://github.com/Leon-Leyang/curvature-tuning}获取。

更新时间: 2025-02-11 18:59:57

领域: cs.LG

下载: http://arxiv.org/abs/2502.07783v1

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Software engineers mainly write code by editing existing programs. In contrast, language models (LMs) autoregressively synthesize programs in a single pass. One explanation for this is the scarcity of sequential edit data. While high-quality instruction data for code synthesis is scarce, edit data for synthesis is even scarcer. To fill this gap, we develop a synthetic data generation algorithm called LintSeq. This algorithm refactors programs into sequences of synthetic edits by using a linter to procedurally sample across interdependent lines of source code. Synthetic edits sampled with LintSeq reflect the syntax and semantics of their programming language. To test the algorithm, we use it to refactor a dataset of instruction + program pairs into instruction + program-diff-sequence tuples. Then, we fine-tune a series of smaller LMs ranging from 2.6B to 14B parameters on both the re-factored and original versions of this dataset. We perform comprehensive evaluations comparing edit sequence code LMs against baselines on HumanEval, MBPP(+), CodeContests, DS-1000, and BigCodeBench. We show that models fine-tuned to iteratively synthesize code match or outperform baselines on pass@1, and exhibit better scaling across higher pass@k as a function of total test-time FLOPs. Finally, we also pretrain our own tiny LMs for code understanding. We show that fine-tuning these models to synthesize code edit-by-edit results in strong performance on HumanEval and MBPP(+) compared to existing code language models of similar scale such as CodeT5+, AlphaCode, and Codex.
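
The core idea, a linter-guided decomposition of a finished program into a plausible sequence of edits, can be approximated in a few lines. The toy sketch below substitutes random chunk deletion for LintSeq's linter-driven sampling (a plainly weaker stand-in) and replays the deletions in reverse as unified diffs; names and parameters are illustrative, not from the paper.

    import difflib, random

    def toy_edit_sequence(program: str, n_edits: int = 3, seed: int = 0):
        """Decompose a program into a sequence of unified diffs by deleting
        random line chunks and replaying the deletions in reverse (a toy
        approximation of LintSeq's linter-guided sampling)."""
        rng = random.Random(seed)
        lines = program.splitlines()
        states = [lines[:]]
        for _ in range(n_edits):
            if len(lines) <= 1:
                break
            i = rng.randrange(len(lines))
            j = min(len(lines), i + rng.randint(1, 2))
            lines = lines[:i] + lines[j:]          # delete a chunk
            states.append(lines[:])
        states.reverse()                            # replay as insertions
        diffs = ["\n".join(difflib.unified_diff(a, b, lineterm=""))
                 for a, b in zip(states, states[1:])]
        return states[0], diffs                     # (starting stub, edits)

    stub, edits = toy_edit_sequence("def f(x):\n    y = x + 1\n    return y")
    for d in edits:
        print(d, "\n---")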

Updated: 2025-02-11 18:59:47

标题: 使用合成编辑序列对语言模型进行训练可改进代码合成

摘要: 软件工程师主要通过编辑现有程序来编写代码。相比之下,语言模型(LMs)以单次自回归的方式合成整个程序。对此的一种解释是顺序编辑数据的稀缺:用于代码合成的高质量指令数据本已稀缺,用于合成的编辑数据则更加稀缺。为了填补这一空白,我们开发了一种名为LintSeq的合成数据生成算法。该算法通过使用linter在源代码的相互依赖行之间进行程序化采样,将程序重构为一系列合成编辑。用LintSeq采样得到的合成编辑反映了其编程语言的语法和语义。为了测试该算法,我们用它将一个指令+程序对数据集重构为指令+程序差异序列元组。然后,我们在该数据集的重构版本和原始版本上,对一系列参数量从2.6B到14B的较小LM进行微调。我们在HumanEval、MBPP(+)、CodeContests、DS-1000和BigCodeBench上进行了全面评估,将编辑序列代码LM与基线模型进行比较。结果表明,经过微调以迭代方式合成代码的模型在pass@1上匹配或超过基线,并且在更高的pass@k上,随总测试时计算量(FLOPs)表现出更好的扩展性。最后,我们还预训练了自己的微型代码理解LM。我们表明,将这些模型微调为逐次编辑地合成代码后,与CodeT5+、AlphaCode和Codex等规模相近的现有代码语言模型相比,它们在HumanEval和MBPP(+)上表现强劲。

更新时间: 2025-02-11 18:59:47

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.02749v3

OLMES: A Standard for Language Model Evaluations

Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models can be particularly challenging, as choices of how a model is evaluated on a task can lead to large changes in measured performance. There is no common standard setup, so different models are evaluated on the same tasks in different ways, leading to claims about which models perform best not being reproducible. We propose OLMES, a completely documented, practical, open standard for reproducible LLM evaluations. In developing this standard, we identify and review the varying factors in evaluation practices adopted by the community - such as details of prompt formatting, choice of in-context examples, probability normalizations, and task formulation. In particular, OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions against larger models that can utilize the original formulation. OLMES includes well-considered, documented recommendations guided by results from existing literature as well as new experiments resolving open questions.
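
One distinction the abstract highlights is between the "cloze" and multiple-choice formulations of the same item. The snippet below shows the two prompt shapes in miniature; these strings are illustrative assumptions, not the actual OLMES templates.

    # Two ways to pose the same multiple-choice item (illustrative, not the
    # exact OLMES templates). In the cloze form, each answer string is scored
    # by the model's log-likelihood; in the MCF form, the model is scored on
    # the probability of the answer letter.
    question = "What is the boiling point of water at sea level?"
    choices = ["90 C", "100 C", "110 C", "120 C"]

    cloze_prompts = [f"Question: {question}\nAnswer: {c}" for c in choices]

    letters = "ABCD"
    mcf_prompt = (
        f"Question: {question}\n"
        + "\n".join(f"{l}. {c}" for l, c in zip(letters, choices))
        + "\nAnswer:"
    )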

Updated: 2025-02-11 18:59:26

标题: OLMES:语言模型评估的标准

摘要: 人工智能领域的进展通常通过新模型声称在衡量模型能力的任务上表现更好来展示。评估语言模型可能特别具有挑战性,因为对模型在任务上如何评估的选择可能导致所测性能发生很大变化。由于缺乏共同的标准设置,不同模型以不同方式在相同任务上被评估,导致关于哪些模型表现最佳的说法无法复现。我们提出OLMES,一个完全有文档记录、实用、开放的可复现LLM评估标准。在制定这一标准的过程中,我们识别并审查了社区所采用评估实践中的各种可变因素——例如提示格式的细节、上下文示例的选择、概率归一化方式和任务表述。特别是,OLMES支持在需要不自然的"填空(cloze)"形式来回答多项选择题的较小基础模型,与能够使用原始题目形式的较大模型之间进行有意义的比较。OLMES包含深思熟虑、有文档依据的建议,这些建议既以现有文献的结果为指导,也基于解决未决问题的新实验。

更新时间: 2025-02-11 18:59:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.08446v2

Auditing Prompt Caching in Language Model APIs

Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users' prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users' prompts. Timing variations due to prompt caching can also result in leakage of information about model architecture. Namely, we find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
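
The audit logic, comparing response times for likely-cached versus fresh prompts with a nonparametric test, is easy to sketch. Below is a minimal one-sided permutation test over latencies; the latency lists are simulated here, and wiring in a real API client (and the paper's exact statistical procedure) is left as an assumption.

    import random, statistics

    def permutation_pvalue(cached, fresh, n_perm=2000, seed=0):
        """One-sided permutation test: are cached-prompt latencies faster?"""
        rng = random.Random(seed)
        observed = statistics.mean(fresh) - statistics.mean(cached)
        pooled = cached + fresh
        k = len(cached)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            diff = statistics.mean(pooled[k:]) - statistics.mean(pooled[:k])
            if diff >= observed:
                hits += 1
        return (hits + 1) / (n_perm + 1)

    # A real audit would time repeated vs. novel prompts against the API;
    # here we simulate a cache-hit speedup to show the logic end to end.
    cached = [0.21 + random.random() * 0.02 for _ in range(50)]
    fresh = [0.30 + random.random() * 0.02 for _ in range(50)]
    print("p =", permutation_pvalue(cached, fresh))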

Updated: 2025-02-11 18:58:04

标题: 审计语言模型API中的提示缓存

摘要: 大语言模型(LLMs)中的提示缓存导致数据相关的时间变化:缓存的提示比非缓存的提示处理速度更快。这些时间差异引入了侧信道时间攻击的风险。例如,如果缓存在用户之间共享,攻击者可以通过快速API响应时间识别缓存的提示,从而了解其他用户的提示信息。由于提示缓存可能导致隐私泄露,API提供商的缓存政策透明度至关重要。为此,我们开发并进行统计审计,以检测现实世界中LLM API提供商中的提示缓存。我们发现在包括OpenAI在内的七个API提供商中存在全局缓存共享,可能导致关于用户提示的隐私泄露。由于提示缓存导致的时间变化也可能导致关于模型架构的信息泄露。具体来说,我们发现OpenAI的嵌入模型是一个仅解码器的Transformer,这之前并不为公众所知。

更新时间: 2025-02-11 18:58:04

领域: cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2502.07776v1

Optimistic Interior Point Methods for Sequential Hypothesis Testing by Betting

The technique of "testing by betting" frames nonparametric sequential hypothesis testing as a multiple-round game, where a player bets on future observations that arrive in a streaming fashion, accumulates wealth that quantifies evidence against the null hypothesis, and rejects the null once the wealth exceeds a specified threshold while controlling the false positive error. Designing an online learning algorithm that achieves a small regret in the game can help rapidly accumulate the bettor's wealth, which in turn can shorten the time to reject the null hypothesis under the alternative $H_1$. However, many of the existing works employ the Online Newton Step (ONS) to update within a halved decision space to avoid a gradient explosion issue, which is potentially conservative for rapid wealth accumulation. In this paper, we introduce a novel strategy utilizing interior-point methods in optimization that allows updates across the entire interior of the decision space without the risk of gradient explosion. Our approach not only maintains strong statistical guarantees but also facilitates faster null hypothesis rejection in critical scenarios, overcoming the limitations of existing approaches.
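
A minimal version of testing by betting looks as follows: a wealth process multiplied by the payoff of each bet, with rejection once wealth crosses 1/alpha (type-I control via Ville's inequality). This sketch uses a fixed bet for H0: E[X] = 1/2 on [0,1]-valued data; the paper's contribution replaces this naive bet-selection rule with interior-point updates over the full decision space, which is not reproduced here.

    import random

    def betting_test(stream, alpha=0.05, lam=0.5):
        """Sequential test of H0: E[X] = 1/2 for X in [0, 1].
        Under H0 the wealth is a nonnegative martingale, so by Ville's
        inequality P(wealth ever reaches 1/alpha) <= alpha."""
        wealth = 1.0
        for t, x in enumerate(stream, 1):
            wealth *= 1.0 + lam * (x - 0.5)   # lam in (-2, 2) keeps wealth >= 0
            if wealth >= 1.0 / alpha:
                return t                      # reject H0 at time t
        return None                           # insufficient evidence

    random.seed(0)
    data = [random.betavariate(6, 4) for _ in range(2000)]  # true mean 0.6
    print("rejected at step:", betting_test(data))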

Updated: 2025-02-11 18:57:18

标题: 乐观内点方法用于通过赌注进行顺序假设检验

摘要: "通过下注进行检验"的技术将非参数顺序假设检验构建为一个多轮博弈:玩家对以流式方式到达的未来观测下注,积累量化反对零假设证据的财富,并在财富超过指定阈值时拒绝零假设,同时控制假阳性错误。设计一个在博弈中实现小遗憾的在线学习算法,可以帮助快速积累下注者的财富,从而缩短在备择假设$H_1$下拒绝零假设所需的时间。然而,许多现有工作采用在线牛顿步(ONS)在减半的决策空间内更新,以避免梯度爆炸问题,这对于快速积累财富而言可能过于保守。在本文中,我们引入一种利用优化中内点法的新策略,允许在整个决策空间内部进行更新,而没有梯度爆炸的风险。我们的方法不仅保持了强大的统计保证,还能在关键场景中更快地拒绝零假设,克服了现有方法的局限性。

更新时间: 2025-02-11 18:57:18

领域: cs.LG

下载: http://arxiv.org/abs/2502.07774v1

Breaking Down Bias: On The Limits of Generalizable Pruning Strategies

We employ model pruning to examine how LLMs conceptualize racial biases, and whether a generalizable mitigation strategy for such biases appears feasible. Our analysis yields several novel insights. We find that pruning can be an effective method to reduce bias without significantly increasing anomalous model behavior. Neuron-based pruning strategies generally yield better results than approaches pruning entire attention heads. However, our results also show that the effectiveness of either approach quickly deteriorates as pruning strategies become more generalized. For instance, a model that is trained on removing racial biases in the context of financial decision-making poorly generalizes to biases in commercial transactions. Overall, our analysis suggests that racial biases are only partially represented as a general concept within language models. The other part of these biases is highly context-specific, suggesting that generalizable mitigation strategies may be of limited effectiveness. Our findings have important implications for legal frameworks surrounding AI. In particular, they suggest that an effective mitigation strategy should include the allocation of legal responsibility on those that deploy models in a specific use case.

Updated: 2025-02-11 18:55:57

标题: 消除偏见:关于可推广修剪策略的局限性

摘要: 我们采用模型修剪来考察LLMs如何在概念上表征种族偏见,以及针对此类偏见的可推广缓解策略是否可行。我们的分析得出了若干新颖的见解。我们发现,修剪可以成为一种有效的减少偏见的方法,且不会显著增加异常的模型行为。基于神经元的修剪策略通常比修剪整个注意力头的方法产生更好的结果。然而,我们的结果也显示,随着修剪策略变得更加泛化,两种方法的有效性都会迅速下降。例如,一个被训练用于在金融决策情境下去除种族偏见的模型,很难泛化到商业交易中的偏见。总体而言,我们的分析表明,种族偏见在语言模型中只有一部分被表示为一般性概念,另一部分则高度依赖于具体情境,这意味着可推广的缓解策略的有效性可能有限。我们的发现对围绕人工智能的法律框架具有重要意义。特别是,它们表明有效的缓解策略应当包括向在特定用例中部署模型的一方分配法律责任。

更新时间: 2025-02-11 18:55:57

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2502.07771v1

ENFORCE: Exact Nonlinear Constrained Learning with Adaptive-depth Neural Projection

Ensuring neural networks adhere to domain-specific constraints is crucial for addressing safety and ethical concerns while also enhancing prediction accuracy. Despite the nonlinear nature of most real-world tasks, existing methods are predominantly limited to affine or convex constraints. We introduce ENFORCE, a neural network architecture that guarantees predictions to satisfy nonlinear constraints exactly. ENFORCE is trained with standard unconstrained gradient-based optimizers (e.g., Adam) and leverages autodifferentiation and local neural projections to enforce any $\mathcal{C}^1$ constraint to arbitrary tolerance $\epsilon$. We build an adaptive-depth neural projection (AdaNP) module that dynamically adjusts its complexity to suit the specific problem and the required tolerance levels. ENFORCE guarantees satisfaction of equality constraints that are nonlinear in both inputs and outputs of the neural network with minimal (and adjustable) computational cost.
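
To make the projection idea concrete, the sketch below enforces a nonlinear equality constraint g(y) = 0 on a raw output by iterated Gauss-Newton steps computed with autodiff, a fixed-depth stand-in for what ENFORCE's AdaNP module does with adaptive depth (and, in the paper, differentiably during training). The constraint, tolerance, and solver choice are illustrative assumptions.

    import torch

    def neural_projection(y, g, tol=1e-8, max_iter=20):
        """Project y onto {y : g(y) = 0} with Gauss-Newton steps
        y <- y - J^T (J J^T)^{-1} g(y), where J is the Jacobian of g.
        Forward-only sketch; ENFORCE additionally makes this adaptive
        in depth and usable inside end-to-end training."""
        y = y.clone()
        for _ in range(max_iter):
            r = g(y)
            if r.abs().max() < tol:
                break
            J = torch.autograd.functional.jacobian(g, y)   # shape (m, n)
            u = torch.linalg.solve(J @ J.T, r)             # shape (m,)
            y = y - J.T @ u
        return y

    # Hypothetical constraint: the two outputs must lie on the unit circle.
    g = lambda y: torch.stack([y[0] ** 2 + y[1] ** 2 - 1.0])
    y_raw = torch.tensor([1.7, -0.4])        # e.g. an unconstrained net output
    y_proj = neural_projection(y_raw, g)
    print(y_proj, g(y_proj))                 # residual should be ~0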

Updated: 2025-02-11 18:54:30

标题: ENFORCE:具有自适应深度神经投影的精确非线性约束学习

摘要: 确保神经网络遵守领域特定约束对于解决安全和伦理问题以及增强预测准确性至关重要。尽管大多数现实世界任务具有非线性特性,但现有方法主要受限于仿射或凸约束。我们引入了ENFORCE,一种神经网络架构,可确保预测完全满足非线性约束。ENFORCE使用标准无约束的基于梯度的优化器(例如Adam)进行训练,并利用自动微分和局部神经投影来强制执行任意$\mathcal{C}^1$约束,以达到任意容差$\epsilon$。我们构建了一种自适应深度神经投影(AdaNP)模块,动态调整其复杂性以适应特定问题和所需的容差水平。ENFORCE保证在最小(且可调节的)计算成本下满足神经网络输入和输出的非线性等式约束。

更新时间: 2025-02-11 18:54:30

领域: cs.LG

下载: http://arxiv.org/abs/2502.06774v2

Effect of Adaptive Communication Support on LLM-powered Human-Robot Collaboration

Effective human-robot collaboration requires robots to adapt their roles and levels of support based on human needs, task requirements, and complexity. Traditional human-robot teaming often relies on a pre-determined robot communication scheme, restricting teamwork adaptability in complex tasks. Leveraging the strong communication capabilities of Large Language Models (LLMs), we propose a Human-Robot Teaming Framework with Multi-Modal Language feedback (HRT-ML), a framework designed to enhance human-robot interaction by adjusting the frequency and content of language-based feedback. The HRT-ML framework includes two core modules: a Coordinator for high-level, low-frequency strategic guidance, and a Manager for subtask-specific, high-frequency instructions, enabling passive and active interactions with human teammates. To assess the impact of language feedback in collaborative scenarios, we conducted experiments in an enhanced Overcooked environment with varying levels of task complexity (easy, medium, hard) and feedback frequency (inactive, passive, active, superactive). Our results show that as task complexity increases relative to human capabilities, human teammates exhibited a stronger preference towards robotic agents that can offer frequent, proactive support. However, when task complexities exceed the LLM's capacity, noisy and inaccurate feedback from superactive robotic agents can instead hinder team performance, as it requires human teammates to increase their effort to interpret and respond to a large number of communications, with limited performance return. Our results offer a general principle for robotic agents to dynamically adjust their levels and frequencies of communication to work seamlessly with humans and achieve improved teaming performance.

Updated: 2025-02-11 18:52:51

标题: 自适应通信支持对LLM动力人机协作的影响

摘要: 有效的人机协作需要机器人根据人类需求、任务要求和复杂性来调整其角色和支持水平。传统的人机团队合作通常依赖预先确定的机器人通信方案,限制了复杂任务中的团队适应性。利用大型语言模型(LLMs)强大的通信能力,我们提出了带有多模态语言反馈的人机团队框架(HRT-ML),旨在通过调整基于语言的反馈的频率和内容来增强人机交互。HRT-ML框架包括两个核心模块:负责高层次、低频率战略指导的协调者(Coordinator),以及负责子任务级、高频率指令的管理者(Manager),从而实现与人类队友的被动和主动交互。为了评估协作场景中语言反馈的影响,我们在一个增强的Overcooked环境中开展实验,任务复杂度不同(简单、中等、困难),反馈频率也不同(不活跃、被动、主动、超主动)。结果显示,随着任务复杂度相对于人类能力的提高,人类队友对能够提供频繁、主动支持的机器人代理表现出更强的偏好。然而,当任务复杂度超过LLM的能力时,超主动机器人代理产生的嘈杂且不准确的反馈反而会妨碍团队表现,因为这要求人类队友付出更多努力去解读和回应大量通信,而性能回报有限。我们的结果为机器人代理提供了一条通用原则:动态调整其通信水平和频率,以便与人类无缝协作并提升团队表现。

更新时间: 2025-02-11 18:52:51

领域: cs.HC,cs.AI,cs.RO,68T05,I.2.9

下载: http://arxiv.org/abs/2412.06808v2

Polynomial-Time Approximability of Constrained Reinforcement Learning

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.

Updated: 2025-02-11 18:47:53

标题: 多项式时间约束强化学习的近似可行性

摘要: 我们研究了近似一般受限制的马尔可夫决策过程的计算复杂性。我们的主要贡献是设计了一个多项式时间的(0,ε)-可加双标准近似算法,用于在广泛的可递归计算约束类别中找到最优的受限制政策,包括几乎确定、机会、期望及其任意变体。匹配的下界意味着我们的近似保证在P≠NP的情况下是最优的。我们方法的普适性导致了对受限制强化学习文献中几个长期存在的复杂性问题的答案。具体地,我们是第一个证明以下设置具有多项式时间可近似性的:在机会约束下的政策、在多个期望约束下的确定性政策、在非均匀约束下的政策(即,不同类型的约束)以及在连续状态过程下的约束政策。

更新时间: 2025-02-11 18:47:53

领域: cs.DS,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07764v1

Hallucination, Monofacts, and Miscalibration: An Empirical Investigation

Recent theoretical work by [Kalai and Vempala 2024] proves that a particular notion of hallucination rate in LLMs must be lower bounded by the training data monofact rate (related to the classical Good-Turing missing mass estimator) minus model miscalibration. Through systematic experiments with n-gram models and in-context learning with LLMs, we empirically investigate and validate this theory by examining how different underlying data distributions affect the monofact rate and a model's tendency to hallucinate. We then vary model miscalibration through controlled upweighting of training samples while holding monofact rates constant, allowing us to isolate miscalibration's reduction effect on hallucination. These findings suggest that both the distribution of fact frequencies in training data and the calibration-hallucination trade-off are inherent to probabilistic language generation. Our results also suggest that current practices of aggressive deduplication in training data may need to be reconsidered, as selective duplication could serve as a principled mechanism for reducing hallucination.
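
The monofact rate at the center of the bound (hallucination rate >= monofact rate - miscalibration, per the abstract) is just the Good-Turing singleton fraction, which is a one-liner to compute. A minimal sketch, with made-up facts:

    from collections import Counter

    def monofact_rate(facts):
        """Fraction of observations whose fact appears exactly once, i.e.
        the Good-Turing missing-mass estimate N1 / N."""
        counts = Counter(facts)
        n1 = sum(1 for c in counts.values() if c == 1)
        return n1 / len(facts)

    train = (["paris-capital-france"] * 5
             + ["quito-capital-ecuador", "canberra-capital-australia"])
    print(monofact_rate(train))  # 2 singletons over 7 observations ~ 0.29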

Updated: 2025-02-11 18:46:00

标题: 幻觉、单一事实和误校准:一项实证调查

摘要: [Kalai和Vempala 2024]最近的理论工作证明,LLMs中某种特定的幻觉率概念,其下界必须为训练数据的单一事实率(monofact rate,与经典的Good-Turing缺失质量估计器相关)减去模型误校准。通过对n-gram模型和LLMs上下文学习的系统实验,我们考察不同的底层数据分布如何影响单一事实率以及模型产生幻觉的倾向,从经验上检验并验证了这一理论。随后,我们在保持单一事实率不变的情况下,通过对训练样本进行受控加权来改变模型误校准,从而分离出误校准对幻觉的降低效应。这些发现表明,训练数据中事实频率的分布以及校准-幻觉权衡是概率语言生成所固有的。我们的结果还表明,目前在训练数据中进行激进去重的做法可能需要重新考虑,因为选择性复制可以作为减少幻觉的一种有原则的机制。

更新时间: 2025-02-11 18:46:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.08666v1

Accessing Vision Foundation Models via ImageNet-1K

Vision foundation models are renowned for the generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could facilitate the research. In this work, we offer a very simple and general solution, named \textit{Proteus}, to distill foundation models into smaller equivalents on ImageNet-1K without access to the original training data. Specifically, we remove the designs from conventional knowledge distillation settings that result in dataset bias and present three levels of training objectives, i.e., token, patch, and feature, to maximize the efficacy of knowledge transfer. In this manner, Proteus is trained at ImageNet-level costs with surprising ability, facilitating the accessibility of training foundation models for the broader research community. When leveraging DINOv2-g/14 as the teacher, Proteus-L/14 matches the performance of the Oracle method DINOv2-L/14 (142M training data) across 19 benchmarks and outperforms other vision foundation models including CLIP-L/14 (400M), OpenCLIP-L/14 (400M/2B) and SynCLR-L/14 (600M) with a significantly smaller training set of 1.2M images.

Updated: 2025-02-11 18:44:46

标题: 通过ImageNet-1K访问视觉基础模型

摘要: 视觉基础模型以其由大量训练数据带来的泛化能力而闻名。然而,它们需要大量的训练资源,并且训练数据通常是不可访问的,例如CLIP、DINOv2,这给开发能够促进研究的派生产品带来了巨大挑战。在这项工作中,我们提供了一个非常简单和通用的解决方案,名为Proteus,将基础模型提炼成ImageNet-1K上更小的等效模型,而无需访问原始训练数据。具体来说,我们去除了传统知识蒸馏设置中导致数据集偏差的设计,并提出了三个训练目标级别,即令牌、补丁和特征,以最大化知识传输的效果。通过这种方式,Proteus在ImageNet级别的成本下进行训练,具有惊人的能力,促进了更广泛研究社区对基础模型的训练可访问性。当利用DINOv2-g/14作为教师时,Proteus-L/14在19个基准测试中与Oracle方法DINOv2-L/14(142M训练数据)的性能相匹配,并且优于其他视觉基础模型,包括CLIP-L/14(400M)、OpenCLIP-L/14(400M/2B)和SynCLR-L/14(600M),其训练集仅为1.2M图像。

更新时间: 2025-02-11 18:44:46

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.10366v2

Scalable Fingerprinting of Large Language Models

Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that {\em scalability} is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model -- two orders of magnitude more than existing schemes -- without degrading the model's utility. Our inserted fingerprints persist even after supervised fine-tuning on standard post-training data. We further address security risks for fingerprinting, and theoretically and empirically show how a scalable fingerprinting scheme like ours can mitigate these risks.

Updated: 2025-02-11 18:43:07

标题: 大规模语言模型的可扩展指纹识别

摘要: 模型指纹技术已经成为模型所有者在获得API访问权限时识别其共享模型的强大工具。然而,为了降低错误发现率,防止指纹泄露,并且抵御试图绕过检测的模型用户联盟,我们认为可扩展性是至关重要的,即扩大可以嵌入模型的指纹数量。因此,我们将可扩展性视为指纹方案的关键要求。我们在比以前考虑的规模大得多的情况下进行指纹设计实验,并引入一种名为Perinucleus采样的新方法,以生成可扩展、持久且无害的指纹。我们证明,这种方案可以向Llama-3.1-8B模型添加24,576个指纹,比现有方案多两个数量级,而不会降低模型的效用。我们插入的指纹甚至在标准后训练数据上进行监督微调后仍然存在。我们进一步解决指纹技术的安全风险,并从理论和实证角度展示类似我们的可扩展指纹技术如何可以缓解这些风险。

更新时间: 2025-02-11 18:43:07

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2502.07760v1

Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks

As Large Language Models (LLMs) continue to evolve, evaluating them remains a persistent challenge. Many recent evaluations use LLMs as judges to score outputs from other LLMs, often relying on a single large model like GPT-4o. However, using a single LLM judge is prone to intra-model bias, and many tasks - such as those related to emotional intelligence, creative writing, and persuasiveness - may be too subjective for a single model to judge fairly. We introduce the Language Model Council (LMC), where a group of LLMs collaborate to create tests, respond to them, and evaluate each other's responses to produce a ranking in a democratic fashion. Unlike previous approaches that focus on reducing cost or bias by using a panel of smaller models, our work examines the benefits and nuances of a fully inclusive LLM evaluation system. In a detailed case study on emotional intelligence, we deploy a council of 20 recent LLMs to rank each other on open-ended responses to interpersonal conflicts. Our results show that the LMC produces rankings that are more separable and more robust, and through a user study, we show that they are more consistent with human evaluations than any individual LLM judge. Using all LLMs for judging can be costly, however, so we use Monte Carlo simulations and hand-curated sub-councils to study hypothetical council compositions and discuss the value of the incremental LLM judge.

Updated: 2025-02-11 18:42:44

标题: 语言模型委员会:在高度主观的任务上对基础模型进行民主基准测试

摘要: 随着大型语言模型(LLMs)的不断发展,对它们的评估仍然是一项持续的挑战。许多近期的评估使用LLM作为评判者来给其他LLM的输出打分,并常常依赖单一的大模型,如GPT-4o。然而,使用单一LLM评判者容易产生模型内部偏见,而且许多任务——如与情商、创意写作和说服力相关的任务——可能过于主观,单一模型难以公平评判。我们引入了语言模型委员会(LMC):一组LLM协作创建测试、作答,并相互评估彼此的回答,以民主方式产生排名。与以往通过使用较小模型组成的评审团来降低成本或偏见的方法不同,我们的工作考察了一个完全包容的LLM评估系统的益处与细微之处。在一项关于情商的详细案例研究中,我们部署了一个由20个最新LLM组成的委员会,就人际冲突的开放式回答相互排名。结果显示,LMC产生的排名更具可分性、更加稳健;通过用户研究,我们还表明这些排名比任何单个LLM评判者都更符合人类评估。然而,使用所有LLM进行评判可能成本高昂,因此我们使用蒙特卡洛模拟和人工挑选的子委员会来研究假设的委员会构成,并讨论增量添加LLM评判者的价值。

更新时间: 2025-02-11 18:42:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.08598v3

From Fog to Failure: How Dehazing Can Harm Clear Image Object Detection

This study explores the challenges of integrating human visual cue-based dehazing into object detection, given the selective nature of human perception. While human vision adapts dynamically to environmental conditions, computational dehazing does not always enhance detection uniformly. We propose a multi-stage framework where a lightweight detector identifies regions of interest (RoIs), which are then enhanced via spatial attention-based dehazing before final detection by a heavier model. Though effective in foggy conditions, this approach unexpectedly degrades the performance on clear images. We analyze this phenomenon, investigate possible causes, and offer insights for designing hybrid pipelines that balance enhancement and detection. Our findings highlight the need for selective preprocessing and challenge assumptions about universal benefits from cascading transformations.

Updated: 2025-02-11 18:33:27

标题: 从雾霾到失败:去雾如何损害清晰图像目标检测

摘要: 这项研究探讨了将基于人类视觉线索的去雾技术整合到目标检测中的挑战,考虑到人类感知的选择性特性。虽然人类视觉可以动态适应环境条件,但计算去雾并不总是能够统一地增强检测结果。我们提出了一个多阶段框架,其中一个轻量级检测器识别感兴趣区域(RoIs),然后通过基于空间注意力的去雾技术增强这些区域,最终由一个更重的模型进行检测。尽管在雾天条件下有效,但这种方法意外地会降低在晴天图像上的性能。我们分析了这一现象,调查了可能的原因,并提供了设计平衡增强和检测的混合管道的见解。我们的研究结果强调了对选择性预处理的需求,并挑战了关于级联变换普遍受益的假设。

更新时间: 2025-02-11 18:33:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.02027v3

An Advanced NLP Framework for Automated Medical Diagnosis with DeBERTa and Dynamic Contextual Positional Gating

This paper presents a novel Natural Language Processing (NLP) framework for enhancing medical diagnosis through the integration of advanced techniques in data augmentation, feature extraction, and classification. The proposed approach employs back-translation to generate diverse paraphrased datasets, improving robustness and mitigating overfitting in classification tasks. Leveraging Decoding-enhanced BERT with Disentangled Attention (DeBERTa) with Dynamic Contextual Positional Gating (DCPG), the model captures fine-grained contextual and positional relationships, dynamically adjusting the influence of positional information based on semantic context to produce high-quality text embeddings. For classification, an Attention-Based Feedforward Neural Network (ABFNN) is utilized, effectively focusing on the most relevant features to improve decision-making accuracy. Applied to the classification of symptoms, clinical notes, and other medical texts, this architecture demonstrates its ability to address the complexities of medical data. The combination of data augmentation, contextual embedding generation, and advanced classification mechanisms offers a robust and accurate diagnostic tool, with potential applications in automated medical diagnosis and clinical decision support. This method demonstrates the effectiveness of the proposed NLP framework for medical diagnosis, achieving remarkable results with an accuracy of 99.78%, recall of 99.72%, precision of 99.79%, and an F1-score of 99.75%. These metrics not only underscore the model's robust performance in classifying medical texts with exceptional precision and reliability but also highlight its superiority over existing methods, making it a highly promising tool for automated diagnostic systems.

Updated: 2025-02-11 18:32:24

标题: 一种基于DeBERTa和动态上下文位置门控的自动医学诊断的先进NLP框架

摘要: 这篇论文提出了一个新颖的自然语言处理(NLP)框架,通过整合先进的数据增强、特征提取和分类技术来增强医学诊断。所提出的方法利用回译生成多样化的释义数据集,改善分类任务中的鲁棒性并减轻过拟合。利用具有解耦注意力的解码增强BERT(DeBERTa)和动态上下文位置门控(DCPG),该模型捕捉细粒度的上下文和位置关系,根据语义上下文动态调整位置信息的影响,产生高质量的文本嵌入。对于分类,采用基于注意力的前馈神经网络(ABFNN),有效地聚焦于最相关的特征,提高决策准确性。应用于症状、临床笔记和其他医学文本的分类,这种架构展示了其解决医学数据复杂性的能力。数据增强、上下文嵌入生成和先进的分类机制的结合提供了一个鲁棒且准确的诊断工具,具有潜在的应用于自动化医学诊断和临床决策支持。该方法展示了所提出的NLP框架在医学诊断方面的有效性,以99.78%的准确率、99.72%的召回率、99.79%的精确率和99.75%的F1分数取得了显著的结果。这些指标不仅强调了该模型在对医学文本进行分类时的鲁棒性表现以及卓越的精度和可靠性,还突显了其优于现有方法的优越性,使其成为自动诊断系统的高度有前途的工具。

更新时间: 2025-02-11 18:32:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07755v1

Whole-Genome Phenotype Prediction with Machine Learning: Open Problems in Bacterial Genomics

How can we identify causal genetic mechanisms that govern bacterial traits? Initial efforts entrusting machine learning models to handle the task of predicting phenotype from genotype return high accuracy scores. However, attempts to extract any meaning from the predictive models are found to be corrupted by falsely identified "causal" features. Relying solely on pattern recognition and correlations is unreliable, significantly so in bacterial genomics settings where high-dimensionality and spurious associations are the norm. Though it is not yet clear whether we can overcome this hurdle, significant efforts are being made towards discovering potential high-risk bacterial genetic variants. In view of this, we set up open problems surrounding phenotype prediction from bacterial whole-genome datasets and extending those to learning causal effects, and discuss challenges that impact the reliability of a machine's decision-making when faced with datasets of this nature.

Updated: 2025-02-11 18:25:14

标题: 用机器学习进行全基因组表型预测:细菌基因组学中的未解问题

摘要: 我们如何识别支配细菌性状的因果遗传机制?最初的努力将从基因型预测表型的任务交由机器学习模型处理,得到了很高的准确率。然而,人们发现,试图从这些预测模型中提取任何含义的尝试,都被错误识别的"因果"特征所破坏。仅依靠模式识别和相关性是不可靠的,在高维度和虚假关联成为常态的细菌基因组学场景中尤其如此。尽管尚不清楚我们能否克服这一障碍,但人们正在为发现潜在的高风险细菌遗传变异付出重大努力。有鉴于此,我们提出了围绕从细菌全基因组数据集预测表型、并将其扩展到学习因果效应的若干开放问题,并讨论了在面对此类数据集时影响机器决策可靠性的挑战。

更新时间: 2025-02-11 18:25:14

领域: q-bio.GN,cs.LG

下载: http://arxiv.org/abs/2502.07749v1

An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning

Incrementally fine-tuning foundational models on new tasks or domains is now the de facto approach in NLP. A known pitfall of this approach is the \emph{catastrophic forgetting} of prior knowledge that happens during fine-tuning. A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning. Several existing works assume a fixed memory buffer to store prior task examples, while relying on inferences (forward passes) with the model at hand for choosing examples for rehearsal from the buffer. However, given the increasing computational cost of model inference, and decreasing cost of data storage, we focus on the setting to rehearse samples with a fixed computational budget instead of a fixed memory budget. We propose a sampling scheme, \texttt{\bf mix-cd}, that prioritizes rehearsal of ``collateral damage'' samples, which are samples predicted correctly by the prior model but forgotten by the incrementally tuned one. The crux of our scheme is a procedure to efficiently estimate the density of collateral damage samples without incurring additional model inferences. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings. All the code will be publicly available at https://github.com/jybai/mix-cd-rehearsal.
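
The definition of collateral damage, samples the prior checkpoint got right that the current one now gets wrong, translates directly into code. The sketch below works from cached prediction arrays and mixes such samples into a rehearsal batch; the mixing fraction is a made-up knob, and the paper's density-estimation procedure for doing this without extra inferences is not reproduced.

    import random

    def collateral_damage_ids(prior_preds, current_preds, labels):
        """Indices predicted correctly by the prior checkpoint but wrong
        (forgotten) under the incrementally fine-tuned model."""
        return [i for i, (p0, p1, y)
                in enumerate(zip(prior_preds, current_preds, labels))
                if p0 == y and p1 != y]

    def rehearsal_batch(n_prior, cd_ids, k, cd_frac=0.7, seed=0):
        """Mix collateral-damage samples with uniform prior-task samples;
        cd_frac is an illustrative mixing knob, not the paper's value."""
        rng = random.Random(seed)
        n_cd = min(int(k * cd_frac), len(cd_ids))
        return rng.sample(cd_ids, n_cd) + rng.sample(range(n_prior), k - n_cd)

    prior, current, ys = [1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 0]
    cd = collateral_damage_ids(prior, current, ys)   # -> [2]
    print(rehearsal_batch(n_prior=4, cd_ids=cd, k=3))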

Updated: 2025-02-11 18:25:07

标题: 一种用于多阶段微调期间减轻灾难性遗忘的高效复习方案

摘要: 在自然语言处理中,在新任务或新领域上增量微调基础模型如今已是事实上的标准方法。这种方法的一个已知缺陷是微调过程中发生的对先前知识的"灾难性遗忘"。缓解这种遗忘的常见方法是在微调过程中复习(rehearse)先前任务的样本。一些现有工作假定用一个固定的内存缓冲区存储先前任务的示例,并依赖当前模型的推断(前向传递)从缓冲区中选择复习示例。然而,考虑到模型推断的计算成本不断上升而数据存储成本不断下降,我们关注在固定计算预算(而非固定内存预算)下复习样本的设定。我们提出了一种采样方案\texttt{\bf mix-cd},优先复习"附带损害"样本,即被先前模型正确预测、却被增量微调后的模型遗忘的样本。我们方案的关键是一种无需额外模型推断即可高效估计附带损害样本密度的过程。我们的方法计算高效、易于实现,并在计算受限的环境中优于多种领先的持续学习方法。所有代码将在https://github.com/jybai/mix-cd-rehearsal公开。

更新时间: 2025-02-11 18:25:07

领域: cs.LG

下载: http://arxiv.org/abs/2402.08096v3

Fast Audio Codec Identification Using Overlapping LCS

Audio data are widely exchanged over telecommunications networks. Due to the limitations of network resources, these data are typically compressed before transmission. Various methods are available for compressing audio data. To access such audio information, it is first necessary to identify the codec used for compression. One of the most effective approaches for audio codec identification involves analyzing the content of received packets. In these methods, statistical features extracted from the packets are utilized to determine the codec employed. This paper proposes a novel method for audio codec classification based on features derived from the overlapped longest common sub-string and sub-sequence (LCS). The simulation results, which achieved an accuracy of 97% for 8 KB packets, demonstrate the superiority of the proposed method over conventional approaches. This method divides each 8 KB packet into fifteen 1 KB packets with a 50% overlap. The results indicate that this division has no significant impact on the simulation outcomes, while significantly speeding up the feature extraction, being eight times faster than the traditional method for extracting LCS features.
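
The packet-splitting scheme and the LCS feature are simple to sketch. Below, an 8 KB packet is cut into fifteen 1 KB windows with 50% overlap, and the longest common substring between neighboring windows is measured with difflib; the feature aggregation and the downstream classifier from the paper are omitted.

    import difflib

    def overlapping_windows(packet: bytes, size=1024, overlap=0.5):
        """Split an 8 KB packet into fifteen 1 KB windows with 50% overlap."""
        step = int(size * (1 - overlap))
        return [packet[i:i + size]
                for i in range(0, len(packet) - size + 1, step)]

    def longest_common_substring(a: bytes, b: bytes) -> int:
        m = difflib.SequenceMatcher(None, a, b, autojunk=False)
        return m.find_longest_match(0, len(a), 0, len(b)).size

    packet = bytes(range(256)) * 32                   # dummy 8192-byte packet
    wins = overlapping_windows(packet)
    print(len(wins))                                  # 15
    feats = [longest_common_substring(wins[i], wins[i + 1])
             for i in range(len(wins) - 1)]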

Updated: 2025-02-11 18:24:41

标题: 快速音频编解码器识别:利用重叠的LCS

摘要: 音频数据在电信网络上被广泛交换。由于网络资源的限制,这些数据通常在传输前被压缩。有多种方法可用于压缩音频数据。要访问这样的音频信息,首先需要识别用于压缩的编解码器。对于音频编解码器识别,最有效的方法之一涉及分析接收到的数据包的内容。在这些方法中,从数据包中提取的统计特征被用来确定所使用的编解码器。本文提出了一种基于重叠最长公共子字符串和子序列(LCS)特征的音频编解码器分类的新方法。模拟结果表明,对于8 KB数据包,该方法实现了97%的准确率,证明了该方法比传统方法更为优越。该方法将每个8 KB数据包分成十五个1 KB数据包,并且有50%的重叠。结果表明,这种划分对模拟结果没有显著影响,同时显著加快了特征提取的速度,比传统提取LCS特征的方法快了八倍。

更新时间: 2025-02-11 18:24:41

领域: cs.MM,cs.CR

下载: http://arxiv.org/abs/2502.00950v3

Reinforcement Learning from Human Feedback with Active Queries

Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/\Delta)$ instance-dependent regret bound and an $\tilde{O}(d^2/\Delta^2)$ query complexity, where $d$ is the dimension of feature space and $\Delta$ is the sub-optimality gap over all the contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO) and apply it to fine-tuning LLMs. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.

Updated: 2025-02-11 18:18:59

标题: 使用主动查询的人类反馈强化学习

摘要: 将大型语言模型(LLM)与人类偏好对齐在构建现代生成模型中起着关键作用,并且可以通过基于人类反馈的强化学习(RLHF)来实现。尽管性能卓越,当前的RLHF方法通常需要大量人工标注的偏好数据,而收集这些数据的成本很高。在本文中,受主动学习成功的启发,我们通过提出查询高效的RLHF方法来解决这一问题。我们首先将对齐问题形式化为上下文决斗赌博机(contextual dueling bandit)问题,并设计了一种基于主动查询的近端策略优化(APPO)算法,其具有$\tilde{O}(d^2/\Delta)$的实例相关遗憾界和$\tilde{O}(d^2/\Delta^2)$的查询复杂度,其中$d$是特征空间的维数,$\Delta$是所有上下文上的次优间隙。然后,我们提出了ADPO,这是我们算法基于直接偏好优化(DPO)的实用版本,并将其应用于微调LLM。我们的实验表明,ADPO仅需约一半的人类偏好查询,即可匹配最先进的DPO方法的性能。

更新时间: 2025-02-11 18:18:59

领域: cs.LG,cs.AI,cs.CL,math.OC,stat.ML

下载: http://arxiv.org/abs/2402.09401v2

Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

Although the multilingual capability of LLMs offers new opportunities to overcome the language barrier, do these capabilities translate into real-life scenarios where linguistic divide and knowledge conflicts between multilingual sources are known occurrences? In this paper, we studied LLM's linguistic preference in a cross-language RAG-based information search setting. We found that LLMs displayed systemic bias towards information in the same language as the query language in both document retrieval and answer generation. Furthermore, in scenarios where no information is in the language of the query, LLMs prefer documents in high-resource languages during generation, potentially reinforcing the dominant views. Such bias exists for both factual and opinion-based queries. Our results highlight the linguistic divide within multilingual LLMs in information search systems. The seemingly beneficial multilingual capability of LLMs may backfire on information parity by reinforcing language-specific information cocoons or filter bubbles further marginalizing low-resource views.

Updated: 2025-02-11 18:17:53

标题: 伪多语者:关于多语言大型语言模型信息不对称的研究

摘要: 尽管LLMs的多语言能力为克服语言障碍提供了新机遇,但在多语言信息源之间存在语言鸿沟与知识冲突的现实场景中,这些能力能否兑现?在本文中,我们研究了LLM在跨语言RAG(检索增强生成)信息搜索环境中的语言偏好。我们发现,在文档检索和答案生成两个环节,LLMs都表现出对与查询语言相同语言的信息的系统性偏向。此外,在没有查询语言信息可用的情况下,LLMs在生成时更倾向于高资源语言的文档,可能强化主流观点。这种偏向在事实类和观点类查询中都存在。我们的结果凸显了信息搜索系统中多语言LLMs内部的语言鸿沟。LLMs看似有益的多语言能力可能在信息对等方面适得其反:它会强化特定语言的信息茧房或过滤气泡,进一步边缘化低资源语言的观点。

更新时间: 2025-02-11 18:17:53

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2407.05502v3

WHODUNIT: Evaluation benchmark for culprit detection in mystery stories

We present a novel data set, WhoDunIt, to assess the deductive reasoning capabilities of large language models (LLM) within narrative contexts. Constructed from open domain mystery novels and short stories, the dataset challenges LLMs to identify the perpetrator after reading and comprehending the story. To evaluate model robustness, we apply a range of character-level name augmentations, including original names, name swaps, and substitutions with well-known real and/or fictional entities from popular discourse. We further use various prompting styles to investigate the influence of prompting on deductive reasoning accuracy. We conduct evaluation study with state-of-the-art models, specifically GPT-4o, GPT-4-turbo, and GPT-4o-mini, evaluated through multiple trials with majority response selection to ensure reliability. The results demonstrate that while LLMs perform reliably on unaltered texts, accuracy diminishes with certain name substitutions, particularly those with wide recognition. This dataset is publicly available here.

Updated: 2025-02-11 18:14:44

标题: “WHODUNIT:侦探小说中犯罪者检测的评估基准”

摘要: 我们提出了一个新颖的数据集WhoDunIt,用于评估大型语言模型(LLM)在叙事背景下的演绎推理能力。该数据集由开放领域的悬疑小说和短篇故事构建,挑战LLMs在阅读和理解故事后识别犯罪者。为了评估模型的稳健性,我们应用了一系列人物名称级别的增强,包括原始名称、名称交换以及与流行话语中的知名实际和/或虚构实体的替换。我们进一步使用各种提示样式来研究提示对演绎推理准确性的影响。 我们在最先进的模型上进行评估研究,特别是GPT-4o、GPT-4-turbo和GPT-4o-mini,通过多次试验进行多数回答选择以确保可靠性。结果表明,虽然LLMs在未经改动的文本上表现可靠,但在某些名称替换的情况下,特别是那些广为人知的名称,准确性会降低。该数据集在此公开提供。

更新时间: 2025-02-11 18:14:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07747v1

HiPoNet: A Topology-Preserving Multi-View Neural Network For High Dimensional Point Cloud and Single-Cell Data

In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Single-cell data can have high dimensionality, exceeding the capabilities of existing point-cloud methods tailored for 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e. one on every patient), necessitating models that can process large, high-dimensional point clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric information. In contrast, HiPoNet forms higher-order simplicial complexes through learnable feature reweighting, generating multiple data views that disentangle distinct biological processes. It then employs simplicial wavelet transforms to extract multi-scale features - capturing both local and global topology. We empirically show that these components preserve topological information in the learned representations, and that HiPoNet significantly outperforms state-of-the-art point-cloud and graph-based models on single-cell data. We also show an application of HiPoNet on spatial transcriptomics datasets using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.

Updated: 2025-02-11 18:13:29

标题: HiPoNet:一种保持拓扑结构的多视图神经网络,用于高维点云和单细胞数据

摘要: 在本文中,我们提出HiPoNet,一个用于在高维点云上进行回归、分类和表示学习的端到端可微神经网络。单细胞数据的维度可能非常高,超出了现有面向3D数据的点云方法的能力。此外,现代单细胞和空间实验如今会产生整批数据集(即每位患者一个数据集),这需要能够大规模处理大型高维点云的模型。当前大多数方法只构建单个最近邻图,丢弃了重要的几何信息。相反,HiPoNet通过可学习的特征重加权形成高阶单纯复形,生成多个数据视图,以解耦不同的生物过程。随后,它使用单纯形小波变换提取多尺度特征——同时捕捉局部和全局拓扑。我们通过实验表明,这些组件在学习到的表示中保留了拓扑信息,并且HiPoNet在单细胞数据上显著优于最先进的点云和基于图的模型。我们还展示了HiPoNet在空间转录组学数据集上的应用,其中空间坐标被用作其中一个视图。总体而言,HiPoNet为高维数据分析提供了一个稳健且可扩展的解决方案。

更新时间: 2025-02-11 18:13:29

领域: cs.LG,math.AT

下载: http://arxiv.org/abs/2502.07746v1

UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate

Recently, many self-supervised learning methods for image reconstruction have been proposed that can learn from noisy data alone, bypassing the need for ground-truth references. Most existing methods cluster around two classes: i) Stein's Unbiased Risk Estimate (SURE) and similar approaches that assume full knowledge of the noise distribution, and ii) Noise2Self and similar cross-validation methods that require very mild knowledge about the noise distribution. The first class of methods tends to be impractical, as the noise level is often unknown in real-world applications, and the second class is often suboptimal compared to supervised learning. In this paper, we provide a theoretical framework that characterizes this expressivity-robustness trade-off and propose a new approach based on SURE, but unlike the standard SURE, does not require knowledge about the noise level. Throughout a series of experiments, we show that the proposed estimator outperforms other existing self-supervised methods on various imaging inverse problems.
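
For reference, the classical SURE objective that this line of work builds on can be estimated without ground truth when sigma is known, e.g. with a single-probe Monte Carlo divergence. The sketch below does exactly that for a soft-thresholding denoiser; note it still requires sigma, which is precisely the requirement the proposed UNSURE estimator removes (and which this sketch does not reproduce).

    import numpy as np

    def sure(y, denoiser, sigma, delta=1e-3, seed=0):
        """Stein's Unbiased Risk Estimate of MSE for y = x + N(0, sigma^2 I):
        SURE = ||f(y) - y||^2 / n - sigma^2 + (2 sigma^2 / n) * div f(y),
        with the divergence estimated by one Rademacher probe."""
        rng = np.random.default_rng(seed)
        n = y.size
        fy = denoiser(y)
        eps = rng.choice([-1.0, 1.0], size=y.shape)           # random probe
        div = eps.ravel() @ (denoiser(y + delta * eps) - fy).ravel() / delta
        return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2 * sigma ** 2 * div / n

    soft = lambda t: lambda y: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
    y = np.random.default_rng(1).normal(0, 1, 10000)          # pure noise, x = 0
    print(sure(y, soft(1.0), sigma=1.0))    # approximates the true MSE of soft(y)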

Updated: 2025-02-11 18:09:35

标题: 不确定:具有未知噪声水平和斯坦恩无偏风险估计的自监督学习

摘要: 最近,许多用于图像重建的自监督学习方法被提出,它们可以仅从含噪数据中学习,绕过对真实参照数据(ground truth)的需求。大多数现有方法可归为两类:i)Stein无偏风险估计(SURE)及类似方法,假设对噪声分布有完全了解;ii)Noise2Self及类似的交叉验证方法,只需要对噪声分布有非常有限的了解。第一类方法往往不切实际,因为在现实应用中噪声水平通常未知;第二类方法与监督学习相比则常常是次优的。在本文中,我们提供了一个刻画这种表达能力-鲁棒性权衡的理论框架,并提出了一种基于SURE的新方法,但与标准SURE不同,它不需要噪声水平的知识。通过一系列实验,我们展示了所提出的估计器在多种成像逆问题上优于其他现有的自监督方法。

更新时间: 2025-02-11 18:09:35

领域: stat.ML,cs.LG,eess.SP,68U10,I.4.5; I.2.10; G.3

下载: http://arxiv.org/abs/2409.01985v4

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly in tasks requiring advanced reasoning. Large reasoning models, which leverage long chain-of-thoughts, bring unprecedented breakthroughs in problem-solving capabilities but at a substantial deployment cost associated to longer generations. Reducing inference costs is crucial for the economic feasibility, user experience, and environmental sustainability of these models. In this work, we propose to train large reasoning models to reason efficiently. More precisely, we use reinforcement learning (RL) to train reasoning models to dynamically allocate inference-time compute based on task complexity. Our method incentivizes models to minimize unnecessary computational overhead while maintaining accuracy, thereby achieving substantial efficiency gains. It enables the derivation of a family of reasoning models with varying efficiency levels, controlled via a single hyperparameter. Experiments on two open-weight large reasoning models demonstrate significant reductions in inference cost while preserving most of the accuracy.
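
The mechanism, a single coefficient trading answer correctness against generation length, can be illustrated with a toy reward; this is a generic length-penalized reward in the spirit of the abstract, not the paper's exact RL objective.

    def efficiency_reward(answer: str, reference: str, n_tokens: int,
                          alpha: float = 1e-3) -> float:
        """Correctness minus a length penalty; sweeping alpha yields a
        family of models at different accuracy/cost trade-offs."""
        correct = float(answer.strip() == reference.strip())
        return correct - alpha * n_tokens

    print(efficiency_reward("42", "42", n_tokens=350))  # 1.0 - 0.35 = 0.65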

Updated: 2025-02-11 18:06:02

标题: 训练语言模型以有效推理

摘要: 规模化模型大小和训练数据的增加,使得大型语言模型(LLMs)的性能有了很大的进步。然而,这种方法的收益递减需要采取替代方法来提高模型能力,特别是在需要高级推理的任务中。大型推理模型利用长链式思维带来了在问题解决能力方面的前所未有的突破,但与更长的生成相关的部署成本也是相当高昂的。降低推理成本对于这些模型的经济可行性、用户体验和环境可持续性至关重要。 在这项工作中,我们提出训练大型推理模型以实现高效推理。更具体地说,我们使用强化学习(RL)来训练推理模型,根据任务复杂性动态分配推理时间计算。我们的方法鼓励模型最小化不必要的计算开销,同时保持准确性,从而实现了显著的效率提升。它可以衍生出一系列具有不同效率水平的推理模型,通过单一超参数进行控制。对两个开放权重的大型推理模型的实验结果显示,在保留大部分准确性的同时,推理成本显著降低。

更新时间: 2025-02-11 18:06:02

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2502.04463v2

Advancing climate model interpretability: Feature attribution for Arctic melt anomalies

The focus of our work is improving the interpretability of anomalies in climate models and advancing our understanding of Arctic melt dynamics. The Arctic and Antarctic ice sheets are experiencing rapid surface melting and increased freshwater runoff, contributing significantly to global sea level rise. Understanding the mechanisms driving snowmelt in these regions is crucial. ERA5, a widely used reanalysis dataset in polar climate studies, offers extensive climate variables and global data assimilation. However, its snowmelt model employs an energy imbalance approach that may oversimplify the complexity of surface melt. In contrast, the Glacier Energy and Mass Balance (GEMB) model incorporates additional physical processes, such as snow accumulation, firn densification, and meltwater percolation/refreezing, providing a more detailed representation of surface melt dynamics. In this research, we focus on analyzing surface snowmelt dynamics of the Greenland Ice Sheet using feature attribution for anomalous melt events in ERA5 and GEMB models. We present a novel unsupervised attribution method leveraging counterfactual explanation method to analyze detected anomalies in ERA5 and GEMB. Our anomaly detection results are validated using MEaSUREs ground-truth data, and the attributions are evaluated against established feature ranking methods, including XGBoost, Shapley values, and Random Forest. Our attribution framework identifies the physics behind each model and the climate features driving melt anomalies. These findings demonstrate the utility of our attribution method in enhancing the interpretability of anomalies in climate models and advancing our understanding of Arctic melt dynamics.

Updated: 2025-02-11 18:05:54

标题: 推进气候模型的可解释性:北极融化异常的特征归因

摘要: 我们的工作重点是改善气候模型中异常现象的可解释性,并推动我们对北极融化动态的理解。北极和南极冰盖正在经历快速的地表融化和增加的淡水径流,这对全球海平面上升贡献显著。了解驱动这些地区雪融化的机制至关重要。ERA5是极地气候研究中广泛使用的再分析数据集,提供了广泛的气候变量和全球数据同化。然而,其雪融化模型采用能量失衡方法,可能过于简化地表融化的复杂性。相比之下,冰川能量和质量平衡(GEMB)模型融入了额外的物理过程,如积雪积累、冰雪密实化和融水渗透/再冻结,提供了更详细的地表融化动态表现。在这项研究中,我们专注于使用特征归因来分析格陵兰冰盖的地表雪融化动态,以探究ERA5和GEMB模型中异常融化事件的特征。我们提出了一种利用反事实解释方法的新型无监督归因方法来分析ERA5和GEMB中检测到的异常。我们的异常检测结果通过MEaSUREs地面真实数据进行验证,并通过与XGBoost、Shapley值和随机森林等已建立的特征排名方法进行评估。我们的归因框架确定了每个模型背后的物理学和驱动融化异常的气候特征。这些发现展示了我们的归因方法在增强气候模型中异常现象的可解释性和推动我们对北极融化动态的理解方面的实用性。

更新时间: 2025-02-11 18:05:54

领域: cs.LG

下载: http://arxiv.org/abs/2502.07741v1

What makes math problems hard for reinforcement learning: a case study

Using a long-standing conjecture from combinatorial group theory, we explore, from multiple perspectives, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the context defined by the Andrews-Curtis conjecture, we propose algorithmic enhancements and a topological hardness measure with implications for a broad class of search problems. As part of our study, we also address several open mathematical questions. Notably, we demonstrate the length reducibility of all but two presentations in the Akbulut-Kirby series (1981), and resolve various potential counterexamples in the Miller-Schupp series (1991), including three infinite subfamilies.

Updated: 2025-02-11 18:01:40

标题: 什么让数学问题对强化学习变得困难:案例研究

摘要: 借助组合群论中一个长期悬而未决的猜想,我们从多个角度探讨了寻找带来不成比例高回报的罕见实例所面临的挑战。基于在Andrews-Curtis猜想所定义的情境中获得的经验,我们提出了算法改进和一种拓扑难度度量,对一大类搜索问题具有启示意义。作为研究的一部分,我们还解决了若干未解数学问题。值得注意的是,我们证明了Akbulut-Kirby系列(1981)中除两个群展示(presentation)之外的所有群展示的长度可约性,并解决了Miller-Schupp系列(1991)中的多个潜在反例,包括三个无限子族。

更新时间: 2025-02-11 18:01:40

领域: cs.LG,cs.AI,math.CO,math.GR,math.GT

下载: http://arxiv.org/abs/2408.15332v2

Revisiting Non-Acyclic GFlowNets in Discrete Environments

Generative Flow Networks (GFlowNets) are a family of generative models that learn to sample objects from a given probability distribution, potentially known up to a normalizing constant. Instead of working in the object space, GFlowNets proceed by sampling trajectories in an appropriately constructed directed acyclic graph environment, greatly relying on the acyclicity of the graph. In our paper, we revisit the theory that relaxes the acyclicity assumption and present a simpler theoretical framework for non-acyclic GFlowNets in discrete environments. Moreover, we provide various novel theoretical insights related to training with fixed backward policies, the nature of flow functions, and connections between entropy-regularized RL and non-acyclic GFlowNets, which naturally generalize the respective concepts and theoretical results from the acyclic setting. In addition, we experimentally re-examine the concept of loss stability in non-acyclic GFlowNet training, as well as validate our own theoretical findings.

Updated: 2025-02-11 17:55:03

标题: 重访离散环境中的非循环GFlowNets

摘要: 生成流网络(GFlowNets)是一类生成模型,学习从给定的概率分布中采样对象,而该分布可能仅已知至一个归一化常数。GFlowNets不直接在对象空间中工作,而是在适当构建的有向无环图环境中采样轨迹,在很大程度上依赖于图的无环性。在本文中,我们重新审视了放宽无环性假设的理论,并为离散环境中的非无环GFlowNets提出了一个更简单的理论框架。此外,我们提供了多方面的新颖理论见解,涉及使用固定后向策略进行训练、流函数的性质,以及熵正则化强化学习与非无环GFlowNets之间的联系,这些自然地推广了无环设定下的相应概念和理论结果。此外,我们在实验中重新检验了非无环GFlowNet训练中损失稳定性的概念,并验证了我们自己的理论发现。

更新时间: 2025-02-11 17:55:03

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07735v1

SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes

Meshes are ubiquitous in visual computing and simulation, yet most existing machine learning techniques represent meshes only indirectly, e.g. as the level set of a scalar field or deformation of a template, or as a disordered triangle soup lacking local structure. This work presents a scheme to directly generate manifold, polygonal meshes of complex connectivity as the output of a neural network. Our key innovation is to define a continuous latent connectivity space at each mesh vertex, which implies the discrete mesh. In particular, our vertex embeddings generate cyclic neighbor relationships in a halfedge mesh representation, which gives a guarantee of edge-manifoldness and the ability to represent general polygonal meshes. This representation is well-suited to machine learning and stochastic optimization, without restriction on connectivity or topology. We first explore the basic properties of this representation, then use it to fit distributions of meshes from large datasets. The resulting models generate diverse meshes with tessellation structure learned from the dataset population, with concise details and high-quality mesh elements. In applications, this approach not only yields high-quality outputs from generative models, but also enables directly learning challenging geometry processing tasks such as mesh repair.

Updated: 2025-02-11 17:53:46

标题: 空间网格:用于学习流形表面网格的连续表示

摘要: Mesh在视觉计算和模拟中无处不在,然而大多数现有的机器学习技术只间接表示Mesh,例如作为标量场的水平集或模板的变形,或者作为缺乏局部结构的无序三角形集。本文提出了一种方案,通过神经网络直接生成具有复杂连接性的流形多边形Mesh。我们的关键创新是在每个Mesh顶点定义一个连续的潜在连接空间,这意味着离散Mesh。特别是,我们的顶点嵌入在半边Mesh表示中生成循环邻居关系,这保证了边缘流形性和表示一般多边形Mesh的能力。这种表示非常适合机器学习和随机优化,不受连接性或拓扑的限制。我们首先探讨了这种表示的基本属性,然后用它来拟合大型数据集中的Mesh分布。结果模型生成从数据集人口中学习的具有镶嵌结构的各种Mesh,具有简洁的细节和高质量的Mesh元素。在应用中,这种方法不仅能够从生成模型中产生高质量的输出,还能直接学习挑战性的几何处理任务,如Mesh修复。

更新时间: 2025-02-11 17:53:46

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2409.20562v2

EdgeEar: Efficient and Accurate Ear Recognition for Edge Devices

Ear recognition is a contactless and unobtrusive biometric technique with applications across various domains. However, deploying high-performing ear recognition models on resource-constrained devices is challenging, limiting their applicability and widespread adoption. This paper introduces EdgeEar, a lightweight model based on a proposed hybrid CNN-transformer architecture to solve this problem. By incorporating low-rank approximations into specific linear layers, EdgeEar reduces its parameter count by a factor of 50 compared to the current state-of-the-art, bringing it below two million while maintaining competitive accuracy. Evaluation on the Unconstrained Ear Recognition Challenge (UERC2023) benchmark shows that EdgeEar achieves the lowest EER while significantly reducing computational costs. These findings demonstrate the feasibility of efficient and accurate ear recognition, which we believe will contribute to the wider adoption of ear biometrics.
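
For a sense of scale on the low-rank trick described above, here is a PyTorch sketch of a factorized linear layer; the dimensions and rank are illustrative and are not claimed to match EdgeEar's actual configuration:

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """A d_in x d_out weight matrix replaced by two rank-r factors,
    cutting parameters from d_in*d_out to roughly r*(d_in + d_out)."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_out)               # r    -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

# a 512x512 projection (~262k params) at rank 16 needs only ~17k params
layer = LowRankLinear(512, 512, rank=16)
print(sum(p.numel() for p in layer.parameters()))
```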

Updated: 2025-02-11 17:53:33

标题: 边缘设备上高效准确的耳朵识别技术:EdgeEar

摘要: 耳朵识别是一种无接触且不显眼的生物特征识别技术,在各个领域都有应用。然而,在资源受限设备上部署高性能的耳朵识别模型是具有挑战性的,这限制了它们的适用性和广泛采用。本文介绍了EdgeEar,这是一种基于提出的混合CNN-transformer架构的轻量级模型,以解决这一问题。通过将低秩近似法引入特定的线性层中,EdgeEar将其参数数量减少了50倍,使其降至两百万以下,同时保持竞争性的准确性。在无约束耳朵识别挑战(UERC2023)基准测试上的评估表明,EdgeEar实现了最低的EER,同时显著降低了计算成本。这些发现表明了高效准确的耳朵识别的可行性,我们相信这将有助于更广泛地采用耳朵生物特征识别技术。

更新时间: 2025-02-11 17:53:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07734v1

Economics of Sourcing Human Data

Progress in AI has relied on human-generated data, from annotator marketplaces to the wider Internet. However, the widespread use of large language models now threatens the quality and integrity of human-generated data on these very platforms. We argue that this issue goes beyond the immediate challenge of filtering AI-generated content--it reveals deeper flaws in how data collection systems are designed. Existing systems often prioritize speed, scale, and efficiency at the cost of intrinsic human motivation, leading to declining engagement and data quality. We propose that rethinking data collection systems to align with contributors' intrinsic motivations--rather than relying solely on external incentives--can help sustain high-quality data sourcing at scale while maintaining contributor trust and long-term participation.

Updated: 2025-02-11 17:51:52

标题: 采集人类数据的经济学

摘要: 人工智能的进展依赖于人类生成的数据,从注释者市场到更广泛的互联网。然而,大型语言模型的广泛使用现在威胁着这些平台上人类生成数据的质量和完整性。我们认为,这个问题超出了过滤人工智能生成内容的即时挑战--它揭示了数据收集系统设计中更深层次的缺陷。现有系统往往以速度、规模和效率为优先,以牺牲固有的人类动机为代价,导致参与度和数据质量下降。我们建议重新思考数据收集系统,使其与贡献者的内在动机相一致--而不仅仅依赖外部激励--可以帮助在规模上维持高质量的数据采集,同时保持贡献者的信任和长期参与。

更新时间: 2025-02-11 17:51:52

领域: cs.CY,cs.AI,cs.CL,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2502.07732v1

TopoTune : A Framework for Generalized Combinatorial Complex Neural Networks

Graph Neural Networks (GNNs) excel in learning from relational datasets, processing node and edge features in a way that preserves the symmetries of the graph domain. However, many complex systems -- such as biological or social networks--involve multiway complex interactions that are more naturally represented by higher-order topological domains. The emerging field of Topological Deep Learning (TDL) aims to accommodate and leverage these higher-order structures. Combinatorial Complex Neural Networks (CCNNs), fairly general TDL models, have been shown to be more expressive and better performing than GNNs. However, differently from the GNN ecosystem, TDL lacks a principled and standardized framework for easily defining new architectures, restricting its accessibility and applicability. To address this issue, we introduce Generalized CCNNs (GCCNs), a novel simple yet powerful family of TDL models that can be used to systematically transform any (graph) neural network into its TDL counterpart. We prove that GCCNs generalize and subsume CCNNs, while extensive experiments on a diverse class of GCCNs show that these architectures consistently match or outperform CCNNs, often with less model complexity. In an effort to accelerate and democratize TDL, we introduce TopoTune, a lightweight software for defining, building, and training GCCNs with unprecedented flexibility and ease.

Updated: 2025-02-11 17:49:04

标题: TopoTune:一种用于广义组合复杂神经网络的框架

摘要: 图神经网络(GNNs)擅长从关系数据集中学习,以一种保留图域对称性的方式处理节点和边特征。然而,许多复杂系统--如生物或社交网络--涉及更自然地由高阶拓扑域表示的多向复杂交互。新兴的拓扑深度学习(TDL)领域旨在容纳和利用这些高阶结构。组合复杂神经网络(CCNNs)是相当通用的TDL模型,已被证明在表达和性能上优于GNNs。然而,与GNN生态系统不同,TDL缺乏一个原则性和标准化的框架,可以方便地定义新的架构,限制了其可访问性和适用性。为了解决这个问题,我们引入了广义CCNNs(GCCNs),这是一种新颖而简单而强大的TDL模型系列,可以系统地将任何(图)神经网络转换为其TDL对应物。我们证明GCCNs泛化和包含CCNNs,而对各种类别的GCCNs进行大量实验表明,这些架构通常与或胜过CCNNs,往往具有更少的模型复杂性。为了加速和民主化TDL,我们引入了TopoTune,一个轻量级软件,可用于定义、构建和训练GCCNs,具有前所未有的灵活性和便利性。

更新时间: 2025-02-11 17:49:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06530v3

Libra: Architectural Support For Principled, Secure And Efficient Balanced Execution On High-End Processors (Extended Version)

Control-flow leakage (CFL) attacks enable an attacker to expose control-flow decisions of a victim program via side-channel observations. Linearization (i.e., elimination) of secret-dependent control flow is the main countermeasure against these attacks, yet it comes at a non-negligible cost. Conversely, balancing secret-dependent branches often incurs a smaller overhead, but is notoriously insecure on high-end processors. Hence, linearization has been widely believed to be the only effective countermeasure against CFL attacks. In this paper, we challenge this belief and investigate an unexplored alternative: how to securely balance secret-dependent branches on higher-end processors? We propose Libra, a generic and principled hardware-software codesign to efficiently address CFL on high-end processors. We perform a systematic classification of hardware primitives leaking control flow from the literature, and provide guidelines to handle them with our design. Importantly, Libra enables secure control-flow balancing without the need to disable performance-critical hardware such as the instruction cache and the prefetcher. We formalize the semantics of Libra and propose a code transformation algorithm for securing programs, which we prove correct and secure. Finally, we implement and evaluate Libra on an out-of-order RISC-V processor, showing performance overhead on par with insecure balanced code, and outperforming state-of-the-art linearized code by 19.3%.

Updated: 2025-02-11 17:48:15

标题: Libra:高端处理器上支持原则、安全和高效平衡执行的架构支持(扩展版本)

摘要: 控制流泄漏(CFL)攻击使攻击者能够通过侧信道观察暴露受害程序的控制流决策。线性化(即消除)依赖于秘密的控制流是对抗这些攻击的主要对策,然而这种方法会带来不可忽视的成本。相反,平衡依赖于秘密的分支通常会产生较小的开销,但在高端处理器上安全性差。因此,人们普遍认为线性化是对抗CFL攻击的唯一有效对策。本文挑战了这种观念,探讨了一个未被开发的替代方案:如何在高端处理器上安全地平衡依赖于秘密的分支?我们提出了Libra,这是一种通用且有原则的硬件-软件协同设计,能够有效应对高端处理器上的CFL。我们对文献中泄漏控制流的硬件基元进行了系统分类,并提供了处理它们的指导方针。重要的是,Libra使得可以在不需要禁用性能关键硬件(如指令缓存和预取器)的情况下实现安全的控制流平衡。我们正式规范了Libra的语义,并提出了一个用于保护程序的代码转换算法,我们证明其正确且安全。最后,我们在一个乱序RISC-V处理器上实现和评估了Libra,结果显示性能开销与不安全的平衡代码相当,并且优于最先进的线性化代码19.3%。

更新时间: 2025-02-11 17:48:15

领域: cs.CR

下载: http://arxiv.org/abs/2409.03743v2

The Benefits of Balance: From Information Projections to Variance Reduction

Data balancing across multiple modalities and sources appears in various forms in foundation models in machine learning and AI, e.g. in CLIP and DINO. We show that data balancing across modalities and sources actually offers an unsuspected benefit: variance reduction. We present a non-asymptotic statistical bound that quantifies this variance reduction effect and relates it to the eigenvalue decay of Markov operators. Furthermore, we describe how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be better understood, and even improved upon, owing to our variance reduction viewpoint.
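
One concrete way to see balancing as an alternating information projection is iterative proportional fitting (Sinkhorn-style rescaling), which alternately projects a joint matrix onto the sets of distributions with prescribed row and column marginals; the sketch below is a generic illustration of this operation, not the paper's estimator:

```python
import numpy as np

def balance_marginals(P, iters=50):
    # alternately rescale rows then columns toward uniform marginals;
    # each step is an I-projection onto one marginal constraint set
    P = P / P.sum()
    n, m = P.shape
    for _ in range(iters):
        P = P * (1.0 / n) / P.sum(axis=1, keepdims=True)  # match row marginal
        P = P * (1.0 / m) / P.sum(axis=0, keepdims=True)  # match column marginal
    return P

P = np.random.default_rng(0).random((4, 6))  # e.g. counts of paired samples
B = balance_marginals(P)
print(B.sum(axis=1), B.sum(axis=0))  # both approximately uniform
```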

Updated: 2025-02-11 17:47:11

标题: 平衡的好处:从信息投影到方差减少

摘要: 跨多种模态和来源的数据平衡在机器学习和人工智能的基础模型中以各种形式出现,例如在CLIP和DINO中。我们展示了跨模态和来源的数据平衡实际上提供了一个意想不到的好处:方差降低。我们提出了一个非渐近统计界限,量化了这种方差降低效应,并将其与马尔可夫算子的特征值衰减相关联。此外,我们描述了如何更好地理解对比多模态学习和自监督聚类中的各种形式的数据平衡,并甚至得以改进,这要归功于我们的方差降低观点。

更新时间: 2025-02-11 17:47:11

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2408.15065v2

The Faiss library

Vector databases typically manage large collections of embedding vectors. Currently, AI applications are growing rapidly, and so is the number of embeddings that need to be stored and indexed. The Faiss library is dedicated to vector similarity search, a core functionality of vector databases. Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper describes the trade-off space of vector search and the design principles of Faiss in terms of structure, approach to optimization and interfacing. We benchmark key features of the library and discuss a few selected applications to highlight its broad applicability.
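
A minimal example of the core workflow (build an index, add vectors, search); the flat index below performs exact L2 search, and all sizes are illustrative:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                               # embedding dimension
xb = np.random.random((10000, d)).astype("float32")  # database vectors
xq = np.random.random((5, d)).astype("float32")      # query vectors

index = faiss.IndexFlatL2(d)          # exact (brute-force) L2 index
index.add(xb)                         # store the database vectors
distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
print(ids.shape)                      # (5, 4)
```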

Updated: 2025-02-11 17:43:59

标题: Faiss库

摘要: 矢量数据库通常管理大量的嵌入向量。当前,人工智能应用正在快速增长,需要存储和索引的嵌入向量数量也在增加。Faiss库致力于矢量相似性搜索,这是矢量数据库的核心功能。Faiss是用于搜索、聚类、压缩和转换向量的索引方法和相关原语的工具包。本文描述了矢量搜索的权衡空间以及Faiss的设计原则,包括结构、优化方法和接口。我们对库的关键功能进行了基准测试,并讨论了一些选择的应用,以突显其广泛适用性。

更新时间: 2025-02-11 17:43:59

领域: cs.LG,cs.CV,cs.SE

下载: http://arxiv.org/abs/2401.08281v3

Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK

Large language models (LLMs) have demonstrated remarkable code generation capabilities, but the correctness of the generated code cannot be inherently trusted. This paper explores the feasibility of using formal software verification, specifically the SPARK framework for Ada, to ensure the reliability of LLM-generated code. We present Marmaragan, a tool that leverages an LLM in order to generate SPARK annotations for existing programs, enabling formal verification of the code. The tool is benchmarked on a curated set of SPARK programs, with annotations selectively removed to test specific capabilities. The performance of Marmaragan with GPT-4o on the benchmark is promising, with correct annotations having been generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.

Updated: 2025-02-11 17:42:07

标题: 在Ada/SPARK软件验证背景下验证LLM生成的代码

摘要: 大型语言模型(LLMs)已经展示出了出色的代码生成能力,但生成的代码的正确性不能从根本上信任。本文探讨了使用形式化软件验证,特别是Ada的SPARK框架,来确保LLM生成的代码的可靠性的可行性。我们提出了Marmaragan,这是一个利用LLM来为现有程序生成SPARK注释的工具,从而实现对代码的形式化验证。该工具在一个经过精心筛选的SPARK程序集上进行了基准测试,选择性地删除注释以测试特定功能。在基准测试中,Marmaragan与GPT-4o的性能很有前途,对50.7%的基准案例生成了正确的注释。这些结果为将LLMs的强大功能与形式化软件验证的可靠性结合起来的未来工作奠定了基础。

更新时间: 2025-02-11 17:42:07

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2502.07728v1

Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art

Autonomous systems are soon to be ubiquitous, spanning manufacturing, agriculture, healthcare, entertainment, and other industries. Most of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these approaches perform well under the situations they were specifically designed for, they can perform especially poorly in out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets has led researchers to believe that these models may provide "common sense" reasoning that existing planners are missing, bridging the gap between algorithm development and deployment. While researchers have shown promising results in deploying foundation models to decision-making tasks, these models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, present guidelines, and explore areas for further research in this exciting field.

Updated: 2025-02-11 17:40:41

标题: 基于决策制定的基础模型中的幻觉检测:灵活定义和现有技术综述

摘要: 自主系统很快将无处不在,涵盖制造业、农业、医疗保健、娱乐和其他行业。大多数这些系统都使用模块化子组件进行决策、规划和控制,这些组件可能是手工设计的,也可能是基于学习的。虽然这些方法在它们专门设计用于的情况下表现良好,但在测试时不可避免会出现分布范围之外的情况,它们的表现可能特别糟糕。训练在多个任务上的基础模型,使用庞大的数据集,使研究人员相信这些模型可能提供现有规划器所缺乏的“常识”推理,弥合算法开发和部署之间的差距。虽然研究人员已经展示了将基础模型应用于决策任务的有希望的结果,但这些模型被认为会产生幻觉,并生成听起来合理但实际上很差的决策。我们认为有必要退一步,同时设计可以量化模型决策的确定性,并检测何时可能出现幻觉的系统。在这项工作中,我们讨论了基础模型在决策任务中的当前用例,提供了幻觉的一般定义和示例,讨论了现有的幻觉检测和缓解方法,重点关注决策问题,提出了指导方针,并探讨了这一激动人心领域的进一步研究方向。

更新时间: 2025-02-11 17:40:41

领域: cs.AI,cs.CL,cs.RO

下载: http://arxiv.org/abs/2403.16527v2

Novelty Detection in Reinforcement Learning with World Models

Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs, agent performance and reliability can dramatically decline. We refer to such sudden changes in visual properties or state transitions as novelties. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches that incorporate novelty detection into world model RL agents by utilizing the misalignment between the world model's hallucinated states and the true observed states as an anomaly score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
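
A minimal sketch of the bounding idea follows, with toy linear dynamics standing in for a learned world model; in practice the threshold would be calibrated on in-distribution transitions:

```python
import numpy as np

class ToyWorldModel:
    """Stand-in for a learned dynamics model: predicts the next state."""
    def __init__(self, A):
        self.A = A
    def predict(self, state, action):
        return self.A @ state + action

def novelty_scores(model, states, actions, threshold):
    # score each transition by the misalignment between the model's
    # hallucinated next state and the truly observed one
    scores = np.array([
        np.linalg.norm(model.predict(states[t], actions[t]) - states[t + 1])
        for t in range(len(states) - 1)
    ])
    return scores, scores > threshold  # flag transitions exceeding the bound

rng = np.random.default_rng(0)
A = 0.9 * np.eye(3)
model = ToyWorldModel(A)
states = [rng.normal(size=3)]
actions = [rng.normal(scale=0.1, size=3) for _ in range(20)]
for t in range(20):
    nxt = A @ states[t] + actions[t]
    if t == 12:                      # inject a novelty: dynamics suddenly change
        nxt += rng.normal(scale=2.0, size=3)
    states.append(nxt)
scores, flags = novelty_scores(model, states, actions, threshold=1.0)
print(np.where(flags)[0])            # should report the transition at t = 12
```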

Updated: 2025-02-11 17:38:13

标题: 在世界模型中的强化学习中的新颖性检测

摘要: 使用世界模型的强化学习(RL)最近取得了显著的成功。然而,当世界机械或属性发生突然变化时,代理的性能和可靠性可能会急剧下降。我们将视觉特性或状态转换的突然变化称为新颖性。在生成的世界模型框架中实现新颖性检测是保护代理在部署时的关键任务。在本文中,我们提出了直接的边界方法,将新颖性检测纳入世界模型RL代理中,通过利用世界模型产生的虚拟状态与真实观察状态的不一致性作为异常分数。我们提供了有效的方法来检测代理在世界模型中学习的转换分布中的新颖性。最后,我们展示了我们的工作在新颖环境中相对于传统机器学习新颖性检测方法以及目前接受的RL重点新颖性检测算法的优势。

更新时间: 2025-02-11 17:38:13

领域: cs.AI,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2310.08731v3

Glinthawk: A Two-Tiered Architecture for Offline LLM Inference

We introduce Glinthawk, an architecture for offline Large Language Model (LLM) inference. By leveraging a two-tiered structure, Glinthawk optimizes the utilization of the high-end accelerators ("Tier 1") by offloading the attention mechanism to lower-end compute tier ("Tier 2"). This separation allows the memory demand of the attention, known as the key-value cache, to scale independently from the model weights, enabling larger batch sizes and more efficient accelerator usage. Prototyped with NVIDIA T4 GPUs and standard CPU VMs, Glinthawk improves throughput by $5.9\times$ and reduces cost of generation by $2.8\times$, compared to paged attention baselines. For long sequence lengths, it achieves $16.3\times$ throughput improvement at $2.4\times$ less cost. Our evaluation shows that this architecture can tolerate moderate network latency with minimal performance degradation, making it highly effective for latency-tolerant, throughput-focused applications such as batch processing. The prototype is publicly available at https://github.com/microsoft/glinthawk.

Updated: 2025-02-11 17:36:32

标题: Glinthawk:用于离线LLM推理的两层架构

摘要: 我们介绍了Glinthawk,这是一种用于离线大语言模型(LLM)推断的架构。通过利用两层结构,Glinthawk通过将注意机制卸载到较低端的计算层("Tier 2")来优化高端加速器("Tier 1")的利用率。这种分离允许注意力的内存需求(称为键值缓存)独立于模型权重进行扩展,从而实现更大的批处理大小和更有效的加速器使用。Glinthawk使用NVIDIA T4 GPU和标准CPU VM进行原型设计,与分页注意基准相比,将吞吐量提高了5.9倍,并将生成成本降低了2.8倍。对于长序列长度,它实现了吞吐量提高16.3倍,成本降低2.4倍。我们的评估表明,这种架构可以容忍适度的网络延迟,几乎不降低性能,使其非常适用于容忍延迟、关注吞吐量的应用,如批处理。该原型公开可用于https://github.com/microsoft/glinthawk。

更新时间: 2025-02-11 17:36:32

领域: cs.LG,cs.DC,cs.PF

下载: http://arxiv.org/abs/2501.11779v2

Natural Variational Annealing for Multimodal Optimization

We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex objectives. First, it implements a simultaneous search by using variational posteriors, such as, mixtures of Gaussians. Second, it applies annealing to gradually trade off exploration for exploitation. Finally, it learns the variational search distribution using natural-gradient learning where updates resemble well-known and easy-to-implement algorithms. The three concepts come together in NVA giving rise to new algorithms and also allowing us to incorporate "fitness shaping", a core concept from evolutionary algorithms. We assess the quality of search on simulations and compare them to methods using gradient descent and evolution strategies. We also provide an application to a real-world inverse problem in planetary science.

Updated: 2025-02-11 17:36:13

标题: 自然变分退火用于多模态优化

摘要: 我们引入了一种新的多模态优化方法,称为自然变分退火(NVA),它结合了三个基本概念的优势,同时搜索黑盒非凸目标的多个全局和局部模态。首先,它通过使用变分后验(如高斯混合)实现同时搜索。其次,它应用退火逐渐权衡探索和利用。最后,它使用自然梯度学习来学习变分搜索分布,其中更新类似于众所周知且易于实现的算法。这三个概念在NVA中汇集在一起,产生了新的算法,并允许我们整合“适应性塑造”,这是进化算法的核心概念。我们通过模拟评估了搜索的质量,并将其与使用梯度下降和进化策略的方法进行了比较。我们还提供了一个应用于行星科学中的真实逆问题。

更新时间: 2025-02-11 17:36:13

领域: stat.ML,cs.LG,stat.CO

下载: http://arxiv.org/abs/2501.04667v2

TMLC-Net: Transferable Meta Label Correction for Noisy Label Learning

The prevalence of noisy labels in real-world datasets poses a significant impediment to the effective deployment of deep learning models. While meta-learning strategies have emerged as a promising approach for addressing this challenge, existing methods often suffer from limited transferability and task-specific designs. This paper introduces TMLC-Net, a novel Transferable Meta-Learner for Correcting Noisy Labels, designed to overcome these limitations. TMLC-Net learns a general-purpose label correction strategy that can be readily applied across diverse datasets and model architectures without requiring extensive retraining or fine-tuning. Our approach integrates three core components: (1) Normalized Noise Perception, which captures and normalizes training dynamics to handle distribution shifts; (2) Time-Series Encoding, which models the temporal evolution of sample statistics using a recurrent neural network; and (3) Subclass Decoding, which predicts a corrected label distribution based on the learned representations. We conduct extensive experiments on benchmark datasets with various noise types and levels, demonstrating that TMLC-Net consistently outperforms state-of-the-art methods in terms of both accuracy and robustness to label noise. Furthermore, we analyze the transferability of TMLC-Net, showcasing its adaptability to new datasets and noise conditions, and establishing its potential as a broadly applicable solution for robust deep learning in noisy environments.

Updated: 2025-02-11 17:33:48

标题: TMLC-Net:用于嘈杂标签学习的可转移元标签校正

摘要: 真实世界数据集中标签噪声的普遍存在严重影响了深度学习模型的有效部署。虽然元学习策略已经成为解决这一挑战的一种有希望的方法,但现有方法往往受到有限的可转移性和任务特定设计的影响。本文介绍了一种新颖的可转移的元学习器TMLC-Net,用于纠正标签噪声,旨在克服这些限制。TMLC-Net学习了一种通用的标签校正策略,可以在不需要大量重新训练或微调的情况下,跨不同数据集和模型架构进行快速应用。我们的方法集成了三个核心组件:(1)标准化噪声感知,捕捉和规范训练动态以处理分布转移;(2)时间序列编码,使用循环神经网络模拟样本统计数据的时间演变;(3)子类解码,基于学习到的表示预测校正标签分布。我们在具有各种噪声类型和级别的基准数据集上进行了大量实验,展示了TMLC-Net在准确性和对标签噪声的鲁棒性方面始终优于最先进的方法。此外,我们分析了TMLC-Net的可转移性,展示了它对新数据集和噪声条件的适应性,并建立了其作为噪声环境中鲁棒深度学习的广泛适用解决方案的潜力。

更新时间: 2025-02-11 17:33:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07721v1

Drago: Primal-Dual Coupled Variance Reduction for Faster Distributionally Robust Optimization

We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses learning using $f$-DRO and spectral/$L$-risk minimization. We present Drago, a stochastic primal-dual algorithm that combines cyclic and randomized components with a carefully regularized primal update to achieve dual variance reduction. Owing to its design, Drago enjoys a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems with a fine-grained dependency on primal and dual condition numbers. Theoretical results are supported by numerical benchmarks on regression and classification tasks.

Updated: 2025-02-11 17:28:34

标题: Drago:用于更快分布鲁棒优化的原始-对偶耦合方差缩减

摘要: 我们考虑具有闭合、凸不确定性集的受惩罚分布鲁棒优化(DRO)问题,这种设置包括使用$f$-DRO和谱/$L$-风险最小化进行学习。我们提出了Drago,这是一种随机原始-对偶算法,结合了循环和随机化组件,并通过精心规范的原始更新实现对偶方差缩减。由于其设计,Drago在强凸-强凹DRO问题上拥有最先进的线性收敛速度,并且对原始和对偶条件数有细粒度依赖。理论结果得到了在回归和分类任务上的数值基准测试的支持。

更新时间: 2025-02-11 17:28:34

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2403.10763v2

DPO Meets PPO: Reinforced Token Optimization for RLHF

In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of large language models, its open-source implementation is still largely sub-optimal. To address these issues, we introduce a framework that models RLHF problems as a Markov decision process (MDP), enabling the capture of fine-grained token-wise information. Under this framework, we introduce an algorithm Reinforced Token Optimization (\texttt{RTO}), which learns the token-wise reward function from preference data and performs policy optimization based on this learned token-wise reward signal. Theoretically, \texttt{RTO} is proven to have the capability of finding the near-optimal policy sample-efficiently. For its practical implementation, \texttt{RTO} innovatively integrates Direct Preference Optimization (DPO) and PPO. DPO, originally derived from sparse sentence rewards, surprisingly provides us with a token-wise characterization of response quality, which is seamlessly incorporated into our subsequent PPO training stage. Extensive experiments demonstrate that \texttt{RTO} performs better than PPO and other direct preference learning algorithms. In particular, RTO outperforms PPO by 7.5 points on the AlpacaEval 2 benchmark and by 4.1 points on Arena-Hard. Our code and models are available at \href{https://github.com/zkshan2002/RTO}{https://github.com/zkshan2002/RTO}.
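
The abstract does not spell out the token-wise reward, but one standard way to realize a DPO-derived token-level signal is the scaled log-ratio between the DPO-tuned and reference policies; the sketch below assumes Hugging-Face-style causal LMs that share one tokenizer and expose `.logits`, and should be read as an illustration rather than the paper's exact estimator:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tokenwise_rewards(dpo_model, ref_model, input_ids, beta=0.1):
    # r_t = beta * (log pi_dpo(a_t | s_t) - log pi_ref(a_t | s_t));
    # beta and the downstream PPO plumbing are omitted assumptions
    def token_logps(model):
        logits = model(input_ids).logits[:, :-1]   # positions predicting tokens 1..T
        logps = F.log_softmax(logits, dim=-1)
        return logps.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return beta * (token_logps(dpo_model) - token_logps(ref_model))
```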

Updated: 2025-02-11 17:23:13

标题: DPO遇见PPO:用于RLHF的强化令牌优化

摘要: 在经典的人类反馈强化学习(RLHF)框架中,采用Proximal Policy Optimization(PPO)从稀疏的句子级奖励中学习,这在传统深度强化学习中是一个具有挑战性的场景。尽管PPO在大型语言模型对齐方面取得了巨大成功,但其开源实现仍然存在较大优化空间。为了解决这些问题,我们引入了一个将RLHF问题建模为马尔可夫决策过程(MDP)的框架,使其能够捕捉到细粒度的标记信息。在这个框架下,我们引入了一种名为Reinforced Token Optimization(RTO)的算法,该算法从偏好数据中学习标记级奖励函数,并根据这个学习到的标记级奖励信号进行策略优化。从理论上讲,RTO被证明能够以样本高效的方式找到近似最优策略。在其实际实现中,RTO创新地将直接偏好优化(DPO)和PPO整合在一起。DPO最初源于稀疏的句子级奖励,却出人意料地为我们提供了对响应质量的标记级刻画,这被无缝地融入到我们后续的PPO训练阶段中。大量实验表明,RTO的性能优于PPO和其他直接偏好学习算法。特别是,在AlpacaEval 2基准上,RTO比PPO高出7.5个点,在Arena-Hard上高出4.1个点。我们的代码和模型可在\href{https://github.com/zkshan2002/RTO}{https://github.com/zkshan2002/RTO}处获得。

更新时间: 2025-02-11 17:23:13

领域: cs.LG,cs.AI,cs.CL,stat.ML

下载: http://arxiv.org/abs/2404.18922v3

ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Current multimodal systems employ static resource provisioning and cannot easily adapt when compute resources change over time. Additionally, their reliance on processing sensor data with fixed feature extractors is ill-equipped to handle variations in modality quality. Consequently, uninformative modalities, such as those with high noise, needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges - it adjusts the total number of active layers across all modalities to meet compute resource constraints, and continually reallocates layers across input modalities according to their modality quality. Our evaluations showcase ADMN can match the accuracy of state-of-the-art networks while reducing up to 75% of their floating-point operations.

Updated: 2025-02-11 17:19:44

标题: ADMN:一种逐层自适应的多模态网络,用于动态输入噪声和计算资源

摘要: 多模深度学习系统由于多种感知模式所提供的稳健性而部署在动态场景中。然而,它们在计算资源可用性(由于多租户、设备异构等原因)和输入质量波动(来自传感器数据损坏、环境噪音等)方面面临困难。当前的多模系统采用静态资源配置,并且在计算资源随时间变化时不能轻松适应。此外,它们依赖于使用固定特征提取器处理传感器数据,无法处理模态质量的变化。因此,具有高噪声的信息不明确的模态会不必要地消耗资源,这些资源最好分配给其他模态。我们提出了ADMN,一种逐层自适应深度多模网络,能够应对这两个挑战:它调整所有模态间活跃层的总数以满足计算资源约束,并根据它们的模态质量持续重新分配输入模态间的层。我们的评估表明,ADMN可以在降低高达75%的浮点运算的同时,达到最先进网络的准确性水平。

更新时间: 2025-02-11 17:19:44

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2502.07862v1

BalanceKV: KV Cache Compression through Discrepancy Theory

Large language models (LLMs) have achieved impressive success, but their high memory requirements present challenges for long-context token generation. The memory complexity of long-context LLMs is primarily due to the need to store Key-Value (KV) embeddings in their KV cache. We present BalanceKV, a KV cache compression method based on geometric sampling process stemming from Banaszczyk's vector balancing theory, which introduces dependencies informed by the geometry of keys and value tokens, and improves precision. BalanceKV offers both theoretically proven and empirically validated performance improvements over existing methods.

Updated: 2025-02-11 17:18:17

标题: BalanceKV:通过差异理论实现KV缓存压缩

摘要: 大型语言模型(LLMs)取得了令人印象深刻的成功,但由于其高内存需求,给长文本生成带来了挑战。长文本LLMs的内存复杂性主要是由于需要在其KV缓存中存储键-值(KV)嵌入。我们提出了BalanceKV,一种基于几何抽样过程的KV缓存压缩方法,源自Banaszczyk的向量平衡理论,其引入了受键和值标记几何形状影响的依赖关系,并提高了精度。BalanceKV在现有方法上既具有理论证明又经过实验证实的性能改进。

更新时间: 2025-02-11 17:18:17

领域: cs.LG,cs.AI,cs.DS

下载: http://arxiv.org/abs/2502.07861v1

Near-Optimal Sample Complexity in Reward-Free Kernel-Based Reinforcement Learning

Reinforcement Learning (RL) problems are being considered under increasingly more complex structures. While tabular and linear models have been thoroughly explored, the analytical study of RL under nonlinear function approximation, especially kernel-based models, has recently gained traction for their strong representational capacity and theoretical tractability. In this context, we examine the question of statistical efficiency in kernel-based RL within the reward-free RL framework, specifically asking: how many samples are required to design a near-optimal policy? Existing work addresses this question under restrictive assumptions about the class of kernel functions. We first explore this question by assuming a generative model, then relax this assumption at the cost of increasing the sample complexity by a factor of H, the length of the episode. We tackle this fundamental problem using a broad class of kernels and a simpler algorithm compared to prior work. Our approach derives new confidence intervals for kernel ridge regression, specific to our RL setting, which may be of broader applicability. We further validate our theoretical findings through simulations.
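
For context, the standard kernel ridge regression prediction comes with a Gaussian-process-style uncertainty width, the generic ingredient of kernelized confidence intervals; the paper derives new intervals specific to its RL setting, so the sketch below (with illustrative data and hyperparameters) only shows the standard form:

```python
import numpy as np

def krr_mean_and_width(X_train, y_train, X_test, lam=0.1, gamma=1.0):
    # KRR mean plus posterior-variance-style width:
    # mean = k_*^T (K + lam I)^{-1} y,  var = k(x,x) - k_*^T (K + lam I)^{-1} k_*
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = rbf(X_train, X_train)
    k_star = rbf(X_test, X_train)
    reg = K + lam * np.eye(len(X_train))
    mean = k_star @ np.linalg.solve(reg, y_train)
    K_inv_kT = np.linalg.solve(reg, k_star.T)
    var = 1.0 - np.sum(k_star * K_inv_kT.T, axis=1)   # rbf(x, x) = 1
    return mean, np.sqrt(np.maximum(var, 0.0))

X_tr = np.random.default_rng(0).uniform(-1, 1, size=(50, 1))
y_tr = np.sin(3 * X_tr[:, 0])
X_te = np.linspace(-1, 1, 5)[:, None]
mean, width = krr_mean_and_width(X_tr, y_tr, X_te)
print(mean, mean - 2 * width, mean + 2 * width)  # point estimate and interval
```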

Updated: 2025-02-11 17:15:55

标题: 无奖励基于核的强化学习中的近最优样本复杂性

摘要: 强化学习(RL)问题正在越来越复杂的结构下进行考虑。虽然已经深入研究了表格和线性模型,但在非线性函数逼近,尤其是基于核的模型下对RL的分析研究最近因其强大的表征能力和理论可行性而引起关注。在这种情况下,我们在奖励自由的RL框架下考察基于核的RL中的统计效率问题,具体问题是:设计一个接近最优策略需要多少样本?现有工作在对核函数类的限制性假设下解决了这个问题。我们首先通过假设一个生成模型来探讨这个问题,然后放宽这一假设,其代价是样本复杂度增加一个因子H(即回合长度)。我们使用一类广泛的核和一个比以前的工作更简单的算法来解决这个基本问题。我们的方法为核岭回归推导了适用于我们RL设置的新置信区间,可能具有更广泛的适用性。我们通过模拟进一步验证了我们的理论发现。

更新时间: 2025-02-11 17:15:55

领域: cs.LG

下载: http://arxiv.org/abs/2502.07715v1

pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning

Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server. Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters which are usually trained on heterogeneous data distributions, potentially overlooking the complex, high-dimensional nature of the parameter space. This can result in degraded performance of the aggregated model. While personalized FL approaches can mitigate the heterogeneous data issue to some extent, the limitation of linear aggregation remains unresolved. To alleviate this issue, we investigate the generative approach of diffusion model and propose a novel generative parameter aggregation framework for personalized FL, \texttt{pFedGPA}. In this framework, we deploy a diffusion model on the server to integrate the diverse parameter distributions and propose a parameter inversion method to efficiently generate a set of personalized parameters for each client. This inversion method transforms the uploaded parameters into a latent code, which is then aggregated through denoising sampling to produce the final personalized parameters. By encoding the dependence of a client's model parameters on the specific data distribution using the high-capacity diffusion model, \texttt{pFedGPA} can effectively decouple the complexity of the overall distribution of all clients' model parameters from the complexity of each individual client's parameter distribution. Our experimental results consistently demonstrate the superior performance of the proposed method across multiple datasets, surpassing baseline approaches.

Updated: 2025-02-11 17:14:43

标题: pFedGPA:基于扩散的个性化联邦学习生成参数聚合

摘要: 联邦学习(FL)提供了一种去中心化的模型训练方法,其中数据保持本地,只有模型参数在客户端和中央服务器之间共享。传统方法,如联邦平均(FedAvg),线性聚合这些参数,这些参数通常在异构数据分布上训练,可能忽视参数空间的复杂、高维特性。这可能导致聚合模型的性能下降。尽管个性化FL方法可以在一定程度上缓解异构数据问题,但线性聚合的限制仍未解决。为了减轻这个问题,我们研究了扩散模型的生成方法,并提出了一个新颖的个性化FL参数聚合框架\texttt{pFedGPA}。在这个框架中,我们在服务器上部署一个扩散模型来整合不同的参数分布,并提出一个参数反演方法来高效地为每个客户端生成一组个性化参数。这种反演方法将上传的参数转换为潜在代码,然后通过去噪抽样来聚合生成最终的个性化参数。通过使用高容量扩散模型对客户端模型参数对特定数据分布的依赖进行编码,\texttt{pFedGPA}能有效地解耦所有客户端模型参数总体分布的复杂性与每个个体客户端参数分布的复杂性。我们的实验结果一致显示出所提出的方法在多个数据集上表现出优越的性能,超越了基线方法。

更新时间: 2025-02-11 17:14:43

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.05701v3

(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions

The concept of rationality is central to the field of artificial intelligence. Whether we are seeking to simulate human reasoning, or the goal is to achieve bounded optimality, we generally seek to make artificial agents as rational as possible. Despite the centrality of the concept within AI, there is no unified definition of what constitutes a rational agent. This article provides a survey of rationality and irrationality in artificial intelligence, and sets out the open questions in this area. The understanding of rationality in other fields has influenced its conception within artificial intelligence, in particular work in economics, philosophy and psychology. Focusing on the behaviour of artificial agents, we consider irrational behaviours that can prove to be optimal in certain scenarios. Some methods have been developed to deal with irrational agents, both in terms of identification and interaction, however work in this area remains limited. Methods that have up to now been developed for other purposes, namely adversarial scenarios, may be adapted to suit interactions with artificial agents. We further discuss the interplay between human and artificial agents, and the role that rationality plays within this interaction; many questions remain in this area, relating to potentially irrational behaviour of both humans and artificial agents.

Updated: 2025-02-11 17:06:45

标题: (Ir)rationality in AI:现状、研究挑战和未解问题

摘要: 理性概念在人工智能领域中是至关重要的。无论我们是寻求模拟人类推理,还是目标是实现有限最优性,我们通常都希望使人工代理尽可能理性。尽管该概念在人工智能领域中具有核心地位,但并没有统一的定义来界定何为理性代理。本文对人工智能领域中的理性和非理性进行了调查,并提出了该领域的未解问题。理性在其他领域中的理解影响了其在人工智能领域内的构想,特别是经济学、哲学和心理学领域的研究。关注人工代理的行为,我们考虑在某些场景中可能证明为最优的非理性行为。已经开发了一些方法来处理非理性代理,无论是在识别还是互动方面,然而该领域的研究仍然有限。迄今为止已经为其他目的开发的方法,即对抗性场景,可以被调整以适应与人工代理的互动。我们进一步讨论人类和人工代理之间的相互作用,以及理性在这种互动中的作用;在这一领域仍然存在许多问题,涉及到人类和人工代理的潜在非理性行为。

更新时间: 2025-02-11 17:06:45

领域: cs.AI,cs.CY,cs.HC,cs.LG,cs.MA

下载: http://arxiv.org/abs/2311.17165v3

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

The alignment of large language models (LLMs) often assumes that using more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this, we propose a new principle: Preference data vary in difficulty, and overly difficult examples hinder alignment, by exceeding the model's capacity. Through systematic experimentation, we validate this principle with three key findings: (1) preference examples vary in difficulty, as evidenced by consistent learning orders across alignment runs; (2) overly difficult examples significantly degrade performance across four LLMs and two datasets; and (3) the capacity of a model dictates its threshold for handling difficult examples, underscoring a critical relationship between data selection and model capacity. Building on this principle, we introduce Selective DPO, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9-16% in win rates on the AlpacaEval 2 benchmark compared to the DPO baseline, suppressing a series of DPO variants with different algorithmic adjustments. Together, these results illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO.
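
A minimal sketch of the filtering step follows, with difficulty proxied by how strongly a reference model already prefers the chosen response; the paper characterizes difficulty via learning order across alignment runs, so this proxy and all numbers are assumptions for illustration:

```python
import torch

@torch.no_grad()
def select_easy_pairs(logp_chosen, logp_rejected, keep_frac=0.8):
    # rank preference pairs by a difficulty proxy and drop the hardest
    # fraction before running DPO
    margin = logp_chosen - logp_rejected      # small/negative -> difficult
    k = int(keep_frac * len(margin))
    return torch.topk(margin, k).indices      # keep the easiest k pairs

# toy usage: sequence log-probs per pair under a reference model
logp_c = torch.tensor([-12.0, -35.0, -20.0, -8.0, -50.0])
logp_r = torch.tensor([-20.0, -30.0, -21.0, -15.0, -48.0])
print(select_easy_pairs(logp_c, logp_r, keep_frac=0.6))  # indices 0, 3, 2
```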

Updated: 2025-02-11 17:01:11

标题: 对齐的原则性数据选择:困难示例的潜在风险

摘要: 大语言模型(LLMs)的对齐通常假定使用更多干净数据会产生更好的结果,忽视了模型容量与示例难度之间的匹配。挑战这一观点,我们提出了一个新原则:偏好数据的难度不同,过于困难的示例会阻碍对齐,超过模型的容量。通过系统实验,我们验证了这一原则的三个关键发现:(1)偏好示例的难度不同,如在对齐运行中学习顺序的一致性证明;(2)过于困难的示例显著降低了四个LLMs和两个数据集的性能;(3)模型的容量决定了其处理困难示例的阈值,强调了数据选择和模型容量之间的关键关系。基于这一原则,我们引入了选择性DPO,它可以过滤掉过于困难的示例。这种简单的调整使在AlpacaEval 2基准测试中,与DPO基线相比,对齐性能在胜率上提高了9-16%,抑制了一系列具有不同算法调整的DPO变体。总之,这些结果阐明了将数据难度与模型容量对齐的重要性,为改善LLMs中的对齐策略提供了一种变革性观点。代码可在https://github.com/glorgao/SelectiveDPO 上找到。

更新时间: 2025-02-11 17:01:11

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.09650v1

Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. Our results on 18 datasets suggest that i) continual pre-training improves the model in a latent way that unveils after fine-tuning; ii) with extra fine-tuning, the datasets that the model does not demonstrate capability gain much more than those that the model performs well during the pre-training stage; iii) although model benefits significantly through supervised fine-tuning, it may forget previously known domain knowledge and the tasks that are not seen during fine-tuning; iv) the model resembles high sensitivity to evaluation prompts after supervised fine-tuning, but this sensitivity can be alleviated by more pre-training.

Updated: 2025-02-11 16:57:29

标题: 安室和夏尔:分析大型语言模型的预训练和微调之间的关系

摘要: 大型语言模型的发展导致了一个预训练-微调范式的形成,其中模型通常在大型文本语料库上进行预训练,并经历一个调整阶段,以使模型与人类偏好或下游任务对齐。在这项工作中,我们通过微调多个中间预训练模型检查点,研究了预训练和微调之间的关系。我们在18个数据集上的结果表明:i)持续的预训练以一种潜在的方式改进了模型,在微调后展现出来;ii)通过额外的微调,模型在预训练阶段表现不佳的数据集比那些模型在预训练阶段表现良好的数据集获得更多收益;iii)尽管模型通过监督微调显著受益,但它可能会忘记先前已知的领域知识和在微调期间未见的任务;iv)在监督微调后,模型在评估提示上表现出高度敏感,但这种敏感性可以通过更多的预训练来缓解。

更新时间: 2025-02-11 16:57:29

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.06663v4

Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models

Dictionary learning (DL) has emerged as a powerful interpretability tool for large language models. By extracting known concepts (e.g., Golden-Gate Bridge) from human-interpretable data (e.g., text), sparse DL can elucidate a model's inner workings. In this work, we ask if DL can also be used to discover unknown concepts from less human-interpretable scientific data (e.g., cell images), ultimately enabling modern approaches to scientific discovery. As a first step, we use DL algorithms to study microscopy foundation models trained on multi-cell image data, where little prior knowledge exists regarding which high-level concepts should arise. We show that sparse dictionaries indeed extract biologically-meaningful concepts such as cell type and genetic perturbation type. We also propose Iterative Codebook Feature Learning~(ICFL) and combine it with a pre-processing step which uses PCA whitening from a control dataset. In our experiments, we demonstrate that both ICFL and PCA improve the selectivity of extracted features compared to TopK sparse autoencoders.
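
A sketch of the two ingredients named above, control-set PCA whitening followed by a TopK sparse-coding step; the full ICFL loop (iterative codebook updates) is omitted, and all shapes are illustrative:

```python
import numpy as np

def pca_whiten(X_control, X, eps=1e-5):
    # whiten features using statistics from a control dataset
    mu = X_control.mean(axis=0)
    U, S, _ = np.linalg.svd(np.cov(X_control - mu, rowvar=False))
    W = U / np.sqrt(S + eps)          # columns scaled by 1/sqrt(eigenvalue)
    return (X - mu) @ W

def topk_encode(X, D, k):
    # TopK sparse code: keep only the k largest-magnitude dictionary responses
    A = X @ D.T
    kth = -np.sort(-np.abs(A), axis=1)[:, k - 1:k]
    return np.where(np.abs(A) >= kth, A, 0.0)

rng = np.random.default_rng(0)
X_control, X = rng.normal(size=(500, 16)), rng.normal(size=(100, 16))
D = rng.normal(size=(64, 16))         # dictionary of 64 atoms
codes = topk_encode(pca_whiten(X_control, X), D, k=8)
print((codes != 0).sum(axis=1))       # ~8 active atoms per sample
```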

Updated: 2025-02-11 16:54:45

标题: 朝向科学发现之路:利用字典学习从显微镜基础模型中提取生物概念

摘要: 词典学习(DL)已经成为大型语言模型的一个强大的可解释性工具。通过从人类可解释的数据(例如文本)中提取已知概念(例如金门大桥),稀疏DL可以阐明模型的内部运作方式。在这项工作中,我们探讨DL是否也可以用于从不太可解释的科学数据(例如细胞图像)中发现未知概念,最终实现现代科学发现方法。作为第一步,我们使用DL算法研究在多细胞图像数据上训练的显微镜基础模型,这些数据中几乎没有关于应该出现哪些高级概念的先前知识。我们展示稀疏词典确实提取出生物意义的概念,例如细胞类型和遗传扰动类型。我们还提出了迭代码本特征学习(ICFL)并将其与一个使用来自控制数据集的PCA白化的预处理步骤结合起来。在我们的实验中,我们证明ICFL和PCA相对于TopK稀疏自动编码器改善了提取特征的选择性。

更新时间: 2025-02-11 16:54:45

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2412.16247v2

DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation

Continual Test-Time Adaptation (CTTA) seeks to adapt source pre-trained models to continually changing, unseen target domains. While existing CTTA methods assume structured domain changes with uniform durations, real-world environments often exhibit dynamic patterns where domains recur with varying frequencies and durations. Current approaches, which adapt the same parameters across different domains, struggle in such dynamic conditions: they face convergence issues with brief domain exposures, risk forgetting previously learned knowledge, or misapply it to irrelevant domains. To remedy this, we propose DPCore, a method designed for robust performance across diverse domain change patterns while ensuring computational efficiency. DPCore integrates three key components: Visual Prompt Adaptation for efficient domain alignment, a Prompt Coreset for knowledge preservation, and a Dynamic Update mechanism that intelligently adjusts existing prompts for similar domains while creating new ones for substantially different domains. Extensive experiments on four benchmarks demonstrate that DPCore consistently outperforms various CTTA methods, achieving state-of-the-art performance in both structured and dynamic settings while reducing trainable parameters by 99% and computation time by 64% compared to previous approaches.

Updated: 2025-02-11 16:47:17

标题: DPCore:连续测试时间适应的动态提示核心集

摘要: 持续测试时间适应(CTTA)旨在将源预训练模型适应不断变化的、未见过的目标领域。现有的CTTA方法假设结构化领域变化具有统一的持续时间,而实际环境往往展现出具有不同频率和持续时间的动态模式。当前的方法在不断变化的条件下遇到困难,因为它们跨不同领域调整相同的参数,会面临收敛问题,可能忘记之前学习的知识,或将其错误地应用于无关的领域。为了解决这个问题,我们提出了DPCore,这是一种专为在不同的领域变化模式下实现稳健性能并确保计算效率的方法。DPCore集成了三个关键组件:用于有效领域对齐的视觉提示适应、用于知识保留的提示核心集,以及一种动态更新机制,能够智能地调整现有提示以适应相似领域,同时为完全不同的领域创建新的提示。在四个基准测试上进行的大量实验表明,DPCore始终优于各种CTTA方法,实现了在结构化和动态设置中的最先进性能,同时与先前方法相比,减少了可训练参数99%,计算时间减少了64%。

更新时间: 2025-02-11 16:47:17

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2406.10737v3

A statistically consistent measure of Semantic Variability using Language Models

To address the issue of variability in the output generated by a language model, we present a measure of semantic variability that is statistically consistent under mild assumptions. This measure, denoted as semantic spectral entropy, is an easy-to-implement algorithm that requires only off-the-shelf language models. We place very few restrictions on the language models, and we show in clear simulation studies that the method generates an accurate metric despite the randomness that arises from the language models.
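
The abstract leaves the construction abstract; one way to realize a spectral-entropy measure of semantic variability, assuming an off-the-shelf sentence embedder supplies the `embeddings` matrix, is sketched below (illustrative, not necessarily the paper's exact estimator):

```python
import numpy as np

def semantic_spectral_entropy(embeddings):
    # entropy of the normalized eigenvalue spectrum of a similarity matrix
    # over sampled responses: ~0 when responses agree semantically,
    # larger when they scatter across distinct meanings
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = E @ E.T                                   # cosine similarity matrix
    eigvals = np.maximum(np.linalg.eigvalsh(S), 0.0)
    p = eigvals / eigvals.sum()                   # normalize to a distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# toy usage: 5 near-identical responses vs 5 scattered ones
rng = np.random.default_rng(0)
tight = rng.normal(size=(1, 8)) + 0.01 * rng.normal(size=(5, 8))
spread = rng.normal(size=(5, 8))
print(semantic_spectral_entropy(tight), semantic_spectral_entropy(spread))
```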

Updated: 2025-02-11 16:39:55

标题: 使用语言模型的统计一致的语义变化度量

摘要: 为了解决语言模型生成的输出变异性问题,我们提出了一种语义变异度的度量,该度量在温和假设下是统计一致的。这种度量被称为语义谱熵,是一种易于实现的算法,只需要使用现成的语言模型。我们对语言模型几乎没有限制,并且在清晰的模拟研究中表明,这种方法可以生成准确的度量,尽管语言模型引起的随机性。

更新时间: 2025-02-11 16:39:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.00507v2

Evaluating Evidence Attribution in Generated Fact Checking Explanations

Automated fact-checking systems often struggle with trustworthiness, as their generated explanations can include hallucinations. In this work, we explore evidence attribution for fact-checking explanation generation. We introduce a novel evaluation protocol -- citation masking and recovery -- to assess attribution quality in generated explanations. We implement our protocol using both human annotators and automatic annotators, and find that LLM annotation correlates with human annotation, suggesting that attribution assessment can be automated. Finally, our experiments reveal that: (1) the best-performing LLMs still generate explanations with inaccurate attributions; and (2) human-curated evidence is essential for generating better explanations. Code and data are available here: https://github.com/ruixing76/Transparent-FCExp.

Updated: 2025-02-11 16:36:32

标题: 评估生成的事实核查解释中的证据归因

摘要: 自动事实核查系统经常在可信度方面遇到困难,因为它们生成的解释可能包含幻觉。在这项工作中,我们探讨了用于事实核查解释生成的证据归因。我们引入了一种新颖的评估协议 - 引文掩码和恢复 - 来评估生成解释中的归因质量。我们使用人工注释者和自动注释者实施了我们的协议,并发现LLM注释与人工注释相关,表明归因评估可以自动化。最后,我们的实验揭示了两个结果:(1)表现最佳的LLM仍会生成带有不准确归因的解释;和(2)人工筛选的证据对于生成更好的解释至关重要。代码和数据可以在这里找到:https://github.com/ruixing76/Transparent-FCExp。

更新时间: 2025-02-11 16:36:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.12645v3

Paying to Do Better: Games with Payments between Learning Agents

In repeated games, such as auctions, players typically use learning algorithms to choose their actions. The use of such autonomous learning agents has become widespread on online platforms. In this paper, we explore the impact of players incorporating monetary transfer policies into their agents' algorithms, aiming to influence behavior in their favor through the dynamics between the agents. Our focus is on understanding when players have incentives to make use of monetary transfers, how such payments may affect learning dynamics, and what the implications are for welfare and its distribution among the players. We propose a simple and general game-theoretic model to capture such scenarios. Our results on general games show that in a very broad class of games, self-interested players benefit from letting their learning agents make payments to other learners during the game dynamics, and that in many cases, this kind of behavior improves welfare for all players. Our results on first- and second-price auctions show that in equilibria of the ``payment policy game,'' the agents' dynamics reach strong collusive outcomes with low revenue for the auctioneer. These results raise new questions and highlight a challenge for mechanism design in systems where automated learning agents can benefit from interacting with their peers in the digital ecosystem and outside the boundaries of the mechanism.

Updated: 2025-02-11 16:29:04

标题: 支付以做得更好:学习智能体之间的付款游戏

摘要: 在重复博弈中,比如拍卖,玩家通常使用学习算法来选择他们的行动。在在线平台上,这种自主学习代理的使用已经变得普遍。本文探讨了玩家将货币转移政策纳入他们代理算法的影响,旨在通过代理之间的动态影响行为。我们关注的重点是了解玩家何时有动机利用货币转移,这些支付如何影响学习动态,以及对玩家之间的福利及其分配有什么影响。我们提出了一个简单且通用的博弈理论模型来捕捉这种情况。我们关于一般游戏的结果表明,在非常广泛的游戏类别中,自利的玩家受益于让他们的学习代理在游戏动态过程中向其他学习者支付费用,并且在许多情况下,这种行为提高了所有玩家的福利。我们关于一价和二价拍卖的结果表明,在“支付政策游戏”的均衡中,代理的动态达到了对拍卖者收入较低的强勾结结果。这些结果提出了新的问题,并突显了在自动学习代理可以从数字生态系统中与同行互动并超越机制边界的系统中进行机制设计的挑战。

更新时间: 2025-02-11 16:29:04

领域: cs.GT,cs.AI,cs.MA,econ.TH,91A05, 91A06, 91A10, 91A20, 91A40, 91A80,F.0; I.2; I.2.6; J.4

下载: http://arxiv.org/abs/2405.20880v2

Learning from Demonstration with Implicit Nonlinear Dynamics Models

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.
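
The shared core of the proposed layer and reservoir computing, a fixed random nonlinear dynamical system unrolled over the input sequence, can be sketched in a few lines of NumPy; the paper's layer additionally exposes tunable dynamical properties and is trained inside larger networks, which this toy omits:

```python
import numpy as np

class FixedDynamicsLayer:
    """Recurrent layer with a fixed random nonlinear dynamical system
    (reservoir-computing style); only a readout on top would be trained."""
    def __init__(self, n_in, n_state, spectral_radius=0.9, leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(n_state, n_state))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # stability
        self.W, self.leak = W, leak
        self.W_in = rng.normal(scale=0.5, size=(n_state, n_in))

    def run(self, inputs):
        h = np.zeros(self.W.shape[0])
        states = []
        for u in inputs:  # leaky-integrator update with fixed weights
            h = (1 - self.leak) * h + self.leak * np.tanh(self.W @ h + self.W_in @ u)
            states.append(h.copy())
        return np.stack(states)

layer = FixedDynamicsLayer(n_in=2, n_state=64)
states = layer.run(np.random.default_rng(1).normal(size=(100, 2)))
print(states.shape)  # (100, 64): features for a trainable readout
```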

Updated: 2025-02-11 16:24:23

标题: 学习演示与隐式非线性动力学模型

摘要: 从示范中学习(LfD)是训练策略以解决涉及复杂动作的任务(如机器人操作中遇到的任务)的有用范例。在实践中,成功应用LfD需要克服策略执行过程中的误差累积问题,即由于时间推移而导致错误累积和随之而来的超出分布行为的问题。现有工作试图通过扩大数据收集、通过人机协作纠正策略错误、通过时间集成策略预测或通过学习具有收敛保证的动态系统模型来解决这一问题。在这项工作中,我们提出并验证了一种克服这一问题的替代方法。受储层计算的启发,我们开发了一个包含固定非线性动态系统的可调动态特性的循环神经网络层,用于建模时间动态。我们验证了我们的神经网络层在使用LASA人类手写数据集复制人类手写动作任务中的有效性。通过实证实验,我们证明将我们的层结合到现有神经网络架构中可以解决LfD中的错误累积问题。此外,我们对现有方法进行了比较评估,包括策略预测的时间集成和Echo State Network(ESN)实现。我们发现我们的方法在手写任务上产生了更高的策略精度和鲁棒性,同时也可以推广到多个动态体制,并保持竞争性的延迟得分。

更新时间: 2025-02-11 16:24:23

领域: cs.AI,cs.LG,cs.RO,cs.SY,eess.SY,I.2

下载: http://arxiv.org/abs/2409.18768v3

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from suboptimal generalization compared to stochastic gradient descent (SGD) and exhibit instability, particularly when training Transformer models. In this work, we show that the standard initialization of the second-order moment estimate ($v_0 = 0$) is a significant factor contributing to these limitations. We introduce simple yet effective solutions: initializing the second-order moment estimate with non-zero values, using either data-driven or random initialization strategies. Empirical evaluations demonstrate that our approach not only stabilizes convergence but also enhances the final performance of adaptive gradient optimizers. Furthermore, by adopting the proposed initialization strategies, Adam achieves performance comparable to many recently proposed variants of adaptive gradient optimization methods. Our code is available at https://github.com/Walleclipse/Adam_Initialization.
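
A minimal sketch of the proposed change: initialize $v_0$ from an initial gradient estimate instead of zero while leaving the rest of Adam untouched. Whether to retain the bias correction with a non-zero $v_0$ is a design choice (kept here for simplicity), and all names and constants are illustrative:

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # standard Adam update; the only change in this sketch is that `v`
    # was initialized from an initial gradient estimate instead of zeros
    m.mul_(b1).add_(grad, alpha=1 - b1)
    v.mul_(b2).addcmul_(grad, grad, value=1 - b2)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)        # bias correction kept for simplicity
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))

p = torch.randn(10)
m = torch.zeros(10)
g0 = torch.randn(10)   # stand-in for a gradient on an initial mini-batch
v = g0 * g0            # data-driven non-zero init instead of torch.zeros(10)
for t in range(1, 101):
    adam_step(p, torch.randn(10) * 0.1, m, v, t)
print(p.norm())
```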

Updated: 2025-02-11 16:23:39

标题: 重新审视自适应梯度下降优化中的初始步骤

摘要: 自适应梯度优化方法,如Adam,在训练深度神经网络中广泛应用于各种机器学习任务,因为它们能够实现更快的收敛。然而,与随机梯度下降(SGD)相比,这些方法通常存在优化一般化不足的问题,并且在训练Transformer模型时表现出不稳定性。在本研究中,我们展示了标准初始化第二阶矩估计($v_0=0$)作为导致这些限制的重要因素。我们提出了简单而有效的解决方案:使用非零值初始化第二阶矩估计,使用数据驱动或随机初始化策略。实证评估表明,我们的方法不仅稳定了收敛速度,还提高了自适应梯度优化器的最终性能。此外,通过采用所提出的初始化策略,Adam的性能达到了许多最近提出的自适应梯度优化方法变体的水平。我们的代码可在https://github.com/Walleclipse/Adam_Initialization找到。

更新时间: 2025-02-11 16:23:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2412.02153v2

SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph

Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for all variables, but only a small subgraph that includes the targets and their adjustment sets. In this paper, we focus on identifying causal effects between target variables in a computationally and statistically efficient way. This task combines causal discovery and effect estimation, aligning the discovery objective with the effects to be estimated. We show that definite non-ancestors of the targets are unnecessary to learn causal relations between the targets and to identify efficient adjustments sets. We sequentially identify and prune these definite non-ancestors with our Sequential Non-Ancestor Pruning (SNAP) framework, which can be used either as a preprocessing step to standard causal discovery methods, or as a standalone sound and complete causal discovery algorithm. Our results on synthetic and real data show that both approaches substantially reduce the number of independence tests and the computation time without compromising the quality of causal effect estimations.

Updated: 2025-02-11 16:20:57

标题: SNAP:用于未知图的有针对性因果效应估计的顺序非祖先修剪

摘要: 因果发现对于大量变量可能需要大量计算。如果我们只想估计一小部分目标变量的因果效应,可能并不需要学习所有变量的因果图,而只需要包括目标变量及其调整集的一个小子图。在这篇论文中,我们专注于以一种计算和统计高效的方式识别目标变量之间的因果效应。这项任务结合了因果发现和效应估计,将发现目标与估计效应的目标对齐。我们表明,目标的明确非祖先对于学习目标之间的因果关系和识别高效调整集是不必要的。我们使用我们的顺序非祖先修剪(SNAP)框架依次识别和修剪这些明确非祖先,这可以作为标准因果发现方法的预处理步骤,也可以作为独立的完整和准确的因果发现算法。我们在合成和真实数据上的结果表明,这两种方法都显著减少了独立性检验的数量和计算时间,而不会影响因果效应估计的质量。

更新时间: 2025-02-11 16:20:57

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07857v1

From Pixels to Components: Eigenvector Masking for Visual Representation Learning

Predicting masked from visible parts of an image is a powerful self-supervised approach for visual representation learning. However, the common practice of masking random patches of pixels exhibits certain failure modes, which can prevent learning meaningful high-level features, as required for downstream tasks. We propose an alternative masking strategy that operates on a suitable transformation of the data rather than on the raw pixels. Specifically, we perform principal component analysis and then randomly mask a subset of components, which accounts for a fixed ratio of the data variance. The learning task then amounts to reconstructing the masked components from the visible ones. Compared to local patches of pixels, the principal components of images carry more global information. We thus posit that predicting masked from visible components involves more high-level features, allowing our masking strategy to extract more useful representations. This is corroborated by our empirical findings which demonstrate improved image classification performance for component over pixel masking. Our method thus constitutes a simple and robust data-driven alternative to traditional masked image modeling approaches.
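
The masking strategy described above is easy to prototype: compute a PCA basis, randomly accumulate components until they account for a fixed share of the variance, and hide those component scores from the encoder. A NumPy sketch with illustrative shapes:

```python
import numpy as np

def mask_components(X, mask_ratio=0.25, seed=0):
    # PCA-space masking: randomly hide principal components accounting for
    # a fixed ratio of the data variance; the learning target is to
    # reconstruct the masked components from the visible ones
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    var = S**2 / (S**2).sum()
    cum, masked = 0.0, []
    for j in rng.permutation(len(S)):  # accumulate components until the
        masked.append(j)               # masked variance ratio is reached
        cum += var[j]
        if cum >= mask_ratio:
            break
    Z = (X - mu) @ Vt.T                # all component scores
    visible = np.delete(Z, masked, axis=1)
    targets = Z[:, masked]
    return visible, targets, masked

X = np.random.default_rng(1).normal(size=(256, 32))
vis, tgt, masked = mask_components(X, mask_ratio=0.25)
print(vis.shape, tgt.shape, len(masked))
```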

Updated: 2025-02-11 16:04:15

标题: 从像素到组件:用于视觉表示学习的特征向量屏蔽

摘要: 从图像的可见部分预测掩蔽部分是一种强大的自监督方法,用于视觉表示学习。然而,常见的遮罩随机像素块的做法表现出某些失败模式,可能阻止学习有意义的高级特征,这些特征对下游任务是必需的。我们提出了一种替代的遮罩策略,它在数据的适当转换上运行,而不是在原始像素上。具体而言,我们执行主成分分析,然后随机遮罩一部分成分,这些成分占数据方差的固定比例。学习任务的关键是从可见成分重构遮罩的成分。与局部像素块相比,图像的主成分携带更多的全局信息。因此,我们假设从可见成分预测掩蔽部分涉及更多的高级特征,使得我们的遮罩策略能够提取更有用的表示。我们的实证发现证实了这一观点,显示出了对成分的遮罩比对像素的遮罩具有改进的图像分类性能。因此,我们的方法构成了一种简单而稳健的数据驱动替代传统的遮罩图像建模方法。

更新时间: 2025-02-11 16:04:15

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2502.06314v2

A Practical Method for Generating String Counterfactuals

Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.

Updated: 2025-02-11 16:03:35

标题: 一个生成字符串反事实的实用方法

摘要: 针对语言模型(LMs)表示空间的干预已经成为影响模型行为的有效手段。例如,这些方法被用来消除或改变模型表示中的人口统计信息,比如性别,并通过这种方式创建一个反事实的表示。然而,由于干预操作在表示空间内进行,准确理解它修改的文本的哪些方面构成了一个挑战。在本文中,我们提出了一种将表示反事实转换为字符串反事实的方法。我们展示了这种方法使我们能够分析与特定表示空间干预对应的语言变化,并解释用于编码特定概念的特征。此外,由此产生的反事实可以通过数据增强用于减少分类中的偏见。

更新时间: 2025-02-11 16:03:35

领域: cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2402.11355v5

Learning to Optimize for Mixed-Integer Non-linear Programming

Mixed-integer nonlinear programs (MINLPs) arise in diverse domains such as energy systems and transportation but are notoriously difficult to solve, particularly on a large scale. While learning-to-optimize methods have been successful at continuous optimization, extending them to MINLPs is still challenging due to the integer constraints. To overcome this, we propose a novel deep-learning approach with two learnable correction layers to ensure solution integrality and a post-processing step to improve solution feasibility. Our experiments show that this is the first general learning method capable of efficiently solving large-scale MINLPs with up to tens of thousands of variables in milliseconds, delivering high-quality solutions even when traditional solvers and heuristics fail, and successfully solving some of the largest instances reported to date.

Updated: 2025-02-11 15:59:51

标题: 学习用于混合整数非线性规划的优化

摘要: 混合整数非线性规划(MINLPs)在能源系统和交通运输等领域中出现,但在大规模情况下解决起来非常困难。虽然学习优化方法在连续优化方面取得成功,但将其扩展到MINLPs仍然具有挑战性,因为存在整数约束。为了克服这一挑战,我们提出了一种新颖的深度学习方法,其中包括两个可学习的校正层以确保解的整数性,并进行后处理步骤以改善解的可行性。我们的实验表明,这是第一种通用方法,能够在毫秒级内高效解决具有数万个变量的大规模MINLPs,即使传统求解器和启发式方法失败,也能提供高质量的解决方案。这是第一个通用的MINLP学习方法,成功解决了迄今为止报道的一些最大的实例。

更新时间: 2025-02-11 15:59:51

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.11061v8

Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by utilizing a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals they are semantically meaningful. We also show that APTP can automatically discover previously empirically found challenging prompts for SD, e.g. prompts for generating text images, assigning them to higher capacity codes.

Updated: 2025-02-11 15:58:10

标题: 并非所有提示都是平等的:基于提示的文本到图像扩散模型修剪

摘要: 文本到图像(T2I)扩散模型展示了令人印象深刻的图像生成能力。然而,它们的计算强度限制了资源受限的组织在对内部目标数据进行微调后部署T2I模型。虽然剪枝技术提供了减少T2I模型计算负担的潜在解决方案,但静态剪枝方法对所有输入提示使用相同的剪枝模型,忽视了不同提示的不同容量要求。动态剪枝通过为每个提示使用一个单独的子网络来解决这个问题,但它阻止了在GPU上的批量并行处理。为了克服这些限制,我们引入了一种适应性提示定制剪枝(APTP)方法,这是一种为T2I扩散模型设计的新颖的基于提示的剪枝方法。我们方法的核心是一个提示路由器模型,该模型学习确定输入文本提示所需的容量,并将其路由到一个架构代码,给定用于提示的总计算预算。每个架构代码代表一个针对其分配的提示量身定制的专门模型,代码的数量是一个超参数。我们使用对比学习来训练提示路由器和架构代码,确保相似的提示被映射到附近的代码。此外,我们采用最优传输来防止代码坍缩成一个单一的代码。我们通过使用CC3M和COCO作为目标数据集对Stable Diffusion(SD)V2.1进行剪枝来展示APTP的有效性。在FID、CLIP和CMMD分数方面,APTP优于单模型剪枝基线。我们对APTP学习的聚类进行的分析表明它们在语义上是有意义的。我们还展示APTP可以自动发现以前经验发现的SD的具有挑战性的提示,例如用于生成文本图像的提示,并将它们分配给更高容量的代码。

更新时间: 2025-02-11 15:58:10

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.12042v3

Causal Covariate Shift Correction using Fisher information penalty

Evolving feature densities across batches of training data bias cross-validation, making model selection and assessment unreliable (Sugiyama and Kawanabe, 2012). This work takes a distributed density estimation angle on the training setting where data are temporally distributed. Causal Covariate Shift Correction ($C^{3}$) accumulates knowledge about the data density of a training batch using Fisher information and uses it to penalize the loss in all subsequent batches. The penalty improves accuracy by $12.9\%$ over the full-dataset baseline, with gains of up to $20.3\%$ in batchwise benchmarks and at least $5.9\%$ in foldwise benchmarks.
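
A minimal sketch of the mechanism under names of my own choosing (this is not the authors' code): estimate a diagonal Fisher from one batch, then penalize Fisher-weighted parameter drift on later batches.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)
    loss_fn = nn.CrossEntropyLoss()

    def diag_fisher(model, x, y):
        # squared log-likelihood gradients approximate the diagonal Fisher
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return [p.grad.detach() ** 2 for p in model.parameters()]

    x1, y1 = torch.randn(32, 10), torch.randint(0, 2, (32,))
    fisher = diag_fisher(model, x1, y1)
    anchor = [p.detach().clone() for p in model.parameters()]
    model.zero_grad()

    # subsequent batch: base loss plus the Fisher-weighted drift penalty
    x2, y2 = torch.randn(32, 10), torch.randint(0, 2, (32,))
    penalty = sum((f * (p - a) ** 2).sum()
                  for f, p, a in zip(fisher, model.parameters(), anchor))
    loss = loss_fn(model(x2), y2) + 10.0 * penalty
    loss.backward()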

Updated: 2025-02-11 15:51:59

标题: 使用Fisher信息惩罚进行因果协变量转移校正

摘要: 训练数据各批次之间特征密度的演变会使交叉验证产生偏差,从而使模型选择和评估变得不可靠(Sugiyama and Kawanabe, 2012)。本研究从分布式密度估计的角度处理数据按时间分布的训练场景。因果协变量偏移校正($C^{3}$)利用Fisher信息累积关于某一训练批次数据密度的知识,并用它来惩罚所有后续批次中的损失。这种惩罚使准确率比完整数据集基准提高12.9%,在按批次的基准中最高提高20.3%,在按折的基准中最低提高5.9%。

更新时间: 2025-02-11 15:51:59

领域: cs.LG

下载: http://arxiv.org/abs/2502.15756v1

Partial-Label Learning with Conformal Candidate Cleaning

Real-world data is often ambiguous; for example, human annotation produces instances with multiple conflicting class labels. Partial-label learning (PLL) aims at training a classifier in this challenging setting, where each instance is associated with a set of candidate labels and one correct, but unknown, class label. A multitude of algorithms targeting this setting exists and, to enhance their prediction quality, several extensions that are applicable across a wide range of PLL methods have been introduced. While many of these extensions rely on heuristics, this article proposes a novel enhancing method that incrementally prunes candidate sets using conformal prediction. To work around the missing labeled validation set, which is typically required for conformal prediction, we propose a strategy that alternates between training a PLL classifier to label the validation set, leveraging these predicted class labels for calibration, and pruning candidate labels that are not part of the resulting conformal sets. In this sense, our method alternates between empirical risk minimization and candidate set pruning. We establish that our pruning method preserves the conformal validity with respect to the unknown ground truth. Our extensive experiments on artificial and real-world data show that the proposed approach significantly improves the test set accuracies of several state-of-the-art PLL classifiers.
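
The alternating pruning step might look roughly as follows; this is a toy sketch of split-conformal candidate cleaning with classifier pseudo-labels, where the function name, scores, and threshold rule are my assumptions.

    import numpy as np

    def conformal_prune(probs_cal, pseudo_y, probs_test, candidates, alpha=0.1):
        # nonconformity: one minus the predicted probability of the pseudo-label
        scores = 1.0 - probs_cal[np.arange(len(pseudo_y)), pseudo_y]
        n = len(scores)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        q = np.quantile(scores, level)
        pruned = []
        for i, cand in enumerate(candidates):
            keep = [c for c in cand if 1.0 - probs_test[i, c] <= q]
            pruned.append(keep or list(cand))  # never empty a candidate set
        return pruned

    rng = np.random.default_rng(0)
    probs_cal = rng.dirichlet(np.ones(5), size=100)
    pseudo_y = probs_cal.argmax(1)           # pseudo-labels from the classifier
    probs_test = rng.dirichlet(np.ones(5), size=10)
    candidates = [[0, 1, 2], [1, 3, 4]] * 5
    print(conformal_prune(probs_cal, pseudo_y, probs_test, candidates))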

Updated: 2025-02-11 15:51:23

标题: 使用符合性候选清理的部分标签学习

摘要: 现实世界的数据通常存在歧义;例如,人类标注会产生具有多个冲突类标签的实例。部分标签学习(PLL)旨在在这种具有挑战性的情境下训练分类器,其中每个实例关联着一组候选标签和一个正确但未知的类标签。存在着许多针对这种情况的算法,并且为了提高它们的预测质量,引入了适用于广泛范围的PLL方法的几种扩展。虽然许多这些扩展依赖于启发式方法,但本文提出了一种利用符合预测逐渐修剪候选集的新型增强方法。为了绕过符合预测通常所需而此处缺失的带标签验证集,我们提出了一种策略,交替进行以下步骤:训练一个PLL分类器来标记验证集,利用这些预测的类标签进行校准,并修剪不属于所得符合集的候选标签。在这种意义上,我们的方法在经验风险最小化和候选集修剪之间交替。我们证明,我们的修剪方法相对于未知的真实标签保持了符合有效性。我们对人工和现实世界数据进行了广泛的实验,结果表明所提出的方法显著提高了几种最先进的PLL分类器的测试集准确性。

更新时间: 2025-02-11 15:51:23

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07661v1

Isotonic Mechanism for Exponential Family Estimation in Machine Learning Peer Review

In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism to exponential family distributions. This mechanism generates adjusted scores that closely align with the original scores while adhering to author-specified rankings. Despite its applicability to a broad spectrum of exponential family distributions, implementing this mechanism does not require knowledge of the specific distribution form. We demonstrate that an author is incentivized to provide accurate rankings when her utility takes the form of a convex additive function of the adjusted review scores. For a certain subclass of exponential family distributions, we prove that the author reports truthfully only if the question involves only pairwise comparisons between her submissions, thus indicating the optimality of ranking in truthful information elicitation. Moreover, we show that the adjusted scores improve dramatically the estimation accuracy compared to the original scores and achieve nearly minimax optimality when the ground-truth scores have bounded total variation. We conclude with a numerical analysis of the ICML 2023 ranking data, showing substantial estimation gains in approximating a proxy ground-truth quality of the papers using the Isotonic Mechanism.
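
At its core, the adjustment projects the raw review scores onto the monotone cone induced by the author's ranking. A minimal sketch with scikit-learn's isotonic regression (PAVA), using toy scores of my own:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    raw = np.array([6.1, 4.8, 5.5, 7.0])   # mean review score per paper
    rank = np.array([1, 3, 2, 0])          # author ranking: 0 = best paper

    order = np.argsort(rank)[::-1]         # worst-ranked paper first
    iso = IsotonicRegression(increasing=True)
    adjusted = np.empty_like(raw)
    # project scores onto sequences that rise with author-perceived quality
    adjusted[order] = iso.fit_transform(np.arange(len(raw)), raw[order])
    print(adjusted)                        # adjusted scores respect the ranking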

Updated: 2025-02-11 15:50:27

标题: 机器学习同行评审中指数族估计的保序机制

摘要: 在2023年,国际机器学习大会(ICML)要求有多个投稿的作者根据感知质量对其投稿进行排名。本文旨在利用这些作者指定的排名,通过将保序机制(Isotonic Mechanism)扩展到指数族分布,来增强机器学习和人工智能会议中的同行评审。该机制生成与原始分数密切对齐、同时遵循作者指定排名的调整分数。尽管这种机制适用于广泛的指数族分布,但实施它并不需要了解具体的分布形式。我们证明,如果作者的效用采用调整后评审分数的凸加性函数形式,作者将被激励提供准确的排名。对于指数族分布的某个子类,我们证明只有当问题仅涉及其投稿之间的成对比较时,作者才会真实报告,从而表明排名在真实信息获取中的最优性。此外,我们表明,与原始分数相比,调整分数显著提高了估计准确性,并且在真实分数具有有界总变差时几乎达到极小极大最优。最后,我们对ICML 2023排名数据进行了数值分析,表明使用保序机制来近似论文真实质量的代理指标带来了实质性的估计收益。

更新时间: 2025-02-11 15:50:27

领域: math.ST,cs.GT,cs.LG,econ.TH,stat.ME,stat.TH

下载: http://arxiv.org/abs/2304.11160v4

NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning

Navigating a nonholonomic robot in a cluttered, unknown environment requires accurate perception and precise motion control for real-time collision avoidance. This paper presents NeuPAN: a real-time, highly accurate, map-free, easy-to-deploy, and environment-invariant robot motion planner. Leveraging a tightly coupled perception-to-control framework, NeuPAN has two key innovations compared to existing approaches: 1) it directly maps raw point cloud data to a latent distance feature space for collision-free motion generation, avoiding error propagation from the perception to control pipeline; 2) it is interpretable from an end-to-end model-based learning perspective. The crux of NeuPAN is solving an end-to-end mathematical model with numerous point-level constraints using a plug-and-play (PnP) proximal alternating-minimization network (PAN), incorporating neurons in the loop. This allows NeuPAN to generate real-time, physically interpretable motions. It seamlessly integrates data and knowledge engines, and its network parameters can be fine-tuned via backpropagation. We evaluate NeuPAN on a ground mobile robot, a wheel-legged robot, and an autonomous vehicle, in extensive simulated and real-world environments. Results demonstrate that NeuPAN outperforms existing baselines in terms of accuracy, efficiency, robustness, and generalization capabilities across various environments, including the cluttered sandbox, office, corridor, and parking lot. We show that NeuPAN works well in unknown and unstructured environments with arbitrarily shaped objects, transforming impassable paths into passable ones.

Updated: 2025-02-11 15:47:43

标题: NeuPAN:具有端到端基于模型学习的直接点式机器人导航

摘要: 在拥挤、未知环境中导航一个非完整机器人需要准确的感知和精确的运动控制,以实时避免碰撞。本文介绍了NeuPAN:一个实时、高度准确、无地图、易于部署并且环境不变的机器人运动规划器。利用紧密耦合的感知到控制框架,NeuPAN与现有方法相比具有两个关键创新:1)它直接将原始点云数据映射到潜在距离特征空间,用于生成无碰撞运动,避免了从感知到控制管道的误差传播;2)它是可解释的,从端到端的基于模型的学习视角来看。NeuPAN的核心是使用一种即插即用(PnP)近端交替最小化网络(PAN)解决具有众多点级约束的端到端数学模型,将神经元纳入其中。这使得NeuPAN能够生成实时、物理可解释的运动。它无缝集成了数据和知识引擎,其网络参数可以通过反向传播进行微调。我们在广泛的模拟和真实环境中对NeuPAN进行了评估,包括地面移动机器人、轮腿机器人和自动驾驶车辆。结果表明,NeuPAN在准确性、效率、稳健性和泛化能力方面优于现有基准,在各种环境中表现出色,包括拥挤的沙箱、办公室、走廊和停车场。我们展示了NeuPAN在未知和非结构化环境中表现良好,将不可通行的路径转换为可通行的路径。

更新时间: 2025-02-11 15:47:43

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2403.06828v3

Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations

We consider the problem of approximating a $d \times d$ covariance matrix $M$ with a rank-$k$ matrix under $(\varepsilon,\delta)$-differential privacy. We present and analyze a complex variant of the Gaussian mechanism and obtain upper bounds on the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$. Our analysis provides improvements over previous bounds, particularly when the spectrum of $M$ satisfies natural structural assumptions. The novel insight is to view the addition of Gaussian noise to a matrix as a continuous-time matrix Brownian motion. This viewpoint allows us to track the evolution of eigenvalues and eigenvectors of the matrix, which are governed by stochastic differential equations discovered by Dyson. These equations enable us to upper bound the Frobenius distance between the best rank-$k$ approximation of $M$ and that of a Gaussian perturbation of $M$ as an integral that involves inverse eigenvalue gaps of the stochastically evolving matrix, as opposed to a sum of perturbation bounds obtained via Davis-Kahan-type theorems. Subsequently, again using the Dyson Brownian motion viewpoint, we show that the eigenvalues of the matrix $M$ perturbed by Gaussian noise have large gaps with high probability. These results also contribute to the analysis of low-rank approximations under average-case perturbations, and to an understanding of eigenvalue gaps for random matrices, both of which may be of independent interest.
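
A simplified sketch of the object being analyzed, with real symmetric noise and a placeholder scale standing in for the paper's calibrated complex mechanism: perturb the covariance with Gaussian noise, then take the best rank-k approximation.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, sigma = 50, 5, 0.1
    X = rng.standard_normal((200, d))
    M = X.T @ X / len(X)                   # covariance matrix

    G = rng.standard_normal((d, d)) * sigma
    noise = (G + G.T) / 2                  # symmetric Gaussian perturbation
    vals, vecs = np.linalg.eigh(M + noise)
    top = np.argsort(vals)[-k:]            # k largest eigenvalues
    M_k = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    print(np.linalg.matrix_rank(M_k))      # -> k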

Updated: 2025-02-11 15:46:03

标题: 私人低秩逼近协方差矩阵,戴森布朗运动和高斯扰动的特征值间隙界限

摘要: 我们考虑在$(\varepsilon,\delta)$-差分隐私条件下,用秩为$k$的矩阵近似一个$d \times d$协方差矩阵$M$的问题。我们介绍并分析了高斯机制的复杂变体,并得到了通过该机制输出的矩阵与$M$的最佳秩为$k$近似之间的Frobenius范数差的上界。我们的分析改进了先前的界限,特别是当$M$的谱满足自然结构假设时。新颖的见解是将高斯噪声添加到矩阵中视为连续时间矩阵布朗运动。这种观点使我们能够跟踪矩阵的特征值和特征向量的演变,这些特征值和特征向量受Dyson发现的随机微分方程控制。这些方程使我们能够通过涉及随机演化矩阵的逆特征值间隙的积分,而不是通过Davis-Kahan类型定理获得的扰动边界的总和,来上界$M$的最佳秩为$k$近似与高斯扰动$M$的Frobenius距离。随后,再次使用Dyson布朗运动视角,我们证明了通过高斯噪声扰动的矩阵$M$的特征值具有很大的间隙的概率很高。这些结果还有助于分析在平均情况下的低秩近似,以及理解随机矩阵的特征值间隙,这两者可能是独立感兴趣的。

更新时间: 2025-02-11 15:46:03

领域: cs.DS,cs.CR,cs.LG,cs.NA,math.NA,math.PR

下载: http://arxiv.org/abs/2502.07657v1

A Unifying Framework for Causal Imitation Learning with Hidden Confounders

We propose a general and unifying framework for causal Imitation Learning (IL) with hidden confounders that subsumes several existing confounded IL settings from the literature. Our framework accounts for two types of hidden confounders: (a) those observed by the expert, which thus influence the expert's policy, and (b) confounding noise hidden to both the expert and the IL algorithm. For additional flexibility, we also introduce a confounding noise horizon and time-varying expert-observable hidden variables. We show that causal IL in our framework can be reduced to a set of Conditional Moment Restrictions (CMRs) by leveraging trajectory histories as instruments to learn a history-dependent policy. We propose DML-IL, a novel algorithm that uses instrumental variable regression to solve these CMRs and learn a policy. We provide a bound on the imitation gap for DML-IL, which recovers prior results as special cases. Empirical evaluation on a toy environment with continuous state-action spaces and multiple Mujoco tasks demonstrates that DML-IL outperforms state-of-the-art causal IL algorithms.

Updated: 2025-02-11 15:43:49

标题: 一个统一的框架用于具有隐藏混淆因素的因果模仿学习

摘要: 我们提出了一个通用且统一的框架,用于处理带有隐藏混杂因素的因果模仿学习(IL),该框架涵盖了文献中几种现有的混杂IL设置。我们的框架考虑了两种类型的隐藏混杂因素:(a)专家观察到的那些混杂因素,因此影响了专家的策略,以及(b)对专家和IL算法都隐藏的混杂噪音。为了增加灵活性,我们还引入了混杂噪音视野和时间变化的专家可观察隐藏变量。我们展示了在我们的框架中,因果IL可以通过利用轨迹历史作为工具来学习依赖于历史的策略,从而被简化为一组条件矩限制(CMRs)。我们提出了DML-IL,这是一种使用工具变量回归来解决这些CMRs并学习策略的新算法。我们为DML-IL提供了一个模仿差距的界限,该界限将之前的结果恢复为特殊情况。在一个具有连续状态-动作空间和多个Mujoco任务的玩具环境上的实证评估表明,DML-IL优于最先进的因果IL算法。

更新时间: 2025-02-11 15:43:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07656v1

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. Motivated by this finding, we propose reparameterized absorbing discrete diffusion (RADD), a dedicated diffusion model without time-condition that characterizes the time-independent conditional probabilities. Besides its simplicity, RADD can reduce the number of function evaluations (NFEs) by caching the output of the time-independent network when the noisy sample remains unchanged in a sampling interval, which enables sampling acceleration. Built upon the new perspective of conditional distributions, we further unify absorbing discrete diffusion and any-order autoregressive models (AO-ARMs), showing that the upper bound on the negative log-likelihood for the diffusion model can be interpreted as an expected negative log-likelihood for AO-ARMs. Further, our RADD models achieve SOTA performance among diffusion models on 5 zero-shot language modeling benchmarks (measured by perplexity) at the GPT-2 scale. Our code is available at https://github.com/ML-GSAI/RADD.
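
The NFE-saving trick can be illustrated with a toy absorbing-diffusion sampler (a sketch under my own assumptions; the denoiser and unmasking rule are stand-ins): re-evaluate the time-independent network only when the noisy sequence actually changed.

    import torch

    def denoise_step(x, logits, t, mask_id=0):
        # toy rule: reveal each still-masked token with probability 1/t
        proposal = logits.argmax(-1)
        reveal = (x == mask_id) & (torch.rand(x.shape) < 1.0 / t)
        return torch.where(reveal, proposal, x)

    @torch.no_grad()
    def sample(net, x, steps):
        cache, prev, nfe = None, None, 0
        for t in range(steps, 0, -1):
            if prev is None or not torch.equal(x, prev):
                cache = net(x)          # no time input, so output is cacheable
                nfe += 1
            prev = x.clone()
            x = denoise_step(x, cache, t)
        return x, nfe

    vocab = 10
    emb = torch.nn.Embedding(vocab, vocab)
    net = lambda toks: emb(toks)            # stand-in time-independent network
    x0 = torch.zeros(16, dtype=torch.long)  # fully masked sequence (mask_id = 0)
    out, nfe = sample(net, x0, steps=20)
    print(nfe, "network evaluations for 20 sampling steps")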

Updated: 2025-02-11 15:42:19

标题: 您吸收的离散扩散悄然模拟了干净数据的条件分布

摘要: 具有吸收过程的离散扩散模型在语言建模中表现出了潜力。需要估计的关键数量是在所有时间步长上两个传递状态的边际概率之间的比率,称为具体分数。在本文中,我们揭示了吸收扩散中的具体分数可以用干净数据的条件概率乘以解析形式中的时间相关标量来表示。受到这一发现的启发,我们提出了重新参数化的吸收离散扩散(RADD),这是一个专门的扩散模型,不需要时间条件,它描述了时间无关的条件概率。除了其简单性外,RADD可以通过缓存时间无关网络的输出来减少函数评估(NFEs)的数量,当在采样间隔中嘈杂样本保持不变时,这可以加速采样。基于条件分布的新视角,我们进一步统一了吸收离散扩散和任意阶自回归模型(AO-ARMs),表明扩散模型的负对数似然的上限可以解释为AO-ARMs的预期负对数似然。此外,我们的RADD模型在GPT-2规模的5个零样本语言建模基准(以困惑度衡量)上,在扩散模型中取得了最先进的性能。我们的代码可以在https://github.com/ML-GSAI/RADD 上找到。

更新时间: 2025-02-11 15:42:19

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2406.03736v3

Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold

Optimising probabilistic models is a well-studied field in statistics. However, its connection with the training of generative models remains largely under-explored. In this paper, we show that the evolution of time-varying generative models can be projected onto an exponential family manifold, naturally creating a link between the parameters of a generative model and those of a probabilistic model. We then train the generative model by moving its projection on the manifold according to the natural gradient descent scheme. This approach also allows us to approximate the natural gradient of the KL divergence efficiently without relying on MCMC for intractable models. Furthermore, we propose particle versions of the algorithm, which feature closed-form update rules for any parametric model within the exponential family. Through toy and real-world experiments, we validate the effectiveness of the proposed algorithms.
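
For intuition, here is a minimal natural-gradient step on a 1-D Gaussian, preconditioning the Euclidean gradient by the inverse Fisher information; the parameterization and learning rate are my assumptions, and the paper's manifold-projection machinery is not reproduced.

    import numpy as np

    def nat_grad_step(mu, log_var, data, lr=0.1):
        var = np.exp(log_var)
        # gradients of the average negative log-likelihood
        g_mu = (mu - data.mean()) / var
        g_lv = 0.5 * (1.0 - ((data - mu) ** 2).mean() / var)
        # Fisher information of N(mu, var) in (mu, log_var) coordinates
        F = np.diag([1.0 / var, 0.5])
        step = np.linalg.solve(F, np.array([g_mu, g_lv]))
        return mu - lr * step[0], log_var - lr * step[1]

    data = np.random.default_rng(1).normal(3.0, 2.0, size=1000)
    mu, lv = 0.0, 0.0
    for _ in range(50):
        mu, lv = nat_grad_step(mu, lv, data)
    print(mu, np.exp(lv) ** 0.5)           # approaches (3.0, 2.0)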

Updated: 2025-02-11 15:39:47

标题: 用自然梯度在指数族流形上引导时变生成模型

摘要: 概率模型的优化是统计学中一个经过深入研究的领域。然而,它与生成模型的训练之间的联系仍然大部分未被探索。在本文中,我们展示了时变生成模型的演变可以投射到指数家族流形上,从而自然地建立了生成模型的参数与概率模型参数之间的联系。然后,我们通过自然梯度下降方案将生成模型在流形上的投影进行训练。这种方法还使我们能够有效地近似KL散度的自然梯度,而无需依赖于MCMC来处理难以处理的模型。此外,我们提出了算法的粒子版本,该版本具有指数家族内任何参数化模型的封闭形式更新规则。通过玩具和真实世界实验,我们验证了所提出算法的有效性。

更新时间: 2025-02-11 15:39:47

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2502.07650v1

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current approaches for detoxification or preventing jailbreaking usually involve Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), which requires finetuning billions of parameters through gradient descent with substantial computational cost. Furthermore, models modified through SFT and RLHF may deviate from the pretrained models, potentially leading to a degradation in foundational LLM capabilities. In this paper, we observe that surprisingly, directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs, such as detoxification and resistance to jailbreaking, with only inference-level computational resources. Experiments demonstrate that in the detoxification task, our approach achieves reductions of up to 90.0% in toxicity on the RealToxicityPrompts dataset and 49.2% on ToxiGen, while maintaining the LLM's general capabilities in areas such as common sense, question answering, and mathematics.
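
A generic, hypothetical illustration of the editing idea (not the paper's actual selection rule): pick a small fraction of weights by gradient attribution on a behavior loss and shift them directly, with no fine-tuning loop.

    import torch

    model = torch.nn.Linear(16, 16)
    x = torch.randn(8, 16)
    behavior_loss = model(x).pow(2).mean()  # toy proxy for an unwanted behavior
    behavior_loss.backward()

    with torch.no_grad():
        w, g = model.weight, model.weight.grad
        k = max(1, int(0.01 * w.numel()))        # edit only ~1% of the weights
        idx = g.abs().flatten().topk(k).indices  # most behavior-relevant entries
        step = 0.5 * w.abs().mean()
        w.view(-1)[idx] -= step * g.view(-1)[idx].sign()

    print(k, "of", w.numel(), "parameters edited at inference-level cost")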

Updated: 2025-02-11 15:39:08

标题: 模型手术:通过简单参数编辑调节LLM的行为

摘要: 大型语言模型(LLMs)已经展示出作为通用助手的巨大潜力,展示了强大的任务理解和问题解决能力。要将LLMs部署为人工智能助手,这些模型展现出良好的行为特征是至关重要的,如无毒性和抗越狱尝试的韧性。目前用于解毒或预防越狱的方法通常涉及监督微调(SFT)或从人类反馈中进行强化学习(RLHF),这需要通过梯度下降对数十亿个参数进行微调,具有巨大的计算成本。此外,通过SFT和RLHF修改的模型可能会偏离预训练模型,潜在地导致LLM基础能力下降。在本文中,我们观察到令人惊讶的是,直接编辑一小部分参数可以有效地调节LLMs的特定行为,如解毒和抗越狱,仅需推理级的计算资源。实验证明,在解毒任务中,我们的方法在RealToxicityPrompts数据集上实现了高达90.0%的毒性降低和在ToxiGen上的49.2%,同时保持LLM在常识、问答和数学等领域的通用能力。

更新时间: 2025-02-11 15:39:08

领域: cs.AI,68T50 (Primary) 68T07, 62M45 (Secondary),I.2.7

下载: http://arxiv.org/abs/2407.08770v2

Causal Additive Models with Unobserved Causal Paths and Backdoor Paths

Causal additive models have been employed as tractable yet expressive frameworks for causal discovery involving hidden variables. State-of-the-art methodologies suggest that determining the causal relationship between a pair of variables is infeasible in the presence of an unobserved backdoor or an unobserved causal path. Contrary to this assumption, we theoretically show that resolving the causal direction is feasible in certain scenarios by incorporating two novel components into the theory. The first component introduces a novel characterization of regression sets within independence between regression residuals. The second component leverages conditional independence among the observed variables. We also provide a search algorithm that integrates these innovations and demonstrate its competitive performance against existing methods.

Updated: 2025-02-11 15:35:15

标题: 具有未观测因果路径和后门路径的因果加性模型

摘要: 因果加性模型已被用作涉及隐变量的因果发现中一种既易处理又具表达力的框架。最先进的方法认为,在存在未被观测的后门路径或未被观测的因果路径的情况下,确定一对变量之间的因果方向是不可行的。与这一假设相反,我们在理论上证明,通过在理论中引入两个新组件,在某些情形下解析因果方向是可行的。第一个组件对回归残差之间独立性中的回归集给出了新的刻画。第二个组件利用了观测变量之间的条件独立性。我们还提供了一个整合这些创新的搜索算法,并展示了它相对于现有方法的竞争性能。

更新时间: 2025-02-11 15:35:15

领域: cs.LG,stat.ME,stat.ML

下载: http://arxiv.org/abs/2502.07646v1

FoQA: A Faroese Question-Answering Dataset

We present FoQA, a Faroese extractive question-answering (QA) dataset with 2,000 samples, created using a semi-automated approach combining Large Language Models (LLMs) and human validation. The dataset was generated from Faroese Wikipedia articles using GPT-4-turbo for initial QA generation, followed by question rephrasing to increase complexity and native speaker validation to ensure quality. We provide baseline performance metrics for FoQA across multiple models, including LLMs and BERT, demonstrating its effectiveness in evaluating Faroese QA performance. The dataset is released in three versions: a validated set of 2,000 samples, a complete set of all 10,001 generated samples, and a set of 2,395 rejected samples for error analysis.

Updated: 2025-02-11 15:33:17

标题: FoQA:一个法罗语问答数据集

摘要: 我们提出了FoQA,一个法罗语抽取式问答(QA)数据集,包含2,000个样本,采用半自动化方法结合大型语言模型(LLMs)和人工验证创建。该数据集是从法罗语维基百科文章中使用GPT-4-turbo进行初始QA生成,随后进行问题改述以增加复杂性,并进行母语者验证以确保质量。我们提供了FoQA的基准性能指标,跨多个模型,包括LLMs和BERT,展示其在评估法罗语QA表现方面的有效性。该数据集以三个版本发布:一个包含2,000个经过验证的样本集,一个包含所有10,001个生成样本的完整集,以及一个包含2,395个被拒绝样本用于错误分析。

更新时间: 2025-02-11 15:33:17

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07642v1

LP-DETR: Layer-wise Progressive Relations for Object Detection

This paper presents LP-DETR (Layer-wise Progressive DETR), a novel approach that enhances DETR-based object detection through multi-scale relation modeling. Our method introduces learnable spatial relationships between object queries through a relation-aware self-attention mechanism, which adaptively learns to balance different scales of relations (local, medium and global) across decoder layers. This progressive design enables the model to effectively capture evolving spatial dependencies throughout the detection pipeline. Extensive experiments on COCO 2017 dataset demonstrate that our method improves both convergence speed and detection accuracy compared to standard self-attention module. The proposed method achieves competitive results, reaching 52.3\% AP with 12 epochs and 52.5\% AP with 24 epochs using ResNet-50 backbone, and further improving to 58.0\% AP with Swin-L backbone. Furthermore, our analysis reveals an interesting pattern: the model naturally learns to prioritize local spatial relations in early decoder layers while gradually shifting attention to broader contexts in deeper layers, providing valuable insights for future research in object detection.

Updated: 2025-02-11 15:25:02

标题: LP-DETR: 逐层渐进关系用于目标检测

摘要: 这篇论文介绍了LP-DETR (Layer-wise Progressive DETR),一种通过多尺度关系建模增强基于DETR的物体检测的新方法。我们的方法通过关系感知的自注意机制引入了可学习的目标查询之间的空间关系,自适应地学习在解码器层之间平衡不同尺度的关系(局部、中等和全局)。这种渐进式设计使模型能够有效地捕捉检测管道中不断演变的空间依赖关系。在COCO 2017数据集上的大量实验表明,与标准自注意模块相比,我们的方法提高了收敛速度和检测精度。所提出的方法取得了竞争性结果,使用ResNet-50骨干网络,在12个时期时达到52.3\% AP,在24个时期时达到52.5\% AP,并在使用Swin-L骨干网络时进一步提高到58.0\% AP。此外,我们的分析揭示了一个有趣的模式:模型自然地学习在早期解码器层中优先考虑局部空间关系,而逐渐将注意力转移到更广泛的上下文中,为未来物体检测领域的研究提供了有价值的见解。

更新时间: 2025-02-11 15:25:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.05147v2

Consistency Training with Physical Constraints

We propose a physics-aware Consistency Training (CT) method that accelerates sampling in Diffusion Models with physical constraints. Our approach leverages a two-stage strategy: (1) learning the noise-to-data mapping via CT, and (2) incorporating physics constraints as a regularizer. Experiments on toy examples show that our method generates samples in a single step while adhering to the imposed constraints. This approach has the potential to efficiently solve partial differential equations (PDEs) using deep generative modeling.
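
A minimal sketch of the two-part objective under toy assumptions (the constraint sum(x) = 0 and the networks are stand-ins): a consistency loss between adjacent noise levels plus a physics residual acting as regularizer.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(9, 64), nn.SiLU(), nn.Linear(64, 8))

    def f(x, t):  # consistency model: maps noisy x at level t to clean data
        return net(torch.cat([x, t.expand(len(x), 1)], dim=1))

    x0 = torch.randn(32, 8)
    x0 = x0 - x0.mean(dim=1, keepdim=True)   # data satisfying sum(x) = 0
    t1, t2 = torch.tensor([0.8]), torch.tensor([0.7])
    noise = torch.randn_like(x0)
    x_t1, x_t2 = x0 + t1 * noise, x0 + t2 * noise

    ct_loss = (f(x_t1, t1) - f(x_t2, t2).detach()).pow(2).mean()
    physics = f(x_t1, t1).sum(dim=1).pow(2).mean()  # residual of sum(x) = 0
    loss = ct_loss + 0.1 * physics
    loss.backward()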

Updated: 2025-02-11 15:23:14

标题: 具有物理约束的一致性训练

摘要: 我们提出了一种物理感知的一致性训练(CT)方法,可以加速在具有物理约束的扩散模型中的采样。我们的方法利用了一个两阶段策略:(1)通过CT学习噪声到数据的映射,(2)将物理约束作为正则化器进行整合。在玩具示例上的实验表明,我们的方法可以在一步内生成样本,同时遵守所施加的约束。这种方法有望通过深度生成建模高效地解决偏微分方程(PDEs)。

更新时间: 2025-02-11 15:23:14

领域: cs.LG

下载: http://arxiv.org/abs/2502.07636v1

Distributed Value Decomposition Networks with Networked Agents

We investigate the problem of distributed training under partial observability, whereby cooperative multi-agent reinforcement learning agents (MARL) maximize the expected cumulative joint reward. We propose distributed value decomposition networks (DVDN) that generate a joint Q-function that factorizes into agent-wise Q-functions. Whereas the original value decomposition networks rely on centralized training, our approach is suitable for domains where centralized training is not possible and agents must learn by interacting with the physical environment in a decentralized manner while communicating with their peers. DVDN overcomes the need for centralized training by locally estimating the shared objective. We contribute with two innovative algorithms, DVDN and DVDN (GT), for the heterogeneous and homogeneous agents settings respectively. Empirically, both algorithms approximate the performance of value decomposition networks, in spite of the information loss during communication, as demonstrated in ten MARL tasks in three standard environments.
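
The factorization at the heart of value decomposition can be sketched as follows (a toy, centralized illustration; DVDN's decentralized, communication-limited estimation of the shared objective is not reproduced here).

    import torch
    import torch.nn as nn

    n_agents, obs_dim, n_actions = 3, 10, 4
    agents = nn.ModuleList(nn.Linear(obs_dim, n_actions) for _ in range(n_agents))

    obs = torch.randn(n_agents, obs_dim)     # one partial observation per agent
    acts = torch.tensor([0, 2, 1])
    # the joint Q-value factorizes into a sum of agent-wise Q-values
    q_joint = sum(agents[i](obs[i])[acts[i]] for i in range(n_agents))

    reward, gamma, q_next = 1.0, 0.99, 0.0   # toy one-step TD target
    td_loss = (reward + gamma * q_next - q_joint) ** 2
    td_loss.backward()                       # gradients reach every agent network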

Updated: 2025-02-11 15:23:05

标题: 具有网络代理的分布式价值分解网络

摘要: 我们研究了部分可观察性下的分布式训练问题,其中合作多智能体强化学习(MARL)代理最大化期望的累积联合奖励。我们提出了分布式值分解网络(DVDN),生成一个分解为代理Q函数的联合Q函数。原始值分解网络依赖于集中式训练,而我们的方法适用于集中式训练不可能的领域,代理必须以分散的方式与物理环境进行交互并与同行通信。DVDN通过本地估计共享目标来克服集中式训练的需要。我们提供了两种创新算法,DVDN和DVDN(GT),分别用于异构和同质代理设置。实验上,尽管在通信过程中存在信息损失,这两种算法在三个标准环境中的十个MARL任务中近似值分解网络的性能。

更新时间: 2025-02-11 15:23:05

领域: cs.LG,cs.AI,cs.MA,I.2.6; I.2.11

下载: http://arxiv.org/abs/2502.07635v1

Exploring the Bitcoin Mesoscale

The open availability of the entire history of the Bitcoin transactions opens up the possibility to study this system at an unprecedented level of detail. This contribution is devoted to the analysis of the mesoscale structural properties of the Bitcoin User Network (BUN), across its entire history (i.e. from 2009 to 2017). What emerges from our analysis is that the BUN is characterized by a core-periphery structure, a deeper analysis of which reveals a certain degree of bow-tieness (i.e. the presence of a Strongly-Connected Component, an IN- and an OUT-component together with some tendrils attached to the IN-component). Interestingly, the evolution of the BUN structural organization experiences fluctuations that seem to be correlated with the presence of bubbles, i.e. periods of price surge and decline observed throughout the entire Bitcoin history: our results, thus, further confirm the interplay between structural quantities and price movements observed in previous analyses.

Updated: 2025-02-11 15:18:42

标题: 探索比特币的中尺度特征

摘要: 比特币交易完整历史的公开可用性,使得有可能以前所未有的细节水平研究这一系统。本文致力于分析比特币用户网络(BUN)在其整个历史(即2009年至2017年)中的中尺度结构特性。我们的分析表明,BUN具有核心-边缘结构,更深入的分析揭示出一定程度的领结(bow-tie)结构(即存在一个强连通分量、一个IN分量和一个OUT分量,以及附着在IN分量上的一些触须)。有趣的是,BUN结构组织的演变经历了似乎与泡沫存在相关的波动,即整个比特币历史中观察到的价格激增和下跌时期:因此,我们的结果进一步证实了先前分析中观察到的结构量与价格变动之间的相互作用。

更新时间: 2025-02-11 15:18:42

领域: q-fin.ST,cs.CR,physics.soc-ph

下载: http://arxiv.org/abs/2307.14409v2

Scalable and consistent embedding of probability measures into Hilbert spaces via measure quantization

This paper is focused on statistical learning from data that come as probability measures. In this setting, popular approaches consist in embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small-support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations, and its implication for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.
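
A hedged sketch of the first, optimal-quantization variant: summarize each measure by a small weighted support via k-means, then compare measures through kernel mean embeddings of the summaries (the RBF kernel and support size are my assumptions).

    import numpy as np
    from sklearn.cluster import KMeans

    def quantize(samples, m=8, seed=0):
        km = KMeans(n_clusters=m, n_init=4, random_state=seed).fit(samples)
        w = np.bincount(km.labels_, minlength=m) / len(samples)
        return km.cluster_centers_, w         # support points and weights

    def kme_inner(c1, w1, c2, w2, gamma=0.5):  # <mu1, mu2> under an RBF kernel
        d2 = ((c1[:, None, :] - c2[None, :, :]) ** 2).sum(-1)
        return w1 @ np.exp(-gamma * d2) @ w2

    rng = np.random.default_rng(0)
    P = rng.normal(0.0, 1.0, (5000, 2))
    Q = rng.normal(0.5, 1.0, (5000, 2))
    (cP, wP), (cQ, wQ) = quantize(P), quantize(Q)
    mmd2 = (kme_inner(cP, wP, cP, wP) - 2 * kme_inner(cP, wP, cQ, wQ)
            + kme_inner(cQ, wQ, cQ, wQ))
    print(mmd2)   # cheap MMD estimate computed from 8-point summaries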

Updated: 2025-02-11 15:17:43

标题: 可伸缩和一致的将概率测度嵌入希尔伯特空间的方法:通过测度量化

摘要: 本文关注的是从概率测度数据中进行统计学习。在这种情况下,流行的方法包括使用Linearized Optimal Transport或Kernel Mean Embedding将这些数据嵌入希尔伯特空间。然而,计算这些嵌入的成本限制了它们在大规模环境中的直接使用。我们研究了两种基于测度量化的方法,用于将输入概率测度近似为小支持大小的离散测度。第一种方法基于对每个输入测度的最优量化,而第二种方法依赖于平均测度量化。我们研究了这种近似的一致性,以及其对将概率测度可伸缩嵌入希尔伯特空间的低计算成本的影响。最后,我们通过各种数值实验来说明我们的发现。

更新时间: 2025-02-11 15:17:43

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2502.04907v2

Rethinking Timing Residuals: Advancing PET Detectors with Explicit TOF Corrections

PET is a functional imaging method that visualizes metabolic processes. TOF information can be derived from coincident detector signals and incorporated into image reconstruction to enhance the SNR. PET detectors are typically assessed by their CTR, but timing performance is degraded by various factors. Research on timing calibration seeks to mitigate these degradations and restore accurate timing information. While many calibration methods use analytical approaches, machine learning techniques have recently gained attention due to their flexibility. We developed a residual physics-based calibration approach that combines prior domain knowledge with the power of machine learning models. This approach begins with an initial analytical calibration addressing first-order skews. The remaining deviations, regarded as residual effects, are used to train machine learning models to eliminate higher-order skews. The key advantage is that the experimenter guides the learning process through the definition of timing residuals. In earlier studies, we developed models that directly predicted the expected time difference, which offered corrections only implicitly (implicit correction models). In this study, we introduce a new definition for timing residuals, enabling us to train models that directly predict correction values (explicit correction models). The explicit correction approach significantly simplifies data acquisition, improves linearity, and enhances timing performance from $371 \pm 6$ ps to $281 \pm 5$ ps for coincidences from 430 keV to 590 keV. Additionally, the new definition reduces model size, making it suitable for high-throughput applications like PET scanners. Experiments were conducted using two detector stacks composed of $4 \times 4$ LYSO:Ce,Ca crystals ($3.8\times 3.8\times 20$ mm$^{3}$) coupled to $4 \times 4$ Broadcom NUV-MT SiPMs and digitized with the TOFPET2 ASIC.
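
The explicit-correction formulation can be sketched generically with synthetic data (this is not the detector pipeline; features and skews are invented): fit a regressor that predicts the residual timing correction left over after the first-order analytical calibration, then subtract it.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    features = rng.normal(size=(5000, 4))    # e.g. energies, hit positions
    true_skew = 0.3 * features[:, 0] ** 2 - 0.2 * features[:, 1]  # higher-order
    measured_dt = rng.normal(scale=0.05, size=5000) + true_skew   # after 1st-order cal.

    # explicit model: the timing residual itself is the regression target
    model = GradientBoostingRegressor().fit(features, measured_dt)
    corrected = measured_dt - model.predict(features)
    print(measured_dt.std(), corrected.std())  # the corrected spread shrinks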

Updated: 2025-02-11 15:17:29

标题: 重新思考时间残差:通过显式 TOF 校正推进 PET 探测器

摘要: PET是一种可视化代谢过程的功能成像方法。TOF信息可以从符合探测器信号中导出,并纳入图像重建以增强信噪比。PET探测器通常通过它们的CTR进行评估,但定时性能会因各种因素而下降。定时校准的研究旨在减轻这些降级并恢复准确的定时信息。虽然许多校准方法使用分析方法,但机器学习技术近年来因其灵活性而受到关注。我们开发了一种基于残差物理的校准方法,将先前的领域知识与机器学习模型的能力相结合。该方法从初始的分析校准开始,校正一阶偏斜。其余的偏差被视为残差效应,用于训练机器学习模型以消除高阶偏斜。关键优势在于实验者通过定义定时残差来引导学习过程。在早期的研究中,我们开发了直接预测预期时间差的模型,这仅提供了隐式校正(隐式校正模型)。在本研究中,我们引入了新的定时残差定义,使我们能够训练直接预测校正值的模型(显式校正模型)。显式校正方法显著简化了数据采集,提高了线性度,并将430 keV至590 keV符合事件的时间性能从371±6 ps提升到281±5 ps。此外,新的定义减小了模型大小,使其适用于PET扫描仪等高通量应用。实验使用由4×4个LYSO:Ce,Ca晶体组成的两个探测器堆叠($3.8×3.8×20 mm^{3}$),与4×4个Broadcom NUV-MT SiPMs耦合,并使用TOFPET2 ASIC进行数字化。

更新时间: 2025-02-11 15:17:29

领域: physics.ins-det,cs.LG

下载: http://arxiv.org/abs/2502.07630v1

Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks

Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
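
One consensus iteration under subgraph sampling might look as follows; a toy sketch in which the candidate mixing matrices and their sampling probabilities are simply assumed, rather than jointly optimized as in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    I = np.eye(4)
    ring = 0.5 * I + 0.25 * (np.roll(I, 1, axis=0) + np.roll(I, -1, axis=0))
    candidates = [I, ring]            # sparser-subgraph mixing matrices
    probs = [0.3, 0.7]                # sampling probabilities (assumed here)

    x = rng.normal(size=(4, 10))      # one model vector per node
    for _ in range(50):
        W = candidates[rng.choice(len(candidates), p=probs)]
        x = W @ x                     # broadcast-based neighbor averaging
    print(x.std(axis=0).max())        # node models approach consensus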

Updated: 2025-02-11 15:09:49

标题: 更快的收敛速度,更少的通信量:基于广播的子图采样用于无线网络上的分散式学习

摘要: 共识基础的分布式随机梯度下降(D-SGD)是一种广泛采用的算法,用于跨网络代理进行机器学习模型的分布式训练。D-SGD的关键部分是基于共识的模型平均,它严重依赖于节点之间的信息交换和融合。具体而言,对于在无线网络上的共识平均,通信协调是必要的,以确定节点何时以及如何可以访问信道并向(或从)其邻居传输(或接收)信息。在这项工作中,我们提出了$\texttt{BASS}$,一种基于广播的子图抽样方法,旨在加速D-SGD的收敛,同时考虑每次迭代的实际通信成本。$\texttt{BASS}$创建了一组混合矩阵候选,代表基础拓扑的更稀疏子图。在每个共识迭代中,会抽样一个混合矩阵,导致特定的调度决策,激活多个无冲突子集的节点。抽样以概率方式进行,混合矩阵的元素以及它们的抽样概率是联合优化的。模拟结果表明,与现有的基于链路的调度方法相比,$\texttt{BASS}$使收敛速度更快,传输时隙更少。总之,无线信道的广播性质在加速分布式优化和学习的收敛方面具有固有优势。

更新时间: 2025-02-11 15:09:49

领域: cs.IT,cs.DC,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2401.13779v2

Causal-Informed Contrastive Learning: Towards Bias-Resilient Pre-training under Concept Drift

The evolution of large-scale contrastive pre-training propelled by top-tier datasets has reached a transition point in the scaling law. Consequently, sustaining and enhancing a model's pre-training capabilities in drift environments have surfaced as a notable challenge. In this paper, we initially uncover that contrastive pre-training methods are significantly impacted by concept drift wherein distributions change unpredictably, resulting in notable biases in the feature space of the pre-trained model. Empowered by causal inference, we construct a structural causal graph to analyze the impact of concept drift to contrastive pre-training systemically, and propose the causal interventional contrastive objective. Upon achieving this, we devise a resilient contrastive pre-training approach to accommodate the data stream of concept drift, with simple and scalable implementation. Extensive experiments on various downstream tasks demonstrate our resilient contrastive pre-training effectively mitigates the bias stemming from the concept drift data stream. Codes are available at https://anonymous.4open.science/r/ResilientCL/.

Updated: 2025-02-11 15:09:05

标题: 因果知情对比学习:面向概念漂移下的偏差抗干扰预训练

摘要: 大规模对比预训练的演变,受顶级数据集推动,已经达到了扩展规律的转折点。因此,在漂移环境中维持和增强模型的预训练能力已经成为一个显著的挑战。在本文中,我们首先发现对比预训练方法受到概念漂移的显著影响,其中分布不可预测地改变,导致预训练模型的特征空间中存在显著的偏见。借助因果推断,我们构建了一个结构因果图,系统地分析了概念漂移对对比预训练的影响,并提出了因果干预对比目标。在实现这一点后,我们设计了一种弹性对比预训练方法,以适应概念漂移的数据流,实现简单且可扩展的实现。对各种下游任务进行的广泛实验表明,我们的弹性对比预训练有效地减轻了概念漂移数据流带来的偏见。代码可在https://anonymous.4open.science/r/ResilientCL/ 上找到。

更新时间: 2025-02-11 15:09:05

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2502.07620v1

Tractable Transformers for Flexible Conditional Generation

Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based generative model that is more robust to different conditional generation tasks. Unlike existing models that rely solely on global contextual features derived from full inputs, Tracformers incorporate a sparse Transformer encoder to capture both local and global contextual information. This information is routed through a decoder for conditional generation. Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines.

Updated: 2025-02-11 15:05:26

标题: 灵活条件生成可处理的Transformer

摘要: 非自回归(NAR)生成模型具有价值,因为它们可以以比自回归(AR)对应物更为原则的方式处理各种条件生成任务,后者受到顺序依赖性需求的限制。最近对NAR模型的进展,如扩散语言模型,已经证明在无条件生成方面比类似规模的AR模型(例如GPT)表现更出色。然而,这种改进并不总是导致改善条件生成性能。我们表明,造成这种差距的一个关键原因是难以泛化到训练中未见过的条件概率查询。因此,强大的无条件生成性能并不保证高质量的条件生成。本文提出了可处理Transformer(Tracformer),这是一种基于Transformer的生成模型,更能适应不同的条件生成任务。与现有模型不同,Tracformers不仅依赖于来自完整输入的全局上下文特征,还结合了一个稀疏Transformer编码器,以捕获本地和全局上下文信息。这些信息经过解码器进行条件生成。实证结果表明,与最近的扩散和AR模型基线相比,Tracformers在文本建模方面实现了最先进的条件生成性能。

更新时间: 2025-02-11 15:05:26

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07616v1

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

We introduce OBI-Bench, a holistic benchmark crafted to systematically evaluate large multi-modal models (LMMs) on whole-process oracle bone inscriptions (OBI) processing tasks demanding expert-level domain knowledge and deliberate cognition. OBI-Bench includes 5,523 meticulously collected diverse-sourced images, covering five key domain problems: recognition, rejoining, classification, retrieval, and deciphering. These images span centuries of archaeological findings and years of research by front-line scholars, comprising multi-stage font appearances from excavation to synthesis, such as original oracle bone, inked rubbings, oracle bone fragments, cropped single characters, and handprinted characters. Unlike existing benchmarks, OBI-Bench focuses on advanced visual perception and reasoning with OBI-specific knowledge, challenging LMMs to perform tasks akin to those faced by experts. The evaluation of 6 proprietary LMMs as well as 17 open-source LMMs highlights the substantial challenges and demands posed by OBI-Bench. Even the latest versions of GPT-4o, Gemini 1.5 Pro, and Qwen-VL-Max are still far from public-level humans in some fine-grained perception tasks. However, they perform at a level comparable to untrained humans in deciphering tasks, indicating remarkable capabilities in offering new interpretative perspectives and generating creative guesses. We hope OBI-Bench can help the community develop domain-specific multi-modal foundation models for ancient language research and delve deeper to discover and enhance the untapped potential of LMMs.

Updated: 2025-02-11 14:59:40

标题: OBI-Bench:大型多模态模型能否辅助研究甲骨文古文字?

摘要: 我们介绍了OBI-Bench,这是一个全面的基准,旨在系统评估大型多模态模型(LMMs)在需要专业领域知识和深思熟虑认知的整个过程中的甲骨文(OBI)处理任务。OBI-Bench包括5,523张精心收集的多样化来源的图像,涵盖五个关键领域问题:识别、重组、分类、检索和解读。这些图像跨越了数个世纪的考古发现和前线学者多年的研究,包括从挖掘到合成的多阶段字体外观,如原始甲骨、墨写拓片、甲骨碎片、裁剪的单个字符和手写字符。与现有的基准不同,OBI-Bench侧重于使用OBI特定知识进行高级视觉感知和推理,挑战LMMs执行类似于专家面临的任务。对6个专有LMMs以及17个开源LMMs的评估突显了OBI-Bench所面临的重大挑战和需求。即使是最新版本的GPT-4o、Gemini 1.5 Pro和Qwen-VL-Max在某些细粒度感知任务上仍远远落后于公众水平的人类。然而,在解读任务中,它们的表现与未经训练的人类相媲美,表明在提供新的解释视角和生成创造性猜测方面具有显著的能力。我们希望OBI-Bench可以促进社区开发面向古代语言研究的领域特定多模态基础模型,并深入挖掘和提升这些LMMs未开发潜力。

更新时间: 2025-02-11 14:59:40

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2412.01175v2

Generalized Least Squares Kernelized Tensor Factorization

Completing multidimensional tensor-structured data with missing entries is a fundamental task for many real-world applications involving incomplete or corrupted datasets. For data with spatial or temporal side information, low-rank factorization models with smoothness constraints have demonstrated strong performance. Although effective at capturing global and long-range correlations, these models often struggle to capture short-scale, high-frequency variations in the data. To address this limitation, we propose the Generalized Least Squares Kernelized Tensor Factorization (GLSKF) framework for tensor completion. GLSKF integrates smoothness-constrained low-rank factorization with a locally correlated residual process; the resulting additive structure enables effective characterization of both global dependencies and local variations. Specifically, we define the covariance norm to enforce the smoothness of factor matrices in the global low-rank factorization, and use structured covariance/kernel functions to model the local processes. For model estimation, we develop an alternating least squares (ALS) procedure with closed-form solutions for each subproblem. GLSKF utilizes zero-padding and slicing operations based on projection matrices which preserve the Kronecker structure of covariances, facilitating efficient computations through the conjugate gradient (CG) method. The proposed framework is evaluated on four real-world datasets across diverse tasks. Experimental results demonstrate that GLSKF achieves superior performance and scalability, establishing it as a novel solution for multidimensional tensor completion.

Updated: 2025-02-11 14:57:40

标题: 广义最小二乘核张量分解

摘要: 补全具有缺失条目的多维张量结构化数据,是许多涉及不完整或受损数据集的实际应用中的一项基本任务。对于具有空间或时间边信息的数据,具有平滑约束的低秩分解模型已经展现出很强的性能。尽管这些模型在捕获全局和长距离相关性方面有效,但它们经常难以捕获数据中的短尺度、高频变化。为了解决这一限制,我们提出了用于张量补全的广义最小二乘核化张量分解(GLSKF)框架。GLSKF将具有平滑约束的低秩分解与局部相关残差过程相结合;由此产生的加性结构能够有效地刻画全局依赖性和局部变化。具体而言,我们定义协方差范数以在全局低秩分解中强制因子矩阵的平滑性,并使用结构化协方差/核函数来建模局部过程。对于模型估计,我们开发了一个交替最小二乘(ALS)过程,对每个子问题都有闭式解。GLSKF利用基于保持协方差Kronecker结构的投影矩阵的零填充和切片操作,通过共轭梯度(CG)方法实现高效计算。所提出的框架在四个不同任务的真实数据集上进行了评估。实验结果表明,GLSKF实现了优越的性能和可扩展性,将其确立为多维张量补全的新颖解决方案。

更新时间: 2025-02-11 14:57:40

领域: stat.ML,cs.CV,cs.LG

下载: http://arxiv.org/abs/2412.07041v3

Algorithmic Aspects of Strategic Trading

Algorithmic trading in modern financial markets is widely acknowledged to exhibit strategic, game-theoretic behaviors whose complexity can be difficult to model. A recent series of papers (Chriss, 2024b,c,a, 2025) has made progress in the setting of trading for position building. Here parties wish to buy or sell a fixed number of shares in a fixed time period in the presence of both temporary and permanent market impact, resulting in exponentially large strategy spaces. While these papers primarily consider the existence and structural properties of equilibrium strategies, in this work we focus on the algorithmic aspects of the proposed model. We give an efficient algorithm for computing best responses, and show that while the temporary impact only setting yields a potential game, best response dynamics do not generally converge for the general setting, for which no fast algorithm for (Nash) equilibrium computation is known. This leads us to consider the broader notion of Coarse Correlated Equilibria (CCE), which we show can be computed efficiently via an implementation of Follow the Perturbed Leader (FTPL). We illustrate the model and our results with an experimental investigation, where FTPL exhibits interesting behavior in different regimes of the relative weighting between temporary and permanent market impact.
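
A minimal sketch of Follow the Perturbed Leader over a finite strategy set, with the trading game abstracted into a stand-in payoff vector (all parameters are assumptions): each round, play the strategy whose cumulative payoff plus fresh noise is largest.

    import numpy as np

    rng = np.random.default_rng(0)
    n_strategies, T, eta = 10, 2000, 5.0
    cum_payoff = np.zeros(n_strategies)
    counts = np.zeros(n_strategies)

    for t in range(T):
        noise = rng.exponential(eta, n_strategies)
        a = np.argmax(cum_payoff + noise)    # the perturbed leader
        counts[a] += 1
        payoff = rng.normal(np.linspace(0, 1, n_strategies))  # stand-in game
        cum_payoff += payoff                 # full-information feedback

    print(counts / T)  # empirical play concentrates on high-payoff strategies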

Updated: 2025-02-11 14:56:16

标题: 策略交易的算法方面

摘要: 现代金融市场中的算法交易被广泛认为具有战略性、博弈论行为,其复杂性很难建模。最近一系列论文(Chriss,2024b,c,a,2025)在交易建仓方面取得了进展。在这里,交易方希望在固定时间段内买入或卖出固定数量的股票,同时存在临时和永久市场影响,导致策略空间呈指数级增长。虽然这些论文主要考虑平衡策略的存在和结构特性,在这项工作中,我们专注于所提出模型的算法方面。我们提出了一个高效算法来计算最佳应对策略,并展示了在仅考虑临时影响的情况下会得到一个势博弈(potential game);但对于一般设置,最佳应对动态通常不会收敛,且尚无快速算法可用于(纳什)均衡计算。这使我们考虑到了更广泛的粗糙相关均衡(CCE)概念,我们展示了通过Follow the Perturbed Leader(FTPL)实现的高效计算方法。我们通过实验研究模型和结果,其中FTPL在临时和永久市场影响之间的相对权重不同情况下表现出有趣的行为。

更新时间: 2025-02-11 14:56:16

领域: cs.GT,cs.CE,cs.LG

下载: http://arxiv.org/abs/2502.07606v1

Generalisation under gradient descent via deterministic PAC-Bayes

We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.

Updated: 2025-02-11 14:54:12

标题: 通过确定性PAC-Bayes在梯度下降下的泛化

摘要: 我们建立了针对使用梯度下降方法或连续梯度流训练的模型的分散的PAC-Bayesian泛化界限。与PAC-Bayesian设置中的标准做法相反,我们的结果适用于确定性的优化算法,无需任何去随机化步骤。我们的界限是完全可计算的,取决于初始分布的密度和训练目标在轨迹上的Hessian。我们展示了我们的框架可以应用于各种迭代优化算法,包括随机梯度下降(SGD)、基于动量的方案和阻尼哈密顿动力学。

更新时间: 2025-02-11 14:54:12

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2209.02525v4

Claim Verification in the Age of Large Language Models: A Survey

The large and ever-increasing amount of data available on the Internet coupled with the laborious task of manual claim and fact verification has sparked the interest in the development of automated claim verification systems. Several deep learning and transformer-based models have been proposed for this task over the years. With the introduction of Large Language Models (LLMs) and their superior performance in several NLP tasks, we have seen a surge of LLM-based approaches to claim verification along with the use of novel methods such as Retrieval Augmented Generation (RAG). In this survey, we present a comprehensive account of recent claim verification frameworks using LLMs. We describe the different components of the claim verification pipeline used in these frameworks in detail including common approaches to retrieval, prompting, and fine-tuning. Finally, we describe publicly available English datasets created for this task.

Updated: 2025-02-11 14:51:08

标题: 大语言模型时代的索赔验证:一项调查

摘要: 互联网上可用的大量数据不断增加,再加上手动索赔和事实核实的繁重任务,引发了对自动索赔验证系统的开发的兴趣。多年来,已经提出了几种基于深度学习和变压器的模型来完成这项任务。随着大型语言模型(LLMs)的引入以及它们在多个自然语言处理任务中的优越性能,我们看到了基于LLM的索赔验证方法的激增,以及对检索增强生成(RAG)等新方法的使用。在本调查中,我们介绍了使用LLMs进行最近索赔验证框架的全面情况。我们详细描述了这些框架中使用的索赔验证流程的不同组件,包括检索、提示和微调的常见方法。最后,我们描述了为此任务创建的可公开获取的英文数据集。

更新时间: 2025-02-11 14:51:08

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.14317v2

Distributed Non-Interactive Zero-Knowledge Proofs

Distributed certification is a set of mechanisms that allows an all-knowing prover to convince the units of a communication network that the network's state has some desired property, such as being 3-colorable or triangle-free. Classical mechanisms, such as proof labeling schemes (PLS), consist of a message from the prover to each unit, followed by one round of communication between each unit and its neighbors. Later works consider extensions, called distributed interactive proofs, where the prover and the units can have multiple rounds of communication before the communication among the units. Recently, Bick, Kol, and Oshman (SODA '22) defined a zero-knowledge version of distributed interactive proofs, where the prover convinces the units of the network's state without revealing any other information about the network's state or structure. In their work, they propose different variants of this model and show that many graph properties of interest can be certified with them. In this work, we define and study distributed non-interactive zero-knowledge proofs (dNIZK); these can be seen as a non-interactive version of the aforementioned model, and also as a zero-knowledge version of PLS. We prove the following: - There exists a dNIZK protocol for 3-coloring with O(log n)-bit messages from the prover and O(log n)-size messages among neighbors. - There exists a family of dNIZK protocols for triangle-freeness that presents a trade-off between the size of the messages from the prover and the size of the messages among neighbors. - There exists a dNIZK protocol for any graph property in NP in the random oracle model, which is secure against an arbitrary number of malicious parties.

Updated: 2025-02-11 14:44:51

标题: 分布式非交互式零知识证明

摘要: 分布式认证是一组机制,允许一个全知的证明者说服通信网络中的各个单元:网络状态具有某种期望的属性,例如可3-着色或无三角形。经典机制,如证明标签方案(PLS),由证明者向每个单元发送一条消息,随后每个单元与其邻居之间进行一轮通信。后续工作考虑了称为分布式交互证明的扩展,其中证明者与各单元在单元间通信之前可以进行多轮通信。 最近,Bick、Kol和Oshman(SODA '22)定义了分布式交互证明的零知识版本,其中证明者在不泄露网络状态或结构的任何其他信息的情况下,使各单元确信网络的状态。在他们的工作中,他们提出了该模型的不同变体,并展示了许多感兴趣的图属性可以用它们来认证。 在这项工作中,我们定义并研究分布式非交互零知识证明(dNIZK);它既可以看作上述模型的非交互版本,也可以看作PLS的零知识版本。我们证明了以下内容: - 存在一个用于3-着色的dNIZK协议,证明者发送O(log n)位消息,邻居之间交换O(log n)大小的消息。 - 存在一族用于无三角形性的dNIZK协议,呈现了证明者消息大小与邻居间消息大小之间的权衡。 - 在随机预言模型中,对NP中的任何图属性都存在一个dNIZK协议,且能够抵御任意数量的恶意参与者。

更新时间: 2025-02-11 14:44:51

领域: cs.DC,cs.CR

下载: http://arxiv.org/abs/2502.07594v1

NLGR: Utilizing Neighbor Lists for Generative Rerank in Personalized Recommendation Systems

Reranking plays a crucial role in modern multi-stage recommender systems by rearranging the initial ranking list. Due to the inherent challenges of combinatorial search spaces, some current research adopts an evaluator-generator paradigm, with a generator generating feasible sequences and an evaluator selecting the best sequence based on the estimated list utility. However, these methods still face two issues. Firstly, due to the goal inconsistency problem between the evaluator and generator, the generator tends to fit the local optimal solution of exposure distribution rather than combinatorial space optimization. Secondly, the strategy of generating target items one by one is difficult to achieve optimality because it ignores the information of subsequent items. To address these issues, we propose a utilizing Neighbor Lists model for Generative Reranking (NLGR), which aims to improve the performance of the generator in the combinatorial space. NLGR follows the evaluator-generator paradigm and improves the generator's training and generating methods. Specifically, we use neighbor lists in combination space to enhance the training process, making the generator perceive the relative scores and find the optimization direction. Furthermore, we propose a novel sampling-based non-autoregressive generation method, which allows the generator to jump flexibly from the current list to any neighbor list. Extensive experiments on public and industrial datasets validate NLGR's effectiveness and we have successfully deployed NLGR on the Meituan food delivery platform.

Updated: 2025-02-11 14:44:47

标题: NLGR: 利用邻居列表在个性化推荐系统中进行生成性重新排列

摘要: 重新排名在现代多阶段推荐系统中发挥着至关重要的作用,通过重新排列初始排名列表。由于组合搜索空间的固有挑战,一些当前的研究采用了评估器生成器范式,生成器生成可行序列,评估器根据估计的列表效用选择最佳序列。然而,这些方法仍然面临两个问题。首先,由于评估器和生成器之间的目标不一致问题,生成器往往更适应于曝光分布的局部最优解,而不是组合空间优化。其次,逐个生成目标项目的策略很难实现最优性,因为它忽略了后续项目的信息。 为了解决这些问题,我们提出了一种利用邻居列表模型进行生成式重新排名(NLGR)的方法,旨在改善组合空间中生成器的性能。NLGR遵循评估器生成器范式,并改进了生成器的训练和生成方法。具体来说,我们在组合空间中使用邻居列表来增强训练过程,使生成器感知相对分数并找到优化方向。此外,我们提出了一种基于采样的非自回归生成方法,允许生成器灵活地从当前列表跳转到任何邻居列表。对公共和工业数据集进行的大量实验验证了NLGR的有效性,并且我们已成功将NLGR部署在美团外卖平台上。

更新时间: 2025-02-11 14:44:47

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2502.06097v2

Bridging Brain Signals and Language: A Deep Learning Approach to EEG-to-Text Decoding

Translating brain activity into human language has the potential to revolutionize human-machine interaction while providing communication support to people with speech disabilities. Electronic decoding has reached a certain level of achievement, yet current EEG-to-text decoding methods fail to handle open vocabularies, depth of meaning, and individual brain-specific variability. We introduce a framework that moves beyond conventional closed-vocabulary EEG-to-text decoding by integrating subject-specific learning models with natural language processing methods to resolve detection obstacles. The method applies deep representation learning to extract salient EEG features, which allow neural networks to be trained to create elaborate sentences that extend beyond the original data content. Analysis on the ZuCo dataset demonstrates higher BLEU, ROUGE, and BERTScore performance compared to current methods. The research shows how this framework serves as an effective approach to generate meaningful and correct texts while accounting for individual brain variations. The proposed research aims to connect open-vocabulary text generation systems with human brain signal interpretation in order to develop efficacious brain-to-text systems. The work has interdisciplinary impact through innovative assistive-technology development and personalized communication systems, extending the possibilities for human-computer interaction in various settings.

Updated: 2025-02-11 14:43:14

标题: 连接脑信号和语言:一种深度学习方法用于将脑电图解码为文本

摘要: 将大脑活动转化为人类语言,可以彻底改变机器与人类之间的互动方式,同时为言语残障者提供交流支持。电子解码已经取得了一定的成就,但目前的 EEG 到文本解码方法无法实现开放词汇表和深度含义以及个体大脑特定变量。我们引入了一个特殊的框架,通过将主题特定学习模型与自然语言处理方法整合,从而改变传统的封闭词汇表 EEG 到文本解码方法,以解决检测障碍。该方法应用了深度表示学习方法来提取重要的 EEG 特征,从而使神经网络能够训练出超越原始数据内容的复杂句子。ZuCo 数据集分析表明,与当前方法相比,研究结果在 BLEU、ROUGE 和 BERTScore 性能上取得了更高的表现。研究证明了这一框架如何作为一种有效的方法来生成有意义和正确的文本,同时理解个体大脑变异。该研究旨在在开放词汇文本生成系统和人类大脑信号解释之间建立联系,以开发有效的大脑到文本系统。该研究通过创新的辅助技术开发和个性化交流系统,产生了跨学科效果,为各种环境中的人机交互扩展了可能性。

更新时间: 2025-02-11 14:43:14

领域: eess.SP,cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.17465v1

DMWM: Dual-Mind World Model with Long-Term Imagination

Imagination in world models is crucial for enabling agents to learn long-horizon policy in a sample-efficient manner. Existing recurrent state-space model (RSSM)-based world models depend on single-step statistical inference to capture the environment dynamics, and, hence, they are unable to perform long-term imagination tasks due to the accumulation of prediction errors. Inspired by the dual-process theory of human cognition, we propose a novel dual-mind world model (DMWM) framework that integrates logical reasoning to enable imagination with logical consistency. DMWM is composed of two components: an RSSM-based System 1 (RSSM-S1) component that handles state transitions in an intuitive manner and a logic-integrated neural network-based System 2 (LINN-S2) component that guides the imagination process through hierarchical deep logical reasoning. The inter-system feedback mechanism is designed to ensure that the imagination process follows the logical rules of the real environment. The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite. Extensive experimental results demonstrate that the proposed framework yields significant improvements in terms of logical coherence, trial efficiency, data efficiency and long-term imagination over the state-of-the-art world models.

Updated: 2025-02-11 14:40:57

标题: DMWM: 双重心智世界模型与长期想象

摘要: 在世界模型中,想象力对于使代理能够以高效的方式学习长时程策略至关重要。现有的基于循环状态空间模型(RSSM)的世界模型依赖于单步统计推断来捕捉环境动态,因此它们无法执行长期想象任务,因为预测错误会累积。受人类认知的双过程理论的启发,我们提出了一种新颖的双心世界模型(DMWM)框架,该框架集成了逻辑推理,以实现逻辑一致性的想象。DMWM由两个组件组成:基于RSSM的系统1(RSSM-S1)组件以直观方式处理状态转换,以及基于逻辑集成神经网络的系统2(LINN-S2)组件通过分层深度逻辑推理引导想象过程。系统间的反馈机制被设计为确保想象过程遵循真实环境的逻辑规则。所提出的框架在需要来自DMControl套件的长期规划的基准任务上进行了评估。大量实验结果表明,所提出的框架在逻辑连贯性、试验效率、数据效率和长期想象方面显著改善了目前最先进的世界模型。

更新时间: 2025-02-11 14:40:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07591v1

A Particle Algorithm for Mean-Field Variational Inference

Variational inference is a fast and scalable alternative to Markov chain Monte Carlo and has been widely applied to posterior inference tasks in statistics and machine learning. A traditional approach for implementing mean-field variational inference (MFVI) is coordinate ascent variational inference (CAVI), which relies crucially on parametric assumptions on complete conditionals. In this paper, we introduce a novel particle-based algorithm for mean-field variational inference, which we term PArticle VI (PAVI). Notably, our algorithm does not rely on parametric assumptions on complete conditionals, and it applies to the nonparametric setting. We provide non-asymptotic finite-particle convergence guarantee for our algorithm. To our knowledge, this is the first end-to-end guarantee for particle-based MFVI.

Updated: 2025-02-11 14:37:00

标题: 一种用于均场变分推断的粒子算法

摘要: 变分推断是马尔可夫链蒙特卡洛方法的一种快速且可扩展的替代方案,已广泛应用于统计学和机器学习中的后验推断任务。实现均场变分推断(MFVI)的传统方法是坐标上升变分推断(CAVI),它在很大程度上依赖于对完全条件分布的参数化假设。在本文中,我们介绍了一种新颖的基于粒子的均场变分推断算法,我们称之为PArticle VI(PAVI)。值得注意的是,我们的算法不依赖于对完全条件分布的参数化假设,并且适用于非参数设置。我们为该算法提供了非渐近的有限粒子收敛保证。据我们所知,这是基于粒子的MFVI的第一个端到端保证。

更新时间: 2025-02-11 14:37:00

领域: math.ST,cs.LG,math.OC,stat.ML,stat.TH

下载: http://arxiv.org/abs/2412.20385v2

SEMU: Singular Value Decomposition for Efficient Machine Unlearning

While the capabilities of generative foundational models have advanced rapidly in recent years, methods to prevent harmful and unsafe behaviors remain underdeveloped. Among the pressing challenges in AI safety, machine unlearning (MU) has become increasingly critical to meet upcoming safety regulations. Most existing MU approaches focus on altering the most significant parameters of the model. However, these methods often require fine-tuning substantial portions of the model, resulting in high computational costs and training instabilities, which are typically mitigated by access to the original training dataset. In this work, we address these limitations by leveraging Singular Value Decomposition (SVD) to create a compact, low-dimensional projection that enables the selective forgetting of specific data points. We propose Singular Value Decomposition for Efficient Machine Unlearning (SEMU), a novel approach designed to optimize MU in two key aspects. First, SEMU minimizes the number of model parameters that need to be modified, effectively removing unwanted knowledge while making only minimal changes to the model's weights. Second, SEMU eliminates the dependency on the original training dataset, preserving the model's previously acquired knowledge without additional data requirements. Extensive experiments demonstrate that SEMU achieves competitive performance while significantly improving efficiency in terms of both data usage and the number of modified parameters.
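
As a rough, hypothetical illustration of SVD-confined editing (my own construction, not SEMU itself): project a forgetting gradient onto a low-dimensional singular subspace, so only a compact part of the weight matrix changes.

    import torch

    W = torch.randn(64, 64)
    U, S, Vh = torch.linalg.svd(W)
    r = 4
    P = U[:, :r] @ U[:, :r].T            # projector onto the top-r left subspace

    grad_forget = torch.randn(64, 64)    # gradient of a forgetting loss (toy)
    W_edited = W - 0.1 * (P @ grad_forget)  # update confined to the subspace
    print(torch.linalg.matrix_rank(W_edited - W))  # -> at most r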

Updated: 2025-02-11 14:36:39

标题: SEMU:用于高效机器遗忘的奇异值分解

摘要: 在过去几年中,生成式基础模型的能力已经迅速提升,但是防止有害和不安全行为的方法仍然不成熟。在人工智能安全方面面临的紧迫挑战之一是机器遗忘(MU)已经变得越来越关键,以满足即将出台的安全法规。大多数现有的MU方法侧重于修改模型的最显著参数。然而,这些方法通常需要微调模型的大部分部分,导致高计算成本和训练不稳定性,通常需要访问原始训练数据集来减轻这些问题。 本文通过利用奇异值分解(SVD)来创建一个紧凑的低维投影,实现对特定数据点的选择性遗忘,以解决这些限制。我们提出了奇异值分解用于高效机器遗忘(SEMU),这是一种旨在优化MU的新方法,设计有两个关键方面。首先,SEMU最小化需要修改的模型参数数量,有效地去除不需要的知识,同时只对模型权重进行最小的更改。其次,SEMU消除了对原始训练数据集的依赖,保留了模型先前获得的知识,而无需额外的数据要求。 大量实验证明,SEMU在保持竞争性性能的同时,在数据使用和修改参数数量方面显著提高了效率。

更新时间: 2025-02-11 14:36:39

领域: cs.LG

下载: http://arxiv.org/abs/2502.07587v1

We Can't Understand AI Using our Existing Vocabulary

This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.

Updated: 2025-02-11 14:34:05

标题: 我们无法用现有的词汇理解人工智能

摘要: 这篇立场文件认为,为了理解人工智能,我们不能依赖现有的人类词汇。相反,我们应该努力发展新词:代表我们想要教给机器的精确人类概念,或者我们需要学习的机器概念的新词。我们从一个前提出发,即人类和机器具有不同的概念。这意味着可解释性可以被看作是一个沟通问题:人类必须能够引用和控制机器概念,并将人类概念传达给机器。通过开发新词,创建一个共享的人机语言,我们相信可以解决这个沟通问题。成功的新词实现了一定程度的抽象:不要太详细,以便在许多情境中重复使用,也不要太高级,以便传达精确信息。作为概念验证,我们展示了如何通过“长度新词”来控制LLM响应长度,而“多样性新词”则允许采样更多变化的响应。综合起来,我们认为我们不能用现有的词汇理解人工智能,通过新词扩展词汇创造了更好地控制和理解机器的机会。

更新时间: 2025-02-11 14:34:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07586v1

Understanding the Generalization Error of Markov algorithms through Poissonization

Using continuous-time stochastic differential equation (SDE) proxies to stochastic optimization algorithms has proven fruitful for understanding their generalization abilities. A significant part of these approaches is based on the so-called ``entropy flows'', which greatly simplify the generalization analysis. Unfortunately, such well-structured entropy flows cannot be obtained for most discrete-time algorithms, and the existing SDE approaches remain limited to specific noise and algorithmic structures. We aim to alleviate this issue by introducing a generic framework for analyzing the generalization error of Markov algorithms through `Poissonization', a continuous-time approximation of discrete-time processes with formal approximation guarantees. Through this approach, we first develop a novel entropy flow, which directly leads to PAC-Bayesian generalization bounds. We then draw novel links to modified versions of the celebrated logarithmic Sobolev inequalities (LSI), identify cases where such LSIs are satisfied, and obtain improved bounds. Beyond its generality, our framework allows exploiting specific properties of learning algorithms. In particular, we incorporate the noise structure of different algorithm types - namely, those with additional noise injections (noisy) and those without (non-noisy) - through various technical tools. This illustrates the capacity of our methods to achieve known (yet, Poissonized) and new generalization bounds.
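
For readers unfamiliar with the term, Poissonization in its usual sense embeds a discrete-time chain into continuous time by running it along a Poisson clock; the display below states this standard construction, not the paper's specific entropy flow.

```latex
% Poissonization in the usual sense: a discrete-time Markov chain (X_k)
% is run along a Poisson clock to obtain a continuous-time proxy.
\begin{aligned}
&N_t \sim \text{unit-rate Poisson process}, \\
&\tilde{X}_t := X_{N_t}, \qquad t \ge 0,
\end{aligned}
% so that (\tilde{X}_t) is a continuous-time Markov process whose jumps
% reproduce the discrete-time updates, enabling SDE/entropy-flow style
% analysis with formal approximation guarantees between X and \tilde{X}.
```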

Updated: 2025-02-11 14:31:32

标题: 通过泊松化理解马尔可夫算法的泛化误差

摘要: 使用连续时间随机微分方程(SDE)代理来进行随机优化算法已被证明对于理解它们的泛化能力是有益的。其中一个重要部分是基于所谓的“熵流”,这极大地简化了泛化分析。不幸的是,对于大多数离散时间算法而言,无法获得这种结构良好的熵流,现有的SDE方法仍然局限于特定的噪声和算法结构。我们旨在通过引入一个通用框架来通过“泊松化”分析马尔可夫算法的泛化误差,泊松化是离散时间过程的连续时间近似,并具有正式的逼近保证。通过这种方法,我们首先开发了一种新颖的熵流,直接导致PAC-Bayesian泛化界限。然后我们建立了与著名的对数Sobolev不等式(LSI)的修改版本的新联系,识别了LSI被满足的情况,并获得了改进的界限。除了其普遍性外,我们的框架还允许利用学习算法的特定属性。特别是,我们通过各种技术工具将不同算法类型的噪声结构(有噪声和无噪声)纳入其中。这展示了我们的方法实现已知(但经过泊松化)和新的泛化界限的能力。

更新时间: 2025-02-11 14:31:32

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2502.07584v1

Governing AI Agents

The field of AI is undergoing a fundamental transition from generative models that can produce synthetic content to artificial agents that can plan and execute complex tasks with only limited human involvement. Companies that pioneered the development of language models have now built AI agents that can independently navigate the internet, perform a wide range of online tasks, and increasingly serve as AI personal assistants and virtual coworkers. The opportunities presented by this new technology are tremendous, as are the associated risks. Fortunately, there exist robust analytic frameworks for confronting many of these challenges, namely, the economic theory of principal-agent problems and the common law doctrine of agency relationships. Drawing on these frameworks, this Article makes three contributions. First, it uses agency law and theory to identify and characterize problems arising from AI agents, including issues of information asymmetry, discretionary authority, and loyalty. Second, it illustrates the limitations of conventional solutions to agency problems: incentive design, monitoring, and enforcement might not be effective for governing AI agents that make uninterpretable decisions and operate at unprecedented speed and scale. Third, the Article explores the implications of agency law and theory for designing and regulating AI agents, arguing that new technical and legal infrastructure is needed to support governance principles of inclusivity, visibility, and liability.

Updated: 2025-02-11 14:30:44

标题: 治理人工智能代理

摘要: AI领域正在经历一场根本性的转变,从能够生成合成内容的生成模型到能够计划和执行复杂任务的人工智能代理,只需要有限的人类参与。那些开创语言模型发展的公司现在已经建立了可以独立浏览互联网、执行各种在线任务,并越来越多地充当人工智能个人助手和虚拟同事的AI代理。这种新技术带来的机遇巨大,同时也伴随着相关的风险。幸运的是,存在着稳健的分析框架来应对这些挑战,即委托代理问题的经济理论和代理关系的普通法原则。本文基于这些框架做出三方面的贡献。首先,利用代理法和理论来识别和描述由AI代理产生的问题,包括信息不对称、自由裁量权和忠诚度等问题。其次,本文展示了传统解决代理问题的限制:激励设计、监督和执行可能无法有效地管理做出难以解释决策并以前所未有的速度和规模运行的AI代理。第三,本文探讨了代理法和理论对设计和监管AI代理的影响,认为需要新的技术和法律基础设施来支持包容性、可见性和责任性的治理原则。

更新时间: 2025-02-11 14:30:44

领域: cs.AI

下载: http://arxiv.org/abs/2501.07913v2

Generative Modeling with Bayesian Sample Inference

We derive a novel generative model from the simple act of Gaussian posterior inference. Treating the generated sample as an unknown variable to infer lets us formulate the sampling process in the language of Bayesian probability. Our model uses a sequence of prediction and posterior update steps to narrow down the unknown sample from a broad initial belief. In addition to a rigorous theoretical analysis, we establish a connection between our model and diffusion models and show that it includes Bayesian Flow Networks (BFNs) as a special case. In our experiments, we demonstrate improved performance over both BFNs and Variational Diffusion Models, achieving competitive likelihood scores on CIFAR10 and ImageNet.
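
A minimal conjugate-Gaussian caricature of the "predict, then posterior-update" narrowing described above; the network predictor is replaced by a noisy oracle here, so this only illustrates the belief mechanics, not the generative model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x_true = rng.normal(size=4)            # the "unknown sample" to be inferred

# Broad initial belief over the sample: N(mu, (1/prec) * I).
mu, prec = np.zeros(4), 0.1

for sigma in [2.0, 1.0, 0.5, 0.25, 0.1]:
    # Prediction step (stub): in the real model a network predicts the
    # sample from the current belief; a noisy oracle stands in for it.
    y = x_true + rng.normal(0.0, sigma, size=4)
    # Conjugate Gaussian posterior update narrows the belief.
    prec_new = prec + 1.0 / sigma**2
    mu = (prec * mu + y / sigma**2) / prec_new
    prec = prec_new

print(mu, x_true)  # the belief mean concentrates near the unknown sample
```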

Updated: 2025-02-11 14:27:10

标题: 使用贝叶斯样本推断进行生成建模

摘要: 我们从高斯后验推断的简单行为中推导出了一种新颖的生成模型。将生成的样本视为一个未知变量进行推断,让我们能够用贝叶斯概率的语言来表达采样过程。我们的模型使用一系列预测和后验更新步骤来从广泛的初始信念中缩小未知样本。除了严格的理论分析,我们还建立了我们的模型与扩散模型之间的联系,并展示了它将贝叶斯流网络(BFNs)作为一个特殊情况。在我们的实验中,我们展示了相对于BFNs和变分扩散模型的改进性能,在CIFAR10和ImageNet上取得了竞争性的似然分数。

更新时间: 2025-02-11 14:27:10

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07580v1

Single-Step Consistent Diffusion Samplers

Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs that limit their practicality in time-sensitive or resource-constrained settings. In this work, we introduce consistent diffusion samplers, a new class of samplers designed to generate high-fidelity samples in a single step. We first develop a distillation algorithm to train a consistent diffusion sampler from a pretrained diffusion model without pre-collecting large datasets of samples. Our algorithm leverages incomplete sampling trajectories and noisy intermediate states directly from the diffusion process. We further propose a method to train a consistent diffusion sampler from scratch, fully amortizing exploration by training a single model that both performs diffusion sampling and skips intermediate steps using a self-consistency loss. Through extensive experiments on a variety of unnormalized distributions, we show that our approach yields high-fidelity samples using less than 1% of the network evaluations required by traditional diffusion samplers.
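
The sketch below shows a consistency-style self-consistency loss of the kind the abstract alludes to: a single network f(x, t) is trained so that two points on the same (here synthetic) trajectory map to the same output, with the earlier-time branch used as a detached target. The toy integrator step and architecture are assumptions, not the paper's.

```python
import torch, torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))  # f(x, t)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

def f_out(x, t):
    return f(torch.cat([x, t[:, None]], dim=1))

for step in range(200):
    # Noisy intermediate states from a (here synthetic) diffusion
    # trajectory; x_s sits one partial step ahead of x_t on the same path.
    x_t = torch.randn(128, 2)
    t = torch.rand(128) * 0.9 + 0.05
    s = t - 0.05
    x_s = x_t - 0.05 * x_t          # stand-in for one integrator step
    # Self-consistency: outputs at (x_t, t) and (x_s, s) must agree,
    # with the earlier-time branch treated as the (detached) target.
    loss = (f_out(x_t, t) - f_out(x_s, s).detach()).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())
```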

Updated: 2025-02-11 14:25:52

标题: 单步一致扩散取样器

摘要: 从未归一化的目标分布中进行抽样是机器学习和统计学中一个基本而具有挑战性的任务。现有的抽样算法通常需要许多迭代步骤才能产生高质量的样本,导致计算成本高,限制了它们在时间敏感或资源受限的环境中的实用性。在这项工作中,我们引入了一种新的抽样器类别——一致扩散抽样器,旨在在单一步骤中生成高保真度的样本。我们首先开发了一个蒸馏算法,从预先训练的扩散模型中训练一致的扩散抽样器,而无需预先收集大量样本数据。我们的算法利用扩散过程中的不完整抽样轨迹和嘈杂的中间状态。我们进一步提出了一种方法,从头开始训练一致的扩散抽样器,通过训练一个既执行扩散抽样又跳过中间步骤的单一模型,完全摊销探索的成本,使用自洽损失。通过在各种未归一化分布上进行大量实验,我们展示了我们的方法可以使用传统扩散抽样器所需网络评估量的不到1%就产生高保真度的样本。

更新时间: 2025-02-11 14:25:52

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07579v1

Neural Networks and (Virtual) Extended Formulations

Neural networks with piecewise linear activation functions, such as rectified linear units (ReLU) or maxout, are among the most fundamental models in modern machine learning. We make a step towards proving lower bounds on the size of such neural networks by linking their representative capabilities to the notion of the extension complexity $\mathrm{xc}(P)$ of a polytope $P$. This is a well-studied quantity in combinatorial optimization and polyhedral geometry describing the number of inequalities needed to model $P$ as a linear program. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$. This implies exponential lower bounds on such neural networks for a variety of problems, including the polynomially solvable maximum weight matching problem. In an attempt to prove similar bounds also for general neural networks, we introduce the notion of virtual extension complexity $\mathrm{vxc}(P)$, which generalizes $\mathrm{xc}(P)$ and describes the number of inequalities needed to represent the linear optimization problem over $P$ as a difference of two linear programs. We prove that $\mathrm{vxc}(P)$ is a lower bound on the size of any neural network that optimizes over $P$. While it remains an open question to derive useful lower bounds on $\mathrm{vxc}(P)$, we argue that this quantity deserves to be studied independently from neural networks by proving that one can efficiently optimize over a polytope $P$ using a small virtual extended formulation.

Updated: 2025-02-11 14:19:32

标题: 神经网络与(虚拟)扩展形式

摘要: 具有分段线性激活函数的神经网络,如修正线性单元(ReLU)或maxout,在现代机器学习中是最基本的模型之一。我们通过将它们的代表能力与多面体$P$的扩展复杂性$\mathrm{xc}(P)$的概念联系起来,迈出了证明此类神经网络规模下限的一步。多面体$P$的扩展复杂性$\mathrm{xc}(P)$是组合优化和多面体几何学中的一个研究量,描述将$P$建模为线性规划所需的不等式数量。我们表明$\mathrm{xc}(P)$是解决线性优化问题的任何单调或输入凸神经网络规模的下限。这意味着对包括多项式可解的最大权重匹配问题在内的各种问题存在指数下限的神经网络。 为了证明一般神经网络的类似下限,我们引入了虚拟扩展复杂性$\mathrm{vxc}(P)$的概念,它是$\mathrm{xc}(P)$的推广,描述了表示线性优化问题在$P$上的不等式数量,作为两个线性规划之差。我们证明$\mathrm{vxc}(P)$是优化$P$的任何神经网络规模的下限。虽然如何获得有用的$\mathrm{vxc}(P)$下限仍然是一个未解之谜,但我们认为这个量应该从神经网络中独立研究,证明可以有效地使用小型虚拟扩展形式优化多面体$P$。

更新时间: 2025-02-11 14:19:32

领域: math.CO,cs.CC,cs.DM,cs.LG,math.OC

下载: http://arxiv.org/abs/2411.03006v2

Near, far: Patch-ordering enhances vision foundation models' scene understanding

We introduce NeCo: Patch Neighbor Consistency, a novel self-supervised training loss that enforces patch-level nearest neighbor consistency across a student and teacher model. Compared to contrastive approaches that only yield binary learning signals, i.e., 'attract' and 'repel', this approach benefits from the more fine-grained learning signal of sorting spatially dense features relative to reference patches. Our method leverages differentiable sorting applied on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. This method generates high-quality dense feature encoders and establishes several new state-of-the-art results, such as +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff, and improvements of more than 1.5% in the 3D understanding of multi-view consistency on SPair-71k.

Updated: 2025-02-11 14:15:13

标题: 近,远:修补顺序增强视觉基础模型的场景理解

摘要: 我们引入NeCo:Patch Neighbor Consistency,这是一种新颖的自监督训练损失,它强化了学生和教师模型之间的补丁级最近邻一致性。与只产生二进制学习信号(即“吸引”和“排斥”)的对比方法相比,这种方法通过相对于参考补丁对空间密集特征进行排序,从而获益于更精细的学习信号。我们的方法利用可微分排序应用于预训练表示,例如DINOv2-寄存器,以引导学习信号并进一步改进它们。这种密集的后预训练方法在各种模型和数据集上实现了卓越的性能,尽管只需要在单个GPU上进行19小时。该方法生成高质量的密集特征编码器,并建立了一些新的最先进结果,如在ADE20k和Pascal VOC上进行非参数上下文语义分割的+5.5%和+6%,在COCO-Things和-Stuff上进行线性分割评估的+7.2%和+5.7%,以及在SPair-71k上进行多视角一致性的3D理解,提高了超过1.5%。

更新时间: 2025-02-11 14:15:13

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2408.11054v2

Finding Dino: A Plug-and-Play Framework for Zero-Shot Detection of Out-of-Distribution Objects Using Prototypes

Detecting and localising unknown or out-of-distribution (OOD) objects in any scene can be a challenging task in vision, particularly in safety-critical cases involving autonomous systems like automated vehicles or trains. Supervised anomaly segmentation or open-world object detection models depend on training on exhaustively annotated datasets for every domain and still struggle in distinguishing between background and OOD objects. In this work, we present a plug-and-play framework - PRototype-based OOD detection Without Labels (PROWL). It is an inference-based method that does not require training on the domain dataset and relies on extracting relevant features from self-supervised pre-trained models. PROWL can be easily adapted to detect in-domain objects in any operational design domain (ODD) in a zero-shot manner by specifying a list of known classes from this domain. PROWL, as a first zero-shot unsupervised method, achieves state-of-the-art results on the RoadAnomaly and RoadObstacle datasets provided in road driving benchmarks - SegmentMeIfYouCan (SMIYC) and Fishyscapes, as well as comparable performance against existing supervised methods trained without auxiliary OOD data. We also demonstrate its generalisability to other domains such as rail and maritime.
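
A minimal sketch of prototype-based zero-shot OOD scoring under the stated assumptions: features come from a frozen self-supervised encoder (e.g. DINOv2; random tensors stand in for them here), prototypes are per-class feature means, and a cosine-similarity threshold flags OOD. The threshold value is arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for features from a frozen self-supervised encoder (e.g. DINOv2);
# in practice these come from the pretrained model, not random tensors.
feats_known = torch.randn(100, 384)           # labeled in-domain features
labels = torch.randint(0, 5, (100,))          # 5 known in-domain classes
prototypes = torch.stack([feats_known[labels == c].mean(0) for c in range(5)])

def is_ood(feat, tau=0.5):
    # Max cosine similarity to any known-class prototype;
    # below the threshold tau the region is flagged as OOD.
    sims = F.cosine_similarity(feat[None, :], prototypes, dim=1)
    return sims.max().item() < tau

print(is_ood(torch.randn(384)))
```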

Updated: 2025-02-11 14:05:29

标题: 发现Dino:使用原型进行零样本检测的即插即用框架

摘要: 检测和定位任何场景中的未知或超出分布(OOD)对象可能是视觉中的一项具有挑战性的任务,特别是在涉及像自动驾驶车辆或列车这样的自主系统的安全关键情况下。监督异常分割或开放世界对象检测模型依赖于对每个领域进行详尽注释的数据集训练,仍然难以区分背景和OOD对象。在这项工作中,我们提出了一个即插即用的框架 - 无标签的基于原型的OOD检测(PROWL)。这是一种基于推理的方法,不需要在域数据集上进行训练,并依赖于从自监督预训练模型中提取相关特征。PROWL可以通过指定该领域中已知类别的列表以零样本方式轻松适应于任何操作设计领域(ODD)中检测领域内的对象。PROWL作为第一个零样本无监督方法,在道路驾驶基准数据集SegmentMeIfYouCan(SMIYC)和Fishyscapes中提供的RoadAnomaly和RoadObstacle数据集上取得了最先进的结果,以及与训练时没有辅助OOD数据的现有监督方法相当的性能。我们还展示了它在其他领域如铁路和海事中的泛化能力。

更新时间: 2025-02-11 14:05:29

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.07664v2

Vision-Language Models for Edge Networks: A Comprehensive Survey

Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare, their deployment on resource-constrained edge devices remains challenging due to processing power, memory, and energy limitations. This survey explores recent advancements in optimizing VLMs for edge environments, focusing on model compression techniques, including pruning, quantization, knowledge distillation, and specialized hardware solutions that enhance efficiency. We provide a detailed discussion of efficient training and fine-tuning methods, edge deployment challenges, and privacy considerations. Additionally, we discuss the diverse applications of lightweight VLMs across healthcare, environmental monitoring, and autonomous systems, illustrating their growing impact. By highlighting key design strategies, current challenges, and offering recommendations for future directions, this survey aims to inspire further research into the practical deployment of VLMs, ultimately making advanced AI accessible in resource-limited settings.

Updated: 2025-02-11 14:04:43

标题: 边缘网络中的视觉语言模型:一项全面调查

摘要: 视觉大型语言模型(VLMs)将视觉理解与自然语言处理相结合,实现了诸如图像描述、视觉问答和视频分析等任务。虽然VLMs在自动驾驶车辆、智能监控和医疗保健等领域展现出令人印象深刻的能力,但由于处理能力、内存和能源限制,它们在资源受限的边缘设备上的部署仍然具有挑战性。本调查探讨了优化VLMs用于边缘环境的最新进展,重点关注模型压缩技术,包括剪枝、量化、知识蒸馏和专门的硬件解决方案,以提高效率。我们提供了对高效训练和微调方法、边缘部署挑战和隐私考虑的详细讨论。此外,我们讨论了轻量级VLMs在医疗保健、环境监测和自主系统等领域的多样应用,展示了它们不断增长的影响力。通过强调关键设计策略、当前挑战并提出未来方向的建议,本调查旨在激发对VLMs实际部署的进一步研究,最终使先进人工智能在资源有限的环境中可及。

更新时间: 2025-02-11 14:04:43

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2502.07855v1

Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning

Visuomotor imitation learning enables embodied agents to effectively acquire manipulation skills from video demonstrations and robot proprioception. However, as scene complexity and visual distractions increase, existing methods that perform well in simple scenes tend to degrade in performance. To address this challenge, we introduce Imit Diff, a semantics-guided diffusion transformer with dual resolution fusion for imitation learning. Our approach leverages prior knowledge from vision language foundation models to translate high-level semantic instruction into pixel-level visual localization. This information is explicitly integrated into a multi-scale visual enhancement framework, constructed with a dual resolution encoder. Additionally, we introduce an implementation of Consistency Policy within the diffusion transformer architecture to improve both real-time performance and motion smoothness in embodied agent control. We evaluate Imit Diff on several challenging real-world tasks. Due to its task-oriented visual localization and fine-grained scene perception, it significantly outperforms state-of-the-art methods, especially in complex scenes with visual distractions, including zero-shot experiments focused on visual distraction and category generalization. The code will be made publicly available.

Updated: 2025-02-11 14:03:57

标题: Imit Diff:具有双分辨率融合的语义引导扩散变压器,用于模仿学习

摘要: 视觉运动模仿学习使具身智能体能够有效地从视频演示和机器人本体感知中获取操作技能。然而,随着场景复杂度和视觉干扰的增加,在简单场景中表现良好的现有方法往往性能下降。为了解决这一挑战,我们提出了Imit Diff,一种具有双分辨率融合的语义引导扩散变换器,用于模仿学习。我们的方法利用视觉语言基础模型的先验知识,将高层语义指令转化为像素级视觉定位。这些信息被显式地整合到一个由双分辨率编码器构建的多尺度视觉增强框架中。此外,我们在扩散变换器架构中引入了一致性策略(Consistency Policy)的实现,以提高具身智能体控制的实时性能和运动平滑性。我们在多个具有挑战性的真实世界任务上评估了Imit Diff。由于其面向任务的视觉定位和细粒度场景感知,它显著优于最先进的方法,尤其是在具有视觉干扰的复杂场景中,包括针对视觉干扰和类别泛化的零样本实验。代码将公开发布。

更新时间: 2025-02-11 14:03:57

领域: cs.AI,cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2502.09649v1

ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters

Recent advances in large language models (LLMs) have shown remarkable performance across diverse tasks. However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference. This paper introduces ChameleonLLM, a novel framework that enables inference-time adaptation of LLMs by leveraging batch-aware clustering and on-the-fly generation of low-rank updates. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA) or methods that rely on a fixed set of pre-learned uniforms (changeable masks), our method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of clustered batches. By intelligently grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChameleonLLM achieves significant performance gains, outperforming conventional LoRA methods while eliminating the overhead of maintaining multiple expert models. Our experiments highlight the potential of our approach to serve as a versatile and highly adaptive solution for language model inference. ChameleonLLM is open-sourced to ensure the reproducibility of our experiments: https://anonymous.4open.science/r/ChamaleonLLM/
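
A hedged sketch of the batch-aware mechanism as described: cluster the batch's hidden states, let a (hypothetical) hyper-network map each centroid to low-rank factors, and apply the resulting on-the-fly update to a frozen decoder weight. The shapes and the toy k-means below are illustrative only.

```python
import torch, torch.nn as nn

torch.manual_seed(0)
d, r, k = 64, 4, 3                       # hidden size, rank, #clusters
W = torch.randn(d, d) / d**0.5           # frozen decoder weight

# Hypothetical hyper-network: cluster centroid -> low-rank factors (A, B).
hyper = nn.Linear(d, 2 * d * r)

def kmeans(x, k, iters=10):
    c = x[torch.randperm(len(x))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, c).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                c[j] = x[assign == j].mean(0)
    return c, assign

h = torch.randn(32, d)                   # batch of hidden states at inference
centroids, assign = kmeans(h, k)

out = torch.empty_like(h)
for j in range(k):                       # one low-rank update per cluster
    ab = hyper(centroids[j])
    A = ab[: d * r].view(d, r)
    B = ab[d * r :].view(r, d)
    W_adapted = W + (A @ B) / r          # on-the-fly adapted weight
    out[assign == j] = h[assign == j] @ W_adapted.T

print(out.shape)
```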

Updated: 2025-02-11 14:01:39

标题: ChameleonLLM:通过推理时集群进行批处理感知动态低秩适应

摘要: 最近对大型语言模型(LLMs)的研究取得了显著进展,在各种任务中表现出色。然而,这些模型通常使用固定权重部署,这限制了它们在推理过程中动态适应真实数据固有变异性的能力。本文介绍了ChameleonLLM,这是一个新颖的框架,通过利用批处理感知聚类和实时生成低秩更新,实现了LLMs在推理时的自适应。与传统的微调方法(如LoRA)或依赖于固定的预先学习统一(可变掩码)的方法不同,我们的方法基于聚类批次的聚合统计数据动态生成适应性修改解码器权重。通过智能地将相似输入分组,并通过超网络计算上下文感知低秩更新,ChameleonLLM实现了显著的性能提升,优于传统的LoRA方法,同时消除了维护多个专家模型的开销。我们的实验突出了我们的方法作为语言模型推理的多功能和高度自适应解决方案的潜力。ChameleonLLM是开源的,以确保我们实验的可重复性:https://anonymous.4open.science/r/ChamaleonLLM/

更新时间: 2025-02-11 14:01:39

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.04315v3

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid

Linear sequence modeling approaches, such as linear attention, provide advantages like linear-time training and constant-memory inference over sequence lengths. However, existing sequence parallelism (SP) methods are either not optimized for the right-product-first feature of linear attention or use a ring-style communication strategy, which results in lower computation parallelism and limits their scalability for longer sequences in distributed systems. In this paper, we introduce LASP-2, a new SP method to enhance both communication and computation parallelism when training linear attention transformer models with very-long input sequences. Compared to the previous work LASP, LASP-2 rethinks the minimal communication requirement for SP on linear attention layers and reorganizes the whole communication-computation workflow of LASP. In this way, only one single AllGather collective communication is needed on intermediate memory states, whose sizes are independent of the sequence length, leading to significant improvements of both communication and computation parallelism, as well as their overlap. Additionally, we extend LASP-2 to LASP-2H by applying a similar communication redesign to standard attention modules, offering an efficient SP solution for hybrid models that blend linear and standard attention layers. Our evaluation on a Linear-Llama3 model, a variant of Llama3 with linear attention replacing standard attention, demonstrates the effectiveness of LASP-2 and LASP-2H. Specifically, LASP-2 achieves training speed improvements of 15.2% over LASP and 36.6% over Ring Attention, with a sequence length of 2048K across 64 GPUs. The code is released as part of: https://github.com/OpenSparseLLMs/Linear-MoE.

Updated: 2025-02-11 14:01:39

标题: LASP-2:重新思考线性注意力的序列并行性及其混合

摘要: 线性序列建模方法,如线性注意力,具有线性时间训练和常数内存推理等优势,适用于不同长度的序列。然而,现有的序列并行(SP)方法要么没有针对线性注意力的右乘优先(right-product-first)特性进行优化,要么使用环形通信策略,导致较低的计算并行性,限制了它们在分布式系统中对更长序列的可扩展性。在本文中,我们介绍了LASP-2,一种新的SP方法,用于在训练具有非常长输入序列的线性注意力变换器模型时增强通信和计算并行性。与之前的工作LASP相比,LASP-2重新思考了线性注意力层上的SP的最小通信要求,重新组织了整个LASP的通信-计算工作流程。通过这种方式,在中间存储状态上只需要一个单一的AllGather集体通信,其大小与序列长度无关,大大提高了通信和计算并行性,以及它们的重叠。此外,我们将LASP-2扩展为LASP-2H,通过将类似的通信重设计应用于标准注意力模块,为混合模型提供了有效的SP解决方案,这些模型将线性和标准注意力层结合在一起。我们在Linear-Llama3模型上进行评估,这是Llama3的一个变体,其中线性注意力取代了标准注意力,证明了LASP-2和LASP-2H的有效性。具体而言,在64个GPU、序列长度为2048K的设置下,LASP-2的训练速度比LASP提高了15.2%,比Ring Attention提高了36.6%。代码作为 https://github.com/OpenSparseLLMs/Linear-MoE 的一部分发布。

更新时间: 2025-02-11 14:01:39

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2502.07563v1

LoRP-TTS: Low-Rank Personalized Text-To-Speech

Speech synthesis models convert written text into natural-sounding audio. While earlier models were limited to a single speaker, recent advancements have led to the development of zero-shot systems that generate realistic speech from a wide range of speakers using their voices as additional prompts. However, they still struggle with imitating non-studio-quality samples that differ significantly from the training datasets. In this work, we demonstrate that utilizing Low-Rank Adaptation (LoRA) allows us to successfully use even single recordings of spontaneous speech in noisy environments as prompts. This approach enhances speaker similarity by up to $30pp$ while preserving content and naturalness. It represents a significant step toward creating truly diverse speech corpora, that is crucial in all speech-related tasks.
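
The paper builds on standard LoRA; below is a minimal generic LoRA wrapper for a frozen linear layer (the TTS model itself is not reproduced), showing how only the low-rank factors receive gradients when adapting to, say, a single noisy prompt recording.

```python
import torch, torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update W + (alpha/r) B A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(256, 256))
print(layer(torch.randn(2, 256)).shape)      # only A and B receive gradients
```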

Updated: 2025-02-11 14:00:12

标题: LoRP-TTS:低秩个性化文本转语音

摘要: 语音合成模型将书面文本转换为自然声音的音频。虽然早期模型仅限于单个发言人,但最近的进展导致零样本系统的开发,这些系统可以使用各种发言人的声音作为额外提示生成逼真的语音。然而,它们仍然难以模仿与训练数据集显著不同的非工作室质量样本。在这项工作中,我们展示了利用低秩适应(LoRA)使我们能够成功地使用甚至在嘈杂环境中的单个自发语音录音作为提示。这种方法在保留内容和自然性的同时,将发言者相似度提高了高达$30pp$。这代表了朝着创建真正多样化的语音语料库迈出的重要一步,这在所有与语音相关的任务中都是至关重要的。

更新时间: 2025-02-11 14:00:12

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2502.07562v1

The Causal Information Bottleneck and Optimal Causal Variable Abstractions

To effectively study complex causal systems, it is often useful to construct abstractions of parts of the system by discarding irrelevant details while preserving key features. The Information Bottleneck (IB) method is a widely used approach to construct variable abstractions by compressing random variables while retaining predictive power over a target variable. Traditional methods like IB are purely statistical and ignore underlying causal structures, making them ill-suited for causal tasks. We propose the Causal Information Bottleneck (CIB), a causal extension of the IB, which compresses a set of chosen variables while maintaining causal control over a target variable. This method produces abstractions of (sets of) variables which are causally interpretable, give us insight about the interactions between the abstracted variables and the target variable, and can be used when reasoning about interventions. We present experimental results demonstrating that the learned abstractions accurately capture causal relations as intended.
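
Schematically, and hedging that the paper's exact objective may differ, the causal extension can be read against the standard IB Lagrangian:

```latex
% Standard Information Bottleneck: compress X into T while predicting Y.
\min_{q(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)
% Causal variant (schematic): retain causal *control* rather than mere
% prediction, replacing the observational term with an interventional one,
\min_{q(t \mid x)} \; I(X; T) \;-\; \beta \, I\bigl(T; Y \mid \mathrm{do}(\cdot)\bigr),
% so that the abstraction T stays informative about how interventions on
% the compressed variables affect the target Y.
```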

Updated: 2025-02-11 13:59:11

标题: 因果信息瓶颈和最佳因果变量抽象化

摘要: 为了有效地研究复杂的因果系统,通常有必要构建系统部分的抽象,通过舍弃不相关的细节而保留关键特征。信息瓶颈(IB)方法是一种广泛使用的方法,通过压缩随机变量来构建变量抽象,同时保留对目标变量的预测能力。传统方法如IB纯粹是统计的,忽略了潜在的因果结构,使它们不适合于因果任务。我们提出了因果信息瓶颈(CIB),这是IB的因果扩展,它在保持对目标变量的因果控制的同时压缩一组选择的变量。这种方法产生的(一组)变量抽象是因果可解释的,为我们提供了关于抽象变量和目标变量之间相互作用的见解,并且可以在推理干预时使用。我们呈现实验结果,证明学习到的抽象准确捕捉了预期的因果关系。

更新时间: 2025-02-11 13:59:11

领域: cs.LG,cs.AI,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2410.00535v3

Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces

Abrupt and unexpected terminations of software are termed as software crashes. They can be challenging to analyze. Finding the root cause requires extensive manual effort and expertise to connect information sources like stack traces, source code, and logs. Typical approaches to fault localization require either test failures or source code. Crashes occurring in production environments, such as that of SAP HANA, provide solely crash logs and stack traces. We present a novel approach to localize faults based only on the stack trace information and no additional runtime information, by fine-tuning large language models (LLMs). We address complex cases where the root cause of a crash differs from the technical cause, and is not located in the innermost frame of the stack trace. As the number of historic crashes is insufficient to fine-tune LLMs, we augment our dataset by leveraging code mutators to inject synthetic crashes into the code base. By fine-tuning on 64,369 crashes resulting from 4.1 million mutations of the HANA code base, we can correctly predict the root cause location of a crash with an accuracy of 66.9% while baselines only achieve 12.6% and 10.6%. We substantiate the generalizability of our approach by evaluating on two additional open-source databases, SQLite and DuckDB, achieving accuracies of 63% and 74%, respectively. Across all our experiments, fine-tuning consistently outperformed prompting non-finetuned LLMs for localizing faults in our datasets.
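
A sketch of the data-construction step implied by the abstract: inject a mutation, run the tests, and serialize the resulting stack trace into a (prompt, completion) fine-tuning pair whose target is the mutated location rather than the innermost frame. File names and the prompt format are hypothetical.

```python
import json

def make_example(stack_frames, mutated_file, mutated_line):
    """Build one supervised pair: crash stack trace -> root-cause location.

    stack_frames: list of (file, line, function) from innermost to outermost,
    obtained by injecting a synthetic fault (code mutation) and crashing.
    """
    trace = "\n".join(
        f"#{i} {fn} at {f}:{ln}" for i, (f, ln, fn) in enumerate(stack_frames)
    )
    return {
        "prompt": "Stack trace:\n" + trace + "\nRoot cause location:",
        "completion": f" {mutated_file}:{mutated_line}",
    }

ex = make_example(
    [("executor.cc", 88, "run_plan"), ("query.cc", 412, "execute")],
    mutated_file="query.cc",
    mutated_line=409,   # root cause differs from the innermost frame
)
print(json.dumps(ex, indent=2))
```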

Updated: 2025-02-11 13:58:39

标题: 使用突变生成的堆栈跟踪对大型语言模型进行微调的故障定位

摘要: 软件的突然和意外终止被称为软件崩溃。它们在分析过程中可能具有挑战性。要找到根本原因需要大量的手动工作和专业知识,以连接诸如堆栈跟踪、源代码和日志等信息源。典型的故障定位方法要求测试失败或源代码。在生产环境中发生的崩溃,比如SAP HANA的环境,只提供崩溃日志和堆栈跟踪。我们提出了一种基于堆栈跟踪信息而无需额外运行时信息的故障定位新方法,通过对大型语言模型(LLMs)进行微调。我们解决了根本原因与技术原因不同,并且不位于堆栈跟踪最内部帧的复杂情况。由于历史崩溃数量不足以微调LLMs,我们通过利用代码变异器向数据集中注入合成崩溃来增加数据集。通过对从HANA代码库的410万次变异导致的64369次崩溃进行微调,我们可以以66.9%的准确率正确预测崩溃的根本原因位置,而基准值仅达到12.6%和10.6%。我们通过在另外两个开源数据库SQLite和DuckDB上进行评估,分别实现了63%和74%的准确率,证实了我们方法的普适性。在我们的所有实验中,微调始终优于未经微调的LLMs来定位我们数据集中的故障。

更新时间: 2025-02-11 13:58:39

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2501.18005v3

An analysis of data variation and bias in image-based dermatological datasets for machine learning classification

AI algorithms have become valuable in aiding professionals in healthcare. The increasing confidence obtained by these models is helpful in critical decision demands. In clinical dermatology, classification models can detect malignant lesions on patients' skin using only RGB images as input. However, most learning-based methods employ data acquired from dermoscopic datasets on training, which are large and validated by a gold standard. Clinical models aim to deal with classification on users' smartphone cameras that do not provide the resolution of dermoscopy. Clinical applications also bring new challenges: captures from uncontrolled environments, skin tone variations, viewpoint changes, noise in data and labels, and unbalanced classes. A possible alternative would be to use transfer learning to deal with the clinical images. However, as the number of samples is low, this can degrade the model's performance, since the source distribution used in training differs from the test set. This work aims to evaluate the gap between dermoscopic and clinical samples and understand how the dataset variations impact training. It assesses the main differences between distributions that disturb the model's prediction. Finally, from experiments on different architectures, we discuss how to combine data from divergent distributions, decreasing the impact on the model's final accuracy.

Updated: 2025-02-11 13:55:01

标题: 一个关于基于图像的皮肤病数据集中数据变异和偏差的分析,用于机器学习分类

摘要: 人工智能算法已经成为卫生保健专业人员的有价值的辅助工具。这些模型带来的增加的信心对于关键决策需求是有帮助的。在临床皮肤病学中,分类模型可以使用仅RGB图像作为输入来检测患者皮肤上的恶性损伤。然而,大多数基于学习的方法使用从皮肤镜数据集中获得的数据进行训练,这些数据集庞大且由黄金标准验证。临床模型旨在处理用户智能手机相机上的分类,这些相机不包含皮肤镜所提供的相应分辨率。此外,临床应用带来了新的挑战。它可能包含来自不受控制环境的捕获、肤色变化、视角变化、数据和标签中的噪音以及不平衡的类别。一个可能的替代方案是使用迁移学习来处理临床图像。然而,由于样本数量较少,这可能会导致模型性能下降;训练中使用的源分布与测试集不同。这项工作旨在评估皮肤镜和临床样本之间的差距,并了解数据集变化如何影响训练。它评估了扰乱模型预测的主要差异。最后,通过对不同架构的实验,我们讨论如何将来自不同分布的数据结合起来,减少对模型最终准确性的影响。

更新时间: 2025-02-11 13:55:01

领域: cs.CV,cs.AI,I.5.4; J.3

下载: http://arxiv.org/abs/2501.08962v2

JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

Despite the implementation of safety alignment strategies, large language models (LLMs) remain vulnerable to jailbreak attacks, which undermine these safety guardrails and pose significant security threats. Some defenses have been proposed to detect or mitigate jailbreaks, but they are unable to withstand the test of time due to an insufficient understanding of jailbreak mechanisms. In this work, we investigate the mechanisms behind jailbreaks based on the Linear Representation Hypothesis (LRH), which states that neural networks encode high-level concepts as subspaces in their hidden representations. We define the toxic semantics in harmful and jailbreak prompts as toxic concepts and describe the semantics in jailbreak prompts that manipulate LLMs to comply with unsafe requests as jailbreak concepts. Through concept extraction and analysis, we reveal that LLMs can recognize the toxic concepts in both harmful and jailbreak prompts. However, unlike harmful prompts, jailbreak prompts activate the jailbreak concepts and alter the LLM output from rejection to compliance. Building on our analysis, we propose a comprehensive jailbreak defense framework, JBShield, consisting of two key components: jailbreak detection JBShield-D and mitigation JBShield-M. JBShield-D identifies jailbreak prompts by determining whether the input activates both toxic and jailbreak concepts. When a jailbreak prompt is detected, JBShield-M adjusts the hidden representations of the target LLM by enhancing the toxic concept and weakening the jailbreak concept, ensuring LLMs produce safe content. Extensive experiments demonstrate the superior performance of JBShield, achieving an average detection accuracy of 0.95 and reducing the average attack success rate of various jailbreak attacks to 2% from 61% across distinct LLMs.
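
Under the Linear Representation Hypothesis, concept directions are often estimated as differences of mean hidden states; the sketch below uses that common recipe (the paper's extraction may differ) to flag prompts that activate both the toxic and the jailbreak direction, and to steer representations for mitigation. All tensors are stand-ins for real LLM activations, and the thresholds are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 768

# Stand-ins for pooled hidden states of prompt sets (in practice taken
# from the target LLM): benign, harmful, and jailbreak prompts.
h_benign = torch.randn(200, d)
h_harmful = torch.randn(200, d) + 0.8
h_jail = torch.randn(200, d) - 0.8

# LRH-style concept directions: difference of means, normalized.
v_toxic = F.normalize(h_harmful.mean(0) - h_benign.mean(0), dim=0)
v_jail = F.normalize(h_jail.mean(0) - h_benign.mean(0), dim=0)

def detect(h, t_tox=0.3, t_jb=0.3):
    # A prompt is flagged as a jailbreak when it activates BOTH concepts.
    return bool(h @ v_toxic > t_tox) and bool(h @ v_jail > t_jb)

def mitigate(h, gamma=1.0):
    # Strengthen the toxic concept and weaken the jailbreak concept, so the
    # model treats the input as plainly harmful and refuses.
    return h + gamma * v_toxic - gamma * (h @ v_jail) * v_jail

h = h_jail[0] + h_harmful[0]
print(detect(h), mitigate(h).shape)
```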

Updated: 2025-02-11 13:50:50

标题: JBShield:通过激活的概念分析和操纵来防御大型语言模型免受越狱攻击

摘要: 尽管安全对齐策略已经实施,但大型语言模型(LLMs)仍然容易受到越狱攻击的威胁,这会削弱这些安全防护措施并构成重大安全威胁。一些防御措施已经被提出来检测或减轻越狱攻击,但由于对越狱机制的了解不足,这些防御措施无法经受时间的考验。在这项工作中,我们根据线性表示假设(LRH)研究了越狱背后的机制,该假设认为神经网络将高级概念编码为它们的隐藏表示中的子空间。我们将有害和越狱提示中的有毒语义定义为有毒概念,并描述越狱提示中操纵LLMs以遵从不安全请求的语义为越狱概念。通过概念提取和分析,我们发现LLMs可以识别有害和越狱提示中的有毒概念。然而,与有害提示不同,越狱提示会激活越狱概念,并将LLM的输出从拒绝改为遵从。基于我们的分析,我们提出了一个全面的越狱防御框架JBShield,包括两个关键组件:越狱检测JBShield-D和缓解JBShield-M。JBShield-D通过确定输入是否激活了有毒和越狱概念来识别越狱提示。当检测到越狱提示时,JBShield-M通过增强有毒概念和削弱越狱概念来调整目标LLM的隐藏表示,确保LLMs产生安全内容。大量实验证明了JBShield的卓越性能,平均检测准确率达到了0.95,并将各种越狱攻击的平均成功率从61%降低到2%。

更新时间: 2025-02-11 13:50:50

领域: cs.CR

下载: http://arxiv.org/abs/2502.07557v1

Safe Interval RRT* for Scalable Multi-Robot Path Planning in Continuous Space

In this paper, we consider the problem of Multi-Robot Path Planning (MRPP) in continuous space. The difficulty of the problem arises from the extremely large search space caused by the combinatorial nature of the problem and the continuous state space. We propose a two-level approach where the low level is a sampling-based planner, Safe Interval RRT* (SI-RRT*), that finds a collision-free trajectory for individual robots. The high level can use any method that resolves inter-robot conflicts; we employ two representative methods, Prioritized Planning (SI-CPP) and Conflict-Based Search (SI-CCBS). Experimental results show that SI-RRT* can quickly find a high-quality solution with a few samples. SI-CPP exhibits improved scalability while SI-CCBS produces higher-quality solutions compared to the state-of-the-art planners for continuous space.

Updated: 2025-02-11 13:46:36

标题: 在连续空间中可扩展多机器人路径规划的安全间隔RRT*

摘要: 在这篇论文中,我们考虑了连续空间中多机器人路径规划(MRPP)的问题。问题的困难在于由于问题的组合性质和连续状态空间引起的非常庞大的搜索空间。我们提出了一个两级方法,其中低级是基于采样的规划器Safe Interval RRT*(SI-RRT*),用于为单个机器人找到无碰撞的轨迹。高级可以使用任何可以解决机器人之间冲突的方法,我们采用了两种代表性方法,即优先规划(SI-CPP)和基于冲突的搜索(SI-CCBS)。实验结果表明,SI-RRT*可以快速找到高质量的解决方案,只需少量样本。SI-CPP展现出更好的可扩展性,而SI-CCBS与连续空间的最先进规划器相比产生更高质量的解决方案。

更新时间: 2025-02-11 13:46:36

领域: cs.RO,cs.AI,cs.MA

下载: http://arxiv.org/abs/2404.01752v3

Attention Learning is Needed to Efficiently Learn Parity Function

Transformers, with their attention mechanisms, have emerged as the state-of-the-art architectures of sequential modeling and empirically outperform feed-forward neural networks (FFNNs) across many fields, such as natural language processing and computer vision. However, their generalization ability, particularly for low-sensitivity functions, remains less studied. We bridge this gap by analyzing transformers on the $k$-parity problem. Daniely and Malach (NeurIPS 2020) show that FFNNs with one hidden layer and $O(nk^7 \log k)$ parameters can learn $k$-parity, where the input length $n$ is typically much larger than $k$. In this paper, we prove that FFNNs require at least $\Omega(n)$ parameters to learn $k$-parity, while transformers require only $O(k)$ parameters, surpassing the theoretical lower bound needed by FFNNs. We further prove that this parameter efficiency cannot be achieved with fixed attention heads. Our work establishes transformers as theoretically superior to FFNNs in learning parity function, showing how their attention mechanisms enable parameter-efficient generalization in functions with low sensitivity.
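
For concreteness, the $k$-parity target can be written as:

```latex
% The k-parity problem: for a hidden subset S \subseteq [n] with |S| = k,
% classify x \in \{\pm 1\}^n by the product of the coordinates in S,
f_S(x) \;=\; \prod_{i \in S} x_i \;\in\; \{\pm 1\},
% equivalently the XOR of k bits; the learner must identify S from samples.
```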

Updated: 2025-02-11 13:41:30

标题: 需要注意力学习才能有效地学习奇偶函数

摘要: 使用他们的注意机制,变压器已成为顺序建模的最先进架构,并在许多领域(如自然语言处理和计算机视觉)中经验性地优于前馈神经网络(FFNNs)。然而,它们的泛化能力,特别是对于低敏感性函数,仍然少有研究。我们通过在$k$-奇偶问题上分析变压器来填补这一空白。Daniely和Malach(NeurIPS 2020)表明,具有一个隐藏层和$O(nk^7\log k)$参数的FFNN可以学习$k$-奇偶,其中输入长度$n$通常远大于$k$。在本文中,我们证明FFNN至少需要$\Omega(n)$个参数来学习$k$-奇偶,而变压器只需要$O(k)$个参数,超过了FFNN所需的理论下限。我们进一步证明,这种参数效率无法通过固定注意头来实现。我们的工作确立了变压器在学习奇偶函数方面在理论上优于FFNN,展示了它们的注意机制如何在低敏感性函数中实现参数高效的泛化。

更新时间: 2025-02-11 13:41:30

领域: cs.LG

下载: http://arxiv.org/abs/2502.07553v1

Unsupervised Translation of Emergent Communication

Emergent Communication (EC) provides a unique window into the language systems that emerge autonomously when agents are trained to jointly achieve shared goals. However, it is difficult to interpret EC and evaluate its relationship with natural languages (NL). This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. Our findings demonstrate UNMT's potential to translate EC, illustrating that task complexity characterized by semantic diversity enhances EC translatability, while higher task complexity with constrained semantic variability exhibits pragmatic EC, which, although challenging to interpret, remains suitable for translation. This research marks the first attempt, to our knowledge, to translate EC without the aid of parallel data.

Updated: 2025-02-11 13:41:06

标题: 无监督的新兴通信翻译

摘要: Emergent Communication (EC)提供了一个独特的窗口,让我们可以观察到在代理人被训练以共同实现共享目标时自发产生的语言系统。然而,解释EC并评估它与自然语言(NL)之间的关系是困难的。本研究采用了无监督神经机器翻译(UNMT)技术来解读在受语义环境影响而形成的具有不同任务复杂性的指称游戏中形成的EC。我们的研究结果表明UNMT有潜力翻译EC,说明由语义多样性影响的任务复杂性增强了EC的可翻译性,而具有受限语义变化的更高任务复杂性展现出实用的EC,尽管难以解释,但仍适合进行翻译。据我们所知,这项研究是第一次尝试在没有平行数据的情况下翻译EC。

更新时间: 2025-02-11 13:41:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07552v1

Early Stopping Against Label Noise Without Validation Data

Early stopping methods in deep learning face the challenge of balancing the volume of training and validation data, especially in the presence of label noise. Concretely, sparing more data for validation from training data would limit the performance of the learned model, yet insufficient validation data could result in a sub-optimal selection of the desired model. In this paper, we propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model in the presence of label noise. It works by tracking the changes in the model's predictions on the training set during the training process, aiming to halt training before the model unduly fits mislabeled data. This method is empirically supported by our observation that minimum fluctuations in predictions typically occur at the training epoch before the model excessively fits mislabeled data. Through extensive experiments, we show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
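
A hedged reading of the criterion in code: track how many training-set predictions flip between consecutive epochs and stop once the flip count has stopped reaching new minima. The patience rule below is an assumption; the paper's exact stopping test may differ.

```python
import numpy as np

def label_wave_stop(pred_history, patience=3):
    """Pick a stopping epoch from training-set predictions alone.

    pred_history: list of per-epoch predicted labels on the training set
    (each an np.ndarray of shape (n_train,)). We count examples whose
    prediction flips between consecutive epochs and stop once this count
    has not reached a new minimum for `patience` epochs -- a stand-in for
    the "minimum fluctuation" criterion, needing no validation data.
    """
    flips = [
        int((pred_history[e] != pred_history[e - 1]).sum())
        for e in range(1, len(pred_history))
    ]
    best, best_e = float("inf"), 0
    for e, f in enumerate(flips):
        if f < best:
            best, best_e = f, e
        elif e - best_e >= patience:
            return best_e + 1          # epoch index to stop at
    return len(pred_history) - 1

rng = np.random.default_rng(0)
hist = [rng.integers(0, 10, 1000) for _ in range(5)]
print(label_wave_stop(hist))
```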

Updated: 2025-02-11 13:40:15

标题: 没有验证数据的情况下早停止对抗标签噪声

摘要: 深度学习中的早停止方法面临着在训练和验证数据量之间达到平衡的挑战,尤其是在存在标签噪声的情况下。具体而言,从训练数据中保留更多数据用于验证会限制所学模型的性能,但验证数据不足可能导致选择所需模型的次优选择。在本文中,我们提出了一种名为Label Wave的新颖早停止方法,它在存在标签噪声的情况下不需要验证数据来选择所需模型。它通过跟踪模型在训练过程中对训练集的预测变化来工作,旨在在模型过度拟合错误标记数据之前停止训练。这种方法得到了我们经验观察的支持:预测的最小波动通常出现在模型过度拟合错误标记数据之前的训练轮次。通过大量实验,我们展示了Label Wave方法在各种设置中的有效性,以及其增强现有方法在存在噪声标签的学习中的性能的能力。

更新时间: 2025-02-11 13:40:15

领域: cs.LG

下载: http://arxiv.org/abs/2502.07551v1

HGTUL: A Hypergraph-based Model For Trajectory User Linking

Trajectory User Linking (TUL), which links anonymous trajectories with users who generate them, plays a crucial role in modeling human mobility. Despite significant advancements in this field, existing studies primarily neglect the high-order inter-trajectory relationships, which represent complex associations among multiple trajectories, manifested through multi-location co-occurrence patterns emerging when trajectories intersect at various Points of Interest (POIs). Furthermore, they also overlook the variable influence of POIs on different trajectories, as well as the user class imbalance problem caused by disparities in user activity levels and check-in frequencies. To address these limitations, we propose a novel HyperGraph-based multi-perspective Trajectory User Linking model (HGTUL). Our model learns trajectory representations from both relational and spatio-temporal perspectives: (1) it captures high-order associations among trajectories by constructing a trajectory hypergraph and leverages a hypergraph attention network to learn the variable impact of POIs on trajectories; (2) it models the spatio-temporal characteristics of trajectories by incorporating their temporal and spatial information into a sequential encoder. Moreover, we design a data balancing method to effectively address the user class imbalance problem and experimentally validate its significance in TUL. Extensive experiments on three real-world datasets demonstrate that HGTUL outperforms state-of-the-art baselines, achieving improvements of 2.57%~20.09% and 5.68%~26.00% in ACC@1 and Macro-F1 metrics, respectively.

Updated: 2025-02-11 13:39:35

标题: HGTUL:一种基于超图的轨迹用户链接模型

摘要: Trajectory User Linking (TUL)是将匿名轨迹与生成它们的用户联系起来,在建模人类移动性方面起着至关重要的作用。尽管在这一领域取得了显著进展,但现有研究主要忽视了高阶轨迹间关系,这些关系代表了多个轨迹之间的复杂关联,表现为轨迹在各种兴趣点(POIs)相交时出现的多位置共现模式。此外,它们还忽视了POIs对不同轨迹的可变影响,以及由用户活动水平和签到频率不平衡引起的用户类别不平衡问题。为了解决这些限制,我们提出了一种新颖的基于超图的多视角轨迹用户链接模型(HGTUL)。我们的模型从关系和时空角度学习轨迹表示:(1)通过构建轨迹超图捕捉轨迹之间的高阶关联,并利用超图注意力网络学习POIs对轨迹的可变影响;(2)通过将轨迹的时间和空间信息纳入到顺序编码器中,模拟轨迹的时空特性。此外,我们设计了一种数据平衡方法,有效解决用户类别不平衡问题,并在TUL中进行了实验证明其重要性。在三个真实世界数据集上进行的大量实验表明,HGTUL优于最先进的基线模型,在ACC@1和Macro-F1指标上分别实现了2.57%~20.09%和5.68%~26.00%的改进。

更新时间: 2025-02-11 13:39:35

领域: cs.LG,cs.AI,68-07,I.2.6

下载: http://arxiv.org/abs/2502.07549v1

Scalable and Robust Physics-Informed Graph Neural Networks for Water Distribution Systems

Water distribution systems (WDSs) are an important part of critical infrastructure that is becoming increasingly significant in the face of climate change and urban population growth. We propose a robust and scalable surrogate deep learning (DL) model to enable efficient planning, expansion, and rehabilitation of WDSs. Our approach incorporates an improved graph neural network architecture, an adapted physics-informed algorithm, an innovative training scheme, and a physics-preserving data normalization method. Evaluation results on a number of WDSs demonstrate that our model outperforms the current state-of-the-art DL model. Moreover, our method allows us to scale the model to bigger and more realistic WDSs. Furthermore, our approach makes the model more robust to out-of-distribution input features (demands, pipe diameters). Hence, our proposed method constitutes a significant step towards bridging the simulation-to-real gap in the use of artificial intelligence for WDSs.

Updated: 2025-02-11 13:38:14

标题: 可扩展和稳健的物理信息图神经网络用于水配系统

摘要: 水配送系统(WDSs)是关键基础设施的重要组成部分,在气候变化和城市人口增长的背景下变得越来越重要。我们提出了一种稳健且可伸缩的替代深度学习(DL)模型,以实现水配送系统的高效规划、扩建和修复。我们的方法包括改进的图神经网络架构、适应物理知识的算法、创新的训练方案以及保持物理特性的数据归一化方法。在多个WDSs的评估结果表明,我们的模型优于当前最先进的DL模型。此外,我们的方法使我们能够将模型扩展到更大更现实的WDSs。此外,我们的方法使模型更具鲁棒性,能够处理超出分布范围的输入特征(需求、管道直径)。因此,我们提出的方法朝着弥合人工智能应用于WDSs时模拟与现实之间的差距迈出了重要一步。

更新时间: 2025-02-11 13:38:14

领域: cs.NE,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2502.12164v1

NeoRL: Efficient Exploration for Nonepisodic RL

We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $O(\Gamma_T \sqrt{T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.

Updated: 2025-02-11 13:35:23

标题: NeoRL:非回合式强化学习的高效探索

摘要: 我们研究非记忆强化学习(RL)在非线性动态系统中的问题,其中系统动态未知,RL代理必须从单个轨迹中学习,即不进行重置。我们提出了Nonepisodic Optimistic RL(NeoRL),这是一种基于乐观主义原则的方法,面对不确定性。NeoRL使用良好校准的概率模型,并对未知动态的认知不确定性进行乐观规划。在对系统连续性和有界能量的假设下,我们为具有高斯过程动态的一般非线性系统提供了一种首次的遗憾上界,为$O(\Gamma_T \sqrt{T})$。我们将NeoRL与其他基线在多个深度RL环境中进行比较,并在实证上证明,NeoRL在达到最佳平均成本的同时,遭受最少的遗憾。

更新时间: 2025-02-11 13:35:23

领域: cs.LG

下载: http://arxiv.org/abs/2406.01175v4

Instance-dependent Early Stopping

In machine learning practice, early stopping has been widely used to regularize models and can save computational costs by halting the training process when the model's performance on a validation set stops improving. However, conventional early stopping applies the same stopping criterion to all instances without considering their individual learning statuses, which leads to redundant computations on instances that are already well-learned. To further improve the efficiency, we propose an Instance-dependent Early Stopping (IES) method that adapts the early stopping mechanism from the entire training set to the instance level, based on the core principle that once the model has mastered an instance, the training on it should stop. IES considers an instance as mastered if the second-order differences of its loss value remain within a small range around zero. This offers a more consistent measure of an instance's learning status compared with directly using the loss value, and thus allows for a unified threshold to determine when an instance can be excluded from further backpropagation. We show that excluding mastered instances from backpropagation can increase the gradient norms, thereby accelerating the decrease of the training loss and speeding up the training process. Extensive experiments on benchmarks demonstrate that IES method can reduce backpropagation instances by 10%-50% while maintaining or even slightly improving the test accuracy and transfer learning performance of a model.
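
A minimal sketch of the stated criterion: an instance whose loss has near-zero second-order differences is treated as mastered and masked out of backpropagation. The epsilon and the use of only the latest epoch below are assumptions.

```python
import torch

def mastered_mask(loss_hist, eps=1e-3):
    """loss_hist: (n_instances, T) per-epoch losses for each instance.

    An instance counts as mastered when the second-order difference of its
    loss, l_t - 2*l_{t-1} + l_{t-2}, stays within (-eps, eps); mastered
    instances are then excluded from further backpropagation.
    """
    d2 = loss_hist[:, 2:] - 2 * loss_hist[:, 1:-1] + loss_hist[:, :-2]
    return d2[:, -1].abs() < eps       # mask based on the latest epoch

# Usage inside a training step (sketch): zero out mastered instances' loss.
losses = torch.rand(8)                         # current per-instance losses
hist = torch.rand(8, 5)                        # recorded loss history
keep = ~mastered_mask(hist)
loss = (losses * keep).sum() / keep.sum().clamp(min=1)
print(keep)
```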

Updated: 2025-02-11 13:34:09

标题: 实例相关的提前停止

摘要: 在机器学习实践中,早停止被广泛用于正则化模型,并且可以通过在验证集上模型表现不再提升时停止训练过程来节省计算成本。然而,传统的早停止方法对所有实例应用相同的停止准则,而不考虑它们的个体学习状态,这导致对已经学习良好的实例进行冗余计算。为了进一步提高效率,我们提出了一种基于实例的早停止(IES)方法,根据核心原则,即一旦模型掌握了一个实例,应该停止对其进行训练,将整个训练集的早停止机制调整到实例级别。IES认为,如果一个实例的损失值的二阶差在零附近保持在一个较小的范围内,那么这个实例就被视为已经掌握。与直接使用损失值相比,这提供了一个更一致的度量实例学习状态的方法,从而允许使用统一的阈值确定何时可以将一个实例排除在进一步的反向传播之外。我们表明,从反向传播中排除掌握的实例可以增加梯度范数,从而加速训练损失的减少,加快训练过程。在基准测试中进行的大量实验表明,IES方法可以减少反向传播实例10%-50%,同时保持甚至略微提高模型的测试精度和迁移学习性能。

更新时间: 2025-02-11 13:34:09

领域: cs.LG

下载: http://arxiv.org/abs/2502.07547v1

Towards More Accurate Full-Atom Antibody Co-Design

Antibody co-design represents a critical frontier in drug development, where accurate prediction of both 1D sequence and 3D structure of complementarity-determining regions (CDRs) is essential for targeting specific epitopes. Despite recent advances in equivariant graph neural networks for antibody design, current approaches often fall short in capturing the intricate interactions that govern antibody-antigen recognition and binding specificity. In this work, we present Igformer, a novel end-to-end framework that addresses these limitations through innovative modeling of antibody-antigen binding interfaces. Our approach refines the inter-graph representation by integrating personalized propagation with global attention mechanisms, enabling comprehensive capture of the intricate interplay between local chemical interactions and global conformational dependencies that characterize effective antibody-antigen binding. Through extensive validation on epitope-binding CDR design and structure prediction tasks, Igformer demonstrates significant improvements over existing methods, suggesting that explicit modeling of multi-scale residue interactions can substantially advance computational antibody design for therapeutic applications.

Updated: 2025-02-11 13:33:28

标题: 朝着更精准的全原子抗体共同设计方向

摘要: 抗体共设计代表了药物开发中的一个关键前沿,其中准确预测互补决定区域(CDRs)的1D序列和3D结构对于瞄准特定表位至关重要。尽管最近在抗体设计方面取得了进展,但目前的方法往往无法捕捉决定抗体-抗原识别和结合特异性的复杂相互作用。在这项工作中,我们提出了Igformer,一种新颖的端到端框架,通过创新地建模抗体-抗原结合界面来解决这些局限性。我们的方法通过将个性化传播与全局注意机制相结合来完善图间表示,实现了对局部化学相互作用和全局构象依赖之间错综复杂的互动的全面捕捉,这些特征了有效的抗体-抗原结合。通过在表位结合CDR设计和结构预测任务上进行广泛验证,Igformer显示出明显的改进,表明明确建模多尺度残基相互作用可以大大推进计算抗体设计用于治疗应用。

更新时间: 2025-02-11 13:33:28

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2502.19391v1

UKTA: Unified Korean Text Analyzer

Evaluating writing quality is complex and time-consuming, often delaying feedback to learners. While automated writing evaluation tools are effective for English, Korean automated writing evaluation tools face challenges due to their inability to address multi-view analysis, error propagation, and evaluation explainability. To overcome these challenges, we introduce UKTA (Unified Korean Text Analyzer), a comprehensive Korean text analysis and writing evaluation system. UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores. Our approach improves accuracy and quadratic weighted kappa over existing baselines, positioning UKTA as a leading multi-perspective tool for Korean text analysis and writing evaluation.

Updated: 2025-02-11 13:30:56

标题: UKTA:统一韩文文本分析器

摘要: 评估写作质量是复杂且耗时的,通常会延迟对学习者的反馈。虽然自动写作评估工具在英语方面效果显著,但韩语自动写作评估工具面临多视角分析、错误传播和评估可解释性的挑战。为了克服这些挑战,我们引入了统一韩文文本分析器(UKTA),这是一个全面的韩文文本分析和写作评估系统。UKTA提供准确的低级语素分析,中级解释性的关键词汇特征,以及基于透明高级评分标准的写作评分。我们的方法在准确性和二次加权kappa值上均优于现有基线,使UKTA成为韩文文本分析和写作评估的领先多视角工具。

更新时间: 2025-02-11 13:30:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.09648v1

Exoplanet Transit Candidate Identification in TESS Full-Frame Images via a Transformer-Based Algorithm

The Transiting Exoplanet Survey Satellite (TESS) is surveying a large fraction of the sky, generating a vast database of photometric time series data that requires thorough analysis to identify exoplanetary transit signals. Automated learning approaches have been successfully applied to identify transit signals. However, most existing methods focus on the classification and validation of candidates, while few efforts have explored new techniques for the search of candidates. To search for new exoplanet transit candidates, we propose an approach to identify exoplanet transit signals without the need for phase folding or assuming periodicity in the transit signals, such as those observed in multi-transit light curves. To achieve this, we implement a new neural network inspired by Transformers to directly process Full Frame Image (FFI) light curves to detect exoplanet transits. Transformers, originally developed for natural language processing, have recently demonstrated significant success in capturing long-range dependencies compared to previous approaches focused on sequential data. This ability allows us to employ multi-head self-attention to identify exoplanet transit signals directly from the complete light curves, combined with background and centroid time series, without requiring prior transit parameters. The network is trained to learn characteristics of the transit signal, like the dip shape, which helps distinguish planetary transits from other variability sources. Our model successfully identified 214 new planetary system candidates, including 122 multi-transit light curves, 88 single-transit and 4 multi-planet systems from TESS sectors 1-26 with a radius > 0.27 $R_{\mathrm{Jupiter}}$, demonstrating its ability to detect transits regardless of their periodicity.
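
A schematic stand-in (not the paper's architecture) for a transformer that classifies transits directly from FFI light curves: the flux, background, and centroid series are patched, embedded, and pooled after multi-head self-attention.

```python
import torch, torch.nn as nn

class TransitDetector(nn.Module):
    """Minimal transformer over light-curve patches (flux + background +
    centroid channels), emitting a transit-vs-no-transit logit; a schematic
    stand-in for the paper's FFI pipeline, not its exact architecture."""
    def __init__(self, patch=20, ch=3, d=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(patch * ch, d)
        enc = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(d, 1)
        self.patch = patch

    def forward(self, x):                  # x: (batch, time, channels)
        b, t, c = x.shape
        x = x[:, : t - t % self.patch].reshape(b, -1, self.patch * c)
        z = self.encoder(self.embed(x))    # self-attention over patches
        return self.head(z.mean(dim=1))    # pooled transit logit

model = TransitDetector()
print(model(torch.randn(2, 1000, 3)).shape)  # (2, 1)
```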

Updated: 2025-02-11 13:29:58

标题: 基于Transformer算法在TESS全帧图像中识别系外行星凌日候选体

摘要: 凌日系外行星巡天卫星(TESS)正在对大部分天空进行调查,生成了一个庞大的光变时间序列数据库,需要深入分析以识别系外行星凌日信号。自动学习方法已成功应用于识别凌日信号。然而,大多数现有方法侧重于候选者的分类和验证,而很少有工作探索了寻找候选者的新技术。为了寻找新的系外行星凌日候选者,我们提出了一种方法,可以识别系外行星凌日信号,而无需进行相位折叠或假定凌日信号中的周期性,例如观测到的多次凌日光变曲线。为了实现这一目标,我们实现了一种受变压器启发的新神经网络,直接处理全帧图像(FFI)光变曲线,以检测系外行星凌日。变压器最初是为自然语言处理而开发的,最近证明了与之前侧重于顺序数据的方法相比,它在捕获长距离依赖性方面取得了显著成功。这种能力使我们能够利用多头自注意力来直接从完整光变曲线中识别系外行星凌日信号,结合背景和质心时间序列,而无需先验的凌日参数。该网络经过训练,学习了凌日信号的特征,如凹陷形状,有助于将行星凌日与其他变化源区分开。我们的模型成功识别了214个新的行星系统候选者,包括122个多次凌日光变曲线、88个单次凌日和4个多行星系统,其半径大于0.27倍木星半径,展示了其无论周期性如何都能检测凌日的能力。

更新时间: 2025-02-11 13:29:58

领域: astro-ph.EP,astro-ph.GA,astro-ph.IM,cs.AI

下载: http://arxiv.org/abs/2502.07542v1

Fast and Accurate Antibody Sequence Design via Structure Retrieval

Recent advancements in protein design have leveraged diffusion models to generate structural scaffolds, followed by a process known as protein inverse folding, which involves sequence inference on these scaffolds. However, these methodologies face significant challenges when applied to hyper-variable structures such as antibody Complementarity-Determining Regions (CDRs), where sequence inference frequently results in non-functional sequences due to hallucinations. Distinguished from prevailing protein inverse folding approaches, this paper introduces Igseek, a novel structure-retrieval framework that infers CDR sequences by retrieving similar structures from a natural antibody database. Specifically, Igseek employs a simple yet effective multi-channel equivariant graph neural network to generate high-quality geometric representations of CDR backbone structures. Subsequently, it aligns sequences of structurally similar CDRs and utilizes structurally conserved sequence motifs to enhance inference accuracy. Our experiments demonstrate that Igseek not only proves to be highly efficient in structural retrieval but also outperforms state-of-the-art approaches in sequence recovery for both antibodies and T-Cell Receptors, offering a new retrieval-based perspective for therapeutic protein design.

Updated: 2025-02-11 13:29:49

标题: 快速准确的抗体序列设计方法通过结构检索

摘要: 最近在蛋白质设计领域取得的进展已经利用扩散模型生成结构支架,然后进行一种称为蛋白质逆折叠的过程,该过程涉及对这些支架进行序列推断。然而,当应用于抗体互补决定区域(CDR)等高度可变结构时,这些方法面临着重大挑战,因为序列推断经常会导致由于幻觉而导致的非功能性序列。与目前的蛋白质逆折叠方法有所不同,本文介绍了Igseek,这是一个新颖的结构检索框架,通过从天然抗体数据库中检索类似结构来推断CDR序列。具体而言,Igseek采用简单而有效的多通道等变图神经网络生成CDR主干结构的高质量几何表示。随后,它通过对结构相似的CDR序列进行对齐,并利用结构保守的序列模体来增强推断准确性。我们的实验表明,Igseek不仅在结构检索方面高效,而且在抗体和T细胞受体的序列恢复方面优于最先进的方法,为治疗蛋白质设计提供了一种新的基于检索的视角。

更新时间: 2025-02-11 13:29:49

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2502.19395v1

PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models

Spatiotemporal trajectory data is crucial for various applications. However, issues such as device malfunctions and network instability often cause sparse trajectories, leading to lost detailed movement information. Recovering the missing points in sparse trajectories to restore the detailed information is thus essential. Despite recent progress, several challenges remain. First, the lack of large-scale dense trajectory data makes it difficult to train a trajectory recovery model from scratch. Second, the varying spatiotemporal correlations in sparse trajectories make it hard to generalize recovery across different sampling intervals. Third, the lack of location information complicates the extraction of road conditions for missing points. To address these challenges, we propose a novel trajectory recovery model called PLMTrajRec. It leverages the scalability of a pre-trained language model (PLM) and can be fine-tuned with only a limited set of dense trajectories. To handle different sampling intervals in sparse trajectories, we first convert each trajectory's sampling interval and movement features into natural language representations, allowing the PLM to recognize its interval. We then introduce a trajectory encoder to unify trajectories of varying intervals into a single interval and capture their spatiotemporal relationships. To obtain road conditions for missing points, we propose an area flow-guided implicit trajectory prompt, which models road conditions by collecting traffic flows in each region. We also introduce a road condition passing mechanism that uses observed points' road conditions to infer those of the missing points. Experiments on two public trajectory datasets with three sampling intervals each demonstrate the effectiveness, scalability, and generalization ability of PLMTrajRec.

Updated: 2025-02-11 13:28:10

标题: PLMTrajRec:一种可扩展且通用的轨迹恢复方法,具有预训练的语言模型

摘要: 时空轨迹数据对各种应用至关重要。然而,设备故障和网络不稳定等问题经常导致轨迹稀疏,从而丢失详细的移动信息。因此,恢复稀疏轨迹中丢失的点以恢复详细信息至关重要。尽管最近取得了一些进展,但仍存在几个挑战。首先,缺乏大规模密集轨迹数据使得从头开始训练轨迹恢复模型变得困难。其次,稀疏轨迹中不同的时空相关性使得跨不同采样间隔的恢复变得困难。第三,缺乏位置信息使得提取丢失点的道路条件变得复杂。 为了解决这些挑战,我们提出了一种新颖的轨迹恢复模型称为PLMTrajRec。它利用预训练语言模型(PLM)的可伸缩性,并且只需要有限量的密集轨迹进行微调。为了处理稀疏轨迹中不同的采样间隔,我们首先将每个轨迹的采样间隔和移动特征转换为自然语言表示,使得PLM能够识别其间隔。然后我们引入一个轨迹编码器将不同间隔的轨迹统一到一个单一间隔,并捕捉它们的时空关系。为了获取丢失点的道路条件,我们提出了一个区域流引导的隐式轨迹提示,通过收集每个区域的交通流量来建模道路条件。我们还引入了一个道路条件传递机制,利用观测点的道路条件推断丢失点的道路条件。在两个公共轨迹数据集上进行的实验,每个数据集有三种采样间隔,展示了PLMTrajRec的有效性、可伸缩性和泛化能力。

更新时间: 2025-02-11 13:28:10

领域: cs.LG

下载: http://arxiv.org/abs/2410.14281v2

MeMo: Meaningful, Modular Controllers via Noise Injection

Robots are often built from standardized assemblies (e.g., arms, legs, or fingers), but each robot must be trained from scratch to control all the actuators of all the parts together. In this paper, we demonstrate a new approach that takes a single robot and its controller as input and produces a set of modular controllers for each of these assemblies such that when a new robot is built from the same parts, its control can be quickly learned by reusing the modular controllers. We achieve this with a framework called MeMo which learns (Me)aningful, (Mo)dular controllers. Specifically, we propose a novel modularity objective to learn an appropriate division of labor among the modules. We demonstrate that this objective can be optimized simultaneously with standard behavior cloning loss via noise injection. We benchmark our framework in locomotion and grasping environments on simple to complex robot morphology transfer. We also show that the modules help in task transfer. On both structure and task transfer, MeMo achieves improved training efficiency over graph neural network and Transformer baselines.

Updated: 2025-02-11 13:27:59

标题: MeMo: 通过噪声注入实现有意义、模块化的控制器

摘要: 机器人通常由标准化的组件(例如手臂、腿或手指)组建,但每个机器人必须从头开始训练,以控制所有部件的所有执行器。在本文中,我们展示了一种新方法,该方法以单个机器人及其控制器为输入,并生成每个组件的一组模块化控制器,以便当使用相同部件构建新机器人时,可以通过重用模块化控制器快速学习其控制。我们通过一个名为MeMo的框架实现了这一点,该框架学习(Me)有意义的,(Mo)模块化控制器。具体来说,我们提出了一种新颖的模块化目标,以学习在模块之间适当的分工。我们展示了这一目标可以通过噪声注入同时与标准的行为克隆损失进行优化。我们在简单到复杂的机器人形态转移中,在行走和抓取环境中对我们的框架进行基准测试。我们还展示了模块在任务转移中的帮助。在结构和任务转移方面,MeMo实现了比图神经网络和Transformer基线更高的训练效率。

更新时间: 2025-02-11 13:27:59

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2407.01567v2

Physics-consistent machine learning: output projection onto physical manifolds

Data-driven machine learning models often require extensive datasets, which can be costly or inaccessible, and their predictions may fail to comply with established physical laws. Current approaches for incorporating physical priors mitigate these issues by penalizing deviations from known physical laws, as in physics-informed neural networks, or by designing architectures that automatically satisfy specific invariants. However, penalization approaches do not guarantee compliance with physical constraints for unseen inputs, and invariant-based methods lack flexibility and generality. We propose a novel physics-consistent machine learning method that directly enforces compliance with physical principles by projecting model outputs onto the manifold defined by these laws. This procedure ensures that predictions inherently adhere to the chosen physical constraints, improving reliability and interpretability. Our method is demonstrated on two systems: a spring-mass system and a low-temperature reactive plasma. Compared to purely data-driven models, our approach significantly reduces errors in physical law compliance, enhances predictive accuracy of physical quantities, and outperforms alternatives when working with simpler models or limited datasets. The proposed projection-based technique is versatile and can function independently or in conjunction with existing physics-informed neural networks, offering a powerful, general, and scalable solution for developing fast and reliable surrogate models of complex physical systems, particularly in resource-constrained scenarios.
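
The projection step has a closed form whenever the physical law can be written as a linear constraint A y = b: y* = y - A^T (A A^T)^{-1} (A y - b). A minimal NumPy sketch of that special case (toy constraint, not the paper's implementation):

import numpy as np

def project_onto_linear_manifold(y, A, b):
    # Euclidean projection of y onto {y : A y = b}.
    residual = A @ y - b
    correction = A.T @ np.linalg.solve(A @ A.T, residual)
    return y - correction

# Toy example: force two predicted momenta to sum to a known conserved total.
A = np.array([[1.0, 1.0]])          # constraint: p1 + p2 = p_total
b = np.array([3.0])
y_pred = np.array([1.2, 2.1])       # raw network output (sums to 3.3)
y_phys = project_onto_linear_manifold(y_pred, A, b)
print(y_phys, A @ y_phys)           # now sums to exactly 3.0

Nonlinear constraint manifolds would require an iterative projection instead, but the post-hoc structure is the same: predict first, then map the output to the nearest physically admissible point.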

Updated: 2025-02-11 13:18:19

标题: 物理一致的机器学习:输出投影到物理流形上

摘要: 数据驱动的机器学习模型通常需要大量数据集,这可能成本高昂或无法获取,并且它们的预测可能无法符合已建立的物理定律。当前的方法是通过惩罚与已知物理定律的偏差来融合物理先验,例如物理信息神经网络,或者通过设计能够自动满足特定不变量的架构。然而,惩罚方法不能保证对未知输入遵守物理约束,而基于不变性的方法缺乏灵活性和普适性。我们提出了一种新颖的物理一致的机器学习方法,通过将模型输出投影到由这些定律定义的流形上来直接强制执行与物理原则的一致性。这个过程确保预测固有地遵守所选的物理约束,提高了可靠性和可解释性。我们的方法在两个系统上进行了演示:一个弹簧-质量系统和一个低温反应等离子体。与纯数据驱动模型相比,我们的方法显著降低了物理定律遵从性的错误,提高了物理量的预测准确性,并在处理简单模型或有限数据集时胜过其他方法。所提出的基于投影的技术是多才多艺的,可以独立运作或与现有的物理信息神经网络结合使用,为开发快速可靠的复杂物理系统的替代模型提供了强大、普遍和可扩展的解决方案,特别是在资源受限的情况下。

更新时间: 2025-02-11 13:18:19

领域: cs.LG,cs.AI,physics.plasm-ph,68T07

下载: http://arxiv.org/abs/2502.15755v1

Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance

The increasing demand for efficient summarization tools in resource-constrained environments highlights the need for effective solutions. While large language models (LLMs) deliver superior summarization quality, their high computational resource requirements limit their practical application. In contrast, small language models (SLMs) present a more accessible alternative, capable of real-time summarization on edge devices. However, their summarization capabilities and comparative performance against LLMs remain underexplored. This paper addresses this gap by presenting a comprehensive evaluation of 19 SLMs for news summarization across 2,000 news samples, focusing on relevance, coherence, factual consistency, and summary length. Our findings reveal significant variations in SLM performance, with top-performing models such as Phi3-Mini and Llama3.2-3B-Ins achieving results comparable to those of 70B LLMs while generating more concise summaries. Notably, SLMs are better suited for simple prompts, as overly complex prompts may lead to a decline in summary quality. Additionally, our analysis indicates that instruction tuning does not consistently enhance the news summarization capabilities of SLMs. This research not only contributes to the understanding of SLMs but also provides practical insights for researchers seeking efficient summarization solutions that balance performance and resource use.

Updated: 2025-02-11 13:12:16

标题: 评估小语言模型用于新闻摘要:影响性能的因素和含义

摘要: 在资源受限环境中对高效摘要工具的需求不断增加,凸显了有效解决方案的必要性。尽管大型语言模型(LLMs)提供了更高质量的摘要能力,但其高计算资源需求限制了实际应用场景。相比之下,小型语言模型(SLMs)提供了一种更易接触的替代方案,能够在边缘设备上进行实时摘要。然而,它们的摘要能力以及与LLMs的比较性能仍未得到充分探讨。本文通过对19个SLMs在2,000个新闻样本上的新闻摘要进行全面评估,重点关注关联性、连贯性、事实一致性和摘要长度。研究结果显示,SLMs的性能存在显著差异,像Phi3-Mini和Llama3.2-3B-Ins这样的顶尖模型在生成更简洁摘要的同时,实现了与70B LLMs相当的结果。值得注意的是,SLMs更适合简单提示,过于复杂的提示可能会导致摘要质量下降。此外,我们的分析表明,指导调整并不一致地提升SLMs的新闻摘要能力。这项研究不仅有助于理解SLMs,还为寻求平衡性能与资源利用的高效摘要解决方案的研究人员提供了实用见解。

更新时间: 2025-02-11 13:12:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.00641v2

Advancing Heat Demand Forecasting with Attention Mechanisms: Opportunities and Challenges

Global leaders and policymakers are unified in their unequivocal commitment to decarbonization efforts in support of Net-Zero agreements. District Heating Systems (DHS), while contributing to carbon emissions due to the continued reliance on fossil fuels for heat production, are embracing more sustainable practices albeit with some sense of vulnerability as it could constrain their ability to adapt to dynamic demand and production scenarios. As demographic demands grow and renewables become the central strategy in decarbonizing the heating sector, the need for accurate demand forecasting has intensified. Advances in digitization have paved the way for Machine Learning (ML) based solutions to become the industry standard for modeling complex time series patterns. In this paper, we focus on building a Deep Learning (DL) model that uses deconstructed components of independent and dependent variables that affect heat demand as features to perform multi-step ahead forecasting of heat demand. The model represents the input features in a time-frequency space and uses an attention mechanism to generate accurate forecasts. The proposed method is evaluated on a real-world dataset and the forecasting performance is assessed against LSTM and CNN-based forecasting models. Across different supply zones, the attention-based models outperform the baselines quantitatively and qualitatively, with a Mean Absolute Error (MAE) of 0.105 (standard deviation 0.06 kWh) and a Mean Absolute Percentage Error (MAPE) of 5.4% (standard deviation 2.8%), compared with the second-best model's MAE of 0.10 (standard deviation 0.06 kWh) and MAPE of 5.6% (standard deviation 3%).

Updated: 2025-02-11 13:12:06

标题: 用注意力机制推进热需求预测:机遇与挑战

摘要: 全球领导人和政策制定者在坚定承诺支持净零协议的脱碳努力方面达成一致。尽管地区供热系统(DHS)由于继续依赖化石燃料进行供热而导致碳排放,但正在接受更加可持续的做法,尽管在某种程度上存在脆弱性,因为这可能会限制它们适应动态需求和生产情况的能力。随着人口需求增长和可再生能源成为脱碳供热行业的核心策略,对准确的需求预测的需求日益加剧。数字化技术的进步为基于机器学习(ML)的解决方案成为建模复杂时间序列模式的行业标准铺平了道路。在本文中,我们专注于构建一个深度学习(DL)模型,利用影响热需求的独立和因变量的分解组件作为特征,执行热需求的多步预测。该模型将输入特征表示为时间频率空间,并使用注意机制生成准确的预测。所提出的方法在真实数据集上进行评估,并将预测性能与基于LSTM和CNN的预测模型进行评估。在不同供应区域,基于注意力的模型在定量和定性上优于基线,其平均绝对误差(MAE)为0.105,标准差为0.06千瓦时,平均绝对百分比误差(MAPE)为5.4%,标准差为2.8%,与第二好的模型相比,其MAE为0.10,标准差为0.06千瓦时,MAPE为5.6%,标准差为3%。

更新时间: 2025-02-11 13:12:06

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2502.07854v1

Training Deep Learning Models with Norm-Constrained LMOs

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision.
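
For intuition (this is an illustration of the LMO primitive, not the paper's exact algorithm): over the Euclidean ball of radius r, the LMO of a gradient g returns -r g / ||g||, and the update steps toward that extreme point rather than along the raw gradient.

import numpy as np

def lmo_l2_ball(g, radius):
    # argmin over {s : ||s||_2 <= radius} of <g, s>.
    return -radius * g / (np.linalg.norm(g) + 1e-12)

rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])   # minimizer of f(w) = 0.5 ||w - w_star||^2
w = np.zeros(3)
for t in range(200):
    grad = (w - w_star) + 0.05 * rng.standard_normal(3)  # stochastic gradient
    w = w + (0.1 / np.sqrt(t + 1)) * lmo_l2_ball(grad, radius=1.0)
print(w)  # approaches w_star

Swapping in a different norm ball changes only the lmo function, which is how such methods adapt to the geometry of the problem.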

Updated: 2025-02-11 13:10:34

标题: 使用范数约束的LMOs训练深度学习模型

摘要: 在这项工作中,我们研究了利用线性最小化预言机(LMO)在范数球上的优化方法。我们提出了一种新的随机算法族,利用LMO来适应问题的几何结构,并且令人惊讶地表明它们可以应用于无约束问题。所得到的更新规则将几种现有的优化方法统一在一个框架下。此外,我们提出了一种深度架构的明确范数选择,作为一个副产品,这导致超参数在模型大小之间的可转移性。在实验中,我们展示了在无需依赖Adam的情况下,在nanoGPT训练中取得了显著加速。所提出的方法具有内存效率,只需要一个模型权重集和一个梯度集,可以以半精度存储。

更新时间: 2025-02-11 13:10:34

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2502.07529v1

Forecasting the future development in quality and value of professional football players for applications in team management

Transfers in professional football (soccer) are risky investments because of the large transfer fees and high uncertainty involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players' historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. Using a historical data set of data-driven indicators describing player quality and the transfer value of a football player, the models are trained to forecast player quality and player value one year ahead. These two prediction problems demonstrate the efficacy of tree-based models, particularly random forest and XGBoost, in making accurate predictions. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from its bagging procedure. Additionally, our research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series data can provide useful information for the modeling of player performance metrics. Our research provides models to help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value.
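
The bagging-based uncertainty quantification mentioned above is easy to reproduce: the spread of per-tree predictions yields an ensemble interval around the forest's mean forecast. A hedged scikit-learn sketch on synthetic stand-in data (all features and values illustrative):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for player indicators: X -> quality one year ahead.
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

x_new = rng.normal(size=(1, 8))
per_tree = np.array([tree.predict(x_new)[0] for tree in model.estimators_])
mean, lo, hi = per_tree.mean(), *np.percentile(per_tree, [5, 95])
print(f"forecast={mean:.2f}, 90% ensemble interval=({lo:.2f}, {hi:.2f})")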

Updated: 2025-02-11 13:09:09

标题: 预测专业足球运动员质量和价值未来发展,以应用于团队管理

摘要: 在职业足球转会中,由于涉及巨额转会费用和高风险,转会被视为一项风险投资。虽然可以利用数据驱动模型来改善转会决策,但现有模型主要集中在描述球员的历史进展,而未能预测他们的未来表现。此外,最近的发展要求利用可解释的模型结合预测不确定性量化。本文评估了基于预测准确性和不确定性量化方法的可解释机器学习模型,用于预测职业足球运动员质量和转会价值的未来发展。利用描述球员质量和转会价值的数据驱动指标的历史数据集,训练模型以预测一年后的球员质量和球员价值。这两个预测问题展示了基于树的模型,特别是随机森林和XGBoost,在进行准确预测方面的有效性。总体而言,随机森林模型被发现是最适合的模型,因为它提供了准确的预测以及自然产生于随机森林模型的装袋过程的不确定性量化方法。此外,我们的研究表明,球员表现的发展包含非线性模式和变量之间的相互作用,时间序列信息可以为建模球员表现指标提供有用信息。我们的研究提供了模型,帮助足球俱乐部通过预测球员质量和转会价值做出更为明智的、数据驱动的转会决策。

更新时间: 2025-02-11 13:09:09

领域: stat.AP,cs.LG

下载: http://arxiv.org/abs/2502.07528v1

NatureLM: Deciphering the Language of Nature for Scientific Discovery

Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce Nature Language Model (briefly, NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) achieving state-of-the-art performance in tasks like SMILES-to-IUPAC translation and retrosynthesis on USPTO-50k. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.

Updated: 2025-02-11 13:08:03

标题: NatureLM:解读自然的语言以助力科学发现

摘要: 基础模型已经彻底改变了自然语言处理和人工智能,显著提升了机器理解和生成人类语言的能力。受到这些基础模型成功的启发,研究人员开发了针对个别科学领域的基础模型,包括小分子、材料、蛋白质、DNA和RNA。然而,这些模型通常是孤立训练的,缺乏跨不同科学领域整合的能力。认识到这些领域内的实体都可以被表示为序列,这些序列共同构成了“自然语言”,我们介绍了Nature Language Model(简称NatureLM),这是一个基于序列的科学基础模型,旨在进行科学发现。通过预训练多个科学领域的数据,NatureLM提供了一个统一、多功能的模型,可以实现多种应用,包括:(i)使用文本说明生成和优化小分子、蛋白质、RNA和材料;(ii)跨领域生成/设计,如蛋白质到分子和蛋白质到RNA的生成;以及(iii)在SMILES-to-IUPAC翻译和USPTO-50k上的返向合成等任务中实现最先进的性能。NatureLM为各种科学任务提供了一种有前途的通用方法,包括药物发现(命中生成/优化、ADMET优化、合成)、新材料设计以及治疗蛋白质或核苷酸的开发。我们开发了不同规模的NatureLM模型(10亿、80亿和467亿参数),观察到随着模型规模的增加,性能明显提高。

更新时间: 2025-02-11 13:08:03

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07527v1

Overview of the Amphion Toolkit (v0.2)

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.

Updated: 2025-02-11 13:05:36

标题: Amphion工具包(v0.2)概述

摘要: Amphion是一个开源工具包,用于音频、音乐和语音生成,旨在降低初级研究人员和工程师在这些领域的准入门槛。它提供了一个多功能框架,支持各种生成任务和模型。在这份报告中,我们介绍了Amphion v0.2,这是在2024年开发的第二个重要版本。这个版本包括一个100K小时的开源多语言数据集,一个强大的数据准备流水线,以及用于文本转语音、音频编码和语音转换等任务的新颖模型。此外,报告还包括多个教程,指导用户如何使用新发布的模型的功能和用法。

更新时间: 2025-02-11 13:05:36

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2501.15442v2

Probabilistic Foundations for Metacognition via Hybrid-AI

Metacognition is the concept of reasoning about an agent's own internal processes, and it has recently received renewed attention with respect to artificial intelligence (AI) and, more specifically, machine learning systems. This paper reviews a hybrid-AI approach known as "error detecting and correcting rules" (EDCR) that allows for the learning of rules to correct perceptual (e.g., neural) models. Additionally, we introduce a probabilistic framework that adds rigor to prior empirical studies, and we use this framework to prove results on necessary and sufficient conditions for metacognitive improvement, as well as limits to the approach. A set of future

Updated: 2025-02-11 12:57:13

标题: 通过混合人工智能实现元认知的概率基础

摘要: 元认知是指智能体对自身内部过程进行推理的概念,最近在人工智能(AI),尤其是机器学习系统方面重新受到关注。本文回顾了一种名为“错误检测和校正规则”(EDCR)的混合AI方法,该方法允许学习规则来纠正感知(例如神经)模型。此外,我们引入了一个概率框架,为先前的实证研究增加了严谨性,并利用这一框架证明了元认知改进的必要和充分条件,以及该方法的局限性。

更新时间: 2025-02-11 12:57:13

领域: cs.AI

下载: http://arxiv.org/abs/2502.05398v2

Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization

Reinforcement learning has achieved significant milestones, but sample efficiency remains a bottleneck for real-world applications. Recently, CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. In this work, we explore CrossQ's scaling behavior with higher UTD ratios. We identify challenges in the training dynamics, which are emphasized by higher UTD ratios. To address these, we integrate weight normalization into the CrossQ framework, a solution that stabilizes training, has been shown to prevent potential loss of plasticity and keeps the effective learning rate constant. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive performance across 25 challenging continuous control tasks on the DeepMind Control Suite and Myosuite benchmarks, notably the complex dog and humanoid environments. This work eliminates the need for drastic interventions, such as network resets, and offers a simple yet robust pathway for improving sample efficiency and scalability in model-free reinforcement learning.

Updated: 2025-02-11 12:55:32

标题: 使用批处理和权重归一化对脱机策略强化学习进行扩展

摘要: 强化学习已经取得了重要的里程碑,但样本效率仍然是现实世界应用的瓶颈。最近,CrossQ展示了具有低更新数据比率(UTD)为1的最先进的样本效率。在这项工作中,我们探索了CrossQ在更高UTD比率下的扩展行为。我们发现训练动态中的挑战,这些挑战在更高的UTD比率下得到强调。为了解决这些问题,我们将权重归一化集成到CrossQ框架中,这是一个稳定训练的解决方案,已经被证明可以防止潜在的可塑性损失,并保持有效的学习率恒定。我们提出的方法可靠地随着UTD比率的增加扩展,在DeepMind控制套件和Myosuite基准测试中的25个具有挑战性的连续控制任务中取得了竞争性的表现,特别是复杂的狗和人形环境。这项工作消除了对网络重置等激烈干预的需求,并为改进无模型强化学习中的样本效率和可伸缩性提供了一条简单而强大的途径。

更新时间: 2025-02-11 12:55:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07523v1

Latent Linear Quadratic Regulator for Robotic Control Tasks

Model predictive control (MPC) has come to play an increasingly crucial role in various robotic control tasks, but its high computational requirements are concerning, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space, on which the dynamical model is linear and the cost function is quadratic, allowing the efficient application of LQR. We jointly learn this alternative system by imitating the original MPC. Experiments show LaLQR's superior efficiency and generalization compared to other baselines.
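
Once a latent linear model (A, B) and quadratic cost (Q, R) have been learned, the controller follows from the standard discrete-time Riccati equation. A SciPy sketch of that final step (illustrative matrices, assuming the encoder and latent model are already fitted):

import numpy as np
from scipy.linalg import solve_discrete_are

# Learned latent dynamics z' = A z + B u with cost z^T Q z + u^T R u.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.01]])

P = solve_discrete_are(A, B, Q, R)                  # Riccati solution
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain, u = -K z

z = np.array([1.0, 0.0])                            # encoded latent state
for _ in range(50):
    z = A @ z + B @ (-K @ z)
print(z)  # regulated toward the origin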

Updated: 2025-02-11 12:51:40

标题: 机器人控制任务的潜在线性二次调节器

摘要: 模型预测控制(MPC)在各种机器人控制任务中发挥着更为关键的作用,但其高计算要求令人担忧,特别是对于非线性动态模型。本文提出了一种将状态空间映射到潜在空间的潜在线性二次调节器(LaLQR),在该空间上,动态模型是线性的,成本函数是二次的,从而实现了对LQR的有效应用。我们通过模仿原始MPC来共同学习这种替代系统。实验证明LaLQR相对于其他基线具有更高的效率和泛化能力。

更新时间: 2025-02-11 12:51:40

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2407.11107v2

MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection

Diffusion models excel in generating images that closely resemble their training data but are also susceptible to data memorization, raising privacy, ethical, and legal concerns, particularly in sensitive domains such as medical imaging. We hypothesize that this memorization stems from the overparameterization of deep models and propose that regularizing model capacity during fine-tuning can mitigate this issue. Firstly, we empirically show that regulating the model capacity via Parameter-efficient fine-tuning (PEFT) mitigates memorization to some extent, however, it further requires the identification of the exact parameter subsets to be fine-tuned for high-quality generation. To identify these subsets, we introduce a bi-level optimization framework, MemControl, that automates parameter selection using memorization and generation quality metrics as rewards during fine-tuning. The parameter subsets discovered through MemControl achieve a superior tradeoff between generation quality and memorization. For the task of medical image generation, our approach outperforms existing state-of-the-art memorization mitigation strategies by fine-tuning as few as 0.019% of model parameters. Moreover, we demonstrate that the discovered parameter subsets are transferable to non-medical domains. Our framework is scalable to large datasets, agnostic to reward functions, and can be integrated with existing approaches for further memorization mitigation. To the best of our knowledge, this is the first study to empirically evaluate memorization in medical images and propose a targeted yet universal mitigation strategy. The code is available at https://github.com/Raman1121/Diffusion_Memorization_HPO.

Updated: 2025-02-11 12:41:08

标题: MemControl: 通过自动参数选择减轻扩散模型中的记忆效应

摘要: 扩散模型在生成与其训练数据密切相似的图像方面表现出色,但也容易受到数据记忆的影响,引发隐私、道德和法律方面的关注,特别是在医学成像等敏感领域。我们假设这种记忆源于深度模型的参数过多,并提出在微调过程中对模型容量进行正则化可以缓解这一问题。首先,我们通过经验验证,通过参数高效微调(PEFT)调节模型容量可以在一定程度上缓解记忆效应,但进一步需要确定哪些精确的参数子集需要进行高质量生成的微调。为了识别这些子集,我们引入了一个双层优化框架MemControl,该框架在微调过程中使用记忆和生成质量指标作为奖励来自动选择参数。通过MemControl发现的参数子集在生成质量和记忆效应之间实现了卓越的权衡。在医学图像生成任务中,我们的方法通过微调仅0.019%的模型参数就胜过现有最先进的记忆缓解策略。此外,我们证明发现的参数子集可以转移到非医学领域。我们的框架可扩展到大型数据集,不受奖励函数的限制,并可与现有方法集成以进一步减轻记忆效应。据我们所知,这是第一项在医学图像中经验评估记忆效应并提出有针对性但通用的缓解策略的研究。代码可在https://github.com/Raman1121/Diffusion_Memorization_HPO找到。

更新时间: 2025-02-11 12:41:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19458v4

Holistic Semantic Representation for Navigational Trajectory Generation

Trajectory generation has garnered significant attention from researchers in the field of spatio-temporal analysis, as it can generate substantial synthesized human mobility trajectories that enhance user privacy and alleviate data scarcity. However, existing trajectory generation methods often focus on improving trajectory generation quality from a singular perspective, lacking a comprehensive semantic understanding across various scales. Consequently, we are inspired to develop a HOlistic SEmantic Representation (HOSER) framework for navigational trajectory generation. Given an origin-and-destination (OD) pair and the starting time point of a latent trajectory, we first propose a Road Network Encoder to expand the receptive field of road- and zone-level semantics. Second, we design a Multi-Granularity Trajectory Encoder to integrate the spatio-temporal semantics of the generated trajectory at both the point and trajectory levels. Finally, we employ a Destination-Oriented Navigator to seamlessly integrate destination-oriented guidance. Extensive experiments on three real-world datasets demonstrate that HOSER outperforms state-of-the-art baselines by a significant margin. Moreover, the model's performance in few-shot learning and zero-shot learning scenarios further verifies the effectiveness of our holistic semantic representation.

Updated: 2025-02-11 12:38:35

标题: 整体语义表示用于导航轨迹生成

摘要: 轨迹生成在时空分析领域引起了研究人员的广泛关注,因为它可以生成大量的综合人类移动轨迹,增强用户隐私并减轻数据稀缺问题。然而,现有的轨迹生成方法通常侧重于从单一视角改善轨迹生成质量,缺乏跨不同尺度的全面语义理解。因此,我们受到启发,开发了一个用于导航轨迹生成的全面语义表示(HOSER)框架。给定一个起点和终点(OD)对以及潜在轨迹的起始时间点,我们首先提出了一个道路网络编码器,以扩展道路和区域级语义的感知范围。其次,我们设计了一个多粒度轨迹编码器,以整合生成轨迹在点和轨迹级别的时空语义。最后,我们采用了一个面向目的地的导航器,无缝集成目的地导向指导。对三个真实世界数据集的广泛实验表明,HOSER在很大程度上优于现有基线。此外,在少样本学习和零样本学习场景中模型的表现进一步验证了我们全面语义表示的有效性。

更新时间: 2025-02-11 12:38:35

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2501.02737v2

A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

We investigate various stochastic bandit problems in the presence of adversarial corruption. A seminal contribution to this area is the BARBAR algorithm (Gupta et al., 2019), which is both simple and efficient, tolerating significant levels of corruption with nearly no degradation in performance. However, its regret upper bound exhibits a complexity of $O(KC)$, while the lower bound is $\Omega(C)$. In this paper, we enhance the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ and achieves an optimal regret bound up to a logarithmic factor. We also demonstrate how BARBAT can be extended to various settings, including graph bandits, combinatorial semi-bandits, batched bandits and multi-agent bandits. In comparison to the Follow-The-Regularized-Leader (FTRL) family of methods, which provide a best-of-both-worlds guarantee, our approach is more efficient and parallelizable. Notably, FTRL-based methods face challenges in scaling to batched and multi-agent settings.

Updated: 2025-02-11 12:33:33

标题: 一个近乎最优、可扩展且容忍错误的随机赌博框架:从单智能体到多智能体及更远

摘要: 我们研究了在对抗性干扰存在的情况下的各种随机赌博问题。该领域的一个开创性的贡献是BARBAR算法,它既简单又高效,可以容忍相当程度的干扰而几乎不降低性能。然而,其遗憾上界的复杂度为$O(KC)$,而下界为$\Omega(C)$。在本文中,我们通过提出一个称为BARBAT的新框架来增强BARBAR算法,消除了$K$因子,并达到了一个最优的遗憾上界,最多具有对数因子。我们还演示了BARBAT如何扩展到各种设置,包括图赌博、组合半赌博、批处理赌博和多智能体赌博。与提供最佳保证的Follow-The-Regularized-Leader(FTRL)方法系列相比,我们的方法更高效、更可并行化。值得注意的是,基于FTRL的方法在扩展到批处理和多智能体设置方面面临挑战。

更新时间: 2025-02-11 12:33:33

领域: cs.LG

下载: http://arxiv.org/abs/2502.07514v1

Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets

Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, severely impacting the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features specifically relevant to downstream tasks, developing a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and achieves faster training convergence, particularly when working with limited data. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at https://github.com/wang1yuhang/domain_invariant_network.

Updated: 2025-02-11 12:32:59

标题: 通过跨越电子医疗记录数据集之间的数据分布转移实现领域不变的临床表示学习

摘要: 新兴疾病在症状识别和及时临床干预方面存在挑战,因为可用信息有限。有效的预后模型可以帮助医生做出准确诊断,并设计个性化治疗方案以预防不良结果。然而,在疾病出现的早期阶段,有几个因素妨碍模型的发展:数据收集有限,临床经验不足,隐私和伦理问题限制了数据的可用性,并使准确标签分配复杂化。此外,来自不同疾病或来源的电子医疗记录(EMR)数据经常表现出显著的跨数据集特征不对齐,严重影响深度学习模型的有效性。我们提出了一种领域不变表示学习方法,构建了一个在源数据集和目标数据集之间的过渡模型。通过限制在不同领域生成的特征的分布变化,我们捕捉了特定于下游任务的领域不变特征,开发了一个统一的领域不变编码器,在各种任务领域实现了更好的特征表示。跨多个目标任务的实验结果表明,我们提出的模型超越了竞争基线方法,并在使用有限数据时实现更快的训练收敛。广泛的实验证明了我们的方法在提供新兴大流行病和其他疾病的更准确预测方面的有效性。代码可在https://github.com/wang1yuhang/domain_invariant_network上公开获取。

更新时间: 2025-02-11 12:32:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.07799v3

RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection

Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each independently effective. As a result, previous methods cannot fully capture the comprehensive characteristics of the fraud graph. To address this dilemma, we present a novel framework called Relation-Aware GNN with transFormer (RAGFormer) which simultaneously embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module. The semantic encoder utilizes Transformer to learn semantic features and node interactions across different relations. We introduce Relation-Aware GNN as the topology encoder to learn topological features and node interactions within each relation. These two complementary features are interleaved through an attention fusion module to support prediction by both orthogonal features. Extensive experiments on two popular public datasets demonstrate that RAGFormer achieves state-of-the-art performance. The significant improvement of RAGFormer in an industrial credit card fraud detection dataset further validates the applicability of our method in real-world business scenarios.

Updated: 2025-02-11 12:29:00

标题: RAGFormer:学习欺诈检测的语义属性和拓扑结构

摘要: 欺诈检测仍然是一个具有挑战性的任务,因为欺诈活动的复杂和欺骗性质。当前的方法主要集中在学习图的一个视角:要么是图的拓扑结构,要么是个体节点的属性。然而,我们进行了经验研究,揭示了这两种特征,虽然几乎正交,但每种特征都是独立有效的。因此,以前的方法无法完全捕捉欺诈图的全面特征。为了解决这一困境,我们提出了一种名为Relation-Aware GNN with transFormer (RAGFormer)的新型框架,它将语义特征和拓扑特征同时嵌入到目标节点中。这种简单而有效的网络由语义编码器、拓扑编码器和注意力融合模块组成。语义编码器利用Transformer学习语义特征和不同关系之间的节点交互。我们引入Relation-Aware GNN作为拓扑编码器,学习每个关系内的拓扑特征和节点交互。这两种互补特征通过注意力融合模块交错在一起,支持通过正交特征进行预测。对两个流行的公共数据集进行的大量实验表明,RAGFormer实现了最先进的性能。在一个工业信用卡欺诈检测数据集中,RAGFormer的显著改进进一步验证了我们的方法在现实世界业务场景中的适用性。

更新时间: 2025-02-11 12:29:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.17472v4

Joint Metric Space Embedding by Unbalanced OT with Gromov-Wasserstein Marginal Penalization

We propose a new approach for unsupervised alignment of heterogeneous datasets, which maps data from two different domains without any known correspondences to a common metric space. Our method is based on an unbalanced optimal transport problem with Gromov-Wasserstein marginal penalization. It can be seen as a counterpart to the recently introduced joint multidimensional scaling method. We prove that there exists a minimizer of our functional and that for penalization parameters going to infinity, the corresponding sequence of minimizers converges to a minimizer of the so-called embedded Wasserstein distance. Our model can be reformulated as a quadratic, multi-marginal, unbalanced optimal transport problem, for which a bi-convex relaxation admits a numerical solver via block-coordinate descent. We provide numerical examples for joint embeddings in Euclidean as well as non-Euclidean spaces.

Updated: 2025-02-11 12:28:47

标题: 使用不平衡OT和Gromov-Wasserstein边际惩罚进行联合度量空间嵌入

摘要: 我们提出了一种新的方法,用于无监督对齐异构数据集,将来自两个不同领域的数据映射到一个没有任何已知对应关系的公共度量空间中。我们的方法基于一个带有Gromov-Wasserstein边际惩罚的不平衡最优输运问题。它可以看作是最近引入的联合多维尺度方法的一个对应物。我们证明了存在我们的函数的一个最小化器,并且对于惩罚参数趋向于无穷大时,相应的最小化器序列收敛于所谓的嵌入式Wasserstein距离的最小化器。我们的模型可以重新表述为一个二次、多边缘、不平衡的最优输运问题,对于这个问题,通过块坐标下降的数值求解器可以进行双凸松弛。我们提供了在欧几里得空间和非欧几里得空间中进行联合嵌入的数值示例。

更新时间: 2025-02-11 12:28:47

领域: cs.LG

下载: http://arxiv.org/abs/2502.07510v1

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. Orthogonally, in this work we rely solely on imitation learning that leverages a large dataset of expert MAPF solutions and a transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The latter is capable of generating actions without additional heuristics or communication. MAPF-GPT demonstrates zero-shot learning abilities when solving MAPF problems that are not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.

Updated: 2025-02-11 12:28:36

标题: MAPF-GPT:大规模多智能体路径规划的模仿学习

摘要: 多智能体路径规划(MAPF)是一个通常需要在共享环境中为多个智能体找到无碰撞路径的问题。即使在限制性假设下,最优地解决MAPF问题也是NP难的,然而对于许多应用程序,如自动仓库和运输系统,对这个问题的高效解决方案至关重要。最近,基于学习的MAPF方法引起了关注,特别是那些利用深度强化学习的方法。通常,这种基于学习的MAPF求解器会添加额外的组件,如单智能体规划或通信。在这项工作中,我们完全依赖于模仿学习,利用大量专家MAPF解决方案的数据集和基于Transformer的神经网络创建了一个名为MAPF-GPT的MAPF基础模型。后者能够在不使用额外启发式或通信的情况下生成动作。MAPF-GPT展示了在解决训练数据集中不存在的MAPF问题时的零样本学习能力。我们展示了MAPF-GPT在各种问题实例上明显优于当前表现最佳的可学习MAPF求解器,并且在推理过程中具有计算效率。

更新时间: 2025-02-11 12:28:36

领域: cs.MA,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.00134v4

Generating crossmodal gene expression from cancer histopathology improves multimodal AI predictions

Emerging research has highlighted that artificial intelligence-based multimodal fusion of digital pathology and transcriptomic features can improve cancer diagnosis (grading/subtyping) and prognosis (survival risk) prediction. However, such direct fusion for joint decision is impractical in real clinical settings, where histopathology is still the gold standard for diagnosis and transcriptomic tests are rarely requested, at least in the public healthcare system. With our novel diffusion-based crossmodal generative AI model PathGen, we show that genomic expressions synthesized from digital histopathology jointly predict cancer grading and patient survival risk with high accuracy (state-of-the-art performance), certainty (through conformal coverage guarantee) and interpretability (through distributed attention maps). PathGen code is available for open use by the research community through GitHub at https://github.com/Samiran-Dey/PathGen.

Updated: 2025-02-11 12:25:42

标题: 生成跨模态基因表达数据:从癌症组织病理学改进多模态AI预测

摘要: 新兴研究已经突显出,基于人工智能的数字病理学和转录组特征的多模态融合可以改善癌症诊断(分级/亚型)和预后(生存风险)预测。然而,在真实临床环境中,直接融合以进行联合决策是不切实际的,其中组织病理学仍然是诊断的黄金标准,转录组检测很少被要求,至少在公共医疗系统中是这样。通过我们的新型基于扩散的跨模态生成AI模型PathGen,我们展示了从数字病理学合成的基因组表达共同预测癌症分级和患者生存风险,具有高准确性(最先进的表现)、确定性(通过一致覆盖保证)和可解释性(通过分布式注意力图)。PathGen代码可通过GitHub(https://github.com/Samiran-Dey/PathGen)向研究社区开放使用。

更新时间: 2025-02-11 12:25:42

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.00568v3

Learning Confident Classifiers in the Presence of Label Noise

The success of Deep Neural Network (DNN) models significantly depends on the quality of provided annotations. In medical image segmentation, for example, having multiple expert annotations for each data point is common practice to minimize subjective annotation bias. The goal of estimation is then to filter out the label noise and recover the ground-truth masks, which are not explicitly given. This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models. To accomplish this, we explicitly model label noise and introduce a new information-based regularization that pushes the network to recover the ground-truth labels. In addition, for the segmentation task we adjust the loss function by prioritizing learning in high-confidence regions where all the annotators agree on labeling. We evaluate the proposed method on a series of classification tasks such as noisy versions of the MNIST, CIFAR-10, and Fashion-MNIST datasets as well as CIFAR-10N, a real-world dataset with noisy human annotations. Additionally, for the segmentation task, we consider several medical imaging datasets, such as LIDC and RIGA, that reflect real-world inter-annotator variability. Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
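
For context, one standard way to explicitly model label noise is forward correction with a noise transition matrix; the sketch below illustrates that generic idea (it is not the paper's specific model or information-based regularizer):

import numpy as np

def forward_corrected_nll(clean_probs, noisy_label, T):
    # Likelihood of a *noisy* label: p(noisy=j|x) = sum_i T[i,j] p(clean=i|x).
    noisy_probs = clean_probs @ T
    return -np.log(noisy_probs[noisy_label] + 1e-12)

# 3 classes; each clean label flips to another class with probability 0.1.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
clean_probs = np.array([0.7, 0.2, 0.1])  # model's clean-label posterior
print(forward_corrected_nll(clean_probs, noisy_label=1, T=T))

Training against the noisy-label likelihood lets the network keep a clean-label posterior internally, which is the sense in which the ground-truth labels can be recovered.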

Updated: 2025-02-11 12:25:17

标题: 在标签噪声存在的情况下学习自信分类器

摘要: 深度神经网络(DNN)模型的成功在很大程度上取决于提供的标注质量。在医学图像分割中,例如,为每个数据点提供多个专家标注以减少主观标注偏差是常见的。然后,估计的目标是过滤标签噪声并恢复地面真实掩模,这些掩模并没有明确给出。本文提出了一种用于嘈杂观测的概率模型,使我们能够构建自信的分类和分割模型。为了实现这一目标,我们明确地对标签噪声进行建模,并引入了一种新的基于信息的正则化,推动网络恢复地面真实标签。此外,对于分割任务,我们通过优先学习所有标注者一致标注的高置信区域来调整损失函数。我们评估了所提出的方法在一系列分类任务上的表现,如MNIST、CIFAR-10、Fashion-MNIST数据集的嘈杂版本以及带有嘈杂人类标注的真实世界数据集CIFAR-10N。此外,对于分割任务,我们考虑了几个医学影像数据集,如LIDC和RIGA,反映了多个标注者之间的真实世界可变性。我们的实验表明,我们的算法在所考虑的分类和分割问题上优于最先进的解决方案。

更新时间: 2025-02-11 12:25:17

领域: cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2301.00524v3

Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation

Rating-based human evaluation has become an essential tool to accurately evaluate the impressive performance of large language models (LLMs). However, current rating systems suffer from several important limitations: first, they fail to account for biases that significantly influence evaluation results, second, they require large and expensive preference datasets to obtain accurate ratings, and third, they do not facilitate meaningful comparisons of model ratings across different tasks. To address these issues, we introduce Polyrating, an expressive and flexible rating system based on maximum a posteriori estimation that enables a more nuanced and thorough analysis of model performance at lower costs. Polyrating can detect and quantify biases affecting human preferences, ensuring fairer model comparisons. Further, Polyrating can reduce the cost of human evaluations by up to $41\%$ for new models and up to $77\%$ for new tasks by leveraging existing benchmark scores. Lastly, Polyrating enables direct comparisons of ratings across different tasks, providing a comprehensive understanding of an LLMs' strengths, weaknesses, and relative performance across different applications.

Updated: 2025-02-11 12:21:13

标题: Polyrating: 一种成本效益高且具有偏见意识的LLM评估评分系统

摘要: 基于评分的人类评估已经成为准确评估大型语言模型(LLMs)出色表现的重要工具。然而,当前的评分系统存在几个重要限制:首先,它们未能考虑对评估结果产生重大影响的偏见,其次,它们需要大量昂贵的偏好数据集才能获得准确的评分,第三,它们无法促进在不同任务之间进行有意义的模型评分比较。为了解决这些问题,我们引入了Polyrating,这是一个基于最大后验估计的表达丰富且灵活的评分系统,可以以更低的成本实现对模型性能的更细致和彻底的分析。Polyrating可以检测和量化影响人类偏好的偏见,确保更公平的模型比较。此外,通过利用现有的基准分数,Polyrating可以将新模型的人类评估成本降低高达41%,将新任务的人类评估成本降低高达77%。最后,Polyrating实现了在不同任务之间直接比较评分,提供了对LLMs的强项、弱项以及在不同应用中的相对性能的综合理解。

更新时间: 2025-02-11 12:21:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.00696v3

Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning

With increasing numbers of vulnerabilities exposed on the internet, autonomous penetration testing (pentesting) has emerged as a promising research area. Reinforcement learning (RL) is a natural fit for studying this topic. However, two key challenges limit the applicability of RL-based autonomous pentesting in real-world scenarios: (a) training environment dilemma -- training agents in simulated environments is sample-efficient while ensuring their realism remains challenging; (b) poor generalization ability -- agents' policies often perform poorly when transferred to unseen scenarios, with even slight changes potentially causing significant generalization gap. To this end, we propose GAP, a generalizable autonomous pentesting framework that aims to realize efficient policy training in realistic environments and to train generalizable agents capable of drawing inferences about other cases from one instance. GAP introduces a Real-to-Sim-to-Real pipeline that (a) enables end-to-end policy learning in unknown real environments while constructing realistic simulations; (b) improves agents' generalization ability by leveraging domain randomization and meta-RL. Notably, we are among the first to apply domain randomization in autonomous pentesting and propose a large language model-powered domain randomization method for synthetic environment generation. We further apply meta-RL to improve agents' generalization ability in unseen environments by leveraging synthetic environments. The combination of the two methods effectively bridges the generalization gap and improves agents' policy adaptation performance. Experiments are conducted on various vulnerable virtual machines, with results showing that GAP can enable policy learning in various realistic environments, achieve zero-shot policy transfer in similar environments, and realize rapid policy adaptation in dissimilar environments.

Updated: 2025-02-11 12:16:21

标题: 注意间隙:通过领域随机化和元强化学习实现可泛化的自主渗透测试

摘要: 随着互联网上暴露的漏洞数量不断增加,自主渗透测试(pentesting)已经成为一个有前景的研究领域。强化学习(RL)是研究这一主题的自然选择。然而,两个关键挑战限制了基于RL的自主渗透测试在现实场景中的适用性:(a)训练环境困境--在模拟环境中训练代理在样本效率高的同时确保其真实性仍具挑战性;(b)低泛化能力--代理的策略在转移到未见场景时通常性能较差,即使轻微变化也可能导致显著的泛化差距。为此,我们提出了GAP,一个通用的自主渗透测试框架,旨在在现实环境中实现高效的策略训练,并训练能够从一个实例中推断其他情况的通用代理。GAP引入了一个实到模到实的流程,(a)在未知的真实环境中进行端到端策略学习,同时构建逼真的模拟环境;(b)通过利用领域随机化和元强化学习来改进代理的泛化能力。特别是,我们是第一批在自主渗透测试中应用领域随机化的研究者,并提出了一个由大型语言模型驱动的领域随机化方法用于合成环境生成。我们进一步利用合成环境,应用元强化学习来提高代理在未见环境中的泛化能力。这两种方法的组合有效地弥合了泛化差距,并提高了代理的策略适应性表现。我们在各种易受攻击的虚拟机上进行了实验,结果显示GAP可以实现在各种逼真环境中的策略学习,在类似环境中实现零样本策略转移,并在不同环境中实现快速策略适应。

更新时间: 2025-02-11 12:16:21

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2412.04078v2

Proceedings 40th International Conference on Logic Programming

Since the first conference in Marseille in 1982, the International Conference on Logic Programming (ICLP) has been the premier international event for presenting research in logic programming. These proceedings include technical communications about, and abstracts for, presentations given at the 40th ICLP, held October 14-17 in Dallas, Texas, USA. The papers and abstracts in this volume cover the following areas and topics. Formal and operational semantics: including non-monotonic reasoning, probabilistic reasoning, argumentation, and semantic issues of combining logic with neural models. Language design and programming methodologies, such as answer set programming, inductive logic programming, and probabilistic programming. Program analysis and logic-based validation of generated programs. Implementation methodologies, including constraint implementation, tabling, logic-based prompt engineering, and the interaction of logic programming with LLMs.

Updated: 2025-02-11 12:13:52

标题: 第40届国际逻辑编程会议论文集

摘要: 自1982年在马赛举办第一次会议以来,国际逻辑编程会议(ICLP)一直是展示逻辑编程研究的首要国际活动。本次论文集包括有关第40届ICLP(于10月14日至17日在美国德克萨斯州达拉斯市举行)的技术交流和演讲摘要。本卷中的论文和摘要涵盖以下领域和主题。形式和操作语义:包括非单调推理、概率推理、论证以及将逻辑与神经模型结合的语义问题。语言设计和编程方法,如答案集编程、归纳逻辑编程和概率编程。程序分析和基于逻辑的生成程序验证。实现方法,包括约束实现、表格化、基于逻辑的提示工程以及逻辑编程与LLM的互动。

更新时间: 2025-02-11 12:13:52

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2502.08453v1

Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs, and fails to leverage the common knowledge from different models. In this work, we observe that different layers exhibit varying levels of parameter conflicts. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. To handle out-of-distribution samples, we select and merge appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen with varying parameter scales, and evaluate on real-world reasoning tasks. Results demonstrate that our method consistently achieves significant performance improvements while requiring less system cost compared to existing methods.
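
A minimal sketch of conflict-aware layer merging, assuming a sign-disagreement conflict metric and a fixed threshold that are our own illustrative choices rather than the paper's exact criteria:

import numpy as np

def sign_conflict(deltas):
    # Fraction of weights whose task-vector signs disagree across models.
    signs = np.sign(np.stack(deltas))
    return float(np.mean(np.abs(signs.mean(axis=0)) < 1.0))

def merge_layers(models, base, threshold=0.3):
    merged, routed = {}, []
    for name in base:
        deltas = [m[name] - base[name] for m in models]
        if sign_conflict(deltas) < threshold:
            merged[name] = base[name] + np.mean(deltas, axis=0)  # safe to average
        else:
            routed.append(name)  # defer to task-level expert routing
    return merged, routed

# Two toy "fine-tuned models" sharing a 2-layer base (values illustrative).
base = {"layer0": np.zeros(4), "layer1": np.zeros(4)}
m1 = {"layer0": np.array([0.1, 0.2, 0.1, 0.3]),
      "layer1": np.array([0.5, -0.5, 0.5, -0.5])}
m2 = {"layer0": np.array([0.2, 0.1, 0.2, 0.2]),
      "layer1": np.array([-0.5, 0.5, -0.5, 0.5])}
merged, routed = merge_layers([m1, m2], base)
print(sorted(merged), routed)  # layer0 is averaged, layer1 is routed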

Updated: 2025-02-11 12:09:51

标题: 调解者:基于更少参数冲突和不确定性路由的内存高效LLM合并

摘要: 模型合并将在不同任务上微调的大型语言模型(LLMs)聚合成一个更强大的模型。然而,模型之间的参数冲突导致了平均性能下降。虽然模型路由在推断过程中通过选择个别模型来解决这个问题,但它会造成过多的存储和计算成本,并且无法充分利用不同模型的共同知识。在这项工作中,我们观察到不同层次的参数冲突程度各不相同。基于这一观察,我们对具有最小参数冲突的层进行平均,并对具有显著冲突的层使用一种新颖的任务级专家路由。为了进一步降低存储成本,受到任务算术稀疏性的启发,我们将多个微调专家分解为一个密集专家和几个稀疏专家。考虑到样本的分布之外,我们根据输入数据的任务不确定性选择并合并适当的专家。我们进行了对LLaMA和Qwen的广泛实验,评估了不同参数规模下的实际推理任务。结果表明,与现有方法相比,我们的方法始终能够显著提高性能,同时需要更少的系统成本。

更新时间: 2025-02-11 12:09:51

领域: cs.LG,cs.AI,cs.CL,68T50

下载: http://arxiv.org/abs/2502.04411v2

The AI off-switch problem as a signalling game: bounded rationality and incomparability

The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.

Updated: 2025-02-11 12:08:04

标题: AI关闭开关问题作为一个信号博弈:有界理性和不可比性

摘要: 关于AI控制的关闭问题是一个关键挑战:如果一个AI系统抵抗被关闭,那将带来重大风险。在本文中,我们将关闭问题建模为一个信号博弈,其中一个人类决策者将其对某个基础决策问题的偏好传达给一个AI代理,然后该代理选择行动以最大化人类的效用。我们假设人类是有限理性的代理并探索各种有限理性机制。通过使用真实的机器学习模型,我们重新证明了先前的结果,并证明了AI系统避免禁用其关闭开关的必要条件是对人类效用的不确定性。我们还分析了信息成本如何影响最优策略,并将分析扩展到涉及无法比较的情景。

更新时间: 2025-02-11 12:08:04

领域: cs.LG

下载: http://arxiv.org/abs/2502.06403v2

Unified Graph Networks (UGN): A Deep Neural Framework for Solving Graph Problems

Deep neural networks have enabled researchers to create powerful generalized frameworks, such as transformers, that can be used to solve well-studied problems in various application domains, such as text and image. However, such generalized frameworks are not available for solving graph problems. Graph structures are ubiquitous in many applications around us and many graph problems have been widely studied over years. In recent times, there has been a surge in deep neural network based approaches to solve graph problems, with growing availability of graph structured datasets across diverse domains. Nevertheless, existing methods are mostly tailored to solve a specific task and lack the capability to create a generalized model leading to solutions for different downstream tasks. In this work, we propose a novel, resource-efficient framework named \emph{U}nified \emph{G}raph \emph{N}etwork (UGN) by leveraging the feature extraction capability of graph convolutional neural networks (GCN) and 2-dimensional convolutional neural networks (Conv2D). UGN unifies various graph learning tasks, such as link prediction, node classification, community detection, graph-to-graph translation, knowledge graph completion, and more, within a cohesive framework, while exercising minimal task-specific extensions (e.g., formation of supernodes for coarsening massive networks to increase scalability, use of \textit{mean target connectivity matrix} (MTCM) representation for achieving scalability in graph translation task, etc.) to enhance the generalization capability of graph learning and analysis. We test the novel UGN framework for six uncorrelated graph problems, using twelve different datasets. Experimental results show that UGN outperforms the state-of-the-art baselines by a significant margin on ten datasets, while producing comparable results on the remaining dataset.

Updated: 2025-02-11 12:03:18

标题: 统一图网络(UGN):用于解决图问题的深度神经框架

摘要: 深度神经网络使研究人员能够创建强大的广义框架,例如transformers,可用于解决各种领域中的经典问题,如文本和图像。然而,目前尚未为解决图形问题提供此类广义框架。图形结构在我们周围的许多应用中无处不在,多年来许多图形问题已得到广泛研究。近年来,基于深度神经网络的方法在解决图形问题方面迅速增长,各个领域中的图结构数据集也日益丰富。然而,现有方法大多针对解决特定任务,缺乏创建通用模型以解决不同下游任务的能力。在本文中,我们提出了一种名为统一图网络(UGN)的新颖资源高效框架,它利用了图卷积神经网络(GCN)和二维卷积神经网络(Conv2D)的特征提取能力。 UGN在一个连贯的框架中统一了各种图学习任务,如链接预测、节点分类、社区检测、图形到图形翻译、知识图完成等,同时通过最小化任务特定扩展(例如,形成超节点以缩减大规模网络以增加可伸缩性,使用“均值目标连接矩阵”(MTCM)表示来实现图形翻译任务的可伸缩性等)来增强图形学习和分析的通用化能力。我们使用十二个不同的数据集,对六个互不相关的图问题测试了这一新颖的UGN框架。实验结果表明,在十个数据集上,UGN的性能明显优于最先进的基线方法,而在其余数据集上产生可比较的结果。

更新时间: 2025-02-11 12:03:18

领域: cs.LG

下载: http://arxiv.org/abs/2502.07500v1

Decentralized Entropy-Driven Ransomware Detection Using Autonomous Neural Graph Embeddings

The increasing sophistication of cyber threats has necessitated the development of advanced detection mechanisms capable of identifying and mitigating ransomware attacks with high precision and efficiency. A novel framework, termed Decentralized Entropy-Driven Detection (DED), is introduced, leveraging autonomous neural graph embeddings and entropy-based anomaly scoring to address the limitations of traditional methods. The framework operates on a distributed network of nodes, eliminating single points of failure and enhancing resilience against targeted attacks. Experimental results demonstrate its ability to achieve detection accuracy exceeding 95%, with false positive rates maintained below 2% across diverse ransomware variants. The integration of graph-based modeling and machine learning techniques enables the framework to capture complex system interactions, facilitating the identification of subtle anomalies indicative of ransomware activity. Comparative analysis against existing methods highlights its superior performance in terms of detection rates and computational efficiency. Case studies further validate its effectiveness in real-world scenarios, showcasing its ability to detect and mitigate ransomware attacks within minutes of their initiation. The proposed framework represents a significant step forward in cybersecurity, offering a scalable and adaptive solution to the growing challenge of ransomware detection.
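
The entropy-scoring ingredient is straightforward to illustrate on its own: mass encryption drives a file's byte-level Shannon entropy toward the 8-bit maximum. A toy sketch (the threshold is an illustrative assumption, not the framework's calibrated score):

import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    # Shannon entropy of the byte distribution, in bits per byte (0..8).
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    return byte_entropy(data) > threshold

plain = b"hello world " * 1000      # low-entropy text
random_like = os.urandom(12000)     # stands in for an encrypted file
print(byte_entropy(plain), looks_encrypted(plain))
print(byte_entropy(random_like), looks_encrypted(random_like))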

Updated: 2025-02-11 11:59:10

标题: 使用自主神经图嵌入的去中心化熵驱动勒索软件检测

摘要: 网络威胁日益复杂,迫使人们开发先进的检测机制,能够高精度高效地识别和缓解勒索软件攻击。引入了一种新颖的框架,称为分散熵驱动检测(DED),利用自主神经图嵌入和基于熵的异常评分来解决传统方法的局限性。该框架在一个分布式节点网络上运行,消除了单点故障,并增强了抵御有针对性攻击的韧性。实验结果表明,它能够实现超过95%的检测准确率,同时在各种勒索软件变种下将误报率保持在2%以下。图形建模和机器学习技术的整合使得该框架能够捕捉复杂系统交互,有助于识别暗示勒索软件活动的微妙异常。与现有方法的比较分析突显了其在检测率和计算效率方面的卓越表现。案例研究进一步验证了其在现实场景中的有效性,展示了其能够在攻击发起后几分钟内检测和缓解勒索软件攻击的能力。所提出的框架代表了网络安全的重要进步,为不断增长的勒索软件检测挑战提供了一种可扩展且适应性强的解决方案。

更新时间: 2025-02-11 11:59:10

领域: cs.CR

下载: http://arxiv.org/abs/2502.07498v1

On Training-Conditional Conformal Prediction and Binomial Proportion Confidence Intervals

Estimating the expectation of a Bernoulli random variable based on N independent trials is a classical problem in statistics, typically addressed using Binomial Proportion Confidence Intervals (BPCI). In the control systems community, many critical tasks, such as certifying the statistical safety of dynamical systems, can be formulated as BPCI problems. Conformal Prediction (CP), a distribution-free technique for uncertainty quantification, has gained significant attention in recent years and has been applied to various control systems problems, particularly to address uncertainties in learned dynamics or controllers. A variant known as training-conditional CP was recently employed to tackle the problem of safety certification. In this note, we highlight that the use of training-conditional CP in this context does not provide valid safety guarantees. We demonstrate why CP is unsuitable for BPCI problems and argue that traditional BPCI methods are better suited for statistical safety certification.
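
As a concrete reference point, a classical exact BPCI is the Clopper-Pearson interval, computable from Beta quantiles:

from scipy.stats import beta

def clopper_pearson(successes: int, n: int, alpha: float = 0.05):
    # Exact (conservative) two-sided confidence interval for a Bernoulli mean.
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, n - successes + 1)
    hi = 1.0 if successes == n else beta.ppf(1 - alpha / 2, successes + 1, n - successes)
    return lo, hi

# Safety-certification style usage: 9990 safe runs out of 10000 trials.
print(clopper_pearson(9990, 10000))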

Updated: 2025-02-11 11:59:03

标题: 关于训练条件一致性预测和二项比例置信区间

摘要: 基于N个独立试验估计伯努利随机变量的期望值是统计学中的一个经典问题,通常使用二项比例置信区间(BPCI)来解决。在控制系统社区中,许多关键任务,如验证动态系统的统计安全性,可以被表述为BPCI问题。共形预测(CP)是一种无分布的不确定性量化技术,近年来引起了广泛关注,并被应用于各种控制系统问题,特别是用来处理学习得到的动态模型或控制器中的不确定性。最近提出的一种变体,即训练条件CP,被用来解决安全认证问题。在本文中,我们强调在这种情况下使用训练条件CP并不能提供有效的安全保证。我们展示了为什么CP不适用于BPCI问题,并主张传统的BPCI方法更适合用于统计安全认证。

更新时间: 2025-02-11 11:59:03

领域: cs.LG

下载: http://arxiv.org/abs/2502.07497v1

LLM-Sketch: Enhancing Network Sketches with LLM

Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches face the challenges of lacking adaptivity to dynamic networks and incurring high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure and separately recording large and small flows, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods by achieving a $7.5\times$ accuracy improvement.
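
The two-tier layout can be illustrated independently of the LLM component: exact counters for flows promoted as "large", and a count-min sketch for the long tail of small flows. A toy sketch (promotion rule and sizes are illustrative assumptions, not the paper's design):

import hashlib

class TwoTierSketch:
    def __init__(self, width=1024, depth=3, heavy_capacity=100):
        self.cms = [[0] * width for _ in range(depth)]  # count-min rows
        self.width, self.depth = width, depth
        self.heavy = {}                                 # flow_id -> exact count
        self.heavy_capacity = heavy_capacity

    def _cells(self, flow_id):
        for d in range(self.depth):
            h = hashlib.blake2b(f"{d}:{flow_id}".encode(), digest_size=8)
            yield d, int.from_bytes(h.digest(), "big") % self.width

    def add(self, flow_id, count=1):
        if flow_id in self.heavy:
            self.heavy[flow_id] += count
            return
        for d, col in self._cells(flow_id):
            self.cms[d][col] += count
        # Promote an apparently large flow to the exact tier.
        if self.query(flow_id) > 50 and len(self.heavy) < self.heavy_capacity:
            self.heavy[flow_id] = self.query(flow_id)

    def query(self, flow_id):
        if flow_id in self.heavy:
            return self.heavy[flow_id]
        return min(self.cms[d][col] for d, col in self._cells(flow_id))

sketch = TwoTierSketch()
for _ in range(200):
    sketch.add("10.0.0.1->10.0.0.2")   # one elephant flow
for i in range(300):
    sketch.add(f"mouse-{i}")           # many mice
print(sketch.query("10.0.0.1->10.0.0.2"), sketch.query("mouse-7"))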

Updated: 2025-02-11 11:54:56

标题: LLM-Sketch:通过LLM增强网络草图

摘要: 网络流挖掘对于许多网络操作至关重要。作为一种紧凑的数据结构,能够提供低内存开销和有界精度的草图已经成为网络流挖掘的一种有前途的解决方案。最近的研究尝试利用机器学习来优化草图;然而,这些方法面临着适应动态网络和高训练成本的挑战。本文提出了基于以下观点的LLM-Sketch,即在数据包头部的流ID之外的字段也可以帮助推断流大小。通过使用两层数据结构并分别记录大流和小流,LLM-Sketch改善了精度同时最大限度地减少了内存使用。此外,它利用精细调整的大型语言模型(LLMs)可靠地估计流大小。我们在三个代表性任务上评估了LLM-Sketch,结果表明LLM-Sketch通过实现$7.5\times$精度改进而胜过了最先进的方法。

更新时间: 2025-02-11 11:54:56

领域: cs.NI,cs.LG

下载: http://arxiv.org/abs/2502.07495v1

URECA: The Chain of Two Minimum Set Cover Problems exists behind Adaptation to Shifts in Semantic Code Search

Adaptation means making a model learn patterns that have shifted away from the training distribution. In general, this adaptation is formulated as the minimum entropy problem. However, the minimum entropy problem has an inherent limitation -- the shifted initialization cascade phenomenon. We extend the relationship between the minimum entropy problem and the minimum set cover problem via the Lebesgue integral. This extension reveals that the internal mechanism of the minimum entropy problem ignores the relationships between disentangled representations, which leads to the shifted initialization cascade. Building on this analysis, we introduce a new clustering algorithm, the Union-find based Recursive Clustering Algorithm (URECA). URECA is an efficient clustering algorithm that leverages the relationships between disentangled representations. Its update rule relies on a Thresholdly-Updatable Stationary Assumption on the dynamics, a relaxed version of the Stationary Assumption. This assumption helps URECA transport disentangled representations without errors, based on the relationships between them. URECA also utilizes a simulation trick to cluster disentangled representations efficiently. A wide range of evaluations shows that URECA achieves consistent performance gains for few-shot adaptation to diverse types of shifts, along with state-of-the-art performance on CoSQA under query shift.
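
The union-find primitive at the core of URECA-style clustering looks as follows (a generic threshold-based toy, not the paper's update rule):

import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def cluster(points, eps=0.5):
    # Merge every pair of representations closer than eps.
    uf = UnionFind(len(points))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if np.linalg.norm(points[i] - points[j]) < eps:
                uf.union(i, j)
    return [uf.find(i) for i in range(len(points))]

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(cluster(pts))  # two clusters: {0, 1} and {2, 3}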

Updated: 2025-02-11 11:53:23

标题: URECA:适应语义代码搜索转变背后存在的两个最小集覆盖问题链条

摘要: 适应性是使模型学习从训练分布中转移的模式。一般来说,这种适应性被规定为最小熵问题。然而,最小熵问题具有固有的局限性 -- 转移初始化级联现象。我们通过勒贝格积分扩展了最小熵问题与最小集覆盖问题之间的关系。这种扩展揭示了最小熵问题的内部机制忽略了解耦表示之间的关系,导致了转移初始化级联。通过分析,我们引入了一种新的聚类算法,基于并查集的递归聚类算法(URECA)。URECA是一种有效的聚类算法,用于利用解耦表示之间的关系。URECA的更新规则依赖于对动态的阈值可更新稳定假设,它是稳定假设的一个放宽版本。这种假设帮助URECA基于解耦表示之间的关系无误地传输解耦表示。URECA还利用模拟技巧有效地聚类解耦表示。广泛的评估显示,URECA在适应各种类型的转移时实现了一致的性能提升,并在查询转移场景中取得了CoSQA的最新性能。

更新时间: 2025-02-11 11:53:23

领域: cs.AI

下载: http://arxiv.org/abs/2502.07494v1

Exploring Patterns Behind Sports

This paper presents a comprehensive framework for time series prediction using a hybrid model that combines ARIMA and LSTM. The model incorporates feature engineering techniques, including embedding and PCA, to transform raw data into a lower-dimensional representation while retaining key information. The embedding technique is used to convert categorical data into continuous vectors, facilitating the capture of complex relationships. PCA is applied to reduce dimensionality and extract principal components, enhancing model performance and computational efficiency. To handle both linear and nonlinear patterns in the data, the ARIMA model captures linear trends, while the LSTM model models complex nonlinear dependencies. The hybrid model is trained on historical data and achieves high accuracy, as demonstrated by low RMSE and MAE scores. Additionally, the paper employs the runs test to assess the randomness of sequences, providing insights into the underlying patterns. Ablation studies are conducted to validate the roles of different components in the model, demonstrating the significance of each module. The paper also utilizes the SHAP method to quantify the impact of traditional advantages on the predicted results, offering a detailed understanding of feature importance. The KNN method is used to determine the optimal prediction interval, further enhancing the model's accuracy. The results highlight the effectiveness of combining traditional statistical methods with modern deep learning techniques for robust time series forecasting in sports.

Updated: 2025-02-11 11:51:07

标题: 探索体育背后的模式

摘要: 这篇论文提出了一个综合框架,用于使用结合了ARIMA和LSTM的混合模型进行时间序列预测。该模型结合了特征工程技术,包括嵌入和PCA,将原始数据转换为低维表示,同时保留关键信息。嵌入技术用于将分类数据转换为连续向量,有助于捕捉复杂关系。PCA用于降低维度并提取主成分,增强模型性能和计算效率。为了处理数据中的线性和非线性模式,ARIMA模型捕捉线性趋势,而LSTM模型建模复杂的非线性依赖关系。混合模型在历史数据上进行训练,并通过低RMSE和MAE分数展示了高准确性。此外,该论文采用运行测试评估序列的随机性,提供对潜在模式的洞察。消融研究验证了模型中不同组件的作用,展示了每个模块的重要性。该论文还利用SHAP方法量化传统优势对预测结果的影响,提供对特征重要性的详细理解。KNN方法用于确定最佳预测间隔,进一步提高模型的准确性。结果突显了将传统统计方法与现代深度学习技术相结合,用于运动领域的稳健时间序列预测的有效性。

更新时间: 2025-02-11 11:51:07

领域: cs.LG,cs.IR

下载: http://arxiv.org/abs/2502.07491v1

GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models

Large Language Models (LLMs) face significant deployment challenges due to their substantial resource requirements. While low-bit quantized weights can reduce memory usage and improve inference efficiency, current hardware lacks native support for mixed-precision General Matrix Multiplication (mpGEMM), resulting in inefficient dequantization-based implementations. Moreover, uniform quantization methods often fail to capture weight distributions adequately, leading to performance degradation. We propose GANQ (GPU-Adaptive Non-Uniform Quantization), a layer-wise post-training non-uniform quantization framework optimized for hardware-efficient lookup table-based mpGEMM. GANQ achieves superior quantization performance by utilizing a training-free, GPU-adaptive optimization algorithm to efficiently reduce layer-wise quantization errors. Extensive experiments demonstrate GANQ's ability to reduce the perplexity gap from the FP16 baseline compared to state-of-the-art methods for both 3-bit and 4-bit quantization. Furthermore, when deployed on a single NVIDIA RTX 4090 GPU, GANQ's quantized models achieve up to 2.57$\times$ speedup over the baseline, advancing memory and inference efficiency in LLM deployment.
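
GANQ's GPU-adaptive optimizer is not reproduced here, but the representation it targets, a per-row non-uniform codebook dequantized by table lookup, can be sketched with plain Lloyd iterations; the codebook size and quantile initialization below are illustrative assumptions.

import numpy as np

def nonuniform_quantize_row(w, bits=3, iters=20):
    """1-D Lloyd iterations: fit a 2^bits-entry codebook to one weight row."""
    k = 2 ** bits
    # Initialize centroids at evenly spaced quantiles of the weights.
    codebook = np.quantile(w, np.linspace(0, 1, k))
    for _ in range(iters):
        idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        for c in range(k):
            if np.any(idx == c):
                codebook[c] = w[idx == c].mean()
    idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
    return idx.astype(np.uint8), codebook

def dequantize(idx, codebook):
    """Lookup-table dequantization: indices select codebook entries,
    as a table-based mpGEMM kernel would."""
    return codebook[idx]

w = np.random.randn(4096) * 0.02
idx, codebook = nonuniform_quantize_row(w, bits=3)
err = np.mean((dequantize(idx, codebook) - w) ** 2)
print(f"3-bit non-uniform MSE: {err:.2e}")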

Updated: 2025-02-11 11:50:15

标题: GANQ:大型语言模型的GPU自适应非均匀量化

摘要: 大型语言模型(LLMs)面临重要的部署挑战,由于它们庞大的资源需求。虽然低比特量化权重可以减少内存使用量并提高推断效率,但当前硬件缺乏混合精度一般矩阵乘法(mpGEMM)的原生支持,导致基于去量化的实现效率低下。此外,均匀量化方法通常无法充分捕捉权重分布,导致性能下降。我们提出了GANQ(GPU自适应非均匀量化),这是一个针对硬件高效查找表为基础的mpGEMM优化的层次后训练非均匀量化框架。GANQ通过利用一种无需训练的GPU自适应优化算法,有效减少层次量化误差,实现了出色的量化性能。大量实验表明,与最先进的方法相比,GANQ能够减少与FP16基线的困惑度差距,无论是3比特还是4比特量化。此外,当部署在单个NVIDIA RTX 4090 GPU上时,GANQ的量化模型比基线实现了高达2.57倍的加速,提升了LLMs部署中的内存和推断效率。

更新时间: 2025-02-11 11:50:15

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2501.12956v2

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Large Language Models (LLMs) have been found to struggle with accurately retrieving key information. To address this, we propose Mask-Enhanced Autoregressive Prediction (MEAP), a simple yet effective training paradigm that seamlessly integrates Masked Language Modeling (MLM) into Next-Token Prediction (NTP) to enhance the latter's in-context retrieval capabilities. Specifically, MEAP first randomly masks a small fraction of input tokens and then performs standard autoregressive next-token prediction using a decoder-only Transformer. MEAP eliminates the need for bidirectional attention or encoder-decoder architectures for MLM, incurring no additional computational overhead during pre-training or inference. Intensive experiments demonstrate that MEAP substantially outperforms NTP on key information retrieval and long-context reasoning tasks, while performing on par or better on commonsense reasoning tasks. The benefits of MEAP also extend to supervised fine-tuning, where it shows remarkable advantages in lost-in-the-middle scenarios, outperforming NTP by 11.77 percentage points. Our analysis indicates that MEAP's effectiveness arises from its ability to promote more distinguishable attention scores by concentrating on a reduced set of non-masked tokens. This mechanism improves the model's focus on task-relevant signals while mitigating the influence of peripheral context. These findings position MEAP as a promising training paradigm for large language models.
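
A minimal sketch of the training objective as described: corrupt a small fraction of input tokens with a mask id, then compute the usual next-token cross-entropy against the original tokens. The mask rate, mask id, and the tiny stand-in model are assumptions; the paper's decoder would be a causal Transformer.

import torch
import torch.nn.functional as F

def meap_loss(model, tokens, mask_id, mask_rate=0.15):
    """Mask a random fraction of *input* tokens, then do standard
    next-token prediction: targets stay the original (unmasked) tokens."""
    inputs = tokens[:, :-1].clone()
    targets = tokens[:, 1:]
    mask = torch.rand_like(inputs, dtype=torch.float) < mask_rate
    inputs[mask] = mask_id                 # corrupt inputs only
    logits = model(inputs)                 # (batch, seq, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )

vocab, mask_id = 1000, 999
toy = torch.nn.Sequential(
    torch.nn.Embedding(vocab, 64), torch.nn.Linear(64, vocab)
)  # stand-in for a causal decoder-only Transformer
tokens = torch.randint(0, vocab - 1, (4, 128))
print(meap_loss(toy, tokens, mask_id).item())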

Updated: 2025-02-11 11:49:03

标题: 掩码增强的自回归预测:以更少的注意力学到更多

摘要: 大型语言模型(LLMs)被发现在准确检索关键信息方面存在困难。为了解决这个问题,我们提出了增强掩模自回归预测(MEAP),这是一种简单而有效的训练范式,将掩模语言建模(MLM)无缝集成到下一个标记预测(NTP)中,以增强后者在上下文中的检索能力。具体来说,MEAP首先随机屏蔽少量输入标记,然后直接使用仅解码器的Transformer执行标准的下一个标记预测自回归。MEAP消除了MLM需要双向注意力或编码器-解码器架构的需要,在预训练或推理过程中不会产生额外的计算开销。大量实验表明,MEAP在关键信息检索和长上下文推理任务上明显优于NTP,同时在常识推理任务上表现相当或更好。MEAP的优势还延伸到监督微调领域,在“迷失在中间”场景中,其优势比NTP高出11.77个百分点。我们的分析表明,MEAP的有效性源于其能够通过专注于一组减少的非屏蔽标记来促进更可区分的注意力分数。这种机制提高了模型对任务相关信号的关注,同时减轻了外围上下文的影响。这些发现将MEAP定位为大型语言模型的有希望的训练范式。

更新时间: 2025-02-11 11:49:03

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07490v1

Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time Series Forecasting Based on Biological ODEs

State-of-the-art methods for forecasting irregularly sampled time series with missing values predominantly rely on just four datasets and a few small toy examples for evaluation. While ordinary differential equations (ODEs) are the prevalent models in science and engineering, a baseline model that forecasts a constant value outperforms ODE-based models from the last five years on three of these existing datasets. This unintuitive finding hampers further research on ODE-based models, a more plausible model family. In this paper, we develop a methodology to generate irregularly sampled multivariate time series (IMTS) datasets from ordinary differential equations and to select challenging instances via rejection sampling. Using this methodology, we create Physiome-ODE, a large and sophisticated benchmark of IMTS datasets consisting of 50 individual datasets, derived from real-world ordinary differential equations from research in biology. Physiome-ODE is the first benchmark for IMTS forecasting that we are aware of and an order of magnitude larger than the current evaluation setting of four datasets. Using our benchmark Physiome-ODE, we obtain qualitatively different results from those derived on the current four datasets: on Physiome-ODE, ODE-based models can play to their strengths, and our benchmark differentiates meaningfully between different IMTS forecasting models. This way, we expect to give a new impulse to research on ODE-based time series modeling.
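
A minimal sketch of the generation recipe under stated assumptions: solve an ODE densely, observe each channel at its own irregular timestamps, and rejection-sample for difficulty. The Lotka-Volterra system and the coefficient-of-variation criterion are illustrative stand-ins for the paper's biological ODEs and selection rule.

import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, z, a=1.1, b=0.4, c=0.4, d=0.1):
    x, y = z
    return [a * x - b * x * y, d * x * y - c * y]

def make_imts_instance(rng, n_obs=40, t_max=40.0):
    """Solve the ODE with dense output, then observe each channel at its
    own irregular timestamps (the IMTS setting)."""
    sol = solve_ivp(lotka_volterra, (0, t_max), rng.uniform(1, 5, size=2),
                    dense_output=True)
    obs = []
    for channel in range(2):
        ts = np.sort(rng.uniform(0, t_max, size=n_obs))
        obs.append((ts, sol.sol(ts)[channel]))
    return obs

def is_challenging(obs, min_cv=0.25):
    """Rejection sampling: discard near-constant instances, which a
    constant-value baseline would already forecast well."""
    return all(np.std(v) / (np.abs(np.mean(v)) + 1e-8) > min_cv
               for _, v in obs)

rng = np.random.default_rng(0)
instances = []
while len(instances) < 10:
    inst = make_imts_instance(rng)
    if is_challenging(inst):
        instances.append(inst)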

Updated: 2025-02-11 11:48:22

标题: Physiome-ODE: 基于生物ODE的不规则采样多变量时间序列预测的基准测试

摘要: 目前用于预测不规则采样时间序列的最先进方法主要依赖于仅四个数据集和少量小型示例进行评估。虽然普通微分方程(ODE)是科学和工程中普遍采用的模型,但在过去五年中,一个预测恒定值的基准模型在这三个现有数据集上表现优于基于ODE的模型。这种令人费解的发现阻碍了对基于ODE的更合理模型族的进一步研究。在本文中,我们开发了一种方法,通过拒绝抽样从普通微分方程生成不规则采样的多变量时间序列(IMTS)数据集,并选择具有挑战性的实例。利用这种方法,我们创建了Physiome-ODE,一个包含50个独立数据集的大型复杂IMTS数据集基准,这些数据集源自生物学研究中的真实普通微分方程。Physiome-ODE是我们所知的第一个IMTS预测基准,比当前仅有四个数据集的评估环境大一个数量级。利用我们的基准Physiome-ODE,我们展示了与当前四个数据集得出的完全不同的定性结果:在Physiome-ODE上,基于ODE的模型可以发挥其优势,我们的基准可以以有意义的方式区分不同的IMTS预测模型。通过这种方式,我们希望为基于ODE的时间序列建模研究注入新的动力。

更新时间: 2025-02-11 11:48:22

领域: cs.LG

下载: http://arxiv.org/abs/2502.07489v1

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based on estimates of gradient statistics. Compared to traditional algorithms like Stochastic Gradient Descent, these adaptive methods are typically more robust to model scale and hyperparameter tuning. However, the gradient statistics employed by these methods often do not leverage sufficient gradient covariance information, leading to suboptimal updates in certain directions of the parameter space and potentially slower convergence. In this work, we keep track of such covariance statistics in the form of a structured preconditioner matrix. Unlike other works, our approach does not apply direct approximations to estimate this matrix. We instead implement an invertible transformation that maps the preconditioner matrix into a new space where it becomes approximately diagonal. This enables a diagonal approximation of the preconditioner matrix in the transformed space, offering several computational advantages. Empirical results show that our approach can substantially enhance the convergence speed of modern adaptive optimizers. Notably, for large language models like LLaMA, we can achieve a speedup of 2x compared to the baseline Adam. Additionally, our method can be integrated with memory-efficient optimizers like Adafactor to manage computational overhead.

Updated: 2025-02-11 11:48:04

标题: 通过预条件化对角化改进自适应动量优化

摘要: 近年来,诸如Adam及其变体之类的现代自适应优化方法已成为深度学习中最广泛使用的工具。这些算法提供了基于梯度统计估计动态调整更新步骤的自动机制。与随机梯度下降等传统算法相比,这些自适应方法通常对模型规模和超参数调整更具鲁棒性。然而,这些方法所使用的梯度统计信息通常没有充分利用梯度协方差信息,导致参数空间某些方向上的更新不佳,潜在地减缓收敛速度。在本研究中,我们通过结构化的预处理矩阵跟踪这种协方差统计数据。与其他研究不同,我们的方法不是直接对这个矩阵进行近似估计,而是实现一个可逆变换,将预处理矩阵映射到一个新的空间,使其近似对角化。这使得在变换空间中对预处理矩阵进行对角化近似,提供了几个计算优势。实证结果表明,我们的方法可以显著提高现代自适应优化器的收敛速度。值得注意的是,对于像LLaMA这样的大型语言模型,我们与基准Adam相比可以实现2倍加速。此外,我们的方法可以与内存高效的优化器如Adafactor集成,以管理计算开销。

更新时间: 2025-02-11 11:48:04

领域: cs.LG

下载: http://arxiv.org/abs/2502.07488v1

Overfitting Regimes of Nadaraya-Watson Interpolators

In recent years, there has been much interest in understanding the generalization behavior of interpolating predictors, which overfit on noisy training data. Whereas standard analyses are concerned with whether a method is consistent or not, recent observations have shown that even inconsistent predictors can generalize well. In this work, we revisit the classic interpolating Nadaraya-Watson (NW) estimator (also known as Shepard's method), and study its generalization capabilities through this modern viewpoint. In particular, by varying a single bandwidth-like hyperparameter, we prove the existence of multiple overfitting behaviors, ranging non-monotonically from catastrophic, through benign, to tempered. Our results highlight how even classical interpolating methods can exhibit intricate generalization behaviors. Numerical experiments complement our theory, demonstrating the same phenomena.
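
For concreteness, here is a minimal sketch of the interpolating Nadaraya-Watson (Shepard) estimator, whose singular-kernel exponent plays the bandwidth-like role varied in the paper; the exact parameterization studied there may differ.

import numpy as np

def shepard_nw(x_query, x_train, y_train, gamma=2.0, eps=1e-12):
    """Interpolating Nadaraya-Watson (Shepard) estimator with the singular
    kernel K(u) = |u|^(-gamma); weights diverge at the data points, so the
    estimator interpolates, and gamma acts like a bandwidth-type knob."""
    d = np.abs(x_query[:, None] - x_train[None, :]) + eps
    w = d ** (-gamma)
    w /= w.sum(axis=1, keepdims=True)
    return w @ y_train

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 1, 50))
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(50)  # noisy
x_test = np.linspace(0, 1, 200)
for gamma in (0.5, 2.0, 8.0):   # sweep the bandwidth-like exponent
    pred = shepard_nw(x_test, x_train, y_train, gamma)
    print(gamma, np.mean((pred - np.sin(2 * np.pi * x_test)) ** 2))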

Updated: 2025-02-11 11:41:09

标题: Nadaraya-Watson插值器的过拟合区域

摘要: 近年来,人们对理解插值预测器的泛化行为表现出了很大兴趣,这些插值预测器在嘈杂的训练数据上过拟合。虽然标准分析关注于一个方法是否一致,但最近的观察显示,即使不一致的预测器也可以很好地泛化。在这项工作中,我们重新审视经典的插值Nadaraya-Watson(NW)估计器(也称为Shepard方法),通过这种现代视角研究其泛化能力。具体而言,通过改变单个类似带宽的超参数,我们证明了存在多种过拟合行为,从灾难性的、良性的到温和的非单调变化。我们的结果突显了即使是经典的插值方法也可以展示复杂的泛化行为。数值实验补充了我们的理论,展示了相同的现象。

更新时间: 2025-02-11 11:41:09

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2502.07480v1

WebChecker: A Versatile EVL Plugin for Validating HTML Pages with Bootstrap Frameworks

WebChecker is a plugin for the Epsilon Validation Language (EVL), designed to validate both static and dynamic HTML pages that utilize frameworks like Bootstrap. By employing configurable EVL constraints, WebChecker enforces implicit rules governing HTML and CSS frameworks. The effectiveness of the plugin is demonstrated through its application to Bootstrap, the widely adopted HTML, CSS, and JavaScript framework. WebChecker comes with a set of EVL constraints to assess Bootstrap-based web pages. To substantiate our claims, we present an illustrative example featuring two solutions that effectively enforce implicit rules.

Updated: 2025-02-11 11:40:43

标题: WebChecker:用于验证带有Bootstrap框架的HTML页面的多功能EVL插件

摘要: WebChecker是一个用于Epsilon Validation Language(EVL)的插件,旨在验证使用Bootstrap等框架的静态和动态HTML页面。通过使用可配置的EVL约束,WebChecker强制执行控制HTML和CSS框架的隐含规则。该插件的有效性通过在广泛采用的HTML、CSS和JavaScript框架Bootstrap上的应用来证明。WebChecker配备了一组EVL约束,用于评估基于Bootstrap的网页。为了证实我们的说法,我提供了一个示例,展示了两种有效强制执行隐含规则的解决方案。

更新时间: 2025-02-11 11:40:43

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2502.07479v1

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data

High quality annotations are increasingly a bottleneck in the explosively growing machine learning ecosystem. Scalable evaluation methods that avoid costly annotation have therefore become an important research ambition. Many hope to use strong existing models in lieu of costly labels to provide cheap model evaluations. Unfortunately, this method of using models as judges introduces biases, such as self-preferencing, that can distort model comparisons. An emerging family of debiasing tools promises to fix these issues by using a few high quality labels to debias a large number of model judgments. In this paper, we study how far such debiasing methods, in principle, can go. Our main result shows that when the judge is no more accurate than the evaluated model, no debiasing method can decrease the required amount of ground truth labels by more than half. Our result speaks to the severe limitations of the LLM-as-a-judge paradigm at the evaluation frontier where the goal is to assess newly released models that are possibly better than the judge. Through an empirical evaluation, we demonstrate that the sample size savings achievable in practice are even more modest than what our theoretical limit suggests. Along the way, our work provides new observations about debiasing methods for model evaluation, and points out promising avenues for future work.
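
A toy simulation of the setting under stated assumptions: a judge with inflationary bias scores many outputs, and a small gold-labeled subset corrects the estimate. The error rates and the simple bias-subtraction estimator are illustrative, not the debiasing methods analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
n, n_gold = 10_000, 200                  # many judge calls, few gold labels

truth = rng.binomial(1, 0.7, size=n)     # 1 = output truly correct
judge = truth.copy()                     # judge scores, with inflationary bias
judge[(truth == 0) & (rng.random(n) < 0.20)] = 1   # false positives
judge[(truth == 1) & (rng.random(n) < 0.05)] = 0   # false negatives

gold = rng.choice(n, n_gold, replace=False)        # labeled subset

naive = judge.mean()                               # judge-only estimate
bias_hat = (judge[gold] - truth[gold]).mean()      # bias estimated on gold
debiased = naive - bias_hat

print(f"truth={truth.mean():.3f} naive={naive:.3f} debiased={debiased:.3f}")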

Updated: 2025-02-11 11:39:18

标题: 前沿可扩展评估的极限:LLM作为评判者胜不过两倍的数据

摘要: 高质量的注释在迅速增长的机器学习生态系统中越来越成为一个瓶颈。因此,避免昂贵注释的可扩展评估方法已经成为一个重要的研究目标。许多人希望利用强大的现有模型代替昂贵的标签,以提供廉价的模型评估。不幸的是,将模型用作评判者的方法会引入偏见,例如自我偏好,可能扭曲模型比较。一系列新兴的去偏见工具承诺通过使用少量高质量标签来去除大量模型判断中的偏见。在本文中,我们研究了这种去偏见方法在原则上能达到的程度。我们的主要结果表明,当评判者的准确性不再高于被评估的模型时,任何去偏见方法都无法将所需的地面真实标签数量减少超过一半。我们的结果揭示了在评估新发布的可能比评判者更好的模型的评估前沿,LLM作为评判者范式的严重限制。通过实证评估,我们展示了实践中可实现的样本量节约效果甚至比我们的理论限制所暗示的更为温和。在此过程中,我们的工作为模型评估的去偏见方法提供了新的观察,并指出了未来工作的有前景的方向。

更新时间: 2025-02-11 11:39:18

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.13341v2

Supply Chain Network Security Investment Strategies Based on Nonlinear Budget Constraints: The Moderating Roles of Market Share and Attack Risk

In the context of the rapid development of digital supply chain networks, dealing with the increasing cybersecurity threats and formulating effective security investment strategies to defend against cyberattack risks are the core issues in supply chain management. Cybersecurity investment decision-making is a key strategic task in enterprise supply chain management. Traditional game theory models and linear programming methods make it challenging to deal with complex problems such as multi-party participation in the supply chain, resource constraints, and risk uncertainty, resulting in enterprises facing high risks and uncertainties in the field of cybersecurity. To effectively meet this challenge, this study proposes a nonlinear budget-constrained cybersecurity investment optimization model based on variational inequality and a projection shrinkage algorithm. This method simulates the impact of market competition on security investment by introducing market share variables, combines variational inequality and the projection shrinkage algorithm to solve the model, and analyzes the effect of different variables such as budget constraints, cyberattack losses, and market share on supply chain network security. In numerical analysis, the model achieved high cybersecurity levels of 0.96 and 0.95 in the experimental scenarios of two retailers and two demand markets, respectively, and the budget constraint analysis revealed the profound impact of budget constraints on cybersecurity investment. Through numerical experiments and comparative analysis, the effectiveness and operability of this method in improving supply chain network security are verified.
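
A generic sketch of a projection-type method for a variational inequality VI(F, C), of the family the paper builds on; the affine operator, box constraint, and step size below are illustrative, not the paper's supply chain model.

import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the budget box [lo, hi]^n."""
    return np.clip(x, lo, hi)

def extragradient_vi(F, x0, lo, hi, tau=0.1, iters=500):
    """Extragradient projection method for VI(F, C):
    find x* in C with <F(x*), x - x*> >= 0 for all x in C."""
    x = x0.copy()
    for _ in range(iters):
        y = project_box(x - tau * F(x), lo, hi)   # predictor step
        x = project_box(x - tau * F(y), lo, hi)   # corrector step
    return x

# Illustrative monotone operator: an affine game map F(x) = Ax + b.
A = np.array([[2.0, 0.5], [0.5, 2.0]])
b = np.array([-1.0, -0.5])
x_star = extragradient_vi(lambda x: A @ x + b, np.zeros(2), 0.0, 1.0)
print("VI solution within budget box:", x_star)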

Updated: 2025-02-11 11:37:58

标题: 基于非线性预算约束的供应链网络安全投资策略:市场份额和攻击风险的调节作用

摘要: 在数字化供应链网络快速发展的背景下,应对日益增多的网络安全威胁,并制定有效的安全投资策略以应对网络攻击风险,是供应链管理中的核心问题。网络安全投资决策是企业供应链管理中的关键战略任务。传统的博弈论模型和线性规划方法难以处理诸如供应链中多方参与、资源约束和风险不确定性等复杂问题,导致企业在网络安全领域面临高风险和不确定性。为有效应对这一挑战,本研究提出了一种基于变分不等式和投影收缩算法的非线性预算约束网络安全投资优化模型。该方法通过引入市场份额变量,模拟市场竞争对安全投资的影响,结合变分不等式和投影收缩算法来解决模型,并分析预算约束、网络安全损失和市场份额等不同变量对供应链网络安全的影响。在数值分析中,该模型分别在两家零售商和两个需求市场的实验场景中实现了高达0.96和0.95的网络安全水平,并且预算约束分析揭示了预算约束对网络安全投资的深远影响。通过数值实验和比较分析,验证了该方法在提高供应链网络安全性方面的有效性和可操作性。

更新时间: 2025-02-11 11:37:58

领域: cs.CR

下载: http://arxiv.org/abs/2502.10448v1

5D Neural Surrogates for Nonlinear Gyrokinetic Simulations of Plasma Turbulence

Nuclear fusion plays a pivotal role in the quest for reliable and sustainable energy production. A major roadblock to achieving commercially viable fusion power is understanding plasma turbulence, which can significantly degrade plasma confinement. Modelling turbulence is crucial to design performing plasma scenarios for next-generation reactor-class devices and current experimental machines. The nonlinear gyrokinetic equation underpinning turbulence modelling evolves a 5D distribution function over time. Solving this equation numerically is extremely expensive, requiring up to weeks for a single run to converge, making it unfeasible for iterative optimisation and control studies. In this work, we propose a method for training neural surrogates for 5D gyrokinetic simulations. Our method extends a hierarchical vision transformer to five dimensions and is trained on the 5D distribution function for the adiabatic electron approximation. We demonstrate that our model can accurately infer downstream physical quantities such as heat flux time trace and electrostatic potentials for single-step predictions two orders of magnitude faster than numerical codes. Our work paves the way towards neural surrogates for plasma turbulence simulations to accelerate deployment of commercial energy production via nuclear fusion.

Updated: 2025-02-11 11:25:10

标题: 用于等离子体湍流非线性回旋动力学模拟的5D神经替代模型

摘要: 核聚变在寻求可靠和可持续能源生产方面发挥着关键作用。实现商业可行核聚变能源的一个主要障碍是理解等离子体湍流,这可能会显著降低等离子体的约束能力。对湍流建模对于设计下一代反应堆级设备和当前实验机器的性能等离子体场景至关重要。支撑湍流建模的非线性回旋动力学方程随时间演变5D分布函数。数值求解这个方程非常昂贵,需要长达数周的时间才能收敛,这使得对迭代优化和控制研究不可行。在这项工作中,我们提出了一种用于训练5D回旋动力学模拟的神经替代方法。我们的方法将分层视觉变换器扩展到五维,并在绝热电子近似的5D分布函数上进行训练。我们证明,我们的模型可以准确推断出下游物理量,如热流时间迹和静电势,以单步预测比数值代码快两个数量级。我们的工作为等离子体湍流模拟的神经替代方法铺平了道路,加速了通过核聚变实现商业能源生产的部署。

更新时间: 2025-02-11 11:25:10

领域: physics.plasm-ph,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07469v1

Higher-Order Message Passing for Glycan Representation Learning

Glycans are the most complex biological sequence, with monosaccharides forming extended, non-linear sequences. As post-translational modifications, they modulate protein structure, function, and interactions. Due to their diversity and complexity, predictive models of glycan properties and functions are still insufficient. Graph Neural Networks (GNNs) are deep learning models designed to process and analyze graph-structured data. These architectures leverage the connectivity and relational information in graphs to learn effective representations of nodes, edges, and entire graphs. Iteratively aggregating information from neighboring nodes, GNNs capture complex patterns within graph data, making them particularly well-suited for tasks such as link prediction or graph classification across domains. This work presents a new model architecture based on combinatorial complexes and higher-order message passing to extract features from glycan structures into a latent space representation. The architecture is evaluated on an improved GlycanML benchmark suite, establishing a new state-of-the-art performance. We envision that these improvements will spur further advances in computational glycosciences and reveal the roles of glycans in biology.

Updated: 2025-02-11 11:25:03

标题: 高阶消息传递用于糖类表示学习

摘要: 糖链是最复杂的生物序列,由单糖形成扩展的非线性序列。作为后转录修饰,它们调节蛋白质的结构、功能和相互作用。由于它们的多样性和复杂性,关于糖链性质和功能的预测模型仍然不足。 图神经网络(GNNs)是设计用于处理和分析图结构数据的深度学习模型。这些架构利用图中的连接和关系信息来学习节点、边缘和整个图的有效表示。通过从相邻节点迭代地聚合信息,GNNs捕捉图数据中的复杂模式,使它们特别适用于跨领域的任务,如链接预测或图分类。 本文提出了一种基于组合复合体和高阶消息传递的新模型架构,用于从糖链结构中提取特征到潜在空间表示。该架构在改进的GlycanML基准套件上进行评估,确立了新的最先进性能。我们预见这些改进将推动计算糖科学的进一步发展,并揭示糖在生物学中的作用。

更新时间: 2025-02-11 11:25:03

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2409.13467v3

A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes

The objects we perceive guide our eye movements when observing real-world dynamic scenes. Yet, gaze shifts and selective attention are critical for perceiving details and refining object boundaries. Object segmentation and gaze behavior are, however, typically treated as two independent processes. Here, we present a computational model that simulates these processes in an interconnected manner and allows for hypothesis-driven investigations of distinct attentional mechanisms. Drawing on an information processing pattern from robotics, we use a Bayesian filter to recursively segment the scene, which also provides an uncertainty estimate for the object boundaries that we use to guide active scene exploration. We demonstrate that this model closely resembles observers' free viewing behavior on a dataset of dynamic real-world scenes, measured by scanpath statistics, including foveation duration and saccade amplitude distributions used for parameter fitting and higher-level statistics not used for fitting. These include how object detections, inspections, and returns are balanced and a delay of returning saccades without an explicit implementation of such temporal inhibition of return. Extensive simulations and ablation studies show that uncertainty promotes balanced exploration and that semantic object cues are crucial to forming the perceptual units used in object-based attention. Moreover, we show how our model's modular design allows for extensions, such as incorporating saccadic momentum or pre-saccadic attention, to further align its output with human scanpaths.

Updated: 2025-02-11 11:18:26

标题: 一个启发自机器人的扫描路径模型揭示了不确定性和语义对象线索对于动态场景中凝视引导的重要性

摘要: 我们所感知的物体在观察现实世界动态场景时引导我们的眼动。然而,凝视转移和选择性注意对于感知细节和精细化物体边界至关重要。然而,物体分割和凝视行为通常被视为两个独立的过程。在这里,我们提出了一个计算模型,以相互关联的方式模拟这些过程,并允许进行基于假设的研究不同的注意机制。借鉴机器人学的信息处理模式,我们使用贝叶斯滤波器递归分割场景,这也为我们用来引导主动场景探索的物体边界提供了不确定性估计。我们证明这个模型在动态真实场景数据集上的观察者的自由观看行为中表现出与扫描路径统计学的相似性,包括用于参数拟合的凝视持续时间和扫视幅度分布,以及用于拟合的较高级别统计数据。这些包括物体检测、检查和返回如何平衡以及没有明确实现此类时间性返回抑制的返回凝视的延迟。广泛的模拟和消融研究表明不确定性促进平衡的探索,并且语义物体线索对于形成物体注意中使用的感知单位至关重要。此外,我们展示了我们模型的模块化设计如何允许扩展,例如将快速运动或前凝视注意纳入,以进一步使其输出与人类扫描路径对齐。

更新时间: 2025-02-11 11:18:26

领域: cs.CV,cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2408.01322v3

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata. We also introduce a retrieval system that leverages both musical features and metadata to identify similar songs, which are then used to fill in missing metadata using a local large language model (LLLM). This approach allows us to provide a more comprehensive and informative dataset for researchers working on music-language understanding tasks. We validate this approach quantitatively with five different measurements. By making the JamendoMaxCaps dataset publicly available, we provide a high-quality resource to advance research in music-language understanding tasks such as music retrieval, multimodal representation learning, and generative music models.

Updated: 2025-02-11 11:12:19

标题: JamendoMaxCaps:一个带有补全元数据的大规模音乐标题数据集

摘要: 我们介绍了JamendoMaxCaps,这是一个大规模的音乐标题数据集,包含了来自著名Jamendo平台的超过20万首自由授权的器乐曲目。该数据集包括由最先进的字幕模型生成的标题,并通过补充的元数据进行增强。我们还介绍了一个检索系统,利用音乐特征和元数据来识别相似的歌曲,然后使用本地大型语言模型(LLLM)填补缺失的元数据。这种方法使我们能够为研究音乐语言理解任务的研究人员提供更全面和有信息的数据集。我们通过五种不同的测量量定量验证了这种方法。通过公开提供JamendoMaxCaps数据集,我们为推进音乐语言理解任务的研究提供了一个高质量的资源,如音乐检索、多模态表示学习和生成音乐模型。

更新时间: 2025-02-11 11:12:19

领域: cs.SD,cs.AI

下载: http://arxiv.org/abs/2502.07461v1

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Artists are increasingly concerned about advancements in image generation models that can closely replicate their unique artistic styles. In response, several protection tools against style mimicry have been developed that incorporate small adversarial perturbations into artworks published online. In this work, we evaluate the effectiveness of popular protections -- with millions of downloads -- and show they only provide a false sense of security. We find that low-effort and "off-the-shelf" techniques, such as image upscaling, are sufficient to create robust mimicry methods that significantly degrade existing protections. Through a user study, we demonstrate that all existing protections can be easily bypassed, leaving artists vulnerable to style mimicry. We caution that tools based on adversarial perturbations cannot reliably protect artists from the misuse of generative AI, and urge the development of alternative non-technological solutions.

Updated: 2025-02-11 11:11:29

标题: 对抗性扰动无法可靠地保护艺术家免受生成式人工智能的侵害

摘要: 艺术家越来越关注能够紧密复制其独特艺术风格的图像生成模型的进展。作为回应,已经开发了几种针对风格模仿的保护工具,这些工具将小的对抗性扰动融入在线发布的艺术作品中。在这项工作中,我们评估了流行的保护措施的有效性--这些措施已经被数百万人下载--并展示它们只提供了一种虚假的安全感。我们发现,低成本和“现成”的技术,比如图像放大,足以创建强大的模仿方法,显著降低了现有的保护措施。通过用户研究,我们证明所有现有的保护措施都可以轻松绕过,使艺术家容易受到风格模仿的威胁。我们警告称,基于对抗性扰动的工具无法可靠地保护艺术家免受生成式人工智能的滥用,并敦促开发替代的非技术解决方案。

更新时间: 2025-02-11 11:11:29

领域: cs.CR

下载: http://arxiv.org/abs/2406.12027v2

Object-centric proto-symbolic behavioural reasoning from pixels

Autonomous intelligent agents must bridge computational challenges at disparate levels of abstraction, from the low-level spaces of sensory input and motor commands to the high-level domain of abstract reasoning and planning. A key question in designing such agents is how best to instantiate the representational space that will interface between these two levels -- ideally without requiring supervision in the form of expensive data annotations. These objectives can be efficiently achieved by representing the world in terms of objects (grounded in perception and action). In this work, we present a novel, brain-inspired, deep-learning architecture that learns from pixels to interpret, control, and reason about its environment, using object-centric representations. We show the utility of our approach through tasks in synthetic environments that require a combination of (high-level) logical reasoning and (low-level) continuous control. Results show that the agent can learn emergent conditional behavioural reasoning, such as $(A \to B) \land (\neg A \to C)$, as well as logical composition $(A \to B) \land (A \to C) \vdash A \to (B \land C)$ and XOR operations, and successfully controls its environment to satisfy objectives deduced from these logical rules. The agent can adapt online to unexpected changes in its environment and is robust to mild violations of its world model, thanks to dynamic internal desired goal generation. While the present results are limited to synthetic settings (2D and 3D activated versions of dSprites), which fall short of real-world levels of complexity, the proposed architecture shows how to manipulate grounded object representations, as a key inductive bias for unsupervised learning, to enable behavioral reasoning.

Updated: 2025-02-11 11:10:32

标题: 从像素级别进行的对象中心原型符号行为推理

摘要: 自主智能代理必须解决来自不同抽象级别的计算挑战,从感官输入和运动命令的低级空间到抽象推理和规划的高级领域。设计这种代理的关键问题是如何最好地实例化将在这两个级别之间进行接口的表示空间 - 理想情况下不需要昂贵的数据注释监督。通过以对象为基础(基于感知和行动)来表示世界,可以有效实现这些目标。在这项工作中,我们提出了一种新颖的、启发式的、深度学习架构,从像素学习来解释、控制、并推理其环境,使用以对象为中心的表示。我们通过在合成环境中需要高级逻辑推理和低级连续控制的任务中展示了我们方法的效用。结果表明,代理可以学习出现条件行为推理,如$(A \to B) \land (\neg A \to C)$,以及逻辑组合$(A \to B) \land (A \to C) \vdash A \to (B \land C)$和XOR操作,并成功控制其环境以满足从这些逻辑规则推导出的目标。代理可以在线适应其环境中的意外变化,并且由于动态内部期望目标生成,对其世界模型的轻微违反具有鲁棒性。虽然目前的结果仅限于合成环境(2D和3D版本的dSprites),这些环境还不足以达到真实世界的复杂程度,但所提出的架构展示了如何操纵基于对象的表示,作为无监督学习的一个关键归纳偏差,以实现行为推理。

更新时间: 2025-02-11 11:10:32

领域: cs.AI,cs.CV,cs.LG,cs.NE,I.2.0; I.2.6; I.2.10

下载: http://arxiv.org/abs/2411.17438v2

Generative Conformal Prediction with Vectorized Non-Conformity Scores

Conformal prediction (CP) provides model-agnostic uncertainty quantification with guaranteed coverage, but conventional methods often produce overly conservative uncertainty sets, especially in multi-dimensional settings. This limitation arises from simplistic non-conformity scores that rely solely on prediction error, failing to capture the prediction error distribution's complexity. To address this, we propose a generative conformal prediction framework with vectorized non-conformity scores, leveraging a generative model to sample multiple predictions from the fitted data distribution. By computing non-conformity scores across these samples and estimating empirical quantiles at different density levels, we construct adaptive uncertainty sets using density-ranked uncertainty balls. This approach enables more precise uncertainty allocation -- yielding larger prediction sets in high-confidence regions and smaller or excluded sets in low-confidence regions -- enhancing both flexibility and efficiency. We establish theoretical guarantees for statistical validity and demonstrate through extensive numerical experiments that our method outperforms state-of-the-art techniques on synthetic and real-world datasets.
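
A minimal split-conformal sketch of the idea, under stated assumptions: sample several predictions per calibration point from a generative model, score non-conformity as distance to the nearest sample, and return a union of balls around test-time samples. The Gaussian sampler and the min-distance score are illustrative simplifications of the paper's vectorized, density-ranked construction.

import numpy as np

rng = np.random.default_rng(0)

def sampler(x, m):
    """Stand-in generative model: draws m samples from a fitted Y | X."""
    return x + rng.standard_normal(m)   # here simply Y = X + noise

# Calibration: non-conformity = distance from y to its nearest sample.
alpha, m = 0.1, 50
x_cal = rng.uniform(-2, 2, 500)
y_cal = x_cal + rng.standard_normal(500)
scores = np.array([np.min(np.abs(sampler(x, m) - y))
                   for x, y in zip(x_cal, y_cal)])
n = len(scores)
q = np.quantile(scores, np.ceil((1 - alpha) * (n + 1)) / n)

# Prediction set at a test point: union of radius-q balls around samples.
x_test = 0.7
centers = sampler(x_test, m)
in_set = lambda y: np.any(np.abs(centers - y) <= q)
print("q =", q, " y=0.5 in set:", in_set(0.5))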

Updated: 2025-02-11 11:09:52

标题: 基于矢量化非一致性评分的生成式符合性预测

摘要: 符合预测(CP)提供了基于模型的不确定性量化,保证覆盖率,但传统方法往往在多维环境中产生过度保守的不确定性集合。这种限制源于简单的不符合得分,仅依赖于预测误差,未能捕捉预测误差分布的复杂性。为了解决这个问题,我们提出了一个具有矢量化不符合得分的生成式符合预测框架,利用生成模型从拟合的数据分布中采样多个预测。通过计算这些样本的不符合得分,并在不同密度级别上估计经验分位数,我们使用密度排序的不确定性球构建自适应不确定性集。这种方法能够更精确地分配不确定性,产生更大的预测集合在高置信度区域,较小或被排除的集合在低置信度区域,增强了灵活性和效率。我们建立了统计有效性的理论保证,并通过广泛的数值实验表明,我们的方法在合成和真实数据集上优于最先进的技术。

更新时间: 2025-02-11 11:09:52

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2410.13735v2

SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities

Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many methods have been proposed in the literature, fair and realistic evaluation remains an open question, particularly due to methodological difficulties in selecting hyperparameters in the unsupervised setting. With SKADA-bench, we propose a framework to evaluate DA methods on diverse modalities, beyond computer vision task that have been largely explored in the literature. We present a complete and fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Realistic hyperparameter selection is performed with nested cross-validation and various unsupervised model selection scores, on both simulated datasets with controlled shifts and real-world datasets across diverse modalities, such as images, text, biomedical, and tabular data. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications, with key insights into the choice and impact of model selection approaches. SKADA-bench is open-source, reproducible, and can be easily extended with novel DA methods, datasets, and model selection criteria without requiring re-evaluating competitors. SKADA-bench is available on Github at https://github.com/scikit-adaptation/skada-bench.

Updated: 2025-02-11 11:09:19

标题: SKADA-Bench:利用多样化模态实际验证来对无监督领域自适应方法进行基准测试

摘要: 无监督领域自适应(DA)包括将在标记源域上训练的模型调整到在具有一定数据分布偏移的未标记目标域上表现良好。尽管文献中提出了许多方法,但公平和现实的评估仍然是一个悬而未决的问题,特别是由于在无监督设置中选择超参数的方法论困难。通过SKADA-bench,我们提出了一个框架来评估DA方法在多种模态上的表现,超越了文献中广泛探讨的计算机视觉任务。我们对现有的浅层算法进行了全面而公平的评估,包括重新加权、映射和子空间对齐。通过嵌套交叉验证和各种无监督模型选择得分,在受控偏移的模拟数据集和跨多种模态的真实世界数据集上进行了现实的超参数选择,如图像、文本、生物医学和表格数据。我们的基准强调了现实验证的重要性,并为实际应用提供了实用指导,深入了解模型选择方法的选择和影响。SKADA-bench是开源的、可重现的,并且可以轻松扩展新的DA方法、数据集和模型选择标准,而无需重新评估竞争对手。SKADA-bench可以在Github上找到:https://github.com/scikit-adaptation/skada-bench。

更新时间: 2025-02-11 11:09:19

领域: cs.LG,cs.AI,stat.ME,stat.ML

下载: http://arxiv.org/abs/2407.11676v3

PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian

Large language models predominantly reflect Western cultures, largely due to the dominance of English-centric training data. This imbalance presents a significant challenge, as LLMs are increasingly used across diverse contexts without adequate evaluation of their cultural competence in non-English languages, including Persian. To address this gap, we introduce PerCul, a carefully constructed dataset designed to assess the sensitivity of LLMs toward Persian culture. PerCul features story-based, multiple-choice questions that capture culturally nuanced scenarios. Unlike existing benchmarks, PerCul is curated with input from native Persian annotators to ensure authenticity and to prevent the use of translation as a shortcut. We evaluate several state-of-the-art multilingual and Persian-specific LLMs, establishing a foundation for future research in cross-cultural NLP evaluation. Our experiments demonstrate an 11.3% gap between the best closed-source model and the layperson baseline, which increases to 21.3% when using the best open-weight model. You can access the dataset here: https://huggingface.co/datasets/teias-ai/percul

Updated: 2025-02-11 11:07:44

标题: PerCul:波斯语中LLM的叙事驱动文化评价

摘要: 大型语言模型主要反映了西方文化,这在很大程度上是由于英语为中心的训练数据的支配地位。这种不平衡带来了重大挑战,因为LLMs越来越多地被用于各种不同背景的环境中,而对其在非英语语言(包括波斯语)中的文化能力缺乏充分评估。为了填补这一空白,我们引入了PerCul,一个精心构建的数据集,旨在评估LLMs对波斯文化的敏感性。PerCul包含基于故事的、涵盖文化细微差别情景的多项选择题。与现有的基准数据集不同,PerCul是由母语为波斯语的注释人员提供输入进行策划,以确保真实性,并防止使用翻译作为捷径。我们评估了几种最先进的多语言和波斯语特定的LLMs,为跨文化自然语言处理评估的未来研究奠定了基础。我们的实验表明,最佳闭源模型与普通人基线之间存在11.3%的差距,而使用最佳开源模型时这一差距扩大到21.3%。您可以从这里访问数据集:https://huggingface.co/datasets/teias-ai/percul

更新时间: 2025-02-11 11:07:44

领域: cs.CL,cs.AI,cs.CY,I.2.7

下载: http://arxiv.org/abs/2502.07459v1

Transformer Neural Processes - Kernel Regression

Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. Originally developed as a scalable alternative to Gaussian Processes (GPs), which are limited by $O(n^3)$ runtime complexity, the most accurate modern NPs can often rival GPs but still suffer from an $O(n^2)$ bottleneck due to their attention mechanism. We introduce the Transformer Neural Process - Kernel Regression (TNP-KR), a scalable NP featuring: (1) a Kernel Regression Block (KRBlock), a simple, extensible, and parameter-efficient transformer block with complexity $O(n_c^2 + n_c n_t)$, where $n_c$ and $n_t$ are the number of context and test points, respectively; (2) a kernel-based attention bias; and (3) two novel attention mechanisms: scan attention (SA), a memory-efficient scan-based attention that, when paired with a kernel-based bias, can make TNP-KR translation invariant, and deep kernel attention (DKA), a Performer-style attention that implicitly incorporates a distance bias and further reduces complexity to $O(n_c)$. These enhancements enable both TNP-KR variants to perform inference with 100K context points on over 1M test points in under a minute on a single 24GB GPU. On benchmarks spanning meta regression, Bayesian optimization, image completion, and epidemiology, TNP-KR with DKA outperforms its Performer counterpart on nearly every benchmark, while TNP-KR with SA achieves state-of-the-art results.

Updated: 2025-02-11 11:03:24

标题: Transformer神经过程-核回归

摘要: 神经过程(NPs)是一类快速发展的模型,旨在直接建模随机过程的后验预测分布。最初作为高斯过程(GPs)的可扩展替代方案而开发,GPs受到$O(n^3)$运行时间复杂度的限制,而最准确的现代NPs往往可以与GPs相媲美,但仍然受到$O(n^2)$瓶颈的影响,这是由于它们的注意机制。我们引入了变压器神经过程-核回归(TNP-KR),这是一种可扩展的NP,具有以下特点:(1)核回归块(KRBlock),这是一个简单、可扩展且参数高效的变压器块,其复杂度为$O(n_c^2 + n_c n_t)$,其中$n_c$和$n_t$分别是上下文和测试点的数量;(2)基于核的注意偏差;和(3)两种新颖的注意机制:扫描注意(SA),一种内存高效的基于扫描的注意机制,与基于核的偏差配对时可以使TNP-KR具有平移不变性,以及深度核注意(DKA),一种类似Performer风格的注意机制,隐式地融入了距离偏差,并将复杂度进一步降低到$O(n_c)$。这些增强功能使得两种TNP-KR变体都能够在单个24GB GPU上在不到一分钟的时间内对超过100K上下文点进行推理,并在超过1M测试点上进行推理。在涵盖元回归、贝叶斯优化、图像完成和流行病学的基准测试中,具有DKA的TNP-KR在几乎每个基准测试中均优于其Performer对应物,而具有SA的TNP-KR实现了最先进的结果。

更新时间: 2025-02-11 11:03:24

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2411.12502v3

Improving Autoformalization using Type Checking

Autoformalization, the automatic translation of unconstrained natural language into formal languages, has garnered significant attention due to its potential applications in theorem proving, formal verification, and LLM output checking. In this work, we analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language. We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4\% on ProofNet. To support reproducibility and further research, we release our code, including new symbolic equivalence for Lean formulas. We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.

Updated: 2025-02-11 11:02:10

标题: 改进自动形式化使用类型检查

摘要: Autoformalization,将无约束的自然语言自动翻译成形式语言,由于其在定理证明、形式验证和LLM输出检查等方面的潜在应用而引起了广泛关注。在这项工作中,我们分析了当前的autoformalization方法以及用于评估它们的过程,重点放在Lean 4定理证明语言上。我们证明通过在现有方法之上使用自洽技术扩展类型检查过滤,可以显著改善性能,使在ProofNet上的绝对准确度提高了高达+18.4%。为了支持可重现性和进一步研究,我们发布了我们的代码,包括为Lean公式提供的新的符号等价性。我们还发布了新的基准测试:一个新的研究级数学数据集RLM25,一个经过纠正的ProofNet,以及ProofNetVerif,其中包含有标记的正确和不正确的autoformalization对用于评估度量。

更新时间: 2025-02-11 11:02:10

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2406.07222v2

MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition

Audio-visual speech recognition (AVSR) has become critical for enhancing speech recognition in noisy environments by integrating both auditory and visual modalities. However, existing AVSR systems struggle to scale up without compromising computational efficiency. In this study, we introduce MoHAVE (Mixture of Hierarchical Audio-Visual Experts), a novel robust AVSR framework designed to address these scalability constraints. By leveraging a Mixture-of-Experts (MoE) architecture, MoHAVE activates modality-specific expert groups, ensuring dynamic adaptation to various audio-visual inputs with minimal computational overhead. Key contributions of MoHAVE include: (1) a sparse MoE framework that efficiently scales AVSR model capacity, (2) a hierarchical gating mechanism that dynamically utilizes the expert groups based on input context, enhancing adaptability and robustness, and (3) remarkable performance across robust AVSR benchmarks, including LRS3 and MuAViC transcription and translation tasks, setting a new standard for scalable speech recognition systems.
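
A minimal sketch of two-level gating: a group router weights modality-specific expert groups, and a second router weights experts within each group. For clarity this version routes densely; MoHAVE's sparse MoE would activate only selected experts, and all sizes here are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalMoE(nn.Module):
    """Two-level gating: modality groups first, then experts in a group."""
    def __init__(self, dim=256, groups=2, experts_per_group=4):
        super().__init__()
        self.group_gate = nn.Linear(dim, groups)
        self.expert_gates = nn.ModuleList(
            [nn.Linear(dim, experts_per_group) for _ in range(groups)]
        )
        self.experts = nn.ModuleList([
            nn.ModuleList([nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                         nn.Linear(dim, dim))
                           for _ in range(experts_per_group)])
            for _ in range(groups)
        ])

    def forward(self, x):                                    # x: (batch, dim)
        g = F.softmax(self.group_gate(x), dim=-1)            # group weights
        out = torch.zeros_like(x)
        for gi, (gate, group) in enumerate(zip(self.expert_gates,
                                               self.experts)):
            w = F.softmax(gate(x), dim=-1)                   # expert weights
            group_out = sum(w[:, ei:ei + 1] * expert(x)
                            for ei, expert in enumerate(group))
            out = out + g[:, gi:gi + 1] * group_out
        return out

moe = HierarchicalMoE()
print(moe(torch.randn(8, 256)).shape)   # torch.Size([8, 256])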

Updated: 2025-02-11 11:01:05

标题: MoHAVE:用于稳健语音识别的分层音频-视觉专家混合模型

摘要: 音视频语音识别(AVSR)已成为在嘈杂环境中增强语音识别的关键,通过整合听觉和视觉模态。然而,现有的AVSR系统在不损害计算效率的情况下很难扩展。在这项研究中,我们引入了MoHAVE(混合的层次音视频专家),这是一个旨在解决这些可扩展性约束的创新稳健的AVSR框架。通过利用专家混合(MoE)架构,MoHAVE激活了与模态特定的专家组,确保对各种音视频输入进行动态适应,同时最小化计算开销。MoHAVE的关键贡献包括:(1)高效扩展AVSR模型容量的稀疏MoE框架,(2)基于输入上下文动态利用专家组的分层门控机制,增强适应性和稳健性,以及(3)在稳健AVSR基准测试中表现出色,包括LRS3和MuAViC转录和翻译任务,为可扩展语音识别系统设定了新标准。

更新时间: 2025-02-11 11:01:05

领域: eess.AS,cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.10447v1

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Reverse-Kullback-Leibler (KL) regularization has emerged as a predominant technique used to enhance policy optimization in reinforcement learning (RL) and reinforcement learning from human feedback (RLHF), which forces the learned policy to stay close to a reference policy. While the effectiveness and necessity of KL-regularization have been empirically demonstrated in various practical scenarios, current theoretical analysis of KL-regularized RLHF still obtains the same $\mathcal{O}(1 / \epsilon^2)$ sample complexity as problems without KL-regularization. To understand the fundamental distinction between policy learning objectives with KL-regularization and ones without KL-regularization, we are the first to theoretically demonstrate the power of KL-regularization by providing a sharp analysis for KL-regularized contextual bandits and RLHF, revealing an $\mathcal{O}(1 / \epsilon)$ sample complexity when $\epsilon$ is sufficiently small. We further explore the role of data coverage in contextual bandits and RLHF. While the coverage assumption is commonly employed in offline RLHF to link the samples from the reference policy to the optimal policy, often at the cost of a multiplicative dependence on the coverage coefficient, its impact on the sample complexity of online RLHF remains unclear. Previous theoretical analyses of online RLHF typically require explicit exploration and additional structural assumptions on the reward function class. In contrast, we show that with sufficient coverage from the reference policy, a simple two-stage mixed sampling strategy can achieve a sample complexity with only an additive dependence on the coverage coefficient. Our results provide a comprehensive understanding of the roles of KL-regularization and data coverage in RLHF, shedding light on the design of more efficient RLHF algorithms.
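
In the notation commonly used for this setting (assumed here, since the abstract does not fix notation), the KL-regularized objective and its well-known closed-form solution are

\[
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{x \sim \rho,\; a \sim \pi(\cdot \mid x)}\!\left[ r(x,a) \right] \;-\; \beta\, \mathbb{E}_{x \sim \rho}\!\left[ \mathrm{KL}\!\left( \pi(\cdot \mid x) \,\middle\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right) \right],
\qquad
\pi^{*}(a \mid x) \propto \pi_{\mathrm{ref}}(a \mid x)\, e^{r(x,a)/\beta}.
\]

The paper's $\mathcal{O}(1/\epsilon)$ rate concerns estimating this $\pi^{*}$ from preference data; the closed form above is standard, and it is what makes the reverse-KL term act as a trust region around $\pi_{\mathrm{ref}}$.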

Updated: 2025-02-11 10:58:16

标题: KL正则化上下文老虎机与RLHF的精确分析

摘要: 逆Kullback-Leibler(KL)正则化已经成为增强强化学习(RL)和基于人类反馈的强化学习(RLHF)中策略优化的主要技术,它强制学习到的策略保持接近参考策略。尽管在各种实际场景中已经经验性地证明了KL正则化的有效性和必要性,但目前对KL正则化RLHF的理论分析仍然获得了与没有KL正则化的问题相同的O(1/ε²)样本复杂度。为了理解具有KL正则化和没有KL正则化的策略学习目标之间的基本区别,我们首次在理论上证明了KL正则化的强大作用:通过为KL正则化的上下文老虎机和RLHF提供精确的分析,揭示了当ε足够小时,样本复杂度为O(1/ε)。 我们进一步探讨了上下文老虎机和RLHF中数据覆盖的作用。虽然覆盖假设通常被用于离线RLHF中,以将参考策略的样本链接到最优策略,但往往以对覆盖系数的乘法依赖为代价,其对在线RLHF样本复杂度的影响仍不清楚。以往对在线RLHF的理论分析通常需要显式探索和对奖励函数类的额外结构假设。相反,我们表明,在参考策略具有充分覆盖的情况下,一个简单的两阶段混合采样策略可以实现仅对覆盖系数有加性依赖的样本复杂度。我们的结果为RLHF中KL正则化和数据覆盖的作用提供了全面的理解,为设计更高效的RLHF算法提供了启示。

更新时间: 2025-02-11 10:58:16

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2411.04625v2

RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation

Text-to-image generation models have gained popularity among users around the world. However, many of these models exhibit a strong bias toward English-speaking cultures, ignoring or misrepresenting the unique characteristics of other language groups, countries, and nationalities. The lack of cultural awareness can reduce the generation quality and lead to undesirable consequences such as unintentional insult, and the spread of prejudice. In contrast to the field of natural language processing, cultural awareness in computer vision has not been explored as extensively. In this paper, we strive to reduce this gap. We propose a RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. To do this, we form a list of 19 categories that best represent the features of Russian visual culture. Our final dataset consists of 1250 text prompts in Russian and their translations into English. The prompts cover a wide range of topics, including complex concepts from art, popular culture, folk traditions, famous people's names, natural objects, scientific achievements, etc. We present the results of a human evaluation of the side-by-side comparison of Russian visual concepts representations using popular generative models.

Updated: 2025-02-11 10:57:12

标题: RusCode:俄罗斯文化代码基准用于文本到图像生成

摘要: 文本到图像生成模型在全球用户中获得了广泛的流行。然而,许多这些模型对讲英语的文化存在较强的偏见,忽视或误代其他语言群体、国家和民族的独特特征。缺乏文化意识可能降低生成质量,并导致不良后果,如无意冒犯和偏见传播。与自然语言处理领域不同,计算机视觉中的文化意识尚未得到广泛探讨。在本文中,我们致力于缩小这一差距。我们提出了一个用于评估包含俄罗斯文化元素的文本到图像生成质量的RusCode基准。为此,我们形成了一个包含19个最能代表俄罗斯视觉文化特征的类别的列表。我们的最终数据集包括1250个俄语文本提示及其英文翻译。这些提示涵盖了广泛的主题,包括艺术、流行文化、民间传统、名人姓名、自然物体、科学成就等复杂概念。我们展示了使用流行生成模型对俄罗斯视觉概念表征进行人工评估的结果。

更新时间: 2025-02-11 10:57:12

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2502.07455v1

Locally Private Estimation with Public Features

We initiate the study of locally differentially private (LDP) learning with public features. We define semi-feature LDP, where some features are publicly available while the remaining ones, along with the label, require protection under local differential privacy. Under semi-feature LDP, we demonstrate that the mini-max convergence rate for non-parametric regression is significantly reduced compared to that of classical LDP. Then we propose HistOfTree, an estimator that fully leverages the information contained in both public and private features. Theoretically, HistOfTree reaches the mini-max optimal convergence rate. Empirically, HistOfTree achieves superior performance on both synthetic and real data. We also explore scenarios where users have the flexibility to select features for protection manually. In such cases, we propose an estimator and a data-driven parameter tuning strategy, leading to analogous theoretical and empirical results.

Updated: 2025-02-11 10:56:50

标题: 具有公共特征的局部私密估计

摘要: 我们开始研究具有公共特征的局部差分隐私(LDP)学习。我们定义了半特征LDP,其中一些特征是公开的,而其余特征以及标签需要在局部差分隐私下保护。在半特征LDP下,我们证明了非参数回归的最小-最大收敛速度与经典LDP相比显著降低。然后我们提出了HistOfTree,一个估计器,充分利用了公共和私有特征中包含的信息。理论上,HistOfTree达到了最小-最大最优收敛速度。在实证方面,HistOfTree在合成和真实数据上都实现了优越的性能。我们还探讨了用户可以灵活选择要手动保护的特征的情况。在这种情况下,我们提出了一个估计器和一个数据驱动的参数调整策略,导致类似的理论和实证结果。

更新时间: 2025-02-11 10:56:50

领域: stat.ML,cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.13481v2

Towards bandit-based prompt-tuning for in-the-wild foundation agents

Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline reinforcement learning pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: Not all prompts are equally informative for differentiating between tasks. To address this, we propose an inference time bandit-based prompt-tuning framework that explores and optimizes trajectory prompt selection to enhance task performance. Our experiments indicate not only clear performance gains due to bandit-based prompt-tuning, but also better sample complexity, scalability, and prompt space exploration compared to prompt-tuning baselines.
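
The abstract does not name the bandit algorithm, so the sketch below uses UCB1 over a pool of candidate prompts as a stand-in: each arm is a trajectory prompt, each pull is one evaluation of task performance. All quantities here are synthetic.

import numpy as np

def ucb_prompt_selection(eval_prompt, n_prompts, rounds=200, c=1.0):
    """UCB1 over candidate trajectory prompts: eval_prompt(i) returns a
    stochastic task-performance score in [0, 1] for prompt i."""
    counts = np.zeros(n_prompts)
    means = np.zeros(n_prompts)
    for t in range(rounds):
        if t < n_prompts:
            i = t                                   # play each prompt once
        else:
            ucb = means + c * np.sqrt(np.log(t + 1) / counts)
            i = int(np.argmax(ucb))
        r = eval_prompt(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]      # incremental mean
    return int(np.argmax(means))

rng = np.random.default_rng(0)
true_quality = rng.uniform(0.3, 0.9, size=10)       # hidden prompt quality
best = ucb_prompt_selection(
    lambda i: float(rng.random() < true_quality[i]), n_prompts=10
)
print("selected prompt:", best, " true best:", int(np.argmax(true_quality)))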

Updated: 2025-02-11 10:54:40

标题: 面向真实环境的基础智能体的基于老虎机的提示调优

摘要: 提示已成为将大型、预训练的基于transformer的模型适应到下游任务的主导范式。Prompting决策transformer(PDT)通过利用随机轨迹提示来识别目标任务,实现了大规模、多任务离线强化学习预训练。然而,这些提示是均匀地从专家演示中抽样的,忽视了一个关键限制:并非所有提示对于区分任务都同样具有信息量。为了解决这个问题,我们提出了一个基于推断时间bandit的提示调优框架,探索并优化轨迹提示的选择以增强任务性能。我们的实验表明,不仅由于基于bandit的提示调优而获得明显的性能增益,而且与提示调优基线相比,还具有更好的样本复杂性、可扩展性和提示空间探索。

更新时间: 2025-02-11 10:54:40

领域: cs.LG

下载: http://arxiv.org/abs/2502.06358v2

Eliciting Rational Initial Weights in Gradual Argumentation

Many semantics for weighted argumentation frameworks assume that each argument is associated with an initial weight. However, eliciting these initial weights poses challenges: (1) accurately providing a specific numerical value is often difficult, and (2) individuals frequently confuse initial weights with acceptability degrees in the presence of other arguments. To address these issues, we propose an elicitation pipeline that allows one to specify acceptability degree intervals for each argument. By employing gradual semantics, we can refine these intervals when they are rational, restore rationality when they are not, and ultimately identify possible initial weights for each argument.

Updated: 2025-02-11 10:52:54

标题: 在渐进论辩中引出合理的初始权重

摘要: 加权论辩框架的许多语义都假设每个论点都关联一个初始权重。然而,获取这些初始权重存在挑战:(1) 准确给出具体的数值往往很困难;(2) 在存在其他论点时,人们经常将初始权重与可接受度混淆。为了解决这些问题,我们提出了一个获取流程,允许为每个论点指定可接受度区间。通过采用渐进语义,我们可以在这些区间合理时对其进行细化,在不合理时恢复其合理性,并最终确定每个论点可能的初始权重。

更新时间: 2025-02-11 10:52:54

领域: cs.AI

下载: http://arxiv.org/abs/2502.07452v1

Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon

Large language models (LLMs) often appear to excel on public benchmarks, but these high scores may mask an overreliance on dataset-specific surface cues rather than true language understanding. We introduce the Chameleon Benchmark Overfit Detector (C-BOD), a meta-evaluation framework that systematically distorts benchmark prompts via a parametric transformation and detects overfitting of LLMs. By rephrasing inputs while preserving their semantic content and labels, C-BOD exposes whether a model's performance is driven by memorized patterns. Evaluated on the MMLU benchmark using 26 leading LLMs, our method reveals an average performance degradation of 2.15% under modest perturbations, with 20 out of 26 models exhibiting statistically significant differences. Notably, models with higher baseline accuracy exhibit larger performance differences under perturbation, and larger LLMs tend to be more sensitive to rephrasings, indicating that both may over-rely on fixed prompt patterns. In contrast, the Llama family and models with lower baseline accuracy show insignificant degradation, suggesting reduced dependency on superficial cues. Moreover, C-BOD's dataset- and model-agnostic design allows easy integration into training pipelines to promote more robust language understanding. Our findings challenge the community to look beyond leaderboard scores and prioritize resilience and generalization in LLM evaluation.
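
A minimal sketch of the meta-evaluation loop under stated assumptions: score a model on original and rephrased prompts and test the paired accuracy drop. The rephrase function and the paired t-test are placeholders for the paper's parametric transformation and its statistical procedure.

import numpy as np
from scipy import stats

def cbod_gap(model_answers, prompts, gold, rephrase):
    """Compare accuracy on original vs. semantics-preserving rephrasings.
    `rephrase` stands in for the paper's parametric prompt transformation."""
    acc_orig = np.array([model_answers(p) == g for p, g in zip(prompts, gold)])
    acc_pert = np.array([model_answers(rephrase(p)) == g
                         for p, g in zip(prompts, gold)])
    # Paired test on per-item correctness: does perturbation hurt?
    t, pval = stats.ttest_rel(acc_orig.astype(float), acc_pert.astype(float))
    return acc_orig.mean() - acc_pert.mean(), pval

# Toy demo: a fixed answer table plays the role of the model under test.
rng = np.random.default_rng(0)
prompts = [f"q{i}" for i in range(500)]
gold = [0] * 500
answers = {p: (0 if rng.random() < 0.80 else 1) for p in prompts}
answers.update({p + "'": (0 if rng.random() < 0.75 else 1) for p in prompts})
gap, p = cbod_gap(lambda q: answers[q], prompts, gold, lambda q: q + "'")
print(f"accuracy drop {gap:.3f} (p={p:.4f})")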

Updated: 2025-02-11 10:43:36

标题: 忘掉你对LLM评估的认知——LLM就像变色龙

摘要: 大型语言模型(LLMs)通常在公共基准测试中表现出色,但这些高分可能掩盖了对数据集特定表面线索而非真正语言理解的过分依赖。我们引入了变色龙基准过拟合检测器(C-BOD),这是一个元评估框架,通过参数转换系统地扭曲基准提示,并检测LLMs的过拟合。通过重新表述输入同时保留其语义内容和标签,C-BOD揭示了模型性能是否受到记忆模式的驱动。在使用26个领先的LLMs对MMLU基准进行评估时,我们的方法显示,在适度扰动下,平均性能下降了2.15%,其中26个模型中有20个表现出统计上显着的差异。值得注意的是,基线准确性较高的模型在扰动下表现出更大的性能差异,较大的LLMs倾向于对重新表述更敏感,表明这两种情况都可能过度依赖固定提示模式。相比之下,Llama系列和基线准确性较低的模型显示出无关紧要的退化,这表明对表面线索的依赖减少了。此外,C-BOD的数据集和模型不可知的设计允许轻松集成到训练流程中,以促进更强大的语言理解。我们的研究结果挑战了社区超越排行榜分数,优先考虑LLMs评估中的韧性和泛化能力。

更新时间: 2025-02-11 10:43:36

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07445v1

Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames

LLM-driven multi-agent-based simulations have been gaining traction with applications in game-theoretic and social simulations. While most implementations seek to exploit or evaluate LLM-agentic reasoning, they often do so with a weak notion of agency and simplified architectures. We implement a role-based multi-agent strategic interaction framework tailored to sophisticated recursive reasoners, providing the means for systematic in-depth development and evaluation of strategic reasoning. Our game environment is governed by the umpire responsible for facilitating games, from matchmaking through move validation to environment management. Players incorporate state-of-the-art LLMs in their decision mechanism, relying on a formal hypergame-based model of hierarchical beliefs. We use one-shot, 2-player beauty contests to evaluate the recursive reasoning capabilities of the latest LLMs, providing a comparison to an established baseline model from economics and data from human experiments. Furthermore, we introduce the foundations of an alternative semantic measure of reasoning to the k-level theory. Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
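
For reference, here is the level-k recursion in the two-player p-beauty contest used as the comparison family: level-0 plays the naive midpoint and level-k best-responds to a level-(k-1) opponent. The level-0 anchor at 50 is the conventional assumption from the economics literature.

def level_k_guess(k, p=2/3, level0=50.0, two_player=True):
    """Level-k guess in a p-beauty contest. In the 2-player game the target
    is p*(own+opp)/2; solving g = p*(g+o)/2 gives the best response
    g = p*o/(2-p). With p=2/3 each level halves the previous guess."""
    g = level0
    for _ in range(k):
        g = (p / (2 - p)) * g if two_player else p * g
    return g

for k in range(5):
    print(k, round(level_k_guess(k), 2))   # 50.0, 25.0, 12.5, 6.25, 3.12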

Updated: 2025-02-11 10:37:20

标题: 用LLM增强的递归推理器利用多智能体超博弈逼近人类战略推理

摘要: 基于LLM驱动的多智能体仿真在博弈论和社会仿真中越来越受到关注。虽然大多数实现旨在利用或评估LLM智能体的推理能力,但它们通常对智能体概念和简化架构有着薄弱的认识。我们实现了一个基于角色的多智能体战略互动框架,专为复杂的递归推理者设计,为系统化深入开发和评估战略推理提供了手段。我们的游戏环境由裁判负责促进游戏,从配对到移动验证再到环境管理。玩家在决策机制中融入了最先进的LLM,依靠一种基于层次信念的形式化超游戏模型。我们使用一次性的、2人的美丽比赛来评估最新LLM的递归推理能力,与经济学中已建立的基线模型和人类实验数据进行比较。此外,我们引入了一种与k级理论相对的推理的替代语义度量基础。我们的实验表明,人工推理者在逼近人类行为和达到最佳解决方案方面可以胜过基线模型。

更新时间: 2025-02-11 10:37:20

领域: cs.AI,cs.GT

下载: http://arxiv.org/abs/2502.07443v1

Mathematical reasoning and the computer

Computers have already changed the way that humans do mathematics: they enable us to compute efficiently. But will they soon be helping us to reason? And will they one day start reasoning themselves? We give an overview of recent developments in neural networks, computer theorem provers and large language models.

Updated: 2025-02-11 10:35:52

标题: 数学推理与计算机

摘要: 计算机已经改变了人类进行数学的方式:它们使我们能够高效地计算。但它们是否会很快帮助我们进行推理?它们是否会有一天开始进行推理?我们概述了最近神经网络、计算机定理证明器和大型语言模型的发展。

更新时间: 2025-02-11 10:35:52

领域: cs.AI,68T01

下载: http://arxiv.org/abs/2502.07850v1

Learning Source Disentanglement in Neural Audio Codec

Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space, thereby enhancing interpretability in audio codec and providing potential finer control over the audio generation process.

Updated: 2025-02-11 10:35:04

标题: 学习神经音频编解码器中的源信号分解

摘要: 神经音频编解码器通过高效地将连续音频信号转换为离散标记,显著提高了音频压缩技术。这些编解码器保留了高质量的声音,并通过在这些标记上训练的生成模型实现了复杂的声音生成。然而,现有的神经编解码器模型通常是在大型、未区分的音频数据集上训练的,忽略了诸如语音、音乐和环境声音效果之间的重要差异。这一疏忽使数据建模变得复杂,并对声音生成的可控性提出了额外挑战。为了应对这些问题,我们引入了源分解神经音频编解码器(SD-Codec),这是一种将音频编码和源分离结合起来的新方法。通过共同学习音频重合成和分离,SD-Codec明确地将不同领域的音频信号分配到不同的码书,即一组离散表示。实验结果表明,SD-Codec不仅保持了具有竞争力的重合成质量,而且在分离结果的支持下,成功地在潜在空间中分离了不同的源,从而提高了音频编解码器的可解释性,并为音频生成过程提供潜在的更精细的控制。

更新时间: 2025-02-11 10:35:04

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2409.11228v2

The Moral Mind(s) of Large Language Models

As large language models (LLMs) become integrated into decision-making across various sectors, key questions arise: do they exhibit an emergent "moral mind" - a consistent set of moral principles guiding their ethical judgments - and is this reasoning uniform or diverse across models? To investigate this, we presented approximately forty models from major providers with a structured set of ethical scenarios, creating one of the largest datasets of its kind. Our rationality tests revealed that at least one model from each provider exhibited behavior consistent with approximately stable moral principles, effectively acting as if nearly optimizing a utility function encoding ethical reasoning. We estimated these utility functions and found that models tend to cluster around neutral ethical stances. To further characterize moral heterogeneity, we applied a non-parametric permutation approach, constructing a probabilistic similarity network based on revealed preference patterns. This analysis showed that while approximately rational models share a core ethical structure, differences emerged: roughly half displayed greater moral adaptability, bridging diverse perspectives, while the remainder adhered to more rigid ethical structures.

Updated: 2025-02-11 10:35:02

标题: 大型语言模型的道德心理

摘要: 随着大型语言模型(LLMs)被整合到各个领域的决策中,一些关键问题浮出水面:它们是否表现出一种新兴的“道德思维” - 一套一致的道德原则指导它们的伦理判断 - 这种推理在模型之间是一致的还是多样化的?为了研究这个问题,我们向主要供应商提供了大约四十个模型,并给出了一组结构化的伦理情景,创建了这类数据中最大的数据集之一。我们的理性测试显示,每个供应商的至少一个模型表现出与大约稳定的道德原则一致的行为,有效地表现出几乎优化了编码伦理推理的效用函数。我们估计了这些效用函数,并发现模型倾向于围绕中立的伦理立场聚集。为了进一步表征道德的异质性,我们应用了一种非参数置换方法,构建了一个基于显露的偏好模式的概率相似性网络。这种分析显示,虽然大约理性的模型共享核心的伦理结构,但也存在差异:大约一半的模型显示出更大的道德适应性,弥合了不同的观点,而其余的则遵循更为严格的伦理结构。

更新时间: 2025-02-11 10:35:02

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2412.04476v2

SensPS: Sensing Personal Space Comfortable Distance between Human-Human Using Multimodal Sensors

Personal space, also known as peripersonal space, is crucial in human social interaction, influencing comfort, communication, and social stress. Estimating and respecting personal space is essential for enhancing human-computer interaction (HCI) and smart environments. Personal space preferences vary due to individual traits, cultural background, and contextual factors. Advanced multimodal sensing technologies, including eye-tracking and wristband sensors, offer opportunities to develop adaptive systems that dynamically adjust to user comfort levels. Integrating physiological and behavioral data enables a deeper understanding of spatial interactions. This study develops a sensor-based model to estimate comfortable personal space and identifies key features influencing spatial preferences. Our findings show that multimodal sensors, particularly eye-tracking and physiological wristband data, can effectively predict personal space preferences, with eye-tracking data playing a more significant role. An experimental study involving controlled human interactions demonstrates that a Transformer-based model achieves the highest predictive accuracy (F1 score: 0.87) for estimating personal space. Eye-tracking features, such as gaze point and pupil diameter, emerge as the most significant predictors, while physiological signals from wristband sensors contribute marginally. These results highlight the potential for AI-driven personalization of social space in adaptive environments, suggesting that multimodal sensing can be leveraged to develop intelligent systems that optimize spatial arrangements in workplaces, educational institutions, and public settings. Future work should explore larger datasets, real-world applications, and additional physiological markers to enhance model robustness.

Updated: 2025-02-11 10:31:43

Domains: cs.HC,cs.AI

Download: http://arxiv.org/abs/2502.07441v1

Understanding Classifier-Free Guidance: High-Dimensional Theory and Non-Linear Generalizations

Recent studies have raised concerns about the effectiveness of Classifier-Free Guidance (CFG), indicating that in low-dimensional settings, it can lead to overshooting the target distribution and reducing sample diversity. In this work, we demonstrate that in infinite and sufficiently high-dimensional contexts CFG effectively reproduces the target distribution, revealing a blessing-of-dimensionality result. Additionally, we explore finite-dimensional effects, precisely characterizing overshoot and variance reduction. Based on our analysis, we introduce non-linear generalizations of CFG. Through numerical simulations on Gaussian mixtures and experiments on class-conditional and text-to-image diffusion models, we validate our analysis and show that our non-linear CFG offers improved flexibility and generation quality without additional computation cost.
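
For reference, the following is a minimal sketch of the standard linear CFG update together with one possible non-linear variant (a norm-damped correction term); the exponent gamma and this particular functional form are illustrative assumptions, not the generalization proposed in the paper.

import numpy as np

def cfg_linear(eps_uncond, eps_cond, w):
    # Standard classifier-free guidance: move the unconditional score
    # toward the conditional one with guidance strength w.
    return eps_uncond + w * (eps_cond - eps_uncond)

def cfg_nonlinear(eps_uncond, eps_cond, w, gamma=0.8):
    # Hypothetical non-linear generalization: damp the correction by
    # rescaling its norm with an exponent gamma (gamma = 1 recovers the
    # linear rule). The paper's exact non-linearity may differ.
    diff = eps_cond - eps_uncond
    norm = np.linalg.norm(diff) + 1e-12
    return eps_uncond + w * (norm ** (gamma - 1.0)) * diff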

Updated: 2025-02-11 10:29:29

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2502.07849v1

Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization

Dimension is an inherent bottleneck to some modern learning tasks, where optimization methods suffer from the size of the data. In this paper, we study non-isotropic distributions of data and develop tools that aim at reducing these dimensional costs by a dependency on an effective dimension rather than the ambient one. Based on non-asymptotic estimates of the metric entropy of ellipsoids (which prove to generalize to infinite dimensions) and on a chaining argument, our uniform concentration bounds involve an effective dimension instead of the global dimension, improving over existing results. We show the importance of taking advantage of non-isotropic properties in learning problems with the following applications: i) we improve state-of-the-art results in statistical preconditioning for communication-efficient distributed optimization, ii) we introduce a non-isotropic randomized smoothing for non-smooth optimization. Both applications cover a class of functions that encompasses empirical risk minimization (ERM) for linear models.

Updated: 2025-02-11 10:29:23

Domains: stat.ML,cs.LG,math.PR,math.ST,stat.TH,60E15, 60B20, 60E15, 60F10

Download: http://arxiv.org/abs/2102.04259v4

Handling missing values in clinical machine learning: Insights from an expert study

Inherently interpretable machine learning (IML) models offer valuable support for clinical decision-making but face challenges when features contain missing values. Traditional approaches, such as imputation or discarding incomplete records, are often impractical in scenarios where data is missing at test time. We surveyed 55 clinicians from 29 French trauma centers, collecting 20 complete responses to study their interaction with three IML models in a real-world clinical setting for predicting hemorrhagic shock with missing values. Our findings reveal that while clinicians recognize the value of interpretability and are familiar with common IML approaches, traditional imputation techniques often conflict with their intuition. Instead of imputing unobserved values, they rely on observed features combined with medical intuition and experience. As a result, methods that natively handle missing values are preferred. These findings underscore the need to integrate clinical reasoning into future IML models to enhance human-computer interaction.

Updated: 2025-02-11 10:27:18

Domains: cs.LG

Download: http://arxiv.org/abs/2411.09591v2

Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge

The development of safe and reliable autonomous unmanned aerial vehicles relies on the ability of the system to recognise and adapt to changes in the local environment based on sensor inputs. State-of-the-art local tracking and trajectory planning are typically performed using camera sensor input to the flight control algorithm, but the extent to which environmental disturbances like rain affect the performance of these systems is largely unknown. In this paper, we first describe the development of an open dataset comprising ~335k images to examine these effects for seven different classes of precipitation conditions and show that a worst-case average tracking error of 1.5 m is possible for a state-of-the-art visual odometry system (VINS-Fusion). We then use the dataset to train a set of deep neural network models suited to mobile and constrained deployment scenarios to determine the extent to which it may be possible to efficiently and accurately classify these "rainy" conditions. The most lightweight of these models (MobileNetV3 small) can achieve an accuracy of 90% with a memory footprint of just 1.28 MB and a frame rate of 93 FPS, which is suitable for deployment in resource-constrained and latency-sensitive systems. We demonstrate a classification latency in the order of milliseconds using typical flight computer hardware. Accordingly, such a model can feed into the disturbance estimation component of an autonomous flight controller. In addition, data from unmanned aerial vehicles with the ability to accurately determine environmental conditions in real time may contribute to developing more granular timely localised weather forecasting.

Updated: 2025-02-11 10:21:16

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2407.12663v2

CapyMOA: Efficient Machine Learning for Data Streams in Python

CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.

Updated: 2025-02-11 10:20:04

Domains: cs.LG

Download: http://arxiv.org/abs/2502.07432v1

Accuracy and Robustness of Weight-Balancing Methods for Training PINNs

Physics-Informed Neural Networks (PINNs) have emerged as powerful tools for integrating physics-based models with data by minimizing both data and physics losses. However, this multi-objective optimization problem is notoriously challenging, with some benchmark problems leading to unfeasible solutions. To address these issues, various strategies have been proposed, including adaptive weight adjustments in the loss function. In this work, we introduce clear definitions of accuracy and robustness in the context of PINNs and propose a novel training algorithm based on the Primal-Dual (PD) optimization framework. Our approach enhances the robustness of PINNs while maintaining comparable performance to existing weight-balancing methods. Numerical experiments demonstrate that the PD method consistently achieves reliable solutions across all investigated cases, even in the low-data regime, and can be easily implemented, facilitating its practical adoption. The code is available at https://github.com/haoming-SHEN/Accuracy-and-Robustness-of-Weight-Balancing-Methods-for-Training-PINNs.git.
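
To make the primal-dual idea concrete, here is a minimal, self-contained sketch on a toy ODE (u' = -u with u(0) = 1), treating the physics residual as a constraint and updating a Lagrange multiplier by dual ascent; the architecture, step sizes, and tolerance are illustrative assumptions, not the paper's configuration.

import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = torch.tensor(1.0)   # dual variable (Lagrange multiplier)
eta, eps = 0.01, 1e-3     # dual step size, tolerated physics residual

t_data, u_data = torch.tensor([[0.0]]), torch.tensor([[1.0]])   # u(0) = 1
t_col = torch.linspace(0, 1, 64).reshape(-1, 1).requires_grad_(True)

for step in range(2000):
    u = net(t_col)
    du = torch.autograd.grad(u.sum(), t_col, create_graph=True)[0]
    physics = ((du + u) ** 2).mean()              # residual of u' = -u
    data = ((net(t_data) - u_data) ** 2).mean()
    loss = data + lam.detach() * physics          # primal step on the Lagrangian
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # dual ascent on the constraint
        lam = torch.clamp(lam + eta * (physics.detach() - eps), min=0.0)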

Updated: 2025-02-11 10:16:04

Domains: cs.LG

Download: http://arxiv.org/abs/2501.18582v2

Towards a Foundation Model for Physics-Informed Neural Networks: Multi-PDE Learning with Active Sampling

Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs) by embedding physical laws into neural network training. However, traditional PINN models are typically designed for single PDEs, limiting their generalizability across different physical systems. In this work, we explore the potential of a foundation PINN model capable of solving multiple PDEs within a unified architecture. We investigate the efficacy of a single PINN framework trained on four distinct PDEs: the Simple Harmonic Oscillator (SHO), the 1D Heat Equation, the 1D Wave Equation, and the 2D Laplace Equation, demonstrating its ability to learn diverse physical dynamics. To enhance sample efficiency, we incorporate Active Learning (AL) using Monte Carlo (MC) Dropout-based uncertainty estimation, selecting the most informative training samples iteratively. We evaluate different active learning strategies, comparing models trained on 10%, 20%, 30%, 40%, and 50% of the full dataset, and analyze their impact on solution accuracy. Our results indicate that targeted uncertainty sampling significantly improves performance with fewer training samples, leading to efficient learning across multiple PDEs. This work highlights the feasibility of a generalizable PINN-based foundation model, capable of adapting to different physics-based problems without redesigning network architectures. Our findings suggest that multi-PDE PINNs with active learning can serve as an effective approach for reducing computational costs while maintaining high accuracy in physics-based deep learning applications.
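
The uncertainty-driven selection step can be sketched as a generic MC-Dropout routine (assumed here rather than taken from the paper): keep dropout active at inference, score candidate points by predictive variance across stochastic forward passes, and keep the most uncertain ones.

import torch

def mc_dropout_select(model, pool_x, k, passes=20):
    # Score each candidate training point by the predictive variance
    # under MC Dropout, then return the k most uncertain points.
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        preds = torch.stack([model(pool_x) for _ in range(passes)])  # (passes, N, 1)
    uncertainty = preds.var(dim=0).squeeze(-1)                       # (N,)
    top = uncertainty.topk(k).indices
    return pool_x[top]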

Updated: 2025-02-11 10:12:28

Domains: cs.LG

Download: http://arxiv.org/abs/2502.07425v1

Modelling Chemical Reaction Networks using Neural Ordinary Differential Equations

In chemical reaction network theory, ordinary differential equations are used to model the temporal change of chemical species concentration. As the functional form of these ordinary differential equations systems is derived from an empirical model of the reaction network, it may be incomplete. Our approach aims to elucidate these hidden insights in the reaction network by combining dynamic modelling with deep learning in the form of neural ordinary differential equations. Our contributions not only help to identify the shortcomings of existing empirical models but also assist the design of future reaction networks.
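
A minimal sketch of the hybrid idea, assuming a toy two-species network (A -> B with an assumed rate constant) whose known rate law is augmented by a learned correction term and integrated with an explicit Euler scheme; the paper's species, kinetics, and solver will differ.

import torch

class HybridCRN(torch.nn.Module):
    # dc/dt = f_known(c) + f_nn(c): a known rate law plus a learned
    # correction capturing missing reactions (species and rate are hypothetical).
    def __init__(self, n_species=2):
        super().__init__()
        self.nn = torch.nn.Sequential(
            torch.nn.Linear(n_species, 32), torch.nn.Tanh(),
            torch.nn.Linear(32, n_species))

    def f_known(self, c):
        k = 0.5  # assumed rate constant for A -> B
        return torch.stack([-k * c[..., 0], k * c[..., 0]], dim=-1)

    def forward(self, c0, t_grid):
        c, traj = c0, [c0]
        for i in range(len(t_grid) - 1):
            dt = t_grid[i + 1] - t_grid[i]
            c = c + dt * (self.f_known(c) + self.nn(c))  # explicit Euler step
            traj.append(c)
        return torch.stack(traj)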

Updated: 2025-02-11 10:10:33

Domains: q-bio.MN,cs.LG

Download: http://arxiv.org/abs/2502.19397v1

Technical note on calibrating vision-language models under covariate shift

Despite being a successful example of emerging capability, vision-language foundation models for low-shot vision classification have a limited ability to sufficiently generalize to the target data distribution due to sample poverty, leading to sensitivity to variations in the data. A popular mitigation strategy is finetuning over multiple datasets, but domain generalization is expensive when practiced in this manner. This work examines both covariate shift between pre-training data and the underspecified target data, and confidence misalignment, where the model's prediction confidence is amplified by the limited data availability. We propose Confidence-Calibrated Covariate Shift Correction (C3SC), a unified framework to mitigate both covariate shift and confidence misalignment. C3SC leverages a Fisher information penalty for covariate shift correction and a confidence misalignment penalty (CMP) to lower confidence on misclassified examples. Experimental results across various vision and covariate shift datasets demonstrate that C3SC improves calibration (ECE) by up to 5.82%. C3SC also shows better robustness, with a 3.5% improvement in accuracy on challenging covariate shift datasets, making C3SC a promising solution for reliable real-world vision-language low-shot applications under distribution shift.
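
A hedged sketch of how such an objective could be assembled: the Fisher term is approximated here by the squared score norm and the confidence-misalignment penalty by the mean top-class probability on misclassified samples; the weightings and exact forms are placeholders, not the paper's definitions.

import torch
import torch.nn.functional as F

def c3sc_style_loss(logits, labels, model, lam_fisher=0.1, lam_cmp=0.1):
    # `logits` must come from model(x) within the current autograd graph,
    # and all parameters are assumed to participate in the forward pass.
    ce = F.cross_entropy(logits, labels)
    params = [p for p in model.parameters() if p.requires_grad]
    score = torch.autograd.grad(ce, params, create_graph=True)
    fisher = sum((g ** 2).sum() for g in score)        # empirical Fisher proxy
    probs = F.softmax(logits, dim=-1)
    wrong = probs.argmax(-1) != labels
    cmp = probs.max(-1).values[wrong].mean() if wrong.any() else logits.new_zeros(())
    return ce + lam_fisher * fisher + lam_cmp * cmp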

Updated: 2025-02-11 10:10:15

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2502.07847v1

Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation

Computational models offer powerful tools for formalising psychological theories, making them both testable and applicable in digital contexts. However, they remain little used in the study of motivation within psychology. We focus on the "need for competence", postulated as a key basic human need within Self-Determination Theory (SDT) -- arguably the most influential psychological framework for studying intrinsic motivation (IM). The need for competence is treated as a single construct across SDT texts. Yet, recent research has identified multiple, ambiguously defined facets of competence in SDT. We propose that these inconsistencies may be alleviated by drawing on computational models from the field of artificial intelligence, specifically from the domain of reinforcement learning (RL). By aligning the aforementioned facets of competence -- effectance, skill use, task performance, and capacity growth -- with existing RL formalisms, we provide a foundation for advancing competence-related theory in SDT and motivational psychology more broadly. The formalisms reveal underlying preconditions that SDT fails to make explicit, demonstrating how computational models can improve our understanding of IM. Additionally, our work can support a cycle of theory development by inspiring new computational models formalising aspects of the theory, which can then be tested empirically to refine the theory. While our research lays a promising foundation, empirical studies of these models in both humans and machines are needed, inviting collaboration across disciplines.

Updated: 2025-02-11 10:03:40

Domains: cs.AI

Download: http://arxiv.org/abs/2502.07423v1

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Evaluating and ranking the capabilities of different LLMs is crucial for understanding their performance and alignment with human preferences. Due to the high cost and time-consuming nature of human evaluations, an automatic LLM bencher (i.e., an automatic evaluation framework that aims to rank LLMs based on their alignment with human preferences) is indispensable. An automatic LLM bencher consists of four components: the input set (e.g., a user instruction), the evaluation model (e.g., an LLM), the evaluation type (e.g., pairwise comparison), and the aggregation method (e.g., the ELO rating system). However, previous work has not thoroughly explored how to select these components or how their different combinations influence the results. In this work, through controlled experiments, we provide a series of recommendations on how to choose each component to better automate the evaluation of LLMs. Furthermore, we discovered that when evaluating LLMs with similar performance, the performance of the automatic LLM bencher declines sharply, underscoring the limitations of current benchers and calling for future work. Lastly, we found that the evaluation models' performance at the instance level (e.g., the accuracy of selecting the best output) does not always align with their effectiveness when used as a component of a bencher, highlighting the importance of dedicated system-level evaluation of benchers.
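
For reference, the aggregation component can be as simple as an Elo update over pairwise judgments; the sketch below is the textbook Elo rule with an assumed K-factor, not the paper's implementation.

def elo_update(ratings, a, b, winner, k=16.0):
    # One Elo update from a single pairwise comparison between models a
    # and b; winner=None denotes a tie.
    ea = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
    sa = 1.0 if winner == a else 0.5 if winner is None else 0.0
    ratings[a] += k * (sa - ea)
    ratings[b] += k * ((1.0 - sa) - (1.0 - ea))

ratings = {"model_A": 1000.0, "model_B": 1000.0}
elo_update(ratings, "model_A", "model_B", winner="model_A")
print(ratings)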

Updated: 2025-02-11 10:02:55

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2501.00560v2

MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks

Edge Deep Neural Networks (DNNs) have increasingly been optimized for accuracy and efficiency, using traditional techniques such as pruning and, more recently, automatic design methodologies. However, these design techniques have often overlooked critical metrics such as fairness, robustness, and generalization. As a result, when evaluating SOTA edge DNNs' performance in image classification using the FACET dataset, we found that they exhibit significant accuracy disparities (14.09%) across 10 different skin tones, alongside issues of non-robustness and poor generalizability. In response to these observations, we introduce Mixture-of-Experts-based Neural Architecture Search (MoENAS), an automatic design technique that navigates through a space of mixtures of experts to discover accurate, fair, robust, and general edge DNNs. MoENAS improves accuracy by 4.02% compared to SOTA edge DNNs and reduces skin tone accuracy disparities from 14.09% to 5.60%, while enhancing robustness by 3.80% and minimizing overfitting to 0.21%, all while keeping model size close to the average size of state-of-the-art models (+0.4M). With these improvements, MoENAS establishes a new benchmark for edge DNN design, paving the way for the development of more inclusive and robust edge DNNs.

Updated: 2025-02-11 10:02:43

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2502.07422v1

Quantification of model error for inverse problems in the Weak Neural Variational Inference framework

We present a novel extension of the Weak Neural Variational Inference (WNVI) framework for probabilistic material property estimation that explicitly quantifies model errors in PDE-based inverse problems. Traditional approaches assume the correctness of all governing equations, including potentially unreliable constitutive laws, which can lead to biased estimates and misinterpretations. Our proposed framework addresses this limitation by distinguishing between reliable governing equations, such as conservation laws, and uncertain constitutive relationships. By treating all state variables as latent random variables, we enforce these equations through separate sets of residuals, leveraging a virtual likelihood approach with weighted residuals. This formulation not only identifies regions where constitutive laws break down but also improves robustness against model uncertainties without relying on a fully trustworthy forward model. We demonstrate the effectiveness of our approach in the context of elastography, showing that it provides a structured, interpretable, and computationally efficient alternative to traditional model error correction techniques. Our findings suggest that the proposed framework enhances the accuracy and reliability of material property estimation by offering a principled way to incorporate uncertainty in constitutive modeling.

Updated: 2025-02-11 09:52:06

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2502.07415v1

Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

Neural amortized Bayesian inference (ABI) can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, neural ABI is not yet sufficiently robust for widespread and safe applicability. In particular, when performing inference on observations outside of the scope of the simulated data seen during training, for example, because of model misspecification, the posterior approximations are likely to become highly biased. Due to the bad pre-asymptotic behavior of current neural posterior estimators in the out-of-simulation regime, the resulting estimation biases cannot be fixed in acceptable time by just simulating more training data. In this proof-of-concept paper, we propose a semi-supervised approach that enables training not only on (labeled) simulated data generated from the model, but also on unlabeled data originating from any source, including real-world data. To achieve the latter, we exploit Bayesian self-consistency properties that can be transformed into strictly proper losses without requiring knowledge of true parameter values, that is, without requiring data labels. The results of our initial experiments show remarkable improvements in the robustness of ABI on out-of-simulation data. Even if the observed data is far away from both labeled and unlabeled training data, inference remains highly accurate. If our findings also generalize to other scenarios and model classes, we believe that our new method represents a major breakthrough in neural ABI.
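
The self-consistency property being exploited can be stated compactly. Since the marginal likelihood does not depend on where the posterior is evaluated, for every parameter value theta,

$$\log p(y) = \log p(\theta) + \log p(y \mid \theta) - \log p(\theta \mid y) \qquad \text{for all } \theta,$$

so, substituting the learned approximation $q_\phi(\theta \mid y)$ for the exact posterior and drawing $\theta_1, \dots, \theta_K \sim q_\phi(\cdot \mid y)$, one natural label-free loss penalizes the spread of this quantity,

$$\mathcal{L}_{\mathrm{sc}}(y) = \operatorname{Var}_{k=1,\dots,K}\left[\log p(\theta_k) + \log p(y \mid \theta_k) - \log q_\phi(\theta_k \mid y)\right],$$

which vanishes exactly when $q_\phi$ matches the true posterior and requires no parameter labels, so it can be evaluated on unlabeled or real-world $y$. This is one common instantiation of a self-consistency loss; the paper's exact variants may differ.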

Updated: 2025-02-11 09:52:04

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2501.13483v2

Memory Analysis on the Training Course of DeepSeek Models

We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requirements associated with various distributed training configurations. Specifically, we examine critical factors influencing memory usage, including micro-batch size, activation recomputation policies, 3D parallelism, and ZeRO optimizations. It is important to emphasize that the training policies discussed in this report are not representative of DeepSeek's official configurations. Instead, they are explored to provide a deeper understanding of memory dynamics in training of large-scale mixture-of-experts model.
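
As a rough illustration of the kind of accounting involved, the sketch below estimates per-GPU parameter-related memory for mixed-precision Adam training under 3D parallelism and ZeRO sharding; the byte counts are textbook values and the example configuration is hypothetical, ignoring activations, MoE routing state, and framework overhead.

def estimate_param_memory_gib(n_params_b, zero_stage=1, dp=8, tp=1, pp=1):
    # Textbook mixed-precision accounting: bf16 weights (2 B) + bf16
    # grads (2 B) + fp32 master copy and Adam moments (12 B) per param.
    # ZeRO-1 shards optimizer states, ZeRO-2 also grads, ZeRO-3 also weights.
    per_gpu_params = n_params_b * 1e9 / (tp * pp)
    w, g, o = 2.0, 2.0, 12.0
    if zero_stage >= 1: o /= dp
    if zero_stage >= 2: g /= dp
    if zero_stage >= 3: w /= dp
    return per_gpu_params * (w + g + o) / 1024**3

# e.g. a 236B-parameter model (DeepSeek-v2 scale), ZeRO-1, dp=8, tp=8, pp=4
print(f"{estimate_param_memory_gib(236, zero_stage=1, dp=8, tp=8, pp=4):.1f} GiB")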

Updated: 2025-02-11 09:51:25

Domains: cs.PF,cs.LG

Download: http://arxiv.org/abs/2502.07846v1

Sample Weight Averaging for Stable Prediction

The challenge of Out-of-Distribution (OOD) generalization poses a foundational concern for the application of machine learning algorithms to risk-sensitive areas. Inspired by traditional importance weighting and propensity weighting methods, prior approaches employ an independence-based sample reweighting procedure. They aim at decorrelating covariates to counteract the bias introduced by spurious correlations between unstable variables and the outcome, thus enhancing generalization and achieving stable prediction under covariate shift. Nonetheless, these methods are prone to variance inflation, primarily because the reweighting process uses training samples less efficiently. Existing remedies require either environment labels or substantially higher time costs along with additional assumptions and supervised information. To mitigate this issue, we propose SAmple Weight Averaging (SAWA), a simple yet efficacious strategy that can be universally integrated into various sample reweighting algorithms to decrease the variance and coefficient estimation error, thus improving covariate-shift generalization and achieving stable prediction across different environments. We prove its rationality and benefits theoretically. Experiments on synthetic and real-world datasets consistently underscore its superiority under covariate shift.
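
A minimal sketch of the strategy, assuming a generic `reweighter` stand-in for any independence-based sample reweighting algorithm; the averaging here is taken over repeated reweighting runs (averaging over iterates of a single run is an equally plausible reading) before fitting the weighted model.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sawa_fit(X, y, reweighter, n_runs=10, seed=0):
    # Average per-sample weights from several reweighting runs to cut
    # their variance, then fit a weighted ERM model.
    # `reweighter(X, y, rng) -> weights` is a hypothetical callable.
    rng = np.random.default_rng(seed)
    weights = np.mean([reweighter(X, y, rng) for _ in range(n_runs)], axis=0)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=weights)
    return model, weights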

Updated: 2025-02-11 09:51:22

Domains: cs.LG

Download: http://arxiv.org/abs/2502.07414v1

PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting

Time series forecasting remains a critical challenge across various domains, often complicated by high-dimensional data and long-term dependencies. This paper presents a novel transformer architecture for time series forecasting, incorporating two key innovations: parameter sharing (PS) and Spatial-Temporal Segment Attention (SegAtt). We also define the time series segment as the concatenation of sequence patches from the same positions across different variables. The proposed model, PSformer, reduces the number of training parameters through the parameter sharing mechanism, thereby improving model efficiency and scalability. The introduction of SegAtt could enhance the capability of capturing local spatio-temporal dependencies by computing attention over the segments, and improve global representation by integrating information across segments. The combination of parameter sharing and SegAtt significantly improves the forecasting performance. Extensive experiments on benchmark datasets demonstrate that PSformer outperforms popular baselines and other transformer-based approaches in terms of accuracy and scalability, establishing itself as an accurate and scalable tool for time series forecasting.
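
The segment construction itself is easy to make precise: given a series x of shape (batch, variables, length), take patches of length P and concatenate the patch at the same temporal position across all variables, yielding one token per position. The sketch below covers only this definition, independent of the attention and parameter-sharing details.

import torch

def make_segments(x, patch_len):
    # x: (B, C, L) -> segments: (B, N, C * patch_len), where each segment
    # concatenates the patch at one temporal position across all C variables.
    B, C, L = x.shape
    n = L // patch_len
    patches = x[:, :, : n * patch_len].reshape(B, C, n, patch_len)
    return patches.permute(0, 2, 1, 3).reshape(B, n, C * patch_len)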

Updated: 2025-02-11 09:50:04

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2411.01419v2

Mining Power Destruction Attacks in the Presence of Petty-Compliant Mining Pools

Bitcoin's security relies on its Proof-of-Work consensus, where miners solve puzzles to propose blocks. The puzzle's difficulty is set by the difficulty adjustment mechanism (DAM), based on the network's available mining power. Attacks that destroy some portion of mining power can exploit the DAM to lower difficulty, making such attacks profitable. In this paper, we analyze three types of mining power destruction attacks in the presence of petty-compliant mining pools: selfish mining, bribery, and mining power distraction attacks. We analyze selfish mining while accounting for the distribution of mining power among pools, a factor often overlooked in the literature. Our findings indicate that selfish mining can be more destructive when the non-adversarial mining share is well distributed among pools. We also introduce a novel bribery attack, where the adversarial pool bribes petty-compliant pools to orphan others' blocks. For small pools, we demonstrate that the bribery attack can dominate strategies like selfish mining or undercutting. Lastly, we present the mining distraction attack, where the adversarial pool incentivizes petty-compliant pools to abandon Bitcoin's puzzle and mine for a simpler puzzle, thus wasting some part of their mining power. Similar to the previous attacks, this attack can lower the mining difficulty, but with the difference that it does not generate any evidence of mining power destruction, such as orphan blocks.

Updated: 2025-02-11 09:44:41

Domains: cs.CR

Download: http://arxiv.org/abs/2502.07410v1

MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification

Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels, hindering model generalization. This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification. We first extend the Prov-GigaPath vision foundation model, pre-trained on 1.3 billion pathology image tiles, into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. The model is then used to extract visual features and text embeddings from few-shot annotations and is fine-tuned with learnable prompt embeddings. Unlike prior methods that combine prompts with frozen features using prefix embeddings or self-attention, we propose multi-granular attention that compares interactions between learnable prompts with individual image patches and groups of them. This approach improves the model's ability to capture both fine-grained details and broader context, enhancing its recognition of complex patterns across sub-regions. To further improve accuracy, we leverage (unbalanced) optimal transport-based visual-text distance to secure model robustness by mitigating perturbations that might occur during the data augmentation process. Empirical experiments on lung, kidney, and breast pathology modalities validate the effectiveness of our approach; thereby, we surpass several of the latest competitors and consistently improve performance across diverse architectures, including CLIP, PLIP, and Prov-GigaPath integrated PLIP. We publicly release our implementation and pre-trained models as MGPATH.

Updated: 2025-02-11 09:42:13

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2502.07409v1

Singular learning coefficients and efficiency in learning theory

Singular learning models with non-positive Fisher information matrices include neural networks, reduced-rank regression, Boltzmann machines, normal mixture models, and others. These models have been widely used in the development of learning machines. However, theoretical analysis is still in its early stages. In this paper, we examine learning coefficients, which indicate the general learning efficiency of deep linear learning models and three-layer neural network models with ReLU units. Finally, we extend the results to include the case of the Softmax function.

Updated: 2025-02-11 09:41:34

Domains: stat.ML,cs.LG,math.AG,math.ST,stat.TH

Download: http://arxiv.org/abs/2501.12747v2

No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips

Deep Neural Networks (DNNs) can be catastrophically disrupted by flipping only a handful of sign bits in their parameters. We introduce Deep Neural Lesion (DNL), a data-free, lightweight method that locates these critical parameters and triggers massive accuracy drops. We validate its efficacy on a wide variety of computer vision models and datasets. The method requires no training data or optimization and can be carried out via common software-, firmware-, or hardware-based attack vectors. An enhanced variant that uses a single forward and backward pass further amplifies the damage beyond DNL's zero-pass approach. Flipping just two sign bits in ResNet50 on ImageNet reduces accuracy by 99.8%. We also show that selectively protecting a small fraction of vulnerable sign bits provides a practical defense against such attacks.
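
To illustrate the fault model (not the paper's parameter-selection procedure), flipping the IEEE-754 sign bit of chosen float32 weights amounts to XOR-ing bit 31 of their integer view.

import torch

def flip_sign_bits(model, param_name, flat_indices):
    # Flip the sign bit of selected float32 weights in place by
    # reinterpreting the storage as int32 and XOR-ing bit 31.
    # A data-free fault-injection sketch with arbitrary indices.
    with torch.no_grad():
        w = dict(model.named_parameters())[param_name]
        bits = w.view(-1).view(torch.int32)   # same storage, no copy
        bits[flat_indices] ^= torch.tensor(-2**31, dtype=torch.int32)  # bit 31

m = torch.nn.Linear(4, 2)
flip_sign_bits(m, "weight", torch.tensor([0, 5]))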

Updated: 2025-02-11 09:40:45

Domains: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2502.07408v1

Human-in-the-Loop Annotation for Image-Based Engagement Estimation: Assessing the Impact of Model Reliability on Annotation Accuracy

Human-in-the-loop (HITL) frameworks are increasingly recognized for their potential to improve annotation accuracy in emotion estimation systems by combining machine predictions with human expertise. This study focuses on integrating a high-performing image-based emotion model into a HITL annotation framework to evaluate the collaborative potential of human-machine interaction and identify the psychological and practical factors critical to successful collaboration. Specifically, we investigate how varying model reliability and cognitive framing influence human trust, cognitive load, and annotation behavior in HITL systems. We demonstrate that model reliability and psychological framing significantly impact annotators' trust, engagement, and consistency, offering insights into optimizing HITL frameworks. Through three experimental scenarios with 29 participants, covering baseline model reliability (S1), fabricated errors (S2), and cognitive bias introduced by negative framing (S3), we analyzed behavioral and qualitative data. Reliable predictions in S1 yielded high trust and annotation consistency, while unreliable outputs in S2 led to increased critical evaluations but also heightened frustration and response variability. Negative framing in S3 revealed how cognitive bias influenced participants to perceive the model as more relatable and accurate, despite misinformation regarding its reliability. These findings highlight the importance of both reliable machine outputs and psychological factors in shaping effective human-machine collaboration. By leveraging the strengths of both human oversight and automated systems, this study establishes a scalable HITL framework for emotion annotation and lays the foundation for broader applications in adaptive learning and human-computer interaction.

Updated: 2025-02-11 09:37:10

Domains: cs.HC,cs.AI,cs.CV

Download: http://arxiv.org/abs/2502.07404v1

Enhancing Higher Education with Generative AI: A Multimodal Approach for Personalised Learning

This research explores the opportunities of Generative AI (GenAI) in the realm of higher education through the design and development of a multimodal chatbot for an undergraduate course. Leveraging the ChatGPT API for nuanced text-based interactions and Google Bard for advanced image analysis and diagram-to-code conversions, we showcase the potential of GenAI in addressing a broad spectrum of educational queries. Additionally, the chatbot presents a file-based analyser designed for educators, offering deep insights into student feedback via sentiment and emotion analysis, and summarising course evaluations with key metrics. These combinations highlight the crucial role of multimodal conversational AI in enhancing teaching and learning processes, promising significant advancements in educational adaptability, engagement, and feedback analysis. By demonstrating a practical web application, this research underlines the imperative for integrating GenAI technologies to foster more dynamic and responsive educational environments, ultimately contributing to improved educational outcomes and pedagogical strategies.

Updated: 2025-02-11 09:29:29

Domains: cs.HC,cs.AI

Download: http://arxiv.org/abs/2502.07401v1

Explainable Multimodal Machine Learning for Revealing Structure-Property Relationships in Carbon Nanotube Fibers

In this study, we propose Explainable Multimodal Machine Learning (EMML), which integrates the analysis of diverse data types (multimodal data) using factor analysis for feature extraction with Explainable AI (XAI), for carbon nanotube (CNT) fibers prepared from aqueous dispersions. This method is a powerful approach to elucidate the mechanisms governing material properties, where multi-stage fabrication conditions and multiscale structures have complex influences. Thus, in our case, this approach helps us understand how different processing steps and structures at various scales impact the final properties of CNT fibers. The analysis targeted structures ranging from the nanoscale to the macroscale, including aggregation size distributions of CNT dispersions and the effective length of CNTs. Furthermore, distribution data that were difficult to interpret using standard methods were analyzed using Non-negative Matrix Factorization (NMF) to extract the key features that determine the outcome. Contribution analysis with SHapley Additive exPlanations (SHAP) demonstrated that small, uniformly distributed aggregates are crucial for improving fracture strength, while CNTs with long effective lengths are significant factors for enhancing electrical conductivity. The analysis also identified thresholds and trends for these key factors to assist in defining the conditions needed to optimize CNT fiber properties. EMML is not limited to CNT fibers but can be applied to the design of other materials derived from nanomaterials, making it a useful tool for developing a wide range of advanced materials. This approach provides a foundation for advancing data-driven materials research.

Updated: 2025-02-11 09:29:23

Domains: cond-mat.mtrl-sci,cond-mat.soft,cs.AI,cs.LG,physics.data-an

Download: http://arxiv.org/abs/2502.07400v1

On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o

This paper introduces CodeQUEST, a novel framework leveraging Large Language Models (LLMs) to iteratively evaluate and enhance code quality across multiple dimensions, including readability, maintainability, efficiency, and security. The framework is divided into two main components: an Evaluator that assesses code quality across ten dimensions, providing both quantitative scores and qualitative summaries, and an Optimizer that iteratively improves the code based on the Evaluator's feedback. Our study demonstrates that CodeQUEST can effectively and robustly evaluate code quality, with its assessments aligning closely with established code quality metrics. Through a series of experiments using a curated dataset of Python and JavaScript examples, CodeQUEST demonstrated significant improvements in code quality, achieving a mean relative percentage improvement of 52.6%. The framework's evaluations were validated against a set of proxy metrics comprising of Pylint Score, Radon Maintainability Index, and Bandit output logs, showing a meaningful correlation. This highlights the potential of LLMs in automating code quality evaluation and improvement processes, presenting a significant advancement toward enhancing software development practices. The code implementation of the framework is available at: https://github.com/jpmorganchase/CodeQuest.

Updated: 2025-02-11 09:27:00

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2502.07399v1

Bandit Optimal Transport

Despite the impressive progress in statistical Optimal Transport (OT) in recent years, there has been little interest in the study of the sequential learning of OT. Surprisingly so, as this problem is both practically motivated and a challenging extension of existing settings such as linear bandits. This article considers (for the first time) the stochastic bandit problem of learning to solve generic Kantorovich and entropic OT problems from repeated interactions when the marginals are known but the cost is unknown. We provide $\tilde{\mathcal O}(\sqrt{T})$ regret algorithms for both problems by extending linear bandits on Hilbert spaces. These results provide a reduction to infinite-dimensional linear bandits. To deal with the dimension, we provide a method to exploit the intrinsic regularity of the cost to learn, yielding corresponding regret bounds which interpolate between $\tilde{\mathcal O}(\sqrt{T})$ and $\tilde{\mathcal O}(T)$.

Updated: 2025-02-11 09:24:25

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2502.07397v1

Spread them Apart: Towards Robust Watermarking of Generated Content

Generative models that can produce realistic images have improved significantly in recent years. The quality of the generated content has increased drastically, so it is sometimes very difficult to distinguish real images from generated ones. This improvement comes at the price of ethical concerns about the usage of generative models: a user may improperly claim ownership of generated content protected by a license. In this paper, we propose an approach to embed watermarks into the generated content to allow future detection of the generated content and identification of the user who generated it. The watermark is embedded during the inference of the model, so the proposed approach does not require retraining of the latter. We prove that the embedded watermarks are guaranteed to be robust against additive perturbations of a bounded magnitude. We apply our method to watermark diffusion models and show that it matches state-of-the-art watermarking schemes in terms of robustness to different types of synthetic watermark removal attacks.
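
A toy sketch of the underlying robustness argument (illustrative, not the paper's construction): if each bit is written by pushing the latent's projection onto a fixed random direction to plus or minus a margin, decoding by projection sign provably survives any additive perturbation with norm below that margin.

import numpy as np

def embed(latent, bits, margin=2.0, seed=0):
    # Spread bits apart along orthonormal random directions: after
    # embedding, the projection onto direction i equals +/-margin, so an
    # additive perturbation e with ||e|| < margin cannot flip any sign.
    rng = np.random.default_rng(seed)
    dirs, _ = np.linalg.qr(rng.standard_normal((latent.size, len(bits))))
    target = margin * (2 * np.asarray(bits) - 1)
    delta = dirs @ (target - dirs.T @ latent)
    return latent + delta, dirs

def decode(latent, dirs):
    return (dirs.T @ latent > 0).astype(int)

z = np.random.default_rng(1).standard_normal(64)
zw, dirs = embed(z, [1, 0, 1, 1])
assert decode(zw, dirs).tolist() == [1, 0, 1, 1]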

Updated: 2025-02-11 09:23:38

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2502.07845v1

Interpretable Rules for Online Failure Prediction: A Case Study on the Metro do Porto dataset

Due to their high predictive performance, predictive maintenance applications have increasingly been approached with deep learning techniques in recent years. However, as in other real-world application scenarios, the need for explainability is often stated but not sufficiently addressed. This study focuses on predicting failures of Metro trains in Porto, Portugal. While recent works have found high-performing deep neural network architectures that feature a parallel explainability pipeline, the generated explanations are fairly complicated and do little to explain why the failures are happening. This work proposes a simple online rule-based explainability approach with interpretable features that leads to straightforward, interpretable rules. We showcase our approach on MetroPT2 and find that three specific sensors on the Metro do Porto trains suffice to predict the failures present in the dataset with simple rules.
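
For flavor, such a rule set is just a handful of threshold tests; the sensor names and thresholds below are hypothetical stand-ins, not the rules learned in the paper.

import pandas as pd

def failure_alarm(row: pd.Series) -> bool:
    # Illustrative threshold rules on three sensors (assumed names/values).
    return (row["oil_temperature"] > 80.0
            or row["motor_current"] > 9.0
            or row["air_pressure"] < 7.5)

readings = pd.DataFrame({"oil_temperature": [65.0, 86.5],
                         "motor_current": [5.2, 4.8],
                         "air_pressure": [8.1, 8.0]})
readings["alarm"] = readings.apply(failure_alarm, axis=1)
print(readings)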

Updated: 2025-02-11 09:23:16

Domains: cs.LG

Download: http://arxiv.org/abs/2502.07394v1

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

Adapting a pre-trained foundation model on downstream tasks should ensure robustness against distribution shifts without the need to retrain the whole model. Although existing weight interpolation methods are simple yet effective, we argue their static nature limits downstream performance while achieving efficiency. In this work, we propose DaWin, a training-free dynamic weight interpolation method that leverages the entropy of individual models over each unlabeled test sample to assess model expertise, and computes per-sample interpolation coefficients dynamically. Unlike previous works that typically rely on additional training to learn such coefficients, our approach requires no training. Then, we propose a mixture modeling approach that greatly reduces the inference overhead raised by dynamic interpolation. We validate DaWin on large-scale visual recognition benchmarks, spanning 14 tasks across robust fine-tuning (ImageNet and five derived distribution-shift benchmarks) and multi-task learning with eight classification tasks. Results demonstrate that DaWin achieves significant performance gains in the considered settings, with minimal computational overhead. We further discuss DaWin's analytic behavior to explain its empirical success.
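
A training-free sketch of the per-sample coefficient idea: each expert is weighted by the softmax of its negated predictive entropy on the test sample. For brevity this mixes output probabilities rather than interpolating the weight tensors themselves, and it omits the paper's mixture-modeling speed-up; the temperature is an assumed knob.

import torch
import torch.nn.functional as F

def dawin_style_interpolate(models, x, temperature=1.0):
    # Weight each model by how confident (low-entropy) it is on x.
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])   # (M, B, C)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)        # (M, B)
    coeff = F.softmax(-entropy / temperature, dim=0)                 # (M, B)
    return (coeff.unsqueeze(-1) * probs).sum(0)                      # (B, C)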

Updated: 2025-02-11 09:21:41

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2410.03782v2

AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems

The field of Reinforcement Learning (RL) has garnered increasing attention for its ability to optimize user retention in recommender systems. A primary obstacle in this optimization process is environment non-stationarity stemming from the continual and complex evolution of user behavior patterns over time, such as variations in interaction rates and retention propensities. These changes pose significant challenges to existing RL algorithms for recommendations, leading to issues with dynamics and reward distribution shifts. This paper introduces a novel approach called Adaptive User Retention Optimization (AURO) to address this challenge. To navigate the recommendation policy in non-stationary environments, AURO introduces a state abstraction module in the policy network. The module is trained with a new value-based loss function, aligning its output with the estimated performance of the current policy. As the policy performance of RL is sensitive to environment drifts, the loss function enables the state abstraction to be reflective of environment changes and notifies the recommendation policy to adapt accordingly. Additionally, the non-stationarity of the environment introduces the problem of implicit cold start, where the recommendation policy continuously interacts with users displaying novel behavior patterns. AURO encourages exploration guarded by performance-based rejection sampling to maintain stable recommendation quality in the cost-sensitive online environment. Extensive empirical analyses are conducted in a user retention simulator, on the MovieLens dataset, and on a live short-video recommendation platform, demonstrating AURO's superior performance against all evaluated baseline algorithms.

Updated: 2025-02-11 09:07:15

Domains: cs.IR,cs.LG

Download: http://arxiv.org/abs/2310.03984v2

Data Assetization via Resources-decoupled Federated Learning

With the development of the digital economy, data is increasingly recognized as an essential resource for both work and life. However, due to privacy concerns, data owners tend to maximize the value of data through the circulation of information rather than direct data transfer. Federated learning (FL) provides an effective approach to collaborative training models while preserving privacy. However, as model parameters and training data grow, there are not only real differences in data resources between different data owners, but also mismatches between data and computing resources. These challenges lead to inadequate collaboration among data owners, compute centers, and model owners, reducing the global utility of the three parties and the effectiveness of data assetization. In this work, we first propose a framework for resource-decoupled FL involving three parties. Then, we design a Tripartite Stackelberg Model and theoretically analyze the Stackelberg-Nash equilibrium (SNE) for participants to optimize global utility. Next, we propose the Quality-aware Dynamic Resources-decoupled FL algorithm (QD-RDFL), in which we derive and solve the optimal strategies of all parties to achieve SNE using backward induction. We also design a dynamic optimization mechanism to improve the optimal strategy profile by evaluating the contribution of data quality from data owners to the global model during real training. Finally, our extensive experiments demonstrate that our method effectively encourages the linkage of the three parties involved, maximizing the global utility and value of data assets.

Updated: 2025-02-11 09:03:49

标题: 数据资产化通过资源解耦的联邦学习

摘要: 随着数字经济的发展,数据越来越被认为是工作和生活中的重要资源。然而,由于隐私问题,数据所有者倾向于通过信息传播而不是直接数据传输来最大化数据价值。联邦学习(FL)为保护隐私的同时提供了协作训练模型的有效方法。然而,随着模型参数和训练数据的增长,不仅不同数据所有者之间存在着数据资源的真实差异,而且数据和计算资源之间也存在不匹配。这些挑战导致数据所有者、计算中心和模型所有者之间合作不足,降低了三方的全局效用和数据资产化的效果。在这项工作中,我们首先提出了一个涉及三方的资源解耦FL的框架。然后,我们设计了一个三方斯达克尔贝格模型,并在理论上分析了参与者优化全局效用的斯达克尔贝格-纳什均衡(SNE)。接下来,我们提出了一种质量感知动态资源解耦FL算法(QD-RDFL),其中我们通过反向归纳推导和解决所有方的最优策略以达到SNE。我们还设计了一个动态优化机制,通过在实际训练过程中评估数据所有者对全局模型的贡献来改进最优策略配置。最后,我们的大量实验证明,我们的方法有效地鼓励了三方之间的联系,最大化了全局效用和数据资产的价值。

更新时间: 2025-02-11 09:03:49

领域: cs.LG

下载: http://arxiv.org/abs/2501.14588v2

Point Cloud Synthesis Using Inner Product Transforms

Point-cloud synthesis, i.e. the generation of novel point clouds from an input distribution, remains a challenging task, for which numerous complex machine-learning models have been devised. We develop a novel method that encodes geometrical-topological characteristics of point clouds using inner products, leading to a highly efficient point cloud representation with provable expressivity properties. Integrated into deep learning models, our encoding exhibits high quality in typical tasks like reconstruction, generation, and interpolation, with inference times orders of magnitude faster than existing methods.
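
One way inner products can yield a compact, order-invariant point-cloud code is through the spectrum of the centered Gram matrix; the sketch below is illustrative, not the paper's exact encoding:

```python
# The sorted eigenvalues of the centered Gram (inner-product) matrix are
# invariant to both point ordering and rigid rotations of the cloud.
import numpy as np

def gram_spectrum_descriptor(points: np.ndarray, k: int = 16) -> np.ndarray:
    """points: (n, 3) array. Returns the top-k eigenvalues of the centered
    Gram matrix as a permutation- and rotation-invariant descriptor."""
    centered = points - points.mean(axis=0)
    gram = centered @ centered.T              # pairwise inner products
    eigvals = np.linalg.eigvalsh(gram)        # ascending order
    top = eigvals[::-1][:k]
    return np.pad(top, (0, max(0, k - top.size)))

cloud = np.random.default_rng(1).normal(size=(128, 3))
perm = np.random.default_rng(2).permutation(128)
d1, d2 = gram_spectrum_descriptor(cloud), gram_spectrum_descriptor(cloud[perm])
print(np.allclose(d1, d2))  # True: the descriptor ignores point order
```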

Updated: 2025-02-11 09:03:09

标题: 使用内积变换合成点云

摘要: 点云合成,即从输入分布生成新颖点云,仍然是一个具有挑战性的任务,为此设计了许多复杂的机器学习模型。我们开发了一种新颖的方法,使用内积编码点云的几何拓扑特征,导致高效的点云表示具有可证实的表达性质。集成到深度学习模型中,我们的编码在重建、生成和插值等典型任务中表现出高质量,推断时间比现有方法快几个数量级。

更新时间: 2025-02-11 09:03:09

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.18987v2

EvoFlow: Evolving Diverse Agentic Workflows On The Fly

The past two years have witnessed the evolution of large language model (LLM)-based multi-agent systems from labor-intensive manual design to partial automation (e.g., prompt engineering, communication topology) and eventually to fully automated design. However, existing agentic automation pipelines often lack LLM heterogeneity and focus on single-objective performance optimization, limiting their potential to combine weaker models for more customized and cost-effective solutions. To address this challenge, we propose EvoFlow, a niching-evolutionary-algorithm-based framework to automatically search a population of heterogeneous and complexity-adaptive agentic workflows, rather than a single homogeneous, complex workflow. Technically, EvoFlow performs (1) tag-based retrieval to extract parent workflows from an agentic population, evolves new workflows through (2) crossover and (3) mutation, and employs (4) niching-based selection to maintain population diversity and quality. Extensive evaluations across seven benchmarks demonstrate that EvoFlow is: (I) diverse, evolving a population of workflows ranging from simple I/O tasks to complex multi-turn interactions; (II) high-performing, outperforming previous handcrafted and automated workflows by 1.23% to 29.86%; (III) economical, surpassing the powerful o1-preview at 12.4% of its inference cost using weaker open-source models.
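
A toy rendering of the four-step loop, with an invented operator pool, tag set, and fitness function standing in for real LLM workflow evaluation:

```python
# Tag-based retrieval, crossover, mutation, and niching selection over
# workflows represented as lists of tagged operators; everything is a stub.
import random

OPS = {"cot": {"reason"}, "retrieve": {"search"}, "debate": {"reason", "multi"},
       "reflect": {"reason"}, "tool": {"search", "code"}}

def fitness(wf):                      # placeholder: prefer short, diverse workflows
    return len(set(wf)) - 0.2 * len(wf)

def tags(wf):
    return set().union(*(OPS[o] for o in wf))

def retrieve_parents(pop, query_tags, k=2):
    # (1) tag-based retrieval: rank workflows by tag overlap with the query
    ranked = sorted(pop, key=lambda wf: -len(tags(wf) & query_tags))
    return (ranked + ranked)[:k]      # pad in case the population is tiny

def crossover(a, b):                  # (2) splice two parents
    return a[:random.randrange(1, len(a) + 1)] + b[random.randrange(1, len(b) + 1):]

def mutate(wf, p=0.3):                # (3) randomly swap operators
    return [random.choice(list(OPS)) if random.random() < p else o for o in wf]

def niching_select(pop, size):        # (4) keep the best workflow per tag niche
    best = {}
    for wf in pop:
        sig = frozenset(tags(wf))
        if sig not in best or fitness(wf) > fitness(best[sig]):
            best[sig] = wf
    return sorted(best.values(), key=fitness, reverse=True)[:size]

random.seed(0)
pop = [[random.choice(list(OPS))] for _ in range(6)]
for _ in range(20):
    p1, p2 = retrieve_parents(pop, {"reason", "search"})
    pop = niching_select(pop + [mutate(crossover(p1, p2))], size=6)
print(max(pop, key=fitness))
```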

Updated: 2025-02-11 08:48:46

标题: EvoFlow:即时演化多元化的主体工作流程

摘要: 在过去的两年中,大型语言模型(LLM)基础的多智能体系统已经从繁重的手动设计逐步演变为部分自动化(例如,提示工程、通信拓扑),最终实现全自动设计。然而,现有的智能体自动化流程通常缺乏LLM异质性,并专注于单一目标的性能优化,限制了它们将较弱模型结合以实现更定制和经济有效解决方案的潜力。为了解决这一挑战,我们提出了EvoFlow,一个基于尼奇进化算法的框架,可以自动搜索一种异质且复杂适应性的智能体工作流程种群,而不是单一的同质复杂工作流程。从技术上讲,EvoFlow执行“基于标签的检索”来从智能体种群中提取父工作流程,通过“交叉”和“突变”进化新的工作流程,并采用“尼奇选择”来维持种群的多样性和质量。对七个基准进行的广泛评估表明,EvoFlow具有以下特点:(I)多样性,演变出一种从简单I/O任务到复杂多轮交互的工作流种群;(II)高性能,比以前手工制作和自动化工作流程的性能高出1.23%~29.86%;(III)经济实惠,使用较弱的开源模型,仅以强大的o1-preview模型12.4%的推理成本便超越了其表现。

更新时间: 2025-02-11 08:48:46

领域: cs.LG,cs.CL,cs.MA,cs.NE

下载: http://arxiv.org/abs/2502.07373v1

Uniform Kernel Prober

The ability to identify useful features or representations of the input data based on training data that achieves low prediction error on test data across multiple prediction tasks is considered the key to multitask learning success. In practice, however, one faces the issue of the choice of prediction tasks and the availability of test data from the chosen tasks while comparing the relative performance of different features. In this work, we develop a class of pseudometrics called Uniform Kernel Prober (UKP) for comparing features or representations learned by different statistical models such as neural networks when the downstream prediction tasks involve kernel ridge regression. The proposed pseudometric, UKP, between any two representations, provides a uniform measure of prediction error on test data corresponding to a general class of kernel ridge regression tasks for a given choice of a kernel without access to test data. Additionally, desired invariances in representations can be successfully captured by UKP only through the choice of the kernel function and the pseudometric can be efficiently estimated from $n$ input data samples with $O(\frac{1}{\sqrt{n}})$ estimation error. We also experimentally demonstrate the ability of UKP to discriminate between different types of features or representations based on their generalization performance on downstream kernel ridge regression tasks.
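
As a loose illustration of comparing representations through kernel ridge regression: for any target vector y, KRR predictions are Hy for a smoothing matrix H, so the operator-norm gap between the two smoothers bounds the prediction difference uniformly over unit-norm targets. This proxy is an assumption for exposition, not the paper's pseudometric:

```python
# Compare two representations by the KRR "hat" matrices they induce on the
# same inputs; rbf_kernel, lam, and the operator-norm proxy are illustrative.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def hat_matrix(X, lam=0.1):
    K = rbf_kernel(X)
    n = K.shape[0]
    return K @ np.linalg.inv(K + lam * n * np.eye(n))

def ukp_proxy(X1, X2, lam=0.1):
    """Small values mean the two representations give nearly identical KRR
    predictions for every possible regression target."""
    return np.linalg.norm(hat_matrix(X1, lam) - hat_matrix(X2, lam), ord=2)

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 8))
print(ukp_proxy(Z, Z))                         # ~0: identical features
print(ukp_proxy(Z, rng.normal(size=(50, 8))))  # larger: unrelated features
```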

Updated: 2025-02-11 08:43:41

标题: 统一核心探针

摘要: 能够基于训练数据识别输入数据的有用特征或表示,并在多个预测任务中取得低预测错误的能力被认为是多任务学习成功的关键。然而,在实践中,人们面临着选择预测任务和比较不同特征相对性能时所选任务的测试数据可用性的问题。在这项工作中,我们开发了一类称为Uniform Kernel Prober (UKP)的伪度量,用于比较由不同统计模型(如神经网络)学习的特征或表示,当下游预测任务涉及核岭回归时。所提出的伪度量UKP,在任意两个表示之间,提供了一个统一的预测误差度量,对应于给定核选择的一般类核岭回归任务的测试数据,而无需访问测试数据。此外,通过选择核函数,可以成功捕捉到表示中的所需不变性,伪度量可以从$n$个输入数据样本中高效地估计,估计误差为$O(\frac{1}{\sqrt{n}})$。我们还通过实验展示了UKP在基于下游核岭回归任务的泛化性能上区分不同类型特征或表示的能力。

更新时间: 2025-02-11 08:43:41

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2502.07369v1

Effects of Random Edge-Dropping on Over-Squashing in Graph Neural Networks

Message Passing Neural Networks (MPNNs) are a class of Graph Neural Networks (GNNs) that leverage the graph topology to propagate messages across increasingly larger neighborhoods. The message-passing scheme leads to two distinct challenges: over-smoothing and over-squashing. While several algorithms, e.g. DropEdge and its variants -- DropNode, DropAgg and DropGNN -- have successfully addressed the over-smoothing problem, their impact on over-squashing remains largely unexplored. This represents a critical gap in the literature as failure to mitigate over-squashing would make these methods unsuitable for long-range tasks. In this work, we take the first step towards closing this gap by studying the aforementioned algorithms in the context of over-squashing. We present novel theoretical results that characterize the negative effects of DropEdge on sensitivity between distant nodes, suggesting its unsuitability for long-range tasks. Our findings are easily extended to its variants, allowing us to build a comprehensive understanding of how they affect over-squashing. We evaluate these methods using real-world datasets, demonstrating their detrimental effects. Specifically, we show that while DropEdge-variants improve test-time performance in short range tasks, they deteriorate performance in long-range ones. Our theory explains these results as follows: random edge-dropping lowers the effective receptive field of GNNs, which although beneficial for short-range tasks, misaligns the models on long-range ones. This forces the models to overfit to short-range artefacts in the training set, resulting in poor generalization. Our conclusions highlight the need to re-evaluate various methods designed for training deep GNNs, with a renewed focus on modelling long-range interactions.
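
DropEdge itself is a one-liner; the sketch below shows the per-step random edge dropping the paper analyzes, with names chosen for illustration:

```python
# Independently drop a fraction p of edges at each training step.
import numpy as np

def drop_edge(edge_index: np.ndarray, p: float, rng) -> np.ndarray:
    """edge_index: (2, E) array of directed edges. Each edge is kept with
    probability 1 - p. Dropping edges shrinks the effective receptive field,
    which is the mechanism the paper links to degraded long-range performance."""
    keep = rng.random(edge_index.shape[1]) >= p
    return edge_index[:, keep]

rng = np.random.default_rng(0)
edges = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]])
print(drop_edge(edges, p=0.4, rng=rng))
```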

Updated: 2025-02-11 08:36:38

标题: 随机边丢弃对图神经网络中过度压缩的影响

摘要: 消息传递神经网络(MPNNs)是一类图神经网络(GNNs),利用图的拓扑结构在不断扩大的邻域中传递消息。消息传递方案导致了两个明显的挑战:过度平滑和过度压缩。虽然一些算法,如DropEdge及其变体——DropNode、DropAgg和DropGNN——已成功解决了过度平滑问题,但它们对过度压缩的影响仍然鲜为人知。这在文献中代表了一个重要的缺口,因为未能减轻过度压缩将使这些方法不适用于长距离任务。在本研究中,我们采取了第一步,研究了上述算法在过度压缩的背景下。我们提出了表征DropEdge对远距离节点之间敏感性负面影响的新颖理论结果,表明其对长距离任务的不适用性。我们的发现很容易扩展到其变体,使我们能够全面了解它们对过度压缩的影响。我们使用真实世界数据集评估这些方法,展示了它们的有害影响。具体来说,我们发现虽然DropEdge的变体在短距离任务中提高了测试性能,但在长距离任务中却导致性能下降。我们的理论解释了这些结果:随机删除边降低了GNN的有效感受野,虽然对短距离任务有益,但在长距离任务中使模型不匹配。这迫使模型过度拟合训练集中的短距离特征,导致泛化性能不佳。我们的结论强调了有必要重新评估为训练深度GNN设计的各种方法,重点放在对长距离交互作用的建模上。

更新时间: 2025-02-11 08:36:38

领域: cs.LG

下载: http://arxiv.org/abs/2502.07364v1

Socially Pertinent Robots in Gerontological Healthcare

Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilities will be useful and accepted in real-life facilities is yet to be answered. This paper is an attempt to partially answer this question, via two waves of experiments with patients and companions in a day-care gerontological facility in Paris with a full-sized humanoid robot endowed with social and conversational interaction capabilities. The software architecture, developed during the H2020 SPRING project, together with the experimental protocol, allowed us to evaluate the acceptability (AES) and usability (SUS) with more than 60 end-users. Overall, the users are receptive to this technology, especially when the robot perception and action skills are robust to environmental clutter and flexible to handle a plethora of different interactions.

Updated: 2025-02-11 08:32:11

标题: 社会相关的机器人在老年保健中的应用

摘要: 尽管在开发和部署社交机器人方面取得了许多最近的成就,但仍有许多未经探索的环境和应用需要最终用户对此类系统进行系统评估。虽然已经有几种机器人平台在老年保健领域得到应用,但社交互动机器人是否具有多模态会话能力,在现实设施中是否有用且被接受的问题尚未得到解答。本文试图通过两轮实验来部分回答这个问题,实验对象为巴黎一家日间老年保健机构的患者和陪伴者,使用一台具备社交和会话交互能力的全尺寸人形机器人。在H2020 SPRING项目期间开发的软件架构和实验方案使我们能够对60多名最终用户进行可接受性(AES)和可用性(SUS)评估。总体而言,用户对这项技术持开放态度,特别是当机器人的感知和行动技能能够应对环境混乱并灵活处理各种不同的互动时。

更新时间: 2025-02-11 08:32:11

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.07560v2

Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation

This study presents a framework for automated evaluation of dynamically evolving topic taxonomies in scientific literature using Large Language Models (LLMs). In digital library systems, topic modeling plays a crucial role in efficiently organizing and retrieving scholarly content, guiding researchers through complex knowledge landscapes. As research domains proliferate and shift, traditional human-centric and static evaluation methods struggle to maintain relevance. The proposed approach harnesses LLMs to measure key quality dimensions, such as coherence, repetitiveness, diversity, and topic-document alignment, without heavy reliance on expert annotators or narrow statistical metrics. Tailored prompts guide LLM assessments, ensuring consistent and interpretable evaluations across various datasets and modeling techniques. Experiments on benchmark corpora demonstrate the method's robustness, scalability, and adaptability, underscoring its value as a more holistic and dynamic alternative to conventional evaluation strategies.

Updated: 2025-02-11 08:23:56

标题: 填补评估差距:利用大型语言模型进行主题模型评估

摘要: 这项研究提出了一个框架,用于利用大型语言模型(LLMs)自动评估科学文献中动态发展的主题分类。在数字图书馆系统中,主题建模在高效组织和检索学术内容,引导研究人员穿越复杂的知识领域方面发挥着至关重要的作用。随着研究领域的不断增长和转变,传统的人类中心和静态评估方法难以保持相关性。所提出的方法利用LLMs来度量关键的质量维度,例如连贯性、重复性、多样性和主题-文档对齐,而不过分依赖专家注释者或狭窄的统计指标。量身定制的提示指导LLM评估,确保在各种数据集和建模技术中进行一致和可解释的评估。对基准语料库的实验显示了该方法的鲁棒性、可扩展性和适应性,突显其作为传统评估策略更全面和动态替代方案的价值。

更新时间: 2025-02-11 08:23:56

领域: cs.CL,cs.AI,cs.DL

下载: http://arxiv.org/abs/2502.07352v1

Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems

Salient object detection (SOD) plays a critical role in vision-driven measurement systems (VMS), facilitating the detection and segmentation of key visual elements in an image. However, adverse imaging conditions such as haze during the day, low light, and haze at night severely degrade image quality and complicate the SOD process. To address these challenges, we propose a multi-task-oriented nighttime haze imaging enhancer (MToIE), which integrates three tasks: daytime dehazing, low-light enhancement, and nighttime dehazing. MToIE incorporates two key innovative components. First, the network employs a task-oriented node learning mechanism to handle three specific degradation types: daytime haze, low light, and nighttime haze, with an embedded self-attention module enhancing its performance in nighttime imaging. Second, a multi-receptive-field enhancement module efficiently extracts multi-scale features through three parallel depthwise separable convolution branches with different dilation rates, capturing comprehensive spatial information with minimal computational overhead. To ensure optimal image reconstruction quality and visual characteristics, we propose a hybrid loss function. Extensive experiments on different types of weather/imaging conditions illustrate that MToIE surpasses existing methods, significantly enhancing the accuracy and reliability of vision systems across diverse imaging scenarios. The code is available at https://github.com/Ai-Chen-Lab/MToIE.
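
A sketch of the multi-receptive-field module as described (three parallel depthwise separable branches with different dilation rates); the channel count and the dilation triple are assumptions:

```python
# Parallel depthwise separable convolutions with growing dilation capture
# multi-scale context cheaply; a 1x1 convolution fuses the branches.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=dilation,
                                   dilation=dilation, groups=ch)
        self.pointwise = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MultiReceptiveField(nn.Module):
    def __init__(self, ch=32, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(DepthwiseSeparable(ch, d) for d in dilations)
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)  # merge multi-scale features

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 64, 64)
print(MultiReceptiveField()(x).shape)  # torch.Size([1, 32, 64, 64])
```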

Updated: 2025-02-11 08:22:21

标题: 基于视觉驱动测量系统的多任务夜间雾化图像增强器

摘要: 显著物体检测(SOD)在视觉驱动的测量系统(VMS)中起着至关重要的作用,有助于检测和分割图像中的关键视觉元素。然而,白天的雾霾、低光和夜间的雾霾等不利的成像条件严重降低了图像质量,并使SOD过程复杂化。为了应对这些挑战,我们提出了一个多任务导向的夜间雾霾图像增强器(MToIE),它整合了三个任务:白天去雾、低光增强和夜间去雾。MToIE包含两个关键的创新组件:首先,网络采用了一个任务导向节点学习机制,处理三种特定的降级类型:白天雾霾、低光和夜间雾霾条件,同时嵌入了自注意力模块,增强了其在夜间成像中的性能。此外,多感受野增强模块通过三个具有不同扩张率的深度可分离卷积分支有效地提取多尺度特征,以最小的计算开销捕获全面的空间信息。为了确保最佳的图像重建质量和视觉特征,我们建议使用混合损失函数。对不同类型的天气/成像条件进行的大量实验表明,MToIE超越了现有方法,显著提高了视觉系统在各种成像场景下的准确性和可靠性。该代码可在https://github.com/Ai-Chen-Lab/MToIE获取。

更新时间: 2025-02-11 08:22:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07351v1

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

As scaling large language models faces prohibitive costs, multi-agent systems emerge as a promising alternative, though challenged by static knowledge assumptions and coordination inefficiencies. We introduce Knowledge-Aware Bayesian Bandits (KABB), a novel framework that enhances multi-agent system coordination through semantic understanding and dynamic adaptation. The framework features three key innovations: a three-dimensional knowledge distance model for deep semantic understanding, a dual-adaptation mechanism for continuous expert optimization, and a knowledge-aware Thompson Sampling strategy for efficient expert selection. Extensive evaluation demonstrates KABB achieves an optimal cost-performance balance, maintaining high performance while keeping computational demands relatively low in multi-agent coordination.
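
A hedged sketch of knowledge-aware Thompson sampling: Beta posteriors per expert, with the sampled score discounted by a knowledge distance to the incoming query. The distance, discount form, and environment below are invented placeholders:

```python
# Thompson sampling over experts, reweighted by semantic proximity.
import numpy as np

rng = np.random.default_rng(0)
n_experts = 4
alpha, beta = np.ones(n_experts), np.ones(n_experts)    # Beta posteriors
expert_profiles = rng.normal(size=(n_experts, 8))       # stub "knowledge" embeddings

def knowledge_distance(query_emb):
    d = np.linalg.norm(expert_profiles - query_emb, axis=1)
    return d / d.max()

def select_expert(query_emb):
    theta = rng.beta(alpha, beta)                       # sample success rates
    score = theta * np.exp(-knowledge_distance(query_emb))  # knowledge-aware discount
    return int(np.argmax(score))

for step in range(200):
    query = rng.normal(size=8)
    k = select_expert(query)
    p_success = 0.9 * np.exp(-knowledge_distance(query)[k])  # stub environment
    reward = rng.random() < p_success
    alpha[k] += reward                                  # posterior update
    beta[k] += 1 - reward
print(alpha, beta)
```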

Updated: 2025-02-11 08:22:12

标题: KABB: 多智能体系统中动态专家协调的知识感知贝叶斯老虎机

摘要: 随着扩展大型语言模型所面临的高昂成本,多Agent系统出现作为一种有前途的替代方案,尽管受到静态知识假设和协调效率低下的挑战。我们引入了一种名为Knowledge-Aware Bayesian Bandits(KABB)的新框架,通过语义理解和动态适应增强多Agent系统协调。该框架具有三个关键创新:用于深度语义理解的三维知识距离模型,用于连续专家优化的双重适应机制,以及用于高效专家选择的知识感知汤普森抽样策略。广泛的评估表明,KABB实现了最佳的成本性能平衡,在保持高性能的同时,在多Agent协调中保持相对较低的计算需求。

更新时间: 2025-02-11 08:22:12

领域: cs.AI

下载: http://arxiv.org/abs/2502.07350v1

Integrating Physics and Data-Driven Approaches: An Explainable and Uncertainty-Aware Hybrid Model for Wind Turbine Power Prediction

The rapid growth of the wind energy sector underscores the urgent need to optimize turbine operations and ensure effective maintenance through early fault detection systems. While traditional empirical and physics-based models offer approximate predictions of power generation based on wind speed, they often fail to capture the complex, non-linear relationships between other input variables and the resulting power output. Data-driven machine learning methods present a promising avenue for improving wind turbine modeling by leveraging large datasets, enhancing prediction accuracy but often at the cost of interpretability. In this study, we propose a hybrid semi-parametric model that combines the strengths of both approaches, applied to a dataset from a wind farm with four turbines. The model integrates a physics-inspired submodel, providing a reasonable approximation of power generation, with a non-parametric submodel that predicts the residuals. This non-parametric submodel is trained on a broader range of variables to account for phenomena not captured by the physics-based component. The hybrid model achieves a 37% improvement in prediction accuracy over the physics-based model. To enhance interpretability, SHAP values are used to analyze the influence of input features on the residual submodel's output. Additionally, prediction uncertainties are quantified using a conformalized quantile regression method. The combination of these techniques, alongside the physics grounding of the parametric submodel, provides a flexible, accurate, and reliable framework. Ultimately, this study opens the door for evaluating the impact of unmodeled variables on wind turbine power generation, offering a basis for potential optimization.
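
A compact sketch of the hybrid structure: a physics power-curve submodel plus a data-driven residual regressor trained on a broader feature set. The constants, synthetic data, and the gradient-boosting choice are illustrative assumptions:

```python
# Physics gives a baseline power estimate; a learned model fits the residual.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

RHO, AREA, CP, RATED = 1.225, 5024.0, 0.42, 2.0e6   # assumed turbine constants

def physics_power(v):
    p = 0.5 * RHO * AREA * CP * v ** 3               # kinetic-energy capture
    return np.clip(p, 0.0, RATED)                    # cap at rated power

rng = np.random.default_rng(0)
wind = rng.uniform(3, 14, size=(500, 1))
extra = rng.normal(size=(500, 3))                    # temperature, yaw, ... (stub)
true_power = physics_power(wind[:, 0]) * (1 + 0.05 * extra[:, 0]) \
             + 1e4 * extra[:, 1]                     # synthetic ground truth

base = physics_power(wind[:, 0])
residual_model = GradientBoostingRegressor().fit(
    np.hstack([wind, extra]), true_power - base)     # learn what physics misses

pred = base + residual_model.predict(np.hstack([wind, extra]))
print(float(np.mean(np.abs(pred - true_power))))
```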

Updated: 2025-02-11 08:16:48

标题: 整合物理学和数据驱动方法:一种可解释和不确定性感知的风力发电机功率预测混合模型

摘要: 风能行业的快速增长凸显了优化风力发电机运营并通过早期故障检测系统确保有效维护的紧迫性。虽然传统的经验和基于物理的模型可以根据风速提供发电功率的近似预测,但它们经常无法捕捉其他输入变量与结果功率输出之间的复杂非线性关系。数据驱动的机器学习方法为通过利用大型数据集来提高风力发电机建模提供了一个有希望的途径,增强了预测精度,但通常以解释性为代价。在本研究中,我们提出了一个融合了两种方法优势的混合半参数模型,应用于一个拥有四台风力发电机的风电场的数据集。该模型整合了一个受物理启发的子模型,提供了发电功率的合理近似,以及一个预测残差的非参数子模型。这个非参数子模型是在更广泛的变量范围上进行训练的,以考虑未被物理组件捕捉的现象。混合模型在预测精度方面比基于物理的模型提高了37%。为了增强可解释性,使用SHAP值来分析输入特征对残差子模型输出的影响。此外,使用一种符合化的分位数回归方法来量化预测的不确定性。这些技术的结合,以及参数子模型的物理基础,提供了一个灵活、准确和可靠的框架。最终,这项研究为评估未建模变量对风力发电机发电的影响打开了大门,为潜在优化提供了基础。

更新时间: 2025-02-11 08:16:48

领域: cs.LG,cs.AI,cs.CE

下载: http://arxiv.org/abs/2502.07344v1

ZeroDiff: Solidified Visual-Semantic Correlation in Zero-Shot Learning

Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.

Updated: 2025-02-11 08:09:50

标题: ZeroDiff:零样本学习中的凝固视觉-语义相关性

摘要: 零样本学习(ZSL)旨在使分类器能够识别未见过的类别。通常通过基于从已见类别学习的视觉-语义相关性生成未见类别的视觉特征来实现。然而,大多数当前的生成方法严重依赖于来自已见类别的足够数量的样本。我们的研究表明,已见类别样本的稀缺导致许多生成式ZSL技术的性能显著下降。我们认为、量化并经验性地证明,这种下降主要归因于虚假的视觉-语义相关性。为了解决这个问题,我们引入了ZeroDiff,一个创新的用于ZSL的生成框架,它结合了扩散机制和对比表示以增强视觉-语义相关性。ZeroDiff包括三个关键组件:(1)扩散增强,自然地将有限数据转化为扩展的噪声数据集,以减轻生成模型的过拟合;(2)基于监督-对比(SC)的表示,动态地对每个有限样本进行表征,以支持视觉特征的生成;和(3)多特征鉴别器,采用基于Wasserstein距离的相互学习方法,从各种角度评估生成的特征,包括预定义的语义、基于SC的表示和扩散过程。对三个流行的ZSL基准的广泛实验表明,ZeroDiff不仅在现有ZSL方法上取得了显著改进,而且即使在训练数据稀缺的情况下也能保持稳健的性能。我们的代码可以在https://github.com/FouriYe/ZeroDiff_ICLR25 上找到。

更新时间: 2025-02-11 08:09:50

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2406.02929v2

Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling

Recent foundation models are capable of handling multiple tasks and multiple data modalities with the unified base model structure and several specialized model components. However, efficient training of such multi-task (MT) multi-modal (MM) models poses significant system challenges due to the sophisticated model architecture and the heterogeneous workloads of different tasks and modalities. In this paper, we propose Spindle, a brand new training system tailored for resource-efficient and high-performance training of MT MM models via wavefront scheduling. The key idea of Spindle is to decompose the model execution into waves and address the joint optimization problem sequentially, including both heterogeneity-aware workload parallelization and dependency-driven execution scheduling. We build our system and evaluate it on various MT MM models. Experiments demonstrate the superior performance and efficiency of Spindle, with speedup ratio up to 71% compared to state-of-the-art training systems.

Updated: 2025-02-11 08:08:01

标题: Spindle:通过波前调度实现多任务大型模型的高效分布式训练

摘要: 最近的基础模型能够处理多个任务和多个数据模态,具有统一的基本模型结构和几个专门的模型组件。然而,有效训练这种多任务(MT)多模态(MM)模型面临重大系统挑战,因为复杂的模型架构和不同任务和模态的异构工作负载。 在本文中,我们提出了Spindle,一个专为资源高效和高性能训练MT MM模型的全新训练系统,通过波前调度。Spindle的关键思想是将模型执行分解为波,并按顺序解决联合优化问题,包括异构感知工作负载并行化和依赖驱动执行调度。我们构建了我们的系统并在各种MT MM模型上进行评估。实验表明,与最先进的训练系统相比,Spindle具有卓越的性能和效率,速度提高比率高达71%。

更新时间: 2025-02-11 08:08:01

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2409.03365v3

Density Ratio Estimation with Conditional Probability Paths

Density ratio estimation in high dimensions can be reframed as integrating a certain quantity, the time score, over probability paths which interpolate between the two densities. In practice, the time score has to be estimated based on samples from the two densities. However, existing methods for this problem remain computationally expensive and can yield inaccurate estimates. Inspired by recent advances in generative modeling, we introduce a novel framework for time score estimation, based on a conditioning variable. Choosing the conditioning variable judiciously enables a closed-form objective function. We demonstrate that, compared to previous approaches, our approach results in faster learning of the time score and competitive or better estimation accuracies of the density ratio on challenging tasks. Furthermore, we establish theoretical guarantees on the error of the estimated density ratio.
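
The identity the framework builds on, $\log p_1(x) - \log p_0(x) = \int_0^1 \partial_t \log p_t(x)\,dt$, can be checked numerically on a toy Gaussian path where the time score is available by finite differences (in the method it is a learned network; the linear-interpolation path is just one convenient choice):

```python
# Integrate the time score along a Gaussian interpolation path and compare
# with the exact log density ratio.
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

mu0, s0, mu1, s1 = 0.0, 1.0, 2.0, 0.5

def logp(x, t):
    mu = (1 - t) * mu0 + t * mu1       # linearly interpolated parameters
    s = (1 - t) * s0 + t * s1
    return norm.logpdf(x, loc=mu, scale=s)

x = 1.3
ts = np.linspace(0.0, 1.0, 2001)
eps = 1e-5
hi, lo = np.clip(ts + eps, 0, 1), np.clip(ts - eps, 0, 1)
time_score = (logp(x, hi) - logp(x, lo)) / (hi - lo)   # numerical d/dt log p_t(x)
print(trapezoid(time_score, ts))       # ~ log p1(x) - log p0(x)
print(logp(x, 1.0) - logp(x, 0.0))     # exact value for comparison
```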

Updated: 2025-02-11 08:07:17

标题: 使用条件概率路径进行密度比估计

摘要: 在高维度中,密度比率估计可以重新构建为在两个密度之间插值的概率路径上积分一个特定数量,即时间分数。在实践中,时间分数必须基于来自两个密度的样本进行估计。然而,现有方法对于这个问题仍然计算昂贵,并且可能产生不准确的估计。受生成建模最新进展的启发,我们引入了一个基于条件变量的时间分数估计新框架。明智地选择条件变量可以使目标函数形式封闭。我们证明,与先前的方法相比,我们的方法导致更快地学习时间分数,并在具有挑战性的任务上竞争性地或更好地估计密度比率。此外,我们对估计密度比率的误差建立了理论保证。

更新时间: 2025-02-11 08:07:17

领域: cs.LG

下载: http://arxiv.org/abs/2502.02300v2

A PAC-Bayesian Link Between Generalisation and Flat Minima

Modern machine learning usually involves predictors in the overparameterised setting (number of trained parameters greater than dataset size), and their training yields not only good performance on training data, but also good generalisation capacity. This phenomenon challenges many theoretical results, and remains an open problem. To reach a better understanding, we provide novel generalisation bounds involving gradient terms. To do so, we combine the PAC-Bayes toolbox with Poincaré and Log-Sobolev inequalities, avoiding an explicit dependency on the dimension of the predictor space. Our results highlight the positive influence of flat minima (being minima with a neighbourhood nearly minimising the learning problem as well) on generalisation performance, involving directly the benefits of the optimisation phase.
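
For context, the classical McAllester/Maurer-style PAC-Bayes bound that such gradient-based results refine (notation: $L$ the true risk, $\hat{L}_n$ the empirical risk on $n$ samples, $P$ a fixed prior); the paper's dimension-free gradient terms are not reproduced here:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q over predictors:
\mathbb{E}_{h \sim Q}\!\left[L(h)\right] \;\le\;
\mathbb{E}_{h \sim Q}\!\left[\hat{L}_n(h)\right]
\;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```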

Updated: 2025-02-11 08:04:24

标题: A PAC-Bayesian连接:泛化和平坦极小值

摘要: 现代机器学习通常涉及过参数化设置(训练参数数量大于数据集大小),它们的训练不仅在训练数据上表现良好,而且具有良好的泛化能力。这种现象挑战了许多理论结果,仍然是一个未解决的问题。为了更好地理解,我们提供了涉及梯度项的新颖泛化界限。为此,我们将PAC-Bayes工具箱与Poincaré和Log-Sobolev不等式结合起来,避免了对预测空间维度的显式依赖。我们的结果突显了平坦极小值(即具有近似最小化学习问题的邻域的极小值)对泛化性能的积极影响,直接涉及优化阶段的好处。

更新时间: 2025-02-11 08:04:24

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.08508v2

Emotional EEG Classification using Upscaled Connectivity Matrices

In recent studies of emotional EEG classification, connectivity matrices have been successfully employed as input to convolutional neural networks (CNNs), which can effectively consider inter-regional interaction patterns in EEG. However, we find that such an approach has a limitation: important patterns in connectivity matrices may be lost during the convolutional operations in CNNs. To resolve this issue, we propose and validate an idea to upscale the connectivity matrices to strengthen the local patterns. Experimental results demonstrate that this simple idea can significantly enhance the classification performance.
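
The core move is easy to sketch: compute a channel-connectivity matrix and upscale it before the CNN. The correlation-based connectivity and the zoom factor below are assumptions:

```python
# Build a Pearson-correlation connectivity matrix from multichannel EEG and
# enlarge it so local patterns survive subsequent convolutions.
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)
eeg = rng.normal(size=(32, 1000))            # 32 channels x 1000 samples (stub)

connectivity = np.corrcoef(eeg)              # (32, 32) channel connectivity
upscaled = zoom(connectivity, 4, order=1)    # bilinear upscaling to (128, 128)
print(connectivity.shape, upscaled.shape)
```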

Updated: 2025-02-11 07:56:19

标题: 情绪EEG分类使用升级的连接矩阵

摘要: 在最近的情绪脑电图分类研究中,连接矩阵已成功地作为卷积神经网络(CNNs)的输入,这可以有效地考虑EEG中区域间的相互作用模式。然而,我们发现这种方法存在一个局限性,即在CNNs的卷积操作过程中,连接矩阵中的重要模式可能会丢失。为了解决这个问题,我们提出并验证了一种将连接矩阵放大以加强局部模式的想法。实验结果表明,这个简单的想法可以显著增强分类性能。

更新时间: 2025-02-11 07:56:19

领域: cs.LG

下载: http://arxiv.org/abs/2502.07843v1

Neural Flow Samplers with Shortcut Models

Sampling from unnormalized densities is a fundamental task across various domains. Flow-based samplers generate samples by learning a velocity field that satisfies the continuity equation, but this requires estimating the intractable time derivative of the partition function. While importance sampling provides an approximation, it suffers from high variance. To mitigate this, we introduce a velocity-driven Sequential Monte Carlo method combined with control variates to reduce variance. Additionally, we incorporate a shortcut model to improve efficiency by minimizing the number of sampling steps. Empirical results on both synthetic datasets and $n$-body system targets validate the effectiveness of our approach.

Updated: 2025-02-11 07:55:41

标题: 使用快捷模型的神经流采样器

摘要: 从未归一化的密度中进行抽样是跨领域的一项基本任务。基于流的采样器通过学习满足连续性方程的速度场来生成样本,但这需要估计难以处理的分区函数的时间导数。虽然重要性抽样提供了近似值,但存在高方差的问题。为了减少这种问题,我们引入了一个以速度驱动的顺序蒙特卡洛方法,并结合控制变量来降低方差。此外,我们还将快捷模型纳入其中,通过减少抽样步骤的数量来提高效率。对合成数据集和n体系统目标的实证结果验证了我们方法的有效性。

更新时间: 2025-02-11 07:55:41

领域: cs.LG

下载: http://arxiv.org/abs/2502.07337v1

EMERALD: Evidence Management for Continuous Certification as a Service in the Cloud

The conspicuous lack of cloud-specific security certifications, in addition to the existing market fragmentation, hinders transparency and accountability in the provision and usage of European cloud services. Both issues ultimately reflect on customers' trust in and adoption of cloud services. The upcoming demand for continuous certification has not yet been definitively addressed and it remains unclear how the level 'high' of the European Cybersecurity Certification Scheme for Cloud Services (EUCS) shall be technologically achieved. The introduction of AI in cloud services is raising the complexity of certification even further. This paper presents the EMERALD Certification-as-a-Service (CaaS) concept for continuous certification of harmonized cybersecurity schemes, like the EUCS. EMERALD CaaS aims to provide agile and lean re-certification to consumers that adhere to a defined level of security and trust in a uniform way across heterogeneous environments consisting of combinations of different resources (Cloud, Edge, IoT). Initial findings suggest that EMERALD will significantly contribute to continuous certification, boosting providers and users of cloud services to maintain regulatory compliance towards the latest and upcoming security schemes.

Updated: 2025-02-11 07:49:10

标题: 翡翠:云中连续认证服务的证据管理

摘要: 在欧洲云服务的提供和使用中,突出的云特定安全认证的缺乏,加上现有市场的碎片化,阻碍了透明度和问责制。这两个问题最终反映在客户的信任度和对云服务的采用程度上。对连续认证的需求尚未得到明确解决,尚不清楚欧洲网络安全认证计划(EUCS)的高级别如何在技术上实现。在云服务中引入人工智能进一步提高了认证的复杂性。本文提出了EMERALD认证即服务(CaaS)概念,用于连续认证的协调网络安全计划,如EUCS。 EMERALD CaaS旨在为遵循统一安全和信任水平的消费者提供敏捷和精简的再认证,跨异构环境(包括不同资源的组合,如云、边缘、物联网)以统一方式进行。初步研究结果表明,EMERALD将显著促进连续认证,推动云服务的提供商和用户保持符合最新和即将推出的安全计划的法规合规性。

更新时间: 2025-02-11 07:49:10

领域: cs.CR

下载: http://arxiv.org/abs/2502.07330v1

PICTS: A Novel Deep Reinforcement Learning Approach for Dynamic P-I Control in Scanning Probe Microscopy

We have developed a Parallel Integrated Control and Training System (PICTS), leveraging deep reinforcement learning to dynamically adjust control strategies in real time for scanning probe microscopy techniques.
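
A toy rendering of the idea: a PI loop on a first-order plant whose gains are adjusted online by a policy. The gain schedule below is a stub where PICTS would use a trained deep RL agent:

```python
# Dynamic P-I control: the policy retunes (Kp, Ki) from recent tracking error.
import numpy as np

def policy(error_history):
    # Stub gain schedule: raise gains while the tracking error stays large.
    recent = np.mean(np.abs(error_history[-20:]))
    return 1.0 + 2.0 * recent, 0.1 + 0.5 * recent   # (Kp, Ki)

setpoint, y, integral = 1.0, 0.0, 0.0
errors = [setpoint - y]
dt = 0.01
for step in range(500):
    kp, ki = policy(errors)
    e = setpoint - y
    integral += e * dt
    u = kp * e + ki * integral                       # PI control law
    y += dt * (-y + u)                               # first-order plant: dy/dt = -y + u
    errors.append(e)
print(round(y, 3))  # settles near the setpoint
```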

Updated: 2025-02-11 07:43:46

标题: PICTS:一种新颖的深度强化学习方法用于扫描探针显微镜中动态P-I控制

摘要: 我们开发了一个并行集成控制和培训系统,利用深度强化学习动态调整控制策略,实时应用于扫描探针显微技术中。

更新时间: 2025-02-11 07:43:46

领域: cond-mat.mtrl-sci,cs.LG,physics.app-ph

下载: http://arxiv.org/abs/2502.07326v1

Long-term simulation of physical and mechanical behaviors using curriculum-transfer-learning based physics-informed neural networks

This paper proposes a Curriculum-Transfer-Learning based physics-informed neural network (CTL-PINN) for long-term simulation of physical and mechanical behaviors. The main innovation of CTL-PINN lies in decomposing long-term problems into a sequence of short-term subproblems. Initially, the standard PINN is employed to solve the first sub-problem. As the simulation progresses, subsequent time-domain problems are addressed using a curriculum learning approach that integrates information from previous steps. Furthermore, transfer learning techniques are incorporated, allowing the model to effectively utilize prior training data and solve sequential time domain transfer problems. CTL-PINN combines the strengths of curriculum learning and transfer learning, overcoming the limitations of standard PINNs, such as local optimization issues, and addressing the inaccuracies over extended time domains encountered in CL-PINN and the low computational efficiency of TL-PINN. The efficacy and robustness of CTL-PINN are demonstrated through applications to nonlinear wave propagation, Kirchhoff plate dynamic response, and the hydrodynamic model of the Three Gorges Reservoir Area, showcasing its superior capability in addressing long-term computational challenges.
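
A sketch of the curriculum-transfer pattern on a toy ODE residual ($du/dt = -u$, $u(0) = 1$): the time horizon expands in stages and each stage warm-starts from the previous weights. Window boundaries, network size, and step counts are assumptions:

```python
# Train a tiny PINN on progressively longer time windows; the same network
# object carries over, so each stage transfers the previous stage's weights.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def window_loss(t0, t1):
    t = torch.rand(256, 1) * (t1 - t0) + t0
    t.requires_grad_(True)
    u = net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    physics = ((du + u) ** 2).mean()                 # residual of du/dt = -u
    init = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # u(0) = 1
    return physics + init

for t1 in (1.0, 2.0, 3.0):                           # expanding curriculum
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = window_loss(0.0, t1)                  # keep earlier windows in play
        loss.backward()
        opt.step()

print(net(torch.tensor([[3.0]])).item(),             # PINN prediction at t = 3
      torch.exp(torch.tensor(-3.0)).item())          # exact solution e^{-3}
```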

Updated: 2025-02-11 07:43:03

标题: 基于课程转移学习的物理信息神经网络长期模拟物理和机械行为

摘要: 这篇论文提出了一种基于课程-迁移学习的物理信息神经网络(CTL-PINN),用于长期模拟物理和机械行为。CTL-PINN的主要创新在于将长期问题分解为一系列短期子问题。最初,使用标准PINN解决第一个子问题。随着模拟的进行,采用课程学习方法来解决后续的时间域问题,该方法整合了前几步的信息。此外,还引入了迁移学习技术,使模型能够有效利用先前的训练数据并解决顺序时间域转移问题。CTL-PINN结合了课程学习和迁移学习的优势,克服了标准PINN的局部优化问题,解决了CL-PINN中遇到的扩展时间域的不准确性以及TL-PINN的低计算效率。通过应用于非线性波传播、Kirchhoff板动态响应和三峡水库区的水动力模型,展示了CTL-PINN在解决长期计算挑战方面的卓越能力。

更新时间: 2025-02-11 07:43:03

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2502.07325v1

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks, leveraging diffusion models for 3D-consistent editing. However, existing SDS-based 3D editing methods suffer from long training times and produce low-quality results. We identify that the root cause of this performance degradation is their conflict with the sampling dynamics of diffusion models. Addressing this conflict allows us to treat SDS as a diffusion reverse process for 3D editing via sampling from data space. In contrast, existing methods naively distill the score function using diffusion models. From these insights, we propose DreamCatalyst, a novel framework that considers these sampling dynamics in the SDS framework. Specifically, we devise the optimization process of our DreamCatalyst to approximate the diffusion reverse process in editing tasks, thereby aligning with diffusion sampling dynamics. As a result, DreamCatalyst successfully reduces training time and improves editing quality. Our method offers two modes: (1) a fast mode that edits Neural Radiance Fields (NeRF) scenes approximately 23 times faster than current state-of-the-art NeRF editing methods, and (2) a high-quality mode that produces superior results about 8 times faster than these methods. Notably, our high-quality mode outperforms current state-of-the-art NeRF editing methods in terms of both speed and quality. DreamCatalyst also surpasses the state-of-the-art 3D Gaussian Splatting (3DGS) editing methods, establishing itself as an effective and model-agnostic 3D editing solution. See more extensive results on our project page: https://dream-catalyst.github.io.

Updated: 2025-02-11 07:34:37

标题: 梦想催化剂:通过控制可编辑性和身份保留实现快速高质量的3D编辑

摘要: 评分蒸馏抽样(SDS)已经成为文本驱动的3D编辑任务中的有效框架,利用扩散模型进行3D一致编辑。然而,现有基于SDS的3D编辑方法存在训练时间长和产生低质量结果的问题。我们确定性能下降的根本原因是它们与扩散模型的抽样动态发生冲突。解决这种冲突使我们能够将SDS视为一个从数据空间中抽样的3D编辑的扩散反向过程。相比之下,现有方法通过扩散模型天真地提炼评分函数。基于这些见解,我们提出了一种新颖的框架DreamCatalyst,考虑了SDS框架中的这些抽样动态。具体地,我们设计了DreamCatalyst的优化过程,以近似编辑任务中的扩散反向过程,从而与扩散抽样动态保持一致。因此,DreamCatalyst成功地减少了训练时间并提高了编辑质量。我们的方法提供两种模式:(1)一个快速模式,编辑神经辐射场(NeRF)场景的速度大约比当前最先进的NeRF编辑方法快23倍,以及(2)一个高质量模式,产生的结果大约比这些方法快8倍。值得注意的是,我们的高质量模式在速度和质量方面都优于当前最先进的NeRF编辑方法。DreamCatalyst还超越了最先进的3D高斯喷溅(3DGS)编辑方法,将自己确立为一个有效且与模型无关的3D编辑解决方案。在我们的项目页面上查看更多详细结果:https://dream-catalyst.github.io。

更新时间: 2025-02-11 07:34:37

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2407.11394v3

Aligning Multiple Knowledge Graphs in a Single Pass

Entity alignment (EA) is to identify equivalent entities across different knowledge graphs (KGs), which can help fuse these KGs into a more comprehensive one. Previous EA methods mainly focus on aligning a pair of KGs, and to the best of our knowledge, no existing EA method considers aligning multiple (more than two) KGs. To fill this research gap, in this work, we study a novel problem of aligning multiple KGs and propose an effective framework named MultiEA to solve the problem. First, we embed the entities of all the candidate KGs into a common feature space by a shared KG encoder. Then, we explore three alignment strategies to minimize the distances among pre-aligned entities. In particular, we propose an innovative inference enhancement technique to improve the alignment performance by incorporating high-order similarities. Finally, to verify the effectiveness of MultiEA, we construct two new real-world benchmark datasets and conduct extensive experiments on them. The results show that our MultiEA can effectively and efficiently align multiple KGs in a single pass. We release the source codes of MultiEA at: https://github.com/kepsail/MultiEA.

Updated: 2025-02-11 07:32:30

标题: 在一次对齐多个知识图的过程中

摘要: 实体对齐(EA)是指在不同知识图谱(KGs)之间识别等价实体,有助于将这些KGs融合成一个更全面的知识图谱。先前的EA方法主要集中在对齐一对KGs,据我们所知,目前没有现有的EA方法考虑对齐多个(超过两个)KGs。为填补这一研究空白,本文研究了一个新颖的问题,即对齐多个KGs,并提出了一个名为MultiEA的有效框架来解决这一问题。首先,我们通过共享的KG编码器将所有候选KGs的实体嵌入到一个公共特征空间中。然后,我们探索三种对齐策略来最小化预先对齐实体之间的距离。特别地,我们提出了一种创新的推理增强技术,通过整合高阶相似性来提高对齐性能。最后,为验证MultiEA的有效性,我们构建了两个新的真实世界基准数据集,并对其进行了广泛的实验。结果表明,我们的MultiEA能够在一次运行中有效而高效地对齐多个KGs。我们在https://github.com/kepsail/MultiEA上发布了MultiEA的源代码。

更新时间: 2025-02-11 07:32:30

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2408.00662v2

VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

Although semantic communication (SC) has shown its potential in efficiently transmitting multimodal data such as texts, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth efficiency and real-time performance of various intelligent tasks. The difficulty in such system design lies in the extraction of task-related compact semantic representations and their accurate delivery over noisy channels. In this paper, we propose an end-to-end SC system, named VideoQA-SC for video question answering (VideoQA) tasks. Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels, bypassing the need for video reconstruction at the receiver. To this end, we develop a spatiotemporal semantic encoder for effective video semantic extraction, and a learning-based bandwidth-adaptive deep joint source-channel coding (DJSCC) scheme for efficient and robust video semantic transmission. Experiments demonstrate that VideoQA-SC outperforms traditional and advanced DJSCC-based SC systems that rely on video reconstruction at the receiver under a wide range of channel conditions and bandwidth constraints. In particular, when the signal-to-noise ratio is low, VideoQA-SC can improve the answer accuracy by 5.17% while saving almost 99.5% of the bandwidth at the same time, compared with the advanced DJSCC-based SC system. Our results show the great potential of SC system design for video applications.

Updated: 2025-02-11 07:31:10

标题: VideoQA-SC: 视频问答中的自适应语义沟通

摘要: 尽管语义通信(SC)已经显示出在有效传输诸如文本、语音和图像等多模态数据方面的潜力,但是针对视频的SC主要集中在像素级的重建上。然而,这些SC系统可能对下游智能任务不够优化。此外,没有像素级视频重建的SC系统通过实现更高的带宽效率和各种智能任务的实时性能而具有优势。这种系统设计的难点在于从噪声信道中提取与任务相关的紧凑语义表示,并准确传递它们。在本文中,我们提出了一种名为VideoQA-SC的端到端SC系统,用于视频问答(VideoQA)任务。我们的目标是基于视频语义直接完成VideoQA任务,而无需接收端进行视频重建。为此,我们开发了一个时空语义编码器,用于有效提取视频语义,并基于学习的带宽自适应深度联合源通道编码(DJSCC)方案,用于高效和稳健地传输视频语义。实验证明,VideoQA-SC在各种信道条件和带宽约束下优于依赖接收端视频重建的传统和先进的DJSCC-based SC系统。特别是在信噪比较低时,与先进的DJSCC-based SC系统相比,VideoQA-SC可以提高5.17%的答案准确性,同时节省几乎99.5%的带宽。我们的结果显示了SC系统设计在视频应用中的巨大潜力。

更新时间: 2025-02-11 07:31:10

领域: cs.CV,cs.AI,eess.IV

下载: http://arxiv.org/abs/2406.18538v2

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism

Extending the context length (i.e., the maximum supported sequence length) of LLMs is of paramount significance. To facilitate long context training of LLMs, sequence parallelism has emerged as an essential technique, which scatters each input sequence across multiple devices and necessitates communication to process the sequence. In essence, existing sequence parallelism methods assume homogeneous sequence lengths (i.e., all input sequences are equal in length) and therefore leverage a single, static scattering strategy for all input sequences. However, in reality, the sequence lengths in LLM training corpora exhibit substantial variability, often following a long-tail distribution, which leads to workload heterogeneity. In this paper, we show that employing a single, static strategy results in inefficiency and resource under-utilization, highlighting the need for adaptive approaches to handle the heterogeneous workloads across sequences. To address this, we propose a heterogeneity-adaptive sequence parallelism method. For each training step, our approach captures the variability in sequence lengths and assigns the optimal combination of scattering strategies based on workload characteristics. We model this problem as a linear programming optimization and design an efficient and effective solver to find the optimal solution. Furthermore, we implement our method in a high-performance system that supports adaptive parallelization in distributed LLM training. Experimental results demonstrate that our system outperforms state-of-the-art training frameworks by up to 1.98x.
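
The assignment of length-heterogeneous sequences to scattering strategies can be posed as a small linear program, in the spirit of the paper's formulation; the buckets, costs, and memory budget below are invented toy numbers:

```python
# x[i, j] = fraction of sequences in length-bucket i handled by strategy j;
# minimize estimated cost subject to per-bucket coverage and a memory budget.
import numpy as np
from scipy.optimize import linprog

costs = np.array([[1.0, 1.6, 2.4],     # short sequences: low SP degree is cheap
                  [2.5, 1.8, 2.0],
                  [9.0, 4.0, 3.0]])    # long sequences: need high SP degree
mem = np.array([[4.0, 2.0, 1.0],       # memory use per bucket/strategy pair
                [8.0, 4.0, 2.0],
                [16.0, 8.0, 4.0]])
n_buckets, n_strats = costs.shape

c = costs.ravel()
A_eq = np.zeros((n_buckets, n_buckets * n_strats))  # each bucket fully assigned
for i in range(n_buckets):
    A_eq[i, i * n_strats:(i + 1) * n_strats] = 1.0
b_eq = np.ones(n_buckets)
A_ub, b_ub = mem.ravel()[None, :], np.array([20.0])  # total memory budget

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.x.reshape(n_buckets, n_strats).round(2))
```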

Updated: 2025-02-11 07:31:03

标题: FlexSP:通过灵活的序列并行加速大型语言模型训练

摘要: 延长LLMs的上下文长度(即最大支持的序列长度)具有至关重要的意义。为了促进LLMs的长上下文训练,序列并行性已经成为一种重要的技术,它将每个输入序列分散到多个设备上,并需要通信来处理序列。实际上,现有的序列并行方法假定序列长度是均匀的(即所有输入序列长度相等),因此为所有输入序列采用单一的静态散射策略。然而,在现实中,LLM训练语料库中的序列长度存在相当大的变异性,通常遵循长尾分布,这导致了工作负载的异质性。 在本文中,我们展示采用单一的静态策略会导致效率低下和资源未充分利用,突显了处理序列间异质工作负载的需求。为了解决这个问题,我们提出了一种适应性的序列并行方法。对于每个训练步骤,我们的方法捕捉序列长度的变异性,并根据工作负载特征分配最佳的散射策略组合。我们将这个问题建模为线性规划优化,并设计了一个高效和有效的求解器以找到最优解。此外,我们在一个支持分布式LLM训练中的高性能系统中实现了我们的方法。实验结果表明,我们的系统在性能上超过了最先进的训练框架高达1.98倍。

更新时间: 2025-02-11 07:31:03

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2412.01523v3

Learnable Residual-based Latent Denoising in Semantic Communication

A latent denoising semantic communication (SemCom) framework is proposed for robust image transmission over noisy channels. By incorporating a learnable latent denoiser into the receiver, the received signals are preprocessed to effectively remove the channel noise and recover the semantic information, thereby enhancing the quality of the decoded images. Specifically, a latent denoising mapping is established by an iterative residual learning approach to improve the denoising efficiency while ensuring stable performance. Moreover, channel signal-to-noise ratio (SNR) is utilized to estimate and predict the latent similarity score (SS) for conditional denoising, where the number of denoising steps is adapted based on the predicted SS sequence, further reducing the communication latency. Finally, simulations demonstrate that the proposed framework can effectively and efficiently remove the channel noise at various levels and reconstruct visual-appealing images.
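
A sketch of the receiver-side loop: an iterative residual denoiser whose step count is chosen from the channel SNR. The denoiser here "peeks" at the clean latent purely to keep the demo self-contained; in the framework both the denoiser and the step schedule are learned:

```python
# SNR-conditioned iterative residual denoising of a received latent.
import numpy as np

def residual_denoiser(z, clean_prior):
    # Stub for a trained network predicting the residual noise in z.
    return 0.5 * (z - clean_prior)

def steps_from_snr(snr_db):
    # Stub schedule: noisier channels (low SNR) get more iterations.
    return int(np.clip(10 - snr_db // 3, 1, 10))

rng = np.random.default_rng(0)
latent = rng.normal(size=256)                       # transmitted semantic latent
for snr_db in (0, 9, 21):
    noise_std = 10 ** (-snr_db / 20)
    z = latent + rng.normal(scale=noise_std, size=256)   # channel corruption
    for _ in range(steps_from_snr(snr_db)):
        z = z - residual_denoiser(z, latent)        # iterative residual refinement
    print(snr_db, steps_from_snr(snr_db), float(np.mean((z - latent) ** 2)))
```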

Updated: 2025-02-11 07:29:32

标题: 可学习的基于残差的语义通信中的潜在去噪

摘要: 提出了一个潜在的去噪语义通信(SemCom)框架,用于在嘈杂的信道上进行稳健的图像传输。通过在接收端引入可学习的潜在去噪器,对接收到的信号进行预处理,有效去除信道噪声并恢复语义信息,从而提高解码图像的质量。具体而言,通过迭代残差学习方法建立了一个潜在去噪映射,以提高去噪效率同时确保稳定性。此外,利用信道信噪比(SNR)来估计和预测潜在相似度得分(SS)进行有条件的去噪,根据预测的SS序列调整去噪步骤的数量,进一步减少通信延迟。最后,模拟结果表明,所提出的框架能够有效且高效地去除各种级别的信道噪声并重建视觉吸引人的图像。

更新时间: 2025-02-11 07:29:32

领域: cs.LG,cs.IT,math.IT

下载: http://arxiv.org/abs/2502.07319v1

Information Theoretic Text-to-Image Alignment

Diffusion models for Text-to-Image (T2I) conditional generation have recently achieved tremendous success. Yet, aligning these models with user's intentions still involves a laborious trial-and-error process, and this challenging alignment problem has attracted considerable attention from the research community. In this work, instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-language models, we use Mutual Information (MI) to guide model alignment. In brief, our method uses self-supervised fine-tuning and relies on a point-wise MI estimation between prompts and images to create a synthetic fine-tuning set for improving model alignment. Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI, and a simple fine-tuning strategy that improves alignment while maintaining image quality. Code available at https://github.com/Chao0511/mitune.

Updated: 2025-02-11 07:27:41

标题: 信息论文本到图像对齐

摘要: 最近,针对文本到图像(T2I)条件生成任务的扩散模型取得了巨大成功。然而,将这些模型与用户意图对齐仍然涉及繁琐的试错过程,这个具有挑战性的对齐问题引起了研究界的广泛关注。在这项工作中,我们不依赖于对提示进行精细的语言分析、人工标注或辅助视觉-语言模型,而是利用互信息(MI)来指导模型对齐。简而言之,我们的方法使用自监督微调,依赖于提示和图像之间的点对点(MI)估计,为改善模型对齐创建了一个合成微调集。我们的分析表明,我们的方法优于最先进技术,但只需要T2I模型本身的预训练去噪网络来估计MI,以及一种简单的微调策略,可以在保持图像质量的同时改善对齐。源代码可在https://github.com/Chao0511/mitune获取。

更新时间: 2025-02-11 07:27:41

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.20759v3

Denoising Task Difficulty-based Curriculum for Training Diffusion Models

Diffusion-based generative models have emerged as powerful tools in the realm of generative modeling. Despite extensive research on denoising across various timesteps and noise levels, a conflict persists regarding the relative difficulties of the denoising tasks. While various studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulties, focusing on convergence behavior and changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps poses challenges characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing from curriculum learning, to enhance the training process of diffusion models. By organizing timesteps or noise levels into clusters and training models with ascending orders of difficulty, we facilitate an order-aware training regime, progressing from easier to harder denoising tasks, thereby deviating from the conventional approach of training diffusion models simultaneously across all timesteps. Our approach leads to improved performance and faster convergence by leveraging benefits of curriculum learning, while maintaining orthogonality with existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation.
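
A sketch of the easy-to-hard schedule: timesteps grouped into bands, starting from late (easier) timesteps and progressively unlocking earlier (harder) ones. The cluster count, pacing, and ordering are assumptions in this stub:

```python
# Curriculum sampler over diffusion timesteps: the active set of timestep
# clusters grows with training progress, easy (late) to hard (early).
import numpy as np

T = 1000
clusters = np.array_split(np.arange(T), 5)[::-1]   # 5 bands, late timesteps first

def sample_timesteps(progress, batch_size, rng):
    """progress in [0, 1]; unlock one more cluster per curriculum stage."""
    n_active = min(len(clusters), 1 + int(progress * len(clusters)))
    active = np.concatenate(clusters[:n_active])
    return rng.choice(active, size=batch_size)

rng = np.random.default_rng(0)
total_steps = 10_000
for step in (0, 2_500, 9_999):
    t = sample_timesteps(step / total_steps, batch_size=4, rng=rng)
    print(step, sorted(t))                          # timestep range widens over time
    # loss = diffusion_loss(model, x0, t)           # stub: standard denoising loss
```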

Updated: 2025-02-11 07:25:00

标题: 根据去噪任务难度设计的用于训练扩散模型的课程

摘要: 扩散基础生成模型已经成为生成建模领域中强大的工具。尽管在各种时间步长和噪声水平下进行了大量去噪研究,但对于去噪任务的相对难度仍然存在争议。一些研究认为较低的时间步长提出了更具挑战性的任务,而其他人则认为较高的时间步长更难。为了解决这一争议,我们的研究对任务难度进行了全面检查,重点关注收敛行为和在不同时间步长下连续概率分布之间的相对熵的变化。我们的观察研究显示,较早的时间步长的去噪面临更慢的收敛和更高的相对熵,表明在这些较低的时间步长上任务难度增加。基于这些观察,我们引入了一个易到难的学习方案,借鉴了课程学习的概念,以增强扩散模型的训练过程。通过将时间步长或噪声水平划分为集群,并使用递增难度的顺序训练模型,我们促进了一个有序的训练制度,从容易到困难的去噪任务逐步推进,从而偏离了同时在所有时间步长上训练扩散模型的传统方法。我们的方法利用了课程学习的好处,导致了改善的性能和更快的收敛,同时保持了与扩散训练技术中现有改进的正交性。我们通过对图像生成任务进行全面实验验证了这些优势,包括无条件生成、类条件生成和文本到图像生成。

更新时间: 2025-02-11 07:25:00

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.10348v3

OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms

This report details Lumen Labs' novel approach to processing Social Networking Service (SNS) data. We leverage knowledge distillation, specifically a simple distillation method inspired by DeepSeek-R1's CoT acquisition, combined with prompt hacking, to extract valuable training data from the Grok model. This data is then used to fine-tune a Phi-3-mini model, augmented with a mask-like mechanism specifically designed for handling the nuances of SNS data. Our method demonstrates state-of-the-art (SOTA) performance on several SNS data processing tasks, outperforming existing models like Grok, Phi-3, and GPT-4. We provide a comprehensive analysis of our approach, including mathematical formulations, engineering details, ablation studies, and comparative evaluations.

Updated: 2025-02-11 07:20:38

标题: OpenGrok:利用提炼的知识和类似掩模的机制增强SNS数据处理

摘要: 这份报告详细介绍了Lumen Labs处理社交网络服务(SNS)数据的新方法。我们利用知识蒸馏,特别是受DeepSeek-R1的CoT获取启发的简单蒸馏方法,结合提示黑客技术,从Grok模型中提取有价值的训练数据。然后利用这些数据对Phi-3-mini模型进行微调,配备了一个类似面具的机制,专门设计用于处理SNS数据的细微差别。我们的方法在几项SNS数据处理任务上展现了最先进的性能,超过了现有模型如Grok、Phi-3和GPT-4。我们提供了我们方法的全面分析,包括数学公式、工程细节、消融研究和比较评估。

更新时间: 2025-02-11 07:20:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07312v1

Tackling Dimensional Collapse toward Comprehensive Universal Domain Adaptation

Universal Domain Adaptation (UniDA) addresses unsupervised domain adaptation where target classes may differ arbitrarily from source ones, except for a shared subset. An important approach, partial domain matching (PDM), aligns only shared classes but struggles in extreme cases where many source classes are absent in the target domain, underperforming the most naive baseline that trains on only source data. In this work, we identify that the failure of PDM for extreme UniDA stems from dimensional collapse (DC) in target representations. To address target DC, we propose to jointly leverage the alignment and uniformity techniques in modern self-supervised learning (SSL) on the unlabeled target data to preserve the intrinsic structure of the learned representations. Our experimental results confirm that SSL consistently advances PDM and delivers new state-of-the-art results across a broader benchmark of UniDA scenarios with different portions of shared classes, representing a crucial step toward truly comprehensive UniDA.
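
The alignment and uniformity objectives referenced above have a standard form (Wang & Isola, 2020) that can be stated directly; how the paper weights them and integrates them with partial domain matching is not reproduced here:

```python
# Alignment pulls two views of a sample together; uniformity spreads
# embeddings over the unit sphere, counteracting dimensional collapse.
import torch
import torch.nn.functional as F

def alignment_loss(z1, z2):
    """Squared distance between normalized embeddings of paired views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    return (z1 - z2).norm(dim=1).pow(2).mean()

def uniformity_loss(z, t=2.0):
    """Log of the average Gaussian potential over all embedding pairs."""
    z = F.normalize(z, dim=1)
    return torch.pdist(z).pow(2).mul(-t).exp().mean().log()

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(alignment_loss(z1, z2).item(), uniformity_loss(z1).item())
```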

Updated: 2025-02-11 07:18:41

标题: 解决面向全面通用领域适应的维度崩溃

摘要: Universal Domain Adaptation(UniDA)处理无监督域适应问题,其中目标类别可能与源类别不同,除了一个共享子集。一种重要方法,部分域匹配(PDM),仅对齐共享类别,但在极端情况下遇到困难,即许多源类别在目标域中不存在,表现不如仅在源数据上训练的最简单基线。在这项工作中,我们发现PDM在极端UniDA中失败的根源是目标表示中的维度坍缩(DC)。为了解决目标DC,我们提议联合利用现代自监督学习(SSL)中的对齐和均匀化技术,对无标签的目标数据进行处理,以保留学习表示的内在结构。我们的实验结果证实,SSL持续推动PDM,并在更广泛的UniDA场景基准测试中取得了最新的最新结果,这代表着朝着真正全面的UniDA迈出了重要的一步。

更新时间: 2025-02-11 07:18:41

领域: cs.LG

下载: http://arxiv.org/abs/2410.11271v2

EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos Referring to Procedural Texts

Mistake action detection is crucial for developing intelligent archives that detect workers' errors and provide feedback. Existing studies have focused on visually apparent mistakes in free-style activities, resulting in video-only approaches to mistake detection. However, in text-following activities, models cannot determine the correctness of some actions without referring to the texts. Additionally, current mistake datasets rarely use procedural texts for video recording except for cooking. To fill these gaps, this paper proposes the EgoOops dataset, where egocentric videos record erroneous activities when following procedural texts across diverse domains. It features three types of annotations: video-text alignment, mistake labels, and descriptions for mistakes. We also propose a mistake detection approach, combining video-text alignment and mistake label classification to leverage the texts. Our experimental results show that incorporating procedural texts is essential for mistake detection. Data is available through https://y-haneji.github.io/EgoOops-project-page/.

Updated: 2025-02-11 07:17:37

标题: EgoOops:一个用于从自我中心视频中检测错误操作的数据集,参考程序文本

摘要: 错误动作检测对于开发能够检测工人错误并提供反馈的智能档案至关重要。现有研究主要关注自由活动中明显的错误,导致仅使用视频方法来检测错误。然而,在遵循文本的活动中,模型无法在不参考文本的情况下确定某些动作的正确性。此外,目前的错误数据集很少在视频录制中使用程序文本,除了烹饪之外。为了填补这些空白,本文提出了EgoOops数据集,其中以自我中心的视频记录了在各个领域遵循程序文本时发生的错误活动。它包括三种类型的注释:视频文本对齐,错误标签和错误描述。我们还提出了一种错误检测方法,结合视频文本对齐和错误标签分类来利用文本。我们的实验结果表明,结合程序文本对于错误检测至关重要。数据可通过https://y-haneji.github.io/EgoOops-project-page/获得。

更新时间: 2025-02-11 07:17:37

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.05343v2

One Class Restricted Kernel Machines

Restricted kernel machines (RKMs) have demonstrated a significant impact in enhancing generalization ability in the field of machine learning. Recent studies have introduced various methods within the RKM framework, combining kernel functions with the least squares support vector machine (LSSVM) in a manner similar to the energy function of restricted Boltzmann machines (RBMs), so that better performance can be achieved. However, RKM's efficacy can be compromised by the presence of outliers and other forms of contamination within the dataset. These anomalies can skew the learning process, leading to less accurate and reliable outcomes. To address this critical issue and to ensure the robustness of the model, we propose the novel one-class RKM (OCRKM). In the framework of OCRKM, we employ an energy function akin to that of the RBM, which integrates both visible and hidden variables in a nonprobabilistic setting. The formulation of the proposed OCRKM facilitates the seamless integration of one-class classification method with the RKM, enhancing its capability to detect outliers and anomalies effectively. The proposed OCRKM model is evaluated over UCI benchmark datasets. Experimental findings and statistical analyses consistently emphasize the superior generalization capabilities of the proposed OCRKM model over baseline models across all scenarios.

Updated: 2025-02-11 07:11:20

标题: 一类受限核机器

摘要: 限制核机器(RKMs)已在机器学习领域展示出显著的泛化能力增强。最近的研究在RKM框架内引入了各种方法,将核函数与最小二乘支持向量机(LSSVM)结合起来,类似于受限玻尔兹曼机(RBM)的能量函数,从而实现更好的性能。然而,RKM的有效性可能会受到异常值和数据集中其他形式的污染的影响。这些异常值可能会扭曲学习过程,导致不太准确和可靠的结果。为了解决这一关键问题并确保模型的鲁棒性,我们提出了新颖的单类RKM(OCRKM)。在OCRKM框架中,我们采用类似于RBM的能量函数,将可见变量和隐藏变量集成在非概率设置中。所提出的OCRKM的公式化促进了单类分类方法与RKM的无缝集成,增强了其有效检测异常值和异常的能力。提出的OCRKM模型在UCI基准数据集上进行了评估。实验结果和统计分析一致强调了提出的OCRKM模型在所有情景下相对基线模型的优越泛化能力。

更新时间: 2025-02-11 07:11:20

领域: cs.LG

下载: http://arxiv.org/abs/2502.10443v1

TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation

In this work, we propose a modular approach for the Vision-Language Navigation (VLN) task by decomposing the problem into four sub-modules that use state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) in a zero-shot setting. Given a navigation instruction in natural language, we first prompt the LLM to extract the landmarks and the order in which they are visited. Assuming a known model of the environment, we retrieve the top-k locations of the last landmark and generate $k$ path hypotheses from the starting location to the last landmark using the shortest path algorithm on the topological map of the environment. Each path hypothesis is represented by a sequence of panoramas. We then use dynamic programming to compute the alignment score between the sequence of panoramas and the sequence of landmark names, using match scores obtained from the VLM. Finally, we compute the nDTW metric for the hypothesis that yields the highest alignment score to evaluate path fidelity. We demonstrate superior performance compared to other approaches that use joint semantic maps, like VLMaps, on the complex R2R-Habitat instruction dataset, and quantify in detail the effect of visual grounding on navigation performance.
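
The alignment step can be sketched as a monotone dynamic program over a panorama-by-landmark score matrix (random below; in TRAVEL the scores come from the VLM, and this is a DP of the same general flavor rather than the paper's exact recursion):

```python
# Best ordered assignment of landmarks to panoramas, maximizing summed
# match scores; each panorama matches at most one landmark.
import numpy as np

def align_score(scores):
    """scores[i, j] = match of panorama i with landmark j (landmarks ordered)."""
    n, m = scores.shape
    dp = np.full((n + 1, m + 1), -np.inf)
    dp[:, 0] = 0.0                                   # no landmarks matched yet
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = max(dp[i - 1, j],             # skip this panorama
                           dp[i - 1, j - 1] + scores[i - 1, j - 1])  # match it
    return dp[n, m]

rng = np.random.default_rng(0)
scores = rng.random((10, 3))                         # 10 panoramas, 3 landmarks
print(align_score(scores))
```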

Updated: 2025-02-11 07:09:37

标题: 旅行:无需训练的视觉和语言导航检索和对齐

摘要: 在这项工作中,我们提出了一种模块化方法,用于将视觉语言导航(VLN)任务分解为四个子模块,这些子模块在零-shot设置中使用最先进的大型语言模型(LLMs)和视觉语言模型(VLMs)。给定自然语言的导航指令,我们首先提示LLM提取地标及其访问顺序。假设已知环境模型,我们检索最后一个地标的前k个位置,并使用最短路径算法在环境的拓扑图上从起始位置到最后一个地标生成$k$条路径假设。每个路径假设由一系列全景图表示。然后,我们使用动态规划计算全景图序列与地标名称序列之间的对齐分数,这些分数与VLM得到的匹配分数相匹配。最后,我们计算nDTW指标来评估产生最高对齐分数的假设的路径保真度。我们展示了与使用联合语义地图(如VLMaps)在复杂的R2R-Habitat指令数据集上的其他方法相比的卓越性能,并详细量化了视觉基础对导航性能的影响。

更新时间: 2025-02-11 07:09:37

领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.RO

下载: http://arxiv.org/abs/2502.07306v1

Calibrating LLMs with Information-Theoretic Evidential Deep Learning

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration. Code is available at https://github.com/sandylaker/ib-edl.
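
A sketch of an evidential head with a KL regularizer that suppresses unneeded evidence; this is the common EDL core with a stand-in penalty, not the paper's exact information-bottleneck formulation:

```python
# Evidence parameterizes a Dirichlet over class probabilities; a KL term to
# the uniform Dirichlet discourages over-concentrated (spurious) evidence.
import torch
import torch.nn.functional as F

def edl_ib_loss(logits, targets, lam=0.1):
    evidence = F.softplus(logits)                 # non-negative evidence
    alpha = evidence + 1.0                        # Dirichlet parameters
    probs = alpha / alpha.sum(dim=1, keepdim=True)  # expected class probabilities
    nll = F.nll_loss(probs.clamp_min(1e-8).log(), targets)
    uniform = torch.distributions.Dirichlet(torch.ones_like(alpha))
    kl = torch.distributions.kl_divergence(
        torch.distributions.Dirichlet(alpha), uniform).mean()
    return nll + lam * kl                         # fit + evidence regularizer

logits = torch.randn(8, 5, requires_grad=True)
loss = edl_ib_loss(logits, torch.randint(0, 5, (8,)))
loss.backward()
print(loss.item())
```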

Updated: 2025-02-11 07:06:40

标题: 用信息论证据深度学习校准LLMs

摘要: 经过微调的大型语言模型(LLMs)在训练小数据集时往往表现出过分的自信,导致校准不良和不准确的不确定性估计。 Evidential Deep Learning(EDL)是一种关注不确定性的方法,可以在单次前向传递中进行不确定性估计,因此是校准微调LLMs的一种有前途的方法。然而,尽管EDL具有计算效率,但由于其训练目标可能导致过于集中的概率分布,因此容易过拟合。为了缓解这一问题,我们提出了通过引入信息瓶颈(IB)来正则化EDL的方法。我们的方法IB-EDL抑制了模型生成的伪信息,并鼓励真正预测性信息影响预测和不确定性估计。通过对各种微调的LLMs和任务进行大量实验,我们证明了IB-EDL优于现有的EDL和非EDL方法。通过提高LLMs的可信度,IB-EDL有助于在需要高水平信心校准的领域更广泛地采用这些模型。代码可在https://github.com/sandylaker/ib-edl获得。

更新时间: 2025-02-11 07:06:40

领域: cs.LG

下载: http://arxiv.org/abs/2502.06351v2

Recommendations to OSCE/ODIHR (on how to give better recommendations for Internet voting)

This paper takes a critical look at the recommendations OSCE/ODIHR has given for the Estonian Internet voting over the 20 years it has been running. We present examples of recommendations that can not be fulfilled at all, but also examples where fulfilling a recommendation requires a non-trivial trade-off, potentially weakening the system in some other respect. In such cases OSCE/ODIHR should take an explicit position which trade-off it recommends. We also look at the development of the recommendation to introduce end-to-end verifiability. In this case we expect OSCE/ODIHR to define what it exactly means by this property, as well as to give explicit criteria to determine whether and to which extent end-to-end verifiability has been achieved.

Updated: 2025-02-11 07:04:43

标题: 建议给予欧安组织/ODIHR(如何为网络投票提供更好的建议)

摘要: 本文对欧安组织/民主与人权事务办公室(OSCE/ODIHR)在爱沙尼亚互联网投票运行20年以来所提出的建议进行了批判性审视。我们提供了一些无法实现的建议示例,同时也展示了履行建议可能需要进行非平凡的权衡,可能会在其他方面削弱系统。在这种情况下,欧安组织/民主与人权事务办公室应该明确表态,推荐哪种权衡。我们还关注了引入端到端可验证性建议的发展。在这种情况下,我们期望欧安组织/民主与人权事务办公室定义这一属性的确切含义,并提供明确的标准来确定是否以及在多大程度上实现了端到端可验证性。

更新时间: 2025-02-11 07:04:43

领域: cs.CR

下载: http://arxiv.org/abs/2502.06385v2

Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs

While various vertical domain large language models (LLMs) have been developed, automatically evaluating their performance across different domains remains a critical challenge. Current benchmark-based methods often rely on static and costly datasets, are misaligned with practical user needs, and lack flexibility across domains. To address these limitations, we revisit the evaluation process and introduce two key concepts: Benchmark+, which extends the traditional question-answer benchmark into a more flexible "strategy-criterion" format; and Assessment+, which enhances the interaction process, enabling deeper exploration and supporting analysis from broader perspectives. We propose TestAgent, an agent-based evaluation framework that implements these concepts using retrieval-augmented generation and reinforcement learning. TestAgent enables automatic dynamic benchmark generation and in-depth assessment across diverse vertical domain scenarios. Experiments on tasks ranging from constructing multiple vertical domain evaluations to converting static benchmarks into dynamic forms demonstrate the effectiveness of TestAgent. This work offers an interesting perspective on automatic evaluation for LLMs and highlights a pathway for dynamic and domain-adaptive assessments.

Updated: 2025-02-11 07:03:51

标题: 重新审视基准和评估:基于Agent的探索性动态评估框架对LLMs

摘要: 尽管已经开发了各种垂直领域大型语言模型(LLMs),但自动评估它们在不同领域的表现仍然是一个关键挑战。当前基于基准的方法通常依赖于静态和昂贵的数据集,与实际用户需求不一致,并且在不同领域之间缺乏灵活性。为了解决这些限制,我们重新审视评估过程并引入两个关键概念:Benchmark+,它将传统的问答基准扩展为更灵活的“策略-标准”格式;以及Assessment+,它增强了交互过程,使深度探索成为可能,并支持更广泛的分析。我们提出了TestAgent,这是一个基于代理的评估框架,利用检索增强生成和强化学习来实现这些概念。TestAgent实现了自动动态基准生成和在不同垂直领域场景中进行深入评估。从构建多个垂直领域评估到将静态基准转换为动态形式的任务实验表明了TestAgent的有效性。这项工作为LLMs的自动评估提供了有趣的视角,并强调了动态和领域自适应评估的途径。

更新时间: 2025-02-11 07:03:51

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.11507v3

Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems

Understanding citizens' values in participatory systems is crucial for citizen-centric policy-making. We envision a hybrid participatory system where participants make choices and provide motivations for those choices, and AI agents estimate their value preferences by interacting with them. We focus on situations where a conflict is detected between participants' choices and motivations, and propose methods for estimating value preferences while addressing detected inconsistencies by interacting with the participants. We operationalize the philosophical stance that "valuing is deliberatively consequential." That is, if a participant's choice is based on a deliberation of value preferences, the value preferences can be observed in the motivation the participant provides for the choice. Thus, we propose and compare value preferences estimation methods that prioritize the values estimated from motivations over the values estimated from choices alone. Then, we introduce a disambiguation strategy that combines Natural Language Processing and Active Learning to address the detected inconsistencies between choices and motivations. We evaluate the proposed methods on a dataset of a large-scale survey on energy transition. The results show that explicitly addressing inconsistencies between choices and motivations improves the estimation of an individual's value preferences. The disambiguation strategy does not show substantial improvements when compared to similar baselines--however, we discuss how the novelty of the approach can open new research avenues and propose improvements to address the current limitations.

Updated: 2025-02-11 07:00:28

标题: 混合参与系统中的价值偏好估计和消歧

摘要: 理解公民在参与式系统中的价值观对于以公民为中心的政策制定至关重要。我们设想了一个混合参与式系统,参与者在其中做出选择并提供选择动机,而AI代理通过与他们互动来估计他们的价值偏好。我们专注于检测参与者的选择和动机之间存在冲突的情况,并提出了在与参与者互动时估计价值偏好的方法,同时解决检测到的不一致性。我们将“评价是经过深思熟虑的结果”的哲学立场具体化。也就是说,如果参与者的选择是基于对价值偏好的深思熟虑,那么可以从参与者为选择提供的动机中观察到价值偏好。因此,我们提出并比较了优先考虑从动机中估计的价值而不是仅从选择中估计的价值的价值偏好估计方法。然后,我们介绍了一种结合自然语言处理和主动学习的消歧策略,以解决选择和动机之间的检测到的不一致性。我们在一个关于能源转型的大规模调查数据集上评估了提出的方法。结果显示,明确解决选择和动机之间的不一致性可以改善个体的价值偏好估计。与类似基准相比,消歧策略并未显示出实质性的改进,然而,我们讨论了该方法的新颖性如何可以开辟新的研究方向,并提出改进措施以解决当前的局限性。

更新时间: 2025-02-11 07:00:28

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.16751v3

The Complexity of Learning Sparse Superposed Features with Feedback

The success of deep networks is crucially attributed to their ability to capture latent features within a representation space. In this work, we investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent, such as a large language model (LLM), in the form of relative \textit{triplet comparisons}. These features may represent various constructs, including dictionaries in LLMs or components of a covariance matrix of Mahalanobis distances. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios when the agent's feedback is limited to distributional information. We validate our theoretical findings through experiments on two distinct applications: feature recovery from Recursive Feature Machine-trained models and dictionary extraction from sparse autoencoders trained on Large Language Models.

Updated: 2025-02-11 06:57:41

标题: 学习带有反馈的稀疏叠加特征的复杂性

摘要: 深度网络的成功主要归因于其捕获表示空间中的潜在特征的能力。在这项工作中,我们研究了通过来自代理的反馈(如大型语言模型(LLM))以相对\textit{三元比较}的形式,是否能够有效地检索模型的基础学习特征。这些特征可以代表各种构造,包括LLM中的词典或马哈拉诺比斯距离的协方差矩阵的组件。我们分析了在稀疏设置中学习特征矩阵所涉及的反馈复杂性。我们的结果建立了当代理被允许构建激活时的严格界限,并在稀疏场景中展示了当代理的反馈被限制为分布信息时的强大上限。我们通过对两个不同应用的实验验证了我们的理论发现:从递归特征机器训练模型中恢复特征以及从在大型语言模型上训练的稀疏自编码器中提取词典。

更新时间: 2025-02-11 06:57:41

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2502.05407v2

Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification

The interactions between DNA, RNA, and proteins are fundamental to biological processes, as illustrated by the central dogma of molecular biology. While modern biological pre-trained models have achieved great success in analyzing these macromolecules individually, their interconnected nature remains under-explored. In this paper, we follow the guidance of the central dogma to redesign both the data and model pipeline and offer a comprehensive framework, Life-Code, that spans different biological functions. As for data flow, we propose a unified pipeline to integrate multi-omics data by reverse-transcribing RNA and reverse-translating amino acids into nucleotide-based sequences. As for the model, we design a codon tokenizer and a hybrid long-sequence architecture to encode the interactions of both coding and non-coding regions with masked modeling pre-training. To model the translation and folding process with coding sequences, Life-Code learns protein structures of the corresponding amino acids by knowledge distillation from off-the-shelf protein language models. Such designs enable Life-Code to capture complex interactions within genetic sequences, providing a more comprehensive understanding of multi-omics with the central dogma. Extensive Experiments show that Life-Code achieves state-of-the-art performance on various tasks across three omics, highlighting its potential for advancing multi-omics analysis and interpretation.

Updated: 2025-02-11 06:53:59

标题: 生命密码:多组学序列统一的中心法则建模

摘要: DNA、RNA和蛋白质之间的相互作用对生物过程至关重要,正如分子生物学的中心法则所示。虽然现代生物学预训练模型在分析这些大分子方面取得了巨大成功,但它们之间相互联系的性质仍未得到充分探索。在本文中,我们遵循中心法则的指导,重新设计数据和模型流程,并提供一个跨越不同生物功能的综合框架,Life-Code。至于数据流,我们提出一个统一的流程,通过将RNA逆转录和氨基酸逆转译成以核苷酸为基础的序列,整合多组学数据。至于模型,我们设计了一个密码子标记器和一个混合长序列架构,以对编码和非编码区域的相互作用进行编码,并进行掩码建模预训练。为了模拟编码序列的翻译和折叠过程,Life-Code通过从现成的蛋白质语言模型进行知识蒸馏来学习相应氨基酸的蛋白质结构。这样的设计使得Life-Code能够捕捉遗传序列内的复杂相互作用,提供对多组学与中心法则的更全面理解。大量实验证明,Life-Code在跨三个组学的各种任务上取得了最先进的性能,突显了其推动多组学分析和解释的潜力。

更新时间: 2025-02-11 06:53:59

领域: cs.LG,cs.AI,cs.CL,q-bio.GN

下载: http://arxiv.org/abs/2502.07299v1

Generation of Drug-Induced Cardiac Reactions towards Virtual Clinical Trials

Clinical trials are pivotal in cardiac drug development, yet they often fail due to inadequate efficacy and unexpected safety issues, leading to significant financial losses. Using in-silico trials to replace a part of physical clinical trials, e.g., leveraging advanced generative models to generate drug-influenced electrocardiograms (ECGs), seems an effective method to reduce financial risk and potential harm to trial participants. While existing generative models have demonstrated progress in ECG generation, they fall short in modeling drug reactions due to limited fidelity and inability to capture individualized drug response patterns. In this paper, we propose a Drug-Aware Diffusion Model (DADM), which could simulate individualized drug reactions while ensuring fidelity. To ensure fidelity, we construct a set of ordinary differential equations to provide external physical knowledge (EPK) of the realistic ECG morphology. The EPK is used to adaptively constrain the morphology of the generated ECGs through a dynamic cross-attention (DCA) mechanism. Furthermore, we propose an extension of ControlNet to incorporate demographic and drug data, simulating individual drug reactions. We compare DADM with the other eight state-of-the-art ECG generative models on two real-world databases covering 8 types of drug regimens. The results demonstrate that DADM can more accurately simulate drug-induced changes in ECGs, improving the accuracy by at least 5.79% and recall by 8%.

Updated: 2025-02-11 06:50:33

标题: 药物诱发心脏反应在虚拟临床试验中的产生

摘要: 临床试验在心脏药物开发中至关重要,然而它们经常因为疗效不足和意外安全问题而失败,导致重大财务损失。利用计算试验来取代部分物理临床试验,例如利用先进的生成模型生成受药影响的心电图(ECG),似乎是减少财务风险和潜在对试验参与者造成伤害的有效方法。尽管现有的生成模型在ECG生成方面取得了进展,但由于有限的保真度和无法捕捉个性化药物反应模式,它们在建模药物反应方面仍存在不足。在本文中,我们提出了一种药物感知扩散模型(DADM),可以模拟个性化的药物反应同时确保保真度。为了确保保真度,我们构建了一组普通微分方程来提供真实ECG形态的外部物理知识(EPK)。EPK被用于通过动态交叉注意力(DCA)机制自适应地约束生成的ECG的形态。此外,我们提出了对ControlNet的扩展,以整合人口统计和药物数据,模拟个体药物反应。我们将DADM与其他八种最先进的ECG生成模型在覆盖8种药物方案的两个真实数据库上进行比较。结果表明,DADM能够更准确地模拟药物诱导的ECG变化,将准确度提高至少5.79%,召回率提高8%。

更新时间: 2025-02-11 06:50:33

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2502.07297v1

CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients

The healthcare landscape is evolving, with patients seeking reliable information about their health conditions and available treatment options. Despite the abundance of information sources, the digital age overwhelms individuals with excess, often inaccurate information. Patients primarily trust medical professionals, highlighting the need for expert-endorsed health information. However, increased patient loads on experts has led to reduced communication time, impacting information sharing. To address this gap, we developed CataractBot. CataractBot answers cataract surgery related questions instantly using an LLM to query a curated knowledge base, and provides expert-verified responses asynchronously. It has multimodal and multilingual capabilities. In an in-the-wild deployment study with 49 patients and attendants, 4 doctors, and 2 patient coordinators, CataractBot demonstrated potential, providing anytime accessibility, saving time, accommodating diverse literacy levels, alleviating power differences, and adding a privacy layer between patients and doctors. Users reported that their trust in the system was established through expert verification. Broadly, our results could inform future work on designing expert-mediated LLM bots.

Updated: 2025-02-11 06:48:43

标题: 白内障机器人:一款由LLM驱动的专家辅助聊天机器人,服务于白内障患者

摘要: 医疗健康领域正在发生变化,患者正在寻找可靠的关于他们健康状况和可用治疗选择的信息。尽管信息来源丰富,数字时代却让个人被过多、经常不准确的信息所淹没。患者主要信任医疗专业人员,突显了对专家认可的健康信息的需求。然而,患者对专家的负荷增加导致了沟通时间的减少,影响了信息共享。为了填补这一差距,我们开发了CataractBot。CataractBot利用LLM查询一个经过筛选的知识库,即时回答白内障手术相关问题,并异步提供经专家验证的答复。它具有多模态和多语言能力。在一个包括49名患者和陪护人员、4名医生和2名患者协调员的野外部署研究中,CataractBot展示了潜力,提供了随时可访问性,节省了时间,适应了不同的读写水平,缓解了权力差异,并在患者和医生之间增加了隐私层。用户报告称,他们对该系统的信任是通过专家验证建立的。广泛地说,我们的结果可以为未来设计专家中介LLM机器人的工作提供信息。

更新时间: 2025-02-11 06:48:43

领域: cs.HC,cs.LG

下载: http://arxiv.org/abs/2402.04620v5

PRKAN: Parameter-Reduced Kolmogorov-Arnold Networks

Kolmogorov-Arnold Networks (KANs) represent an innovation in neural network architectures, offering a compelling alternative to Multi-Layer Perceptrons (MLPs) in models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. By advancing network design, KANs drive groundbreaking research and enable transformative applications across various scientific domains involving neural networks. However, existing KANs often require significantly more parameters in their network layers than MLPs. To address this limitation, this paper introduces PRKANs (Parameter-Reduced Kolmogorov-Arnold Networks), which employ several methods to reduce the parameter count in KAN layers, making them comparable to MLP layers. Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that PRKANs outperform several existing KANs, and their variant with attention mechanisms rivals the performance of MLPs, albeit with slightly longer training times. Furthermore, the study highlights the advantages of Gaussian Radial Basis Functions (GRBFs) and layer normalization in KAN designs. The repository for this work is available at: https://github.com/hoangthangta/All-KAN.

Updated: 2025-02-11 06:47:56

标题: PRKAN: 参数减少的科尔莫戈洛夫-阿诺德网络

摘要: 科尔莫戈洛夫-阿诺德网络(KANs)代表了神经网络架构的一项创新,为卷积神经网络(CNNs)、循环神经网络(RNNs)和变压器等模型提供了一种引人注目的替代方案,与多层感知器(MLPs)相比。通过推进网络设计,KANs推动了突破性研究,并在涉及神经网络的各种科学领域中实现了变革性应用。然而,现有的KANs通常在其网络层中需要比MLPs更多的参数。为了解决这一限制,本文介绍了PRKANs(参数减少的科尔莫戈洛夫-阿诺德网络),采用多种方法来减少KAN层中的参数数量,使其与MLP层可比。在MNIST和Fashion-MNIST数据集上的实验结果表明,PRKANs胜过了几种现有的KANs,并且具有注意机制的变种与MLPs的表现相媲美,尽管训练时间略长。此外,研究突显了高斯径向基函数(GRBFs)和层归一化在KAN设计中的优势。该工作的存储库可在https://github.com/hoangthangta/All-KAN 上找到。

更新时间: 2025-02-11 06:47:56

领域: cs.LG

下载: http://arxiv.org/abs/2501.07032v4

GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving a question open about whether interconnecting these models could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation techniques (LoRA) and conduct extensive experiments on various benchmark datasets. The experimental results reveal that GRAPHMOE outperforms other LoRA based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advancements in enhancing the reasoning capabilities of language models.

Updated: 2025-02-11 06:47:01

标题: GRAPHMOE:通过引入自我反思机制增强专家混合网络的认知深度

摘要: 传统的专家混合(MoE)网络通过利用多个较小的专家模型而不是单个大型网络获益。然而,这些专家通常独立操作,这就引出了一个问题:连接这些模型是否能提升MoE网络的性能。为此,我们引入了GRAPHMOE,一种旨在通过伪图MoE网络上构建的自我反思机制来增强语言模型认知深度的新方法。GRAPHMOE采用循环路由策略来模拟迭代思考步骤,从而促进专家节点之间的信息流动。我们使用低秩调整技术(LoRA)实现了GRAPHMOE架构,并在各种基准数据集上进行了广泛实验。实验结果显示,GRAPHMOE优于其他基于LoRA的模型,实现了最先进(SOTA)的性能。此外,本研究探讨了一种可能激发语言模型推理能力进一步提升的新颖循环路由策略。

更新时间: 2025-02-11 06:47:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2501.07890v2

Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions

This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible. This situation happens in many scenarios in physics such as in ferromagnetism, crystallography, magneto-hydrodynamics, and thermodynamics, where the energy models are difficult to normalize. Our approach is based on the estimation of the Cumulative Sum (CUSUM) statistics, which is known to produce optimal performance. We first present an intuitively appealing approximation method. Unfortunately, this produces a biased estimator of the CUSUM statistics and may cause performance degradation. We then propose the Log-Partition Approximation Cumulative Sum (LPA-CUSUM) algorithm based on thermodynamic integration (TI) in order to estimate the log-ratio of normalizing constants of pre- and post-change distributions. It is proved that this approach gives an unbiased estimate of the log-partition function and the CUSUM statistics, and leads to an asymptotically optimal performance. Moreover, we derive a relationship between the required sample size for thermodynamic integration and the desired detection delay performance, offering guidelines for practical parameter selection. Numerical studies are provided demonstrating the efficacy of our approach.
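
The recursion being approximated is the classical CUSUM statistic. The minimal sketch below shows where the log-ratio of normalizing constants, the quantity LPA-CUSUM estimates by thermodynamic integration, enters when only unnormalized densities u0 and u1 are available; function names and threshold handling are illustrative.

def cusum_unnormalized(xs, log_u0, log_u1, log_z_ratio, threshold):
    """Run CUSUM when only unnormalized densities u0, u1 are available.

    Since p1/p0 = (u1/u0) * (Z0/Z1), each increment needs the correction
    log_z_ratio = log(Z0/Z1), the quantity LPA-CUSUM estimates via
    thermodynamic integration. A biased estimate shifts every increment,
    which is why an unbiased log-partition estimate matters.
    """
    w = 0.0
    for t, x in enumerate(xs):
        llr = (log_u1(x) - log_u0(x)) + log_z_ratio  # corrected log-LR increment
        w = max(0.0, w + llr)                        # CUSUM recursion
        if w > threshold:
            return t  # declare a change at time t
    return None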

Updated: 2025-02-11 06:41:22

标题: 渐近最优的非标准化前后变化分布的变化检测

摘要: 本文讨论了当只能访问非标准化的变化前和变化后分布时检测变化的问题。在物理学的许多场景中会出现这种情况,例如在铁磁性、晶体学、磁流体力学和热力学中,能量模型很难归一化。 我们的方法基于累积和(CUSUM)统计量的估计,这被认为是产生最佳性能的方法。我们首先提出了一种直观吸引人的近似方法。不幸的是,这产生了CUSUM统计量的有偏估计,并可能导致性能下降。然后,我们提出了基于热力学积分(TI)的对数分区近似累积和(LPA-CUSUM)算法,以估计变化前和变化后分布的归一化常数的对数比率。证明了这种方法给出了对分区函数和CUSUM统计量的无偏估计,并导致了渐近优化性能。此外,我们推导了热力学积分所需的样本大小与所需的检测延迟性能之间的关系,为实际参数选择提供了指导。提供了数值研究,证明了我们方法的有效性。

更新时间: 2025-02-11 06:41:22

领域: stat.ML,cs.AI,cs.IT,cs.LG,eess.SP,math.IT

下载: http://arxiv.org/abs/2410.14615v2

Treatment Effect Estimation for Exponential Family Outcomes using Neural Networks with Targeted Regularization

Neural networks (NNs) have become a natural choice for treatment effect estimation due to their strong approximation capabilities. Nevertheless, how to design NN-based estimators with desirable properties, such as low bias and double robustness, remains a significant challenge. A common approach to address this is targeted regularization, which modifies the objective function of NNs. However, existing works on targeted regularization are limited to Gaussian-distributed outcomes, significantly restricting their applicability in real-world scenarios. In this work, we aim to bridge this gap by extending the framework to the broader class of exponential family outcomes. Specifically, we first derive the von Mises expansion of the Average Dose function of Canonical Functions (ADCF), which guides the construction of a doubly robust estimator with good properties. Based on this, we develop an NN-based estimator for the ADCF by generalizing functional targeted regularization to exponential families, and provide the corresponding theoretical convergence rate. Extensive experimental results demonstrate the effectiveness of our proposed model.

Updated: 2025-02-11 06:36:20

标题: 使用具有目标正则化的神经网络估计指数族结果的治疗效应

摘要: 神经网络(NNs)由于其强大的逼近能力,已成为治疗效果估计的自然选择。然而,如何设计具有低偏差和双重稳健性等理想特性的基于NN的估计器仍然是一个重大挑战。解决这个问题的一种常见方法是定向正则化,它修改了NNs的目标函数。然而,现有的定向正则化工作仅限于高斯分布的结果,显著限制了其在现实世界场景中的适用性。在这项工作中,我们旨在通过将这个框架扩展到更广泛的指数家族结果,来填补这一空白。具体地,我们首先推导出规范函数的平均剂量函数(ADCF)的von-Mises展开式,这启发了我们如何构建具有良好特性的双重稳健估计器。基于此,我们通过将功能定向正则化推广到指数家族,为ADCF开发了一个基于NN的估计器,并提供了相应的理论收敛速度。广泛的实验结果证明了我们提出的模型的有效性。

更新时间: 2025-02-11 06:36:20

领域: cs.LG

下载: http://arxiv.org/abs/2502.07295v1

Global Universal Scaling and Ultra-Small Parameterization in Machine Learning Interatomic Potentials with Super-Linearity

Using machine learning (ML) to construct interatomic interactions, and thus the potential energy surface (PES), has become a common strategy for materials design and simulations. However, current machine learning interatomic potential (MLIP) models provide no relevant physical constraints, and thus may suffer from an intrinsic out-of-domain difficulty that underlies the challenges of model generalizability and physical scalability. Here, by incorporating a physics-informed universal scaling law and a nonlinearity-embedded interaction function, we develop a super-linear MLIP with both ultra-small parameterization and greatly expanded expressive capability, named SUS2-MLIP. Due to the global scaling rooted in the universal equation of state (UEOS), SUS2-MLIP not only has significantly reduced parameters by decoupling the element space from the coordinate space, but also naturally overcomes the out-of-domain difficulty and endows the potentials with inherent generalizability and scalability, even with relatively small training datasets. The nonlinearity-embedding transformation of the interaction function expands the expressive capability and makes the potentials super-linear. SUS2-MLIP outperforms state-of-the-art MLIP models with exceptional computational efficiency, especially for multi-element materials, and physical scalability in property prediction. This work not only presents a highly efficient universal MLIP model but also sheds light on incorporating physical constraints into artificial-intelligence-aided materials simulation.

Updated: 2025-02-11 06:34:31

标题: 全球通用缩放和超小参数化在具有超线性的机器学习原子间势的应用中

摘要: 利用机器学习(ML)构建原子间相互作用,从而得到潜在能量表面(PES)已经成为材料设计和模拟的常见策略。然而,目前的机器学习原子间势能模型(MLIP)没有提供相关的物理约束,因此可能存在本质上的域外困难,这是模型泛化能力和物理可扩展性所面临的挑战。在这里,通过结合受物理启发的通用缩放定律和非线性嵌入的相互作用函数,我们开发了一个具有超线性MLIP的模型,参数化极小且具有极大的表达能力,命名为SUS2-MLIP。由于根植于通用状态方程(UEOS)的全局缩放,SUS2-MLIP不仅通过将元素空间与坐标空间解耦而显著减少参数,还自然地克服了域外困难,并赋予势能固有的泛化能力和可扩展性,即使是相对较小的训练数据集。对于相互作用函数的非线性嵌入转换扩展了表达能力,并使势能呈现超线性。SUS2-MLIP在计算效率方面表现优异,特别适用于多元素材料和物理可扩展性的属性预测。这项工作不仅提出了一个高效的通用MLIP模型,还为将物理约束纳入人工智能辅助的材料模拟提供了启示。

更新时间: 2025-02-11 06:34:31

领域: cond-mat.mtrl-sci,cs.LG

下载: http://arxiv.org/abs/2502.07293v1

Graph Neural Networks at a Fraction

Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations of graph-structured data. In addition to real-valued GNNs, quaternion GNNs also perform well on tasks involving graph-structured data. With the aim of reducing the energy footprint, we reduce the model size while maintaining accuracy comparable to that of the original-sized GNNs. This paper introduces Quaternion Message Passing Neural Networks (QMPNNs), a framework that leverages quaternion space to compute node representations. Our approach offers a generalizable method for incorporating quaternion representations into GNN architectures at one-fourth of the original parameter count. Furthermore, we present a novel perspective on Graph Lottery Tickets, redefining their applicability within the context of GNNs and QMPNNs. We specifically aim to find the initialization lottery ticket, i.e., the subnetwork of a GNN that can achieve performance comparable to the original GNN upon training, thereby reducing the trainable model parameters even further. To validate the effectiveness of our proposed QMPNN framework and the Lottery Ticket Hypothesis (LTH) for both GNNs and QMPNNs, we evaluate their performance on real-world datasets across three fundamental graph-based tasks: node classification, link prediction, and graph classification.
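
The one-fourth parameter count comes from the algebra of quaternions: a generic quaternion linear layer (sketched below; not necessarily the paper's exact layer) splits features into four components and mixes them with the Hamilton product, so four small weight blocks are reused across all output components.

import torch
import torch.nn as nn

class QuaternionLinear(nn.Module):
    """Generic quaternion linear map: features are split into four components
    (r, i, j, k) and mixed via the Hamilton product, so four (d/4 x d/4)
    weight blocks replace one full d x d matrix (~1/4 the parameters)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        assert in_features % 4 == 0 and out_features % 4 == 0
        d_in, d_out = in_features // 4, out_features // 4
        self.wr = nn.Parameter(torch.randn(d_in, d_out) * 0.02)
        self.wi = nn.Parameter(torch.randn(d_in, d_out) * 0.02)
        self.wj = nn.Parameter(torch.randn(d_in, d_out) * 0.02)
        self.wk = nn.Parameter(torch.randn(d_in, d_out) * 0.02)

    def forward(self, x):
        r, i, j, k = x.chunk(4, dim=-1)
        # Hamilton product of (r, i, j, k) with the weight quaternion.
        out_r = r @ self.wr - i @ self.wi - j @ self.wj - k @ self.wk
        out_i = r @ self.wi + i @ self.wr + j @ self.wk - k @ self.wj
        out_j = r @ self.wj - i @ self.wk + j @ self.wr + k @ self.wi
        out_k = r @ self.wk + i @ self.wj - j @ self.wi + k @ self.wr
        return torch.cat([out_r, out_i, out_j, out_k], dim=-1)

For example, QuaternionLinear(64, 64) stores four 16x16 blocks (1,024 weights) where a real-valued 64x64 linear layer stores 4,096, matching the one-fourth parameter count.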

Updated: 2025-02-11 06:30:25

标题: 以极小参数量实现的图神经网络

摘要: 图神经网络(GNNs)已经成为学习图结构数据表示的强大工具。除了实值GNN外,四元数GNN在图结构数据任务上也表现良好。为了减少能源消耗,我们减小了模型大小,同时保持了与原始大小GNN相当的准确性。本文介绍了四元数消息传递神经网络(QMPNNs),这是一个利用四元数空间计算节点表示的框架。我们的方法提供了一种通用方法,将四元数表示整合到GNN架构中,参数数量只有原始参数数量的四分之一。此外,我们提出了一种新颖的图彩票视角,重新定义了它们在GNN和QMPNN上的适用性。我们特别旨在从GNN的子网络中找到初始化中奖彩票,该子网络在训练后可以达到与原始GNN相当的性能,从而进一步减少可训练模型参数。为了验证我们提出的QMPNN框架和图彩票技术在GNN和QMPNN上的有效性,我们在真实世界数据集上评估它们在三个基本基于图的任务上的性能:节点分类、链接预测和图分类。

更新时间: 2025-02-11 06:30:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.06136v2

CROWN: A Novel Approach to Comprehending Users' Preferences for Accurate Personalized News Recommendation

Personalized news recommendation aims to assist users in finding news articles that align with their interests, which plays a pivotal role in mitigating users' information overload problem. Although many recent works have been studied for better personalized news recommendation, the following challenges should be explored more: (C1) Comprehending manifold intents coupled within a news article, (C2) Differentiating varying post-read preferences of news articles, and (C3) Addressing the cold-start user problem. To tackle the aforementioned challenges together, in this paper, we propose a novel personalized news recommendation framework (CROWN) that employs (1) category-guided intent disentanglement for (C1), (2) consistency-based news representation for (C2), and (3) GNN-enhanced hybrid user representation for (C3). Furthermore, we incorporate a category prediction into the training process of CROWN as an auxiliary task, which provides supplementary supervisory signals to enhance intent disentanglement. Extensive experiments on two real-world datasets reveal that (1) CROWN provides consistent performance improvements over ten state-of-the-art news recommendation methods and (2) the proposed strategies significantly improve the accuracy of CROWN.

Updated: 2025-02-11 06:26:34

标题: CROWN:一种理解用户偏好以实现准确个性化新闻推荐的新方法

摘要: 个性化新闻推荐旨在帮助用户找到符合其兴趣的新闻文章,这在缓解用户信息过载问题中起着至关重要的作用。尽管许多最近的研究致力于改进个性化新闻推荐,但以下挑战仍需更多探索:(C1)理解新闻文章中复杂的意图,(C2)区分新闻文章不同的阅读后偏好,以及(C3)解决冷启动用户问题。为了共同解决上述挑战,在本文中,我们提出了一种新颖的个性化新闻推荐框架(CROWN),该框架采用了(1)基于类别引导的意图分解来解决(C1),(2)基于一致性的新闻表示来解决(C2),以及(3)增强混合用户表示的GNN用于解决(C3)。此外,我们将类别预测整合到CROWN的训练过程中作为辅助任务,提供了额外的监督信号以增强意图分解。在两个真实世界数据集上进行的广泛实验表明,(1)CROWN相对于十种最先进的新闻推荐方法提供了一致的性能改进,(2)所提出的策略显著提高了CROWN的准确性。

更新时间: 2025-02-11 06:26:34

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2310.09401v6

A duality framework for analyzing random feature and two-layer neural networks

We consider the problem of learning functions within the $\mathcal{F}_{p,\pi}$ and Barron spaces, which play crucial roles in understanding random feature models (RFMs), two-layer neural networks, as well as kernel methods. Leveraging tools from information-based complexity (IBC), we establish a dual equivalence between approximation and estimation, and then apply it to study the learning of the preceding function spaces. The duality allows us to focus on the more tractable problem between approximation and estimation. To showcase the efficacy of our duality framework, we delve into two important but under-explored problems: 1) Random feature learning beyond kernel regime: We derive sharp bounds for learning $\mathcal{F}_{p,\pi}$ using RFMs. Notably, the learning is efficient without the curse of dimensionality for $p>1$. This underscores the extended applicability of RFMs beyond the traditional kernel regime, since $\mathcal{F}_{p,\pi}$ with $p<2$ is strictly larger than the corresponding reproducing kernel Hilbert space (RKHS) where $p=2$. 2) The $L^\infty$ learning of RKHS: We establish sharp, spectrum-dependent characterizations for the convergence of $L^\infty$ learning error in both noiseless and noisy settings. Surprisingly, we show that popular kernel ridge regression can achieve near-optimal performance in $L^\infty$ learning, despite it primarily minimizing square loss. To establish the aforementioned duality, we introduce a type of IBC, termed $I$-complexity, to measure the size of a function class. Notably, $I$-complexity offers a tight characterization of learning in noiseless settings, yields lower bounds comparable to Le Cam's in noisy settings, and is versatile in deriving upper bounds. We believe that our duality framework holds potential for broad application in learning analysis across more scenarios.

Updated: 2025-02-11 06:24:23

标题: 一个用于分析随机特征和双层神经网络的对偶框架

摘要: 我们考虑在$\mathcal{F}_{p,\pi}$和Barron空间中学习函数的问题,这两类空间在理解随机特征模型(RFMs)、两层神经网络以及核方法方面起着关键作用。利用基于信息的复杂度(IBC)的工具,我们建立了逼近和估计之间的对偶等价关系,然后将其应用于研究前述函数空间的学习。这种对偶性使我们能够集中精力解决逼近和估计之间更易处理的问题。为了展示我们的对偶框架的有效性,我们深入研究了两个重要但尚未充分探讨的问题:
1)超越核机制(kernel regime)的随机特征学习:我们推导了使用RFMs学习$\mathcal{F}_{p,\pi}$的尖锐界限。值得注意的是,对于$p>1$,学习是高效的,没有维度灾难。这强调了RFMs在传统核机制之外的扩展适用性,因为当$p<2$时,$\mathcal{F}_{p,\pi}$严格大于相应的再生核希尔伯特空间(RKHS,对应$p=2$)。
2)RKHS的$L^\infty$学习:我们建立了锐利的、依赖于谱的$L^\infty$学习误差收敛刻画,涵盖无噪声和有噪声两种情形。令人惊讶的是,我们展示了流行的核岭回归可以在$L^\infty$学习中实现接近最优的性能,尽管它主要是最小化平方损失。
为了建立上述对偶性,我们引入了一种IBC,称为$I$-complexity,来衡量函数类的大小。值得注意的是,$I$-complexity提供了对无噪声环境中学习的紧致刻画,在有噪声环境中产生了与Le Cam方法相当的下界,并且在推导上界方面具有多功能性。我们相信我们的对偶框架在更多场景的学习分析中具有广泛的应用潜力。

更新时间: 2025-02-11 06:24:23

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2305.05642v3

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Recent advancements in large language models (LLMs) have highlighted the importance of extending context lengths for handling complex tasks. While traditional methods for training on long contexts often use filtered long documents, these approaches lead to domain imbalances, limiting model performance. To address this, techniques like random document concatenation (Standard) and similarity-based methods (KNN, ICLM) have been developed. However, they either sacrifice semantic coherence or diversity. To balance both aspects, we introduce Quest, a query-centric data synthesis method aggregating semantically relevant yet diverse documents. Quest uses a generative model to predict potential queries for each document, grouping documents with similar queries and keywords. Extensive experiments demonstrate Quest's superior performance on long-context tasks, achieving remarkable results with context lengths of up to 1M tokens and confirming its scalability across various model sizes.
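
A rough sketch of the query-centric grouping step, under our reading of the abstract: a generator proposes a query per document, documents are bucketed by query keyword, and buckets are concatenated into long training contexts. Here predict_query and keyword_of are hypothetical placeholders, and character length stands in for a token count.

from collections import defaultdict

def build_long_contexts(docs, predict_query, keyword_of, max_len=1_000_000):
    groups = defaultdict(list)
    for doc in docs:
        query = predict_query(doc)             # generative model proposes a query
        groups[keyword_of(query)].append(doc)  # bucket by query keyword
    contexts = []
    for _, members in groups.items():          # relevant yet diverse documents
        ctx, length = [], 0
        for doc in members:
            if length + len(doc) > max_len:
                contexts.append("\n\n".join(ctx))
                ctx, length = [], 0
            ctx.append(doc)
            length += len(doc)                 # character length as a token proxy
        if ctx:
            contexts.append("\n\n".join(ctx))
    return contexts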

Updated: 2025-02-11 06:22:30

标题: 探索:用于大型语言模型长上下文扩展的查询中心数据综合方法

摘要: 最近大型语言模型(LLMs)的进展凸显了扩展上下文长度以处理复杂任务的重要性。传统的训练长上下文的方法通常使用经过筛选的长文档,但这些方法导致了领域不平衡,限制了模型性能。为了解决这个问题,出现了一些技术,如随机文档连接(标准)和基于相似性的方法(KNN,ICLM)。然而,它们要么牺牲语义连贯性,要么牺牲多样性。为了平衡这两个方面,我们引入了Quest,一种以查询为中心的数据合成方法,聚合语义相关但多样化的文档。Quest使用生成模型为每个文档预测潜在查询,将具有相似查询和关键词的文档分组。大量实验证明了Quest在长上下文任务上的卓越性能,实现了在长达1M标记的上下文长度下取得了显著结果,并证实了其在各种模型大小上的可扩展性。

更新时间: 2025-02-11 06:22:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19846v7

KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.
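
For reference, the Dice Similarity Coefficient used in both tasks is 2|A∩B| / (|A| + |B|) over binary masks; a standard implementation:

import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient: 2|A∩B| / (|A| + |B|) on binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)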

Updated: 2025-02-11 06:20:28

标题: KPIs 2024挑战:从Patch级别到Slide级别推进肾小球分割

摘要: 慢性肾病(CKD)是一个重要的全球健康问题,影响超过10%的人口并导致显著的死亡率。虽然肾脏活检仍然是CKD诊断和治疗的金标准,但肾脏病理分割缺乏全面的基准,阻碍了该领域的进展。为了解决这个问题,我们组织了肾脏病理图像分割(KPIs)挑战赛,引入了一个纳入CKD临床前啮齿动物模型的数据集,其中包含来自60多张过碘酸希夫(PAS)染色全切片图像的超过10,000个带标注的肾小球。挑战包括两个任务,即补丁级别的分割和整个切片图像的分割与检测,使用Dice相似性系数(DSC)和F1分数进行评估。通过鼓励能够适应不同CKD模型和组织条件的创新分割方法,KPIs挑战旨在推进肾脏病理分析,建立新的基准,并实现疾病研究和诊断的精确、大规模量化。

更新时间: 2025-02-11 06:20:28

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07288v1

Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems

Efficient inference in high-dimensional models is a central challenge in machine learning. We introduce the Gaussian Ensemble Belief Propagation (GEnBP) algorithm, which combines the strengths of the Ensemble Kalman Filter (EnKF) and Gaussian Belief Propagation (GaBP) to address this challenge. GEnBP updates ensembles of prior samples into posterior samples by passing low-rank local messages over the edges of a graphical model, enabling efficient handling of high-dimensional states, parameters, and complex, noisy, black-box generation processes. By utilizing local message passing within a graphical model structure, GEnBP effectively manages complex dependency structures and remains computationally efficient even when the ensemble size is much smaller than the inference dimension -- a common scenario in spatiotemporal modeling, image processing, and physical model inversion. We demonstrate that GEnBP can be applied to various problem structures, including data assimilation, system identification, and hierarchical models, and show through experiments that it outperforms existing belief propagation methods in terms of accuracy and computational efficiency. Supporting code is available at https://github.com/danmackinlay/GEnBP

Updated: 2025-02-11 06:18:45

标题: 高斯集成信念传播用于高维系统高效推断

摘要: 高维模型中的有效推理是机器学习中的一个核心挑战。我们引入了高斯集成信念传播(GEnBP)算法,结合了集合卡尔曼滤波器(EnKF)和高斯信念传播(GaBP)的优势,以解决这一挑战。GEnBP通过在图模型的边缘上传递低秩局部消息,将先前样本的集合更新为后验样本,从而实现对高维状态、参数和复杂、嘈杂、黑盒生成过程的有效处理。通过利用图模型结构内的局部消息传递,GEnBP有效地管理复杂的依赖结构,并在集合大小远小于推理维度的情况下仍保持计算效率——这在时空建模、图像处理和物理模型反演中是常见情况。我们展示了GEnBP可以应用于各种问题结构,包括数据同化、系统识别和层次模型,并通过实验证明,它在准确性和计算效率方面优于现有的信念传播方法。 支持代码可在https://github.com/danmackinlay/GEnBP找到。

更新时间: 2025-02-11 06:18:45

领域: cs.LG,stat.ML,62-07 (Primary) 62F15, 62M40, 68T05, 68W25,I.2.6; H.2.4; I.2.8; J.2

下载: http://arxiv.org/abs/2402.08193v7

The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis

We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes (MDPs) with a generative model. The plug-in approach constructs a model estimate then computes an average-reward optimal policy in the estimated model. Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed. Unlike the more well-studied discounted MDP reduction method, the plug-in approach requires no prior problem information or parameter tuning. Our results fill this gap and address the limitations of prior approaches, as we show that the plug-in approach is optimal in several well-studied settings without using prior knowledge. Specifically it achieves the optimal diameter- and mixing-based sample complexities of $\widetilde{O}\left(SA \frac{D}{\varepsilon^2}\right)$ and $\widetilde{O}\left(SA \frac{\tau_{\mathrm{unif}}}{\varepsilon^2}\right)$, respectively, without knowledge of the diameter $D$ or uniform mixing time $\tau_{\mathrm{unif}}$. We also obtain span-based bounds for the plug-in approach, and complement them with algorithm-specific lower bounds suggesting that they are unimprovable. Our results require novel techniques for analyzing long-horizon problems which may be broadly useful and which also improve results for the discounted plug-in approach, removing effective-horizon-related sample size restrictions and obtaining the first optimal complexity bounds for the full range of sample sizes without reward perturbation.
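
In its simplest form, the plug-in approach is: estimate the transition kernel from generative-model samples, then solve the estimated MDP. The sketch below shows the discounted case for brevity (the paper's focus is average reward, where the second step would be an average-reward solver instead); the sampler, sample count, and reward handling are placeholders.

import numpy as np

def plug_in_policy(sampler, S, A, n, gamma=0.99, iters=2000):
    """Plug-in approach sketch: build an empirical model from n generative
    samples per (s, a), then run value iteration in the estimated MDP.
    sampler(s, a) returns a next-state index; rewards are assumed known."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n):
                P_hat[s, a, sampler(s, a)] += 1.0 / n  # empirical transitions
    r = np.zeros((S, A))          # plug in the true/observed rewards here
    V = np.zeros(S)
    for _ in range(iters):        # value iteration in the plug-in model
        Q = r + gamma * P_hat @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)       # greedy policy in the estimated model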

Updated: 2025-02-11 06:14:22

标题: 基于插件方法的平均奖励和折现MDP:最优样本复杂性分析

摘要: 我们研究了在具有生成模型的平均奖励马尔可夫决策过程(MDP)中学习$\varepsilon$-最优策略的插件方法的样本复杂性。插件方法构建一个模型估计,然后在估计模型中计算一个平均奖励最优策略。尽管代表了这个问题的可能是最简单的算法,插件方法从未被理论分析过。与更受研究的折扣MDP减少方法不同,插件方法不需要任何先验问题信息或参数调整。我们的结果填补了这一空白,并解决了先前方法的局限性,因为我们展示了在不使用先验知识的情况下插件方法在几个广泛研究的设置中是最优的。具体而言,它分别实现了基于直径和混合的样本复杂性的最优值$\widetilde{O}\left(SA \frac{D}{\varepsilon^2}\right)$和$\widetilde{O}\left(SA \frac{\tau_{\mathrm{unif}}}{\varepsilon^2}\right)$,而不需要直径$D$或均匀混合时间$\tau_{\mathrm{unif}}$的知识。我们还获得了插件方法的基于跨度的界限,并用特定于算法的下界来补充它们,表明它们是无法改进的。我们的结果需要对分析长期问题的新技术,这可能是广泛有用的,并且还改进了折扣插件方法的结果,消除了与有效水平相关的样本大小限制,并获得了对全范围的样本大小而言没有奖励扰动的第一个最优复杂性界限。

更新时间: 2025-02-11 06:14:22

领域: cs.LG,cs.IT,math.IT,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.07616v2

Small Language Model Makes an Effective Long Text Extractor

Named Entity Recognition (NER) is a fundamental problem in natural language processing (NLP). However, the task of extracting longer entity spans (e.g., awards) from extended texts (e.g., homepages) is barely explored. Current NER methods predominantly fall into two categories: span-based methods and generation-based methods. Span-based methods require the enumeration of all possible token-pair spans, followed by classification on each span, resulting in substantial redundant computations and excessive GPU memory usage. In contrast, generation-based methods involve prompting or fine-tuning large language models (LLMs) to adapt to downstream NER tasks. However, these methods struggle with the accurate generation of longer spans and often incur significant time costs for effective fine-tuning. To address these challenges, this paper introduces a lightweight span-based NER method called SeNER, which incorporates a bidirectional arrow attention mechanism coupled with LogN-Scaling on the [CLS] token to embed long texts effectively, and comprises a novel bidirectional sliding-window plus-shaped attention (BiSPA) mechanism to reduce redundant candidate token-pair spans significantly and model interactions between token-pair spans simultaneously. Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner. Code: https://github.com/THUDM/scholar-profiling/tree/main/sener
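
LogN-Scaling is a known length-extrapolation trick that multiplies attention logits by log(n)/log(training length) so attention entropy stays stable on long inputs; a generic sketch of that one ingredient (not SeNER's full bidirectional-arrow or BiSPA attention), with train_len as an assumed placeholder:

import math
import torch

def logn_scaled_scores(q, k, train_len=512):
    """Scaled dot-product attention logits with LogN-Scaling: logits are
    multiplied by log(n)/log(train_len), which exceeds 1 once the sequence
    length n grows past the pre-training length."""
    n, d = q.shape[-2], q.shape[-1]
    scale = math.log(n) / math.log(train_len)
    return scale * (q @ k.transpose(-2, -1)) / math.sqrt(d)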

Updated: 2025-02-11 06:06:25

标题: 小语言模型是一种有效的长文本提取器

摘要: 实体命名识别(NER)是自然语言处理(NLP)中的一个基本问题。然而,从扩展文本(例如主页)中提取更长的实体跨度(例如奖项)的任务几乎没有被探索。当前的NER方法主要分为两类:基于跨度的方法和基于生成的方法。基于跨度的方法需要列举所有可能的标记对跨度,然后对每个跨度进行分类,导致大量冗余计算和过多的GPU内存使用。相比之下,基于生成的方法涉及提示或对大型语言模型(LLMs)进行微调,以适应下游NER任务。然而,这些方法在准确生成更长跨度方面存在困难,并且往往需要大量时间成本进行有效的微调。为了解决这些挑战,本文介绍了一种轻量级基于跨度的NER方法,称为SeNER,它结合了双向箭头注意机制和[CLS]标记上的LogN缩放,以有效嵌入长文本,并包括一种新颖的双向滑动窗口加号形状注意(BiSPA)机制,显著减少了冗余候选标记对跨度,并同时建模标记对跨度之间的交互。大量实验证明,我们的方法在三个长NER数据集上实现了最先进的提取准确性,并能够以对GPU内存友好的方式从长文本中提取实体。 代码:https://github.com/THUDM/scholar-profiling/tree/main/sener

更新时间: 2025-02-11 06:06:25

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07286v1

Negative Dependence as a toolbox for machine learning : review and new developments

Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad range of fundamental machine learning challenges including optimization, sampling, dimensionality reduction and sparse signal recovery, often surpassing the performance of current methods based on statistical independence. The most popular negatively dependent model has been that of determinantal point processes (DPPs), which have their origins in quantum theory. However, other models, such as perturbed lattice models, strongly Rayleigh measures, zeros of random functions have gained salience in various learning applications. In this article, we review this burgeoning field of research, as it has developed over the past two decades or so. We also present new results on applications of DPPs to the parsimonious representation of neural networks. In the limited scope of the article, we mostly focus on aspects of this area to which the authors contributed over the recent years, including applications to Monte Carlo methods, coresets and stochastic gradient descent, stochastic networks, signal processing and connections to quantum computation. However, starting from basics of negative dependence for the uninitiated reader, extensive references are provided to a broad swath of related developments which could not be covered within our limited scope. While existing works and reviews generally focus on specific negatively dependent models (e.g. DPPs), a notable feature of this article is that it addresses negative dependence as a machine learning methodology as a whole. In this vein, it covers within its span an array of negatively dependent models and their applications well beyond DPPs, thereby putting forward a very general and rather unique perspective.

Updated: 2025-02-11 06:04:49

标题: 负相关性作为机器学习工具箱:回顾与新发展

摘要: 负相关性正在成为推动学习能力超越传统独立性限制的关键驱动因素。最近的发展表明,对于负相关系统作为广泛范围内包括优化、抽样、降维和稀疏信号恢复在内的基本机器学习挑战的学习范式提供了支持,往往超过了基于统计独立性的当前方法的性能。最流行的负相关模型是行列式点过程(DPPs),其起源于量子理论。然而,其他模型,如扰动晶格模型、强瑞利测度、随机函数的零点在各种学习应用中变得突出。在本文中,我们回顾了这个在过去二十年左右发展起来的蓬勃发展领域。我们还提出了DPPs在神经网络的简约表示中的应用的新结果。在本文的有限范围内,我们主要关注作者近年来对该领域的贡献,包括对蒙特卡洛方法、核心集和随机梯度下降、随机网络、信号处理以及与量子计算的关联的应用。然而,对于未曾接触过负相关性基础知识的读者,我们提供了大量参考文献,涵盖了广泛相关发展,这些发展在我们的有限范围内无法涵盖。虽然现有的作品和评论通常关注特定的负相关模型(例如DPPs),但本文的一个显著特点是它将负相关性作为整体的机器学习方法论来探讨。因此,它在其范围内涵盖了一系列负相关模型及其应用,远远超出了DPPs,从而提出了一个非常一般且独特的视角。

更新时间: 2025-02-11 06:04:49

领域: stat.ML,cs.LG,math.PR

下载: http://arxiv.org/abs/2502.07285v1

VLWE: Variety-based Learning with Errors for Vector Encryption through Algebraic Geometry

Lattice-based cryptography is a foundation for post-quantum security, with the Learning with Errors (LWE) problem as a core component in key exchange, encryption, and homomorphic computation. Structured variants like Ring-LWE (RLWE) and Module-LWE (MLWE) improve efficiency using polynomial rings but remain constrained by traditional polynomial multiplication rules, limiting their ability to handle structured vectorized data. This work introduces Variety-LWE (VLWE), a new structured lattice problem based on algebraic geometry. Unlike RLWE and MLWE, which use polynomial quotient rings with standard multiplication, VLWE operates over multivariate polynomial rings defined by algebraic varieties. A key difference is that these polynomials lack mixed variables, and multiplication is coordinate-wise rather than following standard polynomial multiplication. This enables direct encoding and homomorphic processing of high-dimensional data while preserving worst-case to average-case hardness reductions. We prove VLWE's security by reducing it to multiple independent Ideal-SVP instances, demonstrating resilience against classical and quantum attacks. Additionally, we analyze hybrid algebraic-lattice attacks, showing that existing Grobner basis and lattice reduction methods do not directly threaten VLWE. We further construct a vector homomorphic encryption scheme based on VLWE, supporting structured computations while controlling noise growth. This scheme offers advantages in privacy-preserving machine learning, encrypted search, and secure computations over structured data. VLWE emerges as a novel and independent paradigm in lattice-based cryptography, leveraging algebraic geometry to enable new cryptographic capabilities beyond traditional polynomial quotient rings.
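
The structural point is easiest to see in a toy numpy sketch, under our reading of the abstract: a VLWE-style sample uses a coordinate-wise (Hadamard) product, unlike the convolution-style polynomial product of Ring-LWE. All parameters are illustrative and not secure.

import numpy as np

rng = np.random.default_rng(0)
q, n = 3329, 8                         # toy modulus and dimension, not secure
a = rng.integers(0, q, n)              # public random vector
s = rng.integers(0, q, n)              # secret vector
e = rng.integers(-2, 3, n)             # small noise

# VLWE-style sample: the product is coordinate-wise, so each coordinate of a
# structured data vector is encoded and processed independently.
b = (a * s + e) % q

# Contrast: Ring-LWE multiplies ring elements with a convolution-style
# polynomial product, which mixes coordinates.
ring_style = np.convolve(a, s) % q     # ordinary polynomial product, pre-reduction
print(b)
print(ring_style[:n])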

Updated: 2025-02-11 06:04:24

标题: VLWE:基于代数簇的带误差学习,通过代数几何实现向量加密

摘要: 基于格的密码学是后量子安全的基础,其中学习与错误(LWE)问题是密钥交换、加密和同态计算的核心组成部分。结构化变体如环-LWE(RLWE)和模块-LWE(MLWE)利用多项式环提高效率,但仍受传统多项式乘法规则的限制,限制了它们处理结构化向量化数据的能力。本文介绍了Variety-LWE(VLWE),这是一个基于代数几何的新结构化格问题。与使用标准乘法的多项式商环的RLWE和MLWE不同,VLWE在由代数变量定义的多变量多项式环上运行。一个关键的区别是这些多项式缺乏混合变量,乘法是按坐标进行而不是遵循标准多项式乘法。这使得能够直接对高维数据进行编码和同态处理,同时保持最坏情况到平均情况的难度降低。我们通过将VLWE约化为多个独立的理想-SVP实例来证明VLWE的安全性,展示其对抗经典和量子攻击的弹性。此外,我们分析了混合代数格攻击,显示现有的格基和格约简方法并不直接威胁VLWE。我们进一步构建了一个基于VLWE的向量同态加密方案,支持结构化计算同时控制噪声增长。这一方案在隐私保护机器学习、加密搜索和安全计算结构化数据方面具有优势。VLWE作为基于格的密码学中的一种新颖且独立的范式出现,利用代数几何实现了超越传统多项式商环的新的加密能力。

更新时间: 2025-02-11 06:04:24

领域: cs.CR,cs.CG

下载: http://arxiv.org/abs/2502.07284v1

Enhancing Security and Privacy in Federated Learning using Low-Dimensional Update Representation and Proximity-Based Defense

Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, particularly against curious or malicious adversaries. In this paper, we introduce a novel framework named \underline{F}ederated \underline{L}earning with Low-Dimensional \underline{U}pdate \underline{R}epresentation and \underline{P}roximity-Based defense (FLURP), designed to address privacy preservation and resistance to Byzantine attacks in distributed learning environments. FLURP employs $\mathsf{LinfSample}$ method, enabling clients to compute the $l_{\infty}$ norm across sliding windows of updates, resulting in a Low-Dimensional Update Representation (LUR). Calculating the shared distance matrix among LURs, rather than updates, significantly reduces the overhead of Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and poisoned updates. Additionally, FLURP integrates a privacy-preserving proximity-based defense mechanism utilizing optimized SMPC protocols to minimize communication rounds. Our experiments demonstrate FLURP's effectiveness in countering Byzantine adversaries with low communication and runtime overhead. FLURP offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.
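
A minimal sketch of our reading of LinfSample and the distance-matrix step: windows here are non-overlapping for simplicity (the paper describes sliding windows), and the window size is an assumed placeholder.

import numpy as np

def linf_sample(update, window=1024):
    """Compress a flattened model update into a Low-Dimensional Update
    Representation (LUR) by taking the l-infinity norm per window."""
    pad = (-len(update)) % window
    u = np.pad(np.abs(update), (0, pad))
    return u.reshape(-1, window).max(axis=1)   # one coordinate per window

def pairwise_distances(lurs):
    """Distance matrix among LURs: the small matrix fed to SMPC instead of
    full updates, which is what cuts the secure-computation overhead."""
    diff = lurs[:, None, :] - lurs[None, :, :]
    return np.linalg.norm(diff, axis=-1)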

Updated: 2025-02-11 06:00:39

标题: 利用低维度更新表示和基于接近性的防御增强联邦学习中的安全性和隐私性

摘要: 联邦学习(FL)是一种有前途的隐私保护机器学习范式,允许数据所有者在保持数据本地化的同时协同训练模型。尽管具有潜力,但FL面临与客户和服务器的可信度有关的挑战,特别是针对好奇或恶意对手。本文介绍了一种名为FLURP的新颖框架,旨在解决分布式学习环境中的隐私保护和对抗拜占庭攻击的问题。FLURP采用LinfSample方法,使客户端能够计算更新的滑动窗口上的l∞范数,从而产生低维更新表示(LUR)。通过计算LUR之间的共享距离矩阵,而不是更新,显著减少了安全多方计算(SMPC)的开销,效果是降低了三个数量级,同时有效区分良性和恶意更新。此外,FLURP集成了一个隐私保护的基于接近度的防御机制,利用优化的SMPC协议来最小化通信轮数。我们的实验证明了FLURP在对抗拜占庭对手时的有效性,且通信和运行时开销较低。FLURP为分布式环境中的安全可靠FL提供了可扩展的框架,促进了其在需要强大数据管理和安全性的场景中的应用。

更新时间: 2025-02-11 06:00:39

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.18802v2

Supervised Contrastive Block Disentanglement

Real-world datasets often combine data collected under different experimental conditions. This yields larger datasets, but also introduces spurious correlations that make it difficult to model the phenomena of interest. We address this by learning two embeddings to independently represent the phenomena of interest and the spurious correlations. The embedding representing the phenomena of interest is correlated with the target variable $y$, and is invariant to the environment variable $e$. In contrast, the embedding representing the spurious correlations is correlated with $e$. The invariance to $e$ is difficult to achieve on real-world datasets. Our primary contribution is an algorithm called Supervised Contrastive Block Disentanglement (SCBD) that effectively enforces this invariance. It is based purely on Supervised Contrastive Learning, and applies to real-world data better than existing approaches. We empirically validate SCBD on two challenging problems. The first problem is domain generalization, where we achieve strong performance on a synthetic dataset, as well as on Camelyon17-WILDS. We introduce a single hyperparameter $\alpha$ to control the degree of invariance to $e$. When we increase $\alpha$ to strengthen the degree of invariance, out-of-distribution performance improves at the expense of in-distribution performance. The second problem is batch correction, in which we apply SCBD to preserve biological signal and remove inter-well batch effects when modeling single-cell perturbations from 26 million Optical Pooled Screening images.

Updated: 2025-02-11 05:55:27

标题: 受监督的对比块解缠

摘要: 真实世界的数据集通常结合了在不同实验条件下收集的数据。这产生了更大的数据集,但也引入了虚假的相关性,使得难以对感兴趣的现象建模。我们通过学习两个嵌入来独立表示感兴趣的现象和虚假相关性来解决这个问题。表示感兴趣的现象的嵌入与目标变量$y$相关,并且对环境变量$e$不变。相反,表示虚假相关性的嵌入与$e$相关。在真实世界的数据集上实现对$e$的不变性是困难的。我们的主要贡献是一种名为监督对比块解缠(SCBD)的算法,有效地强制实现这种不变性。它基于纯粹的监督对比学习,并且比现有方法更适用于真实世界的数据。我们在两个具有挑战性的问题上对SCBD进行了实证验证。第一个问题是域泛化,在合成数据集以及Camelyon17-WILDS上取得了强大的性能。我们引入了一个单一的超参数$\alpha$来控制对$e$的不变程度。当我们增加$\alpha$以加强不变性时,超出分布的性能会提高,而内部分布的性能会降低。第二个问题是批次校正,在这个问题中,我们将SCBD应用于从2600万个光池筛选图像中建模单细胞扰动时,保留生物信号并消除不同孔批次效应。

更新时间: 2025-02-11 05:55:27

领域: cs.LG

下载: http://arxiv.org/abs/2502.07281v1

MIGT: Memory Instance Gated Transformer Framework for Financial Portfolio Management

Deep reinforcement learning (DRL) has been applied in financial portfolio management to improve returns in changing market conditions. However, unlike most fields where DRL is widely used, the stock market is more volatile and dynamic as it is affected by several factors such as global events and investor sentiment. Therefore, it remains a challenge to construct a DRL-based portfolio management framework with strong return capability, stable training, and generalization ability. This study introduces a new framework utilizing the Memory Instance Gated Transformer (MIGT) for effective portfolio management. By incorporating a novel Gated Instance Attention module, which combines a transformer variant, instance normalization, and a Lite Gate Unit, our approach aims to maximize investment returns while ensuring the learning process's stability and reducing outlier impacts. Tested on the Dow Jones Industrial Average 30, our framework's performance is evaluated against fifteen other strategies using key financial metrics like the cumulative return and risk-return ratios (Sharpe, Sortino, and Omega ratios). The results highlight MIGT's advantage, showcasing at least a 9.75% improvement in cumulative returns and a minimum 2.36% increase in risk-return ratios over competing strategies, marking a significant advancement in DRL for portfolio management.

Updated: 2025-02-11 05:54:42

标题: MIGT:金融投资组合管理的内存实例门控变压器框架

摘要: 深度强化学习(DRL)已被应用于金融投资组合管理,以改善在变化的市场条件下的回报。然而,与大多数领域普遍使用DRL不同,股票市场更加波动和动态,受全球事件和投资者情绪等多种因素影响。因此,构建一个具有强大回报能力、稳定训练和泛化能力的基于DRL的投资组合管理框架仍然是一个挑战。本研究引入了一个利用记忆实例门控变压器(MIGT)进行有效投资组合管理的新框架。通过整合一种新型的门控实例注意模块,结合了变压器变种、实例归一化和Lite门单元,我们的方法旨在在确保学习过程稳定性和减少离群值影响的同时最大化投资回报。在道琼斯工业平均指数30的测试中,我们的框架的表现与其他十五种策略使用关键的金融指标(如累计回报和风险回报比率(夏普、索汀诺和Omega比率)进行了评估。结果突显出MIGT的优势,展示了与竞争策略相比至少9.75%的累计回报改进和至少2.36%的风险回报比率增加,标志着在投资组合管理中DRL的显著进步。

更新时间: 2025-02-11 05:54:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07280v1

xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition

In recent years, the application of transformer-based models in time-series forecasting has received significant attention. While often demonstrating promising results, the transformer architecture encounters challenges in fully exploiting the temporal relations within time series data due to its attention mechanism. In this work, we design eXponential Patch (xPatch for short), a novel dual-stream architecture that utilizes exponential decomposition. Inspired by the classical exponential smoothing approaches, xPatch introduces the innovative seasonal-trend exponential decomposition module. Additionally, we propose a dual-flow architecture that consists of an MLP-based linear stream and a CNN-based non-linear stream. This model investigates the benefits of employing patching and channel-independence techniques within a non-transformer model. Finally, we develop a robust arctangent loss function and a sigmoid learning rate adjustment scheme, which prevent overfitting and boost forecasting performance. The code is available at the following repository: https://github.com/stitsyuk/xPatch.
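
As a rough sketch of two of the ingredients, assuming a simple EMA-based reading of the exponential decomposition and one plausible form of the arctangent loss (the paper's exact formulas may differ):

import torch

def ema_decompose(x, alpha=0.3):
    """Exponential seasonal-trend decomposition: an EMA along time gives the
    trend; the residual is the seasonal part. x: (batch, time). Didactic
    loop; a production version would vectorize this filter."""
    trend = torch.empty_like(x)
    trend[:, 0] = x[:, 0]
    for t in range(1, x.size(1)):
        trend[:, t] = alpha * x[:, t] + (1 - alpha) * trend[:, t - 1]
    return trend, x - trend   # (linear-stream input, non-linear-stream input)

def arctan_loss(pred, target):
    """A bounded, arctangent-shaped regression loss (our illustrative reading
    of the 'robust arctangent loss'): large errors saturate, limiting the
    influence of outliers and discouraging overfitting to them."""
    return torch.atan((pred - target) ** 2).mean()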

Updated: 2025-02-11 05:49:47

标题: xPatch:使用指数季节性趋势分解的双流时间序列预测

摘要: 近年来,基于变压器的模型在时间序列预测中的应用受到了重视。虽然通常表现出有希望的结果,但由于其注意力机制,变压器架构在充分利用时间序列数据中的时间关系方面遇到挑战。在这项工作中,我们设计了一种新颖的双流架构——指数补丁(xPatch),该架构利用指数分解。受经典指数平滑方法的启发,xPatch引入了创新的季节趋势指数分解模块。此外,我们提出了一个由基于MLP的线性流和基于CNN的非线性流组成的双流架构。该模型探讨了在非变压器模型中采用修补和通道独立技术的好处。最后,我们开发了一个稳健的反正切损失函数和一个sigmoid学习率调整方案,以防止过拟合并提高预测性能。代码可在以下存储库中找到:https://github.com/stitsyuk/xPatch。

更新时间: 2025-02-11 05:49:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2412.17323v3

Exploratory Diffusion Policy for Unsupervised Reinforcement Learning

Unsupervised reinforcement learning (RL) aims to pre-train agents by exploring states or skills in reward-free environments, facilitating the adaptation to downstream tasks. However, existing methods often overlook the fitting ability of pre-trained policies and struggle to handle the heterogeneous pre-training data, which are crucial for achieving efficient exploration and fast fine-tuning. To address this gap, we propose Exploratory Diffusion Policy (EDP), which leverages the strong expressive ability of diffusion models to fit the explored data, both boosting exploration and obtaining an efficient initialization for downstream tasks. Specifically, we estimate the distribution of collected data in the replay buffer with the diffusion policy and propose a score intrinsic reward, encouraging the agent to explore unseen states. For fine-tuning the pre-trained diffusion policy on downstream tasks, we provide both theoretical analyses and practical algorithms, including an alternating method of Q function optimization and diffusion policy distillation. Extensive experiments demonstrate the effectiveness of EDP in efficient exploration during pre-training and fast adaptation during fine-tuning.

Updated: 2025-02-11 05:48:51

标题: 无监督强化学习的探索式扩散政策

摘要: 无监督强化学习(RL)旨在通过在无奖励环境中探索状态或技能来预训练代理,从而促进对下游任务的适应。然而,现有方法经常忽视预训练策略的拟合能力,并且难以处理异构的预训练数据,这对于实现高效探索和快速微调至关重要。为了填补这一差距,我们提出了探索性扩散策略(EDP),利用扩散模型的强大表达能力来适应探索数据,从而提升探索效率并为下游任务提供高效初始化。具体而言,我们使用扩散策略估计回放缓冲区中收集的数据的分布,并提出一种得分内在奖励,鼓励代理探索未见过的状态。为了在下游任务上对预训练的扩散策略进行微调,我们提供了理论分析和实际算法,包括Q函数优化和扩散策略提炼的交替方法。大量实验证明了EDP在预训练期间的高效探索和微调期间的快速适应的有效性。

更新时间: 2025-02-11 05:48:51

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07279v1

Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization

Solving large-scale multistage stochastic programming (MSP) problems poses a significant challenge as commonly used stagewise decomposition algorithms, including stochastic dual dynamic programming (SDDP), face growing time complexity as the subproblem size and problem count increase. Traditional approaches approximate the value functions as piecewise linear convex functions by incrementally accumulating subgradient cutting planes from the primal and dual solutions of stagewise subproblems. Recognizing these limitations, we introduce TranSDDP, a novel Transformer-based stagewise decomposition algorithm. This innovative approach leverages the structural advantages of the Transformer model, implementing a sequential method for integrating subgradient cutting planes to approximate the value function. Through our numerical experiments, we affirm TranSDDP's effectiveness in addressing MSP problems. It efficiently generates a piecewise linear approximation for the value function, significantly reducing computation time while preserving solution quality, thus marking a promising progression in the treatment of large-scale multistage stochastic programming problems.
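
The object both SDDP and TranSDDP maintain is a piecewise-linear outer approximation of the convex cost-to-go function, built from affine cuts; a minimal sketch follows (cut generation itself, which TranSDDP delegates to a Transformer instead of stagewise dual solves, is omitted).

import numpy as np

class CutApproximation:
    """Piecewise-linear outer approximation of a convex cost-to-go function:
    V_hat(x) = max_i (a_i + b_i . x), accumulated cut by cut."""
    def __init__(self):
        self.intercepts, self.slopes = [], []

    def add_cut(self, a, b):
        self.intercepts.append(float(a))
        self.slopes.append(np.asarray(b, dtype=float))

    def value(self, x):
        x = np.asarray(x, dtype=float)
        # Requires at least one cut; each cut lower-bounds the true function.
        return max(a + b @ x for a, b in zip(self.intercepts, self.slopes))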

Updated: 2025-02-11 05:48:49

标题: 基于Transformer的大规模多阶段随机优化的逐阶分解

摘要: 解决大规模多阶段随机规划(MSP)问题是一个重大挑战,因为常用的分阶分解算法,包括随机对偶动态规划(SDDP),随着子问题规模和问题数量的增加而面临着不断增长的时间复杂度。传统方法通过逐步累积主问题和分阶子问题的原始和对偶解得到次梯度切割平面,将值函数近似为分段线性凸函数。鉴于这些局限性,我们引入了一种新颖的基于Transformer的分阶分解算法TranSDDP。这种创新方法利用Transformer模型的结构优势,实施了一个顺序方法,将次梯度切割平面集成到值函数中以进行近似。通过我们的数值实验,我们确认了TranSDDP在解决MSP问题方面的有效性。它有效地生成了值函数的分段线性近似,显著减少了计算时间,同时保持了解的质量,因此在处理大规模多阶段随机规划问题上标志着一个有希望的进展。

更新时间: 2025-02-11 05:48:49

领域: cs.LG

下载: http://arxiv.org/abs/2404.02583v2

Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis

Video has become the primary way we share information online, driving a surge in demand for algorithms that can analyze and understand video content. This trend will continue as video keeps dominating the digital landscape. Such algorithms extract and classify relevant features from a video and use them to describe the events and objects it contains. Deep neural networks have displayed encouraging outcomes in the realm of feature extraction and video description. This paper explores the spatiotemporal features found in videos and recent advancements in deep neural networks for video understanding. We review the main trends in video understanding models and their structural design, the main open problems, and some proposed solutions in this area. We also review and compare significant video understanding and action recognition datasets.

Updated: 2025-02-11 05:44:50

标题: 增强视频理解:用于时空分析的深度神经网络

摘要: 视频已成为我们在线分享信息的主要方式,因此对能够分析和理解视频内容的算法需求激增。随着视频继续主导数字领域,这一趋势将持续下去。这些算法将从视频中提取和分类相关特征,并用它们描述视频中的事件和对象。深度神经网络在特征提取和视频描述领域显示了令人鼓舞的成果。本文将探讨视频中的时空特征以及深度神经网络在视频理解方面的最新进展。我们将审查视频理解模型及其结构设计的主要趋势,主要问题以及该领域提供的一些解决方案。我们还将回顾并比较重要的视频理解和动作识别数据集。

更新时间: 2025-02-11 05:44:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07277v1

Dataset Ownership Verification in Contrastive Pre-trained Models

High-quality open-source datasets, which necessitate substantial efforts for curation, have become the primary catalyst for the swift progress of deep learning. Concurrently, protecting these datasets is paramount for the well-being of the data owner. Dataset ownership verification emerges as a crucial method in this domain, but existing approaches are often limited to supervised models and cannot be directly extended to increasingly popular unsupervised pre-trained models. In this work, we propose the first dataset ownership verification method tailored specifically for self-supervised models pre-trained via contrastive learning. Its primary objective is to ascertain whether a suspicious black-box backbone has been pre-trained on a specific unlabeled dataset, aiding dataset owners in upholding their rights. The proposed approach is motivated by our empirical insights that when models are trained with the target dataset, the unary and binary instance relationships within the embedding space exhibit significant variations compared to models trained without the target dataset. We validate the efficacy of this approach across multiple contrastive pre-trained models including SimCLR, BYOL, SimSiam, MOCO v3, and DINO. The results demonstrate that our method rejects the null hypothesis with a $p$-value markedly below $0.05$, surpassing all previous methodologies. Our code is available at https://github.com/xieyc99/DOV4CL.

Updated: 2025-02-11 05:42:21

标题: 对比式预训练模型中的数据集所有权验证

摘要: 高质量的开源数据集需要大量的精心策划工作,已成为深度学习迅速进展的主要推动力。与此同时,保护这些数据集对于数据所有者的福祉至关重要。数据集所有权验证出现在这一领域中变得至关重要,但现有方法往往局限于监督模型,无法直接扩展到越来越受欢迎的无监督预训练模型。在这项工作中,我们提出了第一个专门针对自监督预训练模型的数据集所有权验证方法,采用对比学习。其主要目标是确定可疑的黑盒骨干是否已在特定未标记数据集上进行了预训练,帮助数据集所有者维护其权利。所提出的方法是基于我们的经验见解,即当模型使用目标数据集进行训练时,嵌入空间内的一元和二元实例关系与未使用目标数据集进行训练的模型相比呈现出显著变化。我们验证了这种方法在多个对比预训练模型(包括SimCLR、BYOL、SimSiam、MOCO v3和DINO)上的有效性。结果表明,我们的方法拒绝了显著低于0.05的$p$值的零假设,超过了所有先前的方法。我们的代码可在https://github.com/xieyc99/DOV4CL上找到。

更新时间: 2025-02-11 05:42:21

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2502.07276v1

Process Reward Model with Q-Value Rankings

Process Reward Modeling (PRM) is critical for complex reasoning and decision-making tasks where the accuracy of intermediate steps significantly influences the overall outcome. Existing PRM approaches, primarily framed as classification problems, employ cross-entropy loss to independently evaluate each step's correctness. This method can lead to suboptimal reward distribution and does not adequately address the interdependencies among steps. To address these limitations, we introduce the Process Q-value Model (PQM), a novel framework that redefines PRM in the context of a Markov Decision Process. PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions. This approach provides a more granular and theoretically grounded methodology for process rewards. Our extensive empirical evaluations across various sampling policies, language model backbones, and multi-step reasoning benchmarks show that PQM outperforms classification-based PRMs. The effectiveness of the comparative loss function is highlighted in our comprehensive ablation studies, confirming PQM's practical efficacy and theoretical advantage.
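
As a simplified illustration of a comparative loss over Q-value rankings, one generic instantiation is a pairwise logistic ranking loss that pushes every correct step's Q-value above every incorrect one's; PQM's actual objective is more elaborate than this sketch.

import torch
import torch.nn.functional as F

def q_ranking_loss(q_correct, q_incorrect, margin=0.0):
    """Generic pairwise ranking loss over per-step Q-values: softplus(-diff)
    equals -log sigmoid(diff), so each correct/incorrect pair is penalized
    unless the correct step's Q-value wins by at least the margin."""
    diff = q_correct[:, None] - q_incorrect[None, :] - margin  # all pairs
    return F.softplus(-diff).mean()

# Example: three correct steps vs. two flagged-incorrect steps.
loss = q_ranking_loss(torch.tensor([0.9, 0.7, 0.8]), torch.tensor([0.2, 0.4]))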

Updated: 2025-02-11 05:41:41

标题: 具有Q值排名的过程奖励模型

摘要: 过程奖励建模(PRM)对于复杂的推理和决策任务至关重要,其中中间步骤的准确性显著影响整体结果。现有的PRM方法主要将其视为分类问题,采用交叉熵损失独立评估每个步骤的正确性。这种方法可能导致次优的奖励分配,并且不能充分处理步骤之间的相互依赖关系。为了解决这些局限,我们引入了过程Q值模型(PQM),这是一个在马尔可夫决策过程的背景下重新定义PRM的新框架。PQM基于一种新颖的比较损失函数优化Q值排名,增强了模型捕捉顺序决策之间复杂动态的能力。这种方法为过程奖励提供了更细粒度且具有理论基础的方法论。我们在各种采样策略、语言模型骨干和多步推理基准上的广泛实证评估表明,PQM优于基于分类的PRM。比较损失函数的有效性在我们的全面消融研究中得到了突出展示,验证了PQM的实用效力和理论优势。

更新时间: 2025-02-11 05:41:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.11287v2
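
For intuition, here is a hedged PyTorch sketch of a comparative ranking loss over per-step Q-values, the kind of objective the abstract contrasts with per-step cross-entropy; the paper's exact loss may differ, and comparative_q_loss is a name invented for illustration.

import torch

def comparative_q_loss(q_values, step_labels, margin=0.0):
    # q_values: (T,) predicted Q-value per reasoning step
    # step_labels: (T,) 1 for correct steps, 0 for incorrect ones
    pos = q_values[step_labels == 1]
    neg = q_values[step_labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return q_values.new_zeros(())
    # every correct step should outrank every incorrect step
    diff = pos.unsqueeze(1) - neg.unsqueeze(0) - margin
    return torch.nn.functional.softplus(-diff).mean()

q = torch.tensor([2.1, 0.3, 1.7, -0.5], requires_grad=True)
y = torch.tensor([1, 0, 1, 0])
comparative_q_loss(q, y).backward()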

Cost-Efficient Continual Learning with Sufficient Exemplar Memory

Continual learning (CL) research typically assumes highly constrained exemplar memory resources. However, in many real-world scenarios, especially in the era of large foundation models, memory is abundant, while GPU computational costs are the primary bottleneck. In this work, we investigate CL in a novel setting where exemplar memory is ample (i.e., sufficient exemplar memory). Unlike prior methods designed for strict exemplar memory constraints, we propose a simple yet effective approach that directly operates in the model's weight space through a combination of weight resetting and averaging techniques. Our method achieves state-of-the-art performance while reducing the computational cost to a quarter or third of existing methods. These findings challenge conventional CL assumptions and provide a practical baseline for computationally efficient CL applications.

Updated: 2025-02-11 05:40:52

标题: 经济高效的持续学习与充足示例记忆

摘要: 持续学习(CL)研究通常假设示例记忆资源受到高度限制。然而,在许多现实场景中,特别是在大型基础模型时代,示例记忆资源十分充裕,而GPU计算成本才是主要瓶颈。在这项工作中,我们在一个新颖的设定(即充足示例记忆)下研究CL。与为严格的示例记忆约束设计的先前方法不同,我们提出了一种简单而有效的方法,通过结合权重重置与权重平均技术,直接在模型的权重空间中操作。我们的方法在将计算成本降低到现有方法的四分之一或三分之一的同时,实现了最先进的性能。这些发现挑战了传统的CL假设,并为计算效率高的CL应用提供了一个实用的基线。

更新时间: 2025-02-11 05:40:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07274v1
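
A minimal PyTorch sketch of the two weight-space operations the abstract names, resetting and averaging; when to reset or merge is the method's actual contribution and is not reproduced here.

import copy
import torch

@torch.no_grad()
def average_weights(model_a, model_b, alpha=0.5):
    # interpolate two checkpoints of the same architecture
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a})
    return merged

@torch.no_grad()
def reset_parameter(model, name, init_model):
    # reset one parameter tensor back to a stored initialization
    sd = model.state_dict()
    sd[name] = init_model.state_dict()[name].clone()
    model.load_state_dict(sd)

net_t0 = torch.nn.Linear(8, 2)   # weights after task t
net_t1 = torch.nn.Linear(8, 2)   # weights after task t+1
merged = average_weights(net_t0, net_t1)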

Variational Learning Induces Adaptive Label Smoothing

We show that variational learning naturally induces an adaptive label smoothing where label noise is specialized for each example. Such label-smoothing is useful to handle examples with labeling errors and distribution shifts, but designing a good adaptivity strategy is not always easy. We propose to skip this step and simply use the natural adaptivity induced during the optimization of a variational objective. We show empirical results where a variational algorithm called IVON outperforms traditional label smoothing and yields adaptivity strategies similar to those of an existing approach. By connecting Bayesian methods to label smoothing, our work provides a new way to handle overconfident predictions.

Updated: 2025-02-11 05:40:42

标题: 变分学习引导自适应标签平滑

摘要: 我们展示了变分学习自然地引入了自适应标签平滑,其中标签噪声针对每个示例进行了专门化处理。这种标签平滑对处理带有标注错误和分布偏移的示例非常有用,但设计一个良好的自适应策略并不总是容易的。我们建议跳过这一步,简单地利用在变分目标优化过程中引入的自然自适应性。我们展示了实证结果,其中一种名为IVON的变分算法优于传统的标签平滑,并产生与现有方法类似的自适应策略。通过将贝叶斯方法与标签平滑相连接,我们的工作提供了一种处理过度自信预测的新方法。

更新时间: 2025-02-11 05:40:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07273v1
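
For contrast with the optimizer-induced adaptivity described above, here is a hand-specified version of adaptive label smoothing, where each example carries its own smoothing coefficient; in the paper this adaptivity emerges from variational training (IVON) rather than being supplied by hand as below.

import torch
import torch.nn.functional as F

def adaptive_label_smoothing_loss(logits, targets, eps_per_example):
    # logits: (B, C); targets: (B,); eps_per_example: (B,) in [0, 1)
    B, C = logits.shape
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, C).float()
    eps = eps_per_example.unsqueeze(1)
    smoothed = (1 - eps) * one_hot + eps / C   # per-example label noise
    return -(smoothed * log_probs).sum(dim=-1).mean()

logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 3, 3, 7])
eps = torch.tensor([0.02, 0.30, 0.05, 0.15])   # noisier examples smooth more
adaptive_label_smoothing_loss(logits, targets, eps).backward()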

AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts

Effective linguistic choices that attract potential customers play crucial roles in advertising success. This study aims to explore the linguistic features of ad texts that influence human preferences. Although the creation of attractive ad texts is an active area of research, progress in understanding the specific linguistic features that affect attractiveness is hindered by several obstacles. First, human preferences are complex and influenced by multiple factors, including their content, such as brand names, and their linguistic styles, making analysis challenging. Second, publicly available ad text datasets that include human preferences are lacking, such as ad performance metrics and human feedback, which reflect people's interests. To address these problems, we present AdParaphrase, a paraphrase dataset that contains human preferences for pairs of ad texts that are semantically equivalent but differ in terms of wording and style. This dataset allows for preference analysis that focuses on the differences in linguistic features. Our analysis revealed that ad texts preferred by human judges have higher fluency, longer length, more nouns, and use of bracket symbols. Furthermore, we demonstrate that an ad text-generation model that considers these findings significantly improves the attractiveness of a given text. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase.

Updated: 2025-02-11 05:36:24

标题: 广告改写:用于分析语言特征以生成具吸引力广告文本的改写数据集

摘要: 能够吸引潜在客户的有效语言选择在广告成功中起着至关重要的作用。本研究旨在探讨影响人类偏好的广告文本的语言特征。尽管创作吸引人的广告文本是一个活跃的研究领域,但在理解影响吸引力的具体语言特征方面的进展受到了几个障碍的阻碍。首先,人类偏好是复杂的,受到多种因素的影响,包括内容(如品牌名称)和语言风格,这使得分析具有挑战性。其次,目前缺乏公开可用的包含人类偏好信息(例如反映人们兴趣的广告表现指标和人类反馈)的广告文本数据集。为了解决这些问题,我们提出了AdParaphrase,这是一个包含对语义等价但在措辞和风格上有差异的广告文本对的人类偏好的改写数据集。该数据集支持侧重于语言特征差异的偏好分析。我们的分析显示,被人类评委偏好的广告文本具有更高的流畅性、更长的长度、更多的名词以及括号符号的使用。此外,我们证明了考虑这些发现的广告文本生成模型能显著提高给定文本的吸引力。该数据集可在以下网址公开获取:https://github.com/CyberAgentAILab/AdParaphrase。

更新时间: 2025-02-11 05:36:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.04674v2
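
A toy scorer for two of the surface features the analysis links to preference (length and bracket symbols); fluency and noun counts would additionally require a language model and a POS tagger, so they are omitted. This illustrates the finding only, not the authors' generation model.

def surface_features(ad_text: str) -> dict:
    return {
        "length": len(ad_text),
        "has_brackets": any(ch in ad_text for ch in "[]()【】()"),
    }

candidates = ["Great shoes", "Super-light running shoes [free shipping]"]
best = max(candidates,
           key=lambda t: (surface_features(t)["has_brackets"],
                          surface_features(t)["length"]))
print(best)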

Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators

Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors degrading accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while maintaining dequantization overhead, simplifies training by removing two-stage processes, and ensures robustness to memory cell variations via independent column-wise scale factors. We also propose an open-source CIM-oriented convolution framework to handle fine-grained weights and partial-sums efficiently, incorporating a novel tiling method and group convolution. Experimental results on ResNet-20 (CIFAR-10, CIFAR-100) and ResNet-18 (ImageNet) show accuracy improvements of 0.99%, 2.69%, and 1.01%, respectively, compared to the best-performing related works. Additionally, variation analysis reveals the robustness of our method against memory cell variations. These findings highlight the effectiveness of our quantization scheme in enhancing accuracy and robustness while maintaining hardware efficiency in CIM-based DNN implementations. Our code is available at https://github.com/jiyoonkm/ColumnQuant.

Updated: 2025-02-11 05:32:14

标题: 权重和部分和的列式量化,用于准确和高效的计算-内存加速器

摘要: 存内计算(CIM)是实现深度神经网络(DNNs)的高效方法,但会受到模数转换器(ADC)带来的巨大开销的影响,且该开销随ADC精度的提高而加剧。低精度ADC可以减少这种开销,但会引入部分和量化误差,降低准确性。此外,由于存储单元的限制以及更高位宽权重需要多个单元来表示,低位宽权重约束带来了进一步的挑战。虽然已有研究利用细粒度的部分和量化来有效降低ADC分辨率,但限制整体部分和量化精度的权重粒度仍未得到充分探讨。本研究通过在列级别上对齐权重和部分和的量化粒度来解决这些挑战。我们的方法在提高准确性的同时保持去量化开销不变,通过消除两阶段过程简化训练,并通过独立的列级缩放因子确保对存储单元变化的鲁棒性。我们还提出了一个面向CIM的开源卷积框架,可高效处理细粒度权重和部分和,其中融合了一种新型平铺方法和组卷积。在ResNet-20(CIFAR-10、CIFAR-100)和ResNet-18(ImageNet)上的实验结果显示,与表现最佳的相关工作相比,准确性分别提高了0.99%、2.69%和1.01%。此外,变异分析显示了我们的方法对存储单元变化的鲁棒性。这些发现突显了我们的量化方案在增强准确性和鲁棒性的同时保持基于CIM的DNN实现的硬件效率的有效性。我们的代码可在https://github.com/jiyoonkm/ColumnQuant找到。

更新时间: 2025-02-11 05:32:14

领域: cs.AR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07842v1
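
A numpy sketch of the core quantization granularity described above: each weight column gets its own symmetric scale factor, matching the column level at which partial sums are accumulated. The bit width and clipping rule are illustrative assumptions.

import numpy as np

def quantize_columnwise(W, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(W).max(axis=0) / qmax              # one scale per column
    scales = np.where(scales == 0, 1.0, scales)
    W_q = np.clip(np.round(W / scales), -qmax - 1, qmax).astype(np.int8)
    return W_q, scales

def dequantize(W_q, scales):
    return W_q.astype(np.float32) * scales

W = np.random.randn(64, 16).astype(np.float32)
W_q, s = quantize_columnwise(W)
print("max abs error:", np.abs(W - dequantize(W_q, s)).max())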

When More is Less: Understanding Chain-of-Thought Length in LLMs

Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs) by breaking complex tasks into smaller, manageable sub-tasks. Researchers have been exploring ways to guide models to generate more complex CoT processes to improve the reasoning ability of LLMs, such as long CoT and the test-time scaling law. However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases. To understand this phenomenon, we provide a piece of evidence that longer reasoning processes are increasingly susceptible to noise. We theoretically prove the existence of an optimal CoT length and derive a scaling law for this optimal length based on model capability and task difficulty. Inspired by our theory, we conduct experiments on both synthetic and real world datasets and propose Length-filtered Vote to alleviate the effects of excessively long or short CoTs. Our findings highlight the critical need to calibrate CoT length to align with model capabilities and task demands, offering a principled framework for optimizing multi-step reasoning in LLMs.

Updated: 2025-02-11 05:28:59

标题: 更多并非更好:理解LLMs中的思维链长度

摘要: 思维链(CoT)推理通过将复杂任务分解为更小、可管理的子任务,提高了大型语言模型(LLMs)的多步推理能力。研究人员一直在探索引导模型生成更复杂的CoT过程以提高LLMs推理能力的方法,例如长CoT和测试时间缩放定律。然而,对于大多数模型和任务来说,增加CoT长度是否始终会导致推理准确性的提高?在本文中,我们观察到一个微妙的关系:随着推理步骤数量的增加,性能起初会提高,但最终会下降。为了理解这一现象,我们提供了一些证据表明,更长的推理过程越来越容易受到噪声的影响。我们在理论上证明了最佳CoT长度的存在,并根据模型能力和任务难度推导出了这种最佳长度的缩放定律。受我们理论的启发,我们在合成和真实世界数据集上进行实验,并提出了Length-filtered Vote来缓解过长或过短的CoT的影响。我们的发现强调了需要根据模型能力和任务需求调整CoT长度的重要性,为优化LLMs中的多步推理提供了一个基于原则的框架。

更新时间: 2025-02-11 05:28:59

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07266v1
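
A minimal sketch of a length-filtered vote: discard sampled chains whose step count is far from the batch median, then take a majority vote among the survivors. The median-band filter is an assumption standing in for the paper's exact criterion.

from collections import Counter
import statistics

def length_filtered_vote(samples, band=0.5):
    # samples: list of (answer, n_reasoning_steps) pairs
    med = statistics.median(n for _, n in samples)
    lo, hi = med * (1 - band), med * (1 + band)
    kept = [a for a, n in samples if lo <= n <= hi] or [a for a, _ in samples]
    return Counter(kept).most_common(1)[0][0]

samples = [("42", 6), ("42", 7), ("41", 2), ("42", 8), ("17", 30)]
print(length_filtered_vote(samples))   # "42": outlier-length chains are ignored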

Estimating LLM Uncertainty with Logits

In recent years, Large Language Models (LLMs) have seen remarkable advancements and have been extensively integrated across various fields. Despite their progress, LLMs are prone to hallucinations, producing responses that may not be dependable if the models lack sufficient grounding knowledge. To mitigate this issue, methods for estimating uncertainty have been adopted, with a focus on critical tokens as indicators of reliability. Nevertheless, probability-based approaches have shown limitations in assessing token-level reliability due to the erosion of evidence strength information acquired during training. In this paper, we introduce Logits-induced Token Uncertainty (LogU), a novel framework designed to estimate token-specific uncertainty in LLMs in real time, without the need for multiple sampling rounds. By leveraging evidence modeling for the implementation of LogU, we utilize the derived uncertainty measures to steer downstream tasks. Our experimental findings highlight the substantial effectiveness and potential of LogU, marking a significant advancement in addressing the challenge of model hallucinations.

Updated: 2025-02-11 05:26:22

标题: 利用Logits估计LLM的不确定性

摘要: 近年来,大型语言模型(LLMs)取得了显著进展,并已广泛整合到各个领域中。尽管如此,LLMs仍容易产生幻觉:当模型缺乏足够的基础知识时,其产生的响应可能并不可靠。为了缓解这一问题,人们采用了估计不确定性的方法,重点关注作为可靠性指标的关键标记。然而,基于概率的方法在评估标记级可靠性方面显示出局限性,因为训练过程中获取的证据强度信息被侵蚀。在本文中,我们介绍了一种新颖的框架Logits-induced Token Uncertainty(LogU),旨在实时估计LLMs中特定标记的不确定性,且无需多次采样。通过利用证据建模来实现LogU,我们使用导出的不确定性度量来引导下游任务。我们的实验结果突显了LogU的显著有效性和潜力,标志着在解决模型幻觉挑战方面取得了重大进展。

更新时间: 2025-02-11 05:26:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.00290v2
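
A hedged sketch of reading token uncertainty directly off logits via evidence modeling: transformed logits are treated as Dirichlet evidence, and low total evidence means high uncertainty, with no repeated sampling. The softplus mapping below is an assumed instantiation, not necessarily LogU's.

import numpy as np

def token_uncertainty(logits):
    evidence = np.logaddexp(0.0, logits)   # softplus: non-negative evidence
    alpha = evidence + 1.0                 # Dirichlet parameters
    strength = alpha.sum()
    vacuity = len(alpha) / strength        # high when total evidence is weak
    return vacuity, alpha / strength

logits = np.array([4.0, 1.0, 0.5, -2.0])
u, p = token_uncertainty(logits)
print(f"uncertainty={u:.3f}, top prob={p.max():.3f}")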

Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds

We introduce the Riemannian Proximal Sampler, a method for sampling from densities defined on Riemannian manifolds. The performance of this sampler critically depends on two key oracles: the Manifold Brownian Increments (MBI) oracle and the Riemannian Heat-kernel (RHK) oracle. We establish high-accuracy sampling guarantees for the Riemannian Proximal Sampler, showing that generating samples with $\varepsilon$-accuracy requires $O(\log(1/\varepsilon))$ iterations in Kullback-Leibler divergence assuming access to exact oracles and $O(\log^2(1/\varepsilon))$ iterations in the total variation metric assuming access to sufficiently accurate inexact oracles. Furthermore, we present practical implementations of these oracles by leveraging heat-kernel truncation and Varadhan's asymptotics. In the latter case, we interpret the Riemannian Proximal Sampler as a discretization of the entropy-regularized Riemannian Proximal Point Method on the associated Wasserstein space. We provide preliminary numerical results that illustrate the effectiveness of the proposed methodology.

Updated: 2025-02-11 05:08:47

标题: 黎曼近端采样器:在流形上进行高精度采样

摘要: 我们介绍了黎曼近端采样器(Riemannian Proximal Sampler),这是一种从定义在黎曼流形上的密度中采样的方法。该采样器的性能关键取决于两个预言机(oracle):流形布朗增量(MBI)预言机和黎曼热核(RHK)预言机。我们为黎曼近端采样器建立了高精度的采样保证,表明在Kullback-Leibler散度下,假设可以访问精确的预言机,以$\varepsilon$精度生成样本需要$O(\log(1/\varepsilon))$次迭代;在总变差度量下,假设可以访问足够准确的不精确预言机,则需要$O(\log^2(1/\varepsilon))$次迭代。此外,我们通过利用热核截断和Varadhan渐近性,给出了这些预言机的实际实现。在后一种情况下,我们将黎曼近端采样器解释为相应Wasserstein空间上熵正则化黎曼近端点方法的离散化。我们提供了初步的数值结果,展示了所提出方法的有效性。

更新时间: 2025-02-11 05:08:47

领域: stat.ML,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2502.07265v1

Flat U-Net: An Efficient Ultralightweight Model for Solar Filament Segmentation in Full-disk H$\alpha$ Images

Solar filaments are one of the most prominent features observed on the Sun, and their evolutions are closely related to various solar activities, such as flares and coronal mass ejections. Real-time automated identification of solar filaments is the most effective approach to managing large volumes of data. Existing models of filament identification are characterized by large parameter sizes and high computational costs, which limit their future applications in highly integrated and intelligent ground-based and space-borne observation devices. Consequently, the design of more lightweight models will facilitate the advancement of intelligent observation equipment. In this study, we introduce Flat U-Net, a novel and highly efficient ultralightweight model that incorporates simplified channel attention (SCA) and channel self-attention (CSA) convolutional blocks for the segmentation of solar filaments in full-disk H$\alpha$ images. Feature information from each network layer is fully extracted to reconstruct interchannel feature representations. Each block effectively optimizes the channel features from the previous layer, significantly reducing parameters. The network architecture presents an elegant flattening, improving its efficiency, and simplifying the overall design. Experimental validation demonstrates that a model composed of pure SCAs achieves a precision of approximately 0.93, with dice similarity coefficient (DSC) and recall rates of 0.76 and 0.64, respectively, significantly outperforming the classical U-Net. Introducing a certain number of CSA blocks improves the DSC and recall rates to 0.82 and 0.74, respectively, which demonstrates a pronounced advantage, particularly concerning model weight size and detection effectiveness. The data set, models, and code are available as open-source resources.

Updated: 2025-02-11 04:57:33

标题: Flat U-Net:一种用于全日面H$\alpha$图像太阳丝分割的高效超轻量模型

摘要: 太阳丝是太阳上观察到的最显著的特征之一,它们的演变与各种太阳活动密切相关,如耀斑和日冕物质抛射。实时自动识别太阳丝是管理大量数据的最有效方法。现有的太阳丝识别模型参数量大、计算成本高,这限制了它们在高度集成、智能化的地面和空间观测设备中的未来应用。因此,设计更轻量级的模型将有助于推动智能观测设备的发展。在本研究中,我们引入了Flat U-Net,这是一种新颖且高效的超轻量级模型,它结合了简化通道注意力(SCA)和通道自注意力(CSA)卷积块,用于在全日面H$\alpha$图像中分割太阳丝。来自每个网络层的特征信息被充分提取以重建通道间特征表示。每个块有效地优化了来自前一层的通道特征,显著减少了参数。网络架构呈现出一种优雅的扁平化,提高了其效率,并简化了整体设计。实验证实,由纯SCA组成的模型实现了约0.93的精度,Dice相似系数(DSC)和召回率分别达到0.76和0.64,明显优于经典的U-Net。引入一定数量的CSA块将DSC和召回率提高到0.82和0.74,这表明在模型权重大小和检测效果方面具有显著优势。数据集、模型和代码可作为开源资源获得。

更新时间: 2025-02-11 04:57:33

领域: astro-ph.IM,astro-ph.SR,cs.CV,cs.LG

下载: http://arxiv.org/abs/2502.07259v1
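
One plausible reading of a "simplified channel attention" (SCA) block, written as a PyTorch module: global pooling produces per-channel weights through a single 1x1 convolution, recalibrating channels before the main convolution. The paper's exact block may differ.

import torch
import torch.nn as nn

class SimplifiedChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze to (B, C, 1, 1)
        self.proj = nn.Conv2d(channels, channels, 1)   # per-channel reweighting
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        w = torch.sigmoid(self.proj(self.pool(x)))
        return self.conv(x * w)                        # recalibrate, then convolve

block = SimplifiedChannelAttention(16)
print(block(torch.randn(1, 16, 64, 64)).shape)         # (1, 16, 64, 64)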

Action-Free Reasoning for Policy Generalization

End-to-end imitation learning offers a promising approach for training robot policies. However, generalizing to new settings remains a significant challenge. Although large-scale robot demonstration datasets have shown potential for inducing generalization, they are resource-intensive to scale. In contrast, human video data is abundant and diverse, presenting an attractive alternative. Yet, these human-video datasets lack action labels, complicating their use in imitation learning. Existing methods attempt to extract grounded action representations (e.g., hand poses), but resulting policies struggle to bridge the embodiment gap between human and robot actions. We propose an alternative approach: leveraging language-based reasoning from human videos-essential for guiding robot actions-to train generalizable robot policies. Building on recent advances in reasoning-based policy architectures, we introduce Reasoning through Action-free Data (RAD). RAD learns from both robot demonstration data (with reasoning and action labels) and action-free human video data (with only reasoning labels). The robot data teaches the model to map reasoning to low-level actions, while the action-free data enhances reasoning capabilities. Additionally, we will release a new dataset of 3,377 human-hand demonstrations with reasoning annotations compatible with the Bridge V2 benchmark and aimed at facilitating future research on reasoning-driven robot learning. Our experiments show that RAD enables effective transfer across the embodiment gap, allowing robots to perform tasks seen only in action-free data. Furthermore, scaling up action-free reasoning data significantly improves policy performance and generalization to novel tasks. These results highlight the promise of reasoning-driven learning from action-free datasets for advancing generalizable robot control. Project page: https://rad-generalization.github.io

Updated: 2025-02-11 04:51:45

标题: 无动作推理用于策略泛化

摘要: 端到端模仿学习为训练机器人策略提供了一种有前景的方法。然而,泛化到新环境仍然是一个重大挑战。尽管大规模机器人演示数据集显示出诱导泛化的潜力,但其扩展需要大量资源。相比之下,人类视频数据丰富多样,是一个具有吸引力的替代方案。然而,这些人类视频数据集缺乏动作标签,使得它们在模仿学习中的使用变得复杂。现有方法试图提取落地的动作表示(例如,手部姿态),但由此产生的策略难以弥合人类与机器人动作之间的具身差距。我们提出一种替代方法:利用来自人类视频的基于语言的推理(这对引导机器人动作至关重要)来训练可泛化的机器人策略。基于最近在基于推理的策略架构方面的进展,我们引入了基于无动作数据的推理(RAD)。RAD同时从机器人演示数据(带有推理和动作标签)和无动作人类视频数据(仅带有推理标签)中学习。机器人数据教导模型将推理映射到低级动作,而无动作数据则增强了推理能力。此外,我们将发布一个包含3,377个带有推理注释的人类手部演示的新数据集,该数据集与Bridge V2基准兼容,旨在促进未来关于推理驱动的机器人学习的研究。我们的实验表明,RAD实现了跨越具身差距的有效迁移,使机器人能够执行仅在无动作数据中见过的任务。此外,扩大无动作推理数据的规模显著提高了策略性能和对新任务的泛化能力。这些结果凸显了基于无动作数据的推理驱动学习在推进可泛化机器人控制方面的前景。项目页面:https://rad-generalization.github.io

更新时间: 2025-02-11 04:51:45

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2502.03729v2

G2PDiffusion: Genotype-to-Phenotype Prediction with Diffusion Models

Discovering the genotype-phenotype relationship is crucial for genetic engineering, which will facilitate advances in fields such as crop breeding, conservation biology, and personalized medicine. Current research usually focuses on single species and small datasets due to limitations in phenotypic data collection, especially for traits that require visual assessments or physical measurements. Deciphering complex and composite phenotypes, such as morphology, from genetic data at scale remains an open question. To break through traditional generic models that rely on simplified assumptions, this paper introduces G2PDiffusion, the first-of-its-kind diffusion model designed for genotype-to-phenotype generation across multiple species. Specifically, we use images to represent morphological phenotypes across species and redefine phenotype prediction as conditional image generation. To this end, this paper introduces an environment-enhanced DNA sequence conditioner and trains a stable diffusion model with a novel alignment method to improve genotype-to-phenotype consistency. Extensive experiments demonstrate that our approach enhances phenotype prediction accuracy across species, capturing subtle genetic variations that contribute to observable traits.

Updated: 2025-02-11 04:42:11

标题: G2PDiffusion:使用扩散模型进行基因型到表型预测

摘要: 发现基因型与表型关系对于遗传工程至关重要,这将有助于推动作物育种、保护生物学和个性化医学等领域的进步。当前的研究通常集中在单一物种和小规模数据集上,这是因为表型数据收集存在限制,特别是对于需要视觉评估或物理测量的特征。从遗传数据中大规模解读复杂和综合性表型,如形态学,仍然是一个悬而未决的问题。为了突破依赖简化假设的传统通用模型,本文介绍了G2PDiffusion,这是第一个为多个物种设计的基因型到表型生成扩散模型。具体地,我们使用图像来代表跨物种的形态表型,并重新定义表型预测为条件图像生成。为此,本文引入了一个增强环境的DNA序列调节器,并使用一种新的对齐方法训练稳定的扩散模型,以改善基因型到表型的一致性。大量实验证明我们的方法提高了跨物种表型预测的准确性,捕捉了导致可观察特征的微小遗传变异。

更新时间: 2025-02-11 04:42:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.04684v2

Fairness in Multi-Agent AI: A Unified Framework for Ethical and Equitable Autonomous Systems

Ensuring fairness in decentralized multi-agent systems presents significant challenges due to emergent biases, systemic inefficiencies, and conflicting agent incentives. This paper provides a comprehensive survey of fairness in multi-agent AI, introducing a novel framework where fairness is treated as a dynamic, emergent property of agent interactions. The framework integrates fairness constraints, bias mitigation strategies, and incentive mechanisms to align autonomous agent behaviors with societal values while balancing efficiency and robustness. Through empirical validation, we demonstrate that incorporating fairness constraints results in more equitable decision-making. This work bridges the gap between AI ethics and system design, offering a foundation for accountable, transparent, and socially responsible multi-agent AI systems.

Updated: 2025-02-11 04:42:00

标题: 多智能体人工智能中的公平性:用于道德和公正自主系统的统一框架

摘要: 确保去中心化多智能体系统的公平性面临重大挑战,原因在于涌现的偏见、系统性低效以及相互冲突的智能体激励。本文对多智能体AI中的公平性进行了全面综述,并引入了一个新颖的框架,其中公平性被视为智能体交互的一种动态涌现性质。该框架整合了公平性约束、偏见缓解策略和激励机制,使自主智能体的行为与社会价值观保持一致,同时兼顾效率与稳健性。通过实证验证,我们证明了引入公平性约束能够带来更公平的决策。这项工作弥合了AI伦理与系统设计之间的鸿沟,为负责任、透明且对社会负责的多智能体AI系统奠定了基础。

更新时间: 2025-02-11 04:42:00

领域: cs.MA,cs.AI,cs.CY

下载: http://arxiv.org/abs/2502.07254v1

Ensemble quantile-based deep learning framework for streamflow and flood prediction in Australian catchments

In recent years, climate extremes such as floods have created significant environmental and economic hazards for Australia. Deep learning methods have been promising for predicting extreme climate events; however, large flooding events present a critical challenge due to factors such as model calibration and missing data. We present an ensemble quantile-based deep learning framework that addresses large-scale streamflow forecasts using quantile regression for uncertainty projections in prediction. We evaluate selected univariate and multivariate deep learning models and catchment strategies. Furthermore, we implement a multistep time-series prediction model using the CAMELS dataset for selected catchments across Australia. The ensemble model employs a set of quantile deep learning models for streamflow determined by historical streamflow data. We utilise the streamflow prediction and obtain flood probability using flood frequency analysis and compare it with historical flooding events for selected catchments. Our results demonstrate notable efficacy and uncertainties in streamflow forecasts with varied catchment properties. Our flood probability estimates show good accuracy in capturing the historical floods from the selected catchments. This underscores the potential for our deep learning framework to revolutionise flood forecasting across diverse regions and be implemented as an early warning system.

Updated: 2025-02-11 04:41:10

标题: 基于集成分位数的深度学习框架用于澳大利亚集水区的流量和洪水预测

摘要: 近年来,诸如洪水等气候极端事件给澳大利亚带来了重大的环境和经济危害。深度学习方法在预测极端气候事件方面展现出潜力;然而,由于模型校准和数据缺失等因素,大规模洪水事件仍构成重大挑战。我们提出了一种基于集成分位数的深度学习框架,通过分位数回归对预测进行不确定性投影,以解决大规模水流预测问题。我们评估了选定的单变量和多变量深度学习模型与集水区策略。此外,我们利用CAMELS数据集为澳大利亚的选定集水区实现了多步时间序列预测模型。集成模型利用由历史水流数据确定的一组分位数深度学习模型进行水流预测。我们利用水流预测结果并通过洪水频率分析获取洪水概率,并将其与选定集水区的历史洪水事件进行比较。我们的结果展示了在各种集水区属性下水流预测中的显著效力和不确定性。我们的洪水概率估计显示出良好的准确性,能够捕捉选定集水区的历史洪水事件。这突显了我们的深度学习框架在革新跨不同地区的洪水预测方面的潜力,并可作为一个早期预警系统来实施。

更新时间: 2025-02-11 04:41:10

领域: cs.LG,physics.ao-ph,stat.AP,stat.ML

下载: http://arxiv.org/abs/2407.15882v2
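
The quantile (pinball) loss is the standard objective behind quantile regression ensembles of this kind; training one output head per quantile yields the uncertainty band the abstract describes.

import torch

def pinball_loss(pred, target, q):
    # q in (0, 1): under-prediction weighted by q, over-prediction by (1 - q)
    err = target - pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))

pred = torch.tensor([10.0, 12.0, 9.0])
obs = torch.tensor([11.0, 11.0, 11.0])
for q in (0.05, 0.5, 0.95):          # e.g. one deep model per quantile
    print(q, pinball_loss(pred, obs, q).item())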

NARCE: A Mamba-Based Neural Algorithmic Reasoner Framework for Online Complex Event Detection

Current machine learning models excel in short-span perception tasks but struggle to derive high-level insights from long-term observation, a capability central to understanding complex events (CEs). CEs, defined as sequences of short-term atomic events (AEs) governed by spatiotemporal rules, are challenging to detect online due to the need to extract meaningful patterns from long and noisy sensor data while ignoring irrelevant events. We hypothesize that state-based methods are well-suited for CE detection, as they capture event progression through state transitions without requiring long-term memory. Baseline experiments validate this, demonstrating that the state-space model Mamba outperforms existing architectures. However, Mamba's reliance on extensive labeled data, which are difficult to obtain, motivates our second hypothesis: decoupling CE rule learning from noisy sensor data can reduce data requirements. To address this, we propose NARCE, a framework that combines Neural Algorithmic Reasoning (NAR) to split the task into two components: (i) learning CE rules independently of sensor data using synthetic concept traces generated by LLMs and (ii) mapping sensor inputs to these rules via an adapter. Our results show that NARCE outperforms baselines in accuracy, generalization to unseen and longer sensor data, and data efficiency, significantly reducing annotation costs while advancing robust CE detection.

Updated: 2025-02-11 04:34:53

标题: NARCE:一种基于Mamba的在线复杂事件检测神经算法推理框架

摘要: 目前的机器学习模型在短期感知任务方面表现出色,但难以从长期观察中获得高层次洞察,而这是理解复杂事件(CEs)的核心能力。CEs被定义为由时空规则控制的一系列短期原子事件(AEs);由于需要从长时间且嘈杂的传感器数据中提取有意义的模式,同时忽略无关的事件,在线检测它们具有挑战性。我们假设基于状态的方法非常适合于CE检测,因为它们通过状态转换捕获事件的进展,而无需长期记忆。基准实验验证了这一点,表明状态空间模型Mamba优于现有的架构。然而,Mamba依赖大量难以获取的标注数据,这激发了我们的第二个假设:将CE规则学习与嘈杂的传感器数据解耦可以减少数据需求。为此,我们提出了NARCE框架,它结合神经算法推理(NAR),将任务拆分为两个组件:(i)使用LLMs生成的合成概念轨迹、独立于传感器数据地学习CE规则,以及(ii)通过一个适配器将传感器输入映射到这些规则。我们的结果显示,NARCE在准确性、对未见过的更长传感器数据的泛化能力和数据效率方面优于基线,显著降低了注释成本,同时推进了稳健的CE检测。

更新时间: 2025-02-11 04:34:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07250v1

Large Cognition Model: Towards Pretrained EEG Foundation Model

Electroencephalography provides a non-invasive window into brain activity, offering valuable insights for neurological research, brain-computer interfaces, and clinical diagnostics. However, the development of robust machine learning models for EEG analysis is hindered by the scarcity of large-scale, well-annotated datasets and the inherent variability of EEG signals across subjects and recording conditions. Inspired by the success of foundation models in natural language processing and computer vision, we propose the Large Cognition Model-a transformer-based foundation model designed to generalize across diverse EEG datasets and downstream tasks. Unlike traditional approaches, our proposed transformer-based architecture demonstrates strong generalization capabilities across datasets and tasks, even without pretraining, surpassing some existing EEG universal models on specific downstream applications. LCM leverages large-scale self-supervised learning techniques to capture universal EEG representations, enabling efficient fine-tuning for applications such as cognitive state decoding, disease classification, and neurofeedback systems. We introduce a novel architecture that integrates temporal and spectral attention mechanisms, optimizing the model's ability to extract meaningful features from raw EEG signals. Extensive evaluations demonstrate that LCM outperforms state-of-the-art approaches across multiple EEG benchmarks, exhibiting strong cross-subject and cross-task generalization. Our findings highlight the potential of pretrained EEG foundation models to accelerate advancements in neuroscience, personalized medicine, and BCI technology.

Updated: 2025-02-11 04:28:10

标题: 大型认知模型:朝向预训练脑电图基础模型

摘要: 脑电图提供了一个非侵入性的窗口来观察大脑活动,为神经科学研究、脑机接口(BCI)和临床诊断提供了宝贵的见解。然而,由于大规模、良好标注的数据集的稀缺性,以及不同受试者和记录条件下脑电图信号的固有变异性,脑电图分析的稳健机器学习模型的发展受到了阻碍。受自然语言处理和计算机视觉中基础模型成功的启发,我们提出了大认知模型(LCM),一种基于Transformer的基础模型,旨在泛化到各种脑电图数据集和下游任务。与传统方法不同,我们提出的基于Transformer的架构展示了在数据集和任务之间的强大泛化能力,甚至在没有预训练的情况下,在特定下游应用上超过了一些现有的脑电图通用模型。LCM利用大规模自监督学习技术来捕捉通用脑电图表示,从而实现对认知状态解码、疾病分类和神经反馈系统等应用的高效微调。我们引入了一种集成时间和频谱注意机制的新型架构,优化了模型从原始脑电图信号中提取有意义特征的能力。广泛的评估表明,LCM在多个脑电图基准测试中表现优于最先进的方法,展现了强大的跨受试者和跨任务的泛化能力。我们的发现突显了预训练的脑电图基础模型在加速神经科学、个性化医学和脑机接口技术进步方面的潜力。

更新时间: 2025-02-11 04:28:10

领域: eess.SP,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2502.17464v1

Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting

Autoregressive attention-based time series forecasting (TSF) has drawn increasing interest, with mechanisms like linear attention sometimes outperforming vanilla attention. However, deeper Transformer architectures frequently misalign with autoregressive objectives, obscuring the underlying VAR structure embedded within linear attention and hindering their ability to capture the data generative processes in TSF. In this work, we first show that a single linear attention layer can be interpreted as a dynamic vector autoregressive (VAR) structure. We then explain that existing multi-layer Transformers have structural mismatches with the autoregressive forecasting objective, which impair interpretability and generalization ability. To address this, we show that by rearranging the MLP, attention, and input-output flow, multi-layer linear attention can also be aligned as a VAR model. Then, we propose Structural Aligned Mixture of VAR (SAMoVAR), a linear Transformer variant that integrates interpretable dynamic VAR weights for multivariate TSF. By aligning the Transformer architecture with autoregressive objectives, SAMoVAR delivers improved performance, interpretability, and computational efficiency, comparing to SOTA TSF models.

Updated: 2025-02-11 04:24:43

标题: 线性变换器作为VAR模型:将自回归注意机制与自回归预测对齐

摘要: 基于自回归注意力的时间序列预测(TSF)引起了越来越多的关注,其中线性注意力等机制有时优于普通注意力。然而,更深的Transformer架构经常与自回归目标不一致,遮蔽了线性注意力中嵌入的VAR结构,并阻碍了其捕捉TSF中数据生成过程的能力。在这项工作中,我们首先证明了单个线性注意力层可以被解释为一种动态向量自回归(VAR)结构。然后,我们阐明现有的多层Transformer与自回归预测目标存在结构性不匹配,这损害了可解释性和泛化能力。为了解决这个问题,我们证明通过重新排列MLP、注意力和输入输出流,多层线性注意力也可以对齐为VAR模型。随后,我们提出了结构对齐的VAR混合(SAMoVAR),这是一种线性Transformer变体,为多变量TSF集成了可解释的动态VAR权重。通过将Transformer架构与自回归目标对齐,SAMoVAR相比SOTA TSF模型在性能、可解释性和计算效率上均有所提升。

更新时间: 2025-02-11 04:24:43

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2502.07244v1
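
The abstract's first claim, that one linear attention layer is a dynamic VAR, can be checked numerically: the recurrent state form and the time-varying weight-matrix form give identical outputs. This is the generic identity for unnormalized causal linear attention, not the paper's code.

import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

S = np.zeros((d, d))
outputs = []
for t in range(T):
    S = S + np.outer(K[t], V[t])     # rank-1 state update per step
    outputs.append(Q[t] @ S)         # = sum over s<=t of (q_t . k_s) v_s
out_recurrent = np.stack(outputs)

A = np.tril(Q @ K.T)                 # time-varying VAR weights (q_t . k_s)
out_attention = A @ V
print(np.allclose(out_recurrent, out_attention))   # True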

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

The imitation of voice, targeted on specific speech attributes such as timbre and speaking style, is crucial in speech generation. However, existing methods rely heavily on annotated data, and struggle with effectively disentangling timbre and style, leading to challenges in achieving controllable generation, especially in zero-shot scenarios. To address these issues, we propose Vevo, a versatile zero-shot voice imitation framework with controllable timbre and style. Vevo operates in two core stages: (1) Content-Style Modeling: Given either text or speech's content tokens as input, we utilize an autoregressive transformer to generate the content-style tokens, which is prompted by a style reference; (2) Acoustic Modeling: Given the content-style tokens as input, we employ a flow-matching transformer to produce acoustic representations, which is prompted by a timbre reference. To obtain the content and content-style tokens of speech, we design a fully self-supervised approach that progressively decouples the timbre, style, and linguistic content of speech. Specifically, we adopt VQ-VAE as the tokenizer for the continuous hidden features of HuBERT. We treat the vocabulary size of the VQ-VAE codebook as the information bottleneck, and adjust it carefully to obtain the disentangled speech representations. Solely self-supervised trained on 60K hours of audiobook speech data, without any fine-tuning on style-specific corpora, Vevo matches or surpasses existing methods in accent and emotion conversion tasks. Additionally, Vevo's effectiveness in zero-shot voice conversion and text-to-speech tasks further demonstrates its strong generalization and versatility. Audio samples are available at https://versavoice.github.io.

Updated: 2025-02-11 04:18:33

标题: Vevo:具有自监督解缠的可控零样本语音模仿

摘要: 针对音色和说话风格等特定语音属性的声音模仿,对于语音生成至关重要。然而,现有的方法严重依赖标注数据,并且难以有效解耦音色和风格,导致在实现可控生成方面存在挑战,尤其是在零样本情况下。为了解决这些问题,我们提出了Vevo,一个具有可控音色和风格的多功能零样本声音模仿框架。Vevo分为两个核心阶段:(1)内容-风格建模:给定文本或语音的内容令牌作为输入,我们利用自回归Transformer生成由风格参考提示的内容-风格令牌;(2)声学建模:给定内容-风格令牌作为输入,我们采用流匹配Transformer生成由音色参考提示的声学表示。为了获得语音的内容令牌和内容-风格令牌,我们设计了一种完全自监督的方法,逐步解耦语音的音色、风格和语言内容。具体来说,我们采用VQ-VAE作为HuBERT连续隐藏特征的标记器。我们将VQ-VAE码本的词汇大小视为信息瓶颈,并仔细调整以获得解耦的语音表示。在仅用60K小时的有声书语音数据进行自监督训练、且未在特定风格语料库上进行任何微调的情况下,Vevo在口音和情感转换任务中与现有方法相当甚至更优。此外,Vevo在零样本语音转换和文本转语音任务中的有效性进一步证明了其强大的泛化能力和多功能性。音频样本可在https://versavoice.github.io上获得。

更新时间: 2025-02-11 04:18:33

领域: cs.SD,cs.AI

下载: http://arxiv.org/abs/2502.07243v1
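
A minimal sketch of the vector-quantization step at the heart of the tokenizer, where the codebook (vocabulary) size K is the knob treated as an information bottleneck; the random codebook below is purely illustrative, whereas Vevo trains it on HuBERT features.

import torch

def vq(z, codebook):
    # z: (N, D) continuous features; codebook: (K, D); smaller K keeps less information
    idx = torch.cdist(z, codebook).argmin(dim=1)   # nearest code per feature
    return codebook[idx], idx

z = torch.randn(10, 8)
codebook = torch.randn(32, 8)                      # K = 32 codes
quantized, tokens = vq(z, codebook)
print(tokens.tolist())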

Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation

Co-speech gesture generation is crucial for creating lifelike avatars and enhancing human-computer interactions by synchronizing gestures with speech. Despite recent advancements, existing methods struggle with accurately identifying the rhythmic or semantic triggers from audio for generating contextualized gesture patterns and achieving pixel-level realism. To address these challenges, we introduce Contextual Gesture, a framework that improves co-speech gesture video generation through three innovative components: (1) a chronological speech-gesture alignment that temporally connects the two modalities, (2) a contextualized gesture tokenization that incorporates speech context into motion pattern representation through distillation, and (3) a structure-aware refinement module that employs edge connection to link gesture keypoints to improve video generation. Our extensive experiments demonstrate that Contextual Gesture not only produces realistic and speech-aligned gesture videos but also supports long-sequence generation and video gesture editing applications, as shown in Fig. 1. Project page: https://andypinxinliu.github.io/Contextual-Gesture/.

Updated: 2025-02-11 04:09:12

标题: 上下文手势:通过上下文感知手势表征生成共语手势视频

摘要: 共语言手势生成对于创建栩栩如生的虚拟形象以及通过将手势与语音同步来增强人机交互至关重要。尽管最近取得了进展,现有方法在准确识别音频中的节奏或语义触发器以生成情境化手势模式并实现像素级逼真度方面仍存在困难。为了解决这些挑战,我们引入了Contextual Gesture,一个通过三个创新组件改进共语言手势视频生成的框架:(1) 时间上连接两种模态的时间顺序语音-手势对齐,(2) 通过蒸馏将语音上下文结合到动作模式表示中的情境化手势标记化,以及(3) 利用边缘连接的结构感知细化模块,将手势关键点连接起来以改进视频生成。我们的大量实验表明,Contextual Gesture不仅能够生成逼真且与语音对齐的手势视频,还支持长序列生成和视频手势编辑应用(如图1所示)。项目页面:https://andypinxinliu.github.io/Contextual-Gesture/。

更新时间: 2025-02-11 04:09:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07239v1

Diffusion Suction Grasping with Large-Scale Parcel Dataset

While recent advances in object suction grasping have shown remarkable progress, significant challenges persist particularly in cluttered and complex parcel handling scenarios. Two fundamental limitations hinder current approaches: (1) the lack of a comprehensive suction grasp dataset tailored for parcel manipulation tasks, and (2) insufficient adaptability to diverse object characteristics including size variations, geometric complexity, and textural diversity. To address these challenges, we present Parcel-Suction-Dataset, a large-scale synthetic dataset containing 25 thousand cluttered scenes with 410 million precision-annotated suction grasp poses. This dataset is generated through our novel geometric sampling algorithm that enables efficient generation of optimal suction grasps incorporating both physical constraints and material properties. We further propose Diffusion-Suction, an innovative framework that reformulates suction grasp prediction as a conditional generation task through denoising diffusion probabilistic models. Our method iteratively refines random noise into suction grasp score maps through visual-conditioned guidance from point cloud observations, effectively learning spatial point-wise affordances from our synthetic dataset. Extensive experiments demonstrate that the simple yet efficient Diffusion-Suction achieves new state-of-the-art performance compared to previous models on both Parcel-Suction-Dataset and the public SuctionNet-1Billion benchmark.

Updated: 2025-02-11 04:09:11

标题: 基于大规模包裹数据集的扩散吸盘抓取

摘要: 尽管最近在物体吸盘抓取方面取得了显著进展,但在杂乱和复杂的包裹处理场景中仍然存在重大挑战。目前方法存在两个根本限制:(1) 缺乏专门针对包裹操作任务的全面吸盘抓取数据集,以及(2) 对不同物体特征的适应性不足,包括尺寸变化、几何复杂性和纹理多样性。为了解决这些挑战,我们提出了Parcel-Suction-Dataset,这是一个大规模合成数据集,包含2.5万个杂乱场景和4.1亿个精确标注的吸盘抓取位姿。该数据集是通过我们的新颖几何采样算法生成的,该算法能够高效地生成考虑物理约束和材料特性的最佳吸盘抓取。我们进一步提出了Diffusion-Suction,这是一个创新的框架,通过去噪扩散概率模型将吸盘抓取预测重新表述为条件生成任务。我们的方法通过从点云观察中获得的视觉条件引导,将随机噪声迭代地转化为吸盘抓取得分图,有效地从我们的合成数据集中学习空间逐点可供性(affordance)。大量实验证明,简单而高效的Diffusion-Suction相对于先前模型在Parcel-Suction-Dataset和公开的SuctionNet-1Billion基准上取得了新的最先进的性能。

更新时间: 2025-02-11 04:09:11

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07238v1

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization

Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research delves into the realm of drug optimization and introduces a novel reinforcement learning algorithm to finetune a drug-optimization LLM-based generative model, enhancing the original drug across target objectives while retaining the beneficial chemical properties of the original drug. This work comprises two primary components: (1) DrugImprover: A framework tailored for improving robustness and efficiency in drug optimization. It includes a LLM designed for drug optimization and a novel Structured Policy Optimization (SPO) algorithm, which is theoretically grounded. This algorithm offers a unique perspective for fine-tuning the LLM-based generative model by aligning the improvement of the generated molecule with the input molecule under desired objectives. (2) A dataset of 1 million compounds, each with OEDOCK docking scores on 5 human proteins associated with cancer cells and 24 binding sites from SARS-CoV-2 virus. We conduct a comprehensive evaluation of SPO and demonstrate its effectiveness in improving the original drug across target properties. Our code and dataset will be publicly available at: https://github.com/xuefeng-cs/DrugImproverGPT.

Updated: 2025-02-11 04:00:21

标题: DrugImproverGPT: 通过结构化策略优化微调的用于药物优化的大型语言模型

摘要: 对大型语言模型(LLM)进行微调对于实现特定目标至关重要。本研究深入探讨药物优化领域,并引入一种新颖的强化学习算法来微调基于LLM的药物优化生成模型,提升原始药物在目标属性上的表现,同时保留原始药物的有益化学特性。本工作由两个主要组成部分组成:(1)DrugImprover:一个旨在提高药物优化的鲁棒性和效率的框架。它包括一个专为药物优化设计的LLM和一种理论上基础的新颖结构化策略优化(SPO)算法。该算法通过将生成的分子的改进与输入分子在所需目标下进行对齐,为微调基于LLM的生成模型提供了独特的视角。(2)一个包含100万化合物的数据集,每个化合物在与癌细胞相关的5个人类蛋白质和来自SARS-CoV-2病毒的24个结合位点上具有OEDOCK对接分数。我们对SPO进行了全面评估,并展示了其在提升原始药物在目标属性上的有效性。我们的代码和数据集将在以下链接上公开提供:https://github.com/xuefeng-cs/DrugImproverGPT。

更新时间: 2025-02-11 04:00:21

领域: cs.LG,cs.CL,q-bio.BM,stat.ML

下载: http://arxiv.org/abs/2502.07237v1

OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

Recently, vision-language instruct-tuning models have made significant progress due to their more comprehensive understanding of the world. In this work, we discovered that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture differ significantly, which affects distributed training efficiency. We rebalanced the computational loads from data, model, and memory perspectives to address this issue, achieving more balanced computation across devices. These three components are not independent but are closely connected, forming an omniverse balanced training framework. Specifically, for the data, we grouped instances into new balanced mini-batches within and across devices. For the model, we employed a search-based method to achieve a more balanced partitioning. For memory optimization, we adaptively adjusted the re-computation strategy for each partition to utilize the available memory fully. We conducted extensive experiments to validate the effectiveness of our method. Compared with the open-source training code of InternVL-Chat, we significantly reduced GPU days, achieving about 1.8x speed-up. Our method's efficacy and generalizability were further demonstrated across various models and datasets. Codes will be released at https://github.com/ModelTC/OmniBal.

Updated: 2025-02-11 03:53:46

标题: OmniBal:通过全方位计算平衡实现视觉-语言模型的快速指令微调

摘要: 最近,得益于对世界更全面的理解,视觉-语言指令微调模型取得了显著进展。在这项工作中,我们发现对这些模型进行大规模三维并行训练会导致不同设备之间的计算负载不平衡。视觉和语言部分本质上是异构的:它们的数据分布和模型架构存在显著差异,这影响了分布式训练的效率。为了解决这一问题,我们从数据、模型和内存三个角度重新平衡了计算负载,实现了跨设备更均衡的计算。这三个组件并非相互独立,而是紧密关联,构成了一个全方位平衡的训练框架。具体来说,对于数据,我们在设备内和设备间将实例重新分组为新的平衡小批次。对于模型,我们采用基于搜索的方法实现更均衡的划分。对于内存优化,我们为每个分区自适应地调整重计算策略,以充分利用可用内存。我们进行了大量实验来验证方法的有效性。与InternVL-Chat的开源训练代码相比,我们大大减少了GPU天数,实现了约1.8倍的加速。我们方法的有效性和泛化性在各种模型和数据集上得到了进一步验证。代码将发布在https://github.com/ModelTC/OmniBal。

更新时间: 2025-02-11 03:53:46

领域: cs.AI

下载: http://arxiv.org/abs/2407.20761v3
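
A toy version of re-grouping instances into balanced mini-batches: greedily assign the longest samples to the currently lightest device so per-device token loads stay close. OmniBal's real grouping also balances within devices and interacts with model partitioning; names here are illustrative.

def balance_batches(sample_lengths, n_devices):
    buckets = [[] for _ in range(n_devices)]
    loads = [0] * n_devices
    for idx, length in sorted(enumerate(sample_lengths), key=lambda p: -p[1]):
        dev = loads.index(min(loads))     # lightest device so far
        buckets[dev].append(idx)
        loads[dev] += length
    return buckets, loads

lengths = [512, 77, 1024, 300, 96, 880, 410, 130]
buckets, loads = balance_batches(lengths, n_devices=2)
print(loads)    # near-equal per-device token loads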

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between LLM generating proofs and finetuning them on correctly generated ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which incentivizes it to generate increasingly challenging conjectures over time. The prover attempts to prove the conjectures with standard expert iteration. We evaluate STP with both Lean and Isabelle formal versifiers. With 19.8 billion tokens generated during the training in Lean, STP proves 26.3% of the statements in the LeanWorkbook dataset, doubling the previous best result of 13.2% achieved through expert iteration. The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (61.7%, pass@3200), Proofnet-test (23.1%, pass@3200) and PutnamBench (8/644, pass@3200).

Updated: 2025-02-11 03:52:52

标题: STP:具有迭代猜测和证明的自我对弈LLM定理证明器

摘要: LLMs在形式定理证明中面临的一个基本挑战是缺乏高质量的训练数据。尽管强化学习或专家迭代通过在LLM生成证明与在正确生成的证明上微调之间交替,部分缓解了这个问题,但由于正确证明的稀缺性(稀疏奖励),性能很快达到瓶颈。为了在数据有限的情况下持续改进模型,我们从数学家身上获得了灵感:他们通过提出新的猜想或练习(通常是已知结果的变体)并尝试解决它们,不断发展出新的结果。我们设计了自我对弈定理证明器(STP),它同时扮演猜想者和证明者两个角色,二者相互提供训练信号。猜想者在先前生成的、对当前证明者而言勉强可证的猜想上进行迭代训练,这激励它随着时间的推移生成越来越具挑战性的猜想。证明者则通过标准的专家迭代尝试证明这些猜想。我们使用Lean和Isabelle形式验证器对STP进行评估。在Lean训练过程中生成了198亿个标记后,STP证明了LeanWorkbook数据集中26.3%的陈述,使先前通过专家迭代达到的13.2%的最佳结果翻了一番。最终模型在miniF2F-test(61.7%,pass@3200)、Proofnet-test(23.1%,pass@3200)和PutnamBench(8/644,pass@3200)上取得了整段证明生成方法中的最先进性能。

更新时间: 2025-02-11 03:52:52

领域: cs.LG,cs.AI,cs.LO

下载: http://arxiv.org/abs/2502.00212v3
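
The shape of one self-play round, in schematic Python; every callable here (LLM sampling, the Lean/Isabelle verifier, the difficulty filter) is a hypothetical stand-in, not the authors' API.

def self_play_round(conjecture_fn, prove_fn, verify_fn, barely_fn, seeds):
    conjectures = [conjecture_fn(s) for s in seeds]       # propose variants
    attempts = [(c, prove_fn(c)) for c in conjectures]
    verified = [(c, p) for c, p in attempts if verify_fn(c, p)]
    prover_data = verified                                # expert-iteration data
    # the conjecturer trains on barely-provable conjectures, raising difficulty
    conjecturer_data = [c for c, p in verified if barely_fn(c, p)]
    return prover_data, conjecturer_data

# dummy run: "proofs" verify when the conjecture is short enough
prover_data, conj_data = self_play_round(
    conjecture_fn=lambda s: s + "_variant",
    prove_fn=lambda c: "proof:" + c,
    verify_fn=lambda c, p: len(c) <= 13,
    barely_fn=lambda c, p: len(c) == 13,
    seeds=["thm_a", "thm_bc", "thm_def"])
print(len(prover_data), conj_data)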

Simplifying Adversarially Robust PAC Learning with Tolerance

Adversarially robust PAC learning has proved to be challenging, with the currently best known learners [Montasser et al., 2021a] relying on improper methods based on intricate compression schemes, resulting in sample complexity exponential in the VC-dimension. A series of follow up work considered a slightly relaxed version of the problem called adversarially robust learning with tolerance [Ashtiani et al., 2023, Bhattacharjee et al., 2023, Raman et al., 2024] and achieved better sample complexity in terms of the VC-dimension. However, those algorithms were either improper and complex, or required additional assumptions on the hypothesis class H. We prove, for the first time, the existence of a simpler learner that achieves a sample complexity linear in the VC-dimension without requiring additional assumptions on H. Even though our learner is improper, it is "almost proper" in the sense that it outputs a hypothesis that is "similar" to a hypothesis in H. We also use the ideas from our algorithm to construct a semi-supervised learner in the tolerant setting. This simple algorithm achieves comparable bounds to the previous (non-tolerant) semi-supervised algorithm of Attias et al. [2022a], but avoids the use of intricate subroutines from previous works, and is "almost proper."

Updated: 2025-02-11 03:48:40

标题: 简化带容忍度的对抗鲁棒PAC学习

摘要: 对抗鲁棒的PAC学习已被证明具有挑战性:目前已知的最佳学习器[Montasser等,2021a]依赖于基于复杂压缩方案的不适当(improper)方法,导致样本复杂度随VC维呈指数增长。一系列后续工作考虑了该问题的一个稍微放松的版本,称为带容忍度的对抗鲁棒学习[Ashtiani等,2023,Bhattacharjee等,2023,Raman等,2024],并在VC维方面取得了更好的样本复杂度。然而,这些算法要么是不适当且复杂的,要么需要对假设类H作额外假设。我们首次证明了存在一个更简单的学习器,其样本复杂度与VC维呈线性关系,且不需要对H作额外假设。尽管我们的学习器是不适当的,但它在如下意义上是"几乎适当的":它输出一个与H中某个假设"相似"的假设。 我们还利用该算法的思想构造了容忍设定下的一个半监督学习器。这个简单的算法达到了与Attias等[2022a]先前的(非容忍)半监督算法相当的界,同时避免使用先前工作中复杂的子程序,并且同样是"几乎适当的"。

更新时间: 2025-02-11 03:48:40

领域: cs.LG

下载: http://arxiv.org/abs/2502.07232v1

Revisiting the Auxiliary Data in Backdoor Purification

Backdoor attacks occur when an attacker subtly manipulates machine learning models during the training phase, leading to unintended behaviors when specific triggers are present. To mitigate such emerging threats, a prevalent strategy is to cleanse the victim models by various backdoor purification techniques. Despite notable achievements, current state-of-the-art (SOTA) backdoor purification techniques usually rely on the availability of a small clean dataset, often referred to as auxiliary dataset. However, acquiring an ideal auxiliary dataset poses significant challenges in real-world applications. This study begins by assessing the SOTA backdoor purification techniques across different types of real-world auxiliary datasets. Our findings indicate that the purification effectiveness fluctuates significantly depending on the type of auxiliary dataset used. Specifically, a high-quality in-distribution auxiliary dataset is essential for effective purification, whereas datasets from varied or out-of-distribution sources significantly degrade the defensive performance. Based on this, we propose Guided Input Calibration (GIC), which aims to improve purification efficacy by employing a learnable transformation. Guided by the victim model itself, GIC aligns the characteristics of the auxiliary dataset with those of the original training set. Comprehensive experiments demonstrate that GIC can substantially enhance purification performance across diverse types of auxiliary datasets. The code and data will be available via https://github.com/shawkui/BackdoorBenchER.

Updated: 2025-02-11 03:46:35

标题: 重新审视后门净化中的辅助数据

摘要: 后门攻击是指攻击者在训练阶段对机器学习模型进行隐蔽操纵,使模型在特定触发器出现时产生非预期行为。为了缓解这种新兴威胁,一种普遍的策略是通过各种后门净化技术来清理受害模型。尽管取得了显著成就,但目前最先进(SOTA)的后门净化技术通常依赖于一个小型干净数据集,通常称为辅助数据集。然而,在实际应用中获取理想的辅助数据集面临重大挑战。本研究首先在不同类型的真实世界辅助数据集上评估了SOTA后门净化技术。我们的研究结果表明,净化效果在很大程度上取决于所使用的辅助数据集类型。具体而言,高质量的分布内辅助数据集对有效净化至关重要,而来自不同或分布外来源的数据集会显著降低防御性能。基于此,我们提出了引导输入校准(GIC),旨在通过一个可学习的变换来提高净化效果。在受害模型自身的指导下,GIC将辅助数据集的特征与原始训练集的特征对齐。全面的实验表明,GIC可以在各种类型的辅助数据集上显著增强净化性能。代码和数据将通过https://github.com/shawkui/BackdoorBenchER提供。

更新时间: 2025-02-11 03:46:35

领域: cs.CR

下载: http://arxiv.org/abs/2502.07231v1

AiRacleX: Automated Detection of Price Oracle Manipulations via LLM-Driven Knowledge Mining and Prompt Generation

Decentralized finance (DeFi) applications depend on accurate price oracles to ensure secure transactions, yet these oracles are highly vulnerable to manipulation, enabling attackers to exploit smart contract vulnerabilities for unfair asset valuation and financial gain. Detecting such manipulations traditionally relies on the manual effort of experienced experts, presenting significant challenges. In this paper, we propose a novel LLM-driven framework that automates the detection of price oracle manipulations by leveraging the complementary strengths of different LLM models (LLMs). Our approach begins with domain-specific knowledge extraction, where an LLM model synthesizes precise insights about price oracle vulnerabilities from top-tier academic papers, eliminating the need for profound expertise from developers or auditors. This knowledge forms the foundation for a second LLM model to generate structured, context-aware chain of thought prompts, which guide a third LLM model in accurately identifying manipulation patterns in smart contracts. We validate the effectiveness of the framework through experiments on 60 known vulnerabilities from 46 real-world DeFi attacks or projects spanning 2021 to 2023. The best-performing combination of LLMs (Haiku-Haiku-4o-mini) identified by AiRacleX demonstrates a 2.58-times improvement in recall (0.667 vs 0.259) compared to the state-of-the-art tool GPTScan, while maintaining comparable precision. Furthermore, our framework demonstrates the feasibility of replacing commercial models with open-source alternatives, enhancing privacy and security for developers.

Updated: 2025-02-11 03:40:13

标题: AiRacleX: 基于LLM驱动知识挖掘和提示生成的价格预言机操纵自动检测

摘要: 去中心化金融(DeFi)应用依赖准确的价格预言机来确保交易安全,然而这些预言机极易受到操纵,使攻击者能够利用智能合约漏洞进行不公平的资产估值并获取金融收益。传统上,检测此类操纵依赖经验丰富的专家的人工努力,面临重大挑战。在本文中,我们提出了一个新颖的LLM驱动框架,通过利用不同LLM模型的互补优势,自动化检测价格预言机操纵。我们的方法从领域特定知识提取开始:一个LLM模型从顶尖学术论文中综合出关于价格预言机漏洞的精确见解,从而无需开发人员或审计人员具备深厚的专业知识。这些知识构成了第二个LLM模型的基础,用于生成结构化、上下文感知的思维链提示,进而指导第三个LLM模型准确识别智能合约中的操纵模式。我们通过对2021年至2023年间46起真实世界DeFi攻击或项目中的60个已知漏洞进行实验,验证了该框架的有效性。由AiRacleX确定的表现最佳的LLM组合(Haiku-Haiku-4o-mini)相比最先进的工具GPTScan,在召回率上实现了2.58倍的提升(0.667 vs 0.259),同时保持了可比的精度。此外,我们的框架展示了用开源模型替代商业模型的可行性,增强了开发人员的隐私和安全性。

更新时间: 2025-02-11 03:40:13

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2502.06348v2

Are KANs Effective for Multivariate Time Series Forecasting?

Multivariate time series forecasting is a crucial task that predicts the future states based on historical inputs. Related techniques have been developing in parallel with the machine learning community, from early statistical learning methods to current deep learning methods. Despite their significant advancements, existing methods continue to struggle with the challenge of inadequate interpretability. The rise of the Kolmogorov-Arnold Network (KAN) provides a new perspective to solve this challenge, but current work has not yet concluded whether KAN is effective in time series forecasting tasks. In this paper, we aim to evaluate the effectiveness of KANs in time-series forecasting from the perspectives of performance, integrability, efficiency, and interpretability. To this end, we propose the Multi-layer Mixture-of-KAN network (MMK), which achieves excellent performance while retaining KAN's ability to be transformed into a combination of symbolic functions. The core module of MMK is the mixture-of-KAN layer, which uses a mixture-of-experts structure to assign variables to best-matched KAN experts. Then, we explore some useful experimental strategies to deal with the issues in the training stage. Finally, we compare MMK and various baselines on seven datasets. Extensive experimental and visualization results demonstrate that KANs are effective in multivariate time series forecasting. Code is available at: https://github.com/2448845600/EasyTSF.

Updated: 2025-02-11 03:38:57

标题: KAN是否有效用于多变量时间序列预测?

摘要: 多元时间序列预测是一项关键任务,它基于历史输入预测未来状态。相关技术与机器学习社区同步发展,从早期的统计学习方法到当前的深度学习方法。尽管取得了重大进展,现有方法仍然难以应对可解释性不足的挑战。科尔莫戈洛夫-阿诺德网络(KAN)的崛起为解决这一挑战提供了新视角,但目前的工作尚未得出KAN在时间序列预测任务中是否有效的结论。本文旨在从性能、可集成性、效率和可解释性等角度评估KAN在时间序列预测中的有效性。为此,我们提出了多层混合KAN网络(MMK),它在保留KAN能够转化为符号函数组合的能力的同时实现了出色的性能。MMK的核心模块是混合KAN层,它使用专家混合结构将变量分配给最匹配的KAN专家。然后,我们探索了一些有用的实验策略来处理训练阶段的问题。最后,我们在七个数据集上比较了MMK和各种基线。广泛的实验和可视化结果表明,KAN在多元时间序列预测中是有效的。代码可在https://github.com/2448845600/EasyTSF 获取。

更新时间: 2025-02-11 03:38:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.11306v2
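
A toy mixture-of-experts router in the spirit of the mixture-of-KAN layer: a gate assigns each input variable to its best-matched expert. Plain linear experts stand in for KAN experts purely for brevity; MMK's experts are KANs.

import torch
import torch.nn as nn

class MixtureLayer(nn.Module):
    def __init__(self, seq_len, d_out, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(seq_len, n_experts)
        self.experts = nn.ModuleList(nn.Linear(seq_len, d_out) for _ in range(n_experts))

    def forward(self, x):                                  # x: (B, n_vars, seq_len)
        w = torch.softmax(self.gate(x), dim=-1)            # (B, V, E) routing weights
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, V, D, E)
        return (outs * w.unsqueeze(2)).sum(-1)             # per-variable expert mix

layer = MixtureLayer(seq_len=96, d_out=24)
print(layer(torch.randn(32, 7, 96)).shape)                 # (32, 7, 24)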

TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting

Non-stationarity poses significant challenges for multivariate time series forecasting due to the inherent short-term fluctuations and long-term trends that can lead to spurious regressions or obscure essential long-term relationships. Most existing methods either eliminate or retain non-stationarity without adequately addressing its distinct impacts on short-term and long-term modeling. Eliminating non-stationarity is essential for avoiding spurious regressions and capturing local dependencies in short-term modeling, while preserving it is crucial for revealing long-term cointegration across variates. In this paper, we propose TimeBridge, a novel framework designed to bridge the gap between non-stationarity and dependency modeling in long-term time series forecasting. By segmenting input series into smaller patches, TimeBridge applies Integrated Attention to mitigate short-term non-stationarity and capture stable dependencies within each variate, while Cointegrated Attention preserves non-stationarity to model long-term cointegration across variates. Extensive experiments show that TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting. Additionally, TimeBridge demonstrates exceptional performance in financial forecasting on the CSI 500 and S&P 500 indices, further validating its robustness and effectiveness. Code is available at https://github.com/Hank0626/TimeBridge.

Updated: 2025-02-11 03:36:22

标题: TimeBridge:非平稳性对长期时间序列预测至关重要

摘要: 非平稳性对多变量时间序列预测构成重大挑战,因为固有的短期波动和长期趋势可能导致虚假回归或掩盖基本的长期关系。大多数现有方法要么消除,要么保留非平稳性,但未能充分解决其对短期和长期建模的不同影响。消除非平稳性对于避免虚假回归和捕捉短期建模中的局部依赖关系至关重要,而保留非平稳性对于揭示变量间的长期协整关系至关重要。在本文中,我们提出了TimeBridge,这是一个新颖的框架,旨在弥合非平稳性和依赖建模在长期时间序列预测中的差距。通过将输入序列分割成较小的片段,TimeBridge应用了整合注意力来减轻短期非平稳性,并捕捉每个变量内稳定的依赖关系,而协整注意力则保留非平稳性,以建模变量间的长期协整关系。大量实验证明,TimeBridge在短期和长期预测中始终保持最先进的性能。此外,TimeBridge在CSI 500和S&P 500指数的金融预测中表现出色,进一步验证了其稳健性和有效性。代码可在https://github.com/Hank0626/TimeBridge获得。

更新时间: 2025-02-11 03:36:22

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.04442v3

Backdoor Mitigation by Distance-Driven Detoxification

Backdoor attacks undermine the integrity of machine learning models by allowing attackers to manipulate predictions using poisoned training data. Such attacks lead to targeted misclassification when specific triggers are present, while the model behaves normally under other conditions. This paper considers a post-training backdoor defense task, aiming to detoxify the backdoors in pre-trained models. We begin by analyzing the underlying issues of vanilla fine-tuning and observe that it is often trapped in regions with low loss for both clean and poisoned samples. Motivated by such observations, we propose Distance-Driven Detoxification (D3), an innovative approach that reformulates backdoor defense as a constrained optimization problem. Specifically, D3 promotes the model's departure from the vicinity of its initial weights, effectively reducing the influence of backdoors. Extensive experiments on state-of-the-art (SOTA) backdoor attacks across various model architectures and datasets demonstrate that D3 not only matches but often surpasses the performance of existing SOTA post-training defense techniques.

Updated: 2025-02-11 03:32:37

标题: 通过距离驱动解毒来减轻后门风险

摘要: 后门攻击允许攻击者利用有毒的训练数据操纵预测,从而破坏机器学习模型的完整性。此类攻击在特定触发器出现时导致有针对性的错误分类,而在其他条件下模型表现正常。本文考虑一个训练后的后门防御任务,旨在清除预训练模型中的后门。我们首先分析了普通微调(vanilla fine-tuning)的潜在问题,并观察到它通常陷入对干净样本和有毒样本损失都很低的区域。受这些观察的启发,我们提出了一种创新方法Distance-Driven Detoxification (D3),将后门防御重新构造为一个受约束的优化问题。具体而言,D3促使模型远离其初始权重的邻域,有效减少后门的影响。在各种模型架构和数据集上针对最新后门攻击进行的广泛实验表明,D3不仅能匹配、而且经常超越现有最先进的训练后防御技术的性能。

更新时间: 2025-02-11 03:32:37

领域: cs.CR

下载: http://arxiv.org/abs/2411.09585v2
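
A hedged sketch of distance-driven detoxification written as a penalized objective: minimize clean-data loss while pushing the weights away from their possibly backdoored initialization. The paper states this as a constrained optimization problem; the simple penalty below (with an assumed coefficient lam) is only for illustration, and in practice the departure term must be bounded to stay stable.

import torch

def d3_step(model, init_params, batch, loss_fn, opt, lam=0.1):
    x, y = batch
    task_loss = loss_fn(model(x), y)
    dist = sum(((p - p0) ** 2).sum()
               for p, p0 in zip(model.parameters(), init_params))
    loss = task_loss - lam * dist      # reward departure from the initial weights
    opt.zero_grad()
    loss.backward()
    opt.step()
    return task_loss.item(), dist.item()

model = torch.nn.Linear(10, 2)
init = [p.detach().clone() for p in model.parameters()]
opt = torch.optim.SGD(model.parameters(), lr=0.05)
batch = (torch.randn(16, 10), torch.randint(0, 2, (16,)))
print(d3_step(model, init, batch, torch.nn.CrossEntropyLoss(), opt))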

A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models

The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with GaLore standing out as a notable example designed to reduce the memory footprint of optimizer states. However, these approaches do not alleviate the memory burden imposed by activations, rendering them unsuitable for scenarios involving long context sequences or large mini-batches. Moreover, their convergence properties are still not well-understood in the literature. In this work, we introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning LLMs. Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems. At each iteration, a random subspace is selected, and the parameters within that subspace are optimized. This structured reduction in dimensionality allows our method to simultaneously reduce memory usage for both activations and optimizer states. We establish comprehensive convergence guarantees and derive rates for various scenarios, accommodating different optimization strategies to solve the subproblems. Extensive experiments validate the superior memory and communication efficiency of our method, achieving performance comparable to GaLore and Adam.

Updated: 2025-02-11 03:32:10

标题: 一种用于训练大型语言模型的内存高效的随机子空间优化方法

摘要: 与训练大型语言模型(LLMs)相关的内存挑战已成为一个关键问题,特别是在使用Adam优化器时。为了解决这个问题,已经提出了许多内存高效的技术,其中GaLore作为一个显著的例子旨在减少优化器状态的内存占用。然而,这些方法并不能减轻激活所带来的内存负担,因此不适用于涉及长上下文序列或大批量的情况。此外,它们的收敛特性在文献中仍未被充分理解。在本研究中,我们介绍了一个用于预训练和微调LLMs的随机子空间优化框架。我们的方法将高维训练问题分解为一系列低维子问题。在每次迭代中,选择一个随机子空间,并优化该子空间内的参数。这种结构化的降维方式使我们的方法能够同时减少激活和优化器状态的内存使用。我们建立了全面的收敛保证,并针对不同的优化策略为解决子问题导出了速率。大量实验验证了我们的方法在内存和通信效率上的优越性,实现了与GaLore和Adam相当的性能。

更新时间: 2025-02-11 03:32:10

领域: cs.LG

下载: http://arxiv.org/abs/2502.07222v1

Recurrent Diffusion for Large-Scale Parameter Generation

Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions, on a single GPU. Our approach first partitions a network's parameters into non-overlapping tokens, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing prototypes which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks, including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs, RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.

Updated: 2025-02-11 03:29:30

标题: 大规模参数生成的循环扩散

摘要: 参数生成长期以来一直难以匹配当今大规模视觉和语言模型的规模,限制了其更广泛的实用性。在本文中,我们介绍了用于大规模参数生成的循环扩散(RPG)框架,这是一种新颖的框架,可以在单个GPU上生成高达数亿个完整的神经网络参数。我们的方法首先将网络参数划分为不重叠的令牌,每个令牌对应模型的不同部分。然后,一个循环机制学习令牌间的关系,产生原型,作为扩散过程的条件,最终合成完整的参数。在一系列架构和任务中,包括在ImageNet 1K和COCO上的ResNets、ConvNeXts和ViTs,甚至基于LoRA的LLMs,RPG取得了与完全训练的网络相当的性能,同时避免了过多的内存开销。值得注意的是,它可以泛化到超出训练集的任务,为以前未见的任务生成有效的参数,突显了其在动态和开放式场景中的灵活性。通过克服长期存在的内存和可扩展性障碍,RPG作为AI生成AI的关键进展,潜在地能够在以前被认为不可行的规模上实现高效的权重生成。

更新时间: 2025-02-11 03:29:30

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2501.11587v2

LUNAR: LLM Unlearning via Neural Activation Redirection

Large Language Models (LLMs) benefit from training on ever larger amounts of textual data, but as a result, they increasingly incur the risk of leaking private information. The ability to selectively remove knowledge from LLMs is, therefore, a highly desirable capability. In this paper, we propose LUNAR, a novel unlearning methodology grounded in the Linear Representation Hypothesis. LUNAR operates by redirecting the representations of unlearned data to regions that trigger the model's inherent ability to express its inability to answer. LUNAR achieves state-of-the-art unlearning performance while significantly enhancing the controllability of the unlearned model during inference. Specifically, LUNAR achieves between 2.9x to 11.7x improvements on combined "unlearning efficacy" and "model utility" score ("Deviation Score") on the PISTOL dataset across various base models. We also demonstrate, through quantitative analysis and qualitative examples, LUNAR's superior controllability in generating coherent and contextually aware responses, mitigating undesired side effects of existing methods. Moreover, we demonstrate that LUNAR is robust against white-box adversarial attacks and versatile in handling real-world scenarios, such as processing sequential unlearning requests.
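
The redirection idea can be sketched as a training loss: pull pooled hidden states of forget-set inputs toward a precomputed "unable to answer" region of representation space. The Hugging Face-style interface, the mean pooling, and the way `refusal_direction` is estimated (e.g., averaging hidden states of prompts the model already declines) are all assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def redirection_loss(model, forget_batch, refusal_direction, layer_idx=-1):
    """Pull forget-set representations toward the refusal region.
    `refusal_direction`: (d_model,) vector, e.g. the mean hidden state of
    prompts the model already refuses to answer (an assumed recipe)."""
    out = model(**forget_batch, output_hidden_states=True)
    pooled = out.hidden_states[layer_idx].mean(dim=1)      # (batch, d_model)
    return F.mse_loss(pooled, refusal_direction.expand_as(pooled))

# In training one would minimize: redirection_loss on the forget set plus a
# standard language-modeling loss on a retain set to preserve utility.
```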

Updated: 2025-02-11 03:23:22

标题: LUNAR: 通过神经激活重定向的LLM遗忘

摘要: 大型语言模型(LLMs)受益于在越来越多的文本数据上进行训练,但也因此越来越面临泄露私人信息的风险。因此,有选择地从LLMs中删除知识的能力是非常理想的。在本文中,我们提出了LUNAR,这是一种基于线性表示假设的新颖的遗忘方法。LUNAR通过将未学习数据的表示重定向到触发模型固有的表达其无法回答的能力的区域来运作。LUNAR在明显增强未学习模型在推理过程中的可控性的同时,实现了最先进的遗忘性能。具体来说,在各种基础模型上,LUNAR在PISTOL数据集上将“遗忘效果”与“模型效用”的综合得分(“偏差分数”)提高了2.9倍到11.7倍。我们还通过定量分析和定性示例证明,LUNAR在生成连贯和具有上下文意识的响应方面具有卓越的可控性,减轻了现有方法的不良副作用。此外,我们还证明LUNAR对白盒对抗攻击具有鲁棒性,并且在处理诸如序列遗忘请求等现实场景时具有多功能性。

更新时间: 2025-02-11 03:23:22

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07218v1

SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer

Recent years have seen an increase in the use of gigapixel-level image and video capture systems and benchmarks with high-resolution wide (HRW) shots. However, unlike close-up shots in the MS COCO dataset, the higher resolution and wider field of view raise unique challenges, such as extreme sparsity and huge scale changes, causing existing close-up detectors inaccuracy and inefficiency. In this paper, we present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap of object detection between close-up and HRW shots. The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects. In this way, it can jointly explore global and local attention by fusing coarse- and fine-grained features to handle huge scale changes. SparseFormer also benefits from a novel Cross-slice non-maximum suppression (C-NMS) algorithm to precisely localize objects from noisy windows and a simple yet effective multi-scale strategy to improve accuracy. Extensive experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (up to 5.8%) and speed (up to 3x) over the state-of-the-art approaches.

Updated: 2025-02-11 03:21:25

标题: SparseFormer:通过稀疏视觉变换器在高分辨率广角(HRW)图像中检测物体

摘要: 近年来,十亿像素级图像和视频捕捉系统以及高分辨率广角(HRW)镜头基准的使用不断增加。然而,与MS COCO数据集中的近距离拍摄不同,更高的分辨率和更宽的视野带来了独特的挑战,如极端稀疏性和巨大的尺度变化,导致现有的近距离检测器不准确且低效。在本文中,我们提出了一种新颖的模型无关的稀疏视觉变换器,命名为SparseFormer,以弥合近距离和HRW镜头之间的目标检测差距。所提出的SparseFormer选择性地使用关注令牌来审查可能包含目标的稀疏分布窗口。通过这种方式,它可以通过融合粗粒度和细粒度特征来同时探索全局和局部注意力,以处理巨大的尺度变化。SparseFormer还受益于一种新颖的交叉切片非极大值抑制(C-NMS)算法,以精确地从嘈杂的窗口中定位目标,并采用简单而有效的多尺度策略来提高准确性。对PANDA和DOTA-v1.0两个HRW基准的广泛实验表明,所提出的SparseFormer显著提高了检测准确性(最高可达5.8%)和速度(最高可达3倍),优于现有技术方法。

更新时间: 2025-02-11 03:21:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07216v1

Pareto Optimal Algorithmic Recourse in Multi-cost Function

In decision-making systems, algorithmic recourse aims to identify minimal-cost actions to alter an individual's features, thereby obtaining a desired outcome. This empowers individuals to understand, question, or alter decisions that negatively affect them. However, due to the variety and sensitivity of system environments and individual personalities, quantifying the cost with a single function is nearly impossible when multiple criteria are involved. Most current recourse mechanisms use gradient-based methods that assume cost functions are differentiable, which is often not applicable in real-world scenarios, resulting in sub-optimal solutions that compromise various criteria. These solutions are typically intractable and lack rigorous theoretical foundations, raising concerns regarding interpretability, reliability, and transparency from the explainable AI (XAI) perspective. To address these issues, this work proposes an algorithmic recourse framework that handles non-differentiable and discrete multi-cost functions. By formulating recourse as a multi-objective optimization problem and assigning weights to different criteria based on their importance, our method identifies Pareto-optimal recourse recommendations. To demonstrate scalability, we incorporate the concept of an epsilon-net, proving the ability to find approximated Pareto-optimal actions. Experiments show the trade-off between different criteria and the method's scalability in large graphs. Compared to current heuristic practices, our approach provides a stronger theoretical foundation and better aligns recourse suggestions with real-world requirements.
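
Since the costs are discrete and non-differentiable, the Pareto front over a finite set of candidate actions can be computed by direct dominance checks; the helper below illustrates just that step. The action names and cost values are made up, and the paper's graph-based search and epsilon-net approximation are not shown.

```python
import numpy as np

def pareto_front(costs):
    """Return indices of Pareto-optimal rows of `costs`
    (lower is better in every column)."""
    idx = []
    for i, c in enumerate(costs):
        dominated = any(np.all(other <= c) and np.any(other < c)
                        for j, other in enumerate(costs) if j != i)
        if not dominated:
            idx.append(i)
    return idx

# Toy: candidate recourse actions scored on two non-differentiable costs.
actions = ["raise income", "reduce debt", "open account", "change job"]
costs = np.array([[3.0, 1.0],   # (effort, side-effect) per action
                  [1.0, 2.0],
                  [2.0, 2.5],
                  [0.5, 4.0]])
for i in pareto_front(costs):
    print(actions[i], costs[i])   # "open account" is dominated and excluded
```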

Updated: 2025-02-11 03:16:08

标题: 帕累托最优算法在多成本函数中的应用

摘要: 在决策系统中,算法性补救旨在确定最小成本的行动,以改变个体特征,从而实现期望的结果。这使个体能够理解、质疑或改变对他们产生负面影响的决策。然而,由于系统环境和个体个性的多样性和敏感性,在考虑多个标准的情况下,用单个函数量化成本几乎是不可能的。大多数当前的补救机制使用基于梯度的方法,假设成本函数是可微分的,这在现实场景中通常不适用,导致妥协各种标准的次优解决方案。这些解决方案通常是难以解决的,缺乏严格的理论基础,从可解释AI(XAI)的角度引发了关于可解释性、可靠性和透明性的担忧。 为了解决这些问题,本研究提出了一个处理非可微分和离散多成本函数的算法性补救框架。通过将补救问题形式化为多目标优化问题,并根据其重要性分配不同标准的权重,我们的方法确定帕累托最优的补救建议。为了展示可扩展性,我们引入了ε-网的概念,证明了找到近似帕累托最优行动的能力。实验证明了不同标准之间的权衡以及方法在大型图中的可扩展性。与当前的启发式实践相比,我们的方法提供了更强的理论基础,并使补救建议更好地符合真实世界的需求。

更新时间: 2025-02-11 03:16:08

领域: cs.LG,cs.AI,cs.DS

下载: http://arxiv.org/abs/2502.07214v1

Improve the Training Efficiency of DRL for Wireless Communication Resource Allocation: The Role of Generative Diffusion Models

Dynamic resource allocation in mobile wireless networks involves complex, time-varying optimization problems, motivating the adoption of deep reinforcement learning (DRL). However, most existing works rely on pre-trained policies, overlooking dynamic environmental changes that rapidly invalidate the policies. Periodic retraining becomes inevitable but incurs prohibitive computational costs and energy consumption, critical concerns for resource-constrained wireless systems. We identify three root causes of inefficient retraining: high-dimensional state spaces, suboptimal exploration-exploitation trade-offs in the action space, and reward design limitations. To overcome these limitations, we propose Diffusion-based Deep Reinforcement Learning (D2RL), which leverages generative diffusion models (GDMs) to holistically enhance all three DRL components. The iterative refinement process and distribution modelling of GDMs enable (1) the generation of diverse state samples to improve environmental understanding, (2) balanced action space exploration to escape local optima, and (3) the design of discriminative reward functions that better evaluate action quality. Our framework operates in two modes: Mode I leverages GDMs to explore reward spaces and design discriminative reward functions that rigorously evaluate action quality, while Mode II synthesizes diverse state samples to enhance environmental understanding and generalization. Extensive experiments demonstrate that D2RL achieves faster convergence and reduced computational costs over conventional DRL methods for resource allocation in wireless communications while maintaining competitive policy performance. This work underscores the transformative potential of GDMs in overcoming fundamental DRL training bottlenecks for wireless networks, paving the way for practical, real-time deployments.

Updated: 2025-02-11 03:09:45

标题: 提高无线通信资源分配的DRL训练效率:生成扩散模型的作用

摘要: 移动无线网络中的动态资源分配涉及复杂且时变的优化问题,这促使采用深度强化学习(DRL)。然而,大多数现有作品依赖于预训练策略,忽视了快速使策略失效的动态环境变化。周期性重新训练变得不可避免,但会产生巨大的计算成本和能量消耗——这对于资源受限的无线系统来说是关键问题。我们确定了重新训练低效的三个根本原因:高维状态空间、次优动作空间的探索-开发权衡以及奖励设计的局限性。为了克服这些限制,我们提出了基于扩散的深度强化学习(D2RL),该方法利用生成扩散模型(GDMs)全面增强了DRL的三个组件。GDM的迭代细化过程和分布建模能够实现:(1)生成多样化的状态样本以提高环境理解,(2)平衡的行动空间探索以避免局部最优点,以及(3)设计更好评估行动质量的区别性奖励函数。我们的框架分为两种模式:模式I利用GDM探索奖励空间并设计区别性奖励函数,严格评估行动质量;而模式II合成多样化的状态样本以增强环境理解和泛化。大量实验表明,D2RL在无线通信资源分配中实现了更快的收敛速度和降低的计算成本,同时保持了与传统DRL方法竞争力相当的策略性能。这项工作强调了GDM在克服无线网络的基本DRL训练瓶颈方面的变革潜力,为实践中的实时部署铺平了道路。

更新时间: 2025-02-11 03:09:45

领域: cs.LG

下载: http://arxiv.org/abs/2502.07211v1

Enhancing Physics-Informed Neural Networks Through Feature Engineering

Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters -- on average, 65% fewer than the competing feature engineering methods -- while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.
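
The recipe is simple enough to sketch end to end: fixed Fourier features, one hidden layer, and a standard PINN residual loss, shown here on the 1D Poisson problem u''(x) = -pi^2 sin(pi x) with zero boundary values. Adam stands in for the paper's conditioning-aware optimizer, and the widths, frequencies, and step counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FourierNet(nn.Module):
    """Single hidden layer on top of fixed Fourier features (a sketch in
    the spirit of the paper's recipe, not the released SAFE-NET code)."""
    def __init__(self, n_freq=16, width=64):
        super().__init__()
        self.register_buffer("freqs",
                             torch.arange(1, n_freq + 1, dtype=torch.float32) * torch.pi)
        self.net = nn.Sequential(nn.Linear(2 * n_freq, width), nn.Tanh(),
                                 nn.Linear(width, 1))

    def forward(self, x):                       # x: (n, 1)
        feats = torch.cat([torch.sin(x * self.freqs), torch.cos(x * self.freqs)], dim=-1)
        return self.net(feats)

# PINN loss for u''(x) = -pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0.
model = FourierNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)
    u = model(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + torch.pi ** 2 * torch.sin(torch.pi * x)
    bc = model(torch.zeros(1, 1)) ** 2 + model(torch.ones(1, 1)) ** 2
    loss = (residual ** 2).mean() + bc.mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))   # exact solution is u(x) = sin(pi x)
```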

Updated: 2025-02-11 03:07:28

标题: 通过特征工程增强物理信息神经网络

摘要: 物理信息神经网络(PINNs)旨在利用深度学习解决偏微分方程(PDEs)。采用全连接多层深度学习架构的主流方法需要长时间的训练才能达到适度的准确性,而最近关于特征工程的工作允许更高的准确性和更快的收敛速度。本文介绍了SAFE-NET,一种单层自适应特征工程网络,通过比基准特征工程方法更少的参数实现了数量级更低的错误。SAFE-NET回归到机器学习中的基本思想,使用傅立叶特征、简化的单隐藏层网络架构以及有效的优化器来改善PINN优化问题的条件。数值结果显示,SAFE-NET收敛速度更快,通常优于更深的网络和更复杂的架构。它持续使用更少的参数--平均比竞争的特征工程方法少65%--同时在不到30%的训练周期内达到可比的准确性。此外,每个SAFE-NET周期比竞争特征工程方法的周期快95%。这些发现挑战了普遍的观念,即现代PINNs在这些科学应用中有效地学习特征,并突显了通过特征工程可能实现的效率收益。

更新时间: 2025-02-11 03:07:28

领域: cs.LG

下载: http://arxiv.org/abs/2502.07209v1

A Study on the Importance of Features in Detecting Advanced Persistent Threats Using Machine Learning

Advanced Persistent Threats (APTs) pose a significant security risk to organizations and industries. These attacks often lead to severe data breaches and compromise the system for a long time. Mitigating these sophisticated attacks is highly challenging due to the stealthy and persistent nature of APTs. Machine learning models are often employed to tackle this challenge by bringing automation and scalability to APT detection. Nevertheless, these intelligent methods are data-driven, and thus, highly affected by the quality and relevance of input data. This paper aims to analyze measurements considered when recording network traffic and conclude which features contribute more to detecting APT samples. To do this, we study the features associated with various APT cases and determine their importance using a machine learning framework. To ensure the generalization of our findings, several feature selection techniques are employed and paired with different classifiers to evaluate their effectiveness. Our findings provide insights into how APT detection can be enhanced in real-world scenarios.
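
The study design maps naturally onto a scikit-learn pipeline: pair a feature-selection technique with a classifier and compare detection scores as the number of retained features varies. Synthetic data stands in for real APT network-traffic features here, and mutual information with a random forest is just one of the pairings the paper would evaluate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Imbalanced synthetic stand-in for traffic features (APTs are rare events).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
for k in (5, 10, 20, 40):
    clf = make_pipeline(SelectKBest(mutual_info_classif, k=k),
                        RandomForestClassifier(n_estimators=100, random_state=0))
    score = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"top-{k} features: F1 = {score:.3f}")   # how much do extra features help?
```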

Updated: 2025-02-11 03:06:03

标题: 一项关于利用机器学习检测高级持久威胁中特征重要性的研究

摘要: 高级持续威胁(APTs)对组织和行业构成重大安全风险。这些攻击往往导致严重的数据泄露,并且长时间危害系统。由于APTs的隐秘和持久性质,减轻这些复杂攻击是非常具有挑战性的。通常使用机器学习模型来解决这一挑战,将自动化和可扩展性引入到APT检测中。然而,这些智能方法是数据驱动的,因此高度受输入数据质量和相关性的影响。本文旨在分析记录网络流量时所考虑的测量指标,并得出哪些特征对检测APT样本贡献更大。为了做到这一点,我们研究了与各种APT案例相关的特征,并使用机器学习框架确定它们的重要性。为了确保我们发现的普遍性,我们采用了几种特征选择技术,并将其与不同的分类器配对以评估它们的有效性。我们的研究结果为如何在现实情境中提高APT检测提供了见解。

更新时间: 2025-02-11 03:06:03

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07207v1

Optimal Actuator Attacks on Autonomous Vehicles Using Reinforcement Learning

With the increasing prevalence of autonomous vehicles (AVs), their vulnerability to various types of attacks has grown, presenting significant security challenges. In this paper, we propose a reinforcement learning (RL)-based approach for designing optimal stealthy integrity attacks on AV actuators. We also analyze the limitations of state-of-the-art RL-based secure controllers developed to counter such attacks. Through extensive simulation experiments, we demonstrate the effectiveness and efficiency of our proposed method.

Updated: 2025-02-11 03:01:05

标题: 使用强化学习对自动驾驶车辆进行最佳致动器攻击

摘要: 随着自动驾驶汽车(AVs)的普及,它们面临各种类型攻击的脆弱性不断增加,带来重大安全挑战。本文提出了一种基于强化学习(RL)的方法,用于设计对AV执行器进行最佳隐蔽完整性攻击。我们还分析了针对此类攻击开发的最新RL安全控制器的局限性。通过大量的模拟实验,我们展示了我们提出的方法的有效性和效率。

更新时间: 2025-02-11 03:01:05

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2502.07839v1

From Foresight to Forethought: VLM-In-the-Loop Policy Steering via Latent Alignment

While generative robot policies have demonstrated significant potential in learning complex, multimodal behaviors from demonstrations, they still exhibit diverse failures at deployment-time. Policy steering offers an elegant solution to reducing the chance of failure by using an external verifier to select from low-level actions proposed by an imperfect generative policy. Here, one might hope to use a Vision Language Model (VLM) as a verifier, leveraging its open-world reasoning capabilities. However, off-the-shelf VLMs struggle to understand the consequences of low-level robot actions as they are represented fundamentally differently than the text and images the VLM was trained on. In response, we propose FOREWARN, a novel framework to unlock the potential of VLMs as open-vocabulary verifiers for runtime policy steering. Our key idea is to decouple the VLM's burden of predicting action outcomes (foresight) from evaluation (forethought). For foresight, we leverage a latent world model to imagine future latent states given diverse low-level action plans. For forethought, we align the VLM with these predicted latent states to reason about the consequences of actions in its native representation--natural language--and effectively filter proposed plans. We validate our framework across diverse robotic manipulation tasks, demonstrating its ability to bridge representational gaps and provide robust, generalizable policy steering. Videos can be found on the project website: https://yilin-wu98.github.io/forewarn/.

Updated: 2025-02-11 03:00:12

标题: 从远见到谋划:通过潜在对齐实现VLM在环路政策导向

摘要: 生成式机器人策略在从示范中学习复杂的多模态行为方面表现出显著潜力,但它们在部署时仍然存在各种失败。策略导向提供了一个优雅的解决方案,通过使用外部验证器从不完美的生成式策略提出的低级动作中进行选择,从而降低失败的可能性。在这里,人们可能希望使用视觉语言模型(VLM)作为验证器,利用其开放世界推理能力。然而,现成的VLM很难理解低级机器人动作的后果,因为它们在本质上与VLM训练时所使用的文本和图像有着根本不同的表示。作为回应,我们提出了FOREWARN,这是一个新颖的框架,旨在释放VLM作为运行时策略导向的开放词汇验证器的潜力。我们的关键想法是将VLM预测行动结果(预见)的负担与评估(谋划)分开。对于预见,我们利用一个潜在世界模型,在给定多样的低级行动计划的情况下想象未来的潜在状态。对于谋划,我们将VLM与这些预测的潜在状态对齐,以便用其本机表示(自然语言)推断行动的后果,并有效地过滤提出的计划。我们验证了我们的框架在各种机器人操作任务中的有效性,展示了它跨越表示差距并提供健壮、可泛化的策略导向的能力。视频可以在项目网站上找到:https://yilin-wu98.github.io/forewarn/。

更新时间: 2025-02-11 03:00:12

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2502.01828v2

VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play

Multi-agent reinforcement learning (MARL) has made significant progress, largely fueled by the development of specialized testbeds that enable systematic evaluation of algorithms in controlled yet challenging scenarios. However, existing testbeds often focus on purely virtual simulations or limited robot morphologies such as robotic arms, quadrupeds, and humanoids, leaving high-mobility platforms with real-world physical constraints like drones underexplored. To bridge this gap, we present VolleyBots, a new MARL testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots features a turn-based interaction model under volleyball rules, a hierarchical decision-making process that combines motion control and strategic play, and a high-fidelity simulation for seamless sim-to-real transfer. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative MARL and game-theoretic algorithms. Results in simulation show that while existing algorithms handle simple tasks effectively, they encounter difficulty in complex tasks that require both low-level control and high-level strategy. We further demonstrate zero-shot deployment of a simulation-learned policy to real-world drones, highlighting VolleyBots' potential to propel MARL research involving agile robotic platforms. The project page is at https://sites.google.com/view/thu-volleybots/home.

Updated: 2025-02-11 03:00:12

标题: VolleyBots:结合运动控制和战略游戏的多无人机排球比赛测试平台

摘要: 多智能体强化学习(MARL)取得了显著进展,这在很大程度上得益于专门的测试平台的发展,这些平台可以在受控但具有挑战性的场景中系统地评估算法。然而,现有的测试平台通常集中在纯虚拟模拟或有限的机器人形态,如机械臂、四足动物和仿人类机器人上,而对于具有现实世界物理约束的高机动性平台,如无人机,研究不足。为了弥补这一差距,我们提出了VolleyBots,一个新的MARL测试平台,在这个平台上多架无人机在物理动态下合作和竞争进行排球比赛。VolleyBots采用基于轮流操作的互动模型,符合排球规则,结合运动控制和战略游戏的分层决策过程,并提供高保真度的模拟环境,以实现模拟到真实的无缝转换。我们提供了一套全面的任务,从单机无人机练习到多机协作和竞争任务,同时还附带了代表性MARL和博弈论算法的基准评估。在模拟中的结果显示,现有算法可以有效处理简单任务,但在需要低级控制和高级策略的复杂任务中遇到困难。我们进一步展示了将从模拟中学习的策略零点部署到现实世界无人机上,突出了VolleyBots在推动涉及灵活机器人平台的MARL研究中的潜力。项目页面位于https://sites.google.com/view/thu-volleybots/home。

更新时间: 2025-02-11 03:00:12

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.01932v2

Monte Carlo Tree Diffusion for System 2 Planning

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS, such as controlling exploration-exploitation trade-offs, within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.
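
A toy skeleton may clarify how denoising becomes a tree search: each node holds a partially denoised plan at some noise level, expansion applies one stochastic reverse step, and UCB selection spends extra test-time computation on promising branches. The two-dimensional "plan", the stub denoiser, and the reward below are stand-ins; the real method scores trajectories with a learned diffusion model.

```python
import math, random

TARGET, T = (3.0, 2.0), 6   # goal location and number of "diffusion" steps

def denoise_step(plan, t, rng):
    """Stub one-step denoiser: contracts toward a noisy proposal. A stand-in
    for a diffusion model's reverse step; branching comes from the rng."""
    return (plan[0] * 0.6 + rng.gauss(TARGET[0] * 0.4, 0.5 * t / T),
            plan[1] * 0.6 + rng.gauss(TARGET[1] * 0.4, 0.5 * t / T))

def rollout(plan, t, rng):                       # greedily finish denoising, then score
    while t > 0:
        plan = denoise_step(plan, t, rng); t -= 1
    return -math.dist(plan, TARGET)

class Node:
    def __init__(self, plan, t):
        self.plan, self.t = plan, t
        self.children, self.visits, self.value = [], 0, 0.0

def mctd(iters=200, branch=3, c=0.5, seed=0):
    rng = random.Random(seed)
    root = Node((0.0, 0.0), T)
    for _ in range(iters):
        node, path = root, [root]
        while node.children and node.t > 0:      # UCB selection over partial plans
            node = max(node.children, key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
            path.append(node)
        if node.t > 0:                           # expand: one more denoising step
            for _ in range(branch):
                node.children.append(Node(denoise_step(node.plan, node.t, rng), node.t - 1))
            node = rng.choice(node.children); path.append(node)
        r = rollout(node.plan, node.t, rng)
        for n in path:                           # backup along the selected path
            n.visits += 1; n.value += (r - n.value) / n.visits
    best = max(root.children, key=lambda ch: ch.visits)
    return best.plan, best.value

print(mctd())   # more iters (= more TTC) yields better plans
```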

Updated: 2025-02-11 02:51:42

标题: 蒙特卡洛树扩散用于系统2规划

摘要: 扩散模型最近已经成为规划的强大工具。然而,与性能随额外测试时间计算(TTC)自然提升的蒙特卡罗树搜索(MCTS)不同,标准的基于扩散的规划器仅提供有限的TTC可扩展性途径。在本文中,我们介绍了蒙特卡罗树扩散(MCTD),这是一个将扩散模型的生成力量与MCTS的自适应搜索能力相结合的新框架。我们的方法将去噪重新构想为一个树结构过程,允许部分去噪计划被迭代地评估、修剪和改进。通过选择性地扩展有希望的轨迹,同时保留重新访问和改进次优分支的灵活性,MCTD在扩散框架内实现了MCTS的好处,例如控制探索-利用权衡。对具有挑战性的长期任务的经验结果表明,MCTD优于扩散基线,在TTC增加时产生更高质量的解决方案。

更新时间: 2025-02-11 02:51:42

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07202v1

Fixed-Confidence Best Arm Identification with Decreasing Variance

We focus on the problem of best-arm identification in a stochastic multi-arm bandit with temporally decreasing variances for the arms' rewards. We model arm rewards as Gaussian random variables with fixed means and variances that decrease with time. The cost incurred by the learner is modeled as a weighted sum of the time needed by the learner to identify the best arm, and the number of samples of arms collected by the learner before termination. Under this cost function, there is an incentive for the learner to not sample arms in all rounds, especially in the initial rounds. On the other hand, not sampling increases the termination time of the learner, which also increases cost. This trade-off necessitates new sampling strategies. We propose two policies. The first policy has an initial wait period with no sampling followed by continuous sampling. The second policy samples periodically and uses a weighted average of the rewards observed to identify the best arm. We provide analytical guarantees on the performance of both policies and supplement our theoretical results with simulations which show that our polices outperform the state-of-the-art policies for the classical best arm identification problem.
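
The second policy is easy to simulate: sample only every few rounds, weight each observation by its precision (variance is assumed to decay as 1/t here), and stop once the confidence interval of the leading arm clears the runner-up. The decay schedule, stopping rule, and constants below are illustrative, not the paper's exact analysis.

```python
import numpy as np

def identify_best_arm(means, period=2, conf=3.0, horizon=4000, seed=0):
    """Periodic sampling with precision-weighted means (sketch)."""
    rng = np.random.default_rng(seed)
    k = len(means)
    wsum, wx = np.zeros(k), np.zeros(k)
    for t in range(1, horizon + 1):
        if t % period:                          # skip rounds to save sampling cost
            continue
        for a in range(k):
            x = rng.normal(means[a], 1.0 / np.sqrt(t))   # variance ~ 1/t (assumed)
            wsum[a] += t                        # precision weight = 1 / variance = t
            wx[a] += t * x
        est = wx / wsum
        width = conf / np.sqrt(wsum)            # std of a precision-weighted mean
        order = np.argsort(est)
        if est[order[-1]] - width[order[-1]] > est[order[-2]] + width[order[-2]]:
            return int(order[-1]), t            # identified best arm, stopping time
    return int(np.argmax(wx / wsum)), horizon

print(identify_best_arm([0.0, 0.2, 0.5]))       # expect arm 2
```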

Updated: 2025-02-11 02:47:20

标题: 具有递减方差的固定置信度最佳臂识别

摘要: 我们关注的问题是在具有递减方差的随机多臂赌博机中进行最佳臂识别。我们将臂奖励建模为具有固定均值和随时间递减方差的高斯随机变量。学习者所产生的成本被建模为学习者识别最佳臂所需的时间与终止前学习者收集的臂样本数量的加权和。在这个成本函数下,学习者有动机不在所有回合中对臂进行取样,尤其是在初始回合中。另一方面,不取样会增加学习者的终止时间,也会增加成本。这种权衡需要新的取样策略。我们提出了两种策略。第一种策略有一个初始等待期不进行取样,然后进行连续取样。第二种策略定期取样,并使用观察到的奖励的加权平均来识别最佳臂。我们提供了关于这两种策略性能的解析保证,并通过模拟结果补充我们的理论结果,表明我们的策略在经典最佳臂识别问题中胜过了现有的最先进策略。

更新时间: 2025-02-11 02:47:20

领域: cs.LG,cs.IT,math.IT,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2502.07199v1

Dense Object Detection Based on De-homogenized Queries

Dense object detection is widely used in automatic driving, video surveillance, and other fields. This paper focuses on the challenging task of dense object detection. Currently, detection methods based on greedy algorithms, such as non-maximum suppression (NMS), often produce many repetitive predictions or missed detections in dense scenarios, which is a common problem faced by NMS-based algorithms. Through the end-to-end DETR (DEtection TRansformer), as a type of detector that can incorporate the post-processing de-duplication capability of NMS, etc., into the network, we found that homogeneous queries in the query-based detector lead to a reduction in the de-duplication capability of the network and the learning efficiency of the encoder, resulting in duplicate prediction and missed detection problems. To solve this problem, we propose learnable differentiated encoding to de-homogenize the queries, and at the same time, queries can communicate with each other via differentiated encoding information, replacing the previous self-attention among the queries. In addition, we used joint loss on the output of the encoder that considered both location and confidence prediction to give a higher-quality initialization for queries. Without cumbersome decoder stacking and guaranteeing accuracy, our proposed end-to-end detection framework was more concise and reduced the number of parameters by about 8% compared to deformable DETR. Our method achieved excellent results on the challenging CrowdHuman dataset with 93.6% average precision (AP), 39.2% MR-2, and 84.3% JI. The performance overperformed previous SOTA methods, such as Iter-E2EDet (Progressive End-to-End Object Detection) and MIP (One proposal, Multiple predictions). In addition, our method is more robust in various scenarios with different densities.

Updated: 2025-02-11 02:36:10

标题: 基于去均匀化查询的密集目标检测

摘要: 密集物体检测广泛应用于自动驾驶、视频监控等领域。本文关注密集物体检测这一具有挑战性的任务。目前,基于贪婪算法的检测方法,如非极大值抑制(NMS),在密集场景中常常会产生许多重复预测或漏检,这是NMS算法面临的普遍问题。通过端到端的DETR(DEtection TRansformer),作为一种可以将NMS等后处理去重能力纳入网络的检测器类型,我们发现基于查询的检测器中的同质查询会降低网络的去重能力和编码器的学习效率,导致重复预测和漏检问题。为了解决这个问题,我们提出了可学习的差异化编码来去除查询的同质性,并同时,查询可以通过差异化编码信息相互通信,取代以前查询之间的自注意力。此外,我们在编码器的输出上使用联合损失,考虑了位置和置信度预测,为查询提供了更高质量的初始化。与Deformable DETR相比,我们提出的端到端检测框架更为简洁,减少了约8%的参数数量,无需繁琐的解码器堆叠和保证准确性。我们的方法在具有挑战性的CrowdHuman数据集上取得了出色的结果,平均精度(AP)为93.6%,MR-2为39.2%,JI为84.3%。性能超过了之前的SOTA方法,如Iter-E2EDet(渐进式端到端目标检测)和MIP(一个提议,多个预测)。此外,我们的方法在不同密度的各种场景中更加稳健。

更新时间: 2025-02-11 02:36:10

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07194v1

Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits

Reinforcement Learning from Human Feedback (RLHF) is a widely used approach for aligning Large Language Models (LLMs) with human preferences. While recent advancements have provided valuable insights into various stages and settings of RLHF, a comprehensive theoretical understanding of the entire RLHF pipeline remains lacking. Towards this end, we propose a unified framework for the RLHF pipeline from the view of contextual bandits and provide provable efficiency guarantees. In particular, we decompose the RLHF process into two distinct stages: (post-)training and deployment, exploring both passive and active data collection strategies during the training phase. By employing the Bradley-Terry preference model with a linearly parameterized reward function, we reformulate RLHF as a contextual preference bandit problem. We then develop novel algorithms for each stage, demonstrating significant improvements over existing approaches in both statistical and computational efficiency. Finally, we apply our method to train and deploy Llama-3-8B-Instruct on the Ultrafeedback-binarized dataset, and empirical results confirm the effectiveness of our approach.
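
The bandit reformulation is concrete enough to sketch: with a linearly parameterized reward r(x) = theta^T phi(x), the Bradley-Terry log-likelihood of pairwise preferences reduces to logistic regression on feature differences. The features and data below are synthetic, and the paper's sampling strategies and guarantees are not reproduced.

```python
import numpy as np

def fit_bt_linear(phi_w, phi_l, lr=0.1, steps=500):
    """Bradley-Terry MLE for a linear reward, given feature vectors of
    preferred (phi_w) and rejected (phi_l) responses."""
    theta = np.zeros(phi_w.shape[1])
    for _ in range(steps):
        margin = (phi_w - phi_l) @ theta
        p = 1.0 / (1.0 + np.exp(-margin))           # P(winner preferred)
        grad = (phi_w - phi_l).T @ (1.0 - p) / len(p)
        theta += lr * grad                          # gradient ascent on log-likelihood
    return theta

# Toy preferences generated from a hidden reward direction.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=8)
A, B = rng.normal(size=(500, 8)), rng.normal(size=(500, 8))
win = (A @ true_theta > B @ true_theta)
phi_w = np.where(win[:, None], A, B)
phi_l = np.where(win[:, None], B, A)
theta = fit_bt_linear(phi_w, phi_l)
print(np.corrcoef(theta, true_theta)[0, 1])        # close to 1
```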

Updated: 2025-02-11 02:36:01

标题: 可证明高效的RLHF管道:从情境赌博机的统一视角

摘要: 人类反馈的强化学习(RLHF)是一种广泛应用的方法,用于使大型语言模型(LLMs)与人类偏好保持一致。尽管最近的进展为RLHF的各个阶段和设置提供了有价值的见解,但对整个RLHF流程的全面理论理解仍然不足。为此,我们提出了一个统一的框架,从上下文赌博机的视角为RLHF流程提供可证明的效率保证。具体来说,我们将RLHF过程分解为两个不同的阶段:(后续)训练和部署,在训练阶段探索了被动和主动数据收集策略。通过使用布拉德利-特里偏好模型和线性参数化的奖励函数,我们将RLHF重新表述为上下文偏好赌博机问题。然后我们为每个阶段开发了新算法,展示了在统计和计算效率方面相比现有方法的显著改进。最后,我们将我们的方法应用于训练和部署Llama-3-8B-Instruct模型,使用Ultrafeedback-binarized数据集,实证结果证实了我们方法的有效性。

更新时间: 2025-02-11 02:36:01

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07193v1

Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge. In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. In this paper, we analyze the challenges LLMs face in demonstrating fluid intelligence through controlled experiments, using the most representative ARC task as an example. Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding. Our data and code can be found in https://wujunjie1998.github.io/araoc-benchmark.github.io/.

Updated: 2025-02-11 02:31:09

标题: 理解LLMs的流体智力缺陷:对ARC任务的分析

摘要: 尽管LLMs在各种NLP任务中表现出色,但值得注意的是,大多数任务依赖于利用LLMs参数中编码的大量知识,而不是在没有先验知识的情况下解决新问题。在认知研究中,后者的能力被称为流体智力,被认为对评估人类智力至关重要。最近对流体智力评估的研究突出了LLMs能力的显著缺陷。在本文中,我们通过使用最具代表性的ARC任务作为例子,分析了LLMs在展示流体智力方面面临的挑战。我们的研究揭示了现有LLMs的三个主要局限性:技能组合能力有限,对抽象输入格式不熟悉,以及左到右解码的固有缺陷。我们的数据和代码可以在https://wujunjie1998.github.io/araoc-benchmark.github.io/找到。

更新时间: 2025-02-11 02:31:09

领域: cs.AI

下载: http://arxiv.org/abs/2502.07190v1

Exploring Neural Network Pruning with Screening Methods

Deep neural networks (DNNs) such as convolutional neural networks (CNNs) for visual tasks, recurrent neural networks (RNNs) for sequence data, and transformer models for rich linguistic or multimodal tasks, achieved unprecedented performance on a wide range of tasks. The impressive performance of modern DNNs is partially attributed to their sheer scale. The latest deep learning models have tens to hundreds of millions of parameters which makes the inference processes resource-intensive. The high computational complexity of these networks prevents their deployment on resource-limited devices such as mobile platforms, IoT devices, and edge computing systems because these devices require energy-efficient and real-time processing capabilities. This paper proposes and evaluates a network pruning framework that eliminates non-essential parameters based on a statistical analysis of network component significance across classification categories. The proposed method uses screening methods coupled with a weighted scheme to assess connection and channel contributions for unstructured and structured pruning which allows for the elimination of unnecessary network elements without significantly degrading model performance. Extensive experimental validation on real-world vision datasets for both fully connected neural networks (FNNs) and CNNs has shown that the proposed framework produces competitive lean networks compared to the original networks. Moreover, the proposed framework outperforms state-of-art network pruning methods in two out of three cases.
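
One way to read the scoring idea in code: screen each output channel with a classical statistic (here an ANOVA F-test of its activations across classes), blend it with weight magnitude under a weighting scheme, and zero out the lowest-scoring channels. The specific blend and threshold are assumptions standing in for the paper's exact scoring.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import f_classif

def screen_and_prune(layer, acts, labels, alpha=0.5, keep=0.5):
    """Score output channels by class relevance + magnitude, prune the rest."""
    f_stat, _ = f_classif(acts, labels)               # per-channel class separation
    mag = layer.weight.detach().abs().mean(dim=1).numpy()
    score = (alpha * f_stat / (f_stat.max() + 1e-12)
             + (1 - alpha) * mag / (mag.max() + 1e-12))
    n_keep = int(keep * len(score))
    drop = torch.from_numpy(np.argsort(score)[:len(score) - n_keep].copy())
    with torch.no_grad():                             # structured prune: zero whole channels
        layer.weight[drop] = 0.0
        if layer.bias is not None:
            layer.bias[drop] = 0.0
    return drop

layer = nn.Linear(16, 32)
acts = layer(torch.randn(512, 16)).detach().numpy()   # recorded activations
labels = np.random.randint(0, 10, 512)                # class of each sample
print("pruned channels:", screen_and_prune(layer, acts, labels).numel())
```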

Updated: 2025-02-11 02:31:04

标题: 使用筛选方法探索神经网络剪枝

摘要: 深度神经网络(DNNs)如卷积神经网络(CNNs)用于视觉任务,递归神经网络(RNNs)用于序列数据,以及变压器模型用于丰富的语言或多模态任务,在广泛的任务上取得了前所未有的性能。现代DNNs的出色性能部分归因于它们的庞大规模。最新的深度学习模型拥有数千万到数亿个参数,这使得推断过程变得资源密集。这些网络的高计算复杂性阻止了它们在资源有限的设备上部署,如移动平台、物联网设备和边缘计算系统,因为这些设备需要能效高和实时处理能力。本文提出并评估了一个网络修剪框架,基于对网络组件在各分类类别上重要性的统计分析来消除非必要参数。所提出的方法使用筛选方法结合加权方案来评估连接和通道对非结构化和结构化修剪的贡献,从而实现消除不必要的网络元素而不显著降低模型性能。在真实视觉数据集上进行的大量实验验证表明,所提出的框架相对于原始网络产生了具有竞争力的精简网络,无论是对于全连接神经网络(FNNs)还是CNNs。此外,所提出的框架在三种情况中的两种情况下优于现有最先进的网络修剪方法。

更新时间: 2025-02-11 02:31:04

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07189v1

Local Regularizers Are Not Transductive Learners

We partly resolve an open question raised by Asilis et al. (COLT 2024): whether the algorithmic template of local regularization -- an intriguing generalization of explicit regularization, a.k.a. structural risk minimization -- suffices to learn all learnable multiclass problems. Specifically, we provide a negative answer to this question in the transductive model of learning. We exhibit a multiclass classification problem which is learnable in both the transductive and PAC models, yet cannot be learned transductively by any local regularizer. The corresponding hypothesis class, and our proof, are based on principles from cryptographic secret sharing. We outline challenges in extending our negative result to the PAC model, leaving open the tantalizing possibility of a PAC/transductive separation with respect to local regularization.

Updated: 2025-02-11 02:28:35

标题: 本地正则化器不是传导学习者

摘要: 我们在某种程度上解决了Asilis等人提出的一个开放问题(COLT 2024):即本地正则化的算法模板 - 显式正则化的有趣泛化,也称为结构风险最小化 - 是否足以学习所有可学习的多类问题。具体来说,我们在学习的传导模型中对这个问题给出了一个否定答案。我们展示了一个多类分类问题,在传导和PAC模型中都是可学习的,但不能被任何本地正则化器传导学习。相应的假设类和我们的证明基于密码秘密共享原理。我们概述了将我们的负面结果扩展到PAC模型的挑战,这留下了一种令人心动的可能性:在本地正则化方面存在PAC/传导分离。

更新时间: 2025-02-11 02:28:35

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2502.07187v1

Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs

Zero-shot LLMs are now also used for textual classification tasks, e.g., sentiment/emotion detection of a given input as a sentence/article. However, their performance can be suboptimal in such data annotation tasks. We introduce a novel technique Perceived Confidence Scoring (PCS) that evaluates LLM's confidence for its classification of an input by leveraging Metamorphic Relations (MRs). The MRs generate semantically equivalent yet textually mutated versions of the input. Following the principles of Metamorphic Testing (MT), the mutated versions are expected to have annotation labels similar to the input. By analyzing the consistency of LLM responses across these variations, PCS computes a confidence score based on the frequency of predicted labels. PCS can be used both for single LLM and multiple LLM settings (e.g., majority voting). We introduce an algorithm Perceived Differential Evolution (PDE) that determines the optimal weights assigned to the MRs and the LLMs for a classification task. Empirical evaluation shows PCS significantly improves zero-shot accuracy for Llama-3-8B-Instruct (4.96%) and Mistral-7B-Instruct-v0.3 (10.52%), with Gemma-2-9b-it showing a 9.39% gain. When combining all three models, PCS significantly outperforms majority voting by 7.75%.
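
The scoring loop itself is compact; the sketch below uses a stub classifier in place of an LLM call and two trivially label-preserving mutations as metamorphic relations. Majority voting over the variants yields the confidence score; the PDE weight-learning step is not shown.

```python
from collections import Counter

def perceived_confidence(classify, text, mutate_fns):
    """Classify the input plus metamorphic variants that should preserve the
    label; report the majority label and the agreement frequency."""
    variants = [text] + [m(text) for m in mutate_fns]
    labels = [classify(v) for v in variants]
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)

# Toy stand-ins: a keyword classifier and simple label-preserving mutations.
classify = lambda t: "positive" if "good" in t.lower() else "negative"
mutations = [str.upper, lambda t: t + " Indeed.", lambda t: "Honestly, " + t]
print(perceived_confidence(classify, "The movie was good.", mutations))
# ('positive', 1.0)
```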

Updated: 2025-02-11 02:25:44

标题: 使用零样本LLMs进行数据标注的感知置信度评分

摘要: 零样本LLM现在也被用于文本分类任务,例如对给定的句子/文章输入进行情感/情绪检测。然而,在这些数据标注任务中,它们的性能可能不够理想。我们引入了一种新颖的技术Perceived Confidence Scoring (PCS),通过利用Metamorphic Relations (MRs)评估LLM对输入的分类的置信度。MRs生成语义等效但文本变异的输入版本。遵循Metamorphic Testing (MT)的原则,期望变异版本具有与输入类似的注释标签。通过分析LLM在这些变异版本上的响应一致性,PCS根据预测标签的频率计算置信度得分。PCS可用于单个LLM和多个LLM设置(例如,多数投票)。我们引入了一种算法Perceived Differential Evolution (PDE),用于确定分配给分类任务的MRs和LLMs的最佳权重。经验评估显示,PCS显著提高了Llama-3-8B-Instruct(4.96%)和Mistral-7B-Instruct-v0.3(10.52%)的零样本准确率,Gemma-2-9b-it显示出了9.39%的增益。当结合所有三个模型时,PCS比多数投票显著提高了7.75%。

更新时间: 2025-02-11 02:25:44

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07186v1

Refine Knowledge of Large Language Models via Adaptive Contrastive Learning

How to alleviate the hallucinations of Large Language Models (LLMs) has always been the fundamental goal pursued by the LLMs research community. Looking through numerous hallucination-related studies, a mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of LLMs to change their output. Considering that the core focus of these works is the knowledge acquired by models, and knowledge has long been a central theme in human societal progress, we believe that the process of models refining knowledge can greatly benefit from the way humans learn. In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy. Our method flexibly constructs different positive and negative samples for contrastive learning based on LLMs' actual mastery of knowledge. This strategy helps LLMs consolidate the correct knowledge they already possess, deepen their understanding of the correct knowledge they have encountered but not fully grasped, forget the incorrect knowledge they previously learned, and honestly acknowledge the knowledge they lack. Extensive experiments and detailed analyses on widely used datasets demonstrate the effectiveness of our method.

Updated: 2025-02-11 02:19:13

标题: 通过自适应对比学习来改进大型语言模型的知识

摘要: 大语言模型(LLMs)如何缓解幻觉一直是LLMs研究界所追求的根本目标。通过查阅大量与幻觉相关的研究,一种主流方法类别是通过优化LLMs的知识表示来改变它们的输出,从而减少幻觉。考虑到这些工作的核心焦点是模型获取的知识,并且知识长期以来一直是人类社会进步的核心主题,我们相信模型改进知识的过程可以极大地受益于人类学习的方式。在我们的工作中,通过模仿人类学习过程,我们设计了一种自适应对比学习策略。我们的方法根据LLMs对知识的实际掌握情况灵活构建不同的正负样本用于对比学习。这种策略有助于LLMs巩固他们已经拥有的正确知识,加深他们对已经遇到但尚未完全掌握的正确知识的理解,忘记他们之前学过的错误知识,并诚实地承认他们缺乏的知识。对广泛使用的数据集进行的大量实验和详细分析证明了我们方法的有效性。

更新时间: 2025-02-11 02:19:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07184v1

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, leading to broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address these gaps, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and develop a protocol to automatically generate labels for sub-image level retrieval. Essentially, MMNeedle evaluates MLLMs by stress-testing their capability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup necessitates an advanced understanding of extensive visual contexts and effective information retrieval within long-context image inputs. With this benchmark, we evaluate state-of-the-art MLLMs, encompassing both API-based and open-source models. The findings reveal that GPT-4o consistently surpasses other models in long-context scenarios, but suffers from hallucination problems in negative samples, i.e., when needles are not in the haystacks. Our comprehensive long-context evaluation of MLLMs also sheds lights on the considerable performance gap between API-based and open-source models. All the code, data, and instructions required to reproduce the main results are available at https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack.

Updated: 2025-02-11 02:17:24

标题: 干草堆中的多模态针:多模态大型语言模型长上下文能力的基准测试

摘要: 多模态大型语言模型(MLLMs)在各种应用中展示出显著的潜力,引起了研究人员和从业者的广泛兴趣。然而,对它们长上下文能力的全面评估仍未得到充分探讨。为了填补这些空白,我们引入了MultiModal Needle-in-a-haystack(MMNeedle)基准,专门设计用来评估MLLMs的长上下文能力。除了多图像输入外,我们还采用图像拼接来进一步增加输入上下文长度,并制定了一个协议,用于自动生成子图像级别的检索标签。基本上,MMNeedle通过压力测试MLLMs依据文本指令和图像内容描述在一组图像(干草堆)中定位目标子图像(针)的能力来对其进行评估。这一设置需要对广泛的视觉背景有着高级的理解,并在长上下文图像输入中进行有效的信息检索。通过这个基准,我们评估了最先进的MLLMs,包括基于API和开源模型。研究结果显示,GPT-4o在长上下文情景中始终优于其他模型,但在负样本中存在幻觉问题,即当针不在干草堆中时。我们对MLLMs的全面长上下文评估还揭示了基于API和开源模型之间的显著性能差距。重现主要结果所需的所有代码、数据和说明均可在https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack找到。

更新时间: 2025-02-11 02:17:24

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.11230v2

RoboBERT: An End-to-end Multimodal Robotic Manipulation Model

Embodied intelligence integrates multiple modalities, enabling agents to understand images, language, and actions simultaneously. However, existing models always depend on additional datasets or extensive pre-training to maximize performance improvements, consuming abundant training time and expensive hardware cost. To tackle this issue, we present RoboBERT, a novel end-to-end robotic manipulation model integrated with a unique training strategy. This model utilizes a CNN-based diffusion policy, enhancing and stabilizing the effectiveness of this model by separating training processes for different modalities. It also underscores the importance of data augmentation, verifying various techniques to significantly boost performance. Unlike models that depend on extra data or large foundation models, RoboBERT achieves a highly competitive success rate while using only language-labeled expert demonstrations and maintaining a relatively smaller model size. Specifically, RoboBERT achieves an average length of 4.52 on the CALVIN benchmark for \(ABCD \rightarrow D\) task, setting a new state-of-the-art (SOTA) record. Furthermore, when tested on a real robot, the model demonstrates superior performance, achieving a higher success rate than other methods trained with the same data. We propose that these concepts and methodologies of RoboBERT demonstrate extensive versatility and compatibility, contributing significantly to the development of lightweight multimodal robotic models. The code can be accessed on https://github.com/PeterWangsicheng/RoboBERT

Updated: 2025-02-11 02:16:59

标题: RoboBERT:一种端到端的多模态机器人操作模型

摘要: 具身智能整合了多种模式,使代理能够同时理解图像、语言和动作。然而,现有模型总是依赖额外的数据集或广泛的预训练来最大化性能改进,消耗大量的训练时间和昂贵的硬件成本。为了解决这个问题,我们提出了RoboBERT,一个集成了独特训练策略的全新端到端机器人操作模型。该模型利用基于CNN的扩散策略,通过为不同的模式分开训练过程来增强和稳定该模型的效果。它还强调了数据增强的重要性,验证了各种技术以显著提升性能。与依赖额外数据或大型基础模型的模型不同,RoboBERT 在仅使用语言标记的专家演示数据并保持相对较小的模型大小的情况下,实现了极具竞争力的成功率。具体来说,RoboBERT 在 CALVIN 基准测试中的 \(ABCD \rightarrow D\) 任务上实现了平均长度为 4.52,创造了新的最新技术记录。此外,在真实机器人上测试时,该模型展示了优越的性能,比其他使用相同数据训练的方法实现了更高的成功率。我们提出,RoboBERT 的这些概念和方法展示了广泛的多样性和兼容性,对轻量级多模态机器人模型的发展有重要贡献。该代码可在 https://github.com/PeterWangsicheng/RoboBERT 上获取。

更新时间: 2025-02-11 02:16:59

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2502.07837v1

Tab2Visual: Overcoming Limited Data in Tabular Data Classification Using Deep Learning with Visual Representations

This research addresses the challenge of limited data in tabular data classification, particularly prevalent in domains with constraints like healthcare. We propose Tab2Visual, a novel approach that transforms heterogeneous tabular data into visual representations, enabling the application of powerful deep learning models. Tab2Visual effectively addresses data scarcity by incorporating novel image augmentation techniques and facilitating transfer learning. We extensively evaluate the proposed approach on diverse tabular datasets, comparing its performance against a wide range of machine learning algorithms, including classical methods, tree-based ensembles, and state-of-the-art deep learning models specifically designed for tabular data. We also perform an in-depth analysis of factors influencing Tab2Visual's performance. Our experimental results demonstrate that Tab2Visual outperforms other methods in classification problems with limited tabular data.
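
One plausible rendering, for intuition only: normalize each feature and draw it as a vertical bar in a grayscale raster that a CNN (or a pretrained backbone, enabling transfer learning) can consume. The bar encoding and image size are assumptions; the paper's actual visual encoding and augmentations may differ.

```python
import numpy as np

def row_to_image(row, lo, hi, size=32):
    """Render one tabular row as a grayscale image of per-feature bars."""
    x = (np.asarray(row, dtype=float) - lo) / (hi - lo + 1e-12)   # min-max per feature
    img = np.zeros((size, size), dtype=np.float32)
    bar_w = max(size // len(x), 1)
    for i, v in enumerate(np.clip(x, 0, 1)):
        h = int(v * size)
        img[size - h:, i * bar_w:(i + 1) * bar_w] = v             # bar height and intensity
    return img

# Toy usage: 8 heterogeneous features with different scales.
data = np.random.rand(100, 8) * [10, 1, 5, 2, 8, 3, 1, 4]
lo, hi = data.min(axis=0), data.max(axis=0)
img = row_to_image(data[0], lo, hi)
print(img.shape, img.max())   # (32, 32), values in [0, 1] ready for a CNN
```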

Updated: 2025-02-11 02:12:29

标题: Tab2Visual:利用深度学习和视觉表示克服表格数据分类中的数据有限性

摘要: 这项研究解决了表格数据分类中数据有限的挑战,尤其是在具有诸如医疗保健等约束的领域中普遍存在。我们提出了Tab2Visual,一种新颖的方法,将异构表格数据转换为视觉表示,从而能够应用强大的深度学习模型。Tab2Visual通过整合新颖的图像增强技术和促进迁移学习来有效解决数据稀缺性问题。我们在多样的表格数据集上对提出的方法进行了广泛评估,将其性能与各种机器学习算法进行了比较,包括经典方法、基于树的集成方法以及专门设计用于表格数据的最新深度学习模型。我们还对影响Tab2Visual性能的因素进行了深入分析。我们的实验结果表明,Tab2Visual在具有有限表格数据的分类问题中优于其他方法。

更新时间: 2025-02-11 02:12:29

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2502.07181v1

SymbolFit: Automatic Parametric Modeling with Symbolic Regression

We introduce SymbolFit, a framework that automates parametric modeling by using symbolic regression to perform a machine-search for functions that fit the data, while simultaneously providing uncertainty estimates in a single run. Traditionally, constructing a parametric model to accurately describe binned data has been a manual and iterative process, requiring an adequate functional form to be determined before the fit can be performed. The main challenge arises when the appropriate functional forms cannot be derived from first principles, especially when there is no underlying true closed-form function for the distribution. In this work, we address this problem by utilizing symbolic regression, a machine learning technique that explores a vast space of candidate functions without needing a predefined functional form, treating the functional form itself as a trainable parameter. Our approach is demonstrated in data analysis applications in high-energy physics experiments at the CERN Large Hadron Collider (LHC). We demonstrate its effectiveness and efficiency using five real proton-proton collision datasets from new physics searches at the LHC, namely the background modeling in resonance searches for high-mass dijet, trijet, paired-dijet, diphoton, and dimuon events. We also validate the framework using several toy datasets with one and more variables.
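
The workflow can be miniaturized: propose candidate functional forms, fit each to the binned spectrum, and rank by goodness of fit. Real symbolic regression searches a vastly larger expression space and propagates fit uncertainties; the three-template library and the smooth dijet-like toy spectrum below are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

candidates = {
    "power law":   lambda x, a, b: a * x ** b,
    "exponential": lambda x, a, b: a * np.exp(b * x),
    "dijet-like":  lambda x, a, b, c: a * (1 - x) ** b / x ** c,
}

# Toy falling spectrum with 2% multiplicative noise.
x = np.linspace(0.1, 0.9, 40)
y = 2.0 * (1 - x) ** 3.0 / x ** 1.5 * np.random.default_rng(0).normal(1, 0.02, 40)

for name, f in candidates.items():
    try:
        n_params = f.__code__.co_argcount - 1          # parameters besides x
        popt, _ = curve_fit(f, x, y, p0=np.ones(n_params), maxfev=5000)
        chi2 = np.sum((f(x, *popt) - y) ** 2)
        print(f"{name:12s} chi2 = {chi2:.3g}")         # dijet-like should win
    except RuntimeError:
        print(f"{name:12s} fit failed")
```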

Updated: 2025-02-11 02:11:22

标题: SymbolFit:符号回归自动参数建模

摘要: 我们介绍了SymbolFit,这是一个框架,通过使用符号回归来执行机器搜索适合数据的函数,同时在单个运行中提供不确定性估计,从而自动化参数建模。传统上,构建一个参数模型以准确描述分箱数据一直是一个手动和迭代的过程,需要在进行拟合之前确定一个足够的函数形式。主要挑战在于当适当的函数形式无法从第一原则推导出来时,尤其是当分布没有基础的真实闭式函数时。在这项工作中,我们通过利用符号回归来解决这个问题,这是一种机器学习技术,可以探索候选函数的广泛空间,而无需预定义的函数形式,将函数形式本身视为可训练的参数。我们的方法在欧洲核子中心大型强子对撞机(LHC)的高能物理实验数据分析应用中得到展示。我们使用来自LHC新物理搜索的五个真实质子-质子碰撞数据集展示了其有效性和效率,即在高质量双喷注、三喷注、成对双喷注、双光子和双μ子事件的共振搜索中的背景建模。我们还使用几个具有一个或多个变量的玩具数据集验证了该框架。

更新时间: 2025-02-11 02:11:22

领域: hep-ex,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2411.09851v2

CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis

The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, there have not been efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration to explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset called CollabStory. We focus on single-author to multi-author (up to 5 LLMs) scenarios, where multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks that have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks for multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help propel an understanding as well as the development of new techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks since LLM-LLM collaboration could potentially overwhelm ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/CollabStory.

Updated: 2025-02-11 02:09:38

标题: CollabStory:多LLM协作故事生成和作者分析

摘要: 随着能够实现大型语言模型(LLMs)无缝互操作性的统一框架的兴起,LLM-LLM协作完成开放性任务成为可能。尽管如此,迄今为止还没有探索过这种协作写作的努力。我们超越了人类-LLM协作的下一步,通过生成首个专门由LLM生成的协作故事数据集CollabStory来探索这种多LLM场景。我们专注于单作者到多作者(最多5个LLM)情景,其中多个LLM共同撰写故事。我们使用开源指令调整的LLM生成了超过32k个故事。此外,我们从为人类-人类多作者写作任务和分析设定标准的PAN任务中获得灵感。我们为多LLM设置扩展了与作者相关的任务,并提出了LLM-LLM协作的基线。我们发现当前的基线无法处理这种新兴情景。因此,CollabStory是一个资源,可以帮助推动对多个LLM使用的理解,以及开发用于辨别多个LLM使用的新技术。在写作任务的背景下研究这一点至关重要,因为LLM-LLM协作可能加剧与抄袭检测、功劳归属、在教育环境中维护学术诚信以及解决版权侵权问题相关的持续挑战。我们将我们的数据集和代码提供在https://github.com/saranya-venkatraman/CollabStory。

更新时间: 2025-02-11 02:09:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.12665v3

Improved YOLOv7 model for insulator defect detection

Insulators are crucial insulation components and structural supports in power grids, playing a vital role in the transmission lines. Due to temperature fluctuations, internal stress, or damage from hail, insulators are prone to damage. Automatic detection of damaged insulators faces challenges such as diverse types, small defect targets, and complex backgrounds and shapes. Most research for detecting insulator defects has focused on a single defect type or a specific material. However, the insulators in the grid's transmission lines have different colors and materials. Various insulator defects coexist, and the existing methods have difficulty meeting the practical application requirements. Current methods suffer from low detection accuracy, and their mAP_0.5 cannot meet application requirements. This paper proposes an improved YOLOv7 model for multi-type insulator defect detection. First, our model replaces the SPPCSPC module with the RFB module to enhance the network's feature extraction capability. Second, a CA mechanism is introduced into the head part to enhance the network's feature representation ability and to improve detection accuracy. Third, a WIoU loss function is employed to address the low-quality samples hindering model generalization during training, thereby improving the model's overall performance. The experimental results indicate that the proposed model exhibits enhancements across various performance metrics. Specifically, there is a 1.6% advancement in mAP_0.5, a corresponding 1.6% enhancement in mAP_0.5:0.95, a 1.3% elevation in precision, and a 1% increase in recall. Moreover, the model achieves a parameter reduction of 3.2 million, leading to a decrease of 2.5 GFLOPS in computational cost. Notably, there is also an improvement of 2.81 milliseconds in single-image detection speed.

Updated: 2025-02-11 02:09:30

标题: 改进的YOLOv7模型用于绝缘子缺陷检测

摘要: 绝缘子是电网中至关重要的绝缘组件和结构支撑,对输电线路起着至关重要的作用。由于温度波动、内部应力或冰雹造成的损坏,绝缘子容易受到损伤。损坏绝缘子的自动检测面临着诸如多样化的类型、小缺陷目标和复杂的背景和形状等挑战。大多数用于检测绝缘子缺陷的研究都集中在单一缺陷类型或特定材料上。然而,电网输电线路中的绝缘子具有不同的颜色和材料。各种绝缘子缺陷同时存在,现有方法难以满足实际应用需求。当前方法存在低检测精度和mAP0.5无法满足应用需求的问题。本文提出了一个改进的YOLOv7模型,用于多类型绝缘子缺陷检测。首先,我们的模型用RFB模块替换了SPPCSPC模块,以增强网络的特征提取能力。其次,在头部引入了一个CA机制,以增强网络的特征表示能力并提高检测准确性。第三,采用了WIoU损失函数来解决训练过程中阻碍模型泛化的低质量样本问题,从而提高模型的整体表现。实验结果表明,所提出的模型在各种性能指标上都有所增强。具体而言,mAP_0.5提升了1.6%,对应mAP_0.5:0.95提升了1.6%,精度提升了1.3%,召回率提升了1%。此外,该模型减少了320万个参数,计算成本减少了2.5 GFLOPS。值得注意的是,单张图像检测速度也提高了2.81毫秒。

更新时间: 2025-02-11 02:09:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07179v1

OpenRANet: Neuralized Spectrum Access by Joint Subcarrier and Power Allocation with Optimization-based Deep Learning

The next-generation radio access network (RAN), known as Open RAN, is poised to feature an AI-native interface for wireless cellular networks, including emerging satellite-terrestrial systems, making deep learning integral to its operation. In this paper, we address the nonconvex optimization challenge of joint subcarrier and power allocation in Open RAN, with the objective of minimizing the total power consumption while ensuring users meet their transmission data rate requirements. We propose OpenRANet, an optimization-based deep learning model that integrates machine-learning techniques with iterative optimization algorithms. We start by transforming the original nonconvex problem into convex subproblems through decoupling, variable transformation, and relaxation techniques. These subproblems are then efficiently solved using iterative methods within the standard interference function framework, enabling the derivation of primal-dual solutions. These solutions integrate seamlessly as a convex optimization layer within OpenRANet, enhancing constraint adherence, solution accuracy, and computational efficiency by combining machine learning with convex analysis, as shown in numerical experiments. OpenRANet also serves as a foundation for designing resource-constrained AI-native wireless optimization strategies for broader scenarios like multi-cell systems, satellite-terrestrial networks, and future Open RAN deployments with complex power consumption requirements.

Updated: 2025-02-11 02:06:03

标题: OpenRANet: 通过基于优化的深度学习进行联合子载波和功率分配的神经化频谱访问

摘要: 下一代无线电接入网络(RAN),即Open RAN,有望在无线蜂窝网络中引入AI原生接口,包括新兴的卫星-地面系统,使深度学习成为其运行的重要组成部分。本文解决了Open RAN中联合子载波和功率分配的非凸优化挑战,目标是在确保用户满足传输数据速率要求的同时,最小化总功耗。我们提出了OpenRANet,这是一个基于优化的深度学习模型,将机器学习技术与迭代优化算法相结合。我们通过解耦、变量转换和松弛技术将原始的非凸问题转化为凸子问题。然后,通过在标准干扰函数框架内使用迭代方法有效地解决这些子问题,从而实现原始-对偶解的导出。这些解无缝集成为OpenRANet中的凸优化层,通过将机器学习与凸分析相结合,在数值实验中显示出增强的约束遵守性、解决方案准确性和计算效率。OpenRANet还可作为设计面向资源受限的AI原生无线优化策略的基础,适用于更广泛的场景,如多小区系统、卫星-地面网络以及具有复杂功耗需求的未来Open RAN部署。

更新时间: 2025-02-11 02:06:03

领域: cs.IT,cs.AI,math.IT

下载: http://arxiv.org/abs/2409.12964v3

MatrixKAN: Parallelized Kolmogorov-Arnold Network

Kolmogorov-Arnold Networks (KAN) are a new class of neural network architecture representing a promising alternative to the Multilayer Perceptron (MLP), demonstrating improved expressiveness and interpretability. However, KANs suffer from slow training and inference speeds relative to MLPs due in part to the recursive nature of the underlying B-spline calculations. This issue is particularly apparent with respect to KANs utilizing high-degree B-splines, as the number of required non-parallelizable recursions is proportional to B-spline degree. We solve this issue by proposing MatrixKAN, a novel optimization that parallelizes B-spline calculations with matrix representation and operations, thus significantly improving effective computation time for models utilizing high-degree B-splines. In this paper, we demonstrate the superior scaling of MatrixKAN's computation time relative to B-spline degree. Further, our experiments demonstrate speedups of approximately 40x relative to KAN, with significant additional speedup potential for larger datasets or higher spline degrees.
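
The matrix trick is easiest to see for uniform cubic B-splines, where the four active basis values on a knot span are `[1, u, u^2, u^3] @ M` for a fixed 4x4 matrix, so evaluation over all inputs becomes one batched matmul instead of a degree-deep recursion. The sketch below fixes degree 3 on a uniform grid; MatrixKAN generalizes the same idea to arbitrary degree.

```python
import torch

# Uniform cubic B-spline basis matrix: [1, u, u^2, u^3] @ M yields the four
# active basis values on a knot span -- no Cox-de Boor recursion needed.
M = torch.tensor([[ 1.,  4.,  1., 0.],
                  [-3.,  0.,  3., 0.],
                  [ 3., -6.,  3., 0.],
                  [-1.,  3., -3., 1.]]) / 6.0

def bspline_eval(x, coef, n_spans):
    """Evaluate a uniform cubic spline with control values `coef`
    (length n_spans + 3) at points x in [0, n_spans), fully in parallel."""
    i = x.floor().clamp(0, n_spans - 1).long()            # knot-span index
    u = (x - i.float()).unsqueeze(-1)                     # local coordinate in [0, 1)
    powers = torch.cat([torch.ones_like(u), u, u**2, u**3], dim=-1)
    basis = powers @ M                                    # (n_points, 4) in one matmul
    idx = i.unsqueeze(-1) + torch.arange(4)               # the 4 active control points
    return (basis * coef[idx]).sum(-1)

coef = torch.randn(8 + 3)
x = torch.rand(10_000) * 8
print(bspline_eval(x, coef, n_spans=8).shape)   # torch.Size([10000])
```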

Updated: 2025-02-11 01:59:46

标题: MatrixKAN: 并行化的 Kolmogorov-Arnold 网络

摘要: 科尔莫戈洛夫-阿诺德网络(KAN)是一种新型的神经网络架构,代表了对多层感知器(MLP)有希望的替代方案,表现出更好的表达能力和可解释性。然而,由于基础B样条计算的递归性质,KAN在训练和推断速度上相对于MLP存在较慢的问题。这个问题在使用高阶B样条的KAN中尤为明显,因为所需的不可并行化递归次数与B样条阶数成正比。我们通过提出MatrixKAN来解决这个问题,这是一种新颖的优化方法,通过矩阵表示和操作来并行化B样条计算,从而显著提高了使用高阶B样条模型的有效计算时间。在本文中,我们展示了MatrixKAN相对于B样条阶数的计算时间具有更好的扩展性。此外,我们的实验结果显示,相对于KAN,我们实现了大约40倍的加速,并且对于更大的数据集或更高的样条阶数,还有显著的额外加速潜力。

更新时间: 2025-02-11 01:59:46

领域: cs.LG

下载: http://arxiv.org/abs/2502.07176v1

Foreign-Object Detection in High-Voltage Transmission Line Based on Improved YOLOv8m

The safe operation of high-voltage transmission lines ensures the power grid's security. Various foreign objects attached to the transmission lines, such as balloons, kites and nesting birds, can significantly affect the safe and stable operation of high-voltage transmission lines. With the advancement of computer vision technology, periodic automatic inspection of foreign objects is efficient and necessary. Existing detection methods have low accuracy because foreign objects attached to the transmission lines are complex, including occlusions, diverse object types, significant scale variations, and complex backgrounds. In response to the practical needs of the Yunnan Branch of China Southern Power Grid Co., Ltd., this paper proposes an improved YOLOv8m-based model for detecting foreign objects on transmission lines. Experiments are conducted on a dataset collected from Yunnan Power Grid. The proposed model enhances the original YOLOv8m by incorporating a Global Attention Module (GAM) into the backbone to focus on occluded foreign objects, replacing the SPPF module with the SPPCSPC module to augment the model's multiscale feature extraction capability, and introducing the Focal-EIoU loss function to address the issue of high- and low-quality sample imbalances. These improvements accelerate model convergence and enhance detection accuracy. The experimental results demonstrate that our proposed model achieves a 2.7% increase in mAP_0.5, a 4% increase in mAP_0.5:0.95, and a 6% increase in recall.

Updated: 2025-02-11 01:58:32

标题: 基于改进的YOLOv8m的高压输电线外物检测

摘要: 高压输电线路的安全运行确保了电网的安全性。附着在输电线路上的各种外来物体,如气球、风筝和筑巢鸟类,可能会显著影响高压输电线路的安全稳定运行。随着计算机视觉技术的进步,定期自动检测外来物体是高效且必要的。现有的检测方法准确率较低,因为附着在输电线路上的外来物体复杂,包括遮挡、多样化物体类型、显著的尺度变化和复杂的背景。针对中国南方电网云南分公司的实际需求,本文提出了一种基于改进的YOLOv8m模型用于检测输电线路上的外来物体。实验使用从云南电网收集的数据集进行。提出的模型通过将全局注意力模块(GAM)整合到主干中,以便专注于被遮挡的外来物体,用SPPCSPC模块取代SPPF模块以增强模型的多尺度特征提取能力,并引入Focal-EIoU损失函数以解决高质量和低质量样本失衡问题。这些改进加速了模型的收敛速度并提高了检测准确率。实验结果表明,我们提出的模型在mAP_0.5上实现了2.7%的增长,在mAP_0.5:0.95上实现了4%的增长,并且在召回率上实现了6%的增长。

更新时间: 2025-02-11 01:58:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07175v1

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Reinforcement Learning (RL) plays a crucial role in aligning large language models (LLMs) with human preferences and improving their ability to perform complex tasks. However, current approaches either require significant computational resources due to the use of multiple models and extensive online sampling for training (e.g., PPO) or are framed as bandit problems (e.g., DPO, DRO), which often struggle with multi-step reasoning tasks, such as math problem solving and complex reasoning that involve long chains of thought. To overcome these limitations, we introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. The MDP formulation of DQO offers structural advantages over bandit-based methods, enabling more effective process supervision. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.

Updated: 2025-02-11 01:46:35

标题: 通过直接Q函数优化增强语言模型的多步推理能力

摘要: 强化学习(RL)在将大型语言模型(LLMs)与人类偏好对齐并提高其执行复杂任务的能力方面起着至关重要的作用。然而,当前的方法要么因为使用多个模型和广泛的在线采样进行训练(例如PPO)而需要大量计算资源,要么被构建为赌博问题(例如DPO,DRO),这些方法往往在多步推理任务(如数学问题解决和涉及长链思维的复杂推理)中遇到困难。为了克服这些限制,我们引入了直接Q函数优化(DQO),将响应生成过程形式化为马尔可夫决策过程(MDP),并利用软演员-评论家(SAC)框架直接优化由语言模型参数化的Q函数。DQO的MDP形式化相对于基于赌博的方法具有结构优势,能够提供更有效的过程监督。对两个数学问题解决数据集GSM8K和MATH的实验结果表明,DQO优于先前的方法,使其成为一个有前途的离线强化学习方法,用于对齐语言模型。

更新时间: 2025-02-11 01:46:35

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.09302v2

Advancing Precision Oncology Through Modeling of Longitudinal and Multimodal Data

Cancer evolves continuously over time through a complex interplay of genetic, epigenetic, microenvironmental, and phenotypic changes. This dynamic behavior drives uncontrolled cell growth, metastasis, immune evasion, and therapy resistance, posing challenges for effective monitoring and treatment. However, today's data-driven research in oncology has primarily focused on cross-sectional analysis using data from a single modality, limiting the ability to fully characterize and interpret the disease's dynamic heterogeneity. Advances in multiscale data collection and computational methods now enable the discovery of longitudinal multimodal biomarkers for precision oncology. Longitudinal data reveal patterns of disease progression and treatment response that are not evident from single-timepoint data, enabling timely abnormality detection and dynamic treatment adaptation. Multimodal data integration offers complementary information from diverse sources for more precise risk assessment and targeting of cancer therapy. In this review, we survey methods of longitudinal and multimodal modeling, highlighting their synergy in providing multifaceted insights for personalized care tailored to the unique characteristics of a patient's cancer. We summarize the current challenges and future directions of longitudinal multimodal analysis in advancing precision oncology.

Updated: 2025-02-11 01:44:51

标题: 推动精准肿瘤学发展:通过纵向和多模态数据建模

摘要: 癌症在时间上持续不断地演变,通过基因、表观遗传、微环境和表型变化的复杂相互作用。这种动态行为驱动了细胞不受控制的生长、转移、免疫逃逸和治疗抵抗,对有效监测和治疗提出了挑战。然而,当今肿瘤学中基于数据驱动的研究主要集中在使用单一模态数据进行横断面分析,限制了充分表征和解释疾病动态异质性的能力。多尺度数据收集和计算方法的进步现在使得发现用于精准肿瘤学的纵向多模态生物标志物成为可能。纵向数据揭示了疾病进展和治疗反应的模式,这些模式从单一时间点数据中并不明显,可以实现及时的异常检测和动态治疗调整。多模态数据集成提供了来自不同来源的互补信息,用于更精确的风险评估和癌症治疗的定位。在本文中,我们调查了纵向和多模态建模方法,突出它们在为个体化护理提供多方面见解方面的协同作用,以满足患者癌症独特特征的需求。我们总结了纵向多模态分析在推进精准肿瘤学方面的当前挑战和未来方向。

更新时间: 2025-02-11 01:44:51

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2502.07836v1

Machine Learning for Scalable and Optimal Load Shedding Under Power System Contingency

Prompt and effective corrective actions in response to unexpected contingencies are crucial for improving power system resilience and preventing cascading blackouts. The optimal load shedding (OLS) accounting for network limits has the potential to address the diverse system-wide impacts of contingency scenarios as compared to traditional local schemes. However, due to the fast cascading propagation of initial contingencies, real-time OLS solutions are challenging to attain in large systems with high computation and communication needs. In this paper, we propose a decentralized design that leverages offline training of a neural network (NN) model for individual load centers to autonomously construct the OLS solutions from locally available measurements. Our learning-for-OLS approach can greatly reduce the computation and communication needs during online emergency responses, thus preventing the cascading propagation of contingencies for enhanced power grid resilience. Numerical studies on both the IEEE 118-bus system and a synthetic Texas 2000-bus system have demonstrated the efficiency and effectiveness of our scalable OLS learning design for timely power system emergency operations.

Updated: 2025-02-11 01:43:34

标题: 机器学习在电力系统事故情况下可扩展和最优负荷减少中的应用

摘要: 在应对意外事件时,及时而有效的纠正措施对于提高电力系统的韧性和防止级联停电至关重要。与传统的局部方案相比,考虑网络限制的最优负荷减载(OLS)有潜力应对各种意外情景在全系统范围内的影响。然而,由于初始意外事件的快速级联传播,在计算和通信需求高的大型系统中难以获得实时OLS解。在本文中,我们提出了一种分散式设计,利用对神经网络(NN)模型的离线训练,使各个负荷中心能够从本地可用的测量数据自主构建OLS解。我们的learning-for-OLS方法可以大大减少在线紧急响应过程中的计算和通信需求,从而防止意外事件的级联传播,提高电力系统的韧性。对IEEE 118母线系统和一个合成的德克萨斯2000母线系统进行的数值研究表明,我们可扩展的OLS学习设计对及时的电力系统紧急运行高效且有效。

更新时间: 2025-02-11 01:43:34

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.05521v2

Group & Reweight: A Novel Cost-Sensitive Approach to Mitigating Class Imbalance in Network Traffic Classification

Internet services have led to the eruption of network traffic, and machine learning on these Internet data has become an indispensable tool, especially when the application is risk-sensitive. This paper focuses on network traffic classification in the presence of severe class imbalance. Such a distributional trait mostly drifts the optimal decision boundary and results in an unsatisfactory solution. This raises safety concerns in the network traffic field when previous class imbalance methods hardly deal with numerous minority malicious classes. To alleviate these effects, we design a group & reweight strategy for alleviating class imbalance. Inspired by the group distributionally optimization framework, our approach heuristically clusters classes into groups, iteratively updates the non-parametric weights for separate classes, and optimizes the learning model by minimizing reweighted losses. We theoretically interpret the optimization process from a Stackelberg game and perform extensive experiments on typical benchmarks. Results show that our approach can not only suppress the negative effect of class imbalance but also improve the comprehensive performance in prediction.
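
The core loop is compact enough to sketch. Below is a minimal NumPy version of the reweighting step in the spirit of group distributionally robust optimization: class groups with higher loss receive exponentially larger weights. The step size and the three-group setup are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def update_group_weights(weights, group_losses, eta=0.1):
    # One exponentiated-gradient step on non-parametric group weights:
    # groups with higher loss are upweighted, then weights are renormalized.
    w = weights * np.exp(eta * group_losses)
    return w / w.sum()

# Hypothetical usage: classes clustered into 3 groups by frequency.
# Weights start uniform and drift toward the hardest (minority) group.
w = np.ones(3) / 3
for _ in range(100):
    group_losses = np.array([0.2, 0.5, 1.3])  # placeholder per-group losses
    w = update_group_weights(w, group_losses)
print(w)  # most mass ends up on the high-loss minority group
```

In the full method these weights would multiply the per-group losses of the classifier at each iteration, so training attention steadily shifts toward the numerous minority malicious classes the abstract highlights.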

Updated: 2025-02-11 01:38:53

标题: 分组和重新赋权:一种新颖的成本敏感方法,用于减轻网络流量分类中的类别不平衡

摘要: 互联网服务导致网络流量的激增,对这些互联网数据进行机器学习已经成为一种必不可少的工具,特别是在应用对风险敏感的情况下。本文着重研究了在严重类别不平衡情况下的网络流量分类。这种分布特征往往使最优决策边界发生偏移,导致不理想的解决方案。这在网络流量领域引起了安全问题,因为先前的类别不平衡方法很难处理大量的少数恶意类别。为了缓解这些影响,我们设计了一种用于减轻类别不平衡的分组和重新加权策略。受群分布优化框架的启发,我们的方法启发式地将类别分成组,迭代更新各个类别的非参数权重,并通过最小化重新加权损失来优化学习模型。我们从斯塔克伯格博弈的角度理论解释了优化过程,并在典型基准上进行了大量实验。结果表明,我们的方法不仅可以抑制类别不平衡的负面影响,还可以提高预测的综合性能。

更新时间: 2025-02-11 01:38:53

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2409.19214v6

Enhancing Robustness Of Digital Shadow For CO2 Storage Monitoring With Augmented Rock Physics Modeling

To meet climate targets, the IPCC underscores the necessity of technologies capable of removing gigatonnes of CO2 annually, with Geological Carbon Storage (GCS) playing a central role. GCS involves capturing CO2 and injecting it into deep geological formations for long-term storage, requiring precise monitoring to ensure containment and prevent leakage. Time-lapse seismic imaging is essential for tracking CO2 migration but often struggles to capture the complexities of multi-phase subsurface flow. Digital Shadows (DS), leveraging machine learning-driven data assimilation techniques such as nonlinear Bayesian filtering and generative AI, provide a more detailed, uncertainty-aware monitoring approach. By incorporating uncertainties in reservoir properties, DS frameworks improve CO2 migration forecasts, reducing risks in GCS operations. However, data assimilation depends on assumptions regarding reservoir properties, rock physics models, and initial conditions, which, if inaccurate, can compromise prediction reliability. This study demonstrates that augmenting forecast ensembles with diverse rock physics models mitigates the impact of incorrect assumptions and improves predictive accuracy, particularly in differentiating uniform versus patchy saturation models.

Updated: 2025-02-11 01:33:35

标题: 使用增强的岩石物理建模提高二氧化碳封存监测中数字阴影的稳健性

摘要: 为了实现气候目标,IPCC强调必须开发能够每年移除数十亿吨二氧化碳的技术,地质碳储存(GCS)在其中扮演着关键角色。GCS涉及捕获二氧化碳并将其注入深部地质构造中进行长期封存,需要精确的监测以确保封存并防止泄漏。时移地震成像对追踪二氧化碳迁移至关重要,但通常难以捕捉多相地下流动的复杂性。数字阴影(DS)利用机器学习驱动的数据同化技术,如非线性贝叶斯滤波和生成式人工智能,提供了一种更详细的、考虑不确定性的监测方法。通过纳入储层属性的不确定性,DS框架改进了二氧化碳迁移预测,降低了GCS运营风险。然而,数据同化依赖于对储层属性、岩石物理模型和初始条件的假设,如果这些假设不准确,可能会影响预测的可靠性。本研究表明,通过将多种岩石物理模型融入预测集合,可以减轻不准确假设的影响,并提高预测的准确性,特别是在区分均匀与斑块饱和模型方面。

更新时间: 2025-02-11 01:33:35

领域: physics.comp-ph,cs.LG,physics.geo-ph

下载: http://arxiv.org/abs/2502.07171v1

Advancing Geological Carbon Storage Monitoring With 3d Digital Shadow Technology

Geological Carbon Storage (GCS) is a key technology for achieving global climate goals by capturing and storing CO2 in deep geological formations. Its effectiveness and safety rely on accurate monitoring of subsurface CO2 migration using advanced time-lapse seismic imaging. A Digital Shadow framework integrates field data, including seismic and borehole measurements, to track CO2 saturation over time. Machine learning-assisted data assimilation techniques, such as generative AI and nonlinear ensemble Bayesian filtering, update a digital model of the CO2 plume while incorporating uncertainties in reservoir properties. Compared to 2D approaches, 3D monitoring enhances the spatial accuracy of GCS assessments, capturing the full extent of CO2 migration. This study extends the uncertainty-aware 2D Digital Shadow framework by incorporating 3D seismic imaging and reservoir modeling, improving decision-making and risk mitigation in CO2 storage projects.

Updated: 2025-02-11 01:25:57

标题: 利用3D数字影子技术推进地质碳储存监测

摘要: 地质碳储存(GCS)是通过捕获二氧化碳并将其储存在深部地质构造中来实现全球气候目标的关键技术。其有效性和安全性依赖于利用先进的时移地震成像准确监测地下二氧化碳迁移。数字阴影框架整合了现场数据,包括地震和钻孔测量,以跟踪二氧化碳饱和度随时间的变化。机器学习辅助的数据同化技术,如生成式人工智能和非线性集合贝叶斯滤波,在考虑储层属性不确定性的同时更新二氧化碳羽流的数字模型。与2D方法相比,3D监测提高了GCS评估的空间精度,捕捉二氧化碳迁移的全部范围。本研究通过整合3D地震成像和储层建模,扩展了考虑不确定性的2D数字阴影框架,改善了二氧化碳封存项目的决策和风险缓解。

更新时间: 2025-02-11 01:25:57

领域: physics.comp-ph,cs.LG,physics.geo-ph

下载: http://arxiv.org/abs/2502.07169v1

Bayesian Optimization for Building Social-Influence-Free Consensus

We introduce Social Bayesian Optimization (SBO), a vote-efficient algorithm for consensus-building in collective decision-making. In contrast to single-agent scenarios, collective decision-making encompasses group dynamics that may distort agents' preference feedback, thereby impeding their capacity to achieve a social-influence-free consensus -- the most preferable decision based on the aggregated agent utilities. We demonstrate that under mild rationality axioms, reaching social-influence-free consensus using noisy feedback alone is impossible. To address this, SBO employs a dual voting system: cheap but noisy public votes (e.g., show of hands in a meeting), and more accurate, though expensive, private votes (e.g., one-to-one interview). We model social influence using an unknown social graph and leverage the dual voting system to efficiently learn this graph. Our theoretical findings show that social graph estimation converges faster than the black-box estimation of agents' utilities, allowing us to reduce reliance on costly private votes early in the process. This enables efficient consensus-building primarily through noisy public votes, which are debiased based on the estimated social graph to infer social-influence-free feedback. We validate the efficacy of SBO across multiple real-world applications, including thermal comfort, team building, travel negotiation, and energy trading collaboration.

Updated: 2025-02-11 01:20:32

标题: 贝叶斯优化用于构建无社交影响的共识

摘要: 我们引入了社交贝叶斯优化(SBO),这是一种在集体决策中建立共识的投票高效算法。与单一代理人场景相比,集体决策涵盖了可能扭曲代理人偏好反馈的群体动态,从而妨碍他们实现不受社会影响的共识 - 即基于聚合代理人效用的最可取决策。我们证明,在温和的理性公理下,仅使用嘈杂的反馈就无法实现不受社会影响的共识。为了解决这个问题,SBO采用了双重投票系统:廉价但嘈杂的公开投票(例如会议中的举手投票),以及更准确但更昂贵的私人投票(例如一对一访谈)。我们使用未知的社交图模拟社会影响,并利用双重投票系统有效地学习这个图。我们的理论发现表明,社交图估计的收敛速度比代理人效用的黑盒估计更快,使我们能够在过程早期减少对昂贵私人投票的依赖。这使得通过嘈杂的公开投票实现高效共识建立成为可能,这些投票根据估计的社交图进行去偏置,以推断不受社会影响的反馈。我们验证了SBO在多个现实应用中的有效性,包括热舒适度、团队建设、旅行协商和能源交易合作。

更新时间: 2025-02-11 01:20:32

领域: cs.MA,cs.GT,cs.LG,stat.ML,62C10, 62F15

下载: http://arxiv.org/abs/2502.07166v1

Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights

The rise of Large Language Models (LLMs) in software engineering, particularly in code generation, has garnered significant attention. However, assessing the quality of AI-generated code remains a challenge due to the inherent complexity of programming tasks and the lack of robust evaluation metrics that align well with human judgment. Traditional token-based metrics such as BLEU and ROUGE, while commonly used in natural language processing, exhibit weak correlations with human assessments in code intelligence and verification tasks. Furthermore, these metrics are primarily research-focused and are not designed for seamless integration into the software development lifecycle, limiting their practical utility for developers seeking to improve code quality and security. AI-assisted coding has been shown to be more beneficial for senior developers, as they possess the expertise to critically evaluate the generated code for correctness, completeness, and compliance. In contrast, junior developers may struggle to identify hallucinations, missing functionality, or incorrect logic in AI-generated code. To bridge this gap, this paper introduces a novel scoring mechanism called the SBC score, which is based on a reverse generation technique that leverages the natural language generation capabilities of LLMs. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications to quantify accuracy. The SBC score combines semantic similarity, BLEU, and completeness analysis, providing actionable insights to developers by highlighting missing features and hallucinations. Our code and datasets are available on GitHub.
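
A rough sketch of how such a composite score could be assembled is below. It is an assumption-laden illustration, not the paper's implementation: `embed` stands for any sentence-embedding function the reader supplies, and the component weights and matching threshold are hypothetical.

```python
import numpy as np
from nltk.translate.bleu_score import sentence_bleu

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def sbc_score(original_reqs, reconstructed_reqs, embed,
              weights=(0.5, 0.2, 0.3), match_threshold=0.7):
    # Semantic similarity between the pooled requirement texts.
    sem = cos(embed(" ".join(original_reqs)), embed(" ".join(reconstructed_reqs)))

    # Surface overlap between the two requirement texts via BLEU.
    bleu = sentence_bleu([" ".join(original_reqs).split()],
                         " ".join(reconstructed_reqs).split())

    # Completeness: share of original requirements with a close reconstruction;
    # requirements below the threshold flag missing features, while extra
    # reconstructed items not matching anything would indicate hallucinations.
    comp = np.mean([
        max(cos(embed(r), embed(g)) for g in reconstructed_reqs) > match_threshold
        for r in original_reqs])

    w_sem, w_bleu, w_comp = weights
    return w_sem * sem + w_bleu * bleu + w_comp * comp
```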

Updated: 2025-02-11 01:12:11

标题: 连接LLM生成的代码和需求:逆向生成技术和SBC度量标准用于开发人员洞察

摘要: 大型语言模型(LLMs)在软件工程领域的崛起,特别是在代码生成方面,引起了广泛的关注。然而,评估AI生成的代码质量仍然是一个挑战,这是由于编程任务的固有复杂性和缺乏与人类判断相一致的健壮评估指标。传统的基于标记的度量标准,如BLEU和ROUGE,在自然语言处理中常用,但在代码智能和验证任务中与人类评估之间的相关性较弱。此外,这些度量标准主要是研究焦点,并不是为了无缝集成到软件开发生命周期中,限制了开发人员寻求改进代码质量和安全性的实际效用。 已经证明,AI辅助编码对于资深开发人员更有益,因为他们具有对生成的代码进行正确性、完整性和合规性评估的专业知识。相比之下,初级开发人员可能会难以识别AI生成代码中的幻觉、缺失功能或不正确的逻辑。为了弥合这一差距,本文介绍了一种新颖的评分机制,称为SBC分数,它基于一种逆向生成技术,利用LLMs的自然语言生成能力。与直接代码分析不同,我们的方法从AI生成的代码中重建系统需求,并将其与原始规格进行比较以量化准确性。SBC分数结合了语义相似性、BLEU和完整性分析,通过突出缺失功能和幻觉,为开发人员提供可操作的见解。我们的代码和数据集可在GitHub上获得。

更新时间: 2025-02-11 01:12:11

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2502.07835v1

Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification

We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification. It first asks multiple LLM agents to independently generate candidate principles based on analysis of demonstration samples with or without labels, consolidates them into final principles via a finalizer agent, and then sends them to a classifier agent to perform downstream classification tasks. Extensive experiments on binary and multi-class classification datasets with different sizes of LLMs show that our approach not only achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score but also outperforms other strong baselines (CoT and stepback prompting). Principles generated by our approach help LLMs perform better on classification tasks than human crafted principles on two private datasets. Our multi-agent PRINCIPLE-BASED PROMPTING approach also shows on-par or better performance compared to demonstration-based few-shot prompting approaches, yet with substantially lower inference costs. Ablation studies show that label information and the multi-agent cooperative LLM framework play an important role in generating high-quality principles to facilitate downstream classification tasks.
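
The pipeline itself is simple enough to sketch end-to-end. In the illustration below, `call_llm` is a hypothetical prompt-to-text function and the prompt wordings are placeholders; only the three-stage structure (independent principle generators, a finalizer, a classifier) comes from the abstract.

```python
def principle_based_classify(demos, test_input, call_llm, n_agents=3):
    # Stage 1: multiple agents independently derive candidate principles
    # from the demonstration samples (with or without labels).
    candidates = [
        call_llm("Analyze these examples and state general classification "
                 "principles:\n" + demos)
        for _ in range(n_agents)
    ]

    # Stage 2: a finalizer agent consolidates the candidates.
    principles = call_llm(
        "Merge the following candidate principles into one final, "
        "non-redundant list:\n" + "\n---\n".join(candidates))

    # Stage 3: a classifier agent applies the final principles.
    return call_llm(
        "Principles:\n" + principles +
        "\n\nClassify the following input; answer with the label only:\n" +
        test_input)
```

Note the inference-cost advantage the abstract claims: the principles are generated once, so the per-example classifier prompt carries a short principle list instead of full demonstrations.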

Updated: 2025-02-11 01:10:13

标题: 不要仅仅演示,教我原则:基于原则的多Agent提示策略用于文本分类

摘要: 我们提出了基于原则的提示(PRINCIPLE-BASED PROMPTING),这是一种简单但有效的文本分类多智能体提示策略。首先,它要求多个LLM智能体基于对演示样本(有标签或无标签)的分析独立生成候选原则;然后通过一个最终器智能体将它们整合成最终原则,并发送给分类器智能体执行下游分类任务。在不同规模LLM上对二元和多类别分类数据集进行的大量实验表明,我们的方法不仅在宏观F1分数上比零样本提示取得了显著的性能提升(1.55%-19.37%),而且优于其他强基线(CoT和stepback提示)。在两个私有数据集上,我们方法生成的原则帮助LLM在分类任务上的表现优于人工制定的原则。与基于示范的少样本提示方法相比,我们的多智能体基于原则的提示方法表现相当或更好,而推理成本大大降低。消融研究表明,标签信息和多智能体协作的LLM框架在生成高质量原则以促进下游分类任务方面起着重要作用。

更新时间: 2025-02-11 01:10:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2502.07165v1

Does Training on Synthetic Data Make Models Less Robust?

An increasingly common practice is to train large language models (LLMs) using synthetic data. Often this synthetic data is produced by the same or similar LLMs as those it is being used to train. This raises the question of whether the synthetic data might in fact exacerbate certain "blindspots" by reinforcing heuristics that the LLM already encodes. In this paper, we conduct simulated experiments on the natural language inference (NLI) task with Llama-2-7B-hf models. We use MultiNLI as the general task and HANS, a targeted evaluation set designed to measure the presence of specific heuristic strategies for NLI, as our "blindspot" task. Our goal is to determine whether performance disparities between the general and blind spot tasks emerge. Our results indicate that synthetic data does not reinforce blindspots in the way we expected. Specifically, we see that, while fine-tuning with synthetic data doesn't necessarily reduce the use of the heuristic, it also does not make it worse as we hypothesized.

Updated: 2025-02-11 01:03:33

标题: 在合成数据上训练会使模型的鲁棒性降低吗?

摘要: 一种越来越常见的做法是使用合成数据来训练大型语言模型(LLMs)。通常,这些合成数据是由与被训练模型相同或类似的LLM生成的。这引发了一个问题:合成数据是否会通过强化LLM已经编码的启发式方法而加剧某些"盲点"。在本文中,我们使用Llama-2-7B-hf模型对自然语言推理(NLI)任务进行了模拟实验。我们使用MultiNLI作为一般任务,使用HANS作为我们的"盲点"任务,后者是一个专门设计用来衡量NLI中特定启发式策略存在程度的评估集。我们的目标是确定一般任务和盲点任务之间是否出现性能差异。我们的结果表明,合成数据并没有像我们预期的那样加强盲点。具体来说,我们发现,虽然使用合成数据进行微调不一定会减少启发式的使用,但也不会像我们假设的那样使其变得更糟。

更新时间: 2025-02-11 01:03:33

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07164v1

A Survey on Mamba Architecture for Vision Applications

Transformers have become foundational for visual tasks such as object detection, semantic segmentation, and video understanding, but their quadratic complexity in attention mechanisms presents scalability challenges. To address these limitations, the Mamba architecture utilizes state-space models (SSMs) for linear scalability, efficient processing, and improved contextual awareness. This paper investigates Mamba architecture for visual domain applications and its recent advancements, including Vision Mamba (ViM) and VideoMamba, which introduce bidirectional scanning, selective scanning mechanisms, and spatiotemporal processing to enhance image and video understanding. Architectural innovations like position embeddings, cross-scan modules, and hierarchical designs further optimize the Mamba framework for global and local feature extraction. These advancements position Mamba as a promising architecture in computer vision research and applications.

Updated: 2025-02-11 00:59:30

标题: 关于视觉应用中Mamba架构的综述

摘要: Transformers已成为视觉任务(如物体检测、语义分割和视频理解)的基础,但其注意力机制中的二次复杂性带来了可扩展性挑战。为了解决这些限制,Mamba架构利用状态空间模型(SSMs)实现了线性可伸缩性、高效处理和改善的上下文感知能力。本文调查了Mamba架构在视觉领域应用中的情况以及其最新进展,包括Vision Mamba(ViM)和VideoMamba,引入了双向扫描、选择性扫描机制和时空处理以增强图像和视频理解能力。架构创新,如位置嵌入、跨扫描模块和分层设计,进一步优化了Mamba框架用于全局和局部特征提取。这些进展将Mamba定位为计算机视觉研究和应用中一个有前途的架构。

更新时间: 2025-02-11 00:59:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07161v1

Pseudorandomness Properties of Random Reversible Circuits

Motivated by practical concerns in cryptography, we study pseudorandomness properties of permutations on $\{0,1\}^n$ computed by random circuits made from reversible $3$-bit gates (permutations on $\{0,1\}^3$). Our main result is that a random circuit of depth $\sqrt{n} \cdot \tilde{O}(k^3)$, with each layer consisting of $\Theta(n)$ random gates in a fixed two-dimensional nearest-neighbor architecture, yields approximate $k$-wise independent permutations. Our result can be seen as a particularly simple/practical block cipher construction that gives provable statistical security against attackers with access to $k$~input-output pairs within few rounds. The main technical component of our proof consists of two parts: 1. We show that the Markov chain on $k$-tuples of $n$-bit strings induced by a single random $3$-bit one-dimensional nearest-neighbor gate has spectral gap at least $1/n \cdot \tilde{O}(k)$. Then we infer that a random circuit with layers of random gates in a fixed one-dimensional gate architecture yields approximate $k$-wise independent permutations of $\{0,1\}^n$ in depth $n\cdot \tilde{O}(k^2)$ 2. We show that if the $n$ wires are layed out on a two-dimensional lattice of bits, then repeatedly alternating applications of approximate $k$-wise independent permutations of $\{0,1\}^{\sqrt n}$ to the rows and columns of the lattice yields an approximate $k$-wise independent permutation of $\{0,1\}^n$ in small depth. Our work improves on the original work of Gowers, who showed a gap of $1/\mathrm{poly}(n,k)$ for one random gate (with non-neighboring inputs); and, on subsequent work improving the gap to $\Omega(1/n^2k)$ in the same setting.

Updated: 2025-02-11 00:54:24

标题: 随机可逆电路的伪随机性质

摘要: 受密码学实际问题的启发,我们研究由可逆3比特门($\{0,1\}^3$上的排列)构成的随机电路所计算的$\{0,1\}^n$上排列的伪随机性质。我们的主要结果是,深度为$\sqrt{n} \cdot \tilde{O}(k^3)$的随机电路,每层包含固定二维最近邻结构中的$\Theta(n)$个随机门,产生近似$k$-wise独立排列。 我们的结果可以看作一种特别简单/实用的分组密码构造,能够针对在少数轮次内可访问$k$个输入输出对的攻击者提供可证明的统计安全性。 我们证明的主要技术要素包括两部分: 1. 我们证明,由单个随机3比特一维最近邻门在$n$比特字符串的$k$-元组上诱导的马尔可夫链的谱间隙至少为$1/n \cdot \tilde{O}(k)$。由此我们推断,在固定一维门结构中由随机门层构成的随机电路,可在深度$n\cdot \tilde{O}(k^2)$内产生$\{0,1\}^n$的近似$k$-wise独立排列。 2. 我们证明,如果将$n$条导线布置在比特的二维晶格上,那么对晶格的行和列交替应用$\{0,1\}^{\sqrt n}$的近似$k$-wise独立排列,可以在较小深度内得到$\{0,1\}^n$的近似$k$-wise独立排列。 我们的工作改进了Gowers的原始工作,他对单个随机门(输入无需相邻)证明了$1/\mathrm{poly}(n,k)$的间隙;也改进了后续在相同设置下将间隙提升至$\Omega(1/n^2k)$的工作。

更新时间: 2025-02-11 00:54:24

领域: cs.CR,math.PR

下载: http://arxiv.org/abs/2502.07159v1

TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model

Foundation models, particularly Large Language Models (LLMs), have revolutionized text and video processing, yet time series data presents distinct challenges for such approaches due to domain-specific features such as missing values, multi-resolution characteristics, etc. Furthermore, the de-facto autoregressive transformers tend to learn deterministic temporal dependencies within pre-trained data while overlooking inherent uncertainties and lacking integration of physical constraints. In this paper, we introduce TimeDiT, a diffusion transformer model that synergistically combines transformer-based temporal dependency learning with diffusion-based probabilistic sampling. TimeDiT employs a unified masking mechanism to harmonize the training and inference process across diverse tasks while introducing a theoretically grounded, finetuning-free model editing strategy that enables flexible integration of external knowledge during sampling. Acknowledging the challenges of unifying multiple downstream tasks under a single model, our systematic evaluation demonstrates TimeDiT's effectiveness both in fundamental tasks, i.e., forecasting and imputation, through zero-shot/fine-tuning; and in domain tasks, i.e., multi-resolution forecasting, anomaly detection, and data generation, establishing it as a \textit{proto-foundation model} that bridges the gap between general-purpose and domain-specific models.

Updated: 2025-02-11 00:53:58

标题: TimeDiT:用于时间序列基础模型的通用扩散变换器

摘要: 基础模型,尤其是大型语言模型(LLMs),已经彻底改变了文本和视频处理,然而时间序列数据因其领域特定特征(如缺失值、多分辨率特性等)而对这类方法提出了独特的挑战。此外,事实标准的自回归Transformer倾向于学习预训练数据中的确定性时间依赖,而忽视固有的不确定性,并缺乏对物理约束的整合。在本文中,我们介绍了TimeDiT,这是一种扩散Transformer模型,将基于Transformer的时间依赖学习与基于扩散的概率采样协同结合。TimeDiT采用统一的掩码机制,在各种任务中协调训练和推理过程,同时引入了一种有理论依据、无需微调的模型编辑策略,能够在采样过程中灵活整合外部知识。在承认在单一模型下统一多个下游任务存在挑战的同时,我们的系统评估证明了TimeDiT的有效性:既体现在基本任务(即通过零样本/微调进行预测和填补)上,也体现在领域任务(即多分辨率预测、异常检测和数据生成)上,从而将其确立为一种弥合通用模型与领域特定模型之间差距的"原型基础模型"。

更新时间: 2025-02-11 00:53:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.02322v2

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures

The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between highdimensional vectors and IMC array sizes, leading to inefficient memory utilization and increased computation cycles. This paper presents MEMHD, a Memory-Efficient Multi-centroid HDC framework designed to address these challenges. MEMHD introduces a clustering-based initialization method and quantization aware iterative learning for multi-centroid associative memory. Through these approaches and its overall architecture, MEMHD achieves a significant reduction in memory requirements while maintaining or improving classification accuracy. Our approach achieves full utilization of IMC arrays and enables one-shot (or few-shot) associative search. Experimental results demonstrate that MEMHD outperforms state-of-the-art binary HDC models, achieving up to 13.69% higher accuracy with the same memory usage, or 13.25x more memory efficiency at the same accuracy level. Moreover, MEMHD reduces computation cycles by up to 80x and array usage by up to 71x compared to baseline IMC mapping methods when mapped to 128x128 IMC arrays, while significantly improving energy and computation cycle efficiency.
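
One way to read the clustering-based initialization is sketched below: cluster each class's encoded hypervectors and keep the (re-binarized) cluster centers as rows of the associative memory, so a class can occupy several IMC rows. The use of k-means and the per-class centroid count are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_multicentroid_am(encodings, labels, centroids_per_class=4):
    # encodings: (N, D) bipolar hypervectors; labels: (N,) class ids.
    rows, row_labels = [], []
    for c in np.unique(labels):
        X = encodings[labels == c]
        k = min(centroids_per_class, len(X))
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        # Re-binarize the centers so they fit a binary/bipolar IMC array.
        rows.append(np.sign(km.cluster_centers_))
        row_labels += [c] * k
    return np.vstack(rows), np.array(row_labels)

def predict(am, row_labels, query):
    # One-shot associative search: best centroid by dot-product similarity,
    # which maps directly onto a single IMC matrix-vector operation.
    return row_labels[np.argmax(am @ query)]
```

Sizing the associative memory as (classes x centroids, D) rather than one row per training sample is what lets the array dimensions be matched to the physical IMC tile, which is the "full utilization" point above.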

Updated: 2025-02-11 00:53:15

标题: MEMHD:内存高效的多质心超维计算,用于充分利用内存计算架构

摘要: 超维计算(HDC)在内存计算(IMC)架构上的实现面临着重大挑战,主要是由于高维向量与IMC数组大小不匹配,导致内存利用效率低下和计算周期增加。本文介绍了MEMHD,一种设计用于解决这些挑战的内存高效多质心HDC框架。MEMHD引入了基于聚类的初始化方法和量化感知的迭代学习,用于多质心关联存储器。通过这些方法和整体架构,MEMHD实现了内存需求的显著减少,同时保持或提高分类准确性。我们的方法实现了对IMC数组的充分利用,并实现了一次性(或少量次)关联搜索。实验结果表明,与最先进的二进制HDC模型相比,MEMHD表现更好,在相同内存使用情况下准确率提高了高达13.69%,或者在相同准确性水平下内存效率提高了13.25倍。此外,与基线IMC映射方法相比,当映射到128x128 IMC数组时,MEMHD将计算周期降低了高达80倍,数组使用降低了高达71倍,同时显著提高了能量和计算周期效率。

更新时间: 2025-02-11 00:53:15

领域: cs.AR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07834v1

A Review of Fairness and A Practical Guide to Selecting Context-Appropriate Fairness Metrics in Machine Learning

Recent regulatory proposals for artificial intelligence emphasize fairness requirements for machine learning models. However, precisely defining the appropriate measure of fairness is challenging due to philosophical, cultural and political contexts. Biases can infiltrate machine learning models in complex ways depending on the model's context, rendering a single common metric of fairness insufficient. This ambiguity highlights the need for criteria to guide the selection of context-aware measures, an issue of increasing importance given the proliferation of ever tighter regulatory requirements. To address this, we developed a flowchart to guide the selection of contextually appropriate fairness measures. Twelve criteria were used to formulate the flowchart. This included consideration of model assessment criteria, model selection criteria, and data bias. We also review fairness literature in the context of machine learning and link it to core regulatory instruments to assist policymakers, AI developers, researchers, and other stakeholders in appropriately addressing fairness concerns and complying with relevant regulatory requirements.

Updated: 2025-02-11 00:44:45

标题: 对公平性的回顾和在机器学习中选择适合上下文的公平性度量的实用指南

摘要: 最近关于人工智能的监管提案强调了对机器学习模型的公平性要求。然而,由于哲学、文化和政治背景的不同,准确定义公平性的适当度量是具有挑战性的。偏见可以以复杂的方式渗入机器学习模型,取决于模型的上下文,使得单一的公平性指标不足以应对。这种模糊性凸显了指导选择上下文适当的公平性措施的标准的必要性,这一问题在越来越严格的监管要求不断增加的情况下变得越来越重要。为了解决这个问题,我们制定了一个流程图,指导选择上下文适当的公平性措施。使用了十二个标准来制定这个流程图。这包括考虑模型评估标准、模型选择标准和数据偏见。我们还在机器学习的背景下审查公平性文献,并将其与核心监管工具联系起来,以帮助决策者、人工智能开发者、研究人员和其他利益相关者适当地处理公平性问题,并遵守相关的监管要求。

更新时间: 2025-02-11 00:44:45

领域: cs.AI

下载: http://arxiv.org/abs/2411.06624v3

Explaining 3D Computed Tomography Classifiers with Counterfactuals

Counterfactual explanations in medical imaging are critical for understanding the predictions made by deep learning models. We extend the Latent Shift counterfactual generation method from 2D applications to 3D computed tomography (CT) scans. We address the challenges associated with 3D data, such as limited training samples and high memory demands, by implementing a slice-based approach. This method leverages a 2D encoder trained on CT slices, which are subsequently combined to maintain 3D context. We demonstrate this technique on two models for clinical phenotype prediction and lung segmentation. Our approach is both memory-efficient and effective for generating interpretable counterfactuals in high-resolution 3D medical imaging.
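
For readers unfamiliar with Latent Shift, a minimal sketch of the core counterfactual step is below, adapted to the slice-based setting described above. The single gradient-step form and the step size `lam` are simplifications for illustration; the original method searches over the shift magnitude rather than fixing it.

```python
import torch

def latent_shift_counterfactual(encoder, decoder, classifier, volume, lam=10.0):
    # Encode the CT slices with the 2D encoder; the decoder recombines
    # them so the classifier sees a full 3D volume.
    z = encoder(volume).detach().requires_grad_(True)
    pred = classifier(decoder(z))

    # Shift the latent against the classifier gradient to reduce the
    # predicted probability, then decode the counterfactual volume.
    grad = torch.autograd.grad(pred.sum(), z)[0]
    z_cf = z - lam * grad
    return decoder(z_cf)
```

The memory efficiency claimed above comes from keeping the encoder 2D: gradients flow through per-slice latents rather than a full volumetric network.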

Updated: 2025-02-11 00:44:20

标题: 用反事实推理解释3D计算机断层扫描分类器

摘要: 在医学影像中,反事实解释对于理解深度学习模型的预测至关重要。我们将潜在偏移(Latent Shift)反事实生成方法从2D应用扩展到3D计算机断层扫描(CT)图像。我们通过实施基于切片的方法来解决与3D数据相关的挑战,例如有限的训练样本和高内存需求。该方法利用在CT切片上训练的2D编码器,随后将切片组合以保持3D上下文。我们在临床表型预测和肺分割两个模型上演示了这种技术。我们的方法在高分辨率3D医学成像中生成可解释的反事实方面既节省内存又有效。

更新时间: 2025-02-11 00:44:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2502.07156v1

Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning

Recent progress in large language models (LLMs) highlights the power of scaling test-time compute to achieve strong performance on complex tasks, such as mathematical reasoning and code generation. This raises a critical question: how should model training be modified to optimize performance under a subsequent test-time compute strategy and budget? To explore this, we focus on pass@N, a simple test-time strategy that searches for a correct answer in $N$ independent samples. We show, surprisingly, that training with cross-entropy (CE) loss can be ${\it misaligned}$ with pass@N in that pass@N accuracy ${\it decreases}$ with longer training. We explain the origins of this misalignment in terms of model overconfidence induced by CE, and experimentally verify our prediction of overconfidence as an impediment to scaling test-time compute via pass@N. Furthermore we suggest a principled, modified training loss that is better aligned to pass@N by limiting model confidence and rescuing pass@N test performance. Our algorithm demonstrates improved mathematical reasoning on MATH and MiniF2F benchmarks under several scenarios: (1) providing answers to math questions; and (2) proving theorems by searching over proof trees of varying shapes. Overall our work underscores the importance of co-designing two traditionally separate phases of LLM development: training-time protocols and test-time search and reasoning strategies.
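
Since the whole argument revolves around pass@N, it may help to pin the quantity down. The snippet below gives the standard unbiased estimator from n samples with c correct, plus the closed form 1 - (1 - p)^N, which shows why the strategy rewards coverage rather than raw confidence.

```python
from math import comb

def pass_at_n(n_samples, n_correct, N):
    # Unbiased pass@N estimator: probability that at least one of N
    # draws (without replacement from the n samples) is correct.
    if n_samples - n_correct < N:
        return 1.0
    return 1.0 - comb(n_samples - n_correct, N) / comb(n_samples, N)

# Closed form with per-sample success probability p: pass@N = 1 - (1-p)^N.
# Even a small p yields high pass@N at large N, so sharpening the model's
# confidence on one answer buys little and can hurt diversity.
for N in (1, 10, 100):
    print(N, 1 - (1 - 0.05) ** N)
```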

Updated: 2025-02-11 00:33:31

标题: 重新思考在扩展测试时间计算时的精细调整:限制置信度提高数学推理

摘要: 近期大型语言模型(LLMs)的进展突显了在复杂任务(如数学推理和代码生成)上通过扩展测试时间计算来实现强大性能的能力。这引发了一个关键问题:如何修改模型训练以优化在随后的测试时间计算策略和预算下的性能?为了探索这个问题,我们专注于pass@N,这是一种简单的测试时间策略,它在$N$个独立样本中搜索正确答案。令人惊讶的是,我们发现使用交叉熵(CE)损失进行训练可能与pass@N不一致,因为pass@N的准确率随着训练时间的延长而减少。我们解释了这种不一致的根源是由CE引起的模型过度自信,并通过实验证实了我们对过度自信阻碍通过pass@N扩展测试时间计算的预测。此外,我们提出了一个有原则的修改训练损失,更好地与pass@N对齐,通过限制模型信心并挽救pass@N的测试性能。我们的算法在MATH和MiniF2F基准测试中展示了数学推理的改进,涵盖了几种场景:(1)回答数学问题;和(2)通过搜索不同形状的证明树来证明定理。总的来说,我们的工作强调了LLM发展的两个传统分离阶段的共同设计的重要性:训练时间协议和测试时间搜索和推理策略。

更新时间: 2025-02-11 00:33:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07154v1

Feature Importance Depends on Properties of the Data: Towards Choosing the Correct Explanations for Your Data and Decision Trees based Models

In order to ensure the reliability of the explanations of machine learning models, it is crucial to establish their advantages and limitations, and the cases in which each of these methods outperforms the others. However, the current understanding of when and how each method of explanation can be used is insufficient. To fill this gap, we perform a comprehensive empirical evaluation by synthesizing multiple datasets with the desired properties. Our main objective is to assess the quality of feature importance estimates provided by local explanation methods, which are used to explain predictions made by decision tree-based models. By analyzing the results obtained from synthetic datasets as well as publicly available binary classification datasets, we observe notable disparities in the magnitude and sign of the feature importance estimates generated by these methods. Moreover, we find that these estimates are sensitive to specific properties present in the data. Although some model hyper-parameters do not significantly influence feature importance assignment, it is important to recognize that each method of explanation has limitations in specific contexts. Our assessment highlights these limitations and provides valuable insight into the suitability and reliability of different explanatory methods in various scenarios.
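
A minimal version of the kind of comparison the study performs can be reproduced with scikit-learn alone: synthesize data with known properties, then contrast impurity-based and permutation-based importance estimates for the same tree ensemble. The dataset parameters below are illustrative, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data with a known structure: 3 informative of 10 features.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=2, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

impurity_imp = model.feature_importances_
perm_imp = permutation_importance(model, X, y, n_repeats=10,
                                  random_state=0).importances_mean

# The two estimates often disagree in magnitude (and occasionally rank),
# which is the kind of disparity the study quantifies across data properties.
for i, (a, b) in enumerate(zip(impurity_imp, perm_imp)):
    print(f"feature {i}: impurity={a:.3f} permutation={b:.3f}")
```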

Updated: 2025-02-11 00:29:55

标题: 特征重要性取决于数据的属性:朝向为您的数据和基于决策树的模型选择正确的解释

摘要: 为了确保机器学习模型解释的可靠性,建立它们的优势和局限性以及在哪种情况下每种方法表现更好至关重要。然而,目前对于每种解释方法何时以及如何使用的理解还不足。为了填补这一空白,我们通过综合多个具有所需属性的数据集进行全面的实证评估。我们的主要目标是评估局部解释方法提供的特征重要性估计的质量,这些方法用于解释基于决策树模型的预测。通过分析从合成数据集和公开可用的二分类数据集获得的结果,我们观察到这些方法生成的特征重要性估计的幅度和符号存在显著差异。此外,我们发现这些估计对数据中存在的特定属性敏感。虽然一些模型超参数不会显著影响特征重要性分配,但重要的是要认识到每种解释方法在特定情境下都有局限性。我们的评估突出了这些限制,并为不同解释方法在各种情况下的适用性和可靠性提供了宝贵的见解。

更新时间: 2025-02-11 00:29:55

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07153v1

Conditional Distribution Quantization in Machine Learning

Conditional expectation $\mathbb{E}(Y \mid X)$ often fails to capture the complexity of multimodal conditional distributions $\mathcal{L}(Y \mid X)$. To address this, we propose using n-point conditional quantizations--functional mappings of $X$ that are learnable via gradient descent--to approximate $\mathcal{L}(Y \mid X)$. This approach adapts Competitive Learning Vector Quantization (CLVQ), tailored for conditional distributions. It goes beyond single-valued predictions by providing multiple representative points that better reflect multimodal structures. It enables the approximation of the true conditional law in the Wasserstein distance. The resulting framework is theoretically grounded and useful for uncertainty quantification and multimodal data generation tasks. For example, in computer vision inpainting tasks, multiple plausible reconstructions may exist for the same partially observed input image $X$. We demonstrate the effectiveness of our approach through experiments on synthetic and real-world datasets.
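
A hedged sketch of what an n-point conditional quantization can look like in code is given below: a small network maps x to n candidate points, trained with a CLVQ-style nearest-point loss so the points spread over the modes of $\mathcal{L}(Y \mid X)$. The architecture and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalQuantizer(nn.Module):
    # Maps x to n candidate points in y-space; a learnable conditional codebook.
    def __init__(self, x_dim, y_dim, n_points=8, hidden=64):
        super().__init__()
        self.n, self.y_dim = n_points, y_dim
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_points * y_dim))

    def forward(self, x):                      # (B, x_dim) -> (B, n, y_dim)
        return self.net(x).view(-1, self.n, self.y_dim)

def quantization_loss(points, y):
    # Squared distance from each observed y to its nearest conditional
    # codepoint; minimizing this spreads points over the conditional modes.
    d = ((points - y.unsqueeze(1)) ** 2).sum(-1)   # (B, n)
    return d.min(dim=1).values.mean()

# Usage sketch: q = ConditionalQuantizer(3, 2); loss = quantization_loss(q(x), y)
```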

Updated: 2025-02-11 00:28:24

标题: 机器学习中的条件分布量化

摘要: 条件期望$\mathbb{E}(Y \mid X)$经常无法捕捉多峰条件分布$\mathcal{L}(Y \mid X)$的复杂性。为了解决这个问题,我们提出使用n点条件量化 - 可通过梯度下降学习的$X$的函数映射 - 来近似$\mathcal{L}(Y \mid X)$。这种方法改编自适用于条件分布的竞争学习向量量化(CLVQ),通过提供更好地反映多峰结构的多个代表点来超越单值预测。它使得能够在Wasserstein距离中近似真实的条件法则。所得到的框架在理论上基础牢固,并对不确定性量化和多峰数据生成任务非常有用。例如,在计算机视觉修复任务中,对于同一部分观察到的输入图像$X$可能存在多个合理的重建。我们通过对合成和真实数据集的实验展示了我们方法的有效性。

更新时间: 2025-02-11 00:28:24

领域: cs.LG

下载: http://arxiv.org/abs/2502.07151v1

SAFR: Neuron Redistribution for Interpretability

Superposition refers to encoding representations of multiple features within a single neuron, which is common in deep neural networks. This property allows neurons to combine and represent multiple features, enabling the model to capture intricate information and handle complex tasks. Despite promising performance, the model's interpretability has been diminished. This paper presents a novel approach to enhance model interpretability by regularizing feature superposition. We introduce SAFR, which simply applies regularizations to the loss function to promote monosemantic representations for important tokens while encouraging polysemanticity for correlated token pairs, where important tokens and correlated token pairs are identified via VMASK and attention weights respectively. We evaluate SAFR with a transformer model on two classification tasks. Experiments demonstrate the effectiveness of SAFR in improving model interpretability without compromising prediction performance. Besides, SAFR provides explanations by visualizing the neuron allocation within the intermediate layers.
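
The abstract does not spell out the penalties, so the following is only a loose illustration of the regularization pattern it describes: an entropy-style term pushing important tokens toward concentrated (monosemantic) neuron allocations, and a proximity term letting correlated token pairs share neurons. Both penalty forms and the weights are assumptions, not the paper's definitions.

```python
import torch

def safr_style_loss(task_loss, acts_important, acts_corr_a, acts_corr_b,
                    lam_mono=0.01, lam_poly=0.01):
    # acts_important: (N, D) activations for tokens flagged important
    # (e.g., via VMASK); acts_corr_a/b: activations for correlated pairs
    # (e.g., identified via attention weights).

    # Monosemanticity proxy: low entropy of the neuron allocation.
    p = torch.softmax(acts_important, dim=-1)
    mono = -(p * torch.log(p + 1e-9)).sum(-1).mean()

    # Polysemanticity proxy: correlated tokens share neuron allocations.
    poly = (acts_corr_a - acts_corr_b).pow(2).mean()

    return task_loss + lam_mono * mono + lam_poly * poly
```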

Updated: 2025-02-11 00:26:45

标题: SAFR:为可解释性重新分配神经元

摘要: 叠加(superposition)是指在单个神经元中编码多个特征的表示,这在深度神经网络中很常见。这种属性允许神经元组合并表示多个特征,使模型能够捕捉复杂信息并处理复杂任务。尽管性能可观,模型的可解释性却因此降低。本文提出了一种通过规范化特征叠加来增强模型可解释性的新方法。我们介绍了SAFR,它简单地在损失函数上施加正则化,以促进重要标记的单语义表示,同时鼓励相关标记对的多语义性,其中重要标记和相关标记对分别通过VMASK和注意力权重识别。我们在两个分类任务上使用transformer模型评估了SAFR。实验证明了SAFR在不损害预测性能的情况下提高模型可解释性的有效性。此外,SAFR通过可视化中间层内的神经元分配来提供解释。

更新时间: 2025-02-11 00:26:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2501.16374v2

SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

While Large language models (LLMs) have advanced natural language processing tasks, their growing computational and memory demands make deployment on resource-constrained devices like mobile phones increasingly challenging. In this paper, we propose SHARP (SHaring Adjacent Layers with Recovery Parameters), a novel approach to accelerate LLM inference by sharing parameters across adjacent layers, thus reducing memory load overhead, while introducing low-rank recovery parameters to maintain performance. Inspired by observations that consecutive layers have similar outputs, SHARP employs a two-stage recovery process: Single Layer Warmup (SLW), and Supervised Fine-Tuning (SFT). The SLW stage aligns the outputs of the shared layers using L_2 loss, providing a good initialization for the following SFT stage to further restore the model performance. Extensive experiments demonstrate that SHARP can recover the model's perplexity on various in-distribution tasks using no more than 50k fine-tuning data while reducing the number of stored MLP parameters by 38% to 65%. We also conduct several ablation studies of SHARP and show that replacing layers towards the later parts of the model yields better performance retention, and that different recovery parameterizations perform similarly when parameter counts are matched. Furthermore, SHARP saves 42.8% in model storage and reduces the total inference time by 42.2% compared to the original Llama2-7b model on mobile devices. Our results highlight SHARP as an efficient solution for reducing inference costs in deploying LLMs without the need for pretraining-scale resources.
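
A minimal sketch of the parameter-sharing scheme, as we read it, is below: two adjacent blocks reuse one stored weight matrix, and each adds its own low-rank recovery term. (The SLW stage would then fit these recovery parameters to the original layers' outputs with an L2 loss before SFT.) The rank, dimensions, and the single-matrix simplification are illustrative assumptions.

```python
import torch.nn as nn

class SharedPairMLP(nn.Module):
    # One pair of adjacent blocks sharing a weight matrix, each with a
    # low-rank "recovery" correction; activations/other sublayers omitted.
    def __init__(self, dim=4096, rank=16):
        super().__init__()
        self.shared = nn.Linear(dim, dim, bias=False)   # stored once for both
        self.rec_a = nn.ModuleList([nn.Linear(dim, rank, bias=False)
                                    for _ in range(2)])
        self.rec_b = nn.ModuleList([nn.Linear(rank, dim, bias=False)
                                    for _ in range(2)])

    def layer(self, i, x):
        # Shared weights plus the per-layer low-rank recovery term.
        return self.shared(x) + self.rec_b[i](self.rec_a[i](x))

    def forward(self, x):
        x = self.layer(0, x)
        return self.layer(1, x)
```

The memory arithmetic follows directly: two dim x dim matrices collapse to one plus two rank-16 corrections, which is where savings of the reported magnitude come from.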

Updated: 2025-02-11 00:21:40

标题: SHARP: 通过共享相邻层和恢复参数加速语言模型推断

摘要: 尽管大型语言模型(LLMs)已经推动了自然语言处理任务,但它们日益增长的计算和内存需求使得在资源受限的设备上部署,如手机上变得越来越具有挑战性。在本文中,我们提出了SHARP(SHaring Adjacent Layers with Recovery Parameters),这是一种加速LLM推理的新方法,通过在相邻层之间共享参数,从而减少内存负载开销,同时引入低秩恢复参数以维持性能。受连续层具有类似输出的观察启发,SHARP采用两阶段恢复过程:单层热身(SLW)和监督微调(SFT)。SLW阶段利用L_2损失使共享层的输出对齐,为随后的SFT阶段提供良好的初始化,以进一步恢复模型性能。大量实验表明,SHARP可以在各种分布任务上恢复模型的困惑度,而不超过50k的微调数据,同时将存储的MLP参数数量减少了38%至65%。我们还进行了几项SHARP的消融研究,并显示在模型后部替换层会获得更好的性能保留,当参数数量匹配时,不同的恢复参数化表现相似。此外,与原始的Llama2-7b模型相比,SHARP在模型存储上节省了42.8%,并将总推理时间缩短了42.2%。我们的结果突出了SHARP作为在部署LLMs时降低推理成本的高效解决方案,而无需预训练规模的资源。

更新时间: 2025-02-11 00:21:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.07832v1

Interactive Symbolic Regression through Offline Reinforcement Learning: A Co-Design Framework

Symbolic Regression (SR) holds great potential for uncovering underlying mathematical and physical relationships from observed data. However, the vast combinatorial space of possible expressions poses significant challenges for both online search methods and pre-trained transformer models. Additionally, current state-of-the-art approaches typically do not consider the integration of domain experts' prior knowledge and do not support iterative interactions with the model during the equation discovery process. To address these challenges, we propose the Symbolic Q-network (Sym-Q), an advanced interactive framework for large-scale symbolic regression. Unlike previous large-scale transformer-based SR approaches, Sym-Q leverages reinforcement learning without relying on a transformer-based decoder. This formulation allows the agent to learn through offline reinforcement learning using any type of tree encoder, enabling more efficient training and inference. Furthermore, we propose a co-design mechanism, where the reinforcement learning-based Sym-Q facilitates effective interaction with domain experts at any stage of the equation discovery process. Users can dynamically modify generated nodes of the expression, collaborating with the agent to tailor the mathematical expression to best fit the problem and align with the assumed physical laws, particularly when there is prior partial knowledge of the expected behavior. Our experiments demonstrate that the pre-trained Sym-Q surpasses existing SR algorithms on the challenging SSDNC benchmark. Moreover, we experimentally show on real-world cases that its performance can be further enhanced by the interactive co-design mechanism, with Sym-Q achieving greater performance gains than other state-of-the-art models. Our reproducible code is available at https://github.com/EPFL-IMOS/Sym-Q.

Updated: 2025-02-11 00:20:37

标题: 通过离线强化学习进行交互式符号回归:一种协同设计框架

摘要: Symbolic Regression(SR)具有发现观察数据中潜在数学和物理关系的巨大潜力。然而,可能表达式的庞大组合空间对在线搜索方法和预先训练的转换器模型都构成重大挑战。此外,当前的最先进方法通常不考虑领域专家的先验知识的整合,并且在方程发现过程中不支持迭代交互。为了解决这些挑战,我们提出了Symbolic Q-network(Sym-Q),这是一个用于大规模符号回归的先进交互框架。与以前的基于大规模转换器的SR方法不同,Sym-Q利用强化学习而不依赖于基于转换器的解码器。这种形式允许代理通过离线强化学习使用任何类型的树编码器进行学习,从而实现更高效的训练和推断。此外,我们提出了一种共同设计机制,基于强化学习的Sym-Q在方程发现过程的任何阶段促进与领域专家的有效交互。用户可以动态修改表达式的生成节点,与代理合作,使数学表达式最佳地适应问题并符合假定的物理定律,特别是在存在预期行为的部分知识时。我们的实验表明,预先训练的Sym-Q在具有挑战性的SSDNC基准测试中超越了现有的SR算法。此外,我们通过真实案例的实验表明,交互式共同设计机制可以进一步提高其性能,Sym-Q取得的性能增益大于其他最先进模型。我们的可重现代码可在https://github.com/EPFL-IMOS/Sym-Q找到。

更新时间: 2025-02-11 00:20:37

领域: cs.LG,cs.AI,cs.SC

下载: http://arxiv.org/abs/2502.02917v2

Analytical Lyapunov Function Discovery: An RL-based Generative Approach

Despite advances in learning-based methods, finding valid Lyapunov functions for nonlinear dynamical systems remains challenging. Current neural network approaches face two main issues: challenges in scalable verification and limited interpretability. To address these, we propose an end-to-end framework using transformers to construct analytical Lyapunov functions (local), which simplifies formal verification, enhances interpretability, and provides valuable insights for control engineers. Our framework consists of a transformer-based trainer that generates candidate Lyapunov functions and a falsifier that verifies candidate expressions and refines the model via risk-seeking policy gradient. Unlike Alfarano et al. (2024), which utilizes pre-training and seeks global Lyapunov functions for low-dimensional systems, our model is trained from scratch via reinforcement learning (RL) and succeeds in finding local Lyapunov functions for high-dimensional and non-polynomial systems. Given the analytical nature of the candidates, we employ efficient optimization methods for falsification during training and formal verification tools for the final verification. We demonstrate the efficiency of our approach on a range of nonlinear dynamical systems with up to ten dimensions and show that it can discover Lyapunov functions not previously identified in the control literature.

Updated: 2025-02-11 00:19:47

标题: 分析李雅普诺夫函数的发现:基于强化学习的生成方法

摘要: 尽管学习方法取得了进展,但对于非线性动力系统找到有效的Lyapunov函数仍然具有挑战性。当前的神经网络方法面临两个主要问题:可扩展性验证的挑战和有限的可解释性。为了解决这些问题,我们提出了一个使用变压器构建分析Lyapunov函数(局部)的端到端框架,简化了形式验证,增强了可解释性,并为控制工程师提供了有价值的见解。我们的框架包括一个基于变压器的训练器,用于生成候选Lyapunov函数,以及一个验证器,通过寻求风险导向策略梯度验证候选表达式并改进模型。与Alfarano等人(2024)利用预训练并寻求低维系统的全局Lyapunov函数不同,我们的模型通过强化学习(RL)从头开始训练,并成功找到了高维和非多项式系统的局部Lyapunov函数。鉴于候选函数的分析性质,我们在训练期间采用高效的优化方法进行验证,并在最终验证时使用形式验证工具。我们展示了我们方法在多达十个维度的一系列非线性动力系统上的效率,并表明它可以发现控制文献中以前未被识别的Lyapunov函数。

更新时间: 2025-02-11 00:19:47

领域: cs.LG,cs.AI,cs.SC,cs.SY,eess.SY

下载: http://arxiv.org/abs/2502.02014v2

Analysis of Overparameterization in Continual Learning under a Linear Model

Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound of the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.

Updated: 2025-02-11 00:15:38

标题: 对线性模型下连续学习中的过参数化分析

摘要: 自主机器学习系统在学习连续多个任务时容易出现灾难性遗忘问题。需要数学理论来理解在持续学习过程中遗忘的程度。为了实现这一目标的基础步骤,我们从理论角度研究了持续学习和灾难性遗忘,在没有明确防止遗忘的算法机制的梯度下降简单设置中。在这种设置下,我们分析地证明了过度参数化本身可以在线性回归模型的背景下减轻遗忘。我们考虑了一个受排列任务启发的两任务设置,并展示了当过度参数化比例足够高时,按顺序训练两个任务的模型会导致第一个任务的低风险估计器。作为这项工作的一部分,我们建立了单个线性回归任务风险的非渐近界限,这可能对双下降理论领域具有独立的兴趣。

更新时间: 2025-02-11 00:15:38

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2502.10442v1

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
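
The algorithm in question is the classic softmax gradient bandit, which is short enough to state exactly; the constant step size below is the point of the result, since no decay schedule is needed for almost-sure convergence. The specific arm means and eta are illustrative.

```python
import numpy as np

def gradient_bandit(means, eta=0.5, steps=50_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(means))
    for _ in range(steps):
        # Softmax policy over arms.
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()
        a = rng.choice(len(means), p=pi)
        r = means[a] + rng.normal()       # noisy reward

        # REINFORCE-style update: theta += eta * r * (one_hot(a) - pi),
        # with a constant learning rate and no baseline.
        grad = -pi * r
        grad[a] += r
        theta += eta * grad
    return pi

print(gradient_bandit(np.array([0.2, 0.5, 1.0])))  # mass concentrates on arm 2
```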

Updated: 2025-02-11 00:12:04

标题: 不再小步走:针对任意学习率的随机梯度赌博机的全球收敛

摘要: 我们通过展示随机梯度赌博算法几乎肯定收敛于全局最优策略,提供了对其新的理解,使用\emph{任何}恒定的学习率。该结果表明,即使在标准平滑度和噪声控制假设破坏的情况下,随机梯度算法仍然适当地平衡探索和利用。证明基于关于动作采样率和累积进展与噪声之间关系的新发现,并扩展了关于简单随机梯度方法在赌博设置中行为的当前理解。

更新时间: 2025-02-11 00:12:04

领域: cs.LG

下载: http://arxiv.org/abs/2502.07141v1

Captured by Captions: On Memorization and its Mitigation in CLIP Models

Multi-modal models, such as CLIP, have demonstrated strong performance in aligning visual and textual representations, excelling in tasks like image retrieval and zero-shot classification. Despite this success, the mechanisms by which these models utilize training data, particularly the role of memorization, remain unclear. In uni-modal models, both supervised and self-supervised, memorization has been shown to be essential for generalization. However, it is not well understood how these findings would apply to CLIP, which incorporates elements from both supervised learning via captions that provide a supervisory signal similar to labels, and from self-supervised learning via the contrastive objective. To bridge this gap in understanding, we propose a formal definition of memorization in CLIP (CLIPMem) and use it to quantify memorization in CLIP models. Our results indicate that CLIP's memorization behavior falls between the supervised and self-supervised paradigms, with "mis-captioned" samples exhibiting highest levels of memorization. Additionally, we find that the text encoder contributes more to memorization than the image encoder, suggesting that mitigation strategies should focus on the text domain. Building on these insights, we propose multiple strategies to reduce memorization while at the same time improving utility--something that had not been shown before for traditional learning paradigms where reducing memorization typically results in utility decrease.

Updated: 2025-02-11 00:11:13

标题: 被标题所捕捉:关于CLIP模型中记忆和其缓解的问题

摘要: 多模态模型,如CLIP,在对齐视觉和文本表示方面表现出强大的性能,在图像检索和零样本分类等任务中表现优异。尽管取得了成功,这些模型利用训练数据的机制,特别是记忆的作用,仍然不清楚。在单模态模型中,无论是监督还是自监督,记忆都被证明对泛化至关重要。然而,这些发现如何适用于CLIP尚不清楚:CLIP既通过字幕(提供类似标签的监督信号)融合了监督学习的元素,又通过对比目标融合了自监督学习的元素。为弥补这一理解上的差距,我们提出了CLIP中记忆的正式定义(CLIPMem),并用它来量化CLIP模型中的记忆。我们的结果表明,CLIP的记忆行为介于监督和自监督范式之间,"错误标注"的样本表现出最高水平的记忆。此外,我们发现文本编码器对记忆的贡献大于图像编码器,这表明缓解策略应重点放在文本领域。基于这些见解,我们提出了多种在减少记忆的同时提高效用的策略--这在传统学习范式中此前未曾实现,因为减少记忆通常会导致效用降低。

更新时间: 2025-02-11 00:11:13

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2502.07830v1

Few-Shot Multi-Human Neural Rendering Using Geometry Constraints

We present a method for recovering the shape and radiance of a scene consisting of multiple people given solely a few images. Multi-human scenes are complex due to additional occlusion and clutter. For single-human settings, existing approaches using implicit neural representations have achieved impressive results that deliver accurate geometry and appearance. However, it remains challenging to extend these methods for estimating multiple humans from sparse views. We propose a neural implicit reconstruction method that addresses the inherent challenges of this task through the following contributions: First, we propose to use geometry constraints by exploiting pre-computed meshes using a human body model (SMPL). Specifically, we regularize the signed distances using the SMPL mesh and leverage bounding boxes for improved rendering. Second, we propose a ray regularization scheme to minimize rendering inconsistencies, and a saturation regularization for robust optimization in variable illumination. Extensive experiments on both real and synthetic datasets demonstrate the benefits of our approach and show state-of-the-art performance against existing neural reconstruction methods.

Updated: 2025-02-11 00:10:58

标题: 使用几何约束的少样本多人神经渲染

摘要: 我们提出了一种方法来恢复由多个人组成的场景的形状和辐射,仅仅通过几张图像。多人场景由于额外的遮挡和杂乱而变得复杂。对于单人场景,使用隐式神经表示的现有方法已经取得了令人印象深刻的结果,可以提供准确的几何形状和外观。然而,将这些方法扩展到从稀疏视图中估计多个人仍然具有挑战性。我们提出了一种神经隐式重建方法,通过以下贡献解决了这一任务的固有挑战: 首先,我们提出利用人体模型(SMPL)使用几何约束,通过利用预先计算的网格来规范有符号距离,并利用边界框来改善渲染效果。其次,我们提出了一种射线规则化方案,以最小化渲染不一致性,并提出了一种饱和规则化方案,以在不同照明条件下进行稳健优化。对真实和合成数据集进行的广泛实验证实了我们方法的优点,并展示了与现有神经重建方法相比的最先进性能。

更新时间: 2025-02-11 00:10:58

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2502.07140v1

Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Temporal Point Processes (TPPs) have been widely used for event sequence modeling, but they often struggle to incorporate rich textual event descriptions effectively. Conversely, while Large Language Models (LLMs) have been shown remarkable capabilities in processing textual data, they lack mechanisms for handling temporal dynamics. To bridge this gap, we introduce Language-TPP, a unified framework that integrates TPPs with LLMs for enhanced event sequence modeling. Language-TPP introduces a novel temporal encoding mechanism that converts continuous time intervals into specialized byte-tokens, enabling seamless integration with standard LLM architectures. This approach allows Language-TPP to achieve state-of-the-art performance across multiple TPP tasks, including event time prediction, type prediction, and intensity estimation, on five datasets. Additionally, we demonstrate that incorporating temporal information significantly improves the quality of generated event descriptions.
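
The abstract's byte-token encoding suggests something like the round-trip below: serialize each inter-event interval as IEEE-754 bytes and map the bytes into a reserved region of the token vocabulary. The float32 choice and the id `offset` are assumptions for illustration, not the paper's specification.

```python
import struct

def interval_to_byte_tokens(dt: float, offset: int = 32000):
    # Serialize the continuous interval as 4 big-endian float32 bytes,
    # then shift each byte into a hypothetical reserved token-id range.
    return [offset + b for b in struct.pack(">f", dt)]

def byte_tokens_to_interval(tokens, offset: int = 32000):
    return struct.unpack(">f", bytes(t - offset for t in tokens))[0]

toks = interval_to_byte_tokens(3.75)
print(toks, byte_tokens_to_interval(toks))  # round-trips to 3.75
```

An encoding of this shape keeps continuous time lossless (up to float precision) while remaining an ordinary token sequence, which is what lets a standard LLM consume it unchanged.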

Updated: 2025-02-11 00:09:45

标题: Language-TPP:将时间点过程与语言模型集成,用于事件分析

摘要: 时间点过程(TPP)已被广泛用于事件序列建模,但它们经常难以有效地整合丰富的文本事件描述。相反,大型语言模型(LLMs)已被证明在处理文本数据方面具有显著的能力,但它们缺乏处理时间动态的机制。为了弥补这一差距,我们引入了Language-TPP,这是一个统一的框架,将TPP与LLMs集成在一起,用于增强事件序列建模。Language-TPP引入了一种新颖的时间编码机制,将连续的时间间隔转化为专门的字节标记,实现与标准LLM体系结构的无缝集成。这种方法使Language-TPP能够在五个数据集上实现多个TPP任务的最先进性能,包括事件时间预测、类型预测和强度估计。此外,我们证明,整合时间信息显著提高了生成事件描述的质量。

更新时间: 2025-02-11 00:09:45

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.07139v1

Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content

Social media platforms enable the propagation of hateful content across different modalities such as textual, auditory, and visual, necessitating effective detection methods. While recent approaches have shown promise in handling individual modalities, their effectiveness across different modality combinations remains unexplored. This paper presents a systematic analysis of fusion-based approaches for multimodal hate detection, focusing on their performance across video and image-based content. Our comprehensive evaluation reveals significant modality-specific limitations: while simple embedding fusion achieves state-of-the-art performance on video content (HateMM dataset) with a 9.9% points F1-score improvement, it struggles with complex image-text relationships in memes (Hateful Memes dataset). Through detailed ablation studies and error analysis, we demonstrate how current fusion approaches fail to capture nuanced cross-modal interactions, particularly in cases involving benign confounders. Our findings provide crucial insights for developing more robust hate detection systems and highlight the need for modality-specific architectural considerations. The code is available at https://github.com/gak97/Video-vs-Meme-Hate.
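
The "simple embedding fusion" that tops the video benchmark is essentially the following: concatenate frozen per-modality embeddings and train a small MLP head. The dimensions and dropout are illustrative; the paper's point is that this baseline works on HateMM yet misses the subtler image-text interplay in memes.

```python
import torch
import torch.nn as nn

class EmbeddingFusionClassifier(nn.Module):
    # Late fusion by concatenation of per-modality embeddings.
    def __init__(self, text_dim=768, vision_dim=512, hidden=256, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + vision_dim, hidden), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(hidden, n_classes))

    def forward(self, text_emb, vision_emb):
        return self.head(torch.cat([text_emb, vision_emb], dim=-1))
```

Because the modalities only meet at the final concatenation, no cross-modal interaction is modeled before classification, which is consistent with the failure mode on benign confounders described above.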

Updated: 2025-02-11 00:07:40

标题: 走向一个健壮的多模式仇恨检测框架:基于视频与图像内容的研究

摘要: 社交媒体平台促使仇恨内容在不同模态(如文本、听觉和视觉)之间传播,这需要有效的检测方法。虽然最近的方法在处理单个模态时显示出潜力,但它们在不同模态组合上的有效性尚未被探讨。本文系统地分析了基于融合的多模态仇恨检测方法,重点关注它们在视频和基于图像的内容上的表现。我们的综合评估揭示了重要的模态特定限制:尽管简单的嵌入融合在视频内容(HateMM数据集)上实现了最先进的性能,F1分数提高了9.9个百分点,但在梗图(Hateful Memes数据集)中却难以处理复杂的图像-文本关系。通过详细的消融研究和错误分析,我们展示了当前的融合方法如何未能捕捉微妙的跨模态交互,特别是涉及良性混淆因素的情况。我们的发现为开发更健壮的仇恨检测系统提供了关键见解,并强调了对模态特定的架构考虑的需求。代码可在https://github.com/gak97/Video-vs-Meme-Hate上找到。

更新时间: 2025-02-11 00:07:40

领域: cs.CV,cs.CL,cs.LG,I.4.10; I.5.1; I.2.7

下载: http://arxiv.org/abs/2502.07138v1

Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification

The ability to process long contexts is crucial for many natural language processing tasks, yet it remains a significant challenge. While substantial progress has been made in enhancing the efficiency of attention mechanisms, there is still a gap in understanding how attention heads function in long-context settings. In this paper, we observe that while certain heads consistently attend to local information only, others swing between attending to local and long-context information depending on the query. This raises the question: can we identify which heads require long-context information to predict the next token accurately? We demonstrate that it's possible to predict which heads are crucial for long-context processing using only local keys. The core idea here is to exploit a simple model for the long-context scores via second moment approximations. These findings unveil simple properties of attention in the context of long sequences, and open the door to potentially significant gains in efficiency.
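
One plausible way to operationalize the idea is sketched below: score each head from local keys only, using a second-moment statistic of its attention logits, and route only high-variance heads to the full context. The specific statistic and the thresholding scheme are assumptions of this sketch, not the paper's exact model.

```python
import torch

def head_longcontext_score(q, k_local):
    # q: (heads, d) current query per head; k_local: (heads, n_local, d)
    # keys from a short local window only.
    logits = torch.einsum("hd,hnd->hn", q, k_local) / q.shape[-1] ** 0.5
    # Second-moment statistic: heads whose local logits vary a lot are
    # treated as candidates for needing long-context information.
    return logits.var(dim=-1)

# Usage sketch: route heads with score above a calibrated threshold to the
# full KV cache, and serve the rest from the short local window.
```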

Updated: 2025-02-11 00:04:32

标题: 揭示注意力的简单性:自适应长上下文头部识别

摘要: 处理长上下文对于许多自然语言处理任务至关重要,但仍然是一个重大挑战。虽然在增强注意力机制的效率方面已经取得了实质性进展,但在理解注意力头在长上下文环境中的功能方面仍存在差距。在本文中,我们观察到,虽然某些注意力头始终只关注本地信息,但其他头在查询时会在关注本地和长上下文信息之间摇摆。这引发了一个问题:我们能否确定哪些头需要长上下文信息才能准确预测下一个标记?我们展示了只使用本地键就可以预测哪些头对于长上下文处理至关重要的可能性。这里的核心思想是通过二阶矩近似利用一个简单模型来处理长上下文得分。这些发现揭示了在长序列环境中注意力的简单属性,并为潜在的效率显著提升打开了大门。

更新时间: 2025-02-11 00:04:32

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2502.09647v1

By Xinhai (Sean) Zou.