Natural Counterfactuals With Necessary Backtracking
Counterfactual reasoning is pivotal in human cognition and especially important for providing explanations and making decisions. While Judea Pearl's influential approach is theoretically elegant, its generation of a counterfactual scenario often requires too much deviation from the observed scenarios to be feasible, as we show using simple examples. To mitigate this difficulty, we propose a framework of \emph{natural counterfactuals} and a method for generating counterfactuals that are more feasible with respect to the actual data distribution. Our methodology incorporates a certain amount of backtracking when needed, allowing changes in causally preceding variables to minimize deviations from realistic scenarios. Specifically, we introduce a novel optimization framework that permits but also controls the extent of backtracking with a naturalness criterion. Empirical experiments demonstrate the effectiveness of our method. The code is available at https://github.com/GuangyuanHao/natural_counterfactuals.
Updated: 2024-10-30 23:53:11
Fields: cs.AI,cs.CV,cs.LG,cs.NE,stat.ME
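As a rough schematic of the optimization framework described above (our notation, not the paper's), a natural counterfactual x' is the least-deviating scenario that realizes the desired change while remaining in a sufficiently likely region of the data distribution; backtracking enters because causally preceding variables are allowed to move when the naturalness constraint demands it:

```latex
\min_{x'} \; d(x', x)
\quad \text{s.t.} \quad X_A(x') = a,
\qquad p_{\mathrm{data}}(x') \ge \epsilon .
```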
Generative forecasting of brain activity enhances Alzheimer's classification and interpretation
Understanding the relationship between cognition and intrinsic brain activity through purely data-driven approaches remains a significant challenge in neuroscience. Resting-state functional magnetic resonance imaging (rs-fMRI) offers a non-invasive method to monitor regional neural activity, providing a rich and complex spatiotemporal data structure. Deep learning has shown promise in capturing these intricate representations. However, the limited availability of large datasets, especially for disease-specific groups such as Alzheimer's Disease (AD), constrains the generalizability of deep learning models. In this study, we focus on multivariate time series forecasting of independent component networks derived from rs-fMRI as a form of data augmentation, using both a conventional LSTM-based model and the novel Transformer-based BrainLM model. We assess their utility in AD classification, demonstrating how generative forecasting enhances classification performance. Post-hoc interpretation of BrainLM reveals class-specific brain network sensitivities associated with AD.
Updated: 2024-10-30 23:51:31
Fields: cs.LG,q-bio.NC
Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems
Diffusion models can learn strong image priors from the underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks make most existing approaches infeasible for high-dimensional, high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based position-aware diffusion inverse solver, called PaDIS, where we obtain the score function of the whole image through scores of patches and their positional encoding and utilize this as the prior for solving inverse problems. First, we show that this diffusion model achieves improved memory efficiency and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged in with different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and super-resolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire-image priors in the case of limited training data, demonstrating the data efficiency of learning a patch-based prior.
Updated: 2024-10-30 23:48:44
Fields: cs.CV,cs.AI
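A minimal sketch of the PaDIS idea above, assembling a whole-image score from patch scores plus positional encodings. Here `patch_score_fn` stands in for the trained patch diffusion score network, and the non-overlapping tiling and corner-based positional encoding are illustrative assumptions:

```python
import numpy as np

def whole_image_score(image, patch_score_fn, patch=8):
    """Approximate the score of the full image by scoring each patch
    together with its positional encoding and stitching the results."""
    H, W = image.shape
    score = np.zeros_like(image)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            pos = np.array([i / H, j / W])      # normalized patch position
            tile = image[i:i + patch, j:j + patch]
            score[i:i + patch, j:j + patch] = patch_score_fn(tile, pos)
    return score                                 # plug-in prior for a DIS solver

# toy stand-in for the trained network: score of an isotropic Gaussian prior
dummy_score = lambda tile, pos: -tile
s = whole_image_score(np.random.default_rng(0).normal(size=(32, 32)), dummy_score)
```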
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
Exploration in sparse-reward reinforcement learning is difficult due to the requirement of long, coordinated sequences of actions in order to achieve any reward. Moreover, in continuous action spaces there are an infinite number of possible actions, which only increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Typically such methods require a lengthy pretraining phase, especially in continuous action spaces, in order to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill-generation with two components. First we discretize the action space through clustering, and second we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. Such a method outperforms baselines for skill-generation in several challenging sparse-reward domains, and requires orders-of-magnitude less computation in skill-generation and online rollouts. Our code is available at \url{https://github.com/dyunis/subwords_as_skills}.
Updated: 2024-10-30 23:45:17
Fields: cs.LG,cs.AI,cs.RO
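The two components above map naturally onto a short sketch: k-means turns continuous actions into a discrete vocabulary, and BPE-style merging over demonstration "sentences" of cluster ids produces temporally extended skills. The cluster count, merge count, and synthetic data below are illustrative choices, not the paper's settings:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
demos = rng.normal(size=(5000, 6))                # continuous actions from offline data
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(demos)
corpus = [list(kmeans.predict(rng.normal(size=(100, 6)))) for _ in range(20)]

def learn_skills(sequences, num_merges=10):
    """BPE-style skill extraction: repeatedly merge the most frequent
    adjacent pair of tokens; each merged token is a temporally extended
    action (a fixed sequence of discrete primitives)."""
    seqs = [[(t,) for t in seq] for seq in sequences]
    skills = []
    for _ in range(num_merges):
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        skills.append(merged)
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and s[j] == a and s[j + 1] == b:
                    out.append(merged); j += 2
                else:
                    out.append(s[j]); j += 1
            seqs[i] = out
    return skills

print(learn_skills(corpus)[:3])                   # each tuple is one learned skill
```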
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables
Tabular reasoning involves interpreting natural language queries about tabular data, which presents a unique challenge of combining language understanding with structured data analysis. Existing methods employ either textual reasoning, which excels in semantic interpretation but struggles with mathematical operations, or symbolic reasoning, which handles computations well but lacks semantic understanding. This paper introduces a novel algorithm H-STAR that integrates both symbolic and semantic (textual) approaches in a two-stage process to address these limitations. H-STAR employs: (1) step-wise table extraction using `multi-view' column retrieval followed by row extraction, and (2) adaptive reasoning that adapts reasoning strategies based on question types, utilizing semantic reasoning for direct lookup and complex lexical queries while augmenting textual reasoning with symbolic reasoning support for quantitative and logical tasks. Our extensive experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three tabular question-answering (QA) and fact-verification datasets, underscoring its effectiveness and efficiency.
Updated: 2024-10-30 23:44:31
Fields: cs.DB,cs.AI,cs.CL,cs.LG
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models
Research has shown the effectiveness of reasoning (e.g., Chain-of-Thought), planning (e.g., SelfAsk), and retrieval augmented generation strategies to improve the performance of Large Language Models (LLMs) on various tasks, such as question answering. However, using a single fixed strategy to answer different kinds of questions is suboptimal in performance and inefficient in terms of generated output tokens and performed retrievals. In our work, we propose DyPlan, a novel technique that induces a dynamic strategy-selection process in LLMs to improve performance and reduce costs in question answering. DyPlan incorporates an initial decision step to select the most suitable strategy conditioned on the input question and guides the LLM's response generation accordingly. We extend DyPlan to DyPlan-verify, adding an internal verification and correction process to further enrich the generated answer. Experiments on three prominent multi-hop question answering (MHQA) datasets reveal how DyPlan can improve model performance by 7-13% while reducing the cost by 11-32% relative to the best baseline model.
Updated: 2024-10-30 23:35:21
Fields: cs.CL,cs.AI,cs.LG
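DyPlan's two-step structure is straightforward to sketch. The strategy names, prompt templates, and the `llm`/`retrieve` callables below are hypothetical stand-ins, not the paper's prompts:

```python
STRATEGIES = {
    "direct":   "Answer the question directly:\n{q}",
    "cot":      "Think step by step, then answer:\n{q}",
    "retrieve": "Use these retrieved passages:\n{ctx}\nAnswer:\n{q}",
}

def dyplan_answer(question, llm, retrieve):
    """Decision step first: ask the model which strategy fits the question,
    then generate the answer under that strategy only."""
    choice = llm(
        "Which strategy suits this question best? "
        f"Options: {list(STRATEGIES)}.\nQuestion: {question}\nReply with one option."
    ).strip()
    if choice not in STRATEGIES:
        choice = "direct"                         # fall back on an invalid decision
    ctx = retrieve(question) if choice == "retrieve" else ""
    return llm(STRATEGIES[choice].format(q=question, ctx=ctx))
```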
Tiny Transformers Excel at Sentence Compression
It is staggering that words of the English language, which are on average represented by 5--6 bytes of ASCII, require as much as 24 kilobytes when served to large language models. We show that there is room for more information in every token embedding. We demonstrate that 1--3-layer transformers are capable of encoding and subsequently decoding standard English sentences into as little as a single 3-kilobyte token. Our work implies that even small networks can learn to construct valid English sentences and suggests the possibility of optimising large language models by moving from sub-word token embeddings towards larger fragments of text.
Updated: 2024-10-30 23:34:45
Fields: cs.LG,cs.CL
Learning to Achieve Goals with Belief State Transformers
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix. The Belief State Transformer effectively learns to solve challenging problems that conventional forward-only transformers struggle with, in a domain-independent fashion. Key to this success is learning a compact belief state that captures all relevant information necessary for accurate predictions. Empirical ablations show that each component of the model is essential in difficult scenarios where standard Transformers fall short. For the task of story writing with known prefixes and suffixes, our approach outperforms the Fill-in-the-Middle method for reaching known goals and demonstrates improved performance even when the goals are unknown. Altogether, the Belief State Transformer enables more efficient goal-conditioned decoding, better test-time inference, and high-quality text representations on small scale problems.
Updated: 2024-10-30 23:26:06
Fields: cs.LG,cs.AI,cs.CL
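The dual objective above is the essential ingredient and can be sketched compactly. The GRU encoders below are a deliberately small stand-in for the paper's transformer, and all dimensions are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBeliefStateModel(nn.Module):
    """Encode the prefix forward and the suffix backward, fuse them into a
    compact belief state, and train two heads: the next token after the
    prefix and the previous token before the suffix."""
    def __init__(self, vocab=100, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.fwd = nn.GRU(d, d, batch_first=True)
        self.bwd = nn.GRU(d, d, batch_first=True)
        self.next_head = nn.Linear(2 * d, vocab)
        self.prev_head = nn.Linear(2 * d, vocab)

    def forward(self, prefix, suffix):
        _, hf = self.fwd(self.emb(prefix))
        _, hb = self.bwd(self.emb(torch.flip(suffix, dims=[1])))
        belief = torch.cat([hf[-1], hb[-1]], dim=-1)   # the belief state
        return self.next_head(belief), self.prev_head(belief)

model = TinyBeliefStateModel()
prefix, suffix = torch.randint(0, 100, (4, 5)), torch.randint(0, 100, (4, 5))
next_tok, prev_tok = torch.randint(0, 100, (4,)), torch.randint(0, 100, (4,))
logits_next, logits_prev = model(prefix, suffix)
loss = F.cross_entropy(logits_next, next_tok) + F.cross_entropy(logits_prev, prev_tok)
loss.backward()                       # both objectives shape the shared belief state
```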
From Blocking to Breaking: Evaluating the Impact of Adblockers on Web Usability
Recent years have seen a sharp rise in adblocker use, driven by increased web tracking and personalized ads. However, a significant issue for adblocker users is the web breakages they encounter, which worsens their browsing experience and often leads them to turn off their adblockers. Despite efforts by filter list maintainers to create rules that minimize these breakages, they remain a common issue. Our research aims to assess the extent of web breakages caused by adblocking on live sites using automated tools, attempting to establish a baseline for these disruptions. The study also outlines the challenges and limitations encountered when measuring web breakages in real-time. The current automated crawler's inability to consistently navigate a vast array of websites, combined with the unpredictable nature of web content, makes this research particularly difficult. We have identified several key findings related to web breakages in our preliminary study, which we intend to delve deeper into in future research.
Updated: 2024-10-30 23:25:07
Fields: cs.CR
Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices
This paper presents the development of machine learning (ML) models to predict hypoxemia severity during emergency triage, especially in Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) events, using physiological data from medical-grade sensors. Gradient Boosting Models (XGBoost, LightGBM, CatBoost) and sequential models (LSTM, GRU) were trained on physiological and demographic data from the MIMIC-III and IV datasets. A robust preprocessing pipeline addressed missing data and class imbalance, and incorporated synthetic data flagged with masks. Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability, making them well-suited for real-time decision-making. While their performance was comparable to that of sequential models, the GBMs used score features from six physiological variables derived from the enhanced National Early Warning Score (NEWS) 2, which we termed NEWS2+. This approach significantly improved prediction accuracy. While sequential models handled temporal data well, their performance gains did not justify the higher computational cost. A 5-minute prediction window was chosen for timely intervention, with minute-level interpolations standardizing the data. Feature importance analysis highlighted the significant role of mask and score features in enhancing both transparency and performance. Temporal dependencies proved to be less critical, as Gradient Boosting Models were able to capture key patterns effectively without relying on them. This study highlights ML's potential to improve triage and reduce alarm fatigue. Future work will integrate data from multiple hospitals to enhance model generalizability across clinical settings.
Updated: 2024-10-30 23:24:28
Fields: cs.LG
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to episodic and discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various subtasks simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online manner, without the use of an explicit bi-level optimization scheme or an augmented state-space.
Updated: 2024-10-30 23:23:42
Fields: cs.LG,cs.AI
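For orientation, the base prediction rule in the average-reward setting (which subtask-driven extensions like RED build on) is tabular differential TD(0): the TD error replaces discounting with a running estimate of the average reward. The toy two-state chain is illustrative:

```python
import numpy as np

def differential_td(step, n_states, alpha=0.1, eta=1.0, iters=50_000, seed=0):
    """Tabular differential TD(0) for average-reward prediction:
    delta = R - rbar + v(s') - v(s), with rbar learned alongside v."""
    rng = np.random.default_rng(seed)
    v, rbar, s = np.zeros(n_states), 0.0, 0
    for _ in range(iters):
        s2, r = step(s, rng)
        delta = r - rbar + v[s2] - v[s]
        v[s] += alpha * delta
        rbar += eta * alpha * delta
        s = s2
    return v - v.mean(), rbar       # differential values are defined up to a constant

# two-state chain: being in state 1 pays reward 1, transitions are uniform
step = lambda s, rng: (int(rng.integers(2)), float(s == 1))
v, rbar = differential_td(step, 2)
print(rbar)                          # ~0.5, the true average reward
```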
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of "easy" and "easiest" being parallel to that between "lucky" and "luckiest". For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer that, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties either hold in all or none distribution-equivalent next-token predictors.
Updated: 2024-10-30 23:19:29
Fields: stat.ML,cs.AI,cs.CL,cs.LG
Tangent Space Causal Inference: Leveraging Vector Fields for Causal Discovery in Dynamical Systems
Causal discovery with time series data remains a challenging yet increasingly important task across many scientific domains. Convergent cross mapping (CCM) and related methods have been proposed to study time series that are generated by dynamical systems, where traditional approaches like Granger causality are unreliable. However, CCM often yields inaccurate results depending upon the quality of the data. We propose the Tangent Space Causal Inference (TSCI) method for detecting causalities in dynamical systems. TSCI works by considering vector fields as explicit representations of the systems' dynamics and checks for the degree of synchronization between the learned vector fields. The TSCI approach is model-agnostic and can be used as a drop-in replacement for CCM and its generalizations. We first present a basic version of the TSCI algorithm, which is shown to be more effective than the basic CCM algorithm with very little additional computation. We additionally present augmented versions of TSCI that leverage the expressive power of latent variable models and deep learning. We validate our theory on standard systems, and we demonstrate improved causal inference performance across a number of benchmark tasks.
Updated: 2024-10-30 23:08:12
Fields: cs.LG,nlin.CD,stat.ML
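A simplified numerical sketch of the TSCI idea under our own assumptions: tangents are estimated by finite differences and the cross map is local-linear, in place of the paper's learned vector fields. A score near 1 indicates that pushed-forward tangents synchronize with the observed ones:

```python
import numpy as np

def tsci_score(X, Y, k=12):
    """Push Y-tangents through the local Jacobian of a Y->X cross map and
    measure cosine alignment with the observed X-tangents."""
    dX, dY = np.diff(X, axis=0), np.diff(Y, axis=0)   # finite-difference tangents
    X, Y = X[:-1], Y[:-1]
    cos = []
    for t in range(len(Y)):
        idx = np.argsort(np.linalg.norm(Y - Y[t], axis=1))[1:k + 1]
        A, B = Y[idx] - Y[t], X[idx] - X[t]           # local charts around y_t, x_t
        J, *_ = np.linalg.lstsq(A, B, rcond=None)     # local Jacobian of the cross map
        pushed = dY[t] @ J
        denom = np.linalg.norm(pushed) * np.linalg.norm(dX[t]) + 1e-12
        cos.append(pushed @ dX[t] / denom)
    return float(np.mean(cos))

theta = 0.7                                           # Y is a noisy rotation of X
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t = np.linspace(0, 20, 1500)[:, None]
X = np.hstack([np.sin(t), np.sin(2 * t)])
Y = X @ R + 0.01 * np.random.default_rng(1).normal(size=X.shape)
print(tsci_score(X, Y))                               # close to 1: tangent fields synchronize
```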
Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm
Reinforcement learning utilizing kernel ridge regression to predict the expected value function represents a powerful method with great representational capacity. This setting is a highly versatile framework amenable to analytical results. We consider kernel-based function approximation for RL in the infinite horizon average reward setting, also referred to as the undiscounted setting. We propose an optimistic algorithm, similar to acquisition function based algorithms in the special case of bandits. We establish novel no-regret performance guarantees for our algorithm, under kernel-based modelling assumptions. Additionally, we derive a novel confidence interval for the kernel-based prediction of the expected value function, applicable across various RL problems.
Updated: 2024-10-30 23:04:10
Fields: cs.LG,cs.AI,stat.ML
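The two standard kernel quantities behind such optimism are sketched below with an RBF kernel; the bonus coefficient `beta` and the kernel choice are illustrative, and the paper's confidence interval is a refinement of this width:

```python
import numpy as np

def kernel_ucb(X_train, y_train, X_test, beta=2.0, lam=1.0, gamma=1.0):
    """Kernel ridge prediction plus an optimistic confidence bonus."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * gamma ** 2))
    Kinv = np.linalg.inv(k(X_train, X_train) + lam * np.eye(len(X_train)))
    Ks = k(X_test, X_train)
    mean = Ks @ Kinv @ y_train
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)   # k(x, x) = 1 for the RBF kernel
    return mean + beta * np.sqrt(np.maximum(var, 0.0))   # upper confidence estimate

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
print(kernel_ucb(X, np.sin(3 * X[:, 0]), np.array([[0.0], [0.9]])))
```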
A Priori Uncertainty Quantification of Reacting Turbulence Closure Models using Bayesian Neural Networks
While many physics-based closure model forms have been posited for the sub-filter scale (SFS) in large eddy simulation (LES), vast amounts of data available from direct numerical simulation (DNS) create opportunities to leverage data-driven modeling techniques. Albeit flexible, data-driven models still depend on the dataset and the functional form of the model chosen. Increased adoption of such models requires reliable uncertainty estimates both in the data-informed and out-of-distribution regimes. In this work, we employ Bayesian neural networks (BNNs) to capture both epistemic and aleatoric uncertainties in a reacting flow model. In particular, we model the filtered progress variable scalar dissipation rate which plays a key role in the dynamics of turbulent premixed flames. We demonstrate that BNN models can provide unique insights about the structure of uncertainty of the data-driven closure models. We also propose a method for the incorporation of out-of-distribution information in a BNN. The efficacy of the model is demonstrated by a priori evaluation on a dataset consisting of a variety of flame conditions and fuels.
Updated: 2024-10-30 23:03:59
Fields: physics.flu-dyn,cs.LG,physics.data-an
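The epistemic/aleatoric split referred to above is typically read off the predictive-variance decomposition over the weight posterior $q(\theta)$, standard for BNNs with a heteroscedastic Gaussian head (mean $\mu_\theta(x)$, variance $\sigma_\theta^2(x)$):

```latex
\underbrace{\operatorname{Var}[y \mid x]}_{\text{total}}
\;\approx\;
\underbrace{\mathbb{E}_{\theta \sim q}\!\left[\sigma_\theta^{2}(x)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{\operatorname{Var}_{\theta \sim q}\!\left[\mu_\theta(x)\right]}_{\text{epistemic}} .
```

The epistemic term is the one expected to grow in the out-of-distribution regime the abstract targets.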
Unscrambling disease progression at scale: fast inference of event permutations with optimal transport
Disease progression models infer group-level temporal trajectories of change in patients' features as a chronic degenerative condition plays out. They provide unique insight into disease biology and staging systems with individual-level clinical utility. Discrete models consider disease progression as a latent permutation of events, where each event corresponds to a feature becoming measurably abnormal. However, permutation inference using traditional maximum likelihood approaches becomes prohibitive due to combinatoric explosion, severely limiting model dimensionality and utility. Here we leverage ideas from optimal transport to model disease progression as a latent permutation matrix of events belonging to the Birkhoff polytope, facilitating fast inference via optimisation of the variational lower bound. This enables inference 1000 times faster than the current state of the art and, correspondingly, supports models with several orders of magnitude more features. Experiments demonstrate the increase in speed, accuracy and robustness to noise in simulation. Further experiments with real-world imaging data from two separate datasets, one from Alzheimer's disease patients, the other age-related macular degeneration, showcase, for the first time, pixel-level disease progression events in the brain and eye, respectively. Our method is low compute, interpretable and applicable to any progressive condition and data modality, giving it broad potential clinical utility.
Updated: 2024-10-30 23:00:01
Fields: cs.LG
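The computational core of such a relaxation, keeping the latent permutation inside the Birkhoff polytope, can be sketched with Sinkhorn normalization; the paper's variational objective and event model are not shown:

```python
import numpy as np

def sinkhorn(scores, n_iter=100):
    """Alternate row/column normalization of exp(scores): the result is
    (approximately) doubly stochastic, i.e. a point in the Birkhoff
    polytope that softly relaxes a hard event permutation."""
    P = np.exp(scores - scores.max())   # stabilized
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)
        P /= P.sum(axis=0, keepdims=True)
    return P

P = sinkhorn(np.random.default_rng(0).normal(size=(5, 5)))
print(P.sum(axis=0), P.sum(axis=1))     # both ~1: rows and columns normalized
```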
Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criteria. A TSR criteria requires the cost of a policy to be computable recursively over both time and (state) space, which includes classical expectation, almost sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for 1) anytime-constrained policies, 2) almost-sure-constrained policies, and 3) deterministic expectation-constrained policies.
Updated: 2024-10-30 22:58:51
Fields: cs.LG,cs.DS
Representation Noising: A Defence Mechanism Against Harmful Finetuning
Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs). While safety measures like preventing jailbreaks and improving safety guardrails are important, such measures can easily be reversed through fine-tuning. In this work, we propose Representation Noising (RepNoise), a defence mechanism that operates even when attackers have access to the weights. RepNoise works by removing information about harmful representations such that it is difficult to recover them during fine-tuning. Importantly, our defence is also able to generalize across different subsets of harm that have not been seen during the defence process as long as they are drawn from the same distribution of the attack set. Our method does not degrade the general capability of LLMs and retains the ability to train the model on harmless tasks. We provide empirical evidence that the efficacy of our defence lies in its ``depth'': the degree to which information about harmful representations is removed across all layers of the LLM. We also find areas where RepNoise still remains ineffective and highlight how those limitations can inform future research.
Updated: 2024-10-30 22:58:40
Fields: cs.CL,cs.LG
Strong but simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning
Domain generalization (DG) remains a significant challenge for perception based on deep neural networks (DNNs), where domain shifts occur due to synthetic data, lighting, weather, or location changes. Vision-language models (VLMs) marked a large step forward in generalization capability and have already been applied to various tasks. Very recently, the first approaches utilized VLMs for domain-generalized segmentation and object detection and obtained strong generalization. However, all these approaches rely on complex modules, feature augmentation frameworks or additional models. Surprisingly and in contrast to that, we found that simple fine-tuning of vision-language pre-trained models yields competitive or even stronger generalization results while being extremely simple to apply. Moreover, we found that vision-language pre-training consistently provides better generalization than the previous standard of vision-only pre-training. This challenges the standard of using ImageNet-based transfer learning for domain generalization. Fully fine-tuning a vision-language pre-trained model is capable of reaching the domain generalization SOTA when training on the synthetic GTA5 dataset. Moreover, we confirm this observation for object detection on a novel synthetic-to-real benchmark. We further obtain superior generalization capabilities by reaching 77.9% mIoU on the popular Cityscapes-to-ACDC benchmark. We also found improved in-domain generalization, leading to an improved SOTA of 86.4% mIoU on the Cityscapes test set, marking first place on the leaderboard.
Updated: 2024-10-30 22:58:36
Fields: cs.CV,cs.AI,cs.LG
DiGRAF: Diffeomorphic Graph-Adaptive Activation Function
In this paper, we propose a novel activation function tailored specifically for graph data in Graph Neural Networks (GNNs). Motivated by the need for graph-adaptive and flexible activation functions, we introduce DiGRAF, leveraging Continuous Piecewise-Affine Based (CPAB) transformations, which we augment with an additional GNN to learn a graph-adaptive diffeomorphic activation function in an end-to-end manner. In addition to its graph-adaptivity and flexibility, DiGRAF also possesses properties that are widely recognized as desirable for activation functions, such as differentiability, boundedness within the domain, and computational efficiency. We conduct an extensive set of experiments across diverse datasets and tasks, demonstrating a consistent and superior performance of DiGRAF compared to traditional and graph-specific activation functions, highlighting its effectiveness as an activation function for GNNs. Our code is available at https://github.com/ipsitmantri/DiGRAF.
Updated: 2024-10-30 22:58:26
Fields: cs.LG
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
Warm-starting neural network training by initializing networks with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, it often leads to loss of plasticity, where the network loses its ability to learn new information, resulting in worse generalization than training from scratch. This occurs even under stationary data distributions, and its underlying mechanism is poorly understood. We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data. Motivated by this, we propose Direction-Aware SHrinking (DASH), a method aiming to mitigate plasticity loss by selectively forgetting memorized noise while preserving learned features. We validate our approach on vision tasks, demonstrating improvements in test accuracy and training efficiency.
Updated: 2024-10-30 22:57:54
Fields: cs.LG,cs.AI
Causality-Driven Audits of Model Robustness
Robustness audits of deep neural networks (DNN) provide a means to uncover model sensitivities to the challenging real-world imaging conditions that significantly degrade DNN performance in-the-wild. Such conditions are often the result of the compounding of multiple factors inherent to the environment, sensor, or processing pipeline and may lead to complex image distortions that are not easily categorized. When robustness audits are limited to a set of pre-determined imaging effects or distortions, the results cannot be (easily) transferred to real-world conditions where image corruptions may be more complex or nuanced. To address this challenge, we present a new alternative robustness auditing method that uses causal inference to measure DNN sensitivities to the factors of the imaging process that cause complex distortions. Our approach uses causal models to explicitly encode assumptions about the domain-relevant factors and their interactions. Then, through extensive experiments on natural and rendered images across multiple vision tasks, we show that our approach reliably estimates causal effects of each factor on DNN performance using observational domain data. These causal effects directly tie DNN sensitivities to observable properties of the imaging pipeline in the domain of interest towards reducing the risk of unexpected DNN failures when deployed in that domain.
Updated: 2024-10-30 22:57:50
Fields: cs.CV,cs.AI,cs.LG
Latent Functional Maps: a spectral framework for representation alignment
Neural models learn data representations that lie on low-dimensional manifolds, yet modeling the relation between these representational spaces is an ongoing challenge. By integrating spectral geometry principles into neural modeling, we show that this problem can be better addressed in the functional domain, mitigating complexity, while enhancing interpretability and performance on downstream tasks. To this end, we introduce a multi-purpose framework to the representation learning community, which allows to: (i) compare different spaces in an interpretable way and measure their intrinsic similarity; (ii) find correspondences between them, both in unsupervised and weakly supervised settings; and (iii) effectively transfer representations between distinct spaces. We validate our framework on various applications, ranging from stitching to retrieval tasks, and on multiple modalities, demonstrating that Latent Functional Maps can serve as a Swiss Army knife for representation alignment.
Updated: 2024-10-30 22:47:47
Fields: cs.LG
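A minimal numerical sketch of a functional map between two representation spaces. It assumes spectral bases (e.g. the first k Laplacian eigenvectors of a k-NN graph built over each space) and d >= k paired descriptor functions are already available; the full pipeline adds regularization and weak supervision:

```python
import numpy as np

def functional_map(phi_src, phi_tgt, F_src, F_tgt):
    """Solve for the k x k map C with C @ A ~= B, where A, B are the spectral
    coefficients of corresponding descriptor functions in each basis."""
    A = phi_src.T @ F_src              # (k, d) coefficients in the source basis
    B = phi_tgt.T @ F_tgt              # (k, d) coefficients in the target basis
    C_T, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return C_T.T                       # transports functions from source to target
```

How close C is to a diagonal or orthogonal matrix then serves as the interpretable similarity measure, and applying C transfers representations across the two spaces.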
Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures
The current reinforcement learning framework focuses exclusively on performance, often at the expense of efficiency. In contrast, biological control achieves remarkable performance while also optimizing computational energy expenditure and decision frequency. We propose a Decision Bounded Markov Decision Process (DB-MDP) that constrains the number of decisions and the computational energy available to agents in reinforcement learning environments. Our experiments demonstrate that existing reinforcement learning algorithms struggle within this framework, leading to either failure or suboptimal performance. To address this, we introduce a biologically-inspired Temporally Layered Architecture (TLA), enabling agents to manage computational costs through two layers with distinct time scales and energy requirements. TLA achieves optimal performance in decision-bounded environments, and in continuous control environments it matches state-of-the-art performance while utilizing a fraction of the compute cost. Compared to current reinforcement learning algorithms that solely prioritize performance, our approach significantly lowers computational energy expenditure while maintaining performance. These findings establish a benchmark and pave the way for future research on energy- and time-aware control.
Updated: 2024-10-30 22:38:06
Fields: cs.AI,cs.SY,eess.SY
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
Safety alignment is crucial to ensure that large language models (LLMs) behave in ways that align with human preferences and prevent harmful actions during inference. However, recent studies show that the alignment can be easily compromised through finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs through navigating the LLM safety landscape. We discover a new phenomenon observed universally in the model parameter space of popular open-source LLMs, termed as "safety basin": random perturbations to model weights maintain the safety level of the original aligned model within its local neighborhood. However, outside this local region, safety is fully compromised, exhibiting a sharp, step-like drop. This safety basin contrasts sharply with the LLM capability landscape, where model performance peaks at the origin and gradually declines as random perturbation increases. Our discovery inspires us to propose the new VISAGE safety metric that measures the safety in LLM finetuning by probing its safety landscape. Visualizing the safety landscape of the aligned model enables us to understand how finetuning compromises safety by dragging the model away from the safety basin. The LLM safety landscape also highlights the system prompt's critical role in protecting a model, and that such protection transfers to its perturbed variants within the safety basin. These observations from our safety landscape research provide new insights for future work on LLM safety community. Our code is publicly available at https://github.com/ShengYun-Peng/llm-landscape.
Updated: 2024-10-30 22:35:59
Fields: cs.LG
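A minimal sketch of the landscape probe described above: perturb the aligned weights along a random direction and record a safety score at each magnitude. `safety_score` is an assumed callable (e.g. one minus the attack success rate on a set of harmful prompts), and the 1-D probe is illustrative:

```python
import numpy as np

def safety_landscape_1d(theta, safety_score, radius=1.0, n_points=21, seed=0):
    """Probe safety along one random unit direction in weight space; a
    'safety basin' appears as a safe plateau around the aligned weights
    followed by a sharp, step-like drop outside it."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=theta.shape)
    d /= np.linalg.norm(d)
    alphas = np.linspace(-radius, radius, n_points)
    return alphas, [safety_score(theta + a * d) for a in alphas]
```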
Modern Hopfield Networks meet Encoded Neural Representations -- Addressing Practical Considerations
Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in the human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs advancing the practical utility of associative memory networks for real-world tasks.
Updated: 2024-10-30 22:35:58
Fields: cs.LG,cs.AI,cs.CV,cs.IR,cs.NE
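The retrieval dynamics in question are the standard modern Hopfield update; HEN's change is to store and query encoded representations, so `encode` below is a stand-in for the learned encoder (identity here for illustration):

```python
import numpy as np

def mhn_retrieve(xi, M, beta=8.0):
    """One modern Hopfield update: xi' = M^T softmax(beta * M @ xi).
    Larger beta sharpens retrieval and suppresses meta-stable mixtures."""
    a = beta * M @ xi
    w = np.exp(a - a.max())
    return M.T @ (w / w.sum())

encode = lambda x: x                          # stand-in for HEN's learned encoder
rng = np.random.default_rng(0)
M = np.vstack([encode(p) for p in rng.normal(size=(50, 32))])  # stored (encoded) patterns
query = M[7] + 0.3 * rng.normal(size=32)      # corrupted version of pattern 7
print(int(np.argmax(M @ mhn_retrieve(encode(query), M))))      # expected: 7
```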
What makes unlearning hard and what to do about it
Machine unlearning is the problem of removing the effect of a subset of training data (the "forget set") from a trained model without damaging the model's utility, e.g. to comply with users' requests to delete their data, or to remove mislabeled, poisoned or otherwise problematic data. With unlearning research still in its infancy, many fundamental open questions exist: Are there interpretable characteristics of forget sets that substantially affect the difficulty of the problem? How do these characteristics affect different state-of-the-art algorithms? With this paper, we present the first investigation aiming to answer these questions. We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms. Evaluation on forget sets that isolate these identified factors reveals previously-unknown behaviours of state-of-the-art algorithms that don't materialize on random forget sets. Based on our insights, we develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set. We find that RUM substantially improves top-performing unlearning algorithms. Overall, we view our work as an important step in (i) deepening our scientific understanding of unlearning and (ii) revealing new pathways to improving the state-of-the-art.
Updated: 2024-10-30 22:34:09
Fields: cs.LG
Compute-Constrained Data Selection
Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained finetuning, we consider the setting in which both the cost of selecting data and the cost of training are budgeted for. We first formalize the problem of data selection with a cost-aware utility function, and model the data selection problem as trading off initial-selection cost for training gain. We run a comprehensive sweep of experiments across multiple tasks, varying compute budget by scaling finetuning tokens, model sizes, and data selection compute. These experiments show the validity of this model in real-world settings. Interestingly, we find that many powerful data selection methods are almost never compute-optimal, and that cheaper data selection alternatives dominate both from a theoretical and empirical perspective.
Updated: 2024-10-30 22:31:54
Fields: cs.LG,cs.AI,cs.CL
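Schematically (our notation), the budgeted setting above couples the two costs that are usually considered separately:

```latex
\max_{\mathcal{S} \subseteq \mathcal{D}} \; U(\mathcal{S})
\qquad \text{s.t.} \qquad
C_{\text{select}}(\mathcal{S}) + C_{\text{train}}(\mathcal{S}) \;\le\; B ,
```

so an expensive selector can be dominated by a cheaper one simply because it leaves more of the budget B for finetuning tokens, which is the regime the experiments above repeatedly find.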
Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Recent approaches in machine learning often solve a task using a composition of multiple models or agentic architectures. When targeting a composed system with adversarial attacks, it might not be computationally or informationally feasible to train an end-to-end proxy model or a proxy model for every component of the system. We introduce a method to craft an adversarial attack against the overall multi-model system when we only have a proxy model for the final black-box model, and when the transformation applied by the initial models can make the adversarial perturbations ineffective. Current methods handle this by applying many copies of the first model/transformation to an input and then re-using a standard adversarial attack by averaging gradients, or by learning a proxy model for both stages. To our knowledge, this is the first attack specifically designed for this threat model, and our method has a substantially higher attack success rate (80% vs 25%) and produces 9.4% smaller perturbations (MSE) compared to prior state-of-the-art methods. Our experiments focus on a supervised image pipeline, but we are confident the attack will generalize to other multi-model settings (e.g., a mix of open/closed-source foundation models) or agentic systems.
Updated: 2024-10-30 22:23:16
Fields: cs.LG,cs.AI,cs.CR,cs.CV,cs.MA
Multi-fidelity Machine Learning for Uncertainty Quantification and Optimization
In system analysis and design optimization, multiple computational models are typically available to represent a given physical system. These models can be broadly classified as high-fidelity models, which provide highly accurate predictions but require significant computational resources, and low-fidelity models, which are computationally efficient but less accurate. Multi-fidelity methods integrate high- and low-fidelity models to balance computational cost and predictive accuracy. This perspective paper provides an in-depth overview of the emerging field of machine learning-based multi-fidelity methods, with a particular emphasis on uncertainty quantification and optimization. For uncertainty quantification, a particular focus is on multi-fidelity graph neural networks, compared with multi-fidelity polynomial chaos expansion. For optimization, our emphasis is on multi-fidelity Bayesian optimization, offering a unified perspective on multi-fidelity priors and proposing an application strategy when the objective function is an integral or a weighted sum. We highlight the current state of the art, identify critical gaps in the literature, and outline key research opportunities in this evolving field.
Updated: 2024-10-30 22:22:07
Fields: cs.LG,math.OC,stat.ML
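For orientation, the classical multi-fidelity prior that such unified perspectives generalize is the linear autoregressive (Kennedy-O'Hagan) structure, with independent Gaussian process priors on the low-fidelity model and the discrepancy:

```latex
f_{\text{high}}(x) \;=\; \rho \, f_{\text{low}}(x) \;+\; \delta(x),
\qquad f_{\text{low}} \sim \mathcal{GP}, \quad \delta \sim \mathcal{GP}, \quad \delta \perp f_{\text{low}} .
```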
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
Improving out-of-distribution (OOD) generalization during in-distribution (ID) adaptation is a primary goal of robust fine-tuning of zero-shot models beyond naive fine-tuning. However, despite decent OOD generalization performance from recent robust fine-tuning methods, confidence calibration for reliable model output has not been fully addressed. This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. Firstly, we show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data: 1) ID calibration error and 2) the smallest singular value of the ID input covariance matrix. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value, which is further guided by the self-distillation of a moving-averaged model to achieve calibrated prediction as well. Starting from empirical evidence supporting our theoretical statements, we provide extensive experimental results on ImageNet distribution shift benchmarks that demonstrate the effectiveness of our theorem and its practical implementation.
Updated: 2024-10-30 22:16:03
Fields: cs.CV,cs.AI
Decision-Focused Learning with Directional Gradients
We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework. The key idea is to connect the expected downstream decision loss with the directional derivative of a particular plug-in objective, and then approximate this derivative using zeroth-order gradient techniques. Unlike the original decision loss, which is typically piecewise constant and discontinuous, our new PG losses are Lipschitz continuous differences of concave functions that can be optimized using off-the-shelf gradient-based methods. Most importantly, unlike existing surrogate losses, the approximation error of our PG losses vanishes as the number of samples grows. Hence, optimizing our surrogate loss yields a best-in-class policy asymptotically, even in misspecified settings. This is the first such result in misspecified settings, and we provide numerical evidence confirming that our PG losses substantively outperform existing proposals when the underlying model is misspecified.
Updated: 2024-10-30 22:01:13
Fields: cs.LG,math.OC,stat.ML
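A toy, zeroth-order sketch of the ingredient described above: the plug-in decision loss is piecewise constant in the prediction, so a directional derivative is approximated by a finite difference rather than by autodiff. The linear objective, the perturbation direction, and the step size are all illustrative choices, not the paper's construction:

```python
import numpy as np

def decision_loss(c_hat, c_true):
    """Predict-then-optimize with a linear objective over simplex vertices:
    act on the predicted costs, pay the true cost of that action."""
    return c_true[np.argmin(c_hat)]

def pg_surrogate(c_hat, c_true, h=0.05):
    """Zeroth-order finite-difference approximation of the directional
    derivative of the plug-in decision loss along d = c_hat - c_true."""
    d = c_hat - c_true
    return (decision_loss(c_hat + h * d, c_true) - decision_loss(c_hat, c_true)) / h
```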
Operator World Models for Reinforcement Learning
Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. Leveraging tools from operator theory, we derive a closed-form expression of the action-value function in terms of the world model via simple matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
Updated: 2024-10-30 21:58:12
Domains: cs.LG,math.OC,stat.ML
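The tabular analogue makes the "closed form via simple matrix operations" point explicit: given a learned transition model and rewards, the action-value function of a policy is obtained by solving one linear system. In the paper these arrays are replaced by conditional mean embedding operators; the sketch below is the finite-state version only.

```python
import numpy as np

def action_values(P, r, pi, gamma=0.9):
    # P: (S, A, S) learned world model, r: (S, A) rewards, pi: (S, A) policy
    S = P.shape[0]
    P_pi = np.einsum('sat,sa->st', P, pi)            # state transitions under pi
    r_pi = (r * pi).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum('sat,t->sa', P, V)  # closed-form Q for PMD
```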
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a policy's worst-case performance, enabling us to confidently select and learn improved policies. Precisely, we introduce novel, fully empirical concentration bounds for a broad class of importance weighting risk estimators. These bounds are general enough to cover most existing estimators and pave the way for the development of new ones. In particular, our pursuit of the tightest bound within this class motivates a novel estimator (LS), that logarithmically smooths large importance weights. The bound for LS is provably tighter than its competitors, and naturally results in improved policy selection and learning strategies. Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS.
Updated: 2024-10-30 21:53:43
Domains: stat.ML,cs.LG
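The smoothing idea can be sketched as follows: replace the raw importance-weighted term by a logarithmically damped one that agrees with standard inverse propensity scoring for small weights but grows only logarithmically for large ones. The exact parameterization and sign conventions of the paper's LS estimator may differ; this is only one plausible form.

```python
import numpy as np

def ls_estimate(rewards, pi_target, pi_logging, lam=0.1):
    w = pi_target / pi_logging            # importance weights
    # log1p(lam*w*r)/lam ~ w*r when lam*w*r is small (recovering IPS),
    # but large weights are damped logarithmically
    return float(np.mean(np.log1p(lam * w * rewards) / lam))
```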
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, we show how these questions can have compact, closed-form solutions in terms of learned representations. The key idea is to apply a variant of contrastive learning to time series data. Prior work already shows that the representations learned by contrastive learning encode a probability ratio. By extending prior work to show that the marginal distribution over representations is Gaussian, we can then prove that the joint distribution of representations is also Gaussian. Taken together, these results show that representations learned via temporal contrastive learning follow a Gauss-Markov chain, a graphical model where inference (e.g., prediction, planning) over representations corresponds to inverting a low-dimensional matrix. In one special case, inferring intermediate representations will be equivalent to interpolating between the learned representations. We validate our theory using numerical simulations on tasks with up to 46 dimensions.
Updated: 2024-10-30 21:52:16
Domains: cs.LG,stat.ML
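Concretely, once the joint distribution over representations is Gaussian, "prediction" is ordinary Gaussian conditioning, which only requires solving a linear system whose size is the representation dimension. A sketch, with the means and covariances assumed to have been estimated from the learned representations:

```python
import numpy as np

def predict_future_representation(z_t, mu_t, mu_s, Sigma_tt, Sigma_ts):
    # E[z_s | z_t] = mu_s + Sigma_st @ Sigma_tt^{-1} @ (z_t - mu_t)
    return mu_s + Sigma_ts.T @ np.linalg.solve(Sigma_tt, z_t - mu_t)
```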
DRoP: Distributionally Robust Pruning
In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on the classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning algorithms can produce highly biased classifiers. We present a theoretical analysis of the classification risk in a mixture of Gaussians to argue that choosing appropriate class pruning ratios, coupled with random pruning within classes, has the potential to improve worst-class performance. We thus propose DRoP, a distributionally robust approach to pruning, and empirically demonstrate its performance on standard computer vision benchmarks. In sharp contrast to existing algorithms, our proposed method continues improving distributional robustness at a tolerable drop of average performance as we prune more from the datasets.
Updated: 2024-10-30 21:36:13
Domains: cs.LG,cs.CV
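The mechanism itself is simple: choose a keep ratio per class, then prune uniformly at random within each class. How the per-class ratios are set from worst-class performance is the paper's contribution; the sketch below takes them as given.

```python
import numpy as np

def drop_prune(labels, keep_frac, rng=np.random.default_rng(0)):
    # labels: (n,) class labels; keep_frac: dict mapping class -> keep ratio
    kept = []
    for c, frac in keep_frac.items():
        idx = np.flatnonzero(labels == c)
        n_keep = max(1, int(round(frac * len(idx))))
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    return np.concatenate(kept)  # indices of the retained training examples
```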
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
There is an urgent need to identify both short and long-term risks from newly emerging types of Artificial Intelligence (AI), as well as available risk management measures. In response, and to support global efforts in regulating AI and writing safety standards, we compile an extensive catalog of risk sources and risk management measures for general-purpose AI (GPAI) systems, complete with descriptions and supporting examples where relevant. This work involves identifying technical, operational, and societal risks across model development, training, and deployment stages, as well as surveying established and experimental methods for managing these risks. To the best of our knowledge, this paper is the first of its kind to provide extensive documentation of both GPAI risk sources and risk management measures that are descriptive, self-contained and neutral with respect to any existing regulatory framework. This work intends to help AI providers, standards experts, researchers, policymakers, and regulators in identifying and mitigating systemic risks from GPAI systems. For this reason, the catalog is released under a public domain license for ease of direct use by stakeholders in AI governance and standards.
Updated: 2024-10-30 21:32:56
Domains: cs.CY,cs.AI,cs.LG
Symmetric Linear Bandits with Hidden Symmetry
High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner, and the correct symmetry needs to be learned in an online setting. We examine the structure of a collection of hidden symmetries and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $O(d_0^{2/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension, which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 \ll d$. With an extra assumption on well-separated models, we can further improve the regret to $O(d_0\sqrt{T\log(d)})$.
Updated: 2024-10-30 21:26:19
Domains: stat.ML,cs.LG
Gradient-free training of recurrent neural networks
Recurrent neural networks are a successful neural architecture for many time-dependent problems, including time series analysis, forecasting, and modeling of dynamical systems. Training such networks with backpropagation through time is a notoriously difficult problem because their loss gradients tend to explode or vanish. In this contribution, we introduce a computational approach to construct all weights and biases of a recurrent neural network without using gradient-based methods. The approach is based on a combination of random feature networks and Koopman operator theory for dynamical systems. The hidden parameters of a single recurrent block are sampled at random, while the outer weights are constructed using extended dynamic mode decomposition. This approach alleviates all problems with backpropagation commonly related to recurrent networks. The connection to Koopman operator theory also allows us to start using results in this area to analyze recurrent neural networks. In computational experiments on time series, forecasting for chaotic dynamical systems, and control problems, as well as on weather data, we observe that the training time and forecasting accuracy of the recurrent neural networks we construct are improved when compared to commonly used gradient-based methods.
Updated: 2024-10-30 21:24:34
Domains: cs.LG,cs.NA,math.NA
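A toy version of the recipe: sample the recurrent block's parameters once at random, unroll it over the data, and fit only the linear readout by least squares, in the spirit of extended dynamic mode decomposition. Widths, scales, and the stand-in time series below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_recurrent_features(d, width=256, scale=1.0):
    # hidden parameters are sampled once and never trained
    W_in = rng.normal(0.0, scale, (width, d))
    W_rec = rng.normal(0.0, scale / np.sqrt(width), (width, width))
    b = rng.normal(0.0, scale, width)
    def features(X):                        # unroll the fixed recurrent block
        H, h = np.zeros((len(X), width)), np.zeros(width)
        for t, x in enumerate(X):
            h = np.tanh(W_in @ x + W_rec @ h + b)
            H[t] = h
        return H
    return features

X = rng.normal(size=(500, 3))               # stand-in time series
phi = make_recurrent_features(X.shape[1])
# outer weights: least-squares regression of the next state on the features
W_out, *_ = np.linalg.lstsq(phi(X[:-1]), X[1:], rcond=None)
one_step_forecast = phi(X)[-1] @ W_out
```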
Learning Social Welfare Functions
Is it possible to understand or imitate a policy maker's rationale by looking at past decisions they made? We formalize this question as the problem of learning social welfare functions belonging to the well-studied family of power mean functions. We focus on two learning tasks; in the first, the input is vectors of utilities of an action (decision or policy) for individuals in a group and their associated social welfare as judged by a policy maker, whereas in the second, the input is pairwise comparisons between the welfares associated with a given pair of utility vectors. We show that power mean functions are learnable with polynomial sample complexity in both cases, even if the comparisons or social welfare information are noisy. Finally, we design practical algorithms for these tasks and evaluate their performance.
Updated: 2024-10-30 21:24:32
Domains: cs.GT,cs.LG
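The hypothesis class is concrete: a (weighted) power mean of individual utilities, whose exponent p interpolates between utilitarian welfare (p = 1), Nash welfare (p -> 0), and egalitarian welfare (p -> -infinity). A direct implementation of the function being learned:

```python
import numpy as np

def power_mean_welfare(utilities, p, weights=None):
    u = np.asarray(utilities, dtype=float)   # assumed positive
    w = np.full(len(u), 1.0 / len(u)) if weights is None else np.asarray(weights)
    if p == 0:                               # limit case: weighted geometric mean
        return float(np.exp(np.sum(w * np.log(u))))
    return float(np.sum(w * u ** p) ** (1.0 / p))

for p in (1, 0, -50):                        # utilitarian, Nash, ~egalitarian
    print(p, power_mean_welfare([1.0, 4.0, 9.0], p))
```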
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Superposed Decoding can also be combined with other decoding strategies, resulting in universal coverage gains when scaling inference time compute. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.
Updated: 2024-10-30 21:22:54
Domains: cs.CL,cs.LG
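A toy sketch of the control flow, with the language model reduced to a single step function over a superposed embedding; the real method operates inside a transformer, deduplicates drafts, and applies the n-gram filtering described above. embed and lm_step are stand-ins, not the paper's API.

```python
import numpy as np

def superposed_decode(embed, lm_step, k, steps, bos_id):
    """embed: (V, d) token embedding table; lm_step: (d,) -> (V,) log-probs.
    One lm_step call per position is shared across all k drafts."""
    drafts = [([bos_id], 0.0)] * k
    for _ in range(steps):
        # superposition of the most recent token embeddings of the k drafts
        e = np.mean([embed[tokens[-1]] for tokens, _ in drafts], axis=0)
        logp = lm_step(e)                        # single inference pass
        top = np.argsort(logp)[-k:]              # top-k next tokens
        cands = [(tokens + [int(t)], score + logp[t])
                 for tokens, score in drafts for t in top]  # k^2 candidates
        drafts = sorted(cands, key=lambda c: c[1])[-k:]     # keep k most likely
    return drafts
```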
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
Detecting anomalies or out-of-distribution (OOD) samples is critical for maintaining the reliability and trustworthiness of machine learning systems. Recently, Large Language Models (LLMs) have demonstrated their effectiveness not only in natural language processing but also in broader applications due to their advanced comprehension and generative capabilities. The integration of LLMs into anomaly and OOD detection marks a significant shift from the traditional paradigm in the field. This survey focuses on the problem of anomaly and OOD detection under the context of LLMs. We propose a new taxonomy to categorize existing approaches into two classes based on the role played by LLMs. Following our proposed taxonomy, we further discuss the related work under each of the categories and finally discuss potential challenges and directions for future research in this field. We also provide an up-to-date reading list of relevant papers.
Updated: 2024-10-30 21:18:37
Domains: cs.LG
Learning the Expected Core of Strictly Convex Stochastic Cooperative Games
Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning. An important concept in reward allocation is the core, which is the set of stable allocations where no agent has the motivation to deviate from the grand coalition. In previous works, computing the core requires either knowledge of the reward function in deterministic games or the reward distribution in stochastic games. However, this is unrealistic, as the reward function or distribution is often only partially known and may be subject to uncertainty. In this paper, we consider the core learning problem in stochastic cooperative games, where the reward distribution is unknown. Our goal is to learn the expected core, that is, the set of allocations that are stable in expectation, given an oracle that returns a stochastic reward for a queried coalition each round. Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that returns a point in the expected core given a polynomial number of samples, with high probability. To analyse the algorithm, we develop a new extension of the separating hyperplane theorem for multiple convex sets.
Updated: 2024-10-30 21:17:49
Domains: cs.GT,cs.LG
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Multi-document (MD) processing is crucial for LLMs to handle real-world tasks such as summarization and question-answering across large sets of documents. While LLMs have improved at processing long inputs, MD contexts still present challenges, such as managing inter-document dependencies, redundancy, and incoherent structures. We introduce MDCure, a scalable and effective fine-tuning pipeline to enhance the MD capabilities of LLMs without the computational cost of pre-training or reliance on human annotated data. MDCure is based on generation of high-quality synthetic MD instruction data from sets of related articles via targeted prompts. We further introduce MDCureRM, a multi-objective reward model which filters generated data based on their training utility for MD settings. With MDCure, we fine-tune a variety of LLMs, from the FlanT5, Qwen2, and LLAMA3.1 model families, up to 70B parameters in size. Extensive evaluations on a wide range of MD and long-context benchmarks spanning various tasks show MDCure consistently improves performance over pre-trained baselines and over corresponding base models by up to 75.5%. Our code, datasets, and models are available at https://github.com/yale-nlp/MDCure.
Updated: 2024-10-30 21:08:07
Domains: cs.CL,cs.LG
ThreatKG: An AI-Powered System for Automated Open-Source Cyber Threat Intelligence Gathering and Management
Open-source cyber threat intelligence (OSCTI) has become essential for keeping up with the rapidly changing threat landscape. However, current OSCTI gathering and management solutions mainly focus on structured Indicators of Compromise (IOC) feeds, which are low-level and isolated, providing only a narrow view of potential threats. Meanwhile, the extensive and interconnected knowledge found in the unstructured text of numerous OSCTI reports (e.g., security articles, threat reports) available publicly is still largely underexplored. To bridge the gap, we propose ThreatKG, an automated system for OSCTI gathering and management. ThreatKG efficiently collects a large number of OSCTI reports from multiple sources, leverages specialized AI-based techniques to extract high-quality knowledge about various threat entities and their relationships, and constructs and continuously updates a threat knowledge graph by integrating new OSCTI data. ThreatKG features a modular and extensible design, allowing for the addition of components to accommodate diverse OSCTI report structures and knowledge types. Our extensive evaluations demonstrate ThreatKG's practical effectiveness in enhancing threat knowledge gathering and management.
Updated: 2024-10-30 21:04:34
Domains: cs.CR,cs.DB
Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization
Learning with identical train and test distributions has been extensively investigated both practically and theoretically. Much remains to be understood, however, in statistical learning under distribution shifts. This paper focuses on a distribution shift setting where train and test distributions can be related by classes of (data) transformation maps. We initiate a theoretical study for this framework, investigating learning scenarios where the target class of transformations is either known or unknown. We establish learning rules and algorithmic reductions to Empirical Risk Minimization (ERM), accompanied with learning guarantees. We obtain upper bounds on the sample complexity in terms of the VC dimension of the class composing predictors with transformations, which we show in many cases is not much larger than the VC dimension of the class of predictors. We highlight that the learning rules we derive offer a game-theoretic viewpoint on distribution shift: a learner searching for predictors and an adversary searching for transformation maps to respectively minimize and maximize the worst-case loss.
Updated: 2024-10-30 20:59:57
Domains: cs.LG
Progressive Entropic Optimal Transport Solvers
Optimal transport (OT) has profoundly impacted machine learning by providing theoretical and computational tools to realign datasets. In this context, given two large point clouds of sizes $n$ and $m$ in $\mathbb{R}^d$, entropic OT (EOT) solvers have emerged as the most reliable tool to either solve the Kantorovich problem and output a $n\times m$ coupling matrix, or to solve the Monge problem and learn a vector-valued push-forward map. While the robustness of EOT couplings/maps makes them a go-to choice in practical applications, EOT solvers remain difficult to tune because of a small but influential set of hyperparameters, notably the omnipresent entropic regularization strength $\varepsilon$. Setting $\varepsilon$ can be difficult, as it simultaneously impacts various performance metrics, such as compute speed, statistical performance, generalization, and bias. In this work, we propose a new class of EOT solvers (ProgOT), that can estimate both plans and transport maps. We take advantage of several opportunities to optimize the computation of EOT solutions by dividing mass displacement using a time discretization, borrowing inspiration from dynamic OT formulations, and conquering each of these steps using EOT with properly scheduled parameters. We provide experimental evidence demonstrating that ProgOT is a faster and more robust alternative to standard solvers when computing couplings at large scales, even outperforming neural network-based approaches. We also prove statistical consistency of our approach for estimating optimal transport maps.
Updated: 2024-10-30 20:59:51
Domains: stat.ML,cs.LG
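A numpy sketch of the progressive idea: discretize the displacement in time and solve each step with entropic OT under a scheduled epsilon, moving points part of the way along the barycentric map at every step. The schedule and the interpolation rule below are assumptions, not the paper's exact scheme.

```python
import numpy as np

def sinkhorn_coupling(x, y, eps, iters=200):
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
    K = np.exp(-C / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):                  # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]      # entropic coupling

def progressive_transport(x, y, eps_schedule=(1.0, 0.5, 0.2, 0.1)):
    z, n_steps = x.copy(), len(eps_schedule)
    for t, eps in enumerate(eps_schedule):
        P = sinkhorn_coupling(z, y, eps)
        bary = (P @ y) / P.sum(axis=1, keepdims=True)   # barycentric targets
        z = z + (bary - z) / (n_steps - t)              # partial displacement
    return z    # approximate push-forward of x onto y
```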
High-Dimensional Tensor Discriminant Analysis with Incomplete Tensors
Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional tensor linear discriminant analysis. Specifically, we consider a high-dimensional tensor predictor with missing observations under the Missing Completely At Random (MCAR) assumption and employ the Tensor Gaussian Mixture Model (TGMM) to capture the relationship between the tensor predictor and class label. We propose a Tensor Linear Discriminant Analysis with Missing Data (Tensor LDA-MD) algorithm, which manages high-dimensional tensor predictors with missing entries by leveraging the decomposable low-rank structure of the discriminant tensor. Our work establishes convergence rates for the estimation error of the discriminant tensor with incomplete data and minimax optimal bounds for the misclassification rate, addressing key gaps in the literature. Additionally, we derive large deviation bounds for the generalized mode-wise sample covariance matrix and its inverse, which are crucial tools in our analysis and hold independent interest. Our method demonstrates excellent performance in simulations and real data analysis, even with significant proportions of missing data.
Updated: 2024-10-30 20:59:46
Domains: stat.ML,cs.LG,stat.ME
Graph-Augmented Relation Extraction Model with LLMs-Generated Support Document
This study introduces a novel approach to sentence-level relation extraction (RE) that integrates Graph Neural Networks (GNNs) with Large Language Models (LLMs) to generate contextually enriched support documents. By harnessing the power of LLMs to generate auxiliary information, our approach crafts an intricate graph representation of textual data. This graph is subsequently processed through a Graph Neural Network (GNN) to refine and enrich the embeddings associated with each entity ensuring a more nuanced and interconnected understanding of the data. This methodology addresses the limitations of traditional sentence-level RE models by incorporating broader contexts and leveraging inter-entity interactions, thereby improving the model's ability to capture complex relationships across sentences. Our experiments, conducted on the CrossRE dataset, demonstrate the effectiveness of our approach, with notable improvements in performance across various domains. The results underscore the potential of combining GNNs with LLM-generated context to advance the field of relation extraction.
Updated: 2024-10-30 20:48:34
Domains: cs.CL,cs.AI
Rethinking Deep Thinking: Stable Learning of Algorithms using Lipschitz Constraints
Iterative algorithms solve problems by taking steps until a solution is reached. Models in the form of Deep Thinking (DT) networks have been demonstrated to learn iterative algorithms in a way that can scale to different sized problems at inference time using recurrent computation and convolutions. However, they are often unstable during training, and have no guarantees of convergence/termination at the solution. This paper addresses the problem of instability by analyzing the growth in intermediate representations, allowing us to build models (referred to as Deep Thinking with Lipschitz Constraints (DT-L)) with many fewer parameters and providing more reliable solutions. Additionally our DT-L formulation provides guarantees of convergence of the learned iterative procedure to a unique solution at inference time. We demonstrate DT-L is capable of robustly learning algorithms which extrapolate to harder problems than in the training set. We benchmark on the traveling salesperson problem to evaluate the capabilities of the modified system in an NP-hard problem where DT fails to learn.
Updated: 2024-10-30 20:48:22
Domains: cs.LG
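One standard way to realize such a constraint is to cap the spectral norm of the recurrent weights below one, which makes each iteration a contraction and hence, by the Banach fixed-point theorem, guarantees convergence to a unique fixed point. The paper's exact parameterization may differ; this is a generic sketch.

```python
import numpy as np

def constrain_spectral_norm(W, target=0.95, iters=30, rng=np.random.default_rng(0)):
    u = rng.normal(size=W.shape[0])
    for _ in range(iters):                  # power iteration
        v = W.T @ u; v /= np.linalg.norm(v)
        u = W @ v;  u /= np.linalg.norm(u)
    sigma = float(u @ W @ v)                # estimated largest singular value
    return W if sigma <= target else W * (target / sigma)
```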
Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning
We study offline off-dynamics reinforcement learning (RL) to utilize data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on the decision transformer (DT), which can predict actions conditioned on desired return guidance and complete trajectory history. Previous works tackle the dynamics shift problem by augmenting the reward in the trajectory from the source domain to match the optimal trajectory in the target domain. However, this strategy can not be directly applicable in RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented Decision Transformer (RADT) method, where we augment the return in the source domain by aligning its distribution with that in the target domain. We provide the theoretical analysis demonstrating that the RCSL policy learned from RADT achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations RADT-DARA and RADT-MV respectively. Extensive experiments conducted on D4RL datasets reveal that our methods generally outperform dynamic programming based methods in off-dynamics RL scenarios.
Updated: 2024-10-30 20:46:26
Domains: cs.LG,cs.AI,cs.RO,stat.ML
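One simple instance of "aligning the return distributions" is quantile matching: each source-domain return is replaced by the target-domain return at the same empirical quantile. This sketch is only meant to convey the augmentation idea; RADT-DARA and RADT-MV realize it differently.

```python
import numpy as np

def align_returns(src_returns, tgt_returns):
    src = np.asarray(src_returns, dtype=float)
    ranks = (np.argsort(np.argsort(src)) + 0.5) / len(src)  # empirical CDF
    return np.quantile(np.asarray(tgt_returns, dtype=float), ranks)
```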
SongCreator: Lyrics-based Universal Song Generation
Music is an integral part of human culture, embodying human intelligence and creativity, of which songs form an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition and instrumental arrangement, etc., generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world. In this light, we propose SongCreator, a song-generation system designed to tackle this challenge. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM, which allows our model to understand, generate and edit songs, making it suitable for various song-related generation tasks by utilizing specific attention masks. Extensive experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks. Notably, it surpasses previous works by a large margin in lyrics-to-song and lyrics-to-vocals. Additionally, it is able to independently control the acoustic conditions of the vocals and accompaniment in the generated song through different audio prompts, exhibiting its potential applicability. Our samples are available at https://thuhcsi.github.io/SongCreator/.
Updated: 2024-10-30 20:44:46
Domains: cs.SD,cs.AI,eess.AS
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms relies on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then formulate the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a novel class of robust attention that is resilient to data contamination. We empirically demonstrate the advantages of RPC-Attention over softmax attention on the ImageNet-1K object classification, WikiText-103 language modeling, and ADE20K image segmentation task.
Updated: 2024-10-30 20:40:04
Domains: cs.LG,cs.AI,cs.CL,cs.CV,stat.ML
Venire: A Machine Learning-Guided Panel Review System for Community Content Moderation
Research into community content moderation often assumes that moderation teams govern with a single, unified voice. However, recent work has found that moderators disagree with one another at modest, but concerning rates. The problem is not the root disagreements themselves. Subjectivity in moderation is unavoidable, and there are clear benefits to including diverse perspectives within a moderation team. Instead, the crux of the issue is that, due to resource constraints, moderation decisions end up being made by individual decision-makers. The result is decision-making that is inconsistent, which is frustrating for community members. To address this, we develop Venire, an ML-backed system for panel review on Reddit. Venire uses a machine learning model trained on log data to identify the cases where moderators are most likely to disagree. Venire fast-tracks these cases for multi-person review. Ideally, Venire allows moderators to surface and resolve disagreements that would have otherwise gone unnoticed. We conduct three studies through which we design and evaluate Venire: a set of formative interviews with moderators, technical evaluations on two datasets, and a think-aloud study in which moderators used Venire to make decisions on real moderation cases. Quantitatively, we demonstrate that Venire is able to improve decision consistency and surface latent disagreements. Qualitatively, we find that Venire helps moderators resolve difficult moderation cases more confidently. Venire represents a novel paradigm for human-AI content moderation, and shifts the conversation from replacing human decision-making to supporting it.
Updated: 2024-10-30 20:39:34
Domains: cs.HC,cs.AI,cs.LG
Generative Inverse Design of Metamaterials with Functional Responses by Interpretable Learning
Metamaterials with functional responses can exhibit varying properties under different conditions (e.g., wave-based responses or deformation-induced property variation). This work addresses the rapid inverse design of such metamaterials to meet target qualitative functional behaviors, a challenge due to its intractability and non-unique solutions. Unlike data-intensive and non-interpretable deep-learning-based methods, we propose the Random-forest-based Interpretable Generative Inverse Design (RIGID), a single-shot inverse design method for fast generation of metamaterial designs with on-demand functional behaviors. RIGID leverages the interpretability of a random forest-based "design$\rightarrow$response" forward model, eliminating the need for a more complex "response$\rightarrow$design" inverse model. Based on the likelihood of target satisfaction derived from the trained random forest, one can sample a desired number of design solutions using Markov chain Monte Carlo methods. We validate RIGID on acoustic and optical metamaterial design problems, each with fewer than 250 training samples. Compared to the genetic algorithm-based design generation approach, RIGID generates satisfactory solutions that cover a broader range of the design space, allowing for better consideration of additional figures of merit beyond target satisfaction. This work offers a new perspective on solving on-demand inverse design problems, showcasing the potential for incorporating interpretable machine learning into generative design under small data constraints.
Updated: 2024-10-30 20:38:21
Domains: physics.optics,cs.LG
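A sketch of the two ingredients: a random forest as the interpretable design-to-response forward model, and Metropolis sampling of designs in proportion to the forest's estimated likelihood of target satisfaction. The training data, proposal step size, and unit-box design space are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(200, 5))                  # placeholder designs
y_train = (X_train.sum(axis=1) > 2.5).astype(int)     # placeholder responses

forest = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)

def likelihood(design):
    return forest.predict_proba(design.reshape(1, -1))[0, 1]

def sample_designs(n, dim=5, step=0.1):
    x = rng.uniform(size=dim)
    p = likelihood(x)
    out = []
    while len(out) < n:                               # Metropolis random walk
        x_new = np.clip(x + step * rng.normal(size=dim), 0.0, 1.0)
        p_new = likelihood(x_new)
        if rng.uniform() < p_new / max(p, 1e-12):
            x, p = x_new, p_new
        out.append(x.copy())
    return np.array(out)
```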
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
We propose a self-supervised learning method for 12-lead Electrocardiogram (ECG) analysis, named ECG Joint Embedding Predictive Architecture (ECG-JEPA). ECG-JEPA employs a masking strategy to learn semantic representations of ECG data. Unlike existing methods, ECG-JEPA predicts at the hidden representation level rather than reconstructing raw data. This approach offers several advantages in the ECG domain: (1) it avoids producing unnecessary details, such as noise, which is common in standard ECG; and (2) it addresses the limitations of naïve L2 loss between raw signals. Another key contribution is the introduction of a special masked attention tailored for 12-lead ECG data, Cross-Pattern Attention (CroPA). CroPA enables the model to effectively capture inter-patch relationships. Additionally, ECG-JEPA is highly scalable, allowing efficient training on large datasets. Our code is openly available at https://github.com/sehunfromdaegu/ECG_JEPA.
Updated: 2024-10-30 20:33:40
Domains: cs.LG,cs.AI
Learning Lipschitz Operators with respect to Gaussian Measures with Near-Optimal Sample Complexity
Operator learning, the approximation of mappings between infinite-dimensional function spaces using ideas from machine learning, has gained increasing research attention in recent years. Approximate operators, learned from data, hold promise to serve as efficient surrogate models for problems in computational science and engineering, complementing traditional numerical methods. However, despite their empirical success, our understanding of the underpinning mathematical theory is in large part still incomplete. In this paper, we study the approximation of Lipschitz operators in expectation with respect to Gaussian measures. We prove higher Gaussian Sobolev regularity of Lipschitz operators and establish lower and upper bounds on the Hermite polynomial approximation error. We further consider the reconstruction of Lipschitz operators from $m$ arbitrary (adaptive) linear samples. A key finding is the tight characterization of the smallest achievable error for all possible (adaptive) sampling and reconstruction maps in terms of $m$. It is shown that Hermite polynomial approximation is an optimal recovery strategy, but we have the following curse of sample complexity: No method to approximate Lipschitz operators based on finitely many samples can achieve algebraic convergence rates in $m$. On the positive side, we prove that a sufficiently fast spectral decay of the covariance operator of the Gaussian measure guarantees convergence rates which are arbitrarily close to any algebraic rate in the large data limit $m \to \infty$. Finally, we focus on the recovery of Lipschitz operators from finitely many point samples. We consider Christoffel sampling and weighted least-squares approximation, and present an algorithm which provably achieves near-optimal sample complexity.
Updated: 2024-10-30 20:32:30
Domains: cs.LG,cs.NA,math.NA
Learning and Transferring Sparse Contextual Bigrams with Linear Transformers
Transformers have excelled in natural language modeling, and one reason behind this success is their exceptional ability to combine contextual information and global knowledge. However, the theoretical basis remains unclear. In this paper, first we introduce the Sparse Contextual Bigram (SCB), a natural extension of the classical bigram model, where the next token's generation depends on a sparse set of earlier positions determined by the last token. We then analyze the training dynamics and sample complexity of learning SCB using a one-layer linear transformer with a gradient-based algorithm. We show that when trained from scratch, the training process can be split into an initial sample-intensive stage where the correlation is boosted from zero to a nontrivial value, followed by a more sample-efficient stage of further improvement. Additionally, we prove that, provided a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage. We also empirically demonstrate that our algorithm can outperform SGD in this setting and discuss its relationship with the usual softmax-based transformers.
Updated: 2024-10-30 20:29:10
Domains: cs.LG,cs.CL
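A toy generator for the SCB model as described: the last token selects a sparse distribution over look-back offsets, and a bigram transition applied to the token at the selected earlier position emits the next token. The offset parameterization and shapes are assumptions.

```python
import numpy as np

def sample_scb(offset_probs, bigram, T, rng=np.random.default_rng(0)):
    """offset_probs: (V, K) distribution over look-back offsets 1..K given the
    last token; bigram: (V, V) row-stochastic transition matrix."""
    V, K = offset_probs.shape
    x = [int(rng.integers(V))]
    for t in range(1, T):
        k = min(t, K)
        p = offset_probs[x[-1], :k]
        p = p / p.sum()
        offset = int(rng.choice(np.arange(1, k + 1), p=p))
        src = x[t - offset]                 # sparse, last-token-dependent position
        x.append(int(rng.choice(V, p=bigram[src])))
    return np.array(x)
```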
Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment
Retrieval-Augmented Generation (RAG) systems enhance text generation by incorporating external knowledge but often struggle when retrieving context across different text modalities due to semantic gaps. We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps between various text types, such as programming code and pseudocode, or English and French sentences. Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference. By aligning embeddings from heterogeneous text modalities into a unified space through a lightweight projection network, our model significantly outperforms traditional retrieval methods like the Okapi BM25 algorithm and models like Dense Passage Retrieval (DPR), while approaching the accuracy of Sentence Transformers. Extensive evaluations demonstrate the effectiveness and generalizability of our method across different tasks, highlighting its potential for real-time, resource-constrained applications.
Updated: 2024-10-30 20:28:10
Domains: cs.LG,cs.CL,cs.IR,H.3.3; I.2.7; I.2.6
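A minimal linear stand-in for the projection network: fit a ridge-regression map from source-modality embeddings to the target space on paired data, then retrieve by cosine similarity. The real adapter is a small learned network; names and the top-k interface are illustrative.

```python
import numpy as np

def fit_projection(S, T, lam=1e-2):
    # S: (n, d_src) source embeddings; T: (n, d_tgt) paired target embeddings
    d = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(d), S.T @ T)

def retrieve(query_emb, W, corpus_emb, top_k=5):
    q = query_emb @ W                       # project the query into target space
    sims = corpus_emb @ q / (np.linalg.norm(corpus_emb, axis=1)
                             * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:top_k]        # nearest corpus items
```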
A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction
Neural decoders for quantum error correction (QEC) rely on neural networks to classify syndromes extracted from error correction codes and find appropriate recovery operators to protect logical information against errors. Its ability to adapt to hardware noise and long-term drifts make neural decoders a promising candidate for inclusion in a fault-tolerant quantum architecture. However, given their limited scalability, it is prudent that small-scale (local) neural decoders are treated as first stages of multi-stage decoding schemes for fault-tolerant quantum computers with millions of qubits. In this case, minimizing the decoding time to match the stabilization measurements frequency and a tight co-integration with the QPUs is highly desired. Cryogenic realizations of neural decoders can not only improve the performance of higher stage decoders, but they can minimize communication delays, and alleviate wiring bottlenecks. In this work, we design and analyze a neural decoder based on an in-memory computation (IMC) architecture, where crossbar arrays of resistive memory devices are employed to both store the synaptic weights of the neural decoder and perform analog matrix-vector multiplications. In simulations supported by experimental measurements, we investigate the impact of TiOx-based memristive devices' non-idealities on decoding fidelity. We develop hardware-aware re-training methods to mitigate the fidelity loss, restoring the ideal decoder's pseudo-threshold for the distance-3 surface code. This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated fault-tolerant QEC.
Updated: 2024-10-30 20:23:50
Domains: quant-ph,cs.ET,cs.LG
Epsilon-Greedy Thompson Sampling to Bayesian Optimization
Bayesian optimization (BO) has become a powerful tool for solving simulation-based engineering optimization problems thanks to its ability to integrate physical and mathematical understandings, consider uncertainty, and address the exploitation-exploration dilemma. Thompson sampling (TS) is a preferred solution for BO to handle the exploitation-exploration trade-off. While it prioritizes exploration by generating and minimizing random sample paths from probabilistic models -- a fundamental ingredient of BO -- TS weakly manages exploitation by gathering information about the true objective function after it obtains new observations. In this work, we improve the exploitation of TS by incorporating the $\varepsilon$-greedy policy, a well-established selection strategy in reinforcement learning. We first delineate two extremes of TS, namely the generic TS and the sample-average TS. The former promotes exploration, while the latter favors exploitation. We then adopt the $\varepsilon$-greedy policy to randomly switch between these two extremes. Small and large values of $\varepsilon$ govern exploitation and exploration, respectively. By minimizing two benchmark functions and solving an inverse problem of a steel cantilever beam, we empirically show that $\varepsilon$-greedy TS equipped with an appropriate $\varepsilon$ is more robust than its two extremes, matching or outperforming the better of the generic TS and the sample-average TS.
Updated: 2024-10-30 20:22:36
Domains: cs.LG,math.OC,stat.ML
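With a Gaussian-process surrogate the switch is one line: with probability epsilon, minimize a random posterior sample path (generic TS, exploration); otherwise minimize the posterior mean, i.e. the average of sample paths (sample-average TS, exploitation). A grid-based sketch with scikit-learn's GP API; the grid search stands in for a proper inner optimizer.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def eps_greedy_ts_point(gp: GaussianProcessRegressor, lo, hi, eps,
                        n_grid=2048, rng=np.random.default_rng(0)):
    X = rng.uniform(lo, hi, size=(n_grid, len(lo)))
    if rng.uniform() < eps:       # generic TS: one random sample path
        path = gp.sample_y(X, n_samples=1,
                           random_state=int(rng.integers(2**31))).ravel()
    else:                         # sample-average TS: the posterior mean
        path = gp.predict(X)
    return X[np.argmin(path)]     # next design point to evaluate
```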
Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation
We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the leverage scores of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method. For this leveraged matrix estimation procedure, we establish entry-wise guarantees that remarkably, do not depend on the coherence of the matrix but only on its spikiness. These guarantees imply that LoRa-PI learns an $\varepsilon$-optimal policy using $\widetilde{O}({S+A\over \mathrm{poly}(1-\gamma)\varepsilon^2})$ samples where $S$ (resp. $A$) denotes the number of states (resp. actions) and $\gamma$ the discount factor. Our algorithm achieves this order-optimal (in $S$, $A$ and $\varepsilon$) sample complexity under milder conditions than those assumed in previously proposed approaches.
Updated: 2024-10-30 20:22:17
Domains: cs.LG
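A sketch of the two-phase estimation on an idealized noiseless matrix oracle (the RL algorithm queries noisy entries and feeds the completed value matrix to policy iteration; fractions and sample counts below are placeholders):

```python
import numpy as np

def leverage_scores(M, rank):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    row = (U[:, :rank] ** 2).sum(axis=1)
    col = (Vt[:rank, :] ** 2).sum(axis=0)
    return row / row.sum(), col / col.sum()

def leveraged_cur(M, rank, n_rows, n_cols, frac=0.2, rng=np.random.default_rng(0)):
    S, A = M.shape
    mask = rng.uniform(size=(S, A)) < frac              # phase 1: uniform entries
    p_row, p_col = leverage_scores(np.where(mask, M, 0.0) / frac, rank)
    I = rng.choice(S, n_rows, replace=False, p=p_row)   # phase 2: important rows
    J = rng.choice(A, n_cols, replace=False, p=p_col)   # ... and columns
    U = np.linalg.pinv(M[np.ix_(I, J)])
    return M[:, J] @ U @ M[I, :]                        # CUR-style completion
```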
Assessing Concordance between RNA-Seq and NanoString Technologies in Ebola-Infected Nonhuman Primates Using Machine Learning
This study evaluates the concordance between RNA sequencing (RNA-Seq) and NanoString technologies for gene expression analysis in non-human primates (NHPs) infected with Ebola virus (EBOV). We performed a detailed comparison of both platforms, demonstrating a strong correlation between them, with Spearman coefficients for 56 out of 62 samples ranging from 0.78 to 0.88, with a mean of 0.83 and a median of 0.85. Bland-Altman analysis further confirmed high consistency, with most measurements falling within 95% confidence limits. A machine learning approach, using the Supervised Magnitude-Altitude Scoring (SMAS) method trained on NanoString data, identified OAS1 as a key marker for distinguishing RT-qPCR positive from negative samples. Remarkably, when applied to RNA-Seq data, OAS1 also achieved 100% accuracy in differentiating infected from uninfected samples using logistic regression, demonstrating its robustness across platforms. Further differential expression analysis identified 12 common genes including ISG15, OAS1, IFI44, IFI27, IFIT2, IFIT3, IFI44L, MX1, MX2, OAS2, RSAD2, and OASL which demonstrated the highest levels of statistical significance and biological relevance across both platforms. Gene Ontology (GO) analysis confirmed that these genes are directly involved in key immune and viral infection pathways, reinforcing their importance in EBOV infection. In addition, RNA-Seq uniquely identified genes such as CASP5, USP18, and DDX60, which play key roles in immune regulation and antiviral defense. This finding highlights the broader detection capabilities of RNA-Seq and underscores the complementary strengths of both platforms in providing a comprehensive and accurate assessment of gene expression changes during Ebola virus infection.
Updated: 2024-10-30 20:21:20
Domains: q-bio.GN,cs.LG
GenRL: Multimodal-foundation world models for generalization in embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal-foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learn the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking in locomotion and manipulation domains, GenRL enables multi-task generalization from language and visual prompts. Furthermore, by introducing a data-free policy learning strategy, our approach lays the groundwork for foundational policy learning using generative world models. Website, code and data: https://mazpie.github.io/genrl/
Updated: 2024-10-30 20:16:18
Domains: cs.AI,cs.CV,cs.LG,cs.RO
Search for Efficient Large Language Models
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research. Numerous efficient techniques, including weight pruning, quantization, and distillation, have been embraced to compress LLMs, targeting memory reduction and inference acceleration; their success underscores the redundancy in LLMs. However, most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures. Moreover, traditional architecture search methods, limited by the elevated complexity of models with extensive parameters, struggle to demonstrate their effectiveness on LLMs. In this paper, we propose a training-free architecture search framework to identify optimal subnets that preserve the fundamental strengths of the original LLMs while achieving inference acceleration. Furthermore, after generating subnets that inherit specific weights from the original LLMs, we introduce a reformation algorithm that utilizes the omitted weights to rectify the inherited weights with a small amount of calibration data. Compared with SOTA training-free structured pruning works that can generate smaller networks, our method demonstrates superior performance across standard benchmarks. Furthermore, our generated subnets can directly reduce the usage of GPU memory and achieve inference acceleration. Code: https://github.com/shawnricecake/search-llm
Updated: 2024-10-30 20:04:01
Domains: cs.AI
Communication-Efficient Federated Learning over Wireless Channels via Gradient Sketching
Large-scale federated learning (FL) over wireless multiple access channels (MACs) has emerged as a crucial learning paradigm with a wide range of applications. However, its widespread adoption is hindered by several major challenges, including limited bandwidth shared by many edge devices, noisy and erroneous wireless communications, and heterogeneous datasets with different distributions across edge devices. To overcome these fundamental challenges, we propose Federated Proximal Sketching (FPS), tailored towards band-limited wireless channels and handling data heterogeneity across edge devices. FPS uses a count sketch data structure to address the bandwidth bottleneck and enable efficient compression while maintaining accurate estimation of significant coordinates. Additionally, we modify the loss function in FPS such that it is equipped to deal with varying degrees of data heterogeneity. We establish the convergence of the FPS algorithm under mild technical conditions and characterize how the bias induced due to factors like data heterogeneity and noisy wireless channels play a role in the overall result. We complement the proposed theoretical framework with numerical experiments that demonstrate the stability, accuracy, and efficiency of FPS in comparison to state-of-the-art methods on both synthetic and real-world datasets. Overall, our results show that FPS is a promising solution to tackling the above challenges of FL over wireless MACs.
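As a rough illustration of the sketching idea behind FPS, here is a toy count-sketch compressor for a gradient vector; the table sizes and median-based recovery follow the standard count sketch, while everything specific to FPS (the proximal loss, the wireless channel model) is omitted.

```python
# Our own toy count-sketch gradient compressor, not the authors' code.
import numpy as np

class CountSketch:
    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        self.bucket = rng.integers(0, cols, size=(rows, dim))  # hash h_r(i)
        self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))  # sign s_r(i)

    def compress(self, g):
        table = np.zeros((self.rows, self.cols))
        for r in range(self.rows):
            np.add.at(table[r], self.bucket[r], self.sign[r] * g)
        return table

    def decompress(self, table):
        # Median of the per-row estimates recovers the significant
        # coordinates with high probability despite hash collisions.
        est = np.stack([self.sign[r] * table[r, self.bucket[r]]
                        for r in range(self.rows)])
        return np.median(est, axis=0)

dim = 10_000
g = np.zeros(dim); g[:10] = np.linspace(5, 1, 10)   # a few heavy coordinates
cs = CountSketch(rows=5, cols=500, dim=dim)
g_hat = cs.decompress(cs.compress(g))
print("max error on heavy coordinates:", np.abs(g_hat[:10] - g[:10]).max())
```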
Updated: 2024-10-30 20:01:08
Domains: cs.LG
Dynamic Information Sub-Selection for Decision Support
We introduce Dynamic Information Sub-Selection (DISS), a novel framework of AI assistance designed to enhance the performance of black-box decision-makers by tailoring their information processing on a per-instance basis. Black-box decision-makers (e.g., humans or real-time systems) often face challenges in processing all possible information at hand (e.g., due to cognitive biases or resource constraints), which can degrade decision efficacy. DISS addresses these challenges through policies that dynamically select the most effective features and options to forward to the black-box decision-maker for prediction. We develop a scalable frequentist data acquisition strategy and a decision-maker mimicking technique for enhanced budget efficiency. We explore several impactful applications of DISS, including biased decision-maker support, expert assignment optimization, large language model decision support, and interpretability. Empirical validation of our proposed DISS methodology shows superior performance to state-of-the-art methods across various applications.
Updated: 2024-10-30 20:00:54
Domains: cs.LG
Mitigating Challenges in Ethereum's Proof-of-Stake Consensus: Evaluating the Impact of EigenLayer and Lido
The transition of Ethereum from a Proof-of-Work (PoW) to a Proof-of-Stake (PoS) consensus mechanism introduces a transformative approach to blockchain validation, offering enhanced scalability, energy efficiency, and security. However, this shift also presents significant challenges, including high barriers to becoming a validator, restrictions on the liquidity of staked Ether (ETH), and the risk of centralization due to staking pool dynamics. This paper addresses these challenges by exploring two innovative solutions: EigenLayer and Lido. EigenLayer is a middleware solution enabling restaking, allowing validators to secure multiple protocols and thereby increasing decentralization and profitability. Lido, a liquid staking protocol, simplifies participation by issuing stETH tokens that retain liquidity, allowing users to earn rewards without long-term lock-up constraints. This paper provides a detailed analysis of how these technologies mitigate key PoS challenges, reduce validator entry barriers, unlock staked capital, and improve decentralization. We conclude with an evaluation of the combined potential of EigenLayer and Lido to foster a more resilient and inclusive Ethereum ecosystem, setting the stage for further advancements in decentralized finance.
Updated: 2024-10-30 19:58:46
Domains: cs.CR
The Z-Gromov-Wasserstein Distance
The Gromov-Wasserstein (GW) distance is a powerful tool for comparing metric measure spaces which has found broad applications in data science and machine learning. Driven by the need to analyze datasets whose objects have increasingly complex structure (such as node and edge-attributed graphs), several variants of GW distance have been introduced in the recent literature. With a view toward establishing a general framework for the theory of GW-like distances, this paper considers a vast generalization of the notion of a metric measure space: for an arbitrary metric space $Z$, we define a $Z$-network to be a measure space endowed with a kernel valued in $Z$. We introduce a method for comparing $Z$-networks by defining a generalization of GW distance, which we refer to as $Z$-Gromov-Wasserstein ($Z$-GW) distance. This construction subsumes many previously known metrics and offers a unified approach to understanding their shared properties. This paper demonstrates that the $Z$-GW distance defines a metric on the space of $Z$-networks which retains desirable properties of $Z$, such as separability, completeness, and geodesicity. Many of these properties were unknown for existing variants of GW distance that fall under our framework. Our focus is on foundational theory, but our results also include computable lower bounds and approximations of the distance which will be useful for practical applications.
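For intuition about what a computable lower bound on a GW-type distance can look like, the toy sketch below implements the classical eccentricity-based lower bound for ordinary metric measure spaces with uniform measures (up to normalization conventions); this is our own simplified illustration, not the paper's $Z$-network construction.

```python
# An illustrative eccentricity-based lower bound on GW distance between two
# finite metric measure spaces with uniform measures; constants are ours.
import numpy as np
from scipy.spatial.distance import cdist

def eccentricity_lower_bound(X, Y):
    CX, CY = cdist(X, X), cdist(Y, Y)          # pairwise distance matrices
    ex, ey = CX.mean(axis=1), CY.mean(axis=1)  # eccentricity of each point
    # 1D Wasserstein-1 between eccentricity distributions: for uniform
    # measures of equal size, the sorted (monotone) coupling is optimal.
    return np.abs(np.sort(ex) - np.sort(ey)).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # a planar point cloud
Y = 2.0 * rng.normal(size=(100, 3))    # a differently scaled cloud in R^3
print("eccentricity lower bound:", eccentricity_lower_bound(X, Y))
```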
Updated: 2024-10-30 19:58:00
Domains: math.MG,cs.LG
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online.
Updated: 2024-10-30 19:54:09
Domains: eess.AS,cs.AI,cs.SD,eess.SP
Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode
Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, as training on a system with physical components cannot be accelerated, and simulation models do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged due to inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: We train the reinforcement agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and instantaneously performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the learned agent will receive a higher reward than the conventional controller. The RL agent will then control the system for these states, and all other regions remain under the control of the existing controller. Over time, the RL agent will take over an increasing number of states, while leaving control to the baseline wherever it cannot surpass the baseline's performance. Thus, we keep regret during training low and improve the performance compared to using only conventional controllers or only reinforcement learning. We present and evaluate two mechanisms for deciding whether to use the RL agent or the conventional controller. The usefulness of our approach is demonstrated for a reach-avoid task, for which we are able to effectively train an agent where standard approaches fail.
Updated: 2024-10-30 19:52:52
Domains: cs.LG
TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds
Brain imaging studies have demonstrated that diffusion MRI tractography geometric shape descriptors can inform the study of the brain's white matter pathways and their relationship to brain function. In this work, we investigate the possibility of utilizing a deep learning model to compute shape measures of the brain's white matter connections. We introduce a novel framework, TractShapeNet, that leverages a point cloud representation of tractography to compute five shape measures: length, span, volume, total surface area, and irregularity. We assess the performance of the method on a large dataset including 1065 healthy young adults. Experiments for shape measure computation demonstrate that our proposed TractShapeNet outperforms other point cloud-based neural network models in both the Pearson correlation coefficient and normalized error metrics. We compare the inference runtime results with the conventional shape computation tool DSI-Studio. Our results demonstrate that a deep learning approach enables faster and more efficient shape measure computation. We also conduct experiments on two downstream language cognition prediction tasks, showing that shape measures from TractShapeNet perform similarly to those computed by DSI-Studio. Our code will be available at: https://github.com/SlicerDMRI/TractShapeNet.
Updated: 2024-10-30 19:43:40
Domains: cs.CV,cs.AI
Interaction-Force Transport Gradient Flows
This paper presents a new gradient flow dissipation geometry over non-negative and probability measures. This is motivated by a principled construction that combines the unbalanced optimal transport and interaction forces modeled by reproducing kernels. Using a precise connection between the Hellinger geometry and the maximum mean discrepancy (MMD), we propose the interaction-force transport (IFT) gradient flows and its spherical variant via an infimal convolution of the Wasserstein and spherical MMD tensors. We then develop a particle-based optimization algorithm based on the JKO-splitting scheme of the mass-preserving spherical IFT gradient flows. Finally, we provide both theoretical global exponential convergence guarantees and improved empirical simulation results for applying the IFT gradient flows to the sampling task of MMD-minimization. Furthermore, we prove that the spherical IFT gradient flow enjoys the best of both worlds by providing the global exponential convergence guarantee for both the MMD and KL energy.
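The following toy sketch illustrates the MMD-minimization sampling task mentioned above by running a plain MMD gradient flow on particles with an RBF kernel; it deliberately omits the paper's interaction-force and spherical components, and all constants are illustrative.

```python
# A toy MMD gradient flow on particles (RBF kernel); not the IFT construction.
import numpy as np

def rbf(a, b, sigma=2.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(x, y, sigma=2.0):
    return rbf(x, x, sigma).mean() - 2 * rbf(x, y, sigma).mean() + rbf(y, y, sigma).mean()

def mmd2_grad(x, y, sigma=2.0):
    # Analytic gradient of MMD^2 w.r.t. the particle positions x.
    n, m = len(x), len(y)
    Kxx, Kxy = rbf(x, x, sigma), rbf(x, y, sigma)
    dxx = x[:, None, :] - x[None, :, :]
    dxy = x[:, None, :] - y[None, :, :]
    g_xx = -(2 / (n * n * sigma**2)) * (Kxx[:, :, None] * dxx).sum(axis=1)
    g_xy = (2 / (n * m * sigma**2)) * (Kxy[:, :, None] * dxy).sum(axis=1)
    return g_xx + g_xy

rng = np.random.default_rng(0)
target = rng.normal(3.0, 1.0, size=(64, 1))   # samples from the target measure
x = rng.normal(0.0, 1.0, size=(64, 1))        # particles to be transported
for _ in range(300):
    x = x - 5.0 * mmd2_grad(x, target)        # explicit Euler step on the flow
print("final MMD^2:", mmd2(x, target))
```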
Updated: 2024-10-30 19:28:44
Domains: cs.LG,math.AP,stat.ML
Robust quantum dots charge autotuning using neural network uncertainty
This study presents a machine-learning-based procedure to automate the charge tuning of semiconductor spin qubits with minimal human intervention, addressing one of the significant challenges in scaling up quantum dot technologies. This method exploits artificial neural networks to identify noisy transition lines in stability diagrams, guiding a robust exploration strategy leveraging neural networks' uncertainty estimations. Tested across three distinct offline experimental datasets representing different single quantum dot technologies, the approach achieves over 99% tuning success rate in optimal cases, where more than 10% of the success is directly attributable to uncertainty exploitation. The challenging constraints of small training sets containing high diagram-to-diagram variability allowed us to evaluate the capabilities and limits of the proposed procedure.
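A minimal sketch of the uncertainty signal such a procedure can exploit: train an ensemble of small classifiers and read predictive uncertainty from their disagreement. The toy data, model sizes, and thresholding are ours, not the paper's setup.

```python
# Ensemble disagreement as an uncertainty estimate (toy stand-in data).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))                                 # toy diagram patches
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)    # noisy "line" label

ensemble = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=s).fit(X, y) for s in range(5)]
x_query = np.array([[0.05, -0.2]])                 # near the decision boundary
probs = np.array([m.predict_proba(x_query)[0, 1] for m in ensemble])
print(f"p(line) = {probs.mean():.2f} +/- {probs.std():.2f}")
# A robust tuner can treat a high std as "uncertain region: re-measure or
# explore cautiously" instead of committing to a single noisy prediction.
```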
Updated: 2024-10-30 19:27:17
Domains: quant-ph,cs.LG,68T37 (Primary), 81V65 (Secondary),I.2.8; I.5.1
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement as developers' codes are already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert whether an LLM used during development has been trained on specific copyrighted codes, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC: a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of codes that were used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC incurs low resource overhead, in contrast to the pair-wise clone detection over thousands of code snippets that tools like the CodeWhisperer reference tracker perform during auditing.
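To illustrate the feature-extraction idea, the sketch below pulls syntactic identifiers from Python snippets with the ast module and fits a small inclusion classifier; TraWiC's actual features, labels, and model are richer, so treat this purely as a schematic.

```python
# Schematic: identifier extraction + a toy inclusion classifier.
import ast
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

def identifiers(source: str) -> str:
    """Collect function/class names and variable names from a snippet."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Name):
            names.append(node.id)
    return " ".join(names)

# Toy corpus: label 1 = snippet assumed seen in training data, 0 = unseen.
snippets = ["def add(alpha, beta):\n    return alpha + beta",
            "def mul(left, right):\n    return left * right",
            "class Stack:\n    def push(self, item):\n        self.items = [item]",
            "def fib(num):\n    return num if num < 2 else fib(num - 1) + fib(num - 2)"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(identifiers(s) for s in snippets)
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```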
Updated: 2024-10-30 19:26:07
Domains: cs.SE,cs.LG
TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes
Attention guides our gaze to fixate on the proper location of the scene and holds it there for the amount of time warranted by current processing demands, before shifting to the next location. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting spatial aspects of observers' visual scanpaths (where to look), while often relegating to the background the temporal facet of attention dynamics (when to look). In this paper we present TPP-Gaze, a novel and principled approach to model scanpath dynamics based on Neural Temporal Point Processes (TPP), which jointly learns the temporal dynamics of fixation positions and durations, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches. Source code and trained models are publicly available at: https://github.com/phuselab/tppgaze.
Updated: 2024-10-30 19:22:38
Domains: cs.CV,cs.AI
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions
Material discovery is a critical area of research with the potential to revolutionize various fields, including carbon capture, renewable energy, and electronics. However, the immense scale of the chemical space makes it challenging to explore all possible materials experimentally. In this paper, we introduce FlowLLM, a novel generative model that combines large language models (LLMs) and Riemannian flow matching (RFM) to design novel crystalline materials. FlowLLM first fine-tunes an LLM to learn an effective base distribution of meta-stable crystals in a text representation. After converting to a graph representation, the RFM model takes samples from the LLM and iteratively refines the coordinates and lattice parameters. Our approach significantly outperforms state-of-the-art methods, increasing the generation rate of stable materials by over three times and increasing the rate for stable, unique, and novel crystals by $\sim50\%$ - a huge improvement on a difficult problem. Additionally, the crystals generated by FlowLLM are much closer to their relaxed state when compared with another leading model, significantly reducing post-hoc computational cost.
Updated: 2024-10-30 19:15:43
Domains: cs.LG,cond-mat.mtrl-sci,cs.AI,stat.ML
Unifying Generation and Prediction on Graphs with Latent Graph Diffusion
In this paper, we propose the first framework that enables solving graph learning tasks of all levels (node, edge and graph) and all types (generation, regression and classification) using one formulation. We first formulate prediction tasks including regression and classification into a generic (conditional) generation framework, which enables diffusion models to perform deterministic tasks with provable guarantees. We then propose Latent Graph Diffusion (LGD), a generative model that can generate node, edge, and graph-level features of all categories simultaneously. We achieve this goal by embedding the graph structures and features into a latent space leveraging a powerful encoder and decoder, then training a diffusion model in the latent space. LGD is also capable of conditional generation through a specifically designed cross-attention mechanism. Leveraging LGD and the ``all tasks as generation'' formulation, our framework is capable of solving graph tasks of various levels and types. We verify the effectiveness of our framework with extensive experiments, where our models achieve state-of-the-art or highly competitive results across a wide range of generation and regression tasks.
Updated: 2024-10-30 19:07:31
Domains: cs.LG,cs.SI
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
This paper investigates a class of stochastic bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks. The unbounded smoothness is characterized by the smoothness constant of the upper-level function scaling linearly with the gradient norm, lacking a uniform upper bound. Existing state-of-the-art algorithms require $\widetilde{O}(1/\epsilon^4)$ oracle calls of stochastic gradient or Hessian/Jacobian-vector product to find an $\epsilon$-stationary point. However, it remains unclear if we can further improve the convergence rate when the assumptions for the function in the population level also hold for each random realization almost surely (e.g., Lipschitzness of each realization of the stochastic gradient). To address this issue, we propose a new Accelerated Bilevel Optimization algorithm named AccBO. The algorithm updates the upper-level variable by normalized stochastic gradient descent with recursive momentum and the lower-level variable by the stochastic Nesterov accelerated gradient descent algorithm with averaging. We prove that our algorithm achieves an oracle complexity of $\widetilde{O}(1/\epsilon^3)$ to find an $\epsilon$-stationary point. Our proof relies on a novel lemma characterizing the dynamics of stochastic Nesterov accelerated gradient descent algorithm under distribution drift with high probability for the lower-level variable, which is of independent interest and also plays a crucial role in analyzing the hypergradient estimation error over time. Experimental results on various tasks confirm that our proposed algorithm achieves the predicted theoretical acceleration and significantly outperforms baselines in bilevel optimization.
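The flavor of the upper-level update can be seen in the hedged sketch below: normalized stochastic gradient descent with recursive (STORM-style) momentum on a toy quadratic. The genuine AccBO additionally runs an averaged Nesterov-accelerated solver for the lower-level problem and estimates hypergradients, none of which is shown.

```python
# Normalized SGD with recursive momentum on f(x) = ||x||^2 (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x, xi):
    return 2 * x + xi                          # noisy gradient of f(x) = ||x||^2

x = np.full(5, 5.0)
m = stoch_grad(x, rng.normal(0, 0.5, 5))       # initial gradient estimate
eta, a = 0.05, 0.1
for t in range(500):
    x_next = x - eta * m / (np.linalg.norm(m) + 1e-12)   # normalized step
    xi = rng.normal(0, 0.5, 5)                 # one fresh sample, evaluated at
    # both iterates, as the recursive-momentum correction requires:
    m = stoch_grad(x_next, xi) + (1 - a) * (m - stoch_grad(x, xi))
    x = x_next
print("final distance to the optimum:", np.linalg.norm(x))
```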
Updated: 2024-10-30 19:05:17
Domains: cs.LG,math.OC
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable due to its iterate-equivalence with Kernelized OMWU (KOMWU) -- the algorithm with state-of-the-art dependence on the game tree size in extensive-form games -- when used in conjunction with the online mirror descent (OMD) algorithm. However, the standard analysis for OMD is unable to establish such a result; the only current analysis is by appealing to the iterate equivalence to KOMWU. We close this gap by introducing a pair of primal-dual treeplex norms, which we contend form the natural analytic viewpoint for studying the strong convexity of DilEnt. Using these norm pairs, we recover the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU. Along with a new regret lower bound for online learning in sequence-form strategy spaces, we show that this ratio is nearly optimal. Finally, we showcase our analytic techniques by refining the analysis of Clairvoyant OMD when paired with DilEnt, establishing an $\mathcal{O}(n \log |\mathcal{V}| \log T/T)$ approximation rate to coarse correlated equilibrium in $n$-player games, where $|\mathcal{V}|$ is the number of reduced normal-form strategies of the players, establishing the new state of the art.
Updated: 2024-10-30 19:03:33
Domains: cs.LG,cs.GT
Learning Successor Features the Simple Way
In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid), 3D (Miniworld) mazes and Mujoco, for both single and continual learning scenarios. As well, our technique is efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
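A compact sketch of the two losses named above, assuming a feature encoder phi and an SF head psi (both stand-in networks): the TD loss enforces psi(s) ≈ phi(s) + γ·psi(s'), and a linear head predicts reward from phi, matching the basic mathematical definition of SFs.

```python
# Toy TD + reward-prediction losses for successor features (stand-in modules).
import torch
import torch.nn as nn

d_obs, d_feat, gamma = 16, 32, 0.99
phi = nn.Sequential(nn.Linear(d_obs, 64), nn.ReLU(), nn.Linear(64, d_feat))
psi = nn.Sequential(nn.Linear(d_obs, 64), nn.ReLU(), nn.Linear(64, d_feat))
w = nn.Linear(d_feat, 1, bias=False)       # reward weights: r ≈ w(phi(s))

obs = torch.randn(128, d_obs)              # a batch of transitions (toy data)
next_obs = torch.randn(128, d_obs)
rewards = torch.randn(128, 1)

td_target = phi(obs) + gamma * psi(next_obs).detach()   # stop-gradient target
td_loss = ((psi(obs) - td_target) ** 2).mean()
reward_loss = ((w(phi(obs)) - rewards) ** 2).mean()
loss = td_loss + reward_loss

opt = torch.optim.Adam([*phi.parameters(), *psi.parameters(), *w.parameters()],
                       lr=3e-4)
opt.zero_grad(); loss.backward(); opt.step()
```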
Updated: 2024-10-30 18:59:48
Domains: cs.LG
Adaptive Network Intervention for Complex Systems: A Hierarchical Graph Reinforcement Learning Approach
Effective governance and steering of behavior in complex multi-agent systems (MAS) are essential for managing system-wide outcomes, particularly in environments where interactions are structured by dynamic networks. In many applications, the goal is to promote pro-social behavior among agents, where network structure plays a pivotal role in shaping these interactions. This paper introduces a Hierarchical Graph Reinforcement Learning (HGRL) framework that governs such systems through targeted interventions in the network structure. Operating within the constraints of limited managerial authority, the HGRL framework demonstrates superior performance across a range of environmental conditions, outperforming established baseline methods. Our findings highlight the critical influence of agent-to-agent learning (social learning) on system behavior: under low social learning, the HGRL manager preserves cooperation, forming robust core-periphery networks dominated by cooperators. In contrast, high social learning accelerates defection, leading to sparser, chain-like networks. Additionally, the study underscores the importance of the system manager's authority level in preventing system-wide failures, such as agent rebellion or collapse, positioning HGRL as a powerful tool for dynamic network-based governance.
Updated: 2024-10-30 18:59:02
Domains: cs.LG,cs.AI,cs.GT,cs.MA
Refusal in Language Models Is Mediated by a Single Direction
Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechanisms remain poorly understood. In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. Specifically, for each model, we find a single direction such that erasing this direction from the model's residual stream activations prevents it from refusing harmful instructions, while adding this direction elicits refusal on even harmless instructions. Leveraging this insight, we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities. Finally, we mechanistically analyze how adversarial suffixes suppress propagation of the refusal-mediating direction. Our findings underscore the brittleness of current safety fine-tuning methods. More broadly, our work showcases how an understanding of model internals can be leveraged to develop practical methods for controlling model behavior.
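The core operation is easy to state: remove (or add) the component of a residual-stream activation along one direction. The sketch below shows that projection arithmetic with placeholder tensors; the direction itself would be estimated from contrastive harmful/harmless prompts, which is not shown.

```python
# Directional ablation / addition on placeholder activations.
import torch

def ablate_direction(x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Remove the component of activations x along unit direction d."""
    d = d / d.norm()
    return x - (x @ d).unsqueeze(-1) * d      # x minus its projection onto d

def add_direction(x: torch.Tensor, d: torch.Tensor, scale: float = 1.0):
    return x + scale * d / d.norm()           # steer activations toward refusal

hidden = torch.randn(4, 10, 4096)   # (batch, seq, d_model) residual stream
refusal_dir = torch.randn(4096)     # placeholder for the estimated direction
clean = ablate_direction(hidden, refusal_dir)
# The ablated activations have (numerically) zero component along the direction.
print(torch.allclose(clean @ (refusal_dir / refusal_dir.norm()),
                     torch.zeros(4, 10), atol=1e-4))
```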
Updated: 2024-10-30 18:57:07
Domains: cs.LG,cs.AI,cs.CL
Resource Governance in Networked Systems via Integrated Variational Autoencoders and Reinforcement Learning
We introduce a framework that integrates variational autoencoders (VAE) with reinforcement learning (RL) to balance system performance and resource usage in multi-agent systems by dynamically adjusting network structures over time. A key innovation of this method is its capability to handle the vast action space of the network structure. This is achieved by combining Variational Auto-Encoder and Deep Reinforcement Learning to control the latent space encoded from the network structures. The proposed method, evaluated on the modified OpenAI particle environment under various scenarios, not only demonstrates superior performance compared to baselines but also reveals interesting strategies and insights through the learned behaviors.
Updated: 2024-10-30 18:57:02
Domains: cs.LG,cs.AI,cs.MA
GreedyML: A Parallel Algorithm for Maximizing Submodular Functions
We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical applications in areas such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedI algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing a single accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulation step could exceed the memory available on a processor, and the processor which performs the accumulation could become a computational bottleneck. Here, we propose a generalization of the RandGreedI algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We also evaluate the new GreedyML algorithm on three classes of problems, and report results from massive data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedI algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm can be faster than the RandGreedI algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches those obtained by the RandGreedI algorithm on these problems.
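For readers unfamiliar with the RandGreedI pattern that GreedyML generalizes, here is a toy single-accumulation version on a coverage function (a standard monotone submodular example); GreedyML replaces the single accumulation with a tree of accumulations to bound per-processor memory.

```python
# Toy RandGreedI-style distributed greedy on a coverage function.
import random

def greedy(elements, f, k):
    chosen = []
    for _ in range(k):
        best = max((e for e in elements if e not in chosen),
                   key=lambda e: f(chosen + [e]) - f(chosen), default=None)
        if best is None:
            break
        chosen.append(best)
    return chosen

# Monotone submodular objective: how many universe items do the chosen sets cover?
sets = {i: set(random.Random(i).sample(range(100), 12)) for i in range(60)}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0

random.seed(0)
parts = [[], [], [], []]
for e in sets:                                   # random partition to "machines"
    parts[random.randrange(4)].append(e)
partials = [greedy(p, f, k=8) for p in parts]    # local greedy on each machine
union = [e for p in partials for e in p]
solution = greedy(union, f, k=8)                 # single accumulation round
print("coverage of the final solution:", f(solution))
```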
Updated: 2024-10-30 18:51:58
Domains: cs.DC,cs.DS,cs.LG
Understanding Representation of Deep Equilibrium Models from Neural Collapse Perspective
The Deep Equilibrium Model (DEQ), a typical implicit neural network, is notable for its memory efficiency and competitive performance compared to explicit neural networks. However, there has been relatively limited theoretical analysis of the representations learned by DEQ. In this paper, we utilize Neural Collapse ($\mathcal{NC}$) as a tool to systematically analyze the representation of DEQ under both balanced and imbalanced conditions. $\mathcal{NC}$ is an interesting phenomenon in the neural network training process that characterizes the geometry of class features and classifier weights. While extensively studied in traditional explicit neural networks, the $\mathcal{NC}$ phenomenon has not received substantial attention in the context of implicit neural networks. We theoretically show that $\mathcal{NC}$ exists in DEQ under balanced conditions. Moreover, in imbalanced settings, despite the presence of minority collapse, DEQ demonstrates advantages over explicit neural networks. These advantages include the convergence of extracted features to the vertices of a simplex equiangular tight frame and self-duality properties under mild conditions, highlighting DEQ's superiority in handling imbalanced datasets. Finally, we validate our theoretical analyses through experiments in both balanced and imbalanced scenarios.
Updated: 2024-10-30 18:50:16
Domains: cs.LG
Transfer Learning for Diffusion Models
Diffusion models, a specific type of generative model, have achieved unprecedented performance in recent years and consistently produce high-quality synthetic samples. A critical prerequisite for their notable success lies in the presence of a substantial number of training samples, which can be impractical in real-world applications due to high collection costs or associated risks. Consequently, various finetuning and regularization approaches have been proposed to transfer knowledge from existing pre-trained models to specific target domains with limited data. This paper introduces the Transfer Guided Diffusion Process (TGDP), a novel approach distinct from conventional finetuning and regularization methods. We prove that the optimal diffusion model for the target domain integrates pre-trained diffusion models on the source domain with additional guidance from a domain classifier. We further extend TGDP to a conditional version for modeling the joint distribution of data and its corresponding labels, together with two additional regularization terms to enhance the model performance. We validate the effectiveness of TGDP on both simulated and real-world datasets.
Updated: 2024-10-30 18:48:50
Domains: cs.LG,cs.AI
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or cross-embodied datasets. On a suite of six simulated and real environments, we show that representations learned with DynaMo significantly improve downstream imitation learning performance over prior self-supervised learning objectives, and pretrained representations. Gains from using DynaMo hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP, and nearest neighbors. Finally, we ablate over key components of DynaMo and measure its impact on downstream policy performance. Robot videos are best viewed at https://dynamo-ssl.github.io
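A condensed sketch of the joint objective described above: an encoder maps frames to embeddings, an inverse-dynamics head infers a latent action from consecutive embeddings, and a forward model predicts the next embedding. Module sizes and the stop-gradient choice are our placeholders; the real method adds more structure.

```python
# Toy joint inverse/forward latent-dynamics objective (stand-in modules).
import torch
import torch.nn as nn

d_obs, d_emb, d_act = 64, 32, 8
enc = nn.Linear(d_obs, d_emb)
inv = nn.Linear(2 * d_emb, d_act)         # latent inverse dynamics
fwd = nn.Linear(d_emb + d_act, d_emb)     # forward dynamics in latent space

obs, next_obs = torch.randn(256, d_obs), torch.randn(256, d_obs)
z, z_next = enc(obs), enc(next_obs)
a_latent = inv(torch.cat([z, z_next], dim=-1))       # no ground-truth actions
z_pred = fwd(torch.cat([z, a_latent], dim=-1))
# Predict the next frame in latent space; detaching the target is one common
# (but not the only) way to discourage trivial collapse in such objectives.
loss = ((z_pred - z_next.detach()) ** 2).mean()

opt = torch.optim.Adam([*enc.parameters(), *inv.parameters(),
                        *fwd.parameters()], lr=1e-3)
opt.zero_grad(); loss.backward(); opt.step()
```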
Updated: 2024-10-30 18:48:00
Domains: cs.RO,cs.AI,cs.CV,cs.LG
When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs). However, many systems are inherently continuous in time, making discrete-time MDPs an inexact modeling choice. In many applications, such as greenhouse control or medical treatments, each interaction (measurement or switching of action) involves manual intervention and thus is inherently costly. Therefore, we generally prefer a time-adaptive approach with fewer interactions with the system. In this work, we formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge by optimizing over policies that, besides the control action, predict the duration of its application. Our formulation results in an extended MDP that any standard RL algorithm can solve. We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterparts while retaining the same or improved performance, and exhibit robustness to the discretization frequency. Finally, we propose OTaCoS, an efficient model-based algorithm for our setting. We show that OTaCoS enjoys sublinear regret for systems with sufficiently smooth dynamics and empirically yields further gains in sample efficiency.
Updated: 2024-10-30 18:45:36
Domains: cs.LG
Ensemble learning of the atrial fiber orientation with physics-informed neural networks
The anisotropic structure of the myocardium is a key determinant of cardiac function. To date, there is no imaging modality that can assess the cardiac fiber structure in vivo. We recently proposed Fibernet, a method for the automatic identification of the anisotropic conduction, and thus the fibers, in the atria from local electrical recordings. Fibernet uses cardiac activation as recorded during electroanatomical mappings to infer local conduction properties using physics-informed neural networks. In this work, we extend Fibernet to cope with the uncertainty in the estimated fiber field. Specifically, we use an ensemble of neural networks to produce multiple samples, all fitting the observed data, and compute posterior statistics. We also introduce a methodology to select the best fiber orientation members and define the input of the neural networks directly on the atrial surface. With these improvements, we outperform the previous methodology in terms of fiber orientation error across 8 different atrial anatomies. Our approach can currently estimate the fiber orientation and conduction velocities in under 7 minutes with quantified uncertainty, which opens the door to its application in clinical practice. We hope the proposed methodology will enable further personalization of cardiac digital twins for precision medicine.
Updated: 2024-10-30 18:45:19
Domains: cs.LG,eess.IV,q-bio.TO
NeoRL: Efficient Exploration for Nonepisodic RL
We study the problem of nonepisodic reinforcement learning (RL) for nonlinear dynamical systems, where the system dynamics are unknown and the RL agent has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic Optimistic RL (NeoRL), an approach based on the principle of optimism in the face of uncertainty. NeoRL uses well-calibrated probabilistic models and plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics. Under continuity and bounded energy assumptions on the system, we provide a first-of-its-kind regret bound of $\mathcal{O}(\beta_T \sqrt{T \Gamma_T})$ for general nonlinear systems with Gaussian process dynamics. We compare NeoRL to other baselines on several deep RL environments and empirically demonstrate that NeoRL achieves the optimal average cost while incurring the least regret.
Updated: 2024-10-30 18:43:55
Domains: cs.LG
STIED: A deep learning model for the SpatioTemporal detection of focal Interictal Epileptiform Discharges with MEG
Magnetoencephalography (MEG) allows the non-invasive detection of interictal epileptiform discharges (IEDs). Clinical MEG analysis in epileptic patients traditionally relies on the visual identification of IEDs, which is time consuming and partially subjective. Automatic, data-driven detection methods exist but show limited performance. Still, the rise of deep learning (DL), with its ability to reproduce human-like abilities, could revolutionize clinical MEG practice. Here, we developed and validated STIED, a simple yet powerful supervised DL algorithm combining two convolutional neural networks with temporal (1D time-course) and spatial (2D topography) features of MEG signals, inspired by current clinical guidelines. Our DL model enabled both temporal and spatial localization of IEDs in patients suffering from focal epilepsy with frequent and high-amplitude spikes (FE group), with high performance metrics (accuracy, specificity, and sensitivity all exceeding 85%) when learning from spatiotemporal features of IEDs. This performance can be attributed to our handling of input data, which mimics established clinical MEG practice. Reverse engineering further revealed that STIED encodes fine spatiotemporal features of IEDs rather than their mere amplitude. The model trained on the FE group also showed promising results when applied to a separate group of presurgical patients with different types of refractory focal epilepsy, though further work is needed to distinguish IEDs from physiological transients. This study paves the way for incorporating STIED and DL algorithms into the routine clinical MEG evaluation of epilepsy.
Updated: 2024-10-30 18:41:22
Domains: physics.med-ph,cs.AI,cs.CV,cs.LG
Estimating Neural Network Robustness via Lipschitz Constant and Architecture Sensitivity
Ensuring neural network robustness is essential for the safe and reliable operation of robotic learning systems, especially in perception and decision-making tasks within real-world environments. This paper investigates the robustness of neural networks in perception systems, specifically examining their sensitivity to targeted, small-scale perturbations. We identify the Lipschitz constant as a key metric for quantifying and enhancing network robustness. We derive an analytical expression to compute the Lipschitz constant based on neural network architecture, providing a theoretical basis for estimating and improving robustness. Several experiments reveal the relationship between network design, the Lipschitz constant, and robustness, offering practical insights for developing safer, more robust robot learning systems.
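One standard way to make this concrete (not necessarily the paper's exact expression): for a feed-forward ReLU network, the product of the layers' spectral norms upper-bounds the network's Lipschitz constant, since ReLU is 1-Lipschitz.

```python
# Product-of-spectral-norms Lipschitz upper bound for a ReLU MLP.
import numpy as np

rng = np.random.default_rng(0)
# Weight matrices shaped (out, in) for a 32 -> 64 -> 64 -> 1 network.
weights = [rng.normal(size=(64, 32)), rng.normal(size=(64, 64)),
           rng.normal(size=(1, 64))]

def lipschitz_upper_bound(weights):
    # ord=2 on a matrix gives its largest singular value (spectral norm).
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

print("Lipschitz upper bound:", lipschitz_upper_bound(weights))
```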
Updated: 2024-10-30 18:38:42
Domains: cs.LG,cs.AI,cs.RO
Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models
As the performance of larger, newer Large Language Models continues to improve for strategic Theory of Mind (ToM) tasks, the demand for these state-of-the-art models increases commensurately. However, their deployment is costly both in terms of processing power and time. In this paper, we investigate the feasibility of creating smaller, highly-performing specialized algorithms by way of fine-tuning. To do this, we first present a large pre-trained model with 20 unique scenarios that combine different social contexts with games of varying social dilemmas, record its answers, and use them for Q&A fine-tuning on a smaller model of the same family. Our focus is on in-context game-theoretic decision-making, the same domain within which human interaction occurs and that requires both a theory of mind (or a semblance thereof) and an understanding of social dynamics. The smaller model is therefore trained not just on the answers provided, but also on the motivations provided by the larger model, which should contain advice and guidelines to navigate both strategic dilemmas and social cues. We find that the fine-tuned smaller language model consistently bridged the gap in performance between the smaller pre-trained version of the model and its larger relative and that its improvements extended in areas and contexts beyond the ones provided in the training examples, including on out-of-sample scenarios that include completely different game structures. On average for all games, through fine-tuning, the smaller model showed a 46% improvement measured as alignment towards the behavior of the larger model, with 100% representing indistinguishable behavior. When presented with out-of-sample social contexts and games, the fine-tuned model still displays remarkable levels of alignment, reaching an improvement of 18% and 28% respectively.
Updated: 2024-10-30 18:37:57
标题: 大型模型的战略思维,小型模型的效率:在大型语言模型中传递心智理论
摘要: 随着更大、更新的大型语言模型在战略性心智理论(ToM)任务中的表现不断提高,对这些最先进模型的需求也相应增加。然而,它们的部署在算力和时间方面都代价高昂。本文通过微调探讨创建更小、高性能专门模型的可行性。为此,我们首先向一个大型预训练模型呈现20个独特场景,这些场景将不同的社会背景与不同社会困境的博弈相结合,记录其回答,并将这些回答用于对同一系列较小模型的问答微调。我们的重点是情境中的博弈论决策,这正是人类互动发生的领域,既需要心智理论(或其某种近似),也需要对社会动态的理解。因此,较小的模型不仅在较大模型提供的答案上进行训练,还在其给出的动机上进行训练,后者应包含应对战略困境和社交线索的建议与指导。我们发现,经过微调的较小语言模型持续弥合了较小预训练模型与其较大同系模型之间的性能差距,并且其改进延伸到训练示例之外的领域和背景,包括博弈结构完全不同的样本外场景。就所有博弈的平均而言,通过微调,较小模型以向较大模型行为对齐的程度衡量提高了46%,其中100%代表行为无法区分。当面对样本外的社会背景和博弈时,微调后的模型仍表现出显著的对齐水平,分别达到18%和28%的改进。
更新时间: 2024-10-30 18:37:57
领域: cs.CL,cs.AI,cs.CY,cs.ET,cs.GT
Ethical Leadership in the Age of AI: Challenges, Opportunities and Framework for Ethical Leadership
Artificial Intelligence is currently and rapidly changing the way organizations and businesses operate. Ethical leadership has become significantly important since organizations and businesses across various sectors are evolving with AI. Organizations and businesses may be facing several challenges and potential opportunities when using AI. Ethical leadership plays a central role in guiding organizations in facing those challenges and maximizing on those opportunities. This article explores the essence of ethical leadership in the age of AI, starting with a simplified introduction of ethical leadership and AI, then dives into an understanding of ethical leadership, its characteristics and importance, the ethical challenges AI causes including bias in AI algorithms. The opportunities for ethical leadership in the age of AI answers the question: What actionable strategies can leaders employ to address the challenges and leverage opportunities? and describes the benefits for organizations through these opportunities. A proposed framework for ethical leadership is presented in this article, incorporating the core components: fairness, transparency, sustainability etc. Through the importance of interdisciplinary collaboration, case studies of ethical leadership in AI, and recommendations, this article emphasizes that ethical leadership in the age of AI is morally essential and strategically advantageous.
Updated: 2024-10-30 18:30:56
标题: 人工智能时代的伦理领导:挑战、机遇和伦理领导的框架
摘要: 人工智能目前正在迅速改变组织和企业的运作方式。由于各行各业的组织和企业正在与人工智能一起发展,伦理领导力变得极其重要。在使用人工智能时,组织和企业可能面临多种挑战和潜在机遇。伦理领导力在指导组织应对这些挑战并最大化这些机遇方面起着中心作用。本文探讨了在人工智能时代伦理领导力的本质,从对伦理领导力和人工智能的简化介绍开始,然后深入了解伦理领导力、其特征和重要性,以及人工智能引发的伦理挑战,包括人工智能算法中的偏见。在人工智能时代的伦理领导力机会中回答了一个问题:领导者可以采取哪些可行策略来应对挑战和利用机遇?并描述了这些机会为组织带来的好处。本文提出了一个伦理领导力框架,其中包含核心组件:公平、透明度、可持续性等。通过跨学科合作的重要性、人工智能伦理领导力案例研究和建议,本文强调了在人工智能时代伦理领导力在道德上的重要性和战略上的优势。
更新时间: 2024-10-30 18:30:56
领域: cs.CY,cs.AI
Policy Mirror Descent with Lookahead
Policy Mirror Descent (PMD) stands as a versatile algorithmic framework encompassing several seminal policy gradient algorithms such as natural policy gradient, with connections with state-of-the-art reinforcement learning (RL) algorithms such as TRPO and PPO. PMD can be seen as a soft Policy Iteration algorithm implementing regularized 1-step greedy policy improvement. However, 1-step greedy policies might not be the best choice and recent remarkable empirical successes in RL such as AlphaGo and AlphaZero have demonstrated that greedy approaches with respect to multiple steps outperform their 1-step counterpart. In this work, we propose a new class of PMD algorithms called $h$-PMD which incorporates multi-step greedy policy improvement with lookahead depth $h$ to the PMD update rule. To solve discounted infinite horizon Markov Decision Processes with discount factor $\gamma$, we show that $h$-PMD which generalizes the standard PMD enjoys a faster dimension-free $\gamma^h$-linear convergence rate, contingent on the computation of multi-step greedy policies. We propose an inexact version of $h$-PMD where lookahead action values are estimated. Under a generative model, we establish a sample complexity for $h$-PMD which improves over prior work. Finally, we extend our result to linear function approximation to scale to large state spaces. Under suitable assumptions, our sample complexity only involves dependence on the dimension of the feature map space instead of the state space size.
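To make the update concrete, here is a minimal tabular sketch of an $h$-PMD-style step under stated assumptions (the paper's exact algorithm and step sizes may differ): evaluate the current policy, apply $h$ Bellman optimality backups as the lookahead, then perform a KL-regularized (mirror-descent) improvement.

```python
import numpy as np

def h_pmd_step(P, R, pi, gamma=0.99, h=3, eta=1.0):
    """One h-PMD-style update in a tabular MDP. P: (S, A, S) transition
    probabilities, R: (S, A) rewards, pi: (S, A) current policy. The
    step size eta and the exact lookahead operator are illustrative;
    the paper's algorithm may differ in details."""
    S, A, _ = P.shape
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # Depth-h lookahead: h Bellman optimality backups starting from V^pi.
    for _ in range(h):
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V = Q.max(axis=1)
    # KL-regularized (mirror descent) improvement toward the lookahead Q.
    logits = np.log(pi + 1e-12) + eta * Q
    new_pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```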
Updated: 2024-10-30 18:24:40
标题: 带前瞻的策略镜像下降
摘要: 策略镜像下降(PMD)是一个多功能的算法框架,涵盖了若干开创性的策略梯度算法(如自然策略梯度),并与最先进的强化学习(RL)算法(如TRPO和PPO)相关联。PMD可以被看作是一种实现正则化1步贪婪策略改进的软策略迭代算法。然而,1步贪婪策略可能不是最佳选择,RL领域最近的显著实证成功(如AlphaGo和AlphaZero)表明,多步贪婪方法优于其1步对应方法。在这项工作中,我们提出了一类新的PMD算法,称为$h$-PMD,它将前瞻深度为$h$的多步贪婪策略改进纳入PMD更新规则。为了求解折扣因子为$\gamma$的折扣无限视界马尔可夫决策过程,我们证明了泛化标准PMD的$h$-PMD享有更快的、与维度无关的$\gamma^h$-线性收敛速率,其前提是多步贪婪策略的计算。我们提出了$h$-PMD的非精确版本,其中前瞻动作值通过估计获得。在生成模型下,我们为$h$-PMD建立了优于先前工作的样本复杂度。最后,我们将结果扩展到线性函数逼近,以适应大规模状态空间。在适当的假设下,我们的样本复杂度仅依赖于特征映射空间的维度,而非状态空间的大小。
更新时间: 2024-10-30 18:24:40
领域: cs.LG,cs.AI,stat.ML
Non-binary artificial neuron with phase variation implemented on a quantum computer
The first artificial quantum neuron models followed a similar path to classic models, as they work only with discrete values. Here we introduce an algorithm that generalizes the binary model manipulating the phase of complex numbers. We propose, test, and implement a neuron model that works with continuous values in a quantum computer. Through simulations, we demonstrate that our model may work in a hybrid training scheme utilizing gradient descent as a learning algorithm. This work represents another step in the direction of evaluation of the use of artificial neural networks efficiently implemented on near-term quantum devices.
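A classical NumPy toy of the phase idea — inputs and weights mapped to phases of unit-modulus amplitudes, with the output given by the squared overlap one would estimate by repeated measurement on hardware; an illustrative sketch, not the paper's circuit.

```python
import numpy as np

def phase_neuron(x, w):
    """Classical simulation of a phase-encoded neuron: continuous inputs
    and weights are mapped to phases of unit-modulus amplitudes, and the
    output is the squared overlap one would estimate by repeated
    measurement on hardware. An illustrative toy, not the paper's
    circuit."""
    psi_x = np.exp(1j * np.pi * np.asarray(x, dtype=float))
    psi_w = np.exp(1j * np.pi * np.asarray(w, dtype=float))
    overlap = np.vdot(psi_w, psi_x) / len(psi_x)
    return np.abs(overlap) ** 2

print(phase_neuron([0.1, 0.7, -0.3, 0.5], [0.2, 0.6, -0.2, 0.4]))
```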
Updated: 2024-10-30 18:18:53
标题: 非二进制人工神经元在量子计算机上实现的相位变化
摘要: 第一个人工量子神经元模型沿着经典模型类似的路径发展,因为它们只处理离散值。在这里,我们介绍了一种算法,它通过操作复数的相位来推广二进制模型。我们提出、测试和实现了一个在量子计算机上使用连续值的神经元模型。通过模拟,我们展示了我们的模型可能在利用梯度下降作为学习算法的混合训练方案中发挥作用。这项工作代表了朝着有效在近期量子设备上实现的人工神经网络的使用评估方向迈出的又一步。
更新时间: 2024-10-30 18:18:53
领域: quant-ph,cs.AI,cs.LG,cs.NE
Smoke and Mirrors in Causal Downstream Tasks
Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations in a Randomized Controlled Trial (RCT). Despite being the simplest possible causal setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded ISTAnt, the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6 480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual data set controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences.
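For context, the downstream quantity in such an RCT is elementary — the subtlety the paper studies is in the model that turns high-dimensional observations into per-trial outcomes. A textbook difference-in-means estimator with a normal-approximation confidence interval:

```python
import numpy as np

def ate_from_rct(y_treat, y_ctrl, z=1.96):
    """Difference-in-means treatment-effect estimate from an RCT with a
    normal-approximation 95% CI. Outcomes here would come from a model
    that reads the high-dimensional observations (illustrative)."""
    y_treat = np.asarray(y_treat, dtype=float)
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    ate = y_treat.mean() - y_ctrl.mean()
    se = np.sqrt(y_treat.var(ddof=1) / len(y_treat)
                 + y_ctrl.var(ddof=1) / len(y_ctrl))
    return ate, (ate - z * se, ate + z * se)

print(ate_from_rct([1, 0, 1, 1, 0, 1], [0, 0, 1, 0, 1, 0]))
```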
Updated: 2024-10-30 18:10:26
标题: 因果下游任务中的烟雾与镜子
摘要: 机器学习和人工智能有潜力变革数据驱动的科学发现,实现对多种科学现象的准确预测。由于许多科学问题本质上是因果性的,本文关注处理效应估计这一因果推断任务,其中感兴趣的结果记录在随机对照试验(RCT)的高维观测中。尽管这是最简单的因果设定,而且非常适合深度学习,但我们在理论上发现文献中许多常见选择可能导致有偏估计。为了检验这些考量的实际影响,我们记录了ISTAnt,这是第一个面向高维观测因果推断下游任务的真实世界基准:一项研究花园蚂蚁(Lasius neglectus)如何通过卫生清理行为应对施加在其群体成员身上的微粒的RCT。通过比较从最先进的视觉骨干网络微调而来的6,480个模型,我们发现采样和建模选择显著影响因果估计的准确性,并且分类准确率并不能作为其代理指标。我们进一步在一个由已知因果模型控制的合成视觉数据集上重复了该分析,验证了上述结论。我们的结果表明,未来的基准应仔细考虑真实的下游科学问题,特别是因果问题。此外,我们为表示学习方法给出了指导原则,以帮助回答科学中的因果问题。
更新时间: 2024-10-30 18:10:26
领域: cs.LG
Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations
The non-convex nature of trained neural networks has created significant obstacles in their incorporation into optimization models. Considering the wide array of applications that this embedding has, the optimization and deep learning communities have dedicated significant efforts to the convexification of trained neural networks. Many approaches to date have considered obtaining convex relaxations for each non-linear activation in isolation, which poses limitations in the tightness of the relaxations. Anderson et al. (2020) strengthened these relaxations and provided a framework to obtain the convex hull of the graph of a piecewise linear convex activation composed with an affine function; this effectively convexifies activations such as the ReLU together with the affine transformation that precedes it. In this article, we contribute to this line of work by developing a recursive formula that yields a tight convexification for the composition of an activation with an affine function for a wide scope of activation functions, namely, convex or ``S-shaped". Our approach can be used to efficiently compute separating hyperplanes or determine that none exists in various settings, including non-polyhedral cases. We provide computational experiments to test the empirical benefits of these convex approximations.
Updated: 2024-10-30 18:09:53
标题: 收紧训练神经网络的凸松弛:凸和S形激活的统一方法
摘要: 训练后神经网络的非凸性质给其嵌入优化模型带来了重大障碍。鉴于这种嵌入的广泛应用,优化和深度学习社区已投入大量精力对训练后的神经网络进行凸化。迄今为止,许多方法考虑孤立地为每个非线性激活获取凸松弛,这限制了松弛的紧度。Anderson等人(2020年)加强了这些松弛,并提供了一个框架,用于获得由分段线性凸激活与仿射函数复合而成的函数图像的凸包;这有效地将诸如ReLU之类的激活连同其之前的仿射变换一起凸化。在本文中,我们沿这一方向开展工作,开发了一个递归公式,为激活与仿射函数的复合给出紧的凸化,适用于广泛的激活函数类别,即凸激活或"S形"激活。我们的方法可用于在各种设定(包括非多面体情形)下高效计算分离超平面或判定其不存在。我们提供了计算实验来检验这些凸近似的实证优势。
更新时间: 2024-10-30 18:09:53
领域: math.OC,cs.LG
MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation
In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challenges due to the rarity of anomalies which leads to extremely imbalanced data and the impracticality of extensive frame-level data annotation for supervised learning. This paper introduces a novel hierarchical graph neural network (GNN) based model MissionGNN that addresses these challenges by leveraging a state-of-the-art large language model and a comprehensive knowledge graph for efficient weakly supervised learning in VAR. Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models and enabling fully frame-level training without fixed video segmentation. Utilizing automated, mission-specific knowledge graph generation, our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches. Experimental validation on benchmark datasets demonstrates our model's performance in VAD and VAR, highlighting its potential to redefine the landscape of anomaly detection and recognition in video surveillance systems.
Updated: 2024-10-30 18:08:20
标题: MissionGNN: 基于层次多模态GNN的具有任务特定知识图生成的弱监督视频异常识别
摘要: 随着各个领域安全担忧不断加剧,视频异常检测(VAD)和视频异常识别(VAR)等任务已成为智能监控、证据调查、暴力预警等应用中至关重要的组成部分。这些任务旨在识别和分类视频数据中偏离正常行为的情况,但由于异常的罕见性导致数据极度不平衡,且为监督学习进行大规模帧级数据标注并不现实,因此面临重大挑战。本文介绍了一种新颖的基于层次图神经网络(GNN)的模型MissionGNN,通过利用最先进的大型语言模型和全面的知识图,实现VAR中高效的弱监督学习,从而解决这些挑战。我们的方法避免了在大型多模态模型上进行繁重的梯度计算,并支持无需固定视频分割的完全帧级训练,从而绕开了先前方法的限制。利用自动化的、面向特定任务的知识图生成,我们的模型为实时视频分析提供了实用且高效的解决方案,摆脱了先前基于分割或多模态方法的约束。在基准数据集上的实验验证展示了我们的模型在VAD和VAR中的性能,突显了其重新定义视频监控系统中异常检测与识别格局的潜力。
更新时间: 2024-10-30 18:08:20
领域: cs.LG
Domain-decomposed image classification algorithms using linear discriminant analysis and convolutional neural networks
In many modern computer application problems, the classification of image data plays an important role. Among many different supervised machine learning models, convolutional neural networks (CNNs) and linear discriminant analysis (LDA) as well as sophisticated variants thereof are popular techniques. In this work, two different domain decomposed CNN models are experimentally compared for different image classification problems. Both models are loosely inspired by domain decomposition methods and in addition, combined with a transfer learning strategy. The resulting models show improved classification accuracies compared to the corresponding, composed global CNN model without transfer learning and besides, also help to speed up the training process. Moreover, a novel decomposed LDA strategy is proposed which also relies on a localization approach and which is combined with a small neural network model. In comparison with a global LDA applied to the entire input data, the presented decomposed LDA approach shows increased classification accuracies for the considered test problems.
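A minimal sketch of the localization idea (not the paper's exact pipeline): fit one LDA per image quadrant and fuse the local class scores with a small global classifier; logistic regression stands in for the small neural network.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

def decomposed_lda_fit(X, y):
    """Fit one LDA per image quadrant and fuse the local class scores
    with a small global classifier (logistic regression stands in for
    the small neural network). X: (n, H, W) grayscale images. In a real
    pipeline the fusion head would be fit on held-out scores."""
    n, H, W = X.shape
    quadrants = [X[:, :H // 2, :W // 2], X[:, :H // 2, W // 2:],
                 X[:, H // 2:, :W // 2], X[:, H // 2:, W // 2:]]
    ldas, scores = [], []
    for q in quadrants:
        flat = q.reshape(n, -1)
        lda = LinearDiscriminantAnalysis().fit(flat, y)
        ldas.append(lda)
        scores.append(lda.predict_proba(flat))
    head = LogisticRegression(max_iter=1000).fit(np.hstack(scores), y)
    return ldas, head
```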
Updated: 2024-10-30 18:07:12
标题: 使用线性判别分析和卷积神经网络的领域分解图像分类算法
摘要: 在许多现代计算机应用问题中,图像数据的分类起着重要作用。在许多不同的监督式机器学习模型中,卷积神经网络(CNNs)和线性判别分析(LDA)以及其复杂的变体是流行的技术。在这项工作中,对于不同的图像分类问题,实验比较了两种不同的领域分解CNN模型。这两种模型都受到领域分解方法的启发,并且结合了迁移学习策略。与不使用迁移学习的对应的组合全局CNN模型相比,得到的模型显示出了改进的分类准确性,并且还有助于加快训练过程。此外,提出了一种新颖的分解LDA策略,该策略也依赖于一种本地化方法,并与一个小型神经网络模型结合。与应用于整个输入数据的全局LDA相比,所提出的分解LDA方法显示出了对所考虑的测试问题的增加的分类准确性。
更新时间: 2024-10-30 18:07:12
领域: cs.CV,cs.LG,cs.NA,math.NA,68T07, 68W10, 68W15, 65N55
Sequential Order-Robust Mamba for Time Series Forecasting
Mamba has recently emerged as a promising alternative to Transformers, offering near-linear complexity in processing sequential data. However, while channels in time series (TS) data have no specific order in general, recent studies have adopted Mamba to capture channel dependencies (CD) in TS, introducing a sequential order bias. To address this issue, we propose SOR-Mamba, a TS forecasting method that 1) incorporates a regularization strategy to minimize the discrepancy between two embedding vectors generated from data with reversed channel orders, thereby enhancing robustness to channel order, and 2) eliminates the 1D-convolution originally designed to capture local information in sequential data. Furthermore, we introduce channel correlation modeling (CCM), a pretraining task aimed at preserving correlations between channels from the data space to the latent space in order to enhance the ability to capture CD. Extensive experiments demonstrate the efficacy of the proposed method across standard and transfer learning scenarios. Code is available at https://github.com/seunghan96/SOR-Mamba.
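The regularization in step 1 can be sketched in a few lines, assuming an `encoder` that maps a (batch, channels, time) tensor to an embedding; the paper's exact distance and weighting may differ.

```python
import torch

def channel_order_regularizer(encoder, x):
    """Distance between embeddings of a series and of the same series
    with reversed channel order; `encoder` mapping (batch, channels,
    time) to an embedding is an assumed interface."""
    z = encoder(x)
    z_rev = encoder(torch.flip(x, dims=[1]))  # reverse the channel axis
    return torch.mean((z - z_rev) ** 2)

# total_loss = forecast_loss + lam * channel_order_regularizer(encoder, x)
```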
Updated: 2024-10-30 18:05:22
标题: 对序列顺序稳健的Mamba用于时间序列预测
摘要: Mamba最近已成为Transformer的一个有前途的替代品,在处理序列数据时具有接近线性的复杂度。然而,尽管时间序列(TS)数据中的通道通常没有特定顺序,最近的研究却采用Mamba来捕捉TS中的通道依赖性(CD),从而引入了通道顺序偏差。为了解决这个问题,我们提出了SOR-Mamba,这是一种TS预测方法,它1)引入一种正则化策略,最小化由通道顺序相反的数据生成的两个嵌入向量之间的差异,从而增强对通道顺序的稳健性;2)去除了最初为捕获序列数据局部信息而设计的1D卷积。此外,我们引入了通道相关性建模(CCM),这是一项旨在将通道间相关性从数据空间保持到潜在空间的预训练任务,以增强捕捉CD的能力。大量实验证明了所提方法在标准和迁移学习场景中的有效性。代码可在https://github.com/seunghan96/SOR-Mamba 获取。
更新时间: 2024-10-30 18:05:22
领域: cs.LG,cs.AI,stat.ML
Accelerating Direct Preference Optimization with Prefix Sharing
Offline paired preference optimization algorithms have become a popular approach for fine-tuning on preference data, outperforming traditional supervised fine-tuning in various tasks. However, traditional implementations often involve redundant computations, especially for tasks with long shared prompts. We introduce prefix sharing for preference tuning, a novel technique that processes chosen and rejected responses as one sequence with a shared prefix. To prevent cross-response contamination, we use a custom block-sparse attention mask. Our method achieves $1.1$-$1.5\times$ improvement in training throughput on popular DPO datasets, without any effect on convergence. When combined with sequence packing, we observe consistent $1.3$-$1.6\times$ speedups, benefiting even datasets with smaller sequence lengths. While we focus on Direct Preference Optimization (DPO), our approach is applicable to other paired preference tuning methods. By enhancing computational efficiency, our work contributes to making preference-based fine-tuning more accessible for a wider range of applications and model sizes. We open-source our code at https://github.com/frankxwang/dpo-prefix-sharing.
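The key ingredient is the attention mask over the packed sequence [prompt | chosen | rejected]; here is a dense boolean sketch of the pattern (the released kernel implements it block-sparsely).

```python
import torch

def prefix_sharing_mask(p, c, r):
    """Attention mask for one packed sequence [prompt | chosen | rejected]
    of lengths p, c, r: causal everywhere, both responses see the shared
    prompt, and the two responses never see each other (True = attend).
    A dense sketch of the pattern the block-sparse kernel implements."""
    n = p + c + r
    idx = torch.arange(n)
    mask = idx[None, :] <= idx[:, None]          # causal
    chosen = (idx >= p) & (idx < p + c)
    rejected = idx >= p + c
    mask[rejected[:, None] & chosen[None, :]] = False  # no cross-response
    return mask

print(prefix_sharing_mask(2, 2, 2).int())
```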
Updated: 2024-10-30 18:02:43
标题: 通过前缀共享加速直接偏好优化
摘要: 离线成对偏好优化算法已成为微调偏好数据的流行方法,在各种任务中表现优于传统的监督微调。然而,传统实现通常涉及冗余计算,尤其是对于具有长共享提示的任务。我们引入了前缀共享用于偏好微调,这是一种新颖的技术,将选择和拒绝的响应作为具有共享前缀的序列进行处理。为了防止跨响应污染,我们使用自定义的块稀疏注意力掩码。我们的方法在流行的DPO数据集上实现了1.1-1.5倍的训练吞吐量改进,而不会对收敛产生任何影响。当与序列打包结合时,我们观察到一致的1.3-1.6倍加速,即使是具有较小序列长度的数据集也会受益。虽然我们专注于直接偏好优化(DPO),但我们的方法适用于其他成对偏好微调方法。通过提高计算效率,我们的工作有助于使基于偏好的微调对更广泛的应用和模型尺寸更易访问。我们在https://github.com/frankxwang/dpo-prefix-sharing上开源我们的代码。
更新时间: 2024-10-30 18:02:43
领域: cs.LG,cs.CL
Random Heterogeneous Neurochaos Learning Architecture for Data Classification
Inspired by the human brain's structure and function, Artificial Neural Networks (ANN) were developed for data classification. However, existing Neural Networks, including Deep Neural Networks, do not mimic the brain's rich structure. They lack key features such as randomness and neuron heterogeneity, which are inherently chaotic in their firing behavior. Neurochaos Learning (NL), a chaos-based neural network, recently employed one-dimensional chaotic maps like Generalized L\"uroth Series (GLS) and Logistic map as neurons. For the first time, we propose a random heterogeneous extension of NL, where various chaotic neurons are randomly placed in the input layer, mimicking the randomness and heterogeneous nature of human brain networks. We evaluated the performance of the newly proposed Random Heterogeneous Neurochaos Learning (RHNL) architectures combined with traditional Machine Learning (ML) methods. On public datasets, RHNL outperformed both homogeneous NL and fixed heterogeneous NL architectures in nearly all classification tasks. RHNL achieved high F1 scores on the Wine dataset (1.0), Bank Note Authentication dataset (0.99), Breast Cancer Wisconsin dataset (0.99), and Free Spoken Digit Dataset (FSDD) (0.98). These RHNL results are among the best in the literature for these datasets. We investigated RHNL performance on image datasets, where it outperformed stand-alone ML classifiers. In low training sample regimes, RHNL was the best among stand-alone ML. Our architecture bridges the gap between existing ANN architectures and the human brain's chaotic, random, and heterogeneous properties. We foresee the development of several novel learning algorithms centered around Random Heterogeneous Neurochaos Learning in the coming days.
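A sketch of the neurochaos feature extraction with randomly assigned heterogeneous neurons, following the usual ChaosNet-style firing-time trace; the initial value, neighborhood size, and the tent map standing in for a GLS neuron below are illustrative assumptions.

```python
import numpy as np

def firing_time(x, neuron="logistic", q=0.34, eps=0.01, max_iter=10_000):
    """ChaosNet-style feature: iterate a chaotic map from initial value q
    until the trajectory lands within eps of the normalized stimulus x in
    [0, 1]; the iteration count is the extracted feature. q, eps, and the
    tent map standing in for a GLS neuron are illustrative assumptions."""
    v = q
    for n in range(max_iter):
        if abs(v - x) < eps:
            return n
        if neuron == "logistic":
            v = 4.0 * v * (1.0 - v)
        else:  # tent map as a GLS-style neuron
            v = 2.0 * v if v < 0.5 else 2.0 * (1.0 - v)
    return max_iter

# Random heterogeneity: each input dimension gets a random neuron type.
rng = np.random.default_rng(0)
stimuli = rng.random(8)
types = rng.choice(["logistic", "gls"], size=8)
print([firing_time(s, t) for s, t in zip(stimuli, types)])
```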
Updated: 2024-10-30 18:00:14
标题: 随机异质神经混沌学习架构用于数据分类
摘要: 受人脑结构和功能启发,人工神经网络(ANN)被开发用于数据分类。然而,现有的神经网络,包括深度神经网络,并不模仿大脑丰富的结构。它们缺乏关键特征,如随机性和神经元的异质性,而真实神经元在其放电行为中固有地呈现混沌特性。最近,一种基于混沌的神经网络——神经混沌学习(Neurochaos Learning, NL)——采用了一维混沌映射,如广义Lüroth级数(GLS)和Logistic映射作为神经元。我们首次提出了NL的随机异质扩展,其中各种混沌神经元被随机放置在输入层,模拟人脑网络的随机性和异质性。我们评估了新提出的随机异质神经混沌学习(RHNL)架构与传统机器学习(ML)方法结合后的性能。在公共数据集上,RHNL在几乎所有分类任务中均优于同质NL和固定异质NL架构。RHNL在Wine数据集(1.0)、银行票据鉴别数据集(0.99)、威斯康星州乳腺癌数据集(0.99)和自由口语数字数据集(FSDD)(0.98)上实现了高F1分数。这些RHNL结果位居这些数据集已发表文献中的最佳之列。我们在图像数据集上考察了RHNL的性能,它胜过了独立的ML分类器。在训练样本稀缺的情况下,RHNL是独立ML方法中表现最好的。我们的架构弥合了现有ANN架构与人脑混沌、随机和异质特性之间的差距。我们预见未来将围绕随机异质神经混沌学习开发出多种新的学习算法。
更新时间: 2024-10-30 18:00:14
领域: cs.LG,F.2.2,I.2.7
ASURA-FDPS-ML: Star-by-star Galaxy Simulations Accelerated by Surrogate Modeling for Supernova Feedback
We introduce new high-resolution galaxy simulations accelerated by a surrogate model that reduces the computation cost by approximately 75 percent. Massive stars with a Zero Age Main Sequence mass of about 8 solar masses and above explode as core-collapse supernovae (CCSNe), which play a critical role in galaxy formation. The energy released by CCSNe is essential for regulating star formation and driving feedback processes in the interstellar medium (ISM). However, the short integration timesteps required for SNe feedback present significant bottlenecks in star-by-star galaxy simulations that aim to capture individual stellar dynamics and the inhomogeneous shell expansion of SNe within the turbulent ISM. Our new framework combines direct numerical simulations and surrogate modeling, including machine learning and Gibbs sampling. The star formation history and the time evolution of outflow rates in the galaxy match those obtained from resolved direct numerical simulations. Our new approach achieves high-resolution fidelity while reducing computational costs, effectively bridging the physical scale gap and enabling multi-scale simulations.
Updated: 2024-10-30 18:00:02
标题: ASURA-FDPS-ML:利用超新星反馈代理模型加速的逐星星系模拟
摘要: 我们介绍了一种新的高分辨率星系模拟,通过一个将计算成本降低约75%的代理模型进行加速。零龄主序质量约为8倍太阳质量及以上的大质量恒星会以核心坍缩超新星(CCSNe)的形式爆发,在星系形成中起着关键作用。CCSNe释放的能量对于调节恒星形成和驱动星际介质(ISM)中的反馈过程至关重要。然而,超新星反馈所需的短积分时间步长,是旨在捕捉湍流ISM中单个恒星动力学和超新星不均匀壳层膨胀的逐星星系模拟的重大瓶颈。我们的新框架结合了直接数值模拟与代理建模,包括机器学习和吉布斯采样。星系中的恒星形成历史和外流率的时间演化与从高分辨率直接数值模拟中获得的结果相匹配。我们的新方法在降低计算成本的同时保持了高分辨率的保真度,有效弥合了物理尺度差距,使多尺度模拟成为可能。
更新时间: 2024-10-30 18:00:02
领域: astro-ph.GA,cs.AI,cs.LG
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
Text-to-image diffusion has attracted vast attention due to its impressive image-generation capabilities. However, when it comes to human-centric text-to-image generation, particularly in the context of faces and hands, the results often fall short of naturalness due to insufficient training priors. We alleviate the issue in this work from two perspectives. 1) From the data aspect, we carefully collect a human-centric dataset comprising over one million high-quality human-in-the-scene images and two specific sets of close-up images of faces and hands. These datasets collectively provide a rich prior knowledge base to enhance the human-centric image generation capabilities of the diffusion model. 2) On the methodological front, we propose a simple yet effective method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts. This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained by a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale. To validate the superiority of MoLE in the context of human-centric image generation compared to state-of-the-art, we construct two benchmarks and perform evaluations with diverse metrics and human studies. Datasets, model, and code are released at https://sites.google.com/view/mole4diffuser/.
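A minimal sketch of the mixture-of-low-rank-experts idea on a single frozen linear layer — one can picture one expert adapted on face close-ups and one on hands — with a soft gate combining the low-rank updates; this is illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class MoLELinear(nn.Module):
    """A frozen linear layer plus a soft mixture of low-rank experts
    (e.g. one adapted on face close-ups, one on hands). An illustrative
    sketch of the MoLE idea, not the released implementation."""
    def __init__(self, base: nn.Linear, n_experts: int = 2, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(0.01 * torch.randn(n_experts, rank, d_in))
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):
        gate = torch.softmax(self.gate(x), dim=-1)             # (..., E)
        low = torch.einsum("erd,...d->...er", self.A, x)       # (..., E, r)
        delta = torch.einsum("eor,...er->...eo", self.B, low)  # (..., E, out)
        return self.base(x) + torch.einsum("...e,...eo->...o", gate, delta)

layer = MoLELinear(nn.Linear(8, 8))
print(layer(torch.randn(2, 8)).shape)
```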
Updated: 2024-10-30 17:59:57
标题: MoLE: 通过低秩专家混合增强人类中心的文本到图像扩散
摘要: 文本到图像扩散因其令人印象深刻的图像生成能力而受到广泛关注。然而,在以人为中心的文本到图像生成中,特别是面部和手部,由于训练先验不足,结果往往缺乏自然感。我们从两个角度缓解这一问题。1)在数据方面,我们精心收集了一个以人为中心的数据集,包括超过一百万张高质量的人物场景图像,以及面部和手部两组特定的特写图像。这些数据集共同提供了丰富的先验知识库,以增强扩散模型的以人为中心的图像生成能力。2)在方法上,我们提出了一种简单而有效的方法,称为低秩专家混合(MoLE):将分别在手部和面部特写图像上训练的低秩模块视为专家。这一思想源于我们对低秩细化的观察,即由定制特写数据集训练的低秩模块,在以适当的尺度应用时,有可能增强图像中对应的部分。为了验证MoLE在以人为中心的图像生成方面相对最先进方法的优越性,我们构建了两个基准,并使用多种指标和人类研究进行评估。数据集、模型和代码发布在https://sites.google.com/view/mole4diffuser/。
更新时间: 2024-10-30 17:59:57
领域: cs.CV,cs.AI,cs.LG
Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards
Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks for multi-fingered robot hands in this way remains challenging. A key reason for this difficulty is that a policy trained on human hands may not directly transfer to a robot hand due to morphology differences. In this work, we present HuDOR, a technique that enables online fine-tuning of policies by directly computing rewards from human videos. Importantly, this reward function is built using object-oriented trajectories derived from off-the-shelf point trackers, providing meaningful learning signals despite the morphology gap and visual differences between human and robot hands. Given a single video of a human solving a task, such as gently opening a music box, HuDOR enables our four-fingered Allegro hand to learn the task with just an hour of online interaction. Our experiments across four tasks show that HuDOR achieves a 4x improvement over baselines. Code and videos are available on our website, https://object-rewards.github.io.
Updated: 2024-10-30 17:59:41
标题: 通过面向对象的奖励弥合人类与机器人灵巧度之间的差距
摘要: 直接从人类视频中训练机器人是机器人学和计算机视觉中的一个新兴领域。虽然在两指夹具方面取得了显著进展,但通过这种方式学习多指机器人手的自主任务仍然具有挑战性。这种困难的一个关键原因是在人类手上训练的策略可能由于形态学差异而无法直接转移到机器人手上。在这项工作中,我们提出了一种名为HuDOR的技术,它通过直接从人类视频中计算奖励来实现策略的在线微调。重要的是,这个奖励函数是使用从现成点跟踪器导出的面向对象的轨迹构建的,尽管人类手和机器人手之间存在形态差异和视觉差异,但仍提供有意义的学习信号。给定一个人类解决任务的视频,比如轻轻打开音乐盒,HuDOR使我们的四指Allegro手能够在仅仅一小时的在线交互中学习完成任务。我们在四项任务上的实验表明,HuDOR相对基线取得了4倍的改进。代码和视频可在我们的网站https://object-rewards.github.io 上找到。
更新时间: 2024-10-30 17:59:41
领域: cs.RO,cs.LG
Data Contamination Can Cross Language Barriers
The opacity in developing large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingual form of contamination that inflates LLMs' performance while evading current detection methods, deliberately injected by overfitting LLMs on the translated versions of benchmark test sets. Then, we propose generalization-based approaches to unmask such deeply concealed contamination. Specifically, we examine the LLM's performance change after modifying the original benchmark by replacing the false answer choices with correct ones from other questions. Contaminated models can hardly generalize to such easier situations, where the false choices can be \emph{not even wrong}, as all choices are correct in their memorization. Experimental results demonstrate that cross-lingual contamination can easily fool existing detection methods, but not ours. In addition, we discuss the potential utilization of cross-lingual contamination in interpreting LLMs' working mechanisms and in post-training LLMs for enhanced multilingual capabilities. The code and dataset we use can be obtained from \url{https://github.com/ShangDataLab/Deep-Contam}.
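The generalization probe is easy to reproduce in spirit: keep each question's correct answer and replace its distractors with correct answers drawn from other questions, so every choice is "correct" somewhere in a memorizing model's training signal. A sketch with assumed field names:

```python
import random

def not_even_wrong(benchmark, seed=0):
    """Rebuild a multiple-choice benchmark so every option is a correct
    answer to *some* question: keep each question's own correct choice
    and draw the distractors from other questions' correct answers.
    Field names ('question', 'choices', 'answer_idx') are assumptions."""
    rng = random.Random(seed)
    answers = [it["choices"][it["answer_idx"]] for it in benchmark]
    probed = []
    for i, it in enumerate(benchmark):
        others = answers[:i] + answers[i + 1:]
        choices = rng.sample(others, len(it["choices"]) - 1) + [answers[i]]
        rng.shuffle(choices)
        probed.append({"question": it["question"], "choices": choices,
                       "answer_idx": choices.index(answers[i])})
    return probed
```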
Updated: 2024-10-30 17:59:08
标题: 数据污染可以跨越语言障碍
摘要: 大型语言模型(LLMs)开发过程的不透明性引发了人们对预训练数据可能污染公共基准的日益关注。现有的污染检测方法通常基于训练数据与评估数据之间的文本重叠,这可能过于表面,无法反映更深层次的污染形式。在本文中,我们首先提出了一种跨语言形式的污染:通过让LLMs在基准测试集的翻译版本上过拟合而被故意注入,它可以提升LLMs的性能,同时规避当前的检测方法。然后,我们提出了基于泛化能力的方法来揭示这种深度隐藏的污染。具体来说,我们用其他问题的正确答案替换原基准中的错误选项,然后检查LLM的性能变化。受污染的模型几乎无法泛化到这种更容易的情形——此时错误选项可谓"连错都算不上"(not even wrong),因为在其记忆中所有选项都是正确的。实验结果表明,跨语言污染可以轻易骗过现有的检测方法,但骗不过我们的方法。此外,我们讨论了跨语言污染在解释LLMs工作机制以及在后训练LLMs以增强多语言能力方面的潜在用途。我们使用的代码和数据集可以从\url{https://github.com/ShangDataLab/Deep-Contam}获取。
更新时间: 2024-10-30 17:59:08
领域: cs.CL,cs.AI
Provable acceleration for diffusion models under minimal assumptions
While score-based diffusion models have achieved exceptional sampling quality, their sampling speeds are often limited by the high computational burden of score function evaluations. Despite the recent remarkable empirical advances in speeding up the score-based samplers, theoretical understanding of acceleration techniques remains largely limited. To bridge this gap, we propose a novel training-free acceleration scheme for stochastic samplers. Under minimal assumptions -- namely, $L^2$-accurate score estimates and a finite second-moment condition on the target distribution -- our accelerated sampler provably achieves $\varepsilon$-accuracy in total variation within $\widetilde{O}(d^{5/4}/\sqrt{\varepsilon})$ iterations, thereby significantly improving upon the $\widetilde{O}(d/\varepsilon)$ iteration complexity of standard score-based samplers. Notably, our convergence theory does not rely on restrictive assumptions on the target distribution or higher-order score estimation guarantees.
Updated: 2024-10-30 17:59:06
标题: 可证明的加速扩散模型在最小假设下
摘要: 尽管基于分数的扩散模型在取样质量方面取得了异常的成就,但其取样速度通常受分数函数评估的高计算负担限制。尽管最近在加速基于分数的取样器方面取得了显著的经验进展,但对加速技术的理论理解仍然相当有限。为了弥合这一差距,我们提出了一种新颖的无需训练的随机取样器加速方案。在最小的假设条件下——即$L^2$-准确的分数估计和目标分布的有限二阶矩条件下——我们的加速取样器在$\widetilde{O}(d^{5/4}/\sqrt{\varepsilon})$次迭代中可以证明实现$\varepsilon$精度的总变差,从而显著改善了标准基于分数的取样器的$\widetilde{O}(d/\varepsilon)$迭代复杂度。值得注意的是,我们的收敛理论不依赖于对目标分布或更高阶分数估计保证的限制性假设。
更新时间: 2024-10-30 17:59:06
领域: cs.LG,cs.AI,math.OC,stat.ML
A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization
Marmoset, a highly vocalized primate, has become a popular animal model for studying social-communicative behavior and its underlying mechanism. In the study of vocal communication, it is vital to know the caller identities, call contents, and vocal exchanges. Previous work of a CNN has achieved a joint model for call segmentation, classification, and caller identification for marmoset vocalizations. However, the CNN has limitations in modeling long-range acoustic patterns; the Transformer architecture that has been shown to outperform CNNs, utilizes the self-attention mechanism that efficiently segregates information parallelly over long distances and captures the global structure of marmoset vocalization. We propose using the Transformer to jointly segment and classify the marmoset calls and identify the callers for each vocalization.
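A structural sketch of such a jointly trained model — a shared Transformer encoder over framewise acoustic features with segmentation, call-type, and caller heads; layer sizes and mean pooling are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MarmosetMultiTask(nn.Module):
    """Shared Transformer encoder over framewise acoustic features with
    three heads: per-frame call segmentation, call-type classification,
    and caller identification. Sizes and mean pooling are illustrative."""
    def __init__(self, d_feat=40, d_model=128, n_calls=10, n_callers=8):
        super().__init__()
        self.proj = nn.Linear(d_feat, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.seg_head = nn.Linear(d_model, 2)         # call / no-call
        self.cls_head = nn.Linear(d_model, n_calls)   # call type
        self.id_head = nn.Linear(d_model, n_callers)  # caller identity

    def forward(self, x):              # x: (batch, frames, d_feat)
        h = self.encoder(self.proj(x))
        pooled = h.mean(dim=1)         # utterance-level summary
        return self.seg_head(h), self.cls_head(pooled), self.id_head(pooled)
```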
Updated: 2024-10-30 17:57:13
标题: 一个神经变换器框架用于共同完成狨猴鸣叫声分割、分类和鸣叫者识别任务
摘要: 狨猴是一种高度发声的灵长类动物,已成为研究社会交流行为及其潜在机制的热门动物模型。在声音交流研究中,了解发声者身份、叫声内容和声音交流过程至关重要。先前的工作中,一个CNN已实现了用于狨猴发声的叫声分割、分类和发声者识别的联合模型。然而,CNN在建模长程声学模式方面存在局限;已被证明优于CNN的Transformer架构,利用自注意力机制在长距离上高效地并行分离信息,并捕捉狨猴发声的全局结构。我们提出使用Transformer来联合对狨猴叫声进行分割和分类,并识别每次发声的发声者。
更新时间: 2024-10-30 17:57:13
领域: cs.SD,cs.AI,eess.AS
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?
How can "weak teacher models" such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially those that challenge and requires expertise or daily practice from the teacher models? In this paper, we seek for empirical answers to this question by investigating various data-driven strategies that offer supervision data at different quality levels upon tasks of varying complexity. Two intuitive strategies emerge for teacher models to provide supervision during alignment training: 1) using lower-quality supervision from complete tasks that match the difficulty of the target reasoning tasks, and 2) leveraging higher-quality supervision from easier subtasks that are less challenging. Interestingly, we find that even when the outcome error rate for hard task supervision is high (e.g., 90\%), training on such data can outperform perfectly correct supervision on easier subtasks on multiple hard math benchmarks. We further identify a more critical factor influencing training performance: step-wise error rates, which indicate the severity of errors in solutions. Specifically, training on hard task supervision with the same outcome error rates but disparate step-wise error rates can lead to a 30\% accuracy gap on MATH benchmark. Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield notable performance improvements than simply combining rephrased hard full task supervision, suggesting new avenues for data augmentation. Data and code are released at \url{https://github.com/hexuan21/Weak-to-Strong}.
Updated: 2024-10-30 17:56:22
标题: 引导复杂情境:如何进行对于艰难推理任务的良好监督?
摘要: 如何让“弱教师模型”,如普通人类标注者或现有的人工智能系统,有效地监督LLMs以提高在困难推理任务上的表现,特别是那些挑战并需要教师模型的专业知识或日常实践的任务?在本文中,我们通过研究提供不同质量级别监督数据的各种数据驱动策略来寻找这个问题的经验性答案,这些策略针对不同复杂性的任务。在对齐训练期间,两种直观的策略出现在教师模型提供监督的过程中:1)使用与目标推理任务难度相匹配的完整任务的低质量监督,2)利用较容易的次任务提供更高质量的监督,这些次任务不那么具有挑战性。有趣的是,我们发现,即使在困难任务监督的结果错误率很高(例如90%),在多个困难数学基准测试中,通过这些数据进行训练可以胜过在更容易的次任务上完全正确的监督。我们进一步确定了影响训练表现的更关键因素:逐步错误率,这表明解决方案中错误的严重程度。具体来说,在具有相同结果错误率但不同逐步错误率的困难任务监督上进行训练,可以导致MATH基准测试中30%的准确率差距。我们的结果还显示,将困难任务监督与相应的子任务监督相结合,可以比仅简单地组合重新表述的困难完整任务监督带来显着的性能改进,这为数据增强提供了新的途径。数据和代码已发布在\url{https://github.com/hexuan21/Weak-to-Strong}。
更新时间: 2024-10-30 17:56:22
领域: cs.LG,cs.CL
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. Specifically, the fast learning process updates its temporal LoRA parameters based on local inputs and outputs, thereby efficiently storing episodic memory in its parameters. We further propose a slow-fast learning loop algorithm that seamlessly integrates the inner fast learning loop into the outer slow learning loop, enabling the recall of prior multi-episode experiences for context-aware skill learning. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. Extensive experiments show that SlowFast-VGen outperforms baselines across various metrics for action-driven video generation, achieving an FVD score of 514 compared to 782, and maintaining consistency in longer videos, with an average of 0.37 scene cuts versus 0.89. The slow-fast learning loop algorithm significantly enhances performances on long-horizon planning tasks as well. Project Website: https://slowfast-vgen.github.io
Updated: 2024-10-30 17:55:52
标题: SlowFast-VGen: Slow-Fast学习用于基于动作的长视频生成
摘要: 人类天生具有一个互补学习系统,它可以连接一般世界动态的缓慢学习和来自新经验的情景记忆的快速存储。然而,先前的视频生成模型主要集中在通过预先在大量数据上进行训练的缓慢学习,忽视了对情景记忆存储至关重要的快速学习阶段。这一疏忽导致在生成较长视频时,跨时间间隔帧之间存在不一致性,因为这些帧超出了模型的上下文窗口。为此,我们引入了SlowFast-VGen,这是一个用于动作驱动的长视频生成的新型双速度学习系统。我们的方法结合了一个用于缓慢学习世界动态的蒙版条件视频扩散模型,以及一个基于时间 LoRA 模块的推理时快速学习策略。具体而言,快速学习过程根据局部输入和输出更新其时间 LoRA 参数,从而有效地将情景记忆存储在其参数中。我们进一步提出了一个慢快学习循环算法,可以将内部快速学习循环无缝集成到外部缓慢学习循环中,从而实现对先前多场景经验的上下文感知技能学习的回顾。为了促进对大约世界模型的缓慢学习,我们收集了一个包含 200k 视频和语言动作注释的大规模数据集,涵盖了各种场景。大量实验表明,SlowFast-VGen在各种动作驱动视频生成指标方面优于基线模型,达到了 514 的 FVD 得分,而基线模型为 782,并且在较长视频中保持一致性,平均为 0.37 个场景切换,而基线模型为 0.89。慢快学习循环算法还显著提高了长期规划任务的性能。项目网站:https://slowfast-vgen.github.io
更新时间: 2024-10-30 17:55:52
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.RO
Conditional Forecasting of Margin Calls using Dynamic Graph Neural Networks
We introduce a novel Dynamic Graph Neural Network (DGNN) architecture for solving conditional $m$-steps ahead forecasting problems in temporal financial networks. The proposed DGNN is validated on simulated data from a temporal financial network model capturing stylized features of Interest Rate Swaps (IRSs) transaction networks, where financial entities trade swap contracts dynamically and the network topology evolves conditionally on a reference rate. The proposed model is able to produce accurate conditional forecasts of net variation margins up to a $21$-day horizon by leveraging conditional information under pre-determined stress test scenarios. Our work shows that the network dynamics can be successfully incorporated into stress-testing practices, thus providing regulators and policymakers with a crucial tool for systemic risk monitoring.
Updated: 2024-10-30 17:55:41
标题: 使用动态图神经网络对保证金调用的条件预测
摘要: 我们引入了一种新颖的动态图神经网络(DGNN)架构,用于解决时间金融网络中的条件$m$步前向预测问题。所提出的DGNN在模拟数据上得到验证,这些数据来自一个捕捉利率互换(IRS)交易网络风格化特征的时间金融网络模型:金融实体动态地交易互换合约,网络拓扑则依条件随参考利率演变。通过利用预先设定的压力测试情景下的条件信息,所提出的模型能够对最长21天期限内的净变动保证金生成准确的条件预测。我们的工作表明,网络动态可以成功地纳入压力测试实践,从而为监管机构和政策制定者提供系统性风险监测的重要工具。
更新时间: 2024-10-30 17:55:41
领域: q-fin.RM,cs.LG,stat.ML
Multi-student Diffusion Distillation for Better One-step Generators
Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators. Each student generator is responsible for a subset of the conditioning data, thereby obtaining higher generation quality for the same capacity. MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference. Also, MSD offers a lightweight quality boost over single-student distillation with the same architecture. We demonstrate MSD is effective by training multiple same-sized or smaller students on single-step distillation using distribution matching and adversarial distillation techniques. With smaller students, MSD gets competitive results with faster inference for single-step generation. Using 4 same-sized students, MSD sets a new state-of-the-art for one-step image generation: FID 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
Updated: 2024-10-30 17:54:56
标题: 多学生扩散蒸馏以实现更好的一步生成器
摘要: 扩散模型以漫长的多步推理过程为代价实现高质量样本生成。为克服这一点,扩散蒸馏技术产生能够在单步内匹配甚至超越教师的学生生成器。然而,学生模型的推理速度受限于教师架构的规模,使得计算密集型应用无法实时生成。在这项工作中,我们介绍了多学生蒸馏(MSD),这是一个将条件教师扩散模型蒸馏为多个单步生成器的框架。每个学生生成器负责条件数据的一个子集,从而在相同容量下获得更高的生成质量。MSD训练多个蒸馏学生,允许更小的模型规模,从而实现更快的推理。此外,在相同架构下,MSD相比单学生蒸馏提供了轻量级的质量提升。我们通过使用分布匹配和对抗蒸馏技术,在单步蒸馏上训练多个同等大小或更小的学生,证明了MSD的有效性。使用更小的学生,MSD在单步生成中以更快的推理速度取得了有竞争力的结果。使用4个同等大小的学生,MSD为单步图像生成创下了新的最先进水平:在ImageNet-64x64上FID为1.20,在零样本COCO2014上为8.20。
更新时间: 2024-10-30 17:54:56
领域: cs.LG,cs.AI,cs.CV
Proportional Fairness in Non-Centroid Clustering
We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria -- the core and its relaxation, fully justified representation (FJR) -- to this setting. We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.
Updated: 2024-10-30 17:53:49
标题: 非质心聚类中的比例公平性
摘要: 我们重新审视最近发展的按比例公平聚类框架,其目标是提供这样一种群体公平保证:对于规模大且凝聚的数据点(主体)群体,保证会变得更强。先前的工作将该框架应用于质心聚类,其中主体的损失是它到其簇所分配质心的距离。我们将该框架扩展到非质心聚类,其中主体的损失是其簇中其他主体的函数,并将两个比例公平准则——核心(core)及其放松版本"完全正当代表"(FJR)——适配到这一设定。我们证明,核心只能在结构化的损失函数下被近似,而且即便如此,我们利用为质心聚类开发的GreedyCapture算法[Chen等,2019;Micha和Shah,2020]的一个改编所能建立的最佳近似,对于一种自然的损失函数而言并不理想。相比之下,我们设计了一种新的(非高效的)算法GreedyCohesiveClustering,它在任意损失函数下精确实现放松版本FJR,并证明高效的GreedyCapture算法可达到FJR的常数近似。我们还设计了一种高效的审计算法,可将任意给定聚类解的FJR近似度估计到常数因子以内。我们在真实数据上的实验表明,传统聚类算法极不公平,而GreedyCapture明显更公平,并且在常见的聚类目标上仅产生适度的损失。
更新时间: 2024-10-30 17:53:49
领域: cs.LG,cs.AI,cs.GT
A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction
Probabilistic prediction of sequences from images and other high-dimensional data is a key challenge, particularly in risk-sensitive applications. In these settings, it is often desirable to quantify the uncertainty associated with the prediction (instead of just determining the most likely sequence, as in language modeling). In this paper, we propose a Monte Carlo framework to estimate probabilities and confidence intervals associated with the distribution of a discrete sequence. Our framework uses a Monte Carlo simulator, implemented as an autoregressively trained neural network, to sample sequences conditioned on an image input. We then use these samples to estimate the probabilities and confidence intervals. Experiments on synthetic and real data show that the framework produces accurate discriminative predictions, but can suffer from miscalibration. In order to address this shortcoming, we propose a time-dependent regularization method, which is shown to produce calibrated predictions.
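The estimation step reduces to Monte Carlo over sampled sequences; a sketch with a Wilson score interval for the event probability (the paper's confidence construction may differ):

```python
import math

def event_probability(sampler, image, event, n=1000, z=1.96):
    """Monte Carlo estimate of P(event | image) with a Wilson score
    interval. `sampler(image)` draws one sequence from the autoregressive
    simulator and `event(seq)` returns a bool; both interfaces are
    assumptions for this sketch."""
    k = sum(bool(event(sampler(image))) for _ in range(n))
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (center - half, center + half)
```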
Updated: 2024-10-30 17:53:37
标题: 一个用于序列预测中校准不确定性估计的蒙特卡洛框架
摘要: 图像和其他高维数据序列的概率预测是一个关键挑战,特别是在风险敏感的应用中。在这些情景中,通常希望量化与预测相关的不确定性(而不仅仅是确定最可能的序列,如语言建模)。在本文中,我们提出了一个蒙特卡洛框架来估计与离散序列分布相关的概率和置信区间。我们的框架使用一个蒙特卡洛模拟器,实现为一个自回归训练的神经网络,来对基于图像输入的序列进行采样。然后我们使用这些样本来估计概率和置信区间。对合成和真实数据的实验表明,该框架产生准确的判别预测,但可能出现校准不准确的情况。为了解决这个缺点,我们提出了一种时间相关的正则化方法,显示出能够产生校准的预测。
更新时间: 2024-10-30 17:53:37
领域: cs.LG,cs.AI
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in unimodal domains like text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model. Extensive experiments on the CIFAR-100 and Flickr30K datasets across four CLIP downstream tasks demonstrate that CLIPErase effectively forgets designated associations in zero-shot tasks for multimodal samples, while preserving the model's performance on the retain set after unlearning.
Updated: 2024-10-30 17:51:31
标题: CLIPErase:CLIP中视觉-文本关联的高效遗忘
摘要: 机器遗忘(MU)作为一种无需完整重训练即可从已训练模型中移除特定数据的手段,已引起广泛关注。虽然在文本和图像分类等单模态领域已取得进展,但多模态模型中的遗忘仍相对缺乏探索。在这项工作中,我们解决了在CLIP——一个对齐视觉与文本表示的重要多模态模型——中进行遗忘的独特挑战。我们引入了CLIPErase,这是一种新颖的方法,它解耦并选择性地遗忘视觉和文本关联,确保遗忘不会损害模型性能。CLIPErase包括三个关键模块:破坏遗忘集中关联的遗忘模块、保持保留集上性能的保留模块,以及维持与原始模型一致性的一致性模块。在CIFAR-100和Flickr30K数据集上针对四个CLIP下游任务的大量实验表明,CLIPErase在多模态样本的零样本任务中有效遗忘了指定的关联,同时在遗忘后保持了模型在保留集上的性能。
更新时间: 2024-10-30 17:51:31
领域: cs.CV,cs.AI,cs.LG
Adam with model exponential moving average is effective for nonconvex optimization
In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate that a clipped version of Adam with model EMA achieves the optimal convergence rates in various nonconvex optimization settings, both smooth and nonsmooth. Moreover, when the scale varies significantly across different coordinates, we demonstrate that the coordinate-wise adaptivity of Adam is provably advantageous. Notably, unlike previous analyses of Adam, our analysis crucially relies on its core elements -- momentum and discounting factors -- as well as model EMA, motivating their wide applications in practice.
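A sketch of the analyzed setting in PyTorch — Adam with gradient clipping followed by a parameter EMA used for evaluation; note the paper's "clipped Adam" may clip differently from the global grad-norm clipping shown here.

```python
import torch

def train_step(model, ema_model, opt, loss_fn, batch,
               clip_norm=1.0, ema_decay=0.999):
    """One step of gradient-clipped Adam followed by an exponential
    moving average of the weights (the EMA model is what gets evaluated).
    The paper's 'clipped Adam' may clip differently from the global
    grad-norm clipping shown here; ema_model starts as a deep copy."""
    opt.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    opt.step()
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
    return loss.item()

# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# ema_model = copy.deepcopy(model)  # evaluate with ema_model
```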
Updated: 2024-10-30 17:51:28
标题: 使用模型指数移动平均值的Adam对非凸优化有效果
摘要: 在这项工作中,我们对用于训练大型复杂模型的两种现代优化技术进行了理论分析:(i) 自适应优化算法,如Adam;(ii) 模型指数移动平均(EMA)。具体而言,我们证明了带裁剪的Adam结合模型EMA在各种非凸优化设定(无论光滑还是非光滑)中都能达到最优收敛速率。此外,当不同坐标的尺度差异显著时,我们证明了Adam的逐坐标自适应性在理论上具有优势。值得注意的是,与以往对Adam的分析不同,我们的分析关键性地依赖于其核心要素——动量和折扣因子——以及模型EMA,这为它们在实践中的广泛应用提供了依据。
更新时间: 2024-10-30 17:51:28
领域: cs.LG,math.OC
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Existing benchmarks often highlight the remarkable performance achieved by state-of-the-art Multimodal Foundation Models (MFMs) in leveraging temporal context for video understanding. However, how well do the models truly perform visual temporal reasoning? Our study of existing benchmarks shows that this capability of MFMs is likely overestimated as many questions can be solved by using a single, few, or out-of-order frames. To systematically examine current visual temporal reasoning tasks, we propose three principles with corresponding metrics: (1) Multi-Frame Gain, (2) Frame Order Sensitivity, and (3) Frame Information Disparity. Following these principles, we introduce TOMATO, Temporal Reasoning Multimodal Evaluation, a novel benchmark crafted to rigorously assess MFMs' temporal reasoning capabilities in video understanding. TOMATO comprises 1,484 carefully curated, human-annotated questions spanning six tasks (i.e., action count, direction, rotation, shape & trend, velocity & frequency, and visual cues), applied to 1,417 videos, including 805 self-recorded and -generated videos, that encompass human-centric, real-world, and simulated scenarios. Our comprehensive evaluation reveals a human-model performance gap of 57.3% with the best-performing model. Moreover, our in-depth analysis uncovers more fundamental limitations beyond this gap in current MFMs. While they can accurately recognize events in isolated frames, they fail to interpret these frames as a continuous sequence. We believe TOMATO will serve as a crucial testbed for evaluating the next-generation MFMs and as a call to the community to develop AI systems capable of comprehending human world dynamics through the video modality.
Updated: 2024-10-30 17:50:23
标题: TOMATO:评估多模态基础模型中的视觉时间推理能力
摘要: 现有的基准通常突出了最先进的多模态基础模型(MFMs)在利用时间上下文进行视频理解方面取得的出色表现。然而,这些模型在视觉时间推理方面表现如何?我们对现有基准的研究显示,MFMs的这种能力很可能被高估,因为许多问题可以通过使用一个、几个或无序帧来解决。为了系统地检查当前的视觉时间推理任务,我们提出了三个原则及相应的指标:(1)多帧收益,(2)帧次序敏感性,以及(3)帧信息差异性。遵循这些原则,我们引入了TOMATO,即时间推理多模态评估,这是一个精心设计的新基准,旨在严格评估MFMs在视频理解中的时间推理能力。TOMATO包括1,484个精心策划的、人工标注的问题,涵盖六个任务(即动作计数、方向、旋转、形状和趋势、速度和频率以及视觉线索),应用于1,417个视频,包括805个自录制和生成的视频,涵盖人类中心、现实世界和模拟情景。我们的全面评估显示,最佳表现模型与人类之间存在57.3%的差距。此外,我们的深入分析揭示了当前MFMs在这一差距之外的更基本限制。虽然它们可以准确识别孤立帧中的事件,但却无法将这些帧解释为连续序列。我们相信TOMATO将成为评估下一代MFMs的重要实验平台,并呼吁社区开发能够通过视频模态理解人类世界动态的人工智能系统。
更新时间: 2024-10-30 17:50:23
领域: cs.CV,cs.AI,cs.CL
Certified Robustness to Data Poisoning in Gradient-Based Training
Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. Provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge by developing the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data without modifying the model or learning algorithm. In particular, our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks, for bounded and unbounded manipulations of the training inputs and labels. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
Updated: 2024-10-30 17:47:56
标题: 基于梯度训练的数据污染的可靠性认证
摘要: 现代机器学习管道利用大量公共数据,难以保证数据质量,使模型容易受到投毒和后门攻击。在此类攻击下对模型行为给出可证明的界限仍是一个未解决的问题。在这项工作中,我们通过开发第一个框架来应对这一挑战:在不修改模型或学习算法的情况下,为使用可能被操纵的数据训练的模型行为提供可证明的保证。特别地,对于训练输入和标签的有界与无界操纵,我们的框架可认证模型对非定向和定向投毒以及后门攻击的鲁棒性。我们的方法利用凸松弛来过近似给定投毒威胁模型下所有可能参数更新的集合,从而对任何基于梯度的学习算法界定所有可达参数的集合。给定该参数集合,我们给出最坏情况行为的界限,包括模型性能和后门成功率。我们在包括能源消耗、医学成像和自动驾驶在内的多个真实世界数据集上展示了我们的方法。
更新时间: 2024-10-30 17:47:56
领域: cs.LG,cs.CR,cs.CV
EMMA: End-to-End Multimodal Model for Autonomous Driving
We introduce EMMA, an End-to-end Multimodal Model for Autonomous driving. Built on a multi-modal large language model foundation, EMMA directly maps raw camera sensor data into various driving-specific outputs, including planner trajectories, perception objects, and road graph elements. EMMA maximizes the utility of world knowledge from the pre-trained large language models, by representing all non-sensor inputs (e.g. navigation instructions and ego vehicle status) and outputs (e.g. trajectories and 3D locations) as natural language text. This approach allows EMMA to jointly process various driving tasks in a unified language space, and generate the outputs for each task using task-specific prompts. Empirically, we demonstrate EMMA's effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on the Waymo Open Motion Dataset (WOMD). EMMA also yields competitive results for camera-primary 3D object detection on the Waymo Open Dataset (WOD). We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications. However, EMMA also exhibits certain limitations: it can process only a small amount of image frames, does not incorporate accurate 3D sensing modalities like LiDAR or radar and is computationally expensive. We hope that our results will inspire further research to mitigate these issues and to further evolve the state of the art in autonomous driving model architectures.
Updated: 2024-10-30 17:46:31
标题: EMMA:自动驾驶的端到端多模态模型
摘要: 我们介绍了EMMA,一个用于自动驾驶的端到端多模态模型。基于多模态大型语言模型基础,EMMA直接将原始摄像头传感器数据映射到各种驾驶特定输出,包括规划轨迹、感知对象和道路图元素。EMMA最大化了来自预训练大型语言模型的世界知识的效用,通过将所有非传感器输入(例如导航指令和自车状态)和输出(例如轨迹和3D位置)表示为自然语言文本。这种方法允许EMMA在统一的语言空间中共同处理各种驾驶任务,并使用特定于任务的提示为每个任务生成输出。在实证方面,我们通过在nuScenes上实现运动规划的最新性能以及在Waymo Open Motion Dataset(WOMD)上取得有竞争力的结果来证明了EMMA的有效性。EMMA还在Waymo Open Dataset(WOD)上为以摄像头为主的3D对象检测产生有竞争力的结果。我们展示了将EMMA与规划轨迹、对象检测和道路图任务共同训练可以提高所有三个领域的性能,突显了EMMA作为自动驾驶应用通用模型的潜力。然而,EMMA也表现出一定的局限性:它只能处理少量图像帧,未纳入LiDAR或雷达等精确的3D感知模态,并且计算成本高昂。我们希望我们的结果能激发进一步研究,以减轻这些问题,并进一步发展自动驾驶模型架构的技术水平。
更新时间: 2024-10-30 17:46:31
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.RO
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey academic researchers to learn about their available compute and then empirically measure the time to replicate models on such resources. We introduce a benchmark to measure the time to pre-train models on given GPUs and also identify ideal settings for maximizing training speed. We run our benchmark on a range of models and academic GPUs, spending 2,000 GPU-hours on our experiments. Our results reveal a brighter picture for academic pre-training: for example, although Pythia-1B was originally trained on 64 GPUs for 3 days, we find it is also possible to replicate this model (with the same hyper-parameters) in 3x fewer GPU-days: i.e. on 4 GPUs in 18 days. We conclude with a cost-benefit analysis to help clarify the trade-offs between price and pre-training time. We believe our benchmark will help academic researchers conduct experiments that require training larger models on more data. We fully release our codebase at: https://github.com/apoorvkh/academic-pretraining.
Updated: 2024-10-30 17:46:20
标题: $100K或100天:使用学术资源进行预训练时的权衡
摘要: 预训练通常需要大量计算资源,而学术研究人员常常资源匮乏。因此,通常认为学术界无法进行模型的预训练。在本文中,我们试图澄清这一假设。我们首先调查学术研究人员的计算资源情况,然后在这些资源上实际测量复制模型所需的时间。我们引入了一个基准来衡量在给定GPU上预训练模型所需的时间,并确定了最大化训练速度的理想设置。我们在一系列模型和学术GPU上运行我们的基准,花费了2,000个GPU小时进行实验。我们的结果显示了学术界预训练的一幅更加乐观的图景:例如,尽管Pythia-1B最初是在64个GPU上进行了3天的训练,但我们发现也可以在3倍更少的GPU天数内复制这个模型(使用相同的超参数):即在4个GPU上花费18天。我们通过成本效益分析总结,以帮助澄清价格和预训练时间之间的权衡。我们相信我们的基准将帮助学术研究人员进行需要在更多数据上训练更大模型的实验。我们完全发布我们的代码库链接:https://github.com/apoorvkh/academic-pretraining。
更新时间: 2024-10-30 17:46:20
领域: cs.CL,cs.LG
Extracting thin film structures of energy materials using transformers
Neutron-Transformer Reflectometry and Advanced Computation Engine (N-TRACE ), a neural network model using transformer architecture, is introduced for neutron reflectometry data analysis. It offers fast, accurate initial parameter estimations and efficient refinements, improving efficiency and precision for real-time data analysis of lithium-mediated nitrogen reduction for electrochemical ammonia synthesis, with relevance to other chemical transformations and batteries. Despite limitations in generalizing across systems, it shows promises for the use of transformers as the basis for models that could replace trial-and-error approaches to modeling reflectometry data.
Updated: 2024-10-30 17:44:53
标题: 利用Transformer提取能源材料的薄膜结构
摘要: 中子Transformer反射测量与先进计算引擎(N-TRACE)是一种使用Transformer架构的神经网络模型,用于中子反射测量数据分析。它提供快速、准确的初始参数估计和高效的精修,提高了锂介导氮还原电化学合成氨实时数据分析的效率和精度,并与其他化学转化和电池相关。尽管在跨体系泛化方面存在局限,但它展示了以Transformer为基础构建模型、从而取代反射测量数据建模中试错方法的前景。
更新时间: 2024-10-30 17:44:53
领域: physics.comp-ph,cs.AI
Keypoint Abstraction using Large Models for Object-Relative Imitation Learning
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics. Keypoint-based representations have been proven effective as a succinct representation for capturing essential object features, and for establishing a reference frame in action prediction, enabling data-efficient learning of robot skills. However, their manual design nature and reliance on additional human labels limit their scalability. In this paper, we propose KALM, a framework that leverages large pre-trained vision-language models (LMs) to automatically generate task-relevant and cross-instance consistent keypoints. KALM distills robust and consistent keypoints across views and objects by generating proposals using LMs and verifies them against a small set of robot demonstration data. Based on the generated keypoints, we can train keypoint-conditioned policy models that predict actions in keypoint-centric frames, enabling robots to generalize effectively across varying object poses, camera views, and object instances with similar functional shapes. Our method demonstrates strong performance in the real world, adapting to different tasks and environments from only a handful of demonstrations while requiring no additional labels. Website: https://kalm-il.github.io/
Updated: 2024-10-30 17:37:31
标题: 用大模型进行关键点抽象以进行对象相关的模仿学习
摘要: 在机器人领域,对于在不同任务和环境中泛化到新颖物体配置和实例是一个关键挑战。基于关键点的表示已被证明是一种有效的简洁表示,可以捕捉到物体的基本特征,并在行动预测中建立一个参考框架,从而实现对机器人技能的数据高效学习。然而,它们的手动设计性质和对额外人工标签的依赖限制了它们的可扩展性。在本文中,我们提出了KALM,这是一个利用大型预训练视觉-语言模型(LMs)自动生成任务相关和跨实例一致关键点的框架。KALM通过使用LMs生成提议并对其进行验证,从而在视图和对象之间提炼出稳健且一致的关键点,并通过与少量机器人演示数据进行验证。基于生成的关键点,我们可以训练关键点条件的策略模型,以在关键点为中心的框架中预测动作,使机器人能够有效地在不同物体姿势、摄像头视角和具有相似功能形状的物体实例之间泛化。我们的方法在现实世界中表现出强大的性能,仅需要少量演示就可以适应不同任务和环境,而无需额外的标签。网站:https://kalm-il.github.io/
更新时间: 2024-10-30 17:37:31
领域: cs.RO,cs.AI,cs.CV
Derivative-enhanced Deep Operator Network
The deep operator networks (DeepONet), a class of neural operators that learn mappings between function spaces, have recently been developed as surrogate models for parametric partial differential equations (PDEs). In this work we propose a derivative-enhanced deep operator network (DE-DeepONet), which leverages derivative information to enhance the solution prediction accuracy and provides a more accurate approximation of solution-to-parameter derivatives, especially when training data are limited. DE-DeepONet explicitly incorporates linear dimension reduction of high dimensional parameter input into DeepONet to reduce training cost and adds derivative loss in the loss function to reduce the number of required parameter-solution pairs. We further demonstrate that the use of derivative loss can be extended to enhance other neural operators, such as the Fourier neural operator (FNO). Numerical experiments validate the effectiveness of our approach.
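The derivative-enhanced objective can be sketched with autograd, assuming a model called as `model(m, x)` on a reduced parameter input `m`; for brevity the sketch matches the parameter-gradient of the aggregated output rather than the full Jacobian.

```python
import torch

def derivative_enhanced_loss(model, m, x, u_true, du_dm_true, lam=0.1):
    """Data loss plus a derivative loss on the solution-to-parameter
    sensitivity obtained via autograd. `model(m, x)` mapping a (reduced)
    parameter m and query points x to the solution is an assumed
    interface. For brevity we match the gradient of the aggregated
    output; the full Jacobian would be matched row by row."""
    m = m.detach().requires_grad_(True)
    u_pred = model(m, x)                      # (batch, n_points)
    data_loss = torch.mean((u_pred - u_true) ** 2)
    du_dm = torch.autograd.grad(u_pred.sum(), m, create_graph=True)[0]
    deriv_loss = torch.mean((du_dm - du_dm_true) ** 2)
    return data_loss + lam * deriv_loss
```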
Updated: 2024-10-30 17:36:56
Domains: cs.LG,cs.CE,cs.NA,math.NA
bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction
Quanta image sensors, such as SPAD arrays, are an emerging sensor technology, producing 1-bit arrays representing photon detection events over exposures as short as a few nanoseconds. In practice, raw data are post-processed using heavy spatiotemporal binning to create more useful and interpretable images at the cost of degrading spatiotemporal resolution. In this work, we propose bit2bit, a new method for reconstructing high-quality image stacks at the original spatiotemporal resolution from sparse binary quanta image data. Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data by predicting the photon arrival location probability distribution. However, due to the binary nature of the data, we show that the assumption of a Poisson distribution is inadequate. Instead, we model the process with a Bernoulli lattice process derived from the truncated Poisson. This leads to the proposal of a novel self-supervised solution based on a masked loss function. We evaluate our method using both simulated and real data. On simulated data from a conventional video, we achieve 34.35 mean PSNR with extremely photon-sparse binary input (<0.06 photons per pixel per frame). We also present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions. The scenes cover strong/weak ambient light, strong motion, ultra-fast events, etc., which will be made available to the community, on which we demonstrate the promise of our approach. Both reconstruction quality and throughput substantially surpass the state-of-the-art methods (e.g., Quanta Burst Photography (QBP)). Our approach significantly enhances the visualization and usability of the data, enabling the application of existing analysis techniques.
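A minimal sketch of the likelihood behind this approach (identifier names are ours): under a truncated Poisson, a pixel fires with probability p = 1 - exp(-rate), where rate is a network's predicted per-pixel photon rate, and the self-supervised loss is evaluated only on masked pixels:

import torch

def masked_bernoulli_nll(rate, bits, mask, eps=1e-6):
    p = 1.0 - torch.exp(-rate)          # P(at least one photon) for a Poisson count
    p = p.clamp(eps, 1.0 - eps)
    nll = -(bits * torch.log(p) + (1 - bits) * torch.log(1 - p))
    return (nll * mask).sum() / mask.sum()   # supervise only held-out (masked) pixels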
Updated: 2024-10-30 17:30:35
Domains: eess.IV,cs.CV,cs.LG,68T45,I.2.10
Very fast Bayesian Additive Regression Trees on GPU
Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique based on an ensemble of decision trees. It is part of the toolbox of many statisticians. The overall statistical quality of the regression is typically higher than other generic alternatives, and it requires less manual tuning, making it a good default choice. However, it is a niche method compared to its natural competitor XGBoost, due to the longer running time, making sample sizes above 10,000-100,000 a nuisance. I present a GPU-enabled implementation of BART, faster by up to 200x relative to a single CPU core, making BART competitive in running time with XGBoost. This implementation is available in the Python package bartz.
Updated: 2024-10-30 17:29:03
Domains: stat.ML,cs.LG
A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment
As general-purpose tools, Large Language Models (LLMs) must often reason about everyday physical environments. In a question-and-answer capacity, understanding the interactions of physical objects may be necessary to give appropriate responses. Moreover, LLMs are increasingly used as reasoning engines in agentic systems, designing and controlling their action sequences. The vast majority of research has tackled this issue using static benchmarks, comprised of text or image-based questions about the physical world. However, these benchmarks do not capture the complexity and nuance of real-life physical processes. Here we advocate for a second, relatively unexplored, approach: 'embodying' the LLMs by granting them control of an agent within a 3D environment. We present the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework allows direct comparison of LLMs with other embodied agents, such as those based on Deep Reinforcement Learning, and human and non-human animals. We employ the Animal-AI (AAI) environment, a simulated 3D virtual laboratory, to study physical common-sense reasoning in LLMs. For this, we use the AAI Testbed, a suite of experiments that replicate laboratory studies with non-human animals, to study physical reasoning capabilities including distance estimation, tracking out-of-sight objects, and tool use. We demonstrate that state-of-the-art multi-modal models with no finetuning can complete this style of task, allowing meaningful comparison to the entrants of the 2019 Animal-AI Olympics competition and to human children. Our results show that LLMs are currently outperformed by human children on these tasks. We argue that this approach allows the study of physical reasoning using ecologically valid experiments drawn directly from cognitive science, improving the predictability and reliability of LLMs.
Updated: 2024-10-30 17:28:28
Domains: cs.AI
A Theory of Synaptic Neural Balance: From Local to Global Order
We develop a general theory of synaptic neural balance and how it can emerge or be enforced in neural networks. For a given regularizer, a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, convolutional architectures, as well as architectures with mixed activation functions. Gradient descent on the error function alone does not converge in general to a balanced state, where every neuron is in balance, even when starting from a balanced state. However, gradient descent on the regularized error function ought to converge to a balanced state, and thus network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling which is commutative, and balancing which is not commutative. Given any initial set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this is the existence of an underlying strictly convex optimization problem where the relevant variables are constrained to a linear, only architecture-dependent, manifold. Simulations show that balancing neurons prior to learning, or during learning in alternation with gradient descent steps, can improve learning speed and final performance.
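For the basic case of a ReLU unit with $L_2$ regularization, the balancing operation has a closed form: scaling the incoming weights by a and the outgoing weights by 1/a leaves the network function unchanged, and a = (C_out/C_in)^{1/4} equalizes the two costs at sqrt(C_in * C_out). A small numpy sketch (ours, biases omitted):

import numpy as np

def balance_neuron(w_in, w_out):
    c_in, c_out = np.sum(w_in**2), np.sum(w_out**2)
    a = (c_out / c_in) ** 0.25           # minimizes c_in*a**2 + c_out/a**2
    return a * w_in, w_out / a           # both costs become sqrt(c_in * c_out)

rng = np.random.default_rng(0)
w_in, w_out = balance_neuron(rng.normal(size=8), rng.normal(size=4))
assert np.isclose(np.sum(w_in**2), np.sum(w_out**2))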
Updated: 2024-10-30 17:28:02
Domains: cs.NE,cs.AI,cs.LG
Super-resolution in disordered media using neural networks
We propose a methodology that exploits large and diverse data sets to accurately estimate the ambient medium's Green's functions in strongly scattering media. Given these estimates, obtained with and without the use of neural networks, excellent imaging results are achieved, with a resolution that is better than that of a homogeneous medium. This phenomenon, also known as super-resolution, occurs because the ambient scattering medium effectively enhances the physical imaging aperture.
Updated: 2024-10-30 17:27:58
Domains: cs.LG,cs.CV,eess.IV
Full-waveform earthquake source inversion using simulation-based inference
This paper presents a novel framework for full-waveform seismic source inversion using simulation-based inference (SBI). Traditional probabilistic approaches often rely on simplifying assumptions about data errors, which we show can lead to inaccurate uncertainty quantification. SBI addresses this limitation by building an empirical probabilistic model of the data errors using machine learning models, known as neural density estimators, which can then be integrated into the Bayesian inference framework. We apply the SBI framework to point-source moment tensor inversions as well as joint moment tensor and time-location inversions. We construct a range of synthetic examples to explore the quality of the SBI solutions, as well as to compare the SBI results with standard Gaussian likelihood-based Bayesian inversions. We then demonstrate that under real seismic noise, common Gaussian likelihood assumptions for treating full-waveform data yield overconfident posterior distributions that underestimate the moment tensor component uncertainties by up to a factor of 3. We contrast this with SBI, which produces well-calibrated posteriors that generally agree with the true seismic source parameters, and offers an order-of-magnitude reduction in the number of simulations required to perform inference compared to standard Monte Carlo techniques. Finally, we apply our methodology to a pair of moderate magnitude earthquakes in the North Atlantic. We utilise seismic waveforms recorded by the recent UPFLOW ocean bottom seismometer array as well as by regional land stations in the Azores, comparing full moment tensor and source-time location posteriors between SBI and a Gaussian likelihood approach. We find that our adaptation of SBI can be directly applied to real earthquake sources to efficiently produce high quality posterior distributions that significantly improve upon Gaussian likelihood approaches.
Updated: 2024-10-30 17:25:57
Domains: physics.geo-ph,cs.LG,physics.data-an
EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning
This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite the advancements in robotic behaviors, existing methods often fall short in mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those by human operators. The results demonstrate that our approach either matches or surpasses human performance in generating understandable and natural robot motions under certain scenarios. We also provide design implications for future research to consider a set of variables when generating expressive robotic gestures.
Updated: 2024-10-30 17:22:45
Domains: cs.RO,cs.AI
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
In this paper, we present an LLM-based code translation method and an associated tool called CoTran, which translates whole programs from one high-level programming language to another. Existing LLM-based code translation methods lack training to ensure that the translated code reliably compiles or bears substantial functional equivalence to the input code. In our work, we fine-tune an LLM using reinforcement learning, incorporating compiler feedback, and symbolic execution (symexec)-based testing feedback to assess functional equivalence between the input and output programs. The idea is to guide an LLM during fine-tuning, via compiler and symexec-based testing feedback, by letting it know how far it is from producing perfect translations. We conduct extensive experiments comparing CoTran with 14 other code translation tools, including human-written transpilers, LLM-based translation tools, and ChatGPT. Using a benchmark of over 57,000 code pairs in Java and Python, we demonstrate that CoTran outperforms the other tools on relevant metrics such as compilation accuracy (CompAcc) and functional equivalence accuracy (FEqAcc). For example, in Python-to-Java translation, CoTran achieves 48.68% FEqAcc and 76.98% CompAcc, whereas the nearest competing tool (PLBART-base) gets 38.26% and 75.77% respectively. Additionally, CoTran, built on top of CodeT5, improves FEqAcc by +14.89% and CompAcc by +8.14% for Python-to-Java (resp., +12.94% and +4.30% for Java-to-Python).
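As an illustration of how the two feedback signals might be folded into a scalar reward (a hedged sketch, not CoTran's actual reward function; the flags are assumed to come from running the real compiler and symexec test harness):

def translation_reward(compiles: bool, tests_passed: int, tests_total: int,
                       w_compile: float = 0.3, w_equiv: float = 0.7) -> float:
    if not compiles:
        return 0.0                # no functional-equivalence signal without a binary
    # partial credit for compiling plus graded credit for passing symexec tests
    return w_compile + w_equiv * tests_passed / max(tests_total, 1)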
Updated: 2024-10-30 17:22:41
Domains: cs.PL,cs.AI,cs.SE,I.2.7; I.2.5; D.2
Joint Estimation of Conditional Mean and Covariance for Unbalanced Panels
We propose a novel nonparametric kernel-based estimator of cross-sectional conditional mean and covariance matrices for large unbalanced panels. We show its consistency and provide finite-sample guarantees. In an empirical application, we estimate conditional mean and covariance matrices for a large unbalanced panel of monthly stock excess returns given macroeconomic and firm-specific covariates from 1962 to 2021. The estimator performs well with respect to statistical measures. It is informative for empirical asset pricing, generating conditional mean-variance efficient portfolios with substantial out-of-sample Sharpe ratios far beyond equal-weighted benchmarks.
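The kernel-weighting idea can be sketched in a few lines of numpy (an illustrative Nadaraya-Watson-style estimator at a single query point; the paper's estimator for unbalanced panels is more elaborate):

import numpy as np

def conditional_mean_cov(Z, R, z, h=1.0):
    # Z: (n, d) covariates, R: (n, k) returns, z: (d,) query covariate
    w = np.exp(-np.sum((Z - z) ** 2, axis=1) / (2 * h**2))
    w /= w.sum()
    mu = w @ R                                  # (k,) conditional mean
    C = (R - mu).T @ ((R - mu) * w[:, None])    # (k, k) conditional covariance
    return mu, C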
Updated: 2024-10-30 17:21:15
Domains: stat.ME,cs.LG,q-fin.ST,stat.ML,(primary) 62G05 (secondary) 62G20, 46E40, 46E22
Attribute-to-Delete: Machine Unlearning via Datamodel Matching
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted significant research interest. Despite this interest, however, recent work shows that existing machine unlearning techniques do not hold up to thorough evaluation in non-convex settings. In this work, we introduce a new machine unlearning technique that exhibits strong empirical performance even in such challenging settings. Our starting point is the perspective that the goal of unlearning is to produce a model whose outputs are statistically indistinguishable from those of a model re-trained on all but the forget set. This perspective naturally suggests a reduction from the unlearning problem to that of data attribution, where the goal is to predict the effect of changing the training set on a model's outputs. Thus motivated, we propose the following meta-algorithm, which we call Datamodel Matching (DMM): given a trained model, we (a) use data attribution to predict the output of the model if it were re-trained on all but the forget set points; then (b) fine-tune the pre-trained model to match these predicted outputs. In a simple convex setting, we show how this approach provably outperforms a variety of iterative unlearning algorithms. Empirically, we use a combination of existing evaluations and a new metric based on the KL-divergence to show that even in non-convex settings, DMM achieves strong unlearning performance relative to existing algorithms. An added benefit of DMM is that it is a meta-algorithm, in the sense that future advances in data attribution translate directly into better unlearning algorithms, pointing to a clear direction for future progress in unlearning.
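The two-step structure of DMM reduces to a short skeleton; predict_retrained_outputs stands in for an arbitrary data-attribution method and fine_tune_to_match for ordinary distillation-style fine-tuning (both hypothetical helpers, not the paper's implementation):

def datamodel_matching(model, train_set, forget_set, val_inputs,
                       predict_retrained_outputs, fine_tune_to_match):
    retain_set = [x for x in train_set if x not in forget_set]
    # (a) attribution: predict outputs of a model retrained on retain_set only
    target_outputs = predict_retrained_outputs(model, retain_set, val_inputs)
    # (b) matching: fine-tune the pre-trained model toward those predictions
    return fine_tune_to_match(model, val_inputs, target_outputs)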
Updated: 2024-10-30 17:20:10
Domains: cs.LG
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Visual content and accompanying audio signals naturally form a joint representation that improves audio-visual (AV) applications. While many studies develop AV representation learning frameworks, the importance of AV data alignment for achieving high-quality representations is usually overlooked. We observe that an audio signal may contain background noise interference, and that audio and video streams may be out of sync. Such loose data alignment limits representation quality and degrades application performance. In this paper, we propose to improve AV joint representations from a data-centric perspective by aligning audio signals to visual data. Our alignment is conducted in an agentic workflow controlled by an LLM-based assistant named AVAgent. For each input AV data pair, AVAgent uses a multi-modal LLM to convert audio and visual data into language descriptions separately (i.e., tool use). AVAgent then reasons about whether the paired data is well aligned and plans to edit the audio signal if needed (i.e., planning). The audio editing is executed by predefined actions that filter noise or augment data. Moreover, we use a VLM to evaluate how well modified audio signals match the visual content and provide feedback to AVAgent (i.e., reflection). The tool use, planning, and reflection steps operate cyclically, forming an agentic workflow in which audio signals are gradually aligned to visual content. Existing methods can then directly leverage the aligned AV data from our agentic workflow to improve AV joint representations. The experimental results comprehensively demonstrate the state-of-the-art performance of the proposed approach against previous baselines on diverse downstream tasks.
Updated: 2024-10-30 17:18:53
Domains: cs.CV,cs.AI,cs.LG,cs.MM,cs.SD,eess.AS
Entrywise error bounds for low-rank approximations of kernel matrices
In this paper, we derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition (or singular value decomposition). While this approximation is well-known to be optimal with respect to the spectral and Frobenius norm error, little is known about the statistical behaviour of individual entries. Our error bounds fill this gap. A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues, which takes inspiration from the field of Random Matrix Theory. Finally, we validate our theory with an empirical study of a collection of synthetic and real-world datasets.
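The quantity being bounded is easy to reproduce numerically. The following numpy snippet (illustrative, with a Gaussian kernel on random points) compares the entrywise error of the rank-r truncation with the optimal rank-r spectral-norm error, the (r+1)-th largest eigenvalue:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))  # Gaussian kernel matrix

vals, vecs = np.linalg.eigh(K)                 # eigenvalues in ascending order
r = 10
K_r = (vecs[:, -r:] * vals[-r:]) @ vecs[:, -r:].T
print("entrywise error:", np.max(np.abs(K - K_r)))
print("spectral error: ", vals[-r - 1])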
Updated: 2024-10-30 17:17:22
Domains: math.ST,cs.LG,stat.TH,62G20
Breach By A Thousand Leaks: Unsafe Information Leakage in `Safe' AI Responses
Vulnerability of Frontier language models to misuse and jailbreaks has prompted the development of safety measures like filters and alignment training in an effort to ensure safety through robustness to adversarially crafted prompts. We assert that robustness is fundamentally insufficient for ensuring safety goals, and current defenses and evaluation methods fail to account for risks of dual-intent queries and their composition for malicious goals. To quantify these risks, we introduce a new safety evaluation framework based on impermissible information leakage of model outputs and demonstrate how our proposed question-decomposition attack can extract dangerous knowledge from a censored LLM more effectively than traditional jailbreaking. Underlying our proposed evaluation method is a novel information-theoretic threat model of inferential adversaries, distinguished from security adversaries, such as jailbreaks, in that success is measured by inferring impermissible knowledge from victim outputs as opposed to forcing explicitly impermissible outputs from the victim. Through our information-theoretic framework, we show that to ensure safety against inferential adversaries, defense mechanisms must ensure information censorship, bounding the leakage of impermissible information. However, we prove that such defenses inevitably incur a safety-utility trade-off.
Updated: 2024-10-30 17:16:44
Domains: cs.CR,cs.AI,cs.CY
Emergence of meta-stable clustering in mean-field transformer models
We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in (Geshkovski et al., 2023). Studying the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, in this paper we provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the iid uniform initialization and prove that, in the limit of large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, the structure characterizing the meta-stable manifold is explicitly identified, as a function of the inverse temperature parameter of the model, by the index maximizing a certain rescaling of Gegenbauer polynomials.
Updated: 2024-10-30 17:16:38
Domains: cs.LG,math.AP,34D05, 34D06, 35Q83
(FL)$^2$: Overcoming Few Labels in Federated Semi-Supervised Learning
Federated Learning (FL) is a distributed machine learning framework that trains accurate global models while preserving clients' privacy-sensitive data. However, most FL approaches assume that clients possess labeled data, which is often not the case in practice. Federated Semi-Supervised Learning (FSSL) addresses this label deficiency problem, targeting situations where only the server has a small amount of labeled data while clients do not. However, a significant performance gap exists between Centralized Semi-Supervised Learning (SSL) and FSSL. This gap arises from confirmation bias, which is more pronounced in FSSL due to multiple local training epochs and the separation of labeled and unlabeled data. We propose $(FL)^2$, a robust training method for unlabeled clients using sharpness-aware consistency regularization. We show that regularizing the original pseudo-labeling loss is suboptimal, and hence we carefully select unlabeled samples for regularization. We further introduce client-specific adaptive thresholding and learning status-aware aggregation to adjust the training process based on the learning progress of each client. Our experiments on three benchmark datasets demonstrate that our approach significantly improves performance and bridges the gap with SSL, particularly in scenarios with scarce labeled data.
Updated: 2024-10-30 17:15:02
Domains: cs.LG
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Many alignment methods, including reinforcement learning from human feedback (RLHF), rely on the Bradley-Terry reward assumption, which is insufficient to capture the full range of general human preferences. To achieve robust alignment with general preferences, we model the alignment problem as a two-player zero-sum game, where the Nash equilibrium policy guarantees a 50% win rate against any competing policy. However, previous algorithms for finding the Nash policy either diverge or converge to a Nash policy in a modified game, even in a simple synthetic setting, thereby failing to maintain the 50% win rate guarantee against all other policies. We propose a meta-algorithm, Convergent Meta Alignment Algorithm (COMAL), for language model alignment with general preferences, inspired by convergent algorithms in game theory. Theoretically, we prove that our meta-algorithm converges to an exact Nash policy in the last iterate. Additionally, our meta-algorithm is simple and can be integrated with many existing methods designed for RLHF and preference optimization with minimal changes. Experimental results demonstrate the effectiveness of the proposed framework when combined with existing preference policy optimization methods.
Updated: 2024-10-30 17:13:02
Domains: cs.LG,cs.AI,cs.CL,cs.GT
Partial Channel Dependence with Channel Masks for Time Series Foundation Models
Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. However, previous efforts have primarily focused on designing model architectures to address explicit heterogeneity among datasets such as various numbers of channels, while often overlooking implicit heterogeneity such as varying dependencies between channels. In this work, we introduce the concept of partial channel dependence (PCD), which enables a more sophisticated adjustment of channel dependencies based on dataset-specific information. To achieve PCD, we propose a channel mask that captures the relationships between channels within a dataset using two key components: 1) a correlation matrix that encodes relative dependencies between channels, and 2) domain parameters that learn the absolute dependencies specific to each dataset, refining the correlation matrix. We validate the effectiveness of PCD across four tasks in TS including forecasting, classification, imputation, and anomaly detection, under diverse settings, including few-shot and zero-shot scenarios with both TS foundation models and single-task models. Code is available at https://github.com/seunghan96/CM.
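A hedged PyTorch sketch of the channel-mask idea (identifier names are ours, not the released code): relative dependencies come from a correlation matrix, and a learnable dataset-specific parameter rescales it into an absolute mask:

import torch

class ChannelMask(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.domain = torch.nn.Parameter(torch.zeros(1))  # per-dataset parameter

    def forward(self, x):
        # x: (batch, channels, time); relative dependencies from the data itself
        corr = torch.corrcoef(x.mean(0)).abs()            # (C, C) correlation matrix
        return torch.sigmoid(self.domain) * corr          # learned absolute strength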
Updated: 2024-10-30 17:12:03
Domains: cs.LG,cs.AI,stat.ML
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET
Diagnosing dementia, particularly for Alzheimer's Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study. The code is available at https://github.com/ai-med/DiaMond.
Updated: 2024-10-30 17:11:00
Domains: cs.CV,cs.AI
Bandits with Preference Feedback: A Stackelberg Game Perspective
Bandits with preference feedback present a powerful tool for optimizing unknown target functions when only pairwise comparisons are allowed instead of direct value queries. This model allows for incorporating human feedback into online inference and optimization and has been employed in systems for fine-tuning large language models. The problem is well understood in simplified settings with linear target functions or over finite small domains that limit practical interest. Taking the next step, we consider infinite domains and nonlinear (kernelized) rewards. In this setting, selecting a pair of actions is quite challenging and requires balancing exploration and exploitation at two levels: within the pair, and along the iterations of the algorithm. We propose MAXMINLCB, which emulates this trade-off as a zero-sum Stackelberg game, and chooses action pairs that are informative and yield favorable rewards. MAXMINLCB consistently outperforms existing algorithms and satisfies an anytime-valid rate-optimal regret guarantee. This is due to our novel preference-based confidence sequences for kernelized logistic estimators.
Updated: 2024-10-30 17:10:52
Domains: cs.LG,cs.AI,cs.GT,stat.ML
Crosstalk Attack Resilient RNS Quantum Addition
As quantum computers scale, the rise of multi-user and cloud-based quantum platforms can lead to new security challenges. Attacks within shared execution environments become increasingly feasible due to crosstalk noise that, in combination with a quantum computer's hardware specifications, can be exploited in the form of a crosstalk attack. Our work implements crosstalk attacks on ion-trap quantum computers. We propose three novel quantum crosstalk attacks designed for ion trap qubits: (i) Alternate CNOT attack (ii) Superposition Alternate CNOT (SAC) attack (iii) Alternate Phase Change (APC) attack. We demonstrate the effectiveness of the proposed attacks by conducting noise-based simulations on a commercial 20-qubit ion-trap quantum computer. The proposed attacks achieve an impressive reduction of up to 42.2% in output probability for Quantum Full Adders (QFA) having 6 to 9-qubit output. Finally, we investigate the possibility of mitigating crosstalk attacks by using Residue Number System (RNS) based Parallel Quantum Addition (PQA). We determine that PQA achieves higher resilience against crosstalk attacks, in the form of a 24.3% to 133.5% improvement in output probability over existing Non Parallel Quantum Addition (NPQA). Through our systematic methodology, we demonstrate how quantum properties such as superposition and phase transition can lead to crosstalk attacks and also how parallel quantum computing can help secure against these attacks.
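For background, classical RNS arithmetic is what makes the addition parallel: each residue channel is added independently modulo its own modulus, and the Chinese Remainder Theorem (CRT) reconstructs the integer result. A plain-Python sketch of the arithmetic (the quantum circuit details are beyond this snippet):

from math import prod

MODULI = (7, 11, 13, 15)     # pairwise coprime; dynamic range = 7*11*13*15 = 15015

def to_rns(x):     return tuple(x % m for m in MODULI)
def rns_add(a, b): return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    M = prod(MODULI)
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)   # CRT reconstruction via modular inverses
    return x % M

assert from_rns(rns_add(to_rns(1234), to_rns(5678))) == 6912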
Updated: 2024-10-30 17:08:23
Domains: quant-ph,cs.CR
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in https://cares-ai.github.io/.
Updated: 2024-10-30 17:08:16
Domains: cs.LG,cs.CL,cs.CV,cs.CY
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
The hallucinations of large language models (LLMs) are increasingly mitigated by allowing LLMs to search for information and to ground their answers in real sources. Unfortunately, LLMs often struggle with posing the right search queries, especially when dealing with complex or otherwise indirect topics. Observing that LLMs can learn to search for relevant facts by trying different queries and learning to up-weight queries that successfully produce relevant results, we introduce Learning to Retrieve by Trying (LeReT), a reinforcement learning framework that explores search queries and uses preference-based optimization to improve their quality. LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%. The simplicity and flexibility of LeReT allow it to be applied to arbitrary off-the-shelf retrievers and make it a promising technique for improving general LLM pipelines. Project website: http://sherylhsu.com/LeReT/.
Updated: 2024-10-30 17:02:54
Domains: cs.LG,cs.AI
Improved convergence rate of kNN graph Laplacians
In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = \epsilon^{-d/2} \; k_0 ( \| x_i - x_j \|^2 / \epsilon \phi( \widehat{\rho}(x_i), \widehat{\rho}(x_j) )^2 ) $, with $\widehat{\rho}(x)$ being the (rescaled) $k$NN distance at the point $x$, $\phi$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high dimensional Euclidean space, we prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)}\,)$, up to a log factor, when $k_0$ and $\phi$ have $C^3$ regularity and satisfy other technical conditions. This fast rate is obtained when $\epsilon \sim N^{-2/(d+6)}\,$ and $k \sim N^{6/(d+6)}\,$, both at the optimal order to balance the theoretical bias and variance errors. When $k_0$ and $\phi$ have lower regularities, including when $k_0$ is a compactly supported function as in the standard $k$NN graph, the convergence rate degenerates to $O(N^{-1/(d+4)}\,)$. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.
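The affinity above is straightforward to instantiate. The following numpy sketch uses the simple choices k0(t) = exp(-t) and phi(a, b) = sqrt(a*b), one member of the general class considered, and returns the unnormalized graph Laplacian L = D - W:

import numpy as np

def knn_graph_laplacian(X, k=10, eps=0.5):
    d = X.shape[1]
    D2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    rho = np.sqrt(np.sort(D2, axis=1)[:, k])        # kNN distance at each point
    W = eps ** (-d / 2) * np.exp(-D2 / (eps * np.outer(rho, rho)))
    return np.diag(W.sum(axis=1)) - W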
Updated: 2024-10-30 17:01:00
Domains: stat.ML,cs.LG,math.ST,stat.TH
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent *tabula rasa*. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.
Updated: 2024-10-30 16:59:41
Domains: cs.LG,cs.AI
Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications
Run to run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect reproducibility in iterative algorithms, due to accumulating errors. Non-reproducibility can critically affect the efficiency and effectiveness of correctness testing for stochastic programs. Recently, the sensitivity of deep learning training and inference pipelines to floating-point non-associativity has been found to sometimes be extreme. It can prevent certification for commercial applications, accurate assessment of robustness and sensitivity, and bug detection. New approaches in scientific computing applications have coupled deep learning models with high-performance computing, leading to an aggravation of debugging and testing challenges. Here we perform an investigation of the statistical properties of floating-point non-associativity within modern parallel programming models, and analyze performance and productivity impacts of replacing atomic operations with deterministic alternatives on GPUs. We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning, uncovering and quantifying the impacts of input parameters triggering run to run variability and reporting on the reliability and completeness of the documentation. Finally, we evaluate the strategy of exploiting automatic determinism that could be provided by deterministic hardware, using the Groq accelerator for inference portions of the deep learning pipeline. We demonstrate the benefits that a hardware-based strategy can provide within reproducibility and correctness efforts.
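The underlying effect takes only a few lines of Python to reproduce: floating-point addition is not associative, so the reduction order of a parallel sum changes the result:

a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed before it can contribute

import random
xs = [random.uniform(-1, 1) for _ in range(10_000)]
sums = {sum(random.sample(xs, len(xs))) for _ in range(5)}
print(len(sums))     # typically > 1: permuting the order perturbs the low bits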
Updated: 2024-10-30 16:52:42
Domains: cs.DC,cs.LG
Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
This study addresses the deployment challenges of integer-only quantized Transformers on resource-constrained embedded FPGAs (Xilinx Spartan-7 XC7S15). We enhanced the flexibility of our VHDL template by introducing a selectable resource type for storing intermediate results across model layers, thereby breaking the deployment bottleneck by utilizing BRAM efficiently. Moreover, we developed a resource-aware mixed-precision quantization approach that enables researchers to explore hardware-level quantization strategies without requiring extensive expertise in Neural Architecture Search. This method provides accurate resource utilization estimates with a precision discrepancy as low as 3%, compared to actual deployment metrics. Compared to previous work, our approach has successfully facilitated the deployment of model configurations utilizing mixed-precision quantization, thus overcoming the limitations inherent in five previously non-deployable configurations with uniform quantization bitwidths. Consequently, this research enhances the applicability of Transformers in embedded systems, facilitating a broader range of Transformer-powered applications on edge devices.
Updated: 2024-10-30 16:51:39
Domains: cs.LG
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absolute position mechanism assigning unique position IDs to each of the tokens, we assign the same position IDs to two or more "relevant" tokens; for integer addition tasks, we regard digits of the same significance as in the same position. On the empirical side, we show that with the proposed position coupling, our models trained on 1 to 30-digit additions can generalize up to 200-digit additions (6.67x of the trained length). On the theoretical side, we prove that a 1-layer Transformer with coupled positions can solve the addition task involving exponentially many digits, whereas any 1-layer Transformer without positional information cannot entirely solve it. We also demonstrate that position coupling can be applied to other algorithmic tasks such as Nx2 multiplication and a two-dimensional task.
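A minimal sketch of coupled position IDs for addition (ours; the paper's training scheme additionally randomizes the starting offset): digits of equal significance in the two operands receive the same position ID:

def coupled_positions(a: str, b: str):
    tokens = list(a) + ["+"] + list(b) + ["="]
    ids = [len(a) - i for i in range(len(a))]    # "657"  -> IDs [3, 2, 1]
    ids += [0]                                   # separator gets a dummy ID
    ids += [len(b) - i for i in range(len(b))]   # "8921" -> IDs [4, 3, 2, 1]
    ids += [0]
    return list(zip(tokens, ids))

# In coupled_positions("657", "8921"), the units digits 7 and 1 share ID 1.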
Updated: 2024-10-30 16:50:43
Domains: cs.LG,cs.AI,cs.CL
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation
This paper introduces a novel framework combining LLM agents as proxies for human strategic behavior with reinforcement learning (RL) to engage these agents in evolving strategic interactions within team environments. Our approach extends traditional agent-based simulations by using strategic LLM agents (SLA) and introducing dynamic and adaptive governance through a pro-social promoting RL agent (PPA) that modulates information access across agents in a network, optimizing social welfare and promoting pro-social behavior. Through validation in iterative games, including the prisoner dilemma, we demonstrate that SLA agents exhibit nuanced strategic adaptations. The PPA agent effectively learns to adjust information transparency, resulting in enhanced cooperation rates. This framework offers significant insights into AI-mediated social dynamics, contributing to the deployment of AI in real-world team settings.
Updated: 2024-10-30 16:45:15
Domains: cs.AI,cs.CL,cs.CY,cs.GT
Certification for Differentially Private Prediction in Gradient-Based Training
Differential privacy upper-bounds the information leakage of machine learning models, yet providing meaningful privacy guarantees has proven to be challenging in practice. The private prediction setting where model outputs are privatized is being investigated as an alternate way to provide formal guarantees at prediction time. Most current private prediction algorithms, however, rely on global sensitivity for noise calibration, which often results in large amounts of noise being added to the predictions. Data-specific noise calibration, such as smooth sensitivity, could significantly reduce the amount of noise added, but were so far infeasible to compute exactly for modern machine learning models. In this work we provide a novel and practical approach based on convex relaxation and bound propagation to compute a provable upper-bound for the local and smooth sensitivity of a prediction. This bound allows us to reduce the magnitude of noise added or improve privacy accounting in the private prediction setting. We validate our framework on datasets from financial services, medical image classification, and natural language processing and across models and find our approach to reduce the noise added by up to order of magnitude.
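For background, a private prediction adds noise calibrated to a sensitivity bound; the paper's contribution is a provable data-specific upper bound on that sensitivity, which directly shrinks the noise scale below (a minimal Laplace-mechanism sketch, not the paper's pipeline):

import numpy as np

def private_prediction(score, sens, epsilon, rng=np.random.default_rng(0)):
    # Laplace mechanism: noise scale grows with sensitivity, shrinks with budget
    return score + rng.laplace(scale=sens / epsilon)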
Updated: 2024-10-30 16:40:19
Domains: cs.LG,cs.AI
ProTransformer: Robustify Transformers via Plug-and-Play Paradigm
Transformer-based architectures have dominated various areas of machine learning in recent years. In this paper, we introduce a novel robust attention mechanism designed to enhance the resilience of transformer-based architectures. Crucially, this technique can be integrated into existing transformers as a plug-and-play layer, improving their robustness without the need for additional training or fine-tuning. Through comprehensive experiments and ablation studies, we demonstrate that our ProTransformer significantly enhances the robustness of transformer models across a variety of prediction tasks, attack mechanisms, backbone architectures, and data domains. Notably, without further fine-tuning, the ProTransformer consistently improves the performance of vanilla transformers by 19.5%, 28.3%, 16.1%, and 11.4% for BERT, ALBERT, DistilBERT, and RoBERTa, respectively, under the classical TextFooler attack. Furthermore, ProTransformer shows promising resilience in large language models (LLMs) against prompting-based attacks, improving the performance of T5 and LLaMA by 24.8% and 17.8%, respectively, and enhancing Vicuna by an average of 10.4% against the Jailbreaking attack. Beyond the language domain, ProTransformer also demonstrates outstanding robustness in both vision and graph domains.
Updated: 2024-10-30 16:38:09
Domains: cs.LG,cs.CL,cs.CR
ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning
This paper presents ReasoningRec, a reasoning-based recommendation framework that leverages Large Language Models (LLMs) to bridge the gap between recommendations and human-interpretable explanations. In contrast to conventional recommendation systems that rely on implicit user-item interactions, ReasoningRec employs LLMs to model users and items, focusing on preferences, aversions, and explanatory reasoning. The framework utilizes a larger LLM to generate synthetic explanations for user preferences, subsequently used to fine-tune a smaller LLM for enhanced recommendation accuracy and human-interpretable explanation. Our experimental study investigates the impact of reasoning and contextual information on personalized recommendations, revealing that the quality of contextual and personalized data significantly influences the LLM's capacity to generate plausible explanations. Empirical evaluations demonstrate that ReasoningRec surpasses state-of-the-art methods by up to 12.5\% in recommendation prediction while concurrently providing human-intelligible explanations. The code is available here: https://github.com/millenniumbismay/reasoningrec.
Updated: 2024-10-30 16:37:04
Domains: cs.IR,cs.AI
Does equivariance matter at scale?
Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
Updated: 2024-10-30 16:36:59
Domains: cs.LG
Uncertainty quantification for fast reconstruction methods using augmented equivariant bootstrap: Application to radio interferometry
The advent of next-generation radio interferometers like the Square Kilometer Array promises to revolutionise our radio astronomy observational capabilities. The unprecedented volume of data these devices generate requires fast and accurate image reconstruction algorithms to solve the ill-posed radio interferometric imaging problem. Most state-of-the-art reconstruction methods lack trustworthy and scalable uncertainty quantification, which is critical for the rigorous scientific interpretation of radio observations. We propose an unsupervised technique based on a conformalized version of a radio-augmented equivariant bootstrapping method, which allows us to quantify uncertainties for fast reconstruction methods. Noticeably, we rely on reconstructions from ultra-fast unrolled algorithms. The proposed method brings more reliable uncertainty estimations to our problem than existing alternatives.
Updated: 2024-10-30 16:36:55
Domains: astro-ph.IM,cs.LG
Aequitas Flow: Streamlining Fair ML Experimentation
Aequitas Flow is an open-source framework and toolkit for end-to-end Fair Machine Learning (ML) experimentation and benchmarking in Python. This package fills integration gaps that exist in other fair ML packages. In addition to the existing audit capabilities in Aequitas, the Aequitas Flow module provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling easy-to-use and rapid experiments and analysis of results. Aimed at ML practitioners and researchers, the framework offers implementations of methods, datasets, metrics, and standard interfaces for these components to improve extensibility. By facilitating the development of fair ML practices, Aequitas Flow hopes to enhance the incorporation of fairness concepts in AI systems, making them more robust and fair.
Updated: 2024-10-30 16:34:12
Domains: cs.LG,cs.AI,cs.CY
Exploring Design Choices for Building Language-Specific LLMs
Despite rapid progress in large language models (LLMs), their performance on a vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued pretraining) impact the adapted LLM, both in terms of efficiency (how many tokens are needed to encode the same amount of information) and end task performance. We find that (1) the initial performance of an LLM does not always correlate with its final performance after adaptation: adapting English-centric models can yield better results than adapting multilingual models, despite their worse initial performance on low-resource languages; (2) efficiency can easily be improved with simple vocabulary extension and continued pretraining in most LLMs we study; and (3) the optimal adaptation method (choice of base model, new vocabulary size, training data, initialization strategy) is highly language-dependent, and the simplest embedding initialization works well across various experimental settings. Together, our work lays the foundations for efficiently building language-specific LLMs by adapting existing LLMs.
Updated: 2024-10-30 16:33:48
Domains: cs.CL,cs.AI,cs.LG
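As an illustration of the vocabulary-extension step described above, the sketch below grows an embedding table and initializes the new rows to the mean of the existing embeddings. Mean initialization is one widely used simple scheme; the paper reports that the simplest embedding initialization works well, but its exact choice is not reproduced here, and the vocabulary sizes are made up.

```python
import torch

def extend_embeddings(emb: torch.nn.Embedding, num_new_tokens: int) -> torch.nn.Embedding:
    """Grow an embedding table, initializing new rows to the mean of existing ones."""
    old = emb.weight.data
    new_rows = old.mean(dim=0, keepdim=True).repeat(num_new_tokens, 1)
    extended = torch.nn.Embedding(old.size(0) + num_new_tokens, old.size(1))
    extended.weight.data = torch.cat([old, new_rows], dim=0)
    return extended

emb = torch.nn.Embedding(32000, 512)   # a (made-up) English-centric base vocabulary
emb = extend_embeddings(emb, 8000)     # add language-specific tokens, then continue pretraining
```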
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on six data sets from various domains (natural, medical and satellite images) and tasks (classification and segmentation). The empirical results confirm the superiority of our approach over state-of-the-art competitors. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline. We release our code at https://github.com/dianagrigore/WeCoLoRA.
Updated: 2024-10-30 16:27:20
Domains: cs.CV,cs.AI,cs.LG
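A minimal sketch of the weight-copy step described above: every `intermittence`-th layer of a pre-trained teacher initializes a shallower student. The selection rule (keeping every k-th layer) is an assumption consistent with the abstract's "intermittent layers"; the subsequent few-shot distillation with enhanced LoRA is not shown.

```python
import copy
import torch.nn as nn

def build_student(teacher_layers: nn.ModuleList, intermittence: int) -> nn.ModuleList:
    """Initialize a shallow student by copying every `intermittence`-th teacher layer."""
    kept = [copy.deepcopy(layer) for i, layer in enumerate(teacher_layers)
            if i % intermittence == 0]
    return nn.ModuleList(kept)

# E.g., a 12-layer teacher with intermittence factor 3 yields a 4-layer student;
# the student is then distilled on a few samples (with enhanced LoRA, per the
# paper) to recover the computation of the skipped layers.
teacher = nn.ModuleList([nn.TransformerEncoderLayer(d_model=384, nhead=6) for _ in range(12)])
student = build_student(teacher, intermittence=3)
```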
Functional Gradient Flows for Constrained Sampling
Recently, through a unified gradient flow perspective of Markov chain Monte Carlo (MCMC) and variational inference (VI), particle-based variational inference methods (ParVIs) have been proposed that tend to combine the best of both worlds. While typical ParVIs such as Stein Variational Gradient Descent (SVGD) approximate the gradient flow within a reproducing kernel Hilbert space (RKHS), many attempts have been made recently to replace RKHS with more expressive function spaces, such as neural networks. While successful, these methods are mainly designed for sampling from unconstrained domains. In this paper, we offer a general solution to constrained sampling by introducing a boundary condition for the gradient flow which would confine the particles within the specific domain. This allows us to propose a new functional gradient ParVI method for constrained sampling, called constrained functional gradient flow (CFG), with provable continuous-time convergence in total variation (TV). We also present novel numerical strategies to handle the boundary integral term arising from the domain constraints. Our theory and experiments demonstrate the effectiveness of the proposed framework.
Updated: 2024-10-30 16:20:48
Domains: stat.ML,cs.LG
The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features
Modern deep neural networks have been observed to exhibit a simple structure in their final layer features and weights, commonly referred to as neural collapse. This phenomenon has also been noted in layers beyond the final one, an extension known as deep neural collapse. Recent findings indicate that such a structure is generally not optimal in the deep unconstrained feature model, an approximation of an expressive network. This is attributed to a low-rank bias induced by regularization, which favors solutions with lower rank than those typically associated with deep neural collapse. In this work, we extend these observations to the cross-entropy loss and analyze how the low-rank bias influences various solutions. Additionally, we explore how this bias induces specific structures in the singular values of the weights at global optima. Furthermore, we examine the loss surface of these models and provide evidence that the frequent observation of deep neural collapse in practice, despite its suboptimality, may result from its higher degeneracy on the loss surface.
Updated: 2024-10-30 16:20:39
Domains: cs.LG
Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants
Purpose: This study presents a variable resolution (VR) sampling and deep learning reconstruction approach for multi-spectral MRI near metal implants, aiming to reduce scan times while maintaining image quality. Background: The rising use of metal implants has increased MRI scans affected by metal artifacts. Multi-spectral imaging (MSI) reduces these artifacts but sacrifices acquisition efficiency. Methods: This retrospective study on 1.5T MSI knee and hip data from patients with metal hardware used a novel spectral undersampling scheme to improve acquisition efficiency by ~40%. U-Net-based deep learning models were trained for reconstruction. Image quality was evaluated using SSIM, PSNR, and RESI metrics. Results: Deep learning reconstructions of undersampled VR data (DL-VR) showed significantly higher SSIM and PSNR values (p<0.001) compared to conventional reconstruction (CR-VR), with improved edge sharpness. Edge sharpness in DL-reconstructed images matched fully sampled references (p=0.5). Conclusion: This approach can potentially enhance MRI examinations near metal implants by reducing scan times or enabling higher resolution. Further prospective studies are needed to assess clinical value.
Updated: 2024-10-30 16:19:06
Domains: eess.IV,cs.AI,cs.CV,physics.med-ph
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce TokenFormer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs. Code and models are available at \url{https://github.com/Haiyang-W/TokenFormer}.
Updated: 2024-10-30 16:19:00
Domains: cs.LG
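The sketch below illustrates the token-parameter attention idea from the abstract above: input tokens act as queries attending to learnable key-value parameter pairs, so capacity grows by appending pairs rather than retraining from scratch. Plain softmax and zero initialization of new pairs are simplifications of my own; the paper's exact attention normalization and growth procedure may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParameterAttention(nn.Module):
    """A linear projection replaced by attention over learnable parameter tokens."""

    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); input tokens are queries, parameters are keys/values.
        scores = x @ self.param_keys.t() / x.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ self.param_values

    def grow(self, extra: int) -> None:
        """Scale capacity by appending key-value pairs instead of retraining from scratch."""
        dim = self.param_keys.size(1)
        self.param_keys = nn.Parameter(torch.cat([self.param_keys.data, torch.zeros(extra, dim)]))
        self.param_values = nn.Parameter(torch.cat([self.param_values.data, torch.zeros(extra, dim)]))

layer = TokenParameterAttention(dim=64, num_param_tokens=256)
y = layer(torch.randn(2, 10, 64))
layer.grow(64)  # progressive scaling
```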
A Survey Analyzing Generalization in Deep Reinforcement Learning
Reinforcement learning research has achieved significant success and attention through the use of deep neural networks to solve problems in high-dimensional state or action spaces. While deep reinforcement learning policies are currently being deployed in many different fields, from medical applications to large language models, the field is still trying to answer open questions about the generalization capabilities of these policies. In this paper, we formalize and analyze generalization in deep reinforcement learning. We explain the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their generalization capabilities. Furthermore, we categorize and explain the manifold solution approaches for increasing generalization and overcoming overfitting in deep reinforcement learning policies. From exploration to adversarial analysis and from regularization to robustness, our paper provides an analysis of a wide range of subfields within deep reinforcement learning, with a broad scope and an in-depth view. We believe our study can provide a compact guideline for current advancements in deep reinforcement learning, and help to construct robust deep neural policies with higher generalization skills.
Updated: 2024-10-30 16:18:57
Domains: cs.LG,cs.AI,stat.ML
SciPIP: An LLM-based Scientific Paper Idea Proposer
The exponential growth of knowledge and the increasing complexity of interdisciplinary research pose significant challenges for researchers, including information overload and difficulties in exploring novel ideas. The advancements in large language models (LLMs), such as GPT-4, have shown great potential in enhancing idea proposals, but how to effectively utilize large models for reasonable idea proposal has not been thoroughly explored. This paper proposes a scientific paper idea proposer (SciPIP). Based on a user-provided research background, SciPIP retrieves helpful papers from a literature database while leveraging the capabilities of LLMs to generate more novel and feasible ideas. To this end, 1) we construct a literature retrieval database, extracting lots of papers' multi-dimension information for fast access. Then, a literature retrieval method based on semantics, entity, and citation co-occurrences is proposed to search relevant literature from multiple aspects based on the user-provided background. 2) After literature retrieval, we introduce dual-path idea proposal strategies, where one path infers solutions from the retrieved literature and the other path generates original ideas through model brainstorming. We then combine the two to achieve a good balance between feasibility and originality. Through extensive experiments on the natural language processing (NLP) field, we demonstrate that SciPIP can retrieve citations similar to those of existing top conference papers and generate many ideas consistent with them. Additionally, we evaluate the originality of other ideas generated by SciPIP using large language models, further validating the effectiveness of our proposed method. The code and the database are released at https://github.com/cheerss/SciPIP.
Updated: 2024-10-30 16:18:22
Domains: cs.CL,cs.AI,cs.IR,cs.LG
FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities
Developing a foundation model for time series forecasting across diverse domains has attracted significant attention in recent years. Existing works typically assume regularly sampled, well-structured data, limiting their applicability to more general scenarios where time series often contain missing values, unequal sequence lengths, and irregular time intervals between measurements. To cover diverse domains and handle variable regularities, we propose FlexTSF, a universal time series forecasting model that generalizes better and natively supports both regular and irregular time series. FlexTSF produces forecasts in an autoregressive manner and incorporates three novel designs: VT-Norm, a normalization strategy to ablate data domain barriers; IVP Patcher, a patching module to learn representations from flexibly structured time series; and LED attention, an attention mechanism that seamlessly integrates the two and propagates forecasts with awareness of domain and time information. Experiments on 12 datasets show that FlexTSF outperforms state-of-the-art forecasting models designed for regular and for irregular time series, respectively. Furthermore, after self-supervised pre-training, FlexTSF shows exceptional performance in both zero-shot and few-shot settings for time series forecasting.
Updated: 2024-10-30 16:14:09
Domains: cs.LG,cs.AI
Fourier Amplitude and Correlation Loss: Beyond Using L2 Loss for Skillful Precipitation Nowcasting
Deep learning approaches have been widely adopted for precipitation nowcasting in recent years. Previous studies mainly focus on proposing new model architectures to improve pixel-wise metrics. However, they frequently result in blurry predictions which provide limited utility to forecasting operations. In this work, we propose a new Fourier Amplitude and Correlation Loss (FACL) which consists of two novel loss terms: Fourier Amplitude Loss (FAL) and Fourier Correlation Loss (FCL). FAL regularizes the Fourier amplitude of the model prediction and FCL complements the missing phase information. The two loss terms work together to replace the traditional $L_2$ losses such as MSE and weighted MSE for the spatiotemporal prediction problem on signal-based data. Our method is generic, parameter-free and efficient. Extensive experiments using one synthetic dataset and three radar echo datasets demonstrate that our method improves perceptual metrics and meteorology skill scores, with a small trade-off to pixel-wise accuracy and structural similarity. Moreover, to improve the error margin in meteorological skill scores such as Critical Success Index (CSI) and Fractions Skill Score (FSS), we propose and adopt the Regional Histogram Divergence (RHD), a distance metric that considers the patch-wise similarity between signal-based imagery patterns with tolerance to local transforms. Code is available at https://github.com/argenycw/FACL
Updated: 2024-10-30 16:12:56
Domains: cs.CV,cs.AI,cs.LG
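To make the two loss terms above concrete, here is one plausible instantiation. The exact definitions are in the paper; in particular, the correlation term below is an illustrative guess at how phase information could be supplied, and equal weighting of the two terms is my own simplification (the paper describes the method as parameter-free).

```python
import torch

def fourier_amplitude_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between Fourier amplitude spectra."""
    return torch.mean((torch.fft.rfft2(pred).abs() - torch.fft.rfft2(target).abs()) ** 2)

def fourier_correlation_loss(pred: torch.Tensor, target: torch.Tensor,
                             eps: float = 1e-8) -> torch.Tensor:
    """Encourage agreement of the complex spectra, supplying the phase
    information the amplitude term ignores."""
    fp, ft = torch.fft.rfft2(pred), torch.fft.rfft2(target)
    num = (fp * ft.conj()).real.sum()
    den = torch.sqrt((fp.abs() ** 2).sum() * (ft.abs() ** 2).sum()) + eps
    return 1.0 - num / den

def facl(pred, target):
    return fourier_amplitude_loss(pred, target) + fourier_correlation_loss(pred, target)

loss = facl(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64))
```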
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE. However, existing approaches for HOBRE rely heavily on uni-modal models like SCFMs for supervised fine-tuning or general LLMs for prompting, resulting in sub-optimal performance. Inspired by recent progress in large multi-modal models, we propose that it is possible to harness the strengths of uni-modal code models from both sides to bridge the semantic gap effectively. In this paper, we introduce a novel probe-and-recover framework that incorporates a binary-source encoder-decoder model and black-box LLMs for binary analysis. Our approach leverages the pre-trained knowledge within SCFMs to synthesize relevant, symbol-rich code fragments as context. This additional context enables black-box LLMs to enhance recovery accuracy. We demonstrate significant improvements in zero-shot binary summarization and binary function name recovery, with a 10.3% relative gain in CHRF and a 16.7% relative gain in a GPT4-based metric for summarization, as well as a 6.7% and 7.4% absolute increase in token-level precision and recall for name recovery, respectively. These results highlight the effectiveness of our approach in automating and improving binary code analysis.
Updated: 2024-10-30 16:12:36
Domains: cs.SE,cs.AI
Directional anomaly detection
Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. However, in some cases we are specifically interested in anomalies that correspond to high attribute values (or low, but not both). We present two asymmetrical distance measures that take this directionality into account: ramp distance and signed distance. Through experiments on synthetic and real-life datasets we show that ramp distance performs as well or better than the absolute distance traditionally used in anomaly detection. While signed distance also performs well on synthetic data, it performs substantially poorer on real-life datasets. We argue that this reflects the fact that in practice, good scores on some attributes should not be allowed to compensate for bad scores on others.
Updated: 2024-10-30 16:11:40
Domains: cs.LG
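A toy sketch of the asymmetry between the two measures described above, assuming deviations are taken against a normal reference point. The paper's exact formulations may differ in how the reference and aggregation are chosen.

```python
import numpy as np

def ramp_distance(x: np.ndarray, ref: np.ndarray) -> float:
    """Count only deviations in the direction of interest (here: above the reference)."""
    return float(np.linalg.norm(np.maximum(x - ref, 0.0)))

def signed_distance(x: np.ndarray, ref: np.ndarray) -> float:
    """One simple signed variant: low attributes can offset high ones,
    which the paper finds can hurt on real-life data."""
    return float(np.sum(x - ref))

x = np.array([5.0, -3.0])
ref = np.zeros(2)
print(ramp_distance(x, ref))    # 5.0: the low second attribute cannot compensate
print(signed_distance(x, ref))  # 2.0: compensation occurs
```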
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. We outline an online algorithm for inventing such predicates and learning abstract world models. We compare our approach to hierarchical reinforcement learning, vision-language model planning, and symbolic predicate invention approaches, on both in- and out-of-distribution tasks across five simulated robotic domains. Results show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
Updated: 2024-10-30 16:11:05
Domains: cs.AI,cs.CV,cs.LG,cs.RO
QWO: Speeding Up Permutation-Based Causal Discovery in LiGAMs
Causal discovery is essential for understanding relationships among variables of interest in many scientific domains. In this paper, we focus on permutation-based methods for learning causal graphs in Linear Gaussian Acyclic Models (LiGAMs), where the permutation encodes a causal ordering of the variables. Existing methods in this setting are not scalable due to their high computational complexity. These methods are comprised of two main components: (i) constructing a specific DAG, $\mathcal{G}^\pi$, for a given permutation $\pi$, which represents the best structure that can be learned from the available data while adhering to $\pi$, and (ii) searching over the space of permutations (i.e., causal orders) to minimize the number of edges in $\mathcal{G}^\pi$. We introduce QWO, a novel approach that significantly enhances the efficiency of computing $\mathcal{G}^\pi$ for a given permutation $\pi$. QWO has a speed-up of $O(n^2)$ ($n$ is the number of variables) compared to the state-of-the-art BIC-based method, making it highly scalable. We show that our method is theoretically sound and can be integrated into existing search strategies such as GRASP and hill-climbing-based methods to improve their performance.
Updated: 2024-10-30 16:10:46
Domains: cs.LG,stat.ME,stat.ML
Unbounded: A Generative Infinite Game of Character Life Simulation
We introduce the concept of a generative infinite game, a video game that transcends the traditional boundaries of finite, hard-coded systems by using generative models. Inspired by James P. Carse's distinction between finite and infinite games, we leverage recent advances in generative AI to create Unbounded: a game of character life simulation that is fully encapsulated in generative models. Specifically, Unbounded draws inspiration from sandbox life simulations and allows you to interact with your autonomous virtual character in a virtual world by feeding, playing with and guiding it - with open-ended mechanics generated by an LLM, some of which can be emergent. In order to develop Unbounded, we propose technical innovations in both the LLM and visual generation domains. Specifically, we present: (1) a specialized, distilled large language model (LLM) that dynamically generates game mechanics, narratives, and character interactions in real-time, and (2) a new dynamic regional image prompt Adapter (IP-Adapter) for vision models that ensures consistent yet flexible visual generation of a character across multiple environments. We evaluate our system through both qualitative and quantitative analysis, showing significant improvements in character life simulation, user instruction following, narrative coherence, and visual consistency for both characters and the environments compared to traditional related approaches.
Updated: 2024-10-30 16:10:33
Domains: cs.CV,cs.AI,cs.CL,cs.GR,cs.LG
When can classical neural networks represent quantum states?
A naive classical representation of an n-qubit state requires specifying exponentially many amplitudes in the computational basis. Past works have demonstrated that classical neural networks can succinctly express these amplitudes for many physically relevant states, leading to computationally powerful representations known as neural quantum states. What underpins the efficacy of such representations? We show that conditional correlations present in the measurement distribution of quantum states control the performance of their neural representations. Such conditional correlations are basis dependent, arise due to measurement-induced entanglement, and reveal features not accessible through conventional few-body correlations often examined in studies of phases of matter. By combining theoretical and numerical analysis, we demonstrate how the state's entanglement and sign structure, along with the choice of measurement basis, give rise to distinct patterns of short- or long-range conditional correlations. Our findings provide a rigorous framework for exploring the expressive power of neural quantum states.
Updated: 2024-10-30 16:06:53
Domains: quant-ph,cond-mat.str-el,cs.LG
Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching
This paper tackles the growing issue of excessive data transmission in networks. With increasing traffic, backhaul links and core networks are under significant strain, motivating the investigation of caching solutions at edge routers. Many existing studies utilize Markov Decision Processes (MDP) to tackle caching problems, often assuming decision points at fixed intervals; however, real-world environments are characterized by random request arrivals. Additionally, critical file attributes such as lifetime, size, and priority significantly impact the effectiveness of caching policies, yet existing research fails to integrate all these attributes in policy design. In this work, we model the caching problem using a Semi-Markov Decision Process (SMDP) to better capture the continuous-time nature of real-world applications, enabling caching decisions to be triggered by random file requests. We then introduce a Proximal Policy Optimization (PPO)-based caching strategy that fully considers file attributes such as lifetime, size, and priority. Simulations show that our method outperforms a recent Deep Reinforcement Learning-based technique. To further advance our research, we improve the convergence rate of PPO by prioritizing transitions within the replay buffer through an attention mechanism. This mechanism evaluates the similarity between the current state and all stored transitions, assigning higher priorities to transitions that exhibit greater similarity.
Updated: 2024-10-30 16:06:21
Domains: cs.NI,cs.LG,cs.SY,eess.SY
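A minimal sketch of the attention-style prioritization described above: priorities are a softmax over similarity between the current state and the states of stored transitions. Cosine similarity and feeding priorities into a sampling distribution are assumptions on my part; how the prioritized minibatch enters the PPO update is not shown.

```python
import numpy as np

def transition_priorities(current_state: np.ndarray, stored_states: np.ndarray) -> np.ndarray:
    """Attention-style priorities: softmax over similarity to the current state."""
    sims = stored_states @ current_state / (
        np.linalg.norm(stored_states, axis=1) * np.linalg.norm(current_state) + 1e-8
    )
    exp = np.exp(sims - sims.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
buffer_states = rng.normal(size=(128, 16))                 # states of stored transitions
probs = transition_priorities(rng.normal(size=16), buffer_states)
batch = rng.choice(len(buffer_states), size=32, p=probs)   # prioritized minibatch indices
```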
HiBO: Hierarchical Bayesian Optimization via Adaptive Search Space Partitioning
Optimizing black-box functions in high-dimensional search spaces has been known to be challenging for traditional Bayesian Optimization (BO). In this paper, we introduce HiBO, a novel hierarchical algorithm integrating global-level search space partitioning information into the acquisition strategy of a local BO-based optimizer. HiBO employs a search-tree-based global-level navigator to adaptively split the search space into partitions with different sampling potential. The local optimizer then utilizes this global-level information to guide its acquisition strategy towards most promising regions within the search space. A comprehensive set of evaluations demonstrates that HiBO outperforms state-of-the-art methods in high-dimensional synthetic benchmarks and presents significant practical effectiveness in the real-world task of tuning configurations of database management systems (DBMSs).
Updated: 2024-10-30 16:04:16
Domains: cs.LG
FoLDTree: A ULDA-Based Decision Tree Framework for Efficient Oblique Splits and Feature Selection
Traditional decision trees are limited by axis-orthogonal splits, which can perform poorly when true decision boundaries are oblique. While oblique decision tree methods address this limitation, they often face high computational costs, difficulties with multi-class classification, and a lack of effective feature selection. In this paper, we introduce LDATree and FoLDTree, two novel frameworks that integrate Uncorrelated Linear Discriminant Analysis (ULDA) and Forward ULDA into a decision tree structure. These methods enable efficient oblique splits, handle missing values, support feature selection, and provide both class labels and probabilities as model outputs. Through evaluations on simulated and real-world datasets, LDATree and FoLDTree consistently outperform axis-orthogonal and other oblique decision tree methods, achieving accuracy levels comparable to the random forest. The results highlight the potential of these frameworks as robust alternatives to traditional single-tree methods.
Updated: 2024-10-30 16:03:51
Domains: cs.LG,stat.ME,stat.ML
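As a sketch of the core idea above, the snippet below makes one oblique split by projecting onto a discriminant direction and thresholding. Plain scikit-learn LDA and a median cut stand in for the paper's Uncorrelated LDA (ULDA), forward selection, and split-point search.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def oblique_split(X: np.ndarray, y: np.ndarray):
    """One oblique split: project onto the top discriminant direction, then threshold."""
    lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
    z = lda.transform(X).ravel()
    threshold = np.median(z)   # placeholder; a real tree searches for the best cut
    return lda, threshold, z <= threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # an oblique boundary axis splits handle poorly
lda, thr, left_mask = oblique_split(X, y)
```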
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms
We present Public Domain 12M (PD12M), a dataset of 12.4 million high-quality public domain and CC0-licensed images with synthetic captions, designed for training text-to-image models. PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform, we also introduce novel, community-driven dataset governance mechanisms that reduce harm and support reproducibility over time.
Updated: 2024-10-30 15:59:05
Domains: cs.AI
The Good, the Bad, and the Ugly: The Role of AI Quality Disclosure in Lie Detection
We investigate how low-quality AI advisors, lacking quality disclosures, can help spread text-based lies while seeming to help people detect lies. Participants in our experiment discern truth from lies by evaluating transcripts from a game show that mimicked deceptive social media exchanges on topics with objective truths. We find that when relying on low-quality advisors without disclosures, participants' truth-detection rates fall below their own abilities, and recover once the AI's true effectiveness is revealed. Conversely, a high-quality advisor enhances truth detection, regardless of disclosure. We discover that participants' expectations about AI capabilities contribute to their undue reliance on opaque, low-quality advisors.
Updated: 2024-10-30 15:58:05
Domains: cs.CL,cs.AI,cs.CY,cs.HC,cs.LG
FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training
Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. In order to enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., disparity in class-wise robustness of the model: while distinctive classes become more robust towards such adversaries, hard-to-detect classes suffer. Recently, research has focused on improving model fairness specifically for perturbed images, overlooking accuracy on the most likely, non-perturbed data. Additionally, despite their robustness against the adversaries encountered during model training, state-of-the-art adversarially trained models have difficulty maintaining robustness and fairness when confronted with diverse adversarial threats or common corruptions. In this work, we address the above concerns by introducing a novel approach called Fair Targeted Adversarial Training (FAIR-TAT). We show that using targeted adversarial attacks for adversarial training (instead of untargeted attacks) allows for more favorable trade-offs with respect to adversarial fairness. Empirical results validate the efficacy of our approach.
Updated: 2024-10-30 15:58:03
Domains: cs.LG,cs.CV
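To show what "targeted" means here, the sketch below crafts a targeted PGD example that pulls predictions toward a chosen class (descending its loss) instead of the usual untargeted ascent on the true label's loss. How FAIR-TAT selects the per-class targets, which is the crux of the method, is not reproduced; the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft a *targeted* adversarial example: descend the loss toward `target_class`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target_class)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Step *against* the gradient to pull predictions toward the target class.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)  # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```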
Banyan: Fast Rotating Leader BFT
This paper presents Banyan, the first rotating leader state machine replication (SMR) protocol that allows transactions to be confirmed in just a single round-trip time in the Byzantine fault tolerance (BFT) setting. Based on minimal alterations to the Internet Computer Consensus (ICC) protocol and with negligible communication overhead, we introduce a novel dual mode mechanism that enables optimal block finalization latency in the fast path. Crucially, the modes of operation are integrated, such that even if the fast path is not effective, no penalties are incurred. Moreover, our algorithm maintains the core attributes of the ICC protocol it is based on, including optimistic responsiveness and rotating leaders without the necessity for a view-change protocol. We prove the correctness of our protocol and provide an open-source implementation of it. Banyan is compared to its predecessor ICC, as well as other well known BFT protocols, in a globally distributed wide-area network. Our evaluation reveals that Banyan reduces latency by up to 30% compared to state-of-the-art protocols, without requiring additional security assumptions.
Updated: 2024-10-30 15:55:23
Domains: cs.DC,cs.CR
Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods by several orders of magnitude. While promising, the proposed methods only scale to networks with a few thousand parameters due to the high computational cost to evaluate, store, and invert the curvature matrix. We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks. Our approach goes beyond the established KFAC for traditional deep learning problems as it captures contributions from a PDE's differential operator that are crucial for optimization. To establish KFAC for such losses, we use Taylor-mode automatic differentiation to describe the differential operator's computation graph as a forward network with shared weights. This allows us to apply KFAC thanks to a recently-developed general formulation for networks with weight sharing. Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.
Updated: 2024-10-30 15:53:30
Domains: cs.LG,physics.comp-ph
Fair Division with Market Values
We introduce a model of fair division with market values, where indivisible goods must be partitioned among agents with (additive) subjective valuations, and each good additionally has a market value. The market valuation can be viewed as a separate additive valuation that holds identically across all the agents. We seek allocations that are simultaneously fair with respect to the subjective valuations and with respect to the market valuation. We show that an allocation that satisfies stochastically-dominant envy-freeness up to one good (SD-EF1) with respect to both the subjective valuations and the market valuation does not always exist, but the weaker guarantee of EF1 with respect to the subjective valuations along with SD-EF1 with respect to the market valuation can be guaranteed. We also study a number of other guarantees such as Pareto optimality, EFX, and MMS. In addition, we explore non-additive valuations and extend our model to cake-cutting. Along the way, we identify several tantalizing open questions.
Updated: 2024-10-30 15:52:15
Domains: cs.GT,cs.AI
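For reference, here is a minimal checker for plain EF1 under a single additive valuation; the paper's guarantees (EF1 on the subjective valuations together with SD-EF1 on the market valuation) layer on top of this primitive. The toy instance is made up.

```python
import numpy as np

def is_ef1(values: np.ndarray, allocation: list) -> bool:
    """Check envy-freeness up to one good (EF1) for additive valuations.

    values[i, g] is agent i's value for good g; allocation[j] lists j's goods.
    """
    n = values.shape[0]
    for i in range(n):
        own = values[i, allocation[i]].sum()
        for j in range(n):
            if i == j or not allocation[j]:
                continue
            bundle = values[i, allocation[j]]
            # Envy must vanish after removing j's best good in i's eyes.
            if own < bundle.sum() - bundle.max():
                return False
    return True

values = np.array([[6, 1, 3],
                   [4, 4, 2]])
print(is_ef1(values, [[0], [1, 2]]))  # True on this toy instance
```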
Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers (Extended Version)
Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be difficult for humans to interpret. Statistical scaling addresses this problem by transforming outlier scores into outlier probabilities without using ground-truth labels, thereby improving interpretability and comparability across algorithms. However, the quality of this transformation can be different for outliers and inliers. Missing outliers in scenarios where they are of particular interest - such as healthcare, finance, or engineering - can be costly or dangerous. Thus, ensuring good probabilities for outliers is essential. This paper argues that statistical scaling, as commonly used in the literature, does not produce equally good probabilities for outliers as for inliers. Therefore, we propose robust statistical scaling, which uses robust estimators to improve the probabilities for outliers. We evaluate several variants of our method against other outlier score transformations for real-world datasets and outlier detection algorithms, where it can improve the probabilities for outliers.
Updated: 2024-10-30 15:51:52
Domains: cs.LG,cs.AI
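A minimal sketch of the contrast described above: both variants push scores through a Gaussian CDF, but the robust variant fits location and scale with median/MAD so the outliers themselves cannot distort the fit. The paper evaluates several robust-estimator variants; this shows just one.

```python
import numpy as np
from scipy.stats import norm

def statistical_scaling(scores: np.ndarray, robust: bool = True) -> np.ndarray:
    """Turn raw outlier scores into [0, 1] outlier probabilities via a Gaussian CDF."""
    if robust:
        loc = np.median(scores)
        scale = 1.4826 * np.median(np.abs(scores - loc))  # MAD, Gaussian-consistent
    else:
        loc, scale = scores.mean(), scores.std()
    return norm.cdf((scores - loc) / max(scale, 1e-12))

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 990), rng.normal(8, 1, 10)])  # 10 true outliers
probs = statistical_scaling(scores, robust=True)   # outliers get probabilities near 1
```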
Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics
Extracting scientific understanding from particle-physics experiments requires solving diverse learning problems with high precision and good data efficiency. We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. L-GATr represents high-energy data in a geometric algebra over four-dimensional space-time and is equivariant under Lorentz transformations, the symmetry group of relativistic kinematics. At the same time, the architecture is a Transformer, which makes it versatile and scalable to large systems. L-GATr is first demonstrated on regression and classification tasks from particle physics. We then construct the first Lorentz-equivariant generative model: a continuous normalizing flow based on an L-GATr network, trained with Riemannian flow matching. Across our experiments, L-GATr is on par with or outperforms strong domain-specific baselines.
Updated: 2024-10-30 15:50:21
Domains: physics.data-an,cs.LG,hep-ph,stat.ML
Human Expertise in Algorithmic Prediction
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach leverages human judgment to distinguish inputs which are algorithmically indistinguishable, or "look the same" to predictive algorithms. We argue that this framing clarifies the problem of human-AI collaboration in prediction tasks, as experts often form judgments by drawing on information which is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of "side information", and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor and precisely quantify this improvement. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly $30\%$ of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
Updated: 2024-10-30 15:45:32
Domains: cs.LG,cs.AI,cs.HC
Revisiting MAE pre-training for 3D medical image segmentation
Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets, for various downstream applications that suffer from the scarcity of labeled data. While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. We address these issues by i) leveraging a large-scale dataset of 44k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework. iii) A robust development framework, incorporating 5 development and 8 testing brain MRI segmentation datasets, allowed performance-driven design decisions to optimize the simple concept of Masked Auto Encoders (MAEs) for 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points. Furthermore, our model demonstrates exceptional stability, achieving the best average rank (2 of 7 methods), compared to the second-best method's mean rank of 3.
Updated: 2024-10-30 15:42:59
Domains: cs.CV,cs.AI,cs.LG
Quantum Boltzmann machine learning of ground-state energies
Estimating the ground-state energy of Hamiltonians is a fundamental task for which it is believed that quantum computers can be helpful. Several approaches have been proposed toward this goal, including algorithms based on quantum phase estimation and hybrid quantum-classical optimizers involving parameterized quantum circuits, the latter falling under the umbrella of the variational quantum eigensolver. Here, we analyze the performance of quantum Boltzmann machines for this task, which is a less explored ansatz based on parameterized thermal states and which is not known to suffer from the barren-plateau problem. We delineate a hybrid quantum-classical algorithm for this task and rigorously prove that it converges to an $\varepsilon$-approximate stationary point of the energy function optimized over parameter space, while using a number of parameterized-thermal-state samples that is polynomial in $\varepsilon^{-1}$, the number of parameters, and the norm of the Hamiltonian being optimized. Our algorithm estimates the gradient of the energy function efficiently by means of a novel quantum circuit construction that combines classical sampling, Hamiltonian simulation, and the Hadamard test, thus overcoming a key obstacle to quantum Boltzmann machine learning that has been left open since [Amin et al., Phys. Rev. X 8, 021050 (2018)]. Additionally supporting our main claims are calculations of the gradient and Hessian of the energy function, as well as an upper bound on the matrix elements of the latter that is used in the convergence analysis.
Updated: 2024-10-30 15:42:01
Domains: quant-ph,cond-mat.stat-mech,cs.LG,math.OC
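For orientation, the objects involved can be written as follows, in notation we adopt for illustration rather than taken from the paper: a parameterized thermal state and the energy function whose $\varepsilon$-approximate stationary points the hybrid algorithm returns,
\[
\rho(\theta) = \frac{e^{-G(\theta)}}{\operatorname{Tr}\!\big[e^{-G(\theta)}\big]},
\qquad
f(\theta) = \operatorname{Tr}\!\big[H\,\rho(\theta)\big],
\qquad
\min_{\theta}\, f(\theta),
\]
where $G(\theta)$ is the parameterized Hamiltonian defining the thermal ansatz and $H$ is the Hamiltonian whose ground-state energy is sought.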
Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis
In federated learning, it is common to assume that clients are always available to participate in training, which may not be feasible with user devices in practice. Recent works analyze federated learning under more realistic participation patterns, such as cyclic client availability or arbitrary participation. However, all such works either require strong assumptions (e.g., all clients participate almost surely within a bounded window), do not achieve linear speedup and reduced communication rounds, or are not applicable in the general non-convex setting. In this work, we focus on nonconvex optimization and consider participation patterns in which the chance of participation over a fixed window of rounds is equal among all clients, which includes cyclic client availability as a special case. Under this setting, we propose a new algorithm, named Amplified SCAFFOLD, and prove that it achieves linear speedup, reduced communication, and resilience to data heterogeneity simultaneously. In particular, for cyclic participation, our algorithm is proved to enjoy $\mathcal{O}(\epsilon^{-2})$ communication rounds to find an $\epsilon$-stationary point in the non-convex stochastic setting. In contrast, the prior work under the same setting requires $\mathcal{O}(\kappa^2 \epsilon^{-4})$ communication rounds, where $\kappa$ denotes the data heterogeneity. Therefore, our algorithm significantly reduces communication rounds due to better dependency in terms of $\epsilon$ and $\kappa$. Our analysis relies on a fine-grained treatment of the nested dependence between client participation and errors in the control variates, which results in tighter guarantees than previous work. We also provide experimental results with (1) synthetic data and (2) real-world data with a large number of clients $(N = 250)$, demonstrating the effectiveness of our algorithm under periodic client participation.
Updated: 2024-10-30 15:41:35
Domains: cs.LG,cs.DC
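For context, here is a minimal sketch of the SCAFFOLD-style control-variate step that Amplified SCAFFOLD builds on; the amplification mechanism and the periodic-participation analysis from the paper are not reproduced, and the control-variate refresh follows the original SCAFFOLD ("option II") rule.

```python
import numpy as np

def scaffold_client_update(x, c_global, c_local, grad_fn, lr=0.1, steps=10):
    """One client round with drift correction via control variates.

    x: server model; c_global/c_local: server and client control variates;
    grad_fn: stochastic gradient oracle for this client's local objective."""
    y = x.copy()
    for _ in range(steps):
        y -= lr * (grad_fn(y) - c_local + c_global)  # corrected local step
    # 'Option II' control-variate refresh from the SCAFFOLD paper.
    c_local_new = c_local - c_global + (x - y) / (steps * lr)
    return y, c_local_new  # server averages (y - x) and the c_local deltas
```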
Why Fine-grained Labels in Pretraining Benefit Generalization?
Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data for downstream tasks, often yields better generalization than pretraining with coarse-labeled data. While there is ample empirical evidence supporting this, the theoretical justification remains an open problem. This paper addresses this gap by introducing a "hierarchical multi-view" structure to confine the input data distribution. Under this framework, we prove that: 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, leading to improved accuracy on hard downstream test samples.
Updated: 2024-10-30 15:41:30
Domains: cs.LG,cs.CV,stat.ML
Deep learning meets tree phenology modeling: PhenoFormer vs. process-based models
Phenology, the timing of cyclical plant life events such as leaf emergence and coloration, is crucial in the bio-climatic system. Climate change drives shifts in these phenological events, impacting ecosystems and the climate itself. Accurate phenology models are essential to predict the occurrence of these phases under changing climatic conditions. Existing methods include hypothesis-driven process models and data-driven statistical approaches. Process models account for dormancy stages and various phenology drivers, while statistical models typically rely on linear or traditional machine learning techniques. Research shows that process models often outperform statistical methods when predicting under climate conditions outside historical ranges, especially with climate change scenarios. However, deep learning approaches remain underexplored in climate phenology modeling. We introduce PhenoFormer, a neural architecture better suited than traditional statistical methods to predicting phenology under shifts in climate data distribution, while also bringing significant improvements over, or performing on par with, the best performing process-based models. Our numerical experiments on a 70-year dataset of 70,000 phenological observations from 9 woody species in Switzerland show that PhenoFormer outperforms traditional machine learning methods by an average of 13% R2 and 1.1 days RMSE for spring phenology, and 11% R2 and 0.7 days RMSE for autumn phenology, while matching or exceeding the best process-based models. Our results demonstrate that deep learning has the potential to be a valuable methodological tool for accurate climate-phenology prediction, and our PhenoFormer is a first promising step in improving phenological predictions before a complete understanding of the underlying physiological mechanisms is available.
Updated: 2024-10-30 15:40:55
Domains: q-bio.QM,cs.CV,cs.LG
Reassessing Noise Augmentation Methods in the Context of Adversarial Speech
In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different state-of-the-art ASR architectures, where each of the ASR architectures is trained under three different augmentation conditions: one subject to background noise, speed variations, and reverberations, another subject to speed variations only, and a third without any form of data augmentation. The results demonstrate that noise augmentation improves not only model performance on noisy speech but also the model's robustness to adversarial attacks.
Updated: 2024-10-30 15:40:38
Domains: eess.AS,cs.LG,cs.SD
Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code. This enables us to cast the memorization problem in KHMs into a point arrangement problem on a hypersphere. We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code. This unique perspective leads to: (i) An analysis of how KHMs achieve optimal memory capacity, and identify corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and optimal asymptotic memory capacity for modern Hopfield models. (ii) A sub-linear time algorithm $\mathtt{U}\text{-}\mathtt{Hop}$+ to reach KHMs' optimal capacity. (iii) An analysis of the scaling behavior of the required feature dimension relative to the number of stored memories. These efforts improve both the retrieval capability of KHMs and the representation learning of corresponding transformers. Experimentally, we provide thorough numerical results to back up theoretical findings.
Updated: 2024-10-30 15:35:51
Domains: stat.ML,cs.AI,cs.LG,cs.NE
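The connection can be stated compactly. In notation of our own choosing, unit-norm stored patterns play the role of spherical codewords under the standard modern-Hopfield retrieval rule:
\[
\Xi = [\xi_1, \dots, \xi_M], \quad \|\xi_\mu\|_2 = 1 \ \text{for all } \mu,
\qquad
x^{\mathrm{new}} = \Xi\,\operatorname{softmax}\!\big(\beta\, \Xi^{\top} x\big),
\]
with KHMs applying the same rule after a feature map $\Phi$. Reliable retrieval requires the $\xi_\mu$ to be well separated on the sphere, so maximizing how many patterns can be stored becomes a codeword-arrangement problem, which is exactly the spherical-code perspective the paper develops.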
Robustness of data-driven approaches in limited angle tomography
The limited angle Radon transform is notoriously difficult to invert due to its ill-posedness. In this work, we give a mathematical explanation of why data-driven approaches can stably reconstruct more information than traditional methods such as filtered backprojection. In addition, we use experiments based on the U-Net neural network to validate our theory.
Updated: 2024-10-30 15:34:53
Domains: math.NA,cs.LG,cs.NA,35R30
Optimal deep learning of holomorphic operators between Banach spaces
Operator learning problems arise in many key areas of scientific computing where Partial Differential Equations (PDEs) are used to model physical systems. In such scenarios, the operators map between Banach or Hilbert spaces. In this work, we tackle the problem of learning operators between Banach spaces, in contrast to the vast majority of past works considering only Hilbert spaces. We focus on learning holomorphic operators - an important class of problems with many applications. We combine arbitrary approximate encoders and decoders with standard feedforward Deep Neural Network (DNN) architectures - specifically, those with constant width exceeding the depth - under standard $\ell^2$-loss minimization. We first identify a family of DNNs such that the resulting Deep Learning (DL) procedure achieves optimal generalization bounds for such operators. For standard fully-connected architectures, we then show that there are uncountably many minimizers of the training problem that yield equivalent optimal performance. The DNN architectures we consider are `problem agnostic', with width and depth only depending on the amount of training data $m$ and not on regularity assumptions of the target operator. Next, we show that DL is optimal for this problem: no recovery procedure can surpass these generalization bounds up to log terms. Finally, we present numerical results demonstrating the practical performance on challenging problems including the parametric diffusion, Navier-Stokes-Brinkman and Boussinesq PDEs.
Updated: 2024-10-30 15:34:22
Domains: cs.LG,cs.NA,math.NA
Gaussian process-based online health monitoring and fault analysis of lithium-ion battery systems from field data
Health monitoring, fault analysis, and detection are critical for the safe and sustainable operation of battery systems. We apply Gaussian process resistance models on lithium iron phosphate battery field data to effectively separate the time-dependent and operating point-dependent resistance. The data set contains 29 battery systems returned to the manufacturer for warranty, each with eight cells in series, totaling 232 cells and 131 million data rows. We develop probabilistic fault detection rules using recursive spatiotemporal Gaussian processes. These processes allow the quick processing of over a million data points, enabling advanced online monitoring and furthering the understanding of battery pack failure in the field. The analysis underlines that often, only a single cell shows abnormal behavior or a knee point, consistent with weakest-link failure for cells connected in series, amplified by local resistive heating. The results further the understanding of how batteries degrade and fail in the field and demonstrate the potential of efficient online monitoring based on data. We open-source the code and publish the large data set upon completion of the review of this article.
Updated: 2024-10-30 15:28:05
Domains: cs.LG,cs.AI,cs.SY,eess.SY,stat.AP,I.2.6
Dynamic Vocabulary Pruning in Early-Exit LLMs
Increasing the size of large language models (LLMs) has been shown to lead to better performance. However, this comes at the cost of slower and more expensive inference. Early-exiting is a promising approach for improving the efficiency of LLM inference by enabling next token prediction at intermediate layers. Yet, the large vocabulary size in modern LLMs makes the confidence estimation required for exit decisions computationally expensive, diminishing the efficiency gains. To address this, we propose dynamically pruning the vocabulary at test time for each token. Specifically, the vocabulary is pruned at one of the initial layers, and the smaller vocabulary is then used throughout the rest of the forward pass. Our experiments demonstrate that such post-hoc dynamic vocabulary pruning improves the efficiency of confidence estimation in early-exit LLMs while maintaining competitive performance.
Updated: 2024-10-30 15:28:02
Domains: cs.CL,cs.AI,cs.LG
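A sketch of the mechanism, with the candidate-set size, the shapes, and the choice of pruning layer as illustrative assumptions rather than the paper's settings:

```python
import torch

def prune_vocab(h_early: torch.Tensor, W_vocab: torch.Tensor, k: int = 1000):
    """Pick a candidate token set once, at an early layer, and reuse the
    pruned unembedding matrix for every later early-exit confidence check.

    h_early: (d,) hidden state at the pruning layer; W_vocab: (V, d), k << V."""
    logits = W_vocab @ h_early            # one full-vocabulary projection
    keep = torch.topk(logits, k).indices  # candidate token ids
    return keep, W_vocab[keep]            # (k, d) pruned unembedding matrix

# At each later exit layer, confidence is estimated over k << V tokens:
#   conf = torch.softmax(W_small @ h_layer, dim=-1).max()
```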
Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set
Language models can achieve high accuracy on natural language tasks such as NLI, but performance suffers on manually created adversarial examples. We investigate the performance of a language model trained on the Stanford Natural Language Inference (SNLI) corpus on a manually created adversarial test set. We then improve the model's performance by fine-tuning the model on a small, manually created adversarial training set, designed to help the language model learn to differentiate between similar words and phrases in the data. We show an increase in accuracy on the adversarial test set (+13%) while still maintaining good performance on the original NLI task. We also show an increase in accuracy from 91.2% to 92.9% on the most similar contradictions in the SNLI test set (as judged by cosine similarity).
Updated: 2024-10-30 15:27:55
Domains: cs.CL,cs.AI
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Adversarial attacks can mislead automatic speech recognition (ASR) systems into predicting an arbitrary target text, thus posing a clear security threat. To prevent such attacks, we propose DistriBlock, an efficient detection strategy applicable to any ASR system that predicts a probability distribution over output tokens in each time step. We measure a set of characteristics of this distribution: the median, maximum, and minimum over the output probabilities, the entropy of the distribution, as well as the Kullback-Leibler and the Jensen-Shannon divergence with respect to the distributions of the subsequent time step. Then, by leveraging the characteristics observed for both benign and adversarial data, we apply binary classifiers, including simple threshold-based classification, ensembles of such classifiers, and neural networks. Through extensive analysis across different state-of-the-art ASR systems and language data sets, we demonstrate the supreme performance of this approach, with a mean area under the receiver operating characteristic curve for distinguishing target adversarial examples against clean and noisy data of 99% and 97%, respectively. To assess the robustness of our method, we show that adaptive adversarial examples that can circumvent DistriBlock are much noisier, which makes them easier to detect through filtering and creates another avenue for preserving the system's robustness.
Updated: 2024-10-30 15:25:54
Domains: cs.SD,cs.CR,cs.LG,eess.AS
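The distributional characteristics listed above are simple to compute per time step; here is a sketch for one step's token distribution and its successor (the epsilon smoothing is our addition for numerical safety):

```python
import numpy as np

def distriblock_features(p_t, p_next, eps=1e-12):
    """Characteristics of one output distribution p_t and its successor p_next."""
    p, q = p_t + eps, p_next + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return {
        "median": float(np.median(p_t)),
        "max": float(np.max(p_t)),
        "min": float(np.min(p_t)),
        "entropy": float(-np.sum(p * np.log(p))),
        "kl_next": kl(p, q),                         # KL(p_t || p_{t+1})
        "js_next": 0.5 * kl(p, m) + 0.5 * kl(q, m),  # Jensen-Shannon divergence
    }
```

Aggregated over time steps, these features feed the binary classifiers (thresholds, ensembles, or small neural networks) described above.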
Continuous Product Graph Neural Networks
Processing multidomain data defined on multiple graphs holds significant potential in various practical applications in computer science. However, current methods are mostly limited to discrete graph filtering operations. Tensorial partial differential equations on graphs (TPDEGs) provide a principled framework for modeling structured data across multiple interacting graphs, addressing the limitations of the existing discrete methodologies. In this paper, we introduce Continuous Product Graph Neural Networks (CITRUS) that emerge as a natural solution to the TPDEG. CITRUS leverages the separability of continuous heat kernels from Cartesian graph products to efficiently implement graph spectral decomposition. We conduct thorough theoretical analyses of the stability and over-smoothing properties of CITRUS in response to domain-specific graph perturbations and graph spectra effects on the performance. We evaluate CITRUS on well-known traffic and weather spatiotemporal forecasting datasets, demonstrating superior performance over existing approaches. The implementation codes are available at https://github.com/ArefEinizade2/CITRUS.
Updated: 2024-10-30 15:25:38
Domains: cs.LG,cs.AI
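The separability the abstract leans on is the standard heat-kernel identity for Cartesian graph products (notation ours): for factor Laplacians $L_1 \in \mathbb{R}^{n_1 \times n_1}$ and $L_2 \in \mathbb{R}^{n_2 \times n_2}$,
\[
L_{\oplus} = L_1 \otimes I_{n_2} + I_{n_1} \otimes L_2,
\qquad
e^{-t L_{\oplus}} = e^{-t L_1} \otimes e^{-t L_2},
\]
since the two Kronecker summands commute. The full $n_1 n_2 \times n_1 n_2$ kernel therefore never has to be formed explicitly; each factor graph can be handled on its own.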
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Despite the outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) might generate hallucinated contents that do not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluate the object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, in this paper we design a unified framework to measure object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to conduct hallucination evaluation on (object, relation, object) triplets extracted from LVLMs' responses, and thus, could be easily generalized to different vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. We conduct comprehensive evaluations on Tri-HE and observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple yet effective training-free approach to mitigate hallucinations for LVLMs, with which, we exceed all open-sourced counterparts on Tri-HE, achieving comparable performance with the powerful GPT-4V. Our dataset and code for the reproduction of our experiments are available publicly at https://github.com/wujunjie1998/Tri-HE.
Updated: 2024-10-30 15:25:06
Domains: cs.CV,cs.AI,cs.CL,cs.LG
Why Gradient Subspace? Identifying and Mitigating LoRA's Bottlenecks in Federated Fine-Tuning of Large Language Models
Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, it often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks utilizing LoRA, highlighting its suboptimal nature due to constrained subspace learning of low-rank matrices. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, leading to superior performance for fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA approaches and underscores the advantages of full-rank weight aggregation. We extend our analysis to low-rank gradient-based optimizers, such as GaLore, used during local training steps. Our findings show that GaLore is a more effective alternative, outperforming federated LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains paramount in FL discourse, our focus is on assessing performance outcomes of federated fine-tuned models and evaluating various FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA within FL contexts, paving the way for more efficient training methodologies.
Updated: 2024-10-30 15:23:44
Domains: cs.LG,cs.AI
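One concrete way to see the subspace bottleneck discussed above (our illustration, not the paper's derivation): with rank-$r$ client updates $W_0 + B_i A_i$, naively averaging the low-rank factors does not recover the average of the client models,
\[
\frac{1}{N}\sum_{i=1}^{N}\big(W_0 + B_i A_i\big)
\;\neq\;
W_0 + \Big(\frac{1}{N}\sum_{i=1}^{N} B_i\Big)\Big(\frac{1}{N}\sum_{i=1}^{N} A_i\Big)
\quad \text{in general},
\]
and the true average on the left can have rank up to $Nr$, outside the rank-$r$ subspace each client is constrained to. Direct weight averaging sidesteps this mismatch entirely.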
Accelerating Transformers with Spectrum-Preserving Token Merging
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered low-energy and preserved. Experimental findings demonstrate that PiToMe saves 40-60\% of the FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5\% average performance drop for ViT-MAE-H, compared to 2.6\% for baselines), image-text retrieval (0.3\% average performance drop for CLIP on Flickr30k, compared to 4.5\% for other methods), and, analogously, visual question answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve the intrinsic spectral properties of the original token space under mild conditions.
Updated: 2024-10-30 15:22:53
Domains: cs.LG
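A sketch of an energy score in the spirit described above; the exact functional form is our simplification, not the paper's definition:

```python
import torch
import torch.nn.functional as F

def energy_scores(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Per-token energy from pairwise cosine similarity: tokens sitting in
    large clusters of similar tokens score high (merge candidates), while
    isolated tokens score low (preserved). x: (n_tokens, dim)."""
    x = F.normalize(x, dim=-1)
    sim = x @ x.T                    # pairwise cosine similarities
    sim.fill_diagonal_(0.0)          # ignore self-similarity
    return sim.relu().pow(alpha).mean(dim=-1)  # high = dense neighborhood
```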
ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models
Knowledge Base Question Answering (KBQA) aims to answer natural language questions over large-scale knowledge bases (KBs), which can be summarized into two crucial steps: knowledge retrieval and semantic parsing. However, three core challenges remain: inefficient knowledge retrieval, mistakes of retrieval adversely impacting semantic parsing, and the complexity of previous KBQA methods. To tackle these challenges, we introduce ChatKBQA, a novel and simple generate-then-retrieve KBQA framework, which proposes first generating the logical form with fine-tuned LLMs, then retrieving and replacing entities and relations with an unsupervised retrieval method, to improve both generation and retrieval more directly. Experimental results show that ChatKBQA achieves new state-of-the-art performance on standard KBQA datasets, WebQSP, and CWQ. This work can also be regarded as a new paradigm for combining LLMs with knowledge graphs (KGs) for interpretable and knowledge-required question answering. Our code is publicly available.
Updated: 2024-10-30 15:22:12
Domains: cs.CL,cs.AI
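A schematic of the generate-then-retrieve flow; the logical-form format, the quoting convention, and the retrieval interface below are placeholders of ours rather than ChatKBQA's actual representations:

```python
import re

def ground_logical_form(draft: str, retrieve) -> str:
    """Replace each quoted surface mention in an LLM-drafted logical form
    with its nearest knowledge-base entity or relation.

    draft: output of the fine-tuned LLM, e.g. "(JOIN (R 'written by') 'Dune')";
    retrieve: unsupervised retriever mapping a mention to a KB identifier."""
    for mention in set(re.findall(r"'([^']+)'", draft)):
        draft = draft.replace(f"'{mention}'", retrieve(mention))
    return draft
```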
Controllable Game Level Generation: Assessing the Effect of Negative Examples in GAN Models
Generative Adversarial Networks (GANs) are unsupervised models designed to learn and replicate a target distribution. The vanilla versions of these models can be extended to more controllable models. Conditional Generative Adversarial Networks (CGANs) extend vanilla GANs by conditioning both the generator and discriminator on some additional information (labels). Controllable models based on complementary learning, such as Rumi-GAN, have been introduced. Rumi-GANs leverage negative examples to enhance the generator's ability to learn positive examples. We evaluate the performance of two controllable GAN variants, CGAN and Rumi-GAN, in generating game levels targeting specific constraints of interest: playability and controllability. This evaluation is conducted under two scenarios: with and without the inclusion of negative examples. The goal is to determine whether incorporating negative examples helps the GAN models avoid generating undesirable outputs. Our findings highlight the strengths and weaknesses of each method in enforcing specific conditions on the generated outputs, given positive and negative examples.
Updated: 2024-10-30 15:18:26
Domains: cs.LG,cs.AI
Segment, Shuffle, and Stitch: A Simple Layer for Improving Time-Series Representations
Existing approaches for learning representations of time-series keep the temporal arrangement of the time-steps intact with the presumption that the original order is the most optimal for learning. However, non-adjacent sections of real-world time-series may have strong dependencies. Accordingly, we raise the question: Is there an alternative arrangement for time-series which could enable more effective representation learning? To address this, we propose a simple plug-and-play neural network layer called Segment, Shuffle, and Stitch (S3) designed to improve representation learning in time-series models. S3 works by creating non-overlapping segments from the original sequence and shuffling them in a learned manner that is optimal for the task at hand. It then re-attaches the shuffled segments back together and performs a learned weighted sum with the original input to capture both the newly shuffled sequence along with the original sequence. S3 is modular and can be stacked to achieve different levels of granularity, and can be added to many forms of neural architectures including CNNs or Transformers with negligible computation overhead. Through extensive experiments on several datasets and state-of-the-art baselines, we show that incorporating S3 results in significant improvements for the tasks of time-series classification, forecasting, and anomaly detection, improving performance on certain datasets by up to 68\%. We also show that S3 makes the learning more stable with a smoother training loss curve and loss landscape compared to the original baseline. The code is available at https://github.com/shivam-grover/S3-TimeSeries.
Updated: 2024-10-30 15:18:22
Domains: cs.LG,cs.AI
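A simplified PyTorch sketch of such a layer. Two caveats of our own: the segment order here comes from sorting learned scores, which only approximates the paper's learned shuffling (gradients do not flow through the sort), and the sequence length is assumed divisible by the number of segments.

```python
import torch
import torch.nn as nn

class S3(nn.Module):
    """Segment the sequence, reorder the segments, stitch, and blend."""
    def __init__(self, n_segments: int):
        super().__init__()
        self.scores = nn.Parameter(torch.randn(n_segments))  # one per segment
        self.w = nn.Parameter(torch.tensor(0.5))             # stitch weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape                       # (batch, time, channels)
        n = self.scores.numel()
        seg = x.reshape(b, n, t // n, c)        # non-overlapping segments
        shuffled = seg[:, torch.argsort(self.scores)].reshape(b, t, c)
        return self.w * shuffled + (1 - self.w) * x  # weighted sum with input
```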
Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction
Beyond traditional binary relational facts, n-ary relational knowledge graphs (NKGs) are comprised of n-ary relational facts containing more than two entities, which are closer to real-world facts with broader applications. However, the construction of NKGs remains at a coarse-grained level, which is always in a single schema, ignoring the order and variable arity of entities. To address these restrictions, we propose Text2NKG, a novel fine-grained n-ary relation extraction framework for n-ary relational knowledge graph construction. We introduce a span-tuple classification approach with hetero-ordered merging and output merging to accomplish fine-grained n-ary relation extraction in different arity. Furthermore, Text2NKG supports four typical NKG schemas: hyper-relational schema, event-based schema, role-based schema, and hypergraph-based schema, with high flexibility and practicality. The experimental results demonstrate that Text2NKG achieves state-of-the-art performance in F1 scores on the fine-grained n-ary relation extraction benchmark. Our code and datasets are publicly available.
Updated: 2024-10-30 15:18:14
Domains: cs.AI,cs.CL
Decoupling Semantic Similarity from Spatial Alignment for Neural Networks
What representation do deep neural networks learn? How similar are images to each other for neural networks? Despite the overwhelming success of deep learning methods, key questions about their internal workings still remain largely unanswered, due to their internal high dimensionality and complexity. To address this, one approach is to measure the similarity of activation responses to various inputs. Representational Similarity Matrices (RSMs) distill this similarity into scalar values for each input pair. These matrices encapsulate the entire similarity structure of a system, indicating which input leads to similar responses. While the similarity between images is ambiguous, we argue that the spatial location of semantic objects influences neither human perception nor deep learning classifiers. Thus, this should be reflected in the definition of similarity between image responses for computer vision systems. Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through semantic RSMs, which are invariant to spatial permutation. We measure semantic similarity between input responses by formulating it as a set-matching problem. Further, we quantify the superiority of semantic RSMs over spatio-semantic RSMs through image retrieval and by comparing the similarity between representations to the similarity between predicted class probabilities.
Updated: 2024-10-30 15:17:58
Domains: cs.CV,cs.AI,cs.LG
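The set-matching formulation admits a compact sketch: compare two images' token sets by the best one-to-one token assignment, which is invariant to where each token sits spatially (the cosine cost and the averaging are choices of ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def semantic_similarity(A: np.ndarray, B: np.ndarray) -> float:
    """Permutation-invariant similarity of token sets A, B: (n_tokens, dim)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = A @ B.T                            # pairwise cosine similarities
    row, col = linear_sum_assignment(-sim)   # maximal one-to-one matching
    return float(sim[row, col].mean())       # unchanged under spatial shuffles
```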
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox
Long-tailed data distributions pose challenges for a variety of domains like e-commerce, finance, biomedical science, and cyber security, where the performance of machine learning models is often dominated by head categories while tail categories are inadequately learned. This work aims to provide a systematic view of long-tailed learning with regard to three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks. We develop HeroLT, a comprehensive long-tailed learning benchmark integrating 18 state-of-the-art algorithms, 10 evaluation metrics, and 17 real-world datasets across 6 tasks and 4 data modalities. HeroLT with novel angles and extensive experiments (315 in total) enables effective and fair evaluation of newly proposed methods compared with existing baselines on varying dataset types. Finally, we conclude by highlighting the significant applications of long-tailed learning and identifying several promising future directions. For accessibility and reproducibility, we open-source our benchmark HeroLT and corresponding results at https://github.com/SSSKJ/HeroLT.
Updated: 2024-10-30 15:17:00
Domains: cs.LG
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency
Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, employing a two-dimensional discrete wavelet transform for spatial token mixing, achieving superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transpose convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher resolution tasks ($4\times$). Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks, achieving state-of-the-art results on the BSD100 dataset while consuming fewer resources, and exhibits higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.
Updated: 2024-10-30 15:16:43
Domains: eess.IV,cs.AI,cs.CV,cs.LG,I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.5.1; I.5.2; I.5.4; I.4.3; I.4.4; I.4.5
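The first change amounts to swapping one upsampling block for another; a minimal sketch with illustrative channel counts (not the paper's configuration):

```python
import torch.nn as nn

r = 2  # upscaling factor per stage

# Transpose-convolution upsampling (the kind of layer being replaced):
up_transpose = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)

# Pixel-shuffle upsampling: a conv expands channels by r^2, then PixelShuffle
# rearranges them into an r-times larger feature map with 64 channels.
up_pixelshuffle = nn.Sequential(
    nn.Conv2d(64, 64 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),
)
```

A multistage $4\times$ model could then chain two such $2\times$ stages, in line with the multistage design mentioned above (our reading of the design, not a confirmed detail).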
Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal
Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence. To overcome this, we introduce Length scale Balancing (LB) - a novel approach, aggregating multiple base surrogate models with varying length scales. LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation. We formally derive a cumulative regret bound of LB and compare it with the regret of an oracle BO algorithm using the optimal length scale. Denoting the factor by which the regret bound of A-GP-UCB was away from oracle as $g(T)$, we show that LB is only $\log g(T)$ away from oracle regret. We also empirically evaluate our algorithm on synthetic and real-world benchmarks and show it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.
Updated: 2024-10-30 15:15:37
Domains: stat.ML,cs.LG
Guided Game Level Repair via Explainable AI
Procedurally generated levels created by machine learning models can be unsolvable without further editing. Various methods have been developed to automatically repair these levels by enforcing hard constraints during the post-processing step. However, as levels increase in size, these constraint-based repairs become increasingly slow. This paper proposes using explainability methods to identify specific regions of a level that contribute to its unsolvability. By assigning higher weights to these regions, constraint-based solvers can prioritize these problematic areas, enabling more efficient repairs. Our results, tested across three games, demonstrate that this approach can help to repair procedurally generated levels faster.
Updated: 2024-10-30 15:12:36
Domains: cs.AI,cs.LG
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning
In-context learning can help Large Language Models (LLMs) adapt to new tasks without additional training. However, this performance heavily depends on the quality of the demonstrations, driving research into effective demonstration selection algorithms to optimize this process. These algorithms assist users in selecting the best $k$ input-label pairs (demonstration examples) based on a given test input, enabling LLMs to in-context learn the relationship between the provided examples and the test inputs. Despite all the proposed demonstration selection algorithms, their efficiency and effectiveness remain unclear. This lack of clarity makes it difficult to apply these algorithms in real-world scenarios and poses challenges for future research aimed at developing improved methods. This paper revisits six proposed algorithms, evaluating them on five datasets from both efficiency and effectiveness perspectives. Our experiments reveal significant variations in algorithm performance across different tasks, with some methods struggling to outperform random selection in certain scenarios. We also find that increasing the number of demonstrations does not always lead to better performance, and that there are often trade-offs between accuracy and computational efficiency. Our code is available at https://github.com/Tizzzzy/Demonstration_Selection_Overview.
Updated: 2024-10-30 15:11:58
Domains: cs.CL,cs.AI,cs.LG
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into `findings' for descriptive content and `impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment. The code related to this paper is available at https://github.com/cheliu-computation/IMITATE-TMI2024.
Updated: 2024-10-30 15:08:37
Domains: cs.CV,cs.LG
MassSpecGym: A benchmark for the discovery and identification of molecules
The discovery and identification of molecules in biological and environmental samples are crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym -- the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: \textit{de novo} molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at \url{https://github.com/pluskal-lab/MassSpecGym}.
Updated: 2024-10-30 15:08:05
Domains: q-bio.QM,cs.LG
No imputation without representation
By filling in missing values in datasets, imputation allows these datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost through imputation. The missing-indicator approach can be used in combination with imputation to instead represent this information as a part of the dataset. There are several theoretical considerations why missing-indicators may or may not be beneficial, but there has not been any large-scale practical experiment on real-life datasets to test this question for machine learning predictions. We perform this experiment for three imputation strategies and a range of different classification algorithms, on the basis of twenty real-life datasets. In a follow-up experiment, we determine attribute-specific missingness thresholds for each classifier above which missing-indicators are more likely than not to increase classification performance. And in a second follow-up experiment, we evaluate numerical imputation of one-hot encoded categorical attributes. We reach the following conclusions. Firstly, missing-indicators generally increase classification performance. Secondly, with missing-indicators, nearest neighbour and iterative imputation do not lead to better performance than simple mean/mode imputation. Thirdly, for decision trees, pruning is necessary to prevent overfitting. Fourthly, the thresholds above which missing-indicators are more likely than not to improve performance are lower for categorical attributes than for numerical attributes. Lastly, mean imputation of numerical attributes preserves some of the information from missing values. Consequently, when not using missing-indicators it can be advantageous to apply mean imputation to one-hot encoded categorical attributes instead of mode imputation.
Updated: 2024-10-30 15:03:44
Domains: cs.LG,stat.ML
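The missing-indicator approach from the experiments maps directly onto standard tooling; a minimal scikit-learn sketch:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 5.0]])

# Mean imputation plus missing-indicators: the appended binary columns keep
# the "this value was missing" signal that imputation alone would erase.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imputer.fit_transform(X)
# Columns: [imputed feature 1, imputed feature 2, ind(feature 1), ind(feature 2)]
```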
Statistical-Computational Trade-offs for Density Estimation
We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a ``query'' distribution $q$ over $[n]$, output $p_i$ that is ``close'' to $q$. Recently~\cite{aamand2023data} gave the first and only known result that achieves sublinear bounds in {\em both} the sampling complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors. Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved. In particular, if an algorithm uses $O(n/\log^c k)$ samples for some constant $c>0$ and polynomial space, then the query time of the data structure must be at least $k^{1-O(1)/\log \log k}$, i.e., close to linear in the number of distributions $k$. This is a novel \emph{statistical-computational} trade-off for density estimation, demonstrating that any data structure must use close to a linear number of samples or take close to linear query time. The lower bound holds even in the realizable case where $q=p_i$ for some $i$, and when the distributions are flat (specifically, all distributions are uniform over half of the domain $[n]$). We also give a simple data structure for our lower bound instance with asymptotically matching upper bounds. Experiments show that the data structure is quite efficient in practice.
Updated: 2024-10-30 15:03:33
领域: cs.DS,cs.LG,stat.ML
From Hype to Reality: The Road Ahead of Deploying DRL in 6G Networks
The industrial landscape is rapidly evolving with the advent of 6G applications, which demand massive connectivity, high computational capacity, and ultra-low latency. These requirements present new challenges, which can no longer be efficiently addressed by conventional strategies. In response, this article underscores the transformative potential of Deep Reinforcement Learning (DRL) for 6G, highlighting its advantages over classic machine learning solutions in meeting the demands of 6G. The necessity of DRL is further validated through three DRL applications in an end-to-end communication procedure, including wireless access control, baseband function placement, and network slicing coordination. However, DRL-based network management initiatives are far from mature. We extend the discussion to identify the challenges of applying DRL in practical networks and explore potential solutions along with their respective limitations. In the end, these insights are validated through a practical DRL deployment in managing network slices on the testbed.
Updated: 2024-10-30 15:02:54
标题: 从炒作到现实:在6G网络中部署深度强化学习的未来之路
摘要: 随着6G应用的出现,工业景观正在迅速演变,这些应用要求大规模连接性、高计算能力和超低延迟。这些要求提出了新的挑战,传统策略已经无法有效应对。为此,本文强调了深度强化学习(DRL)在6G领域的转变潜力,突显其在满足6G需求方面优于经典机器学习解决方案的优势。通过三个DRL应用案例,包括无线接入控制、基带功能放置和网络切片协调,进一步验证了DRL的必要性。然而,基于DRL的网络管理倡议尚未成熟。我们扩展讨论以识别在实际网络中应用DRL面临的挑战,并探索潜在解决方案及其各自的局限性。最后,通过在测试平台上管理网络切片的实际DRL部署来验证这些见解。
更新时间: 2024-10-30 15:02:54
领域: cs.NI,cs.AI,cs.DC,cs.SY,eess.SY
MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection
Diffusion models excel in generating images that closely resemble their training data but are also susceptible to data memorization, raising privacy, ethical, and legal concerns, particularly in sensitive domains such as medical imaging. We hypothesize that this memorization stems from the overparameterization of deep models and propose that regularizing model capacity during fine-tuning can mitigate this issue. Firstly, we empirically show that regulating the model capacity via Parameter-efficient fine-tuning (PEFT) mitigates memorization to some extent, however, it further requires the identification of the exact parameter subsets to be fine-tuned for high-quality generation. To identify these subsets, we introduce a bi-level optimization framework, MemControl, that automates parameter selection using memorization and generation quality metrics as rewards during fine-tuning. The parameter subsets discovered through MemControl achieve a superior tradeoff between generation quality and memorization. For the task of medical image generation, our approach outperforms existing state-of-the-art memorization mitigation strategies by fine-tuning as few as 0.019% of model parameters. Moreover, we demonstrate that the discovered parameter subsets are transferable to non-medical domains. Our framework is scalable to large datasets, agnostic to reward functions, and can be integrated with existing approaches for further memorization mitigation. To the best of our knowledge, this is the first study to empirically evaluate memorization in medical images and propose a targeted yet universal mitigation strategy. The code is available at https://github.com/Raman1121/Diffusion_Memorization_HPO
Updated: 2024-10-30 15:02:54
标题: MemControl: 通过自动参数选择减轻扩散模型中的记忆化
摘要: 扩散模型在生成与训练数据密切相似的图像方面表现出色,但也容易记忆训练数据,引发隐私、伦理和法律方面的关切,特别是在医学影像等敏感领域。我们假设这种记忆源于深度模型的过参数化,并提出在微调过程中对模型容量进行正则化可以缓解这一问题。首先,我们通过实证研究表明,借助参数高效微调(PEFT)调节模型容量可以在一定程度上缓解记忆问题,但这进一步需要确定为实现高质量生成而应微调的确切参数子集。为了识别这些子集,我们引入了一个双层优化框架MemControl,在微调过程中以记忆度量和生成质量度量作为奖励来自动选择参数。通过MemControl发现的参数子集在生成质量和记忆之间取得了更优的权衡。在医学图像生成任务中,我们的方法仅微调0.019%的模型参数,就超越了现有最先进的记忆缓解策略。此外,我们证明了所发现的参数子集可以迁移到非医学领域。我们的框架可扩展到大型数据集,与奖励函数无关,并可与现有方法集成以进一步缓解记忆问题。据我们所知,这是第一项对医学图像中的记忆现象进行实证评估并提出有针对性且通用的缓解策略的研究。代码可在https://github.com/Raman1121/Diffusion_Memorization_HPO获取。
更新时间: 2024-10-30 15:02:54
领域: cs.CV,cs.AI
Robustifying automatic speech recognition by extracting slowly varying features
In the past few years, it has been shown that deep learning systems are highly vulnerable under attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks consisting in removing fast-changing features from the audio signals, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in such a way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial attacks: Our final, proposed model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
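A minimal sketch of the low-pass preprocessing step, assuming SciPy is available; the 4 kHz cutoff and filter order are illustrative choices, not the paper's tuned values.

import numpy as np
from scipy.signal import butter, sosfilt

def lowpass(audio, sample_rate, cutoff_hz=4000, order=5):
    # Remove fast-changing components before the audio reaches the ASR system.
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate, output="sos")
    return sosfilt(sos, audio)

sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 7000 * t)
filtered = lowpass(audio, sr)  # the 7 kHz component is strongly attenuated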
Updated: 2024-10-30 15:02:41
标题: 通过提取缓慢变化的特征来增强自动语音识别的鲁棒性
摘要: 在过去几年中,已经表明深度学习系统在受到具有对抗性示例的攻击时非常脆弱。基于神经网络的自动语音识别(ASR)系统也不例外。有针对性和无针对性的攻击可以修改音频输入信号,使人类仍然能够识别相同的单词,而ASR系统则被引导预测出不同的转录。在本文中,我们提出了一种防御机制,针对有针对性的对抗性攻击,包括在将输入馈送给ASR系统之前从音频信号中移除快速变化的特征,通过应用慢特征分析、低通滤波器或两者结合的方式。我们对在这种方式下进行预处理的数据训练的混合ASR模型进行了实证分析。尽管结果模型在良性数据上表现良好,但它们对有针对性的对抗性攻击更具有显著的鲁棒性:我们的最终提出的模型在干净数据上表现类似于基线模型,但鲁棒性超过四倍。
更新时间: 2024-10-30 15:02:41
领域: eess.AS,cs.LG,cs.SD,stat.ML
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
Recent self-supervised clustering-based pre-training techniques like DINO and Cribo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.
Updated: 2024-10-30 15:00:06
标题: S3PT:场景语义和结构引导的聚类,以提升自主驾驶的自监督预训练
摘要: 最近的自监督聚类预训练技术,如DINO和Cribo,已经显示出对下游检测和分割任务的令人印象深刻的结果。然而,像自动驾驶这样的实际应用面临着类别和尺寸分布不平衡以及复杂场景几何结构的挑战。在本文中,我们提出了一种新颖的场景语义和结构引导的聚类方法S3PT,为自监督训练提供更多场景一致的目标。具体而言,我们的贡献有三个方面:首先,我们将语义分布一致的聚类方法纳入,以促进对摩托车或动物等稀有类别的更好表示。其次,我们引入了物体多样性一致的空间聚类,以处理不平衡和多样化的物体尺寸,从大背景区域到小物体(如行人和交通标志)。第三,我们提出了一种基于深度引导的空间聚类,根据场景的几何信息规范学习,从而进一步在特征级别上优化区域分离。我们学到的表示显著提高了在nuScenes、nuImages和Cityscapes数据集上下游语义分割和3D物体检测任务的性能,并显示出有希望的领域转换属性。
更新时间: 2024-10-30 15:00:06
领域: cs.CV,cs.AI,cs.RO
Developing a Self-Explanatory Transformer
While IoT devices provide significant benefits, their rapid growth results in larger data volumes, increased complexity, and higher security risks. To manage these issues, techniques like encryption, compression, and mapping are used to process data efficiently and securely. General-purpose and AI platforms handle these tasks well, but mapping in natural language processing is often slowed by training times. This work explores a self-explanatory, training-free mapping transformer based on non-deterministic finite automata, designed for Field-Programmable Gate Arrays (FPGAs). Besides highlighting the advantages of this proposed approach in providing real-time, cost-effective processing and dataset-loading, we also address the challenges and considerations for enhancing the design in future iterations.
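A training-free NFA can be simulated in a few lines; the transition table below is a hypothetical mapping rule, not the paper's FPGA design.

def nfa_accepts(transitions, start, accepting, string):
    # transitions: dict mapping (state, symbol) -> set of successor states
    current = {start}
    for symbol in string:
        nxt = set()
        for state in current:
            nxt |= transitions.get((state, symbol), set())
        current = nxt
    return bool(current & accepting)

# Hypothetical rule: accept strings over {a, b} that end in "ab"
T = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
print(nfa_accepts(T, 0, {2}, "aab"))  # True
print(nfa_accepts(T, 0, {2}, "abb"))  # False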
Updated: 2024-10-30 14:59:53
标题: 开发一个自解释的转换器
摘要: 尽管物联网设备带来了显著的好处,但它们的快速增长导致数据量更大、复杂性增加和安全风险加剧。为了应对这些问题,通常使用加密、压缩和映射等技术来高效且安全地处理数据。通用平台和人工智能平台能很好地处理这些任务,但自然语言处理中的映射通常会因训练时间而变慢。这项工作探讨了一种基于非确定性有限自动机的自解释、无需训练的映射转换器,专为现场可编程门阵列(FPGA)设计。除了强调这种提议方法在提供实时、具有成本效益的处理和数据集加载方面的优势之外,我们还讨论了在未来版本中增强该设计所面临的挑战和考虑。
更新时间: 2024-10-30 14:59:53
领域: cs.CR
Scientific and Technological Information Oriented Semantics-adversarial and Media-adversarial Cross-media Retrieval
Cross-media retrieval of scientific and technological information is one of the important tasks in the cross-media study. Cross-media scientific and technological information retrieval obtain target information from massive multi-source and heterogeneous scientific and technological resources, which helps to design applications that meet users' needs, including scientific and technological information recommendation, personalized scientific and technological information retrieval, etc. The core of cross-media retrieval is to learn a common subspace, so that data from different media can be directly compared with each other after being mapped into this subspace. In subspace learning, existing methods often focus on modeling the discrimination of intra-media data and the invariance of inter-media data after mapping; however, they ignore the semantic consistency of inter-media data before and after mapping and media discrimination of intra-semantics data, which limit the result of cross-media retrieval. In light of this, we propose a scientific and technological information oriented Semantics-adversarial and Media-adversarial Cross-media Retrieval method (SMCR) to find an effective common subspace. Specifically, SMCR minimizes the loss of inter-media semantic consistency in addition to modeling intra-media semantic discrimination, to preserve semantic similarity before and after mapping. Furthermore, SMCR constructs a basic feature mapping network and a refined feature mapping network to jointly minimize the media discriminative loss within semantics, so as to enhance the feature mapping network's ability to confuse the media discriminant network. Experimental results on two datasets demonstrate that the proposed SMCR outperforms state-of-the-art methods in cross-media retrieval.
Updated: 2024-10-30 14:56:09
标题: 科技信息导向的语义对抗和媒体对抗跨媒体检索
摘要: 科技信息的跨媒体检索是跨媒体研究中的重要任务之一。跨媒体科技信息检索从大规模多源和异构科技资源中获取目标信息,有助于设计满足用户需求的应用程序,包括科技信息推荐、个性化科技信息检索等。跨媒体检索的核心是学习一个共同的子空间,使不同媒体的数据在映射到这个子空间后可以直接进行比较。在子空间学习中,现有方法通常侧重于建模映射后的媒体内数据的区分性和跨媒体数据的不变性;然而,它们忽视了映射前后跨媒体数据的语义一致性以及媒体内语义数据的媒体区分性,这限制了跨媒体检索的结果。基于此,我们提出了一种面向科技信息的语义对抗和媒体对抗的跨媒体检索方法(SMCR),以找到一个有效的共同子空间。具体来说,SMCR不仅最小化跨媒体语义一致性的损失,还建模媒体内语义区分,以保持映射前后的语义相似性。此外,SMCR构建了一个基本特征映射网络和一个精细特征映射网络,共同最小化语义内的媒体区分损失,以增强特征映射网络对媒体区分网络的混淆能力。在两个数据集上的实验结果表明,所提出的SMCR在跨媒体检索中胜过了最先进的方法。
更新时间: 2024-10-30 14:56:09
领域: cs.IR,cs.AI,68T30,H.3.3
Differentially Private Representation Learning via Image Captioning
Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtaining unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving the previous SOTA of $56.5\%$.
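The DP training underneath relies on the standard DP-SGD recipe (per-example gradient clipping plus Gaussian noise); the NumPy sketch below shows that single step in isolation and is not the DP-Cap training code.

import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_mult=1.0,
                lr=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]  # bound each example's influence
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    noisy_grad = mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)
    return params - lr * noisy_grad

params = np.zeros(4)
grads = [np.random.default_rng(i).normal(size=4) for i in range(8)]
params = dp_sgd_step(params, grads)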
Updated: 2024-10-30 14:55:58
标题: 差分隐私表示学习通过图像字幕生成
摘要: 差分隐私(DP)机器学习被认为是从敏感数据中训练模型并同时保护隐私的黄金标准解决方案。然而,实现这一理想的一个主要障碍是其次优的隐私-准确性权衡,尤其在DP表示学习中更为明显。具体来说,已经证明在适度的隐私预算下,大多数模型学习的表示并不比手工制作的特征显著更好。在这项工作中,我们展示了通过图像字幕生成和扩展到互联网规模的多模态数据集,可以有效地进行DP表示学习。通过一系列工程技巧,我们成功地从头开始训练了一个DP图像字幕生成器(DP-Cap),使用合理数量的计算在LAION-2B的233M子集上,并获得了前所未有的高质量图像特征,可用于各种下游视觉和视觉-语言任务。例如,在LAION数据集的隐私预算为$\varepsilon=8$时,一个在学习的DP-Cap特征之上训练的线性分类器在ImageNet-1K上达到了$65.8\%$的准确率,大大提高了先前的最佳成绩$56.5\%$。
更新时间: 2024-10-30 14:55:58
领域: cs.CV,cs.LG
An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity
Compute-in-memory (CIM) accelerators for spiking neural networks (SNNs) are promising solutions to enable $\mu$s-level inference latency and ultra-low energy in edge vision applications. Yet, their current lack of flexibility at both the circuit and system levels prevents their deployment in a wide range of real-life scenarios. In this work, we propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials. These circuit-level techniques enable a hybrid weight- and output-stationary dataflow at the system level to maximize operand reuse, thereby minimizing costly on- and off-chip data movements during the SNN execution. Measurement results of a fabricated FlexSpIM prototype in 40-nm CMOS demonstrate a 2$\times$ increase in bit-normalized energy efficiency compared to prior fixed-precision digital CIM-SNNs, while providing resolution reconfiguration with bitwise granularity. Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.
Updated: 2024-10-30 14:55:13
标题: 一种具有灵活操作数分辨率和逐层权重/输出稳定性的基于事件的数字计算内存加速器
摘要: 计算内存(CIM)加速器用于脉冲神经网络(SNN)是在边缘视觉应用中实现微秒级推断延迟和超低能量的有前途的解决方案。然而,它们当前在电路和系统级别的灵活性不足,阻碍了它们在各种实际场景中的部署。在这项工作中,我们提出了一种支持任意操作数分辨率和形状的新型数字CIM宏,具有统一的用于权重和膜电位的CIM存储。这些电路级技术在系统级别实现了混合权重和输出静态数据流,以最大化操作数重用,从而在SNN执行过程中最大程度地减少昂贵的片上和片外数据移动。在40纳米CMOS上制造的FlexSpIM原型的测量结果显示,与先前的固定精度数字CIM-SNN相比,能效提高了2倍,同时提供了按位粒度的分辨率重配置。我们的方法可以在大规模系统中节省高达90%的能量,同时在IBM DVS手势数据集上达到95.8%的最新分类精度。
更新时间: 2024-10-30 14:55:13
领域: cs.AR,cs.AI,B.2.0; B.3.0; B.6.0; B.7.0; C.3
Utilizing Large Language Models in an iterative paradigm with Domain feedback for Zero-shot Molecule optimization
Molecule optimization is a critical task in drug discovery to optimize desired properties of a given molecule through chemical modification. Despite Large Language Models (LLMs) holding the potential to efficiently simulate this task by using natural language to direct the optimization, straightforwardly utilizing shows limited performance. In this work, we facilitate utilizing LLMs in an iterative paradigm by proposing a simple yet highly effective domain feedback provider, namely $\text{Re}^3$DF. In detail, $\text{Re}^3$DF harnesses an external toolkit, RDKit, to handle the molecule hallucination, if the modified molecule is chemically invalid. Otherwise, its desired properties are computed and compared to the original one, establishing reliable domain feedback with correct direction and distance towards the objective, followed by a retrieved example, to explicitly guide the LLM to refine the modified molecule. We conduct experiments across both single- and multi-property objectives with 2 thresholds, where $\text{Re}^3$DF shows significant improvements. Particularly, for 20 single-property objectives, $\text{Re}^3$DF enhances Hit ratio by 16.95% and 20.76% under loose and strict thresholds, respectively. For 32 multi-property objectives, $\text{Re}^3$DF enhances Hit ratio by 6.04% and 5.25%.
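A sketch of the feedback step as described, assuming RDKit is installed; the target property (logP) and the SMILES strings are illustrative.

from rdkit import Chem
from rdkit.Chem import Descriptors

def domain_feedback(modified_smiles, target_logp):
    mol = Chem.MolFromSmiles(modified_smiles)
    if mol is None:
        # Chemically invalid: the "molecule hallucination" case
        return "invalid molecule: regenerate a chemically valid modification"
    logp = Descriptors.MolLogP(mol)
    direction = "increase" if logp < target_logp else "decrease"
    return f"valid; logP={logp:.2f}; {direction} logP by {abs(target_logp - logp):.2f}"

print(domain_feedback("CCOC", target_logp=1.5))   # feedback with direction and distance
print(domain_feedback("C(1CC", target_logp=1.5))  # malformed SMILES hits the invalid branch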
Updated: 2024-10-30 14:54:25
标题: 使用大型语言模型在具有领域反馈的迭代范式中进行零样本分子优化
摘要: 分子优化是药物发现中的关键任务,旨在通过化学修饰来优化给定分子的所需性质。尽管大型语言模型(LLMs)有潜力通过使用自然语言指导优化来高效模拟这一任务,但直接使用LLMs的表现有限。在这项工作中,我们通过提出一个简单但高效的领域反馈提供者$\text{Re}^3$DF,促进LLMs在迭代范式中的应用。具体来说,当修改后的分子在化学上无效时,$\text{Re}^3$DF会利用外部工具包RDKit来处理这种分子幻觉;否则,计算修改后分子的目标性质并与原分子比较,建立方向和距离均指向目标的可靠领域反馈,并辅以一个检索到的示例,明确指导LLM改进修改后的分子。我们在单属性和多属性目标上以两种阈值进行实验,$\text{Re}^3$DF均显示出显著改进。特别地,对于20个单属性目标,$\text{Re}^3$DF在宽松和严格阈值下分别将命中率提高了16.95%和20.76%。对于32个多属性目标,$\text{Re}^3$DF将命中率分别提高了6.04%和5.25%。
更新时间: 2024-10-30 14:54:25
领域: cs.LG,cs.AI,cs.CV
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Large language models (LLMs) are essential in natural language processing but often struggle with inference speed and computational efficiency, limiting real-time deployment. The key-value (KV) cache mechanism reduces computational overhead in transformer models, but challenges in maintaining contextual understanding remain. In this paper, we propose BUZZ, a novel KV caching algorithm that leverages structured contextual information to minimize cache memory usage while enhancing inference speed. BUZZ employs a beehive-structured sparse cache, incorporating a sliding window to capture recent information and dynamically segmenting historical tokens into chunks to prioritize important tokens in local neighborhoods. We evaluate BUZZ on four real-world datasets: CNN/Daily Mail, XSUM, Wikitext, and 10-QA. Our results demonstrate that BUZZ (1) reduces cache memory usage by $\textbf{2.5}\times$ in LLM inference while maintaining over 99% accuracy in long-text summarization, and (2) surpasses state-of-the-art performance in multi-document question answering by $\textbf{7.69%}$ under the same memory limit, where full cache methods encounter out-of-memory issues. Additionally, BUZZ achieves significant inference speedup with a $\log{n}$ time complexity. The code is available at https://github.com/JunqiZhao888/buzz-llm.
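A schematic of the cache-selection policy in NumPy: keep a sliding window of recent tokens and, within fixed-size chunks of the older history, keep each chunk's locally heaviest token. The window and chunk sizes are illustrative, and a real system would score tokens with accumulated attention.

import numpy as np

def buzz_keep_indices(token_scores, window=8, chunk=4):
    # token_scores: accumulated attention mass per cached token (length T)
    T = len(token_scores)
    keep = set(range(max(0, T - window), T))           # sliding window of recent tokens
    for start in range(0, max(0, T - window), chunk):  # segment the older history
        end = min(start + chunk, T - window)
        keep.add(start + int(np.argmax(token_scores[start:end])))  # local heavy hitter
    return sorted(keep)

scores = np.random.default_rng(1).random(32)
print(buzz_keep_indices(scores))  # indices of KV entries retained in the sparse cache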
Updated: 2024-10-30 14:53:37
标题: BUZZ:具有分段高频项(Heavy Hitters)的蜂巢结构稀疏KV缓存,用于高效的LLM推理
摘要: 大型语言模型(LLMs)在自然语言处理中至关重要,但往往在推理速度和计算效率方面遇到困难,限制了实时部署。键-值(KV)缓存机制可以减少变压器模型中的计算开销,但在维护上下文理解方面仍存在挑战。在本文中,我们提出了一种新颖的KV缓存算法BUZZ,利用结构化的上下文信息来最小化缓存内存使用量,同时增强推理速度。BUZZ采用了一个蜂巢结构的稀疏缓存,包括一个滑动窗口来捕获最近的信息,并动态地将历史标记分成块,以优先处理本地邻域中的重要标记。我们在四个真实数据集上评估了BUZZ:CNN/Daily Mail、XSUM、Wikitext和10-QA。我们的结果表明,BUZZ(1)在LLM推理中将缓存内存使用量降低了$\textbf{2.5}\times$,同时在长文本摘要中保持了超过99%的准确性,(2)在相同的内存限制下,在多文档问答方面超过了最先进的性能$\textbf{7.69%}$,而全缓存方法遇到了内存不足的问题。此外,BUZZ实现了显著的推理加速,时间复杂度为$\log{n}$。代码可在https://github.com/JunqiZhao888/buzz-llm上找到。
更新时间: 2024-10-30 14:53:37
领域: cs.CL,cs.AI
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference (Million-tokens Inference), a sparse calculation method designed to accelerate pre-filling of long-sequence processing. Specifically, we identify three unique patterns in long-context attention matrices (the A-shape, Vertical-Slash, and Block-Sparse) that can be leveraged for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices based on the assigned pattern during inference. With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs. Our proposed technique can be directly applied to existing LLMs without any modifications to the pre-training setup or additional fine-tuning. By evaluating on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, we demonstrate that MInference effectively reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy. Our code is available at https://aka.ms/MInference.
Updated: 2024-10-30 14:53:22
标题: MInference 1.0:通过动态稀疏注意力加速长文本LLM的预填充
摘要: 大型语言模型(LLM)推断的计算挑战仍然是它们广泛部署的重要障碍,特别是随着提示长度的持续增加。由于注意力计算的二次复杂度,一个8B的LLM在单个A100 GPU上处理1M标记的提示(即预填充阶段)需要30分钟。现有的加速预填充的方法在应用于长上下文LLM时通常无法保持可接受的准确性或效率。为了解决这一问题,我们引入了MInference(百万标记推断),这是一种稀疏计算方法,旨在加速长序列处理的预填充。具体而言,我们在长上下文注意力矩阵中识别出三种独特的模式(A形状、垂直斜线和块稀疏),可以利用这些模式在GPU上进行高效稀疏计算。我们离线确定每个注意力头的最佳模式,并在推断过程中根据分配的模式动态构建稀疏索引。有了模式和稀疏索引,我们通过我们优化的GPU内核执行高效的稀疏注意力计算,显著降低长上下文LLM预填充阶段的延迟。我们提出的技术可以直接应用于现有的LLM,无需修改预训练设置或额外微调。通过在一系列下游任务上进行评估,包括InfiniteBench、RULER、PG-19和Needle In A Haystack,以及LLaMA-3-1M、GLM4-1M、Yi-200K、Phi-3-128K和Qwen2-128K等模型,我们证明了MInference可以有效地将A100上的预填充推断延迟降低多达10倍,同时保持准确性。我们的代码可在https://aka.ms/MInference上找到。
更新时间: 2024-10-30 14:53:22
领域: cs.CL,cs.LG
Benchmarking Agentic Workflow Generation
Large Language Models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant advancements in tackling reasoning and planning tasks, wherein decomposing complex problems into executable workflows is a crucial step in this process. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. To this end, we introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. Additionally, we present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms to accurately quantify the LLM agent's workflow generation capabilities. Through comprehensive evaluations across different types of LLMs, we discover distinct gaps between the sequence planning capabilities and graph planning capabilities of LLM agents, with even GPT-4 exhibiting a gap of around 15%. We also train two open-source models and evaluate their generalization abilities on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference. Code and dataset are available at https://github.com/zjunlp/WorFBench.
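For linear workflows, the subsequence-matching part of such an evaluation can be sketched as a normalized longest-common-subsequence score (the subgraph-matching side is omitted); the node names below are hypothetical.

def subsequence_score(predicted, reference):
    # Longest common subsequence via dynamic programming,
    # normalized by the length of the reference workflow.
    m, n = len(predicted), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if predicted[i] == reference[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / n if n else 0.0

reference = ["search", "filter", "summarize", "report"]
predicted = ["search", "summarize", "draft", "report"]
print(subsequence_score(predicted, reference))  # 0.75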
Updated: 2024-10-30 14:49:49
标题: 基准测试代理工作流生成
摘要: 大型语言模型(LLM)以其出色的处理各种任务的能力,推动了在处理推理和规划任务方面的重大进展,其中将复杂问题分解为可执行工作流是这一过程中的关键步骤。现有的工作流评估框架要么仅关注整体性能,要么存在诸如受限场景覆盖、简单工作流结构和宽松评估标准等限制。为此,我们引入了WorFBench,一个统一的工作流生成基准,具有多方面的场景和复杂的图形工作流结构。此外,我们提出了WorFEval,一个系统化的评估协议,利用子序列和子图匹配算法来准确量化LLM代理的工作流生成能力。通过对不同类型的LLM进行全面评估,我们发现LLM代理的序列规划能力和图形规划能力之间存在明显差距,即使是GPT-4也存在约15%的差距。我们还训练了两个开源模型,并评估它们在保留任务上的泛化能力。此外,我们观察到生成的工作流可以增强下游任务,使它们在推断期间能够以更少的时间实现更优越的性能。代码和数据集可在https://github.com/zjunlp/WorFBench获取。
更新时间: 2024-10-30 14:49:49
领域: cs.CL,cs.AI,cs.HC,cs.LG,cs.MA
CNN Explainability with Multivector Tucker Saliency Maps for Self-Supervised Models
Interpreting the decisions of Convolutional Neural Networks (CNNs) is essential for understanding their behavior, yet explainability remains a significant challenge, particularly for self-supervised models. Most existing methods for generating saliency maps rely on ground truth labels, restricting their use to supervised tasks. EigenCAM is the only notable label-independent alternative, leveraging Singular Value Decomposition to generate saliency maps applicable across CNN models, but it does not fully exploit the tensorial structure of feature maps. In this work, we introduce the Tucker Saliency Map (TSM) method, which applies Tucker tensor decomposition to better capture the inherent structure of feature maps, producing more accurate singular vectors and values. These are used to generate high-fidelity saliency maps, effectively highlighting objects of interest in the input. We further extend EigenCAM and TSM into multivector variants -Multivec-EigenCAM and Multivector Tucker Saliency Maps (MTSM)- which utilize all singular vectors and values, further improving saliency map quality. Quantitative evaluations on supervised classification models demonstrate that TSM, Multivec-EigenCAM, and MTSM achieve competitive performance with label-dependent methods. Moreover, TSM enhances explainability by approximately 50% over EigenCAM for both supervised and self-supervised models. Multivec-EigenCAM and MTSM further advance state-of-the-art explainability performance on self-supervised models, with MTSM achieving the best results.
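The label-free, SVD-based starting point can be sketched in NumPy: project the flattened feature maps onto their leading right singular vector to get a spatial saliency map. The Tucker and multivector variants replace this single-vector SVD with a tensor decomposition (e.g., tensorly's tucker) and aggregate over all components; the sketch below covers only the EigenCAM-style baseline.

import numpy as np

def svd_saliency(feature_maps):
    # feature_maps: (C, H, W) activations from a convolutional layer
    C, H, W = feature_maps.shape
    A = feature_maps.reshape(C, H * W)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    saliency = np.abs(Vt[0]).reshape(H, W)  # leading spatial singular vector
    return saliency / (saliency.max() + 1e-12)

fm = np.random.default_rng(0).random((64, 14, 14))
heatmap = svd_saliency(fm)  # upsample to the input size for visualization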
Updated: 2024-10-30 14:46:34
标题: 基于多向量Tucker显著性图的自监督模型CNN可解释性
摘要: 解释卷积神经网络(CNN)的决策对于理解其行为至关重要,然而可解释性仍然是一个重大挑战,特别是对于自监督模型。大多数现有的生成显著性图的方法依赖于真实标签,限制了它们只能用于监督任务。EigenCAM是唯一值得注意的不依赖标签的替代方案,它利用奇异值分解来生成适用于各种CNN模型的显著性图,但它并没有充分利用特征图的张量结构。在这项工作中,我们介绍了Tucker显著性图(TSM)方法,该方法应用Tucker张量分解来更好地捕捉特征图的固有结构,产生更准确的奇异向量和奇异值。这些被用来生成高保真度的显著性图,有效地突出输入中感兴趣的对象。我们进一步将EigenCAM和TSM扩展为多向量变体,即Multivec-EigenCAM和多向量Tucker显著性图(MTSM),利用所有奇异向量和奇异值,进一步提高显著性图的质量。在监督分类模型上的定量评估表明,TSM、Multivec-EigenCAM和MTSM与依赖标签的方法性能相当。此外,对于监督和自监督模型,TSM相比EigenCAM将可解释性提高了约50%。Multivec-EigenCAM和MTSM进一步推进了自监督模型上最先进的可解释性表现,其中MTSM取得了最佳结果。
更新时间: 2024-10-30 14:46:34
领域: cs.CV,cs.AI,cs.LG
LLMs Integration in Software Engineering Team Projects: Roles, Impact, and a Pedagogical Design Space for AI Tools in Computing Education
This work takes a pedagogical lens to explore the implications of generative AI (GenAI) models and tools, such as ChatGPT and GitHub Copilot, in a semester-long 2nd-year undergraduate Software Engineering Team Project. Qualitative findings from survey (39 students) and interviews (eight students) provide insights into the students' views on the impact of GenAI use on their coding experience, learning, and self-efficacy. Our results address a particular gap in understanding the role and implications of GenAI on teamwork, team-efficacy, and team dynamics. The analysis of the learning aspects is distinguished by the application of learning and pedagogy informed lenses to discuss the data. We propose a preliminary design space for GenAI-based programming learning tools highlighting the importance of considering the roles that GenAI can play during the learning process, the varying support-ability patterns that can be applied to each role, and the importance of supporting transparency in GenAI for team members and students in addition to educators.
Updated: 2024-10-30 14:43:33
标题: LLMs在软件工程团队项目中的整合:角色、影响以及计算机教育中AI工具的教学设计空间
摘要: 这项工作从教育的角度探讨了生成式AI(GenAI)模型和工具(如ChatGPT和GitHub Copilot)在一个学期长的大二软件工程团队项目中的影响。来自调查(39名学生)和访谈(8名学生)的定性发现,提供了关于学生如何看待GenAI使用对其编码体验、学习和自我效能影响的见解。我们的结果填补了在理解GenAI对团队合作、团队效能和团队动态的作用与影响方面的特定空白。对学习方面的分析的独特之处在于,运用了学习与教学法视角来讨论数据。我们提出了一个基于GenAI的编程学习工具的初步设计空间,强调了考虑GenAI在学习过程中可以扮演的角色、可应用于每种角色的不同支持模式,以及除教育者之外,为团队成员和学生提供GenAI透明性的重要性。
更新时间: 2024-10-30 14:43:33
领域: cs.SE,cs.AI
Don't Just Pay Attention, PLANT It: Transfer L2R Models to Fine-tune Attention in Extreme Multi-Label Text Classification
State-of-the-art Extreme Multi-Label Text Classification (XMTC) models rely heavily on multi-label attention layers to focus on key tokens in input text, but obtaining optimal attention weights is challenging and resource-intensive. To address this, we introduce PLANT -- Pretrained and Leveraged AtteNTion -- a novel transfer learning strategy for fine-tuning XMTC decoders. PLANT surpasses existing state-of-the-art methods across all metrics on mimicfull, mimicfifty, mimicfour, eurlex, and wikiten datasets. It particularly excels in few-shot scenarios, outperforming previous models specifically designed for few-shot scenarios by over 50 percentage points in F1 scores on mimicrare and by over 36 percentage points on mimicfew, demonstrating its superior capability in handling rare codes. PLANT also shows remarkable data efficiency in few-shot scenarios, achieving precision comparable to traditional models with significantly less data. These results are achieved through key technical innovations: leveraging a pretrained Learning-to-Rank model as the planted attention layer, integrating mutual-information gain to enhance attention, introducing an inattention mechanism, and implementing a stateful-decoder to maintain context. Comprehensive ablation studies validate the importance of these contributions in realizing the performance gains.
Updated: 2024-10-30 14:41:23
标题: 不仅仅要关注,要种植它:将L2R模型转移到极端多标签文本分类中微调注意力
摘要: 最先进的极端多标签文本分类(XMTC)模型主要依赖于多标签注意力层,以便聚焦输入文本中的关键标记,但获得最佳的注意力权重是具有挑战性和资源密集的。为解决这一问题,我们引入了PLANT —— 预训练和利用关注(Pretrained and Leveraged AtteNTion)—— 一种新颖的迁移学习策略,用于微调XMTC解码器。PLANT在mimicfull、mimicfifty、mimicfour、eurlex和wikiten数据集上超越了现有的最先进方法。它在少样本情景中表现出色,其F1分数在mimicrare上超过了之前专门为少样本情景设计的模型50个百分点,在mimicfew上超过了36个百分点,展示了其在处理稀有代码方面的卓越能力。PLANT在少样本情景中也表现出卓越的数据效率,以显著较少的数据实现与传统模型相当的精确度。这些成果是通过关键技术创新实现的:利用预训练的学习排序模型作为种植的注意力层,整合互信息增益以增强注意力,引入了一个无关注机制,并实现了一个有状态的解码器以保持上下文。全面的消融研究验证了这些贡献在实现性能提升方面的重要性。
更新时间: 2024-10-30 14:41:23
领域: cs.CL,cs.LG
Full Event Particle-Level Unfolding with Variable-Length Latent Variational Diffusion
The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider.
Updated: 2024-10-30 14:39:15
标题: 使用可变长度潜变分扩散进行完整事件粒子级展开
摘要: 粒子物理实验所进行的测量必须考虑用于观察相互作用的探测器的不完全响应。一种方法,即展开(unfolding),对实验数据进行统计调整以消除探测器效应。最近,生成式机器学习模型在高维度下执行无分箱展开方面显示出潜力。然而,当前所有的生成方法都局限于展开一组固定的可观测量,使它们无法在对撞机数据的可变维度环境中执行完整事件展开。本文提出了对用于生成式展开的变分潜在扩散模型(VLD)方法的一种新颖修改,允许对高维和可变维度的特征空间进行展开。该方法的性能在大型强子对撞机上半轻子顶夸克对产生的背景下进行了评估。
更新时间: 2024-10-30 14:39:15
领域: hep-ex,cs.AI,cs.LG,hep-ph
Certified Minimax Unlearning with Generalization Rates and Deletion Capacity
We study the problem of $(\epsilon,\delta)$-certified machine unlearning for minimax models. Most of the existing works focus on unlearning from standard statistical learning models that have a single variable and their unlearning steps hinge on the direct Hessian-based conventional Newton update. We develop a new $(\epsilon,\delta)$-certified machine unlearning algorithm for minimax models. It proposes a minimax unlearning step consisting of a total-Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noises by carefully analyzing the "sensitivity" of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables). We derive the generalization rates in terms of population strong and weak primal-dual risk for three different cases of loss functions, i.e., (strongly-)convex-(strongly-)concave losses. We also provide the deletion capacity to guarantee that a desired population risk can be maintained as long as the number of deleted samples does not exceed the derived amount. With training samples $n$ and model dimension $d$, it yields the order $\mathcal O(n/d^{1/4})$, which shows a strict gap over the baseline method of differentially private minimax learning that has $\mathcal O(n/d^{1/2})$. In addition, our rates of generalization and deletion capacity match the state-of-the-art rates derived previously for standard statistical learning models.
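Schematically, and with hypothetical notation (see the paper for the exact total-Hessian construction), a certified unlearning step of this kind has the form

$$
\tilde{\theta} = \theta^{*} + \big(H^{\mathrm{total}}_{\theta^{*}}\big)^{-1} \sum_{z \in U} \nabla \ell(\theta^{*}; z),
\qquad
\hat{\theta} = \tilde{\theta} + \xi, \quad \xi \sim \mathcal{N}(0, \sigma^{2} I),
$$

where $U$ is the set of deleted samples and $\sigma$ is calibrated to the sensitivity of the minimax unlearning step, so that $(\epsilon,\delta)$-certification follows from the Gaussian mechanism.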
Updated: 2024-10-30 14:37:32
标题: 具有泛化速率和删除容量的认证极小极大遗忘
摘要: 我们研究针对极小极大模型的$(\epsilon,\delta)$-认证机器遗忘问题。大多数现有研究集中于从只有单个变量的标准统计学习模型中进行遗忘,其遗忘步骤依赖于基于Hessian的传统牛顿更新。我们为极小极大模型开发了一种新的$(\epsilon,\delta)$-认证机器遗忘算法。它提出了一个极小极大遗忘步骤,由基于总Hessian的完整牛顿更新和借鉴自差分隐私的高斯机制组成。为了获得遗忘认证,我们的方法通过仔细分析极小极大遗忘步骤的“敏感性”(即极小极大遗忘变量与从头重新训练得到的变量之间的接近程度),注入经过校准的高斯噪声。我们针对三种不同的损失函数情形,即(强)凸-(强)凹损失,推导了以总体强、弱原始-对偶风险表示的泛化速率。我们还给出了删除容量,以保证只要删除的样本数不超过推导出的数量,就可以维持所需的总体风险。对于训练样本数$n$和模型维度$d$,它给出$\mathcal O(n/d^{1/4})$的阶,相比具有$\mathcal O(n/d^{1/2})$的差分隐私极小极大学习基线方法显示出严格的差距。此外,我们的泛化速率和删除容量与先前为标准统计学习模型推导的最新速率相匹配。
更新时间: 2024-10-30 14:37:32
领域: cs.LG
Diffusion for World Modeling: Visual Details Matter in Atari
World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model. We further demonstrate that DIAMOND's diffusion world model can stand alone as an interactive neural game engine by training on static Counter-Strike: Global Offensive gameplay. To foster future research on diffusion for world modeling, we release our code, agents, videos and playable world models at https://diamond-wm.github.io.
Updated: 2024-10-30 14:34:49
标题: 世界建模中的扩散:Atari中的视觉细节至关重要
摘要: 世界模型构成了一种有前途的方法,可以以安全和高效的方式训练强化学习代理。最近的世界模型主要是基于序列的离散潜变量来建模环境动态。然而,这种压缩成紧凑的离散表示可能会忽略对强化学习重要的视觉细节。与此同时,扩散模型已经成为图像生成的主要方法,挑战着传统的建模离散潜变量的方法。受到这种范式转变的启发,我们引入了DIAMOND(DIffusion As a Model Of eNvironment Dreams),这是一个在扩散世界模型中训练的强化学习代理。我们分析了使扩散适合世界建模所需的关键设计选择,并展示了如何改善视觉细节可以提高代理性能。DIAMOND在竞争性Atari 100k基准测试中取得了平均人类标准化得分为1.46;这是完全在世界模型中训练的代理的新记录。我们进一步证明了DIAMOND的扩散世界模型可以独立作为一个交互式神经游戏引擎,通过在静态《反恐精英:全球攻势》游戏玩法上进行训练。为了促进未来关于扩散用于世界建模的研究,我们在https://diamond-wm.github.io上发布了我们的代码、代理、视频和可玩世界模型。
更新时间: 2024-10-30 14:34:49
领域: cs.LG,cs.AI,cs.CV
Predicting Molecular Ground-State Conformation via Conformation Optimization
Predicting ground-state conformation from the corresponding molecular graph is crucial for many chemical applications, such as molecular modeling, molecular docking, and molecular property prediction. Recently, many learning-based methods have been proposed to replace time-consuming simulations for this task. However, these methods are often inefficient and sub-optimal as they merely rely on molecular graph information to make predictions from scratch. In this work, considering that molecular low-quality conformations are readily available, we propose a novel framework called ConfOpt to predict molecular ground-state conformation from the perspective of conformation optimization. Specifically, ConfOpt takes the molecular graph and corresponding low-quality 3D conformation as inputs, and then derives the ground-state conformation by iteratively optimizing the low-quality conformation under the guidance of the molecular graph. During training, ConfOpt concurrently optimizes the predicted atomic 3D coordinates and the corresponding interatomic distances, resulting in a strong predictive model. Extensive experiments demonstrate that ConfOpt significantly outperforms existing methods, thus providing a new paradigm for efficiently and accurately predicting molecular ground-state conformation.
Updated: 2024-10-30 14:33:37
标题: 通过构象优化预测分子基态构象
摘要: 从相应的分子图预测基态构象对于许多化学应用非常关键,如分子建模、分子对接和分子性质预测。最近,许多基于学习的方法已被提出,用于替代耗时的模拟来完成这项任务。然而,这些方法通常效率低下且次优,因为它们仅仅依赖分子图信息从头开始进行预测。在这项工作中,考虑到分子的低质量构象易于获得,我们提出了一个称为ConfOpt的新框架,从构象优化的角度预测分子基态构象。具体来说,ConfOpt以分子图和相应的低质量3D构象作为输入,然后在分子图的指导下通过迭代优化低质量构象来推导基态构象。在训练过程中,ConfOpt同时优化预测的原子3D坐标和相应的原子间距离,从而产生一个强大的预测模型。大量实验表明,ConfOpt明显优于现有方法,为高效且准确地预测分子基态构象提供了一种新范式。
更新时间: 2024-10-30 14:33:37
领域: q-bio.BM,cs.AI,cs.LG,physics.chem-ph
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
Recent advancements in deep generative models present new opportunities for music production but also pose challenges, such as high computational demands and limited audio quality. Moreover, current systems frequently rely solely on text input and typically focus on producing complete musical pieces, which is incompatible with existing workflows in music production. To address these issues, we introduce "Diff-A-Riff," a Latent Diffusion Model designed to generate high-quality instrumental accompaniments adaptable to any musical context. This model offers control through either audio references, text prompts, or both, and produces 48kHz pseudo-stereo audio while significantly reducing inference time and memory usage. We demonstrate the model's capabilities through objective metrics and subjective listening tests, with extensive examples available on the accompanying website: sonycslparis.github.io/diffariff-companion/
Updated: 2024-10-30 14:33:27
标题: Diff-A-Riff:通过潜在扩散模型进行音乐伴奏共创
摘要: 最近深度生成模型的进展为音乐制作带来了新的机遇,但也带来了挑战,如高计算需求和有限的音频质量。此外,当前系统通常仅依赖文本输入,并通常专注于生成完整的音乐作品,这与音乐制作中现有的工作流程不兼容。为了解决这些问题,我们引入了“Diff-A-Riff”,一种设计用于生成适应任何音乐环境的高质量乐器伴奏的潜在扩散模型。该模型通过音频参考、文本提示或两者都提供控制,并在显著减少推断时间和内存使用的同时生成48kHz伪立体声音频。我们通过客观指标和主观听力测试展示了模型的能力,并在相关网站上提供了大量示例:sonycslparis.github.io/diffariff-companion/
更新时间: 2024-10-30 14:33:27
领域: cs.SD,cs.AI,eess.AS
SECURE: Benchmarking Large Language Models for Cybersecurity
Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also caused lower confidence due to problems like hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce the SECURE (Security Extraction, Understanding \& Reasoning Evaluation), a benchmark designed to assess LLMs performance in realistic cybersecurity scenarios. SECURE includes six datasets focussed on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, providing insights into their strengths and weaknesses in cybersecurity contexts, and offer recommendations for improving LLMs reliability as cyber advisory tools.
Updated: 2024-10-30 14:29:37
标题: SECURE:用于网络安全的大型语言模型基准测试
摘要: 大型语言模型(LLMs)在网络安全应用中展示了潜力,但也因出现幻觉和缺乏真实性等问题而降低了信心。现有的基准测试提供了一般性评估,但并未充分解决LLMs在网络安全特定任务中性能的实际和应用方面。为填补这一空白,我们引入了SECURE(安全提取、理解和推理评估),这是一个旨在评估LLMs在现实网络安全场景中表现的基准测试。SECURE包括六个数据集,专注于工业控制系统领域,以评估基于行业标准来源的知识提取、理解和推理。我们的研究评估了七种最先进的模型在这些任务上的表现,提供了有关它们在网络安全环境中优势和弱点的见解,并提出了改进LLMs作为网络安全咨询工具可靠性的建议。
更新时间: 2024-10-30 14:29:37
领域: cs.CR,cs.AI,cs.HC
Improved Particle Approximation Error for Mean Field Neural Networks
Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. MFLD has gained attention due to its connection with noisy gradient descent for mean-field two-layer neural networks. Unlike standard Langevin dynamics, the nonlinearity of the objective functional induces particle interactions, necessitating multiple particles to approximate the dynamics in a finite-particle setting. Recent works (Chen et al., 2022; Suzuki et al., 2023b) have demonstrated the uniform-in-time propagation of chaos for MFLD, showing that the gap between the particle system and its mean-field limit uniformly shrinks over time as the number of particles increases. In this work, we improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors, which can exponentially deteriorate with the regularization coefficient. Specifically, we establish an LSI-constant-free particle approximation error concerning the objective gap by leveraging the problem structure in risk minimization. As the application, we demonstrate improved convergence of MFLD, sampling guarantee for the mean-field stationary distribution, and uniform-in-time Wasserstein propagation of chaos in terms of particle complexity.
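A minimal finite-particle MFLD simulation, with a toy quadratic confinement-plus-interaction energy standing in for the paper's general convex functional.

import numpy as np

def mfld_step(X, lam=0.1, dt=0.01, rng=None):
    # X: (N, d) particles approximating the mean-field law
    rng = rng or np.random.default_rng(0)
    # Toy first variation: confinement toward 0 plus attraction to the mean
    drift = -(X + (X - X.mean(axis=0, keepdims=True)))
    noise = np.sqrt(2.0 * lam * dt) * rng.standard_normal(X.shape)
    return X + dt * drift + noise

rng = np.random.default_rng(1)
X = rng.standard_normal((512, 2))
for _ in range(1000):
    X = mfld_step(X, rng=rng)
print(X.mean(axis=0), X.var(axis=0))  # particles settle near the entropic stationary law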
Updated: 2024-10-30 14:24:34
标题: 平均场神经网络的改进粒子逼近误差
摘要: Mean-field Langevin dynamics (MFLD)是通过最小化在概率分布空间上定义的熵正则化非线性凸泛函来实现的。MFLD因其与均场两层神经网络的噪声梯度下降的关联而受到关注。与标准的 Langevin 动力学不同,目标泛函的非线性性引起了粒子之间的相互作用,需要多个粒子来在有限粒子设置中逼近动力学。最近的研究表明,MFLD的时间均匀传播混沌,表明随着粒子数量的增加,粒子系统与其均场极限之间的差距在时间上均匀收缩。在这项工作中,我们改善了粒子逼近误差中对数Sobolev不等式(LSI)常数的依赖性,这些误差可以随着正则化系数的指数恶化。具体来说,我们通过利用风险最小化中的问题结构,建立了一个与LSI常数无关的粒子逼近误差,涉及目标差距。作为应用,我们展示了MFLD的改进收敛性,均场稳态分布的采样保证,以及与粒子复杂度相关的时间均匀 Wasserstein 混沌传播。
更新时间: 2024-10-30 14:24:34
领域: cs.LG,stat.ML
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
What latent features are encoded in language model (LM) representations? Recent work on training sparse autoencoders (SAEs) to disentangle interpretable features in LM representations has shown significant promise. However, evaluating the quality of these SAEs is difficult because we lack a ground-truth collection of interpretable features that we expect good SAEs to recover. We thus propose to measure progress in interpretable dictionary learning by working in the setting of LMs trained on chess and Othello transcripts. These settings carry natural collections of interpretable features -- for example, "there is a knight on F3" -- which we leverage into $\textit{supervised}$ metrics for SAE quality. To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, $\textit{p-annealing}$, which improves performance on prior unsupervised metrics as well as our new metrics.
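A sketch of the idea behind $\textit{p-annealing}$: train the SAE with an $L_p$ sparsity penalty whose exponent is annealed from 1 toward 0 over training. The architecture, schedule, and coefficients below are illustrative, not the paper's settings.

import torch

def sae_loss(x, W_enc, b_enc, W_dec, p, sparsity_coef=1e-3):
    f = torch.relu(x @ W_enc + b_enc)                # sparse feature activations
    recon = f @ W_dec
    lp = (f.abs() + 1e-8).pow(p).sum(dim=-1).mean()  # L_p penalty with annealed p
    return ((recon - x) ** 2).sum(dim=-1).mean() + sparsity_coef * lp

d, m = 64, 256
W_enc = torch.randn(d, m, requires_grad=True)
b_enc = torch.zeros(m, requires_grad=True)
W_dec = torch.randn(m, d, requires_grad=True)
opt = torch.optim.Adam([W_enc, b_enc, W_dec], lr=1e-3)
x = torch.randn(512, d)
for step in range(1000):
    p = 1.0 - 0.9 * step / 1000                      # anneal exponent from 1.0 to 0.1
    loss = sae_loss(x, W_enc, b_enc, W_dec, p)
    opt.zero_grad(); loss.backward(); opt.step()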
Updated: 2024-10-30 14:21:59
标题: 使用棋盘游戏模型来衡量词典学习在语言模型可解释性方面的进展
摘要: 语言模型(LM)表示中编码了哪些潜在特征?最近关于训练稀疏自动编码器(SAEs)以解开LM表示中可解释特征的研究显示出了显著的潜力。然而,评估这些SAEs的质量是困难的,因为我们缺乏一组可解释特征的基本真值集合,我们期望良好的SAEs能够恢复。因此,我们建议通过在训练了国际象棋和奥赛洛(Othello)的文本的LM设置中工作来测量可解释字典学习的进展。这些设置包含自然的可解释特征集合 -- 例如,“F3上有一个骑士” -- 我们利用这些特征为SAE质量提供监督度量。为了引导可解释字典学习的进展,我们引入了一种新的SAE训练技术,即p-退火(p-annealing),它提高了先前无监督度量以及我们的新度量的性能。
更新时间: 2024-10-30 14:21:59
领域: cs.LG,cs.AI,cs.CL
Controlling Language and Diffusion Models by Transporting Activations
The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviors in the generated output. In this paper we introduce Activation Transport (AcT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. AcT is modality-agnostic and provides fine-grained control over the model behavior with negligible computational overhead, while minimally impacting model abilities. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how AcT enables fine-grained style control and concept negation.
Updated: 2024-10-30 14:21:33
标题: 通过传输激活来控制语言和扩散模型
摘要: 随着大型生成模型的能力不断增强以及它们的更广泛应用,人们对它们的可靠性、安全性和潜在误用引起了关注。为了解决这些问题,最近的研究提出通过引导模型激活来控制模型生成,以有效诱导或阻止生成输出中概念或行为的出现。在本文中,我们介绍了Activation Transport (AcT),这是一个通过最优传输理论引导激活的通用框架,它泛化了许多先前的激活引导工作。AcT是模态无关的,并且在对模型行为提供细粒度控制时具有可忽略的计算开销,同时最小程度地影响模型能力。我们通过实验证明了我们的方法的有效性和多功能性,解决了大型语言模型(LLMs)和文本-图像扩散模型(T2Is)中的关键挑战。对于LLMs,我们展示了AcT可以有效减轻毒性,诱导任意概念,并增加其真实性。在T2Is中,我们展示了AcT如何实现细粒度风格控制和概念否定。
更新时间: 2024-10-30 14:21:33
领域: cs.LG,cs.AI,cs.CL,cs.CV,68T07, 49Q22,I.2.6; I.2.7; I.4.8
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search
In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprised of 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
Updated: 2024-10-30 14:19:57
标题: 使用蒙特卡洛树搜索指导的大型语言模型生成代码世界模型
摘要: 在这项工作中,我们考虑了由大型语言模型(LLM)生成的代码世界模型,这些模型采用Python代码形式用于基于模型的强化学习(RL)。调用代码而不是LLM进行规划具有更高的精确性、可靠性、可解释性和极高的效率潜力。然而,编写适当的代码世界模型需要具备理解复杂指令、生成带有非平凡逻辑的确切代码以及通过单元测试和环境轨迹反馈自我调试的能力。为了解决这些挑战,我们提出了一种新的LLM代码生成策略Generate, Improve and Fix with Monte Carlo Tree Search(GIF-MCTS)。为了在离线RL设置中测试我们的方法,我们引入了代码世界模型基准(CWMB),这是一个由18个不同RL环境组成的程序综合和规划任务套件,配备了相应的文本描述和策划轨迹。GIF-MCTS在CWMB和其他两个基准测试中都超越了所有基线,我们展示了与之合成的代码世界模型可以成功用于规划,从而产生基于模型的RL代理,具有大大提高的样本效率和推理速度。
更新时间: 2024-10-30 14:19:57
领域: cs.AI
Reward Centering
We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors and increases further as the discount factor approaches one. In addition, we show that if a problem's rewards are shifted by a constant, then standard methods perform much worse, whereas methods with reward centering are unaffected. Estimating the average reward is straightforward in the on-policy setting; we propose a slightly more sophisticated method for the off-policy setting. Reward centering is a general idea, so we expect almost every reinforcement-learning algorithm to benefit by the addition of reward centering.
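A sketch of reward centering inside a tabular TD(0) loop, using the straightforward on-policy estimator (a running average of observed rewards); the five-state environment is hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def env_step(s):  # hypothetical 5-state ring: reward 1 when landing on state 0
    s2 = (s + rng.choice([-1, 1])) % 5
    return s2, 1.0 if s2 == 0 else 0.0

def td0_centered(n_states=5, gamma=0.99, alpha=0.1, beta=0.01, steps=50000):
    V, r_bar, s = np.zeros(n_states), 0.0, 0
    for _ in range(steps):
        s2, r = env_step(s)
        r_bar += beta * (r - r_bar)                           # running average of rewards
        V[s] += alpha * ((r - r_bar) + gamma * V[s2] - V[s])  # centered TD(0) update
        s = s2
    return V, r_bar

V, r_bar = td0_centered()
print(r_bar)  # approaches the average reward (0.2 for this ring)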
Updated: 2024-10-30 14:18:42
标题: 奖励中心化
摘要: 我们发现,解决持续强化学习问题的折扣方法可以在将奖励居中处理,即减去奖励的经验平均值后,表现显著更好。这种改进在常用的折扣因子下效果显著,并且随着折扣因子接近1而进一步增加。此外,我们发现如果一个问题的奖励被一个常数偏移,标准方法的表现会差很多,而奖励居中的方法则不受影响。在on-policy设置中,估计平均奖励是直接的;我们提出了一个稍微更复杂的方法来处理off-policy设置。奖励居中是一个通用的想法,因此我们期望几乎每个强化学习算法都会受益于奖励居中的添加。
更新时间: 2024-10-30 14:18:42
领域: cs.LG,cs.AI
Spectral Graph Pruning Against Over-Squashing and Over-Smoothing
Message Passing Graph Neural Networks are known to suffer from two problems that are sometimes believed to be diametrically opposed: over-squashing and over-smoothing. The former results from topological bottlenecks that hamper the information flow from distant nodes and are mitigated by spectral gap maximization, primarily, by means of edge additions. However, such additions often promote over-smoothing that renders nodes of different classes less distinguishable. Inspired by the Braess phenomenon, we argue that deleting edges can address over-squashing and over-smoothing simultaneously. This insight explains how edge deletions can improve generalization, thus connecting spectral gap optimization to a seemingly disconnected objective of reducing computational resources by pruning graphs for lottery tickets. To this end, we propose a more effective spectral gap optimization framework to add or delete edges and demonstrate its effectiveness on large heterophilic datasets.
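The spectral quantity being optimized can be sketched directly: the gap is the second-smallest Laplacian eigenvalue, and a greedy pass searches for the edge deletion that keeps it largest. The complete graph below is a toy instance; real use would add the Braess-style screening the paper develops.

import numpy as np

def spectral_gap(adj):
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))[1]  # algebraic connectivity

def best_edge_deletion(adj):
    best_edge, best_gap = None, -np.inf
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j]:
                trial = adj.copy()
                trial[i, j] = trial[j, i] = 0.0  # tentatively delete edge (i, j)
                gap = spectral_gap(trial)
                if gap > best_gap:
                    best_edge, best_gap = (i, j), gap
    return best_edge, best_gap

adj = np.ones((6, 6)) - np.eye(6)  # toy complete graph on 6 nodes
print(best_edge_deletion(adj))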
Updated: 2024-10-30 14:17:02
标题: 光谱图修剪抵抗过度压缩和过度平滑
摘要: 消息传递图神经网络被认为存在两个有时被视为截然对立的问题:过度压缩和过度平滑。前者源于阻碍来自远处节点的信息流的拓扑瓶颈,主要通过添加边来最大化谱间隙加以缓解。然而,这种添加通常会促使过度平滑,使不同类别的节点更难以区分。受Braess现象的启发,我们认为删除边可以同时解决过度压缩和过度平滑问题。这一见解解释了删除边为何能够改善泛化,从而将谱间隙优化与一个看似无关的目标联系起来:通过为“彩票假设”(lottery tickets)修剪图来减少计算资源。为此,我们提出了一个更有效的谱间隙优化框架,用于添加或删除边,并在大型异配(heterophilic)数据集上展示了其有效性。
更新时间: 2024-10-30 14:17:02
领域: cs.LG,eess.SP,stat.ML
Legitimate ground-truth-free metrics for deep uncertainty classification scoring
Despite the increasing demand for safer machine learning practices, the use of Uncertainty Quantification (UQ) methods in production remains limited. This limitation is exacerbated by the challenge of validating UQ methods in the absence of UQ ground truth. In classification tasks, when only a usual set of test data is at hand, several authors have suggested different metrics that can be computed from such test points while assessing the quality of quantified uncertainties. This paper investigates such metrics and proves that they are theoretically well-behaved and actually tied to some uncertainty ground truth which is easily interpretable in terms of model prediction trustworthiness ranking. Equipped with those new results, and given the applicability of those metrics in the usual supervised paradigm, we argue that our contributions will help promote a broader use of UQ in deep learning.
Updated: 2024-10-30 14:14:32
标题: 无真实标签的深度不确定性分类评分的合法指标
摘要: 尽管对更安全的机器学习实践的需求不断增加,但不确定性量化(UQ)方法在生产中的使用仍然有限。在缺乏UQ基准真值的情况下验证UQ方法的困难进一步加剧了这种限制。在分类任务中,当手头只有一组常规的测试数据时,有几位作者提出了可以从这些测试点计算出的不同度量标准,用于评估量化不确定性的质量。本文研究了这些度量标准,并证明它们在理论上性质良好,实际上与某种不确定性基准真值相关联,而这种基准真值可以用模型预测可信度排序来直观解释。凭借这些新结果,并考虑到这些度量标准在常规监督范式中的适用性,我们认为我们的贡献将有助于推动UQ在深度学习中的更广泛应用。
更新时间: 2024-10-30 14:14:32
领域: cs.LG
Response Estimation and System Identification of Dynamical Systems via Physics-Informed Neural Networks
The accurate modelling of structural dynamics is crucial across numerous engineering applications, such as Structural Health Monitoring (SHM), seismic analysis, and vibration control. Often, these models originate from physics-based principles and can be derived from corresponding governing equations, often of differential equation form. However, complex system characteristics, such as nonlinearities and energy dissipation mechanisms, often imply that such models are approximative and often imprecise. This challenge is further compounded in SHM, where sensor data is often sparse, making it difficult to fully observe the system's states. To address these issues, this paper explores the use of Physics-Informed Neural Networks (PINNs), a class of physics-enhanced machine learning (PEML) techniques, for the identification and estimation of dynamical systems. PINNs offer a unique advantage by embedding known physical laws directly into the neural network's loss function, allowing for simple embedding of complex phenomena, even in the presence of uncertainties. This study specifically investigates three key applications of PINNs: state estimation in systems with sparse sensing, joint state-parameter estimation, when both system response and parameters are unknown, and parameter estimation within a Bayesian framework to quantify uncertainties. The results demonstrate that PINNs deliver an efficient tool across all aforementioned tasks, even in presence of modelling errors. However, these errors tend to have a more significant impact on parameter estimation, as the optimization process must reconcile discrepancies between the prescribed model and the true system behavior. Despite these challenges, PINNs show promise in dynamical system modeling, offering a robust approach to handling uncertainties.
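A minimal PINN sketch in PyTorch for a hypothetical damped oscillator $\ddot{u} + 2\zeta\omega\dot{u} + \omega^{2}u = 0$ with $u(0)=1$, $\dot{u}(0)=0$: the physics residual enters the loss directly, which is the embedding of governing equations the abstract describes.

import torch

zeta, omega = 0.05, 2.0  # hypothetical damping ratio and natural frequency
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    t = 10.0 * torch.rand(128, 1, requires_grad=True)  # collocation points
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    ddu = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]
    residual = ddu + 2 * zeta * omega * du + omega ** 2 * u  # governing equation
    t0 = torch.zeros(1, 1, requires_grad=True)               # initial conditions
    u0 = net(t0)
    du0 = torch.autograd.grad(u0, t0, torch.ones_like(u0), create_graph=True)[0]
    loss = (residual ** 2).mean() + (u0 - 1.0).pow(2).sum() + du0.pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()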
Updated: 2024-10-30 14:10:21
标题: 利用物理信息神经网络对动力系统的响应估计和系统识别进行研究
摘要: 结构动力学的准确建模在许多工程应用中至关重要,如结构健康监测(SHM)、地震分析和振动控制。通常,这些模型源于基于物理原理,并可以从相应的控制方程中导出,通常为微分方程形式。然而,复杂系统特性,如非线性和能量耗散机制,通常意味着这些模型是近似的且常常不精确。这一挑战在SHM中更加严重,因为传感器数据通常是稀疏的,使得完全观察系统状态变得困难。为了解决这些问题,本文探讨了物理信息神经网络(PINNs)的使用,这是一类物理增强机器学习(PEML)技术,用于动态系统的识别和估计。PINNs通过将已知的物理定律直接嵌入到神经网络的损失函数中,提供了独特的优势,可以简单地嵌入复杂现象,即使存在不确定性。本研究特别调查了PINNs的三个关键应用:在具有稀疏感知的系统中进行状态估计,当系统响应和参数都未知时进行联合状态参数估计,以及在贝叶斯框架内进行参数估计以量化不确定性。结果表明,即使在存在建模错误的情况下,PINNs在所有上述任务中都提供了一种高效的工具。然而,这些错误往往对参数估计产生更显著的影响,因为优化过程必须将规定模型与真实系统行为之间的差异协调一致。尽管存在这些挑战,PINNs在动态系统建模中显示出潜力,提供了处理不确定性的健壮方法。
更新时间: 2024-10-30 14:10:21
领域: physics.comp-ph,cs.LG
Toward Understanding In-context vs. In-weight Learning
It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but this ability can also diminish upon further training. We provide a new theoretical understanding of these phenomena by identifying simplified distributional properties that give rise to the emergence and eventual disappearance of in-context learning. We do so by first analyzing a simplified model that uses a gating mechanism to choose between an in-weight and an in-context predictor. Through a combination of a generalization error and regret analysis we identify conditions where in-context and in-weight learning emerge. These theoretical findings are then corroborated experimentally by comparing the behaviour of a full transformer on the simplified distributions to that of the stylized model, demonstrating aligned results. We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
Updated: 2024-10-30 14:09:00
标题: 迈向理解上下文内学习与权重内学习
摘要: 最近的实证研究表明,在训练数据中存在特定的分布特性时,transformer 中的上下文学习会出现,但这种能力在进一步训练后也可能减弱。我们通过识别简化的分布特性来提供对这些现象的新理论理解,这些特性导致了上下文学习的出现和最终消失。我们首先通过分析一个使用门控机制在一个 in-weight 和一个 in-context 预测器之间进行选择的简化模型来实现这一点。通过结合泛化误差和遗憾分析,我们确定了上下文和权重学习出现的条件。这些理论发现通过将完整 transformer 在简化分布上的行为与风格化模型进行比较来进行实验证实。然后我们将研究扩展到一个完整的大型语言模型,展示如何在各种自然语言提示的收集上进行微调可以引发类似的上下文和权重学习行为。
更新时间: 2024-10-30 14:09:00
领域: cs.LG
Emotional RAG: Enhancing Role-Playing Agents through Emotional Retrieval
As LLMs exhibit a high degree of human-like capability, increasing attention has been paid to role-playing research areas in which responses generated by LLMs are expected to mimic human replies. This has promoted the exploration of role-playing agents in various applications, such as chatbots that can engage in natural conversations with users and virtual assistants that can provide personalized support and guidance. The crucial factor in the role-playing task is the effective utilization of character memory, which stores characters' profiles, experiences, and historical dialogues. Retrieval Augmented Generation (RAG) technology is used to access the related memory to enhance the response generation of role-playing agents. Most existing studies retrieve related information based on the semantic similarity of memory to maintain characters' personalized traits, and few attempts have been made to incorporate the emotional factor into the retrieval-augmented generation (RAG) of LLMs. Inspired by the Mood-Dependent Memory theory, which indicates that people recall an event better if they somehow reinstate during recall the original emotion they experienced during learning, we propose a novel emotion-aware memory retrieval framework, termed Emotional RAG, which recalls the related memory with consideration of the emotional state of role-playing agents. Specifically, we design two kinds of retrieval strategies, i.e., a combination strategy and a sequential strategy, to incorporate both memory semantics and emotional states during the retrieval process. Extensive experiments on three representative role-playing datasets demonstrate that our Emotional RAG framework outperforms the method that does not consider the emotional factor in maintaining the personalities of role-playing agents. This provides evidence that further reinforces the Mood-Dependent Memory theory in psychology.
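The combination strategy can be sketched as a weighted blend of semantic and emotional similarity at retrieval time; the embedding dimensions and the weight alpha below are illustrative.

import numpy as np

def emotional_rag_scores(query_sem, query_emo, mem_sem, mem_emo, alpha=0.7):
    # Combination strategy: blend semantic and emotional cosine similarities
    def cosine(v, M):
        return (M @ v) / (np.linalg.norm(M, axis=1) * np.linalg.norm(v) + 1e-12)
    return alpha * cosine(query_sem, mem_sem) + (1 - alpha) * cosine(query_emo, mem_emo)

rng = np.random.default_rng(0)
mem_sem, mem_emo = rng.random((10, 64)), rng.random((10, 8))  # 10 stored memories
scores = emotional_rag_scores(rng.random(64), rng.random(8), mem_sem, mem_emo)
print(np.argsort(scores)[::-1][:3])  # indices of the 3 memories to retrieve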
Updated: 2024-10-30 14:08:50
标题: 情感RAG:通过情感检索增强角色扮演代理程序
摘要: 由于LLMs表现出较高的人类般能力,越来越多的关注被付诸到角色扮演研究领域,其中LLMs生成的响应被期望模仿人类的回复。这促进了在各种应用中探索角色扮演代理,例如可以与用户进行自然对话的聊天机器人和可以提供个性化支持和指导的虚拟助手。在角色扮演任务中的关键因素是有效利用角色记忆,其中存储了角色的概况、经验和历史对话。检索增强生成(RAG)技术用于访问相关记忆,以增强角色扮演代理的响应生成。大多数现有研究基于记忆的语义相似性检索相关信息,以维护角色的个性特征,很少尝试在LLMs的检索增强生成(RAG)中融入情感因素。受情绪依赖记忆理论的启发,该理论指出,如果在回忆时某种程度上恢复在学习时体验到的原始情绪,人们会更好地回忆事件,我们提出了一种新颖的情感感知记忆检索框架,称为情感RAG,它在考虑角色扮演代理的情绪状态的情况下回忆相关记忆。具体来说,我们设计了两种检索策略,即组合策略和顺序策略,以在检索过程中同时融入记忆语义和情感状态。对三个代表性的角色扮演数据集进行的广泛实验表明,我们的情感RAG框架优于不考虑情感因素维护角色扮演代理个性的方法。这为进一步加强心理学中的情绪依赖记忆理论提供了证据。
更新时间: 2024-10-30 14:08:50
领域: cs.AI
Do LLMs "know" internally when they follow instructions?
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs' internal states relate to these outcomes is required. Our analysis of LLM internal states reveals a dimension in the input embedding space linked to successful instruction-following. We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. Further investigation reveals that this dimension is more closely related to the phrasing of prompts rather than the inherent difficulty of the task or instructions. This discovery also suggests explanations for why LLMs sometimes fail to follow clear instructions and why prompt engineering is often effective, even when the content remains largely unchanged. This work provides insight into the internal workings of LLMs' instruction-following, paving the way for reliable LLM agents.
Updated: 2024-10-30 14:06:12
标题: LLM在遵循指示时是否内部“知道”?
摘要: 遵循指令对于构建具有大型语言模型(LLM)的人工智能代理至关重要,因为这些模型必须严格遵守用户提供的约束和指导方针。然而,LLM经常无法遵循甚至简单明确的指令。为了改善遵循指令的行为并防止不良输出,需要更深入地了解LLM内部状态与这些结果的关系。我们对LLM内部状态的分析揭示了与成功遵循指令相关的输入嵌入空间中的一个维度。我们证明,沿着这个维度修改表示比随机变化提高了遵循指令的成功率,而不会损害响应质量。进一步的调查揭示,这个维度与提示的措辞更为密切相关,而不是任务或指令本身的固有难度。这一发现也为解释为什么LLM有时无法遵循明确指令以及为什么提示工程通常有效提供了解释,即使内容基本保持不变。这项工作为LLM遵循指令的内部工作提供了洞察,为可靠的LLM代理铺平了道路。
更新时间: 2024-10-30 14:06:12
领域: cs.AI,cs.CL
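The abstract above does not spell out how the instruction-following dimension is obtained; the sketch below uses a hypothetical difference-of-means probe, one standard way to extract a single direction from two sets of hidden states, and then shifts a representation along it:

```python
import numpy as np

def find_direction(h_success, h_fail):
    # Hypothetical difference-of-means probe: one common way to get a
    # single direction separating two sets of hidden states.
    d = h_success.mean(axis=0) - h_fail.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(h, d, alpha=1.0):
    # Shift a representation along the direction associated with
    # successful instruction-following (cf. the modification
    # experiments described in the abstract).
    return h + alpha * d

rng = np.random.default_rng(1)
h_success = rng.normal(loc=0.3, size=(100, 64))  # states from followed prompts
h_fail = rng.normal(loc=-0.3, size=(100, 64))    # states from failed prompts
d = find_direction(h_success, h_fail)
h = rng.normal(size=64)
print((steer(h, d, alpha=2.0) - h) @ d)          # positive: moved along d
```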
Advanced Detection of Source Code Clones via an Ensemble of Unsupervised Similarity Measures
The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces a novel ensemble learning approach for code similarity assessment, combining the strengths of multiple unsupervised similarity measures. The key idea is that the strengths of a diverse set of similarity measures can complement each other and mitigate individual weaknesses, leading to improved performance. Preliminary results show that while Transformer-based CodeBERT and its variant GraphCodeBERT are undoubtedly the best option in the presence of abundant training data, on specific small datasets (up to 500 samples) our ensemble achieves similar results without sacrificing the interpretability of the resulting solution, and with a much lower carbon footprint associated with training. The source code of this novel approach can be downloaded from https://github.com/jorge-martinez-gil/ensemble-codesim.
Updated: 2024-10-30 14:01:43
标题: 通过一组无监督相似度度量的高级源代码克隆检测
摘要: 准确确定代码相似性的能力在许多与软件开发相关的任务中至关重要。例如,识别代码重复以进行软件维护可能是必不可少的。这项研究引入了一种新颖的集成学习方法,用于代码相似性评估,结合了多个无监督相似性度量的优势。关键思想是,多种相似性度量的优势可以相互补充,并减轻个体的弱点,从而提高性能。初步结果显示,虽然基于Transformer的CodeBERT及其变种GraphCodeBERT在拥有丰富训练数据时无疑是最佳选择,但在特定小数据集(最多500个样本)的情况下,我们的集成方法也能取得类似的结果,既不影响结果的可解释性,训练所产生的碳足迹也低得多。这种新颖方法的源代码可以从https://github.com/jorge-martinez-gil/ensemble-codesim下载。
更新时间: 2024-10-30 14:01:43
领域: cs.SE,cs.AI
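A minimal sketch of the ensemble idea from the abstract above; the two measures (token-set Jaccard and difflib's matching ratio) and the uniform weights are illustrative assumptions, not the paper's actual configuration:

```python
from difflib import SequenceMatcher

def token_jaccard(a: str, b: str) -> float:
    # Jaccard overlap of token sets: cheap and order-insensitive.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def char_ratio(a: str, b: str) -> float:
    # Normalized matching-subsequence ratio from difflib.
    return SequenceMatcher(None, a, b).ratio()

def ensemble_similarity(a: str, b: str, weights=(0.5, 0.5)) -> float:
    # The ensemble idea: average several unsupervised measures so their
    # individual weaknesses cancel out.
    scores = (token_jaccard(a, b), char_ratio(a, b))
    return sum(w * s for w, s in zip(weights, scores))

s1 = "def add(a, b): return a + b"
s2 = "def sum_two(x, y): return x + y"
print(ensemble_similarity(s1, s2))
```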
Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation
Contemporary radio access networks employ link adaptation (LA) algorithms to optimize the modulation and coding schemes to adapt to the prevailing propagation conditions and are near-optimal in terms of the achieved spectral efficiency. LA is a challenging task in the presence of mobility, fast fading, imperfect channel quality information, and limited knowledge of the receiver characteristics at the transmitter, all of which render model-based LA algorithms complex and suboptimal. Model-based LA is especially difficult as connected user equipment devices become increasingly heterogeneous in terms of receiver capabilities, antenna configurations, and hardware characteristics. Recognizing these difficulties, previous works have proposed reinforcement learning (RL) for LA, which faces deployment difficulties due to its potential negative impact on live network performance. To address this challenge, this paper considers offline RL to learn LA policies from data acquired in live networks with minimal or no intrusive effects on the network operation. We propose three LA designs based on batch-constrained deep Q-learning, conservative Q-learning, and decision transformers, showing that offline RL algorithms can achieve the performance of state-of-the-art online RL methods when data is collected with a proper behavioral policy.
Updated: 2024-10-30 14:01:31
标题: 离线强化学习与序列建模用于下行链路自适应
摘要: 当代无线接入网络采用链路自适应(LA)算法来优化调制和编码方案,以适应当前传播条件,并在实现频谱效率方面接近最佳。在移动性、快速衰落和信道质量信息不完善以及发送端对接收端特性了解有限的情况下,LA是一项具有挑战性的任务,这使得基于模型的LA算法复杂且次优。随着连接的用户设备在接收机能力、天线配置和硬件特性方面越来越异构,基于模型的LA尤为困难。认识到这些困难,先前的研究提出了应用于LA的强化学习(RL),但由于其潜在的负面影响而面临部署困难。为解决这一挑战,本文考虑了离线RL,从实时网络中获取的数据中学习LA策略,对网络运行几乎没有或没有干扰效果。我们提出了基于批次约束深度Q学习、保守Q学习和决策变换器的三种LA设计,表明当数据以适当的行为策略收集时,离线RL算法可以实现与最先进的在线RL方法相当的性能。
更新时间: 2024-10-30 14:01:31
领域: cs.LG,cs.AI,cs.SY,eess.SY
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments.
Updated: 2024-10-30 13:59:30
标题: 风险感知不安定多臂老虎机问题中的规划与学习
摘要: 在不安定多臂老虎机问题中,一个中央代理需要在若干老虎机(臂)之间最优地分配有限资源,每个臂都是一个马尔可夫决策过程。在这项工作中,我们通过引入风险意识,对具有风险中性目标的传统不安定多臂老虎机问题进行了推广。我们为风险感知目标建立了可索引性条件,并给出了基于Whittle指数的解决方案。此外,我们通过提出一种汤普森抽样方法来解决真实转移概率未知时的学习问题,并证明其遗憾有界,随回合数呈亚线性增长、随臂数呈二次增长。我们通过一系列数值实验说明了该方法在降低不安定多臂老虎机风险敞口方面的有效性。
更新时间: 2024-10-30 13:59:30
领域: cs.LG,cs.SY,eess.SY
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations
Learning with reduced labeling standards, such as noisy labels, partial labels, and multiple label candidates, which we generically refer to as \textit{imprecise} labels, is a commonplace challenge in machine learning tasks. Previous methods tend to propose specific designs for every emerging imprecise label configuration, which is usually unsustainable when multiple configurations of imprecision coexist. In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations. ILL leverages expectation-maximization (EM) for modeling the imprecise label information, treating the precise labels as latent variables. Instead of approximating the correct labels for training, it considers the entire distribution of all possible labelings entailed by the imprecise information. We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings. Notably, ILL surpasses existing specialized techniques for handling imprecise labels, marking the first unified framework with robust and effective performance across various challenging settings. We hope our work will inspire further research on this topic, unleashing the full potential of ILL in wider scenarios where precise labels are expensive and complicated to obtain.
Updated: 2024-10-30 13:58:35
标题: 模糊标签学习:一种统一框架,用于学习各种模糊标签配置。
摘要: 在降低的标注标准下学习,例如噪声标签、部分标签和多个候选标签,我们将其统称为\textit{不精确}标签,是机器学习任务中常见的挑战。先前的方法往往针对每种新出现的不精确标签配置提出专门设计,当多种不精确配置共存时,这种做法通常难以为继。在本文中,我们介绍了不精确标签学习(ILL),这是一个统一学习各种不精确标签配置的框架。ILL利用期望最大化(EM)来建模不精确标签信息,将精确标签视为潜在变量。ILL不去近似用于训练的正确标签,而是考虑由不精确信息所蕴含的所有可能标注的完整分布。我们证明ILL可以无缝地适应部分标签学习、半监督学习、噪声标签学习,更重要的是,这些设置的混合。值得注意的是,ILL超越了现有的处理不精确标签的专门技术,成为第一个在各种具有挑战性的设置中均表现出稳健而有效性能的统一框架。我们希望我们的工作能够激发对这一主题的进一步研究,在精确标签昂贵且难以获得的更广泛场景中释放ILL的全部潜力。
更新时间: 2024-10-30 13:58:35
领域: cs.LG,cs.AI,cs.CV
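A minimal sketch of the EM treatment the abstract above describes, assuming the partial-label case where the imprecise information is a candidate set per example; the mask-and-renormalize E-step is a generic construction, not necessarily ILL's exact update:

```python
import numpy as np

def em_step(probs, candidate_masks):
    # E-step of a generic EM treatment of imprecise labels: restrict the
    # model's class posterior to the labelings allowed by the imprecise
    # information (here, partial-label candidate sets) and renormalize.
    q = probs * candidate_masks
    q /= q.sum(axis=1, keepdims=True)
    # M-step target: the expected log-likelihood under q; a real model
    # would take a gradient step on -np.sum(q * np.log(probs)).
    return q

probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])          # model posteriors p(y|x)
masks = np.array([[1, 1, 0],                 # true label is in {0, 1}
                  [0, 1, 1]], dtype=float)   # true label is in {1, 2}
print(em_step(probs, masks))
```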
Learning Dynamic Selection and Pricing of Out-of-Home Deliveries
Home delivery failures, traffic congestion, and relatively large handling times have a negative impact on the profitability of last-mile logistics. A potential solution is the delivery to parcel lockers or parcel shops, denoted by out-of-home (OOH) delivery. In the academic literature, models for OOH delivery have so far been limited to static settings, contrasting with the sequential nature of the problem. We model the sequential decision-making problem of which OOH location to offer against what incentive for each incoming customer, taking into account future customer arrivals and choices. We propose Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that uses a novel spatial-temporal state encoding as input to a convolutional neural network. We demonstrate the performance of our method by benchmarking it against two state-of-the-art approaches. Our extensive numerical study, guided by real-world data, reveals that DSPO can save 19.9%pt in costs compared to a situation without OOH locations, 7%pt compared to a static selection and pricing policy, and 3.8%pt compared to a state-of-the-art demand management benchmark. We provide comprehensive insights into the complex interplay between OOH delivery dynamics and customer behavior influenced by pricing strategies. Our findings suggest that practitioners should adopt dynamic selection and pricing policies.
Updated: 2024-10-30 13:56:21
标题: 学习家外配送的动态选择与定价
摘要: 家庭递送失败、交通拥堵以及相对较长的处理时间对最后一英里物流的盈利能力产生了负面影响。一种潜在的解决方案是将货物递送至包裹柜或包裹店,称为家外配送(OOH配送)。在学术文献中,迄今为止关于OOH配送的模型仅限于静态设置,与问题的序贯性质相悖。我们对应向每位到达客户提供哪个OOH位置、配以何种激励的序贯决策问题进行建模,并考虑未来客户的到达和选择。我们提出了OOH动态选择与定价(DSPO),这是一个以新颖的时空状态编码作为卷积神经网络输入的算法流水线。我们通过将其与两种最先进的方法进行基准测试来展示我们方法的性能。我们基于真实世界数据进行的广泛数值研究发现,与没有OOH位置的情况相比,DSPO可以节省19.9个百分点的成本,与静态选择和定价策略相比节省7个百分点,与最先进的需求管理基准相比节省3.8个百分点。我们对OOH配送动态与受定价策略影响的客户行为之间的复杂相互作用提供了全面的见解。我们的研究结果建议从业人员采用动态选择和定价策略。
更新时间: 2024-10-30 13:56:21
领域: cs.LG,cs.AI
On the stability of gradient descent with second order dynamics for time-varying cost functions
Gradient-based optimization algorithms deployed in Machine Learning (ML) applications are often analyzed and compared by their convergence rates or regret bounds. While these rates and bounds convey valuable information, they do not always directly translate to stability guarantees. Stability and similar concepts, like robustness, will become ever more important as we move towards deploying models in real-time and safety-critical systems. In this work we build upon the results in Gaudio et al. 2021 and Moreu & Annaswamy 2022 for gradient descent with second-order dynamics applied to explicitly time-varying cost functions and provide more general stability guarantees. These more general results can aid in the design and certification of these optimization schemes so as to help ensure safe and reliable deployment for real-time learning applications. We also hope that the techniques provided here will stimulate and cross-fertilize the analysis of the same algorithms in the online learning and stochastic optimization communities.
Updated: 2024-10-30 13:55:03
标题: 关于时间变化成本函数的二阶动力学梯度下降稳定性
摘要: 在机器学习(ML)应用中部署的基于梯度的优化算法通常通过其收敛速度或遗憾界进行分析和比较。虽然这些速度和界传达了有价值的信息,但并不总是直接转化为稳定性保证。稳定性和类似概念,如鲁棒性,将在我们朝着在实时和安全关键系统中部署模型的过程中变得越来越重要。在这项工作中,我们基于Gaudio等人2021年和Moreu&Annaswamy 2022年的结果,针对显式时间变化成本函数应用于具有二阶动力学的梯度下降,并提供更一般的稳定性保证。这些更一般的结果可以帮助设计和认证这些优化方案,以确保实时学习应用的安全可靠部署。我们还希望这里提供的技术能够激发并促进在线学习和随机优化社区对同一算法进行的分析。
更新时间: 2024-10-30 13:55:03
领域: cs.LG,math.OC
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.
Updated: 2024-10-30 13:54:18
标题: 快速TRAC:一种用于终身强化学习的无参数优化器
摘要: 终身强化学习(RL)中的一个关键挑战是可塑性的丧失,先前的学习进展阻碍了代理人对新任务的适应。虽然正则化和重置可以帮助,但它们需要在一开始精确选择超参数,并且需要根据环境进行调整。基于在线凸优化的原则性理论,我们提出了一种名为TRAC的终身RL无参数优化器,不需要调整或关于分布转变的先验知识。在Procgen、Atari和Gym Control环境上进行了大量实验,显示TRAC在减轻可塑性丧失和迅速适应具有挑战性的分布转变方面表现出色,尽管基础优化问题是非凸的和非定常的。
更新时间: 2024-10-30 13:54:18
领域: cs.LG,cs.AI
Interpretable Mesomorphic Networks for Tabular Data
Even though neural networks have long been deployed in applications involving tabular data, existing neural architectures are still not explainable by design. In this paper, we propose a new class of interpretable neural networks for tabular data that are both deep and linear at the same time (i.e. mesomorphic). We optimize deep hypernetworks to generate explainable linear models on a per-instance basis. As a result, our models retain the accuracy of black-box deep networks while offering free-lunch explainability for tabular data by design. Through extensive experiments, we demonstrate that our explainable deep networks have comparable performance to state-of-the-art classifiers on tabular data and outperform current existing methods that are explainable by design.
Updated: 2024-10-30 13:54:09
标题: 可解释的表格数据Mesomorphic网络
摘要: 尽管神经网络长期以来被部署在涉及表格数据的应用中,但现有的神经架构在设计上仍不可解释。在本文中,我们提出了一类新的适用于表格数据的可解释神经网络,这种网络同时兼具深度和线性(即 mesomorphic)。我们优化深度超网络,以在每个实例的基础上生成可解释的线性模型。因此,我们的模型保留了黑盒深度网络的准确性,同时通过设计为表格数据提供了无代价的可解释性。通过大量实验,我们展示了我们的可解释深度网络在表格数据上与最先进的分类器具有可比的性能,并且优于当前现有的设计即可解释的方法。
更新时间: 2024-10-30 13:54:09
领域: cs.LG
Slight Corruption in Pre-training Data Makes Better Diffusion Models
Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audios, and videos. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. Our empirical findings reveal that various types of slight corruption in pre-training can significantly enhance the quality, diversity, and fidelity of the generated images across different DMs, both during pre-training and downstream adaptation stages. Theoretically, we consider a Gaussian mixture model and prove that slight corruption in the condition leads to higher entropy and a smaller 2-Wasserstein distance between the distribution generated by the corruptly trained DMs and the ground-truth data distribution. Inspired by our analysis, we propose a simple method to improve the training of DMs on practical datasets by adding condition embedding perturbations (CEP). CEP significantly improves the performance of various DMs in both pre-training and downstream tasks. We hope that our study provides new insights into understanding the data and pre-training processes of DMs and all models are released at https://huggingface.co/DiffusionNoise.
Updated: 2024-10-30 13:52:56
标题: 微小的预训练数据损坏会使扩散模型更好
摘要: 扩散模型(DMs)已经展示出在生成逼真高质量图像、音频和视频方面的显著能力。它们从在大规模数据集上进行广泛的预训练中受益,包括带有成对数据和条件的网络爬取数据,如图像文本和图像类别对。尽管经过严格过滤,但这些预训练数据集往往不可避免地包含条件未准确描述数据的损坏对。本文首次全面研究了这种预训练数据损坏对DMs的影响。我们通过对ImageNet-1K和CC3M进行合成损坏来预训练和评估50多个有条件的DMs。我们的实证发现表明,预训练中各种类型的轻微损坏可以显著提高生成图像的质量、多样性和保真度,无论是在预训练阶段还是在下游适应阶段。从理论上讲,我们考虑了一个高斯混合模型,并证明了条件中的轻微损坏会导致更高的熵,并使受损条件训练的DMs所生成的分布与真实数据分布之间的2-Wasserstein距离减小。受到我们分析的启发,我们提出了一种简单的方法来通过添加条件嵌入扰动(CEP)来改善DMs在实际数据集上的训练。CEP显著提高了各种DMs在预训练和下游任务中的性能。我们希望我们的研究为理解DMs的数据和预训练过程提供新的见解,并且所有模型均可在https://huggingface.co/DiffusionNoise上获取。
更新时间: 2024-10-30 13:52:56
领域: cs.CV,cs.AI,cs.LG
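A minimal sketch of condition embedding perturbation (CEP) as described in the abstract above; Gaussian noise and the strength gamma are assumptions of this sketch:

```python
import torch

def perturb_condition(cond_emb: torch.Tensor, gamma: float = 0.1) -> torch.Tensor:
    # Condition embedding perturbation (CEP): inject slight noise into the
    # condition embedding during training. The noise type (Gaussian) and
    # strength gamma are assumptions for this sketch.
    return cond_emb + gamma * torch.randn_like(cond_emb)

cond = torch.randn(16, 512)            # e.g. class or text-condition embeddings
noisy_cond = perturb_condition(cond)   # fed to the diffusion model in place of cond
```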
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended exploration, and hierarchical skill design. Recent works have made promising steps by exploiting the prior knowledge of large language models (LLMs). However, these approaches suffer from important limitations: they are either not scalable to problems requiring billions of environment samples; or are limited to reward functions expressible by compact code, which may require source code and have difficulty capturing nuanced semantics; or require a diverse offline dataset, which may not exist or be impossible to collect. In this work, we address these limitations through a combination of algorithmic and systems-level contributions. We propose ONI, a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function using LLM feedback. Our approach annotates the agent's collected experience via an asynchronous LLM server, which is then distilled into an intrinsic reward model. We explore a range of algorithmic choices for reward modeling with varying complexity, including hashing, classification, and ranking models. By studying their relative tradeoffs, we shed light on questions regarding intrinsic reward design for sparse reward problems. Our approach achieves state-of-the-art performance across a range of challenging, sparse reward tasks from the NetHack Learning Environment in a simple unified process, solely using the agent's gathered experience, without requiring external datasets nor source code. We make our code available at \url{URL} (coming soon).
Updated: 2024-10-30 13:52:43
标题: 大型语言模型反馈中的在线内在奖励对决策制定代理的影响
摘要: 自然语言描述自动合成密集奖励是强化学习(RL)中一种有前景的范式,可应用于稀疏奖励问题、开放式探索和层次化技能设计。最近的工作通过利用大型语言模型(LLMs)的先验知识迈出了令人鼓舞的步伐。然而,这些方法存在重要局限性:要么无法扩展到需要数十亿个环境样本的问题;要么仅限于由紧凑代码表达的奖励函数,这可能需要源代码并难以捕捉微妙的语义;要么需要多样化的离线数据集,这些数据集可能不存在或无法收集。在这项工作中,我们通过算法和系统层面的贡献来解决这些限制。我们提出了ONI,一个分布式架构,同时利用LLM反馈学习RL策略和内在奖励函数。我们的方法通过异步LLM服务器注释代理收集的经验,然后将其提炼为内在奖励模型。我们探索了一系列算法选择,包括哈希、分类和排名模型,用于奖励建模,具有不同的复杂性。通过研究它们的相对权衡,我们揭示了关于稀疏奖励问题内在奖励设计的问题。我们的方法在NetHack学习环境中一系列具有挑战性的稀疏奖励任务中取得了最先进的性能,通过一个简单的统一过程,仅使用代理收集的经验,而无需外部数据集或源代码。我们将我们的代码提供在\url{URL}(即将推出)。
更新时间: 2024-10-30 13:52:43
领域: cs.LG,cs.AI,cs.CL,cs.RO
A Generalized Framework for Multiscale State-Space Modeling with Nested Nonlinear Dynamics: An Application to Bayesian Learning under Switching Regimes
In this work, we introduce a generalized framework for multiscale state-space modeling that incorporates nested nonlinear dynamics, with a specific focus on Bayesian learning under switching regimes. Our framework captures the complex interactions between fast and slow processes within systems, allowing for the analysis of how these dynamics influence each other across various temporal scales. We model these interactions through a hierarchical structure in which finer time-scale dynamics are nested within coarser ones, while facilitating feedback between the scales. To promote the practical application of our framework, we address the problem of identifying switching regimes and transient dynamics. In particular, we develop a Bayesian learning approach to estimate latent states and indicators corresponding to switching dynamics, enabling the model to adapt effectively to regime changes. We employ Sequential Monte Carlo, or particle filtering, for inference. We illustrate the utility of our framework through simulations. The results demonstrate that our Bayesian learning approach effectively tracks state transitions and achieves accurate identification of switching dynamics in multiscale systems.
Updated: 2024-10-30 13:52:39
标题: 一个广义的多尺度状态空间建模框架与嵌套非线性动力学:在切换制度下贝叶斯学习的应用
摘要: 在这项工作中,我们介绍了一个通用的多尺度状态空间建模框架,该框架整合了嵌套非线性动力学,特别关注在切换制度下的贝叶斯学习。我们的框架捕捉了系统内快速和慢速过程之间复杂的相互作用,允许分析这些动态如何在不同的时间尺度上相互影响。我们通过一个分层结构来建模这些相互作用,在这个结构中,更精细的时间尺度动态嵌套在更粗糙的时间尺度内,同时促进了尺度之间的反馈。为了促进我们框架的实际应用,我们解决了识别切换制度和瞬态动态的问题。特别是,我们开发了一个贝叶斯学习方法来估计与切换动态相关的潜在状态和指示器,使模型能够有效地适应制度变化。我们使用顺序蒙特卡洛或粒子滤波进行推断。我们通过模拟展示了我们框架的实用性。结果表明,我们的贝叶斯学习方法有效地跟踪状态转换,并在多尺度系统中准确识别切换动态。
更新时间: 2024-10-30 13:52:39
领域: stat.ML,cs.CE,cs.LG,eess.SP
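A generic bootstrap particle filter, the kind of Sequential Monte Carlo inference the abstract above mentions; the model functions and the toy linear-Gaussian example are placeholders, not the paper's multiscale model:

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles, transition, likelihood, init):
    # Generic bootstrap particle filter: propagate, weight, resample.
    rng = np.random.default_rng(0)
    particles = init(n_particles, rng)
    for y in ys:
        particles = transition(particles, rng)          # propagate
        w = likelihood(y, particles)                    # weight
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]                      # resample
        yield particles.mean()                          # filtered estimate

# Toy linear-Gaussian model as a usage example.
est = bootstrap_particle_filter(
    ys=[0.5, 0.7, 1.1],
    n_particles=1000,
    transition=lambda p, rng: 0.9 * p + 0.1 * rng.normal(size=p.shape),
    likelihood=lambda y, p: np.exp(-0.5 * (y - p) ** 2),
    init=lambda n, rng: rng.normal(size=n),
)
print(list(est))
```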
Achilles, Neural Network to Predict the Gold Vs US Dollar Integration with Trading Bot for Automatic Trading
Predicting the stock market is a major challenge for the machine learning world. It is well known how difficult it is to obtain accurate and consistent predictions with ML models. Some architectures are able to capture the movement of stocks but are almost never suitable for launch into production. We present Achilles, a model with a classical LSTM (Long Short-Term Memory) neural network architecture that is able to predict the Gold vs USD commodity price. Using this model's minute-by-minute predictions, we implemented a trading bot that ran for 23 days of testing, excluding weekends. At the end of the testing period, the methodology had generated $1623.52 in profit. These results demonstrate that machine learning can successfully be applied to predict the Gold vs USD commodity.
Updated: 2024-10-30 13:52:24
标题: 阿基里斯,神经网络用于预测黄金对美元的集成,并与交易机器人结合实现自动交易
摘要: 预测股市是机器学习领域的一大挑战。众所周知,使用机器学习模型获得准确且一致的预测是非常困难的。一些架构能够捕捉股票的走势,但几乎从未能投入生产环境。我们提出了Achilles,它采用LSTM(长短期记忆)神经网络的经典架构,能够预测黄金兑美元的价格。利用该模型每分钟的预测结果,我们实施了一个交易机器人,进行了23天的测试(不包括周末)。在测试期结束时,所采用的方法共获得了1623.52美元的利润。这些结果表明,机器学习可以成功地应用于预测黄金兑美元这一商品。
更新时间: 2024-10-30 13:52:24
领域: q-fin.ST,cs.LG
Linear Adversarial Concept Erasure
Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to \emph{control} their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear maximin game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, \method, that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
Updated: 2024-10-30 13:50:02
标题: 线性对抗概念消除
摘要: 现代基于文本数据训练的神经模型依赖于在没有直接监督的情况下出现的预训练表示。随着这些表示越来越多地被用于现实世界的应用程序中,无法\emph{控制}它们的内容变得越来越重要。我们提出了识别和擦除与给定概念相对应的线性子空间的问题,以防止线性预测器恢复该概念。我们将这个问题建模为一个受限制的线性最大极小博弈,并展示现有的解决方案通常不适用于这个任务。我们为某些目标推导出了一个闭合解决方案,并提出了一个凸松弛,\method,对其他目标效果很好。在二元性别移除的背景下评估时,该方法恢复了一个低维子空间,其移除减轻了内在和外部评估中的偏见。我们展示该方法具有高度表达性,在维护可处理性和可解释性的同时有效减轻了深度非线性分类器中的偏见。
更新时间: 2024-10-30 13:50:02
领域: cs.LG,cs.CL
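A minimal sketch of the core operation in linear concept erasure, removing a concept subspace by orthogonal projection; how the subspace U is found (the paper's constrained maximin game) is not reproduced here:

```python
import numpy as np

def erase_subspace(X, U):
    # Remove a k-dimensional concept subspace by orthogonal projection:
    # X' = X (I - U U^T), where U has orthonormal columns spanning the
    # concept subspace. A linear predictor restricted to span(U) can no
    # longer recover the concept from X'.
    return X - (X @ U) @ U.T

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 32))                 # embeddings
U, _ = np.linalg.qr(rng.normal(size=(32, 2)))  # toy 2-dim concept subspace
X_clean = erase_subspace(X, U)
print(np.abs(X_clean @ U).max())               # ~0: concept directions removed
```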
Active Learning of Mealy Machines with Timers
We present the first algorithm for query learning of a class of Mealy machines with timers in a black-box context. Our algorithm is an extension of the L# algorithm of Vaandrager et al. to a timed setting. We rely on symbolic queries which empower us to reason on untimed executions while learning. Similarly to the algorithm for learning timed automata of Waga, these symbolic queries can be implemented using finitely many concrete queries. Experiments with a prototype implementation, written in Rust, show that our algorithm is able to efficiently learn realistic benchmarks.
Updated: 2024-10-30 13:49:15
标题: 带有定时器的Mealy机器的主动学习
摘要: 我们提出了第一个用于在黑盒上下文中查询学习带有定时器的Mealy机器类的算法。我们的算法是Vaandrager等人的L#算法在定时环境下的扩展。我们依赖于符号查询,这使我们能够在学习过程中对非定时执行进行推理。类似于Waga用于学习定时自动机的算法,这些符号查询可以使用有限数量的具体查询来实现。使用Rust编写的原型实现进行的实验表明,我们的算法能够有效地学习现实基准。
更新时间: 2024-10-30 13:49:15
领域: cs.FL,cs.LG,68Q45,F.4.3
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Mechanistic interpretability methods aim to identify the algorithm a neural network implements, but it is difficult to validate such methods when the true algorithm is unknown. This work presents InterpBench, a collection of semi-synthetic yet realistic transformers with known circuits for evaluating these techniques. We train simple neural networks using a stricter version of Interchange Intervention Training (IIT) which we call Strict IIT (SIIT). Like the original, SIIT trains neural networks by aligning their internal computation with a desired high-level causal model, but it also prevents non-circuit nodes from affecting the model's output. We evaluate SIIT on sparse transformers produced by the Tracr tool and find that SIIT models maintain Tracr's original circuit while being more realistic. SIIT can also train transformers with larger circuits, like Indirect Object Identification (IOI). Finally, we use our benchmark to evaluate existing circuit discovery techniques.
Updated: 2024-10-30 13:44:04
标题: InterpBench: 用于评估机理可解释性技术的半合成变压器
摘要: 机理可解释性方法旨在确定神经网络实现的算法,但在真实算法未知的情况下很难验证这些方法。这项工作提出了InterpBench,这是一个包含已知电路的半合成但真实的变压器集合,用于评估这些技术。我们使用称为Strict IIT(SIIT)的严格版Interchange Intervention Training(IIT)训练简单的神经网络。与原始方法类似,SIIT通过使其内部计算与所需的高级因果模型对齐来训练神经网络,但它还防止非电路节点影响模型的输出。我们在由Tracr工具生成的稀疏变压器上评估SIIT,并发现SIIT模型保持了Tracr原始电路的同时更加真实。SIIT还可以训练具有更大电路的变压器,例如间接对象识别(IOI)。最后,我们使用我们的基准来评估现有的电路发现技术。
更新时间: 2024-10-30 13:44:04
领域: cs.LG
NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks
We contribute NeuralSolver, a novel recurrent solver that can efficiently and consistently extrapolate, i.e., learn algorithms from smaller problems (in terms of observation size) and execute those algorithms in large problems. Contrary to previous recurrent solvers, NeuralSolver can be naturally applied in both same-size problems, where the input and output sizes are the same, and in different-size problems, where the sizes of the input and output differ. To allow for this versatility, we design NeuralSolver with three main components: a recurrent module that iteratively processes input information at different scales, a processing module responsible for aggregating the previously processed information, and a curriculum-based training scheme that improves the extrapolation performance of the method. To evaluate our method, we introduce a set of novel different-size tasks and show that NeuralSolver consistently outperforms prior state-of-the-art recurrent solvers in extrapolating to larger problems, while training on smaller problems and requiring fewer parameters than other approaches.
Updated: 2024-10-30 13:42:44
标题: 神经求解器:用于在一般任务中保持一致和高效外推的学习算法
摘要: 我们提出了NeuralSolver,这是一种新颖的递归求解器,可以高效且一致地外推,即从较小的问题(观测大小方面)中学习算法并在较大的问题中执行这些算法。与先前的递归求解器不同,NeuralSolver可以自然地应用于输入与输出尺寸相同的同尺寸问题,以及输入与输出尺寸不同的异尺寸问题。为了实现这种多功能性,我们设计了具有三个主要组件的NeuralSolver:一个递归模块,逐步以不同尺度处理输入信息,一个处理模块,负责聚合先前处理的信息,以及基于课程的训练方案,提高方法的外推性能。为了评估我们的方法,我们引入了一组新颖的异尺寸任务,并展示了NeuralSolver在外推到较大问题时始终优于先前最先进的递归求解器,同时在更小的训练问题上训练,且所需参数比其他方法更少。
更新时间: 2024-10-30 13:42:44
领域: cs.LG,cs.AI
Equivariant Machine Learning on Graphs with Nonlinear Spectral Filters
Equivariant machine learning is an approach for designing deep learning models that respect the symmetries of the problem, with the aim of reducing model complexity and improving generalization. In this paper, we focus on an extension of shift equivariance, which is the basis of convolution networks on images, to general graphs. Unlike images, graphs do not have a natural notion of domain translation. Therefore, we consider the graph functional shifts as the symmetry group: the unitary operators that commute with the graph shift operator. Notably, such symmetries operate in the signal space rather than directly in the spatial space. We remark that each linear filter layer of a standard spectral graph neural network (GNN) commutes with graph functional shifts, but the activation function breaks this symmetry. Instead, we propose nonlinear spectral filters (NLSFs) that are fully equivariant to graph functional shifts and show that they have universal approximation properties. The proposed NLSFs are based on a new form of spectral domain that is transferable between graphs. We demonstrate the superior performance of NLSFs over existing spectral GNNs in node and graph classification benchmarks.
Updated: 2024-10-30 13:39:43
标题: 具有非线性谱滤波器的图上等变机器学习
摘要: 等变机器学习是一种设计深度学习模型的方法,它尊重问题的对称性,旨在降低模型复杂性并改善泛化能力。在本文中,我们关注一种对移位等变性的扩展,这是卷积网络在图像上的基础,在通用图上。与图像不同,图没有自然的域平移概念。因此,我们将图功能平移视为对称群:与图平移算子交换的酉算子。值得注意的是,这样的对称性在信号空间中操作,而不是直接在空间中操作。我们指出,标准谱图神经网络(GNN)的每个线性滤波层都与图功能平移交换,但激活函数破坏了这种对称性。相反,我们提出全等变于图功能平移的非线性谱滤波器(NLSFs),并展示它们具有通用逼近性质。所提出的NLSFs基于一种可在图之间传输的新形式的谱域。我们展示了NLSFs在节点和图分类基准测试中优于现有的谱GNN的性能。
更新时间: 2024-10-30 13:39:43
领域: cs.LG,stat.ML
Scoring Rules and Calibration for Imprecise Probabilities
What does it mean to say that, for example, the probability for rain tomorrow is between 20% and 30%? The theory for the evaluation of precise probabilistic forecasts is well-developed and is grounded in the key concepts of proper scoring rules and calibration. For the case of imprecise probabilistic forecasts (sets of probabilities), such theory is still lacking. In this work, we therefore generalize proper scoring rules and calibration to the imprecise case. We develop these concepts as relative to data models and decision problems. As a consequence, the imprecision is embedded in a clear context. We establish a close link to the paradigm of (group) distributional robustness and in doing so provide new insights for it. We argue that proper scoring rules and calibration serve two distinct goals, which are aligned in the precise case, but intriguingly are not necessarily aligned in the imprecise case. The concept of decision-theoretic entropy plays a key role for both goals. Finally, we demonstrate the theoretical insights in machine learning practice, in particular we illustrate subtle pitfalls relating to the choice of loss function in distributional robustness.
Updated: 2024-10-30 13:29:47
标题: 不准确概率的评分规则和校准
摘要: 什么意味着说,例如,明天下雨的概率在20%到30%之间?对于精确概率预测的评估理论已经很成熟,并基于适当评分规则和校准的关键概念。对于不精确概率预测(概率集合)的情况,这样的理论仍然缺乏。在这项工作中,我们将适当评分规则和校准推广到不精确情况。我们将这些概念发展为相对于数据模型和决策问题的概念。因此,这种不确定性被嵌入在一个明确的背景中。我们与(群体)分布鲁棒性范式建立了密切联系,并为此提供了新的见解。我们认为适当评分规则和校准服务于两个不同的目标,在精确的情况下是一致的,但在不精确的情况下却并不一定一致。决策论熵的概念对于这两个目标起着关键作用。最后,我们在机器学习实践中展示了理论见解,特别是我们阐明了与分布鲁棒性损失函数选择相关的微妙陷阱。
更新时间: 2024-10-30 13:29:47
领域: cs.LG,math.ST,stat.TH
A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics
Recent advances in LLMs have been instrumental in autonomous robot control and human-robot interaction by leveraging their vast general knowledge and capabilities to understand and reason across a wide range of tasks and scenarios. Previous works have investigated various prompt engineering techniques for improving the performance of LLMs to accomplish tasks, while others have proposed methods that utilize LLMs to plan and execute tasks based on the available functionalities of a given robot platform. In this work, we consider both lines of research by comparing prompt engineering techniques and combinations thereof within the application of high-level task planning and execution in service robotics. We define a diverse set of tasks and a simple set of functionalities in simulation, and measure task completion accuracy and execution time for several state-of-the-art models.
Updated: 2024-10-30 13:22:55
标题: 服务机器人任务规划与执行中提示工程技术的比较研究
摘要: 最近在LLM方面取得的进展对自主机器人控制和人机交互至关重要,通过利用它们广泛的通用知识和能力来理解和推理各种任务和场景。先前的研究探讨了各种提示工程技术,以提高LLM的性能以完成任务,而其他人则提出了利用LLM规划和执行任务的方法,基于给定机器人平台的可用功能。在这项工作中,我们通过比较提示工程技术和其组合在服务机器人领域中的高级任务规划和执行应用中,考虑了这两条研究线路。我们在仿真中定义了一组多样的任务和简单的功能集,并对几种最先进的模型的任务完成准确性和执行时间进行了测量。
更新时间: 2024-10-30 13:22:55
领域: cs.RO,cs.AI
Semantic Enrichment of the Quantum Cascade Laser Properties in Text- A Knowledge Graph Generation Approach
A well-structured collection of the various Quantum Cascade Laser (QCL) design and working properties data provides a platform to analyze and understand the relationships between these properties. By analyzing these relationships, we can gain insights into how different design features impact laser performance properties such as the working temperature. Most of these QCL properties are captured in scientific text. There is therefore a need for efficient methodologies that can be utilized to extract QCL properties from text and generate a semantically enriched and interlinked platform where the properties can be analyzed to uncover hidden relations. There is also the need to maintain provenance and reference information on which these properties are based. Semantic Web technologies such as Ontologies and Knowledge Graphs have proven capable of providing interlinked data platforms for knowledge representation in various domains. In this paper, we propose an approach for generating a QCL properties Knowledge Graph (KG) from text for semantic enrichment of the properties. The approach is based on the QCL ontology and a Retrieval Augmented Generation (RAG) enabled information extraction pipeline based on the GPT 4-Turbo language model. The properties of interest include: working temperature, laser design type, lasing frequency, laser optical power, and the heterostructure. The experimental results demonstrate the feasibility and effectiveness of this approach for efficiently extracting QCL properties from unstructured text and generating a QCL properties Knowledge Graph, which has potential applications in semantic enrichment and analysis of QCL data.
Updated: 2024-10-30 13:22:22
标题: 文本中量子级联激光器性能的语义丰富化-一种知识图生成方法
摘要: 一个结构良好的各种量子级联激光器(QCL)设计和工作特性数据的收集提供了一个平台,用于分析和理解这些特性之间的关系。通过分析这些关系,我们可以深入了解不同设计特征如何影响激光器的工作性能特性,比如工作温度。大部分这些QCL特性都被记录在科学文本中。因此,有必要采用高效的方法来从文本中提取QCL特性,并生成一个语义丰富且相互关联的平台,以便分析这些特性以揭示隐藏的关系。同时,还需要保留这些特性所基于的来源和参考信息。语义网技术,如本体论和知识图谱,已经被证明在提供各个领域的知识表示的相互关联数据平台方面具有能力。在本文中,我们提出了一种从文本中生成QCL特性知识图谱(KG)以用于特性的语义丰富化的方法。该方法基于QCL本体论和一个基于GPT 4-Turbo语言模型的检索增强生成(RAG)启用的信息提取管道。感兴趣的特性包括:工作温度、激光设计类型、激射频率、激光光功率和异质结构。实验结果表明,这种方法能够有效地从非结构化文本中提取QCL特性并生成QCL特性知识图谱,这在QCL数据的语义丰富化和分析方面具有潜在的应用。
更新时间: 2024-10-30 13:22:22
领域: cs.AI
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
Although previous research on large language models (LLMs) and large multi-modal models (LMMs) has systematically explored mathematical problem-solving (MPS) within visual contexts, the analysis of how these models process visual information during problem-solving remains insufficient. To address this gap, we present VisAidMath, a benchmark for evaluating the MPS process related to visual information. We follow a rigorous data curation pipeline involving both automated processes and manual annotations to ensure data quality and reliability. Consequently, this benchmark includes 1,200 challenging problems from various mathematical branches, vision-aid formulations, and difficulty levels, collected from diverse sources such as textbooks, examination papers, and Olympiad problems. Based on the proposed benchmark, we conduct comprehensive evaluations on ten mainstream LLMs and LMMs, highlighting deficiencies in the visual-aided reasoning process. For example, GPT-4V achieves only 45.33% accuracy on the visual-aided reasoning task, and even drops 2 points when provided with golden visual aids. In-depth analysis reveals that the main cause of deficiencies lies in hallucination regarding the implicit visual reasoning process, shedding light on future research directions in the visual-aided MPS process.
Updated: 2024-10-30 13:19:44
标题: VisAidMath:基于视觉辅助的数学推理基准测试
摘要: 尽管先前对大型语言模型(LLMs)和大型多模态模型(LMMs)进行的研究系统地探索了视觉背景下的数学问题解决(MPS),但对这些模型在问题解决过程中如何处理视觉信息的分析仍然不足。为了弥补这一空白,我们提出了VisAidMath,这是一个用于评估与视觉信息相关的MPS过程的基准。我们遵循严格的数据筛选流程,包括自动化过程和手动注释,以确保数据质量和可靠性。因此,这个基准包括来自各种数学分支、视觉辅助公式和难度水平的1,200个具有挑战性的问题,这些问题来自各种来源,如教科书、考试试卷和奥林匹克竞赛问题。基于提出的基准,我们对十种主流的LLMs和LMMs进行了全面评估,突显了在视觉辅助推理过程中的不足之处。例如,GPT-4V在视觉辅助推理任务中仅达到45.33%的准确率,在提供标准视觉辅助时甚至还下降了2个百分点。深入分析揭示,这些不足的主要原因在于对隐含视觉推理过程的幻觉,为视觉辅助MPS过程的未来研究方向提供了启示。
更新时间: 2024-10-30 13:19:44
领域: cs.CV,cs.AI,cs.CL,cs.LG
Particle Semi-Implicit Variational Inference
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible, so they resort to one of the following: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a free energy functional. PVI arises naturally as a particle approximation of a Euclidean-Wasserstein gradient flow and, unlike prior works, it directly optimizes the ELBO whilst making no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably compared to other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.
Updated: 2024-10-30 13:18:41
标题: 半隐变量粒子变分推断
摘要: 半隐式变分推断(SIVI)通过利用核函数和混合分布来分层定义变分分布,丰富了变分家族的表达能力。现有的SIVI方法使用隐式分布参数化混合分布,导致变分密度难以处理。因此,直接最大化证据下界(ELBO)是不可能的,因此它们采用以下一种方法:优化ELBO的边界,使用昂贵的内循环马尔可夫链蒙特卡洛运行,或解决极小极大目标。在本文中,我们提出了一种称为粒子变分推断(PVI)的SIVI新方法,该方法利用经验测量来近似最优混合分布,该分布被描述为自由能泛函的最小值。PVI自然地产生为欧几里得-瓦尔德斯坦梯度流的粒子近似,与以前的工作不同,它直接优化ELBO,同时对混合分布不做参数假设。我们的实证结果表明,与其他SIVI方法相比,PVI在各种任务中表现优异。此外,我们对相关自由能泛函的梯度流行为进行了理论分析:建立解的存在性和唯一性,以及混沌传播结果。
更新时间: 2024-10-30 13:18:41
领域: stat.ML,cs.LG
Dynamic Matching with Post-allocation Service and its Application to Refugee Resettlement
Motivated by our collaboration with a major refugee resettlement agency in the U.S., we study a dynamic matching problem where each new arrival (a refugee case) must be matched immediately and irrevocably to one of the static resources (a location with a fixed annual quota). In addition to consuming the static resource, each case requires post-allocation service from a server, such as a translator. Given the time-consuming nature of service, a server may not be available at a given time, thus we refer to it as a dynamic resource. Upon matching, the case will wait to avail service in a first-come-first-served manner. Bursty matching to a location may result in undesirable congestion at its corresponding server. Consequently, the central planner (the agency) faces a dynamic matching problem with an objective that combines the matching reward (captured by pair-specific employment outcomes) with the cost of congestion for dynamic resources and of over-allocation for the static ones. Motivated by the observed fluctuations in the composition of refugee pools across the years, we design algorithms that do not rely on distributional knowledge constructed based on past years' data. To that end, we develop learning-based algorithms that are asymptotically optimal in certain regimes, easy to interpret, and computationally fast. Our design is based on learning the dual variables of the underlying optimization problem; however, the main challenge lies in the time-varying nature of the dual variables associated with dynamic resources. To overcome this challenge, our theoretical development brings together techniques from Lyapunov analysis, adversarial online learning, and stochastic optimization. On the application side, when tested on real data from our partner agency, our method outperforms existing ones, making it a viable candidate for replacing the current practice upon experimentation.
Updated: 2024-10-30 13:17:38
标题: 动态匹配与后分配服务及其在难民安置中的应用
摘要: 受美国一家主要难民安置机构的合作启发,我们研究了一个动态匹配问题,其中每个新到达者(难民案例)必须立即且不可撤销地与静态资源(具有固定年度配额的地点)之一进行匹配。除了消耗静态资源外,每个案例还需要从服务器(如翻译)获得分配后服务。考虑到服务耗时的特性,服务器可能在某个时间点不可用,因此我们将其称为动态资源。匹配后,案例将按先来先服务的方式等待获取服务。对某个地点的突发匹配可能导致相应服务器出现不良拥堵。因此,中央规划者(机构)面临一个动态匹配问题,其目标结合了匹配奖励(通过特定对就业结果来捕捉)与动态资源拥堵成本和静态资源过度分配成本。受观察到的难民群体在各年份间组成波动的启发,我们设计了不依赖于基于过去年份数据构建的分布知识的算法。为此,我们开发了一种基于学习的算法,在某些范围内渐近最优,易于解释且计算速度快。我们的设计基于学习基础优化问题的对偶变量;然而,主要挑战在于与动态资源相关联的对偶变量的时变性质。为了克服这一挑战,我们的理论发展将来自李雅普诺夫分析、对抗性在线学习和随机优化的技术结合在一起。在应用方面,在从合作机构收集的真实数据上进行测试时,我们的方法优于现有方法,使其成为在实验中取代当前实践的可行候选。
更新时间: 2024-10-30 13:17:38
领域: cs.DS,cs.GT,cs.LG,math.OC
Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano
Vocal education in the music field is difficult to quantify due to individual differences in singers' voices and the differing quantitative criteria for singing techniques. Deep learning has great potential for application in music education due to its efficiency in handling complex data and performing quantitative analysis. However, accurate evaluation of rare vocal types such as Mezzo-soprano with limited samples requires extensive, well-annotated data to support deep learning models. To attain this objective, we perform transfer learning, employing deep learning models pre-trained on the ImageNet and Urbansound8k datasets to improve the precision of vocal technique evaluation. Furthermore, we tackle the problem of the lack of samples by constructing a dedicated dataset, the Mezzo-soprano Vocal Set (MVS), for vocal technique assessment. Our experimental results indicate that transfer learning increases the overall accuracy (OAcc) of all models by an average of 8.3%, with the highest accuracy at 94.2%. We not only provide a novel approach to evaluating Mezzo-soprano vocal techniques but also introduce a new quantitative assessment method for music education.
Updated: 2024-10-30 13:17:13
标题: 声乐教育中的迁移学习:对描述女中音有限样本的技术评估
摘要: 在音乐领域中,声乐教育很难量化,因为歌手的声音存在个体差异,而歌唱技术的量化标准也各不相同。深度学习由于其处理复杂数据和执行定量分析的效率,具有在音乐教育中应用的巨大潜力。然而,对于罕见的声音类型(如女中音)进行准确评估需要使用深度学习模型进行大量良好注释的数据支持。为了实现这一目标,我们通过利用在ImageNet和Urbansound8k数据集上预训练的深度学习模型进行迁移学习,以提高声乐技术评估的准确性。此外,我们通过构建专门的数据集“女中音声乐组合(MVS)”来解决样本不足的问题,用于声乐技术评估。我们的实验结果表明,迁移学习将所有模型的整体准确度(OAcc)平均提高了8.3%,最高准确度达到94.2%。我们不仅提供了评估女中音声乐技术的新方法,还引入了一种新的音乐教育定量评估方法。
更新时间: 2024-10-30 13:17:13
领域: eess.AS,cs.AI,cs.MM,cs.SD
Efficient distributed representations with linear-time attention scores normalization
The attention score matrix ${\rm SoftMax}(XY^T)$ encodes relational similarity patterns between objects and is extremely popular in machine learning. However, the cost of computing it grows quadratically with the problem size, making it a computationally heavy solution. In this article, we propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms. We show on several pre-trained embeddings that the accuracy of our estimation formula surpasses competing kernel methods by orders of magnitude. Building on this result, we design a linear-time and task-agnostic embedding algorithm based on the optimization of the attention scores. The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem. We consider a few use cases and observe similar or higher performance and lower computational time with respect to comparable embedding algorithms.
Updated: 2024-10-30 13:10:19
标题: 具有线性时间注意力分数归一化的高效分布式表示
摘要: 注意力分数矩阵${\rm SoftMax}(XY^T)$编码对象之间的关系相似性模式,在机器学习中非常流行。然而,计算它所需的复杂度随着问题规模的增加呈二次增长,使其成为一种计算负担较重的解决方案。在本文中,我们提出了一种线性时间的注意力分数归一化常数的近似方法,用于具有有界范数的嵌入向量。我们在几个预训练的嵌入中展示了我们估计公式的准确性超过竞争的核方法几个数量级。基于注意力分数的优化,我们设计了一种基于线性时间和任务无关的嵌入算法。该算法具有很高的可解释性,并且可以轻松适应任意嵌入问题。我们考虑了几种用例,并观察到相似或更高的性能,以及与可比较的嵌入算法相比更低的计算时间。
更新时间: 2024-10-30 13:10:19
领域: cs.LG,cs.CL,stat.ML
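For orientation, the quantities being approximated are the per-row normalization constants Z_i = sum_j exp(x_i . y_j), which are quadratic to compute exactly. The sketch below contrasts the exact computation with a positive-random-feature estimator in the spirit of the kernel methods the abstract compares against; it is not the paper's own estimation formula:

```python
import numpy as np

def exact_normalizers(X, Y):
    # O(n * m * d) time and O(n * m) memory: Z_i = sum_j exp(x_i . y_j),
    # the SoftMax denominators (quadratic when n = m).
    return np.exp(X @ Y.T).sum(axis=1)

def rf_normalizers(X, Y, n_features=512, seed=0):
    # Linear-time unbiased estimate via positive random features:
    # exp(x . y) = E_w[phi_w(x) * phi_w(y)] with w ~ N(0, I) and
    # phi_w(v) = exp(w . v - |v|^2 / 2).
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_features))
    phi_X = np.exp(X @ W - 0.5 * (X ** 2).sum(axis=1, keepdims=True))
    phi_Y = np.exp(Y @ W - 0.5 * (Y ** 2).sum(axis=1, keepdims=True))
    return phi_X @ phi_Y.sum(axis=0) / n_features

rng = np.random.default_rng(3)
X, Y = rng.normal(size=(50, 8)) / 4, rng.normal(size=(60, 8)) / 4  # bounded norms
print(np.abs(exact_normalizers(X, Y) - rf_normalizers(X, Y)).mean())
```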
V2X-Assisted Distributed Computing and Control Framework for Connected and Automated Vehicles under Ramp Merging Scenario
This paper investigates distributed computing and cooperative control of connected and automated vehicles (CAVs) in the ramp merging scenario under a transportation cyber-physical system. Firstly, a centralized cooperative trajectory planning problem is formulated subject to the safety constraints and traffic performance requirements of the ramp merging scenario, where the trajectories of all vehicles are jointly optimized. To remove the reliance on a central controller and reduce computation time, a distributed solution to this problem, implemented among CAVs through Vehicle-to-Everything (V2X) communication, is proposed. Unlike existing methods, our method can distribute the computational task among CAVs and carry out parallel solving through V2X communication. Then, a multi-vehicle model predictive control (MPC) problem aimed at maximizing system stability and minimizing control input is formulated based on the solution of the first problem, subject to strict safety constraints and input limits. Due to these complex constraints, this problem becomes high-dimensional, centralized, and non-convex. To solve it in a short time, a decomposition and convex reformulation method, namely distributed cooperative iterative model predictive control (DCIMPC), is proposed. This method leverages the communication capability of CAVs to decompose the problem, making full use of the computational resources on vehicles to achieve fast solutions and distributed control. The two problems above, with their corresponding solution methods, form the systemic framework of V2X-assisted distributed computing and control. Simulations have been conducted to evaluate the framework's convergence, safety, and solving speed. Additionally, extra experiments are conducted to validate the performance of DCIMPC. The results show that our method can greatly improve computation speed without sacrificing system performance.
Updated: 2024-10-30 12:56:49
标题: V2X辅助的连通和自动化车辆在匝道合流情景下的分布式计算和控制框架
摘要: 本文研究了在交通网络物理系统下,联网自动驾驶汽车(CAVs)在匝道合并场景中的分布式计算和合作控制。首先,提出了一个集中式协作轨迹规划问题,考虑了匝道合并场景中的安全约束和交通性能,其中所有车辆的轨迹被联合优化。为了摆脱对中央控制器的依赖并减少计算时间,提出了一种通过车对万物(V2X)通信在CAVs之间实现的分布式解决方案。与现有方法不同,我们的方法可以在CAVs之间分配计算任务,并通过V2X通信实现并行求解。然后,基于第一个问题的解决方案,制定了一个旨在最大化系统稳定性并最小化控制输入的多车辆模型预测控制(MPC)问题,受严格的安全约束和输入限制。由于这些复杂的约束条件,这个问题变得高维、集中式和非凸。为了在短时间内解决这个问题,提出了一种分解和凸重构方法,即分布式协作迭代模型预测控制(DCIMPC)。该方法利用CAVs的通信能力来分解问题,充分利用车辆上的计算资源实现快速解决方案和分布式控制。上述两个问题及其相应的解决方法构成了V2X辅助分布式计算和控制的系统框架。进行了模拟实验以评估框架的收敛性、安全性和解决速度。此外,还进行了额外实验以验证DCIMPC的性能。结果表明,我们的方法可以大大提高计算速度,而不会牺牲系统性能。
更新时间: 2024-10-30 12:56:49
领域: eess.SY,cs.LG,cs.NI,cs.SY
SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network
Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distribution shifts in real-world settings, compromises model robustness. To tackle these challenges, we propose a novel SpGesture framework based on Spiking Neural Networks, which possesses several unique merits compared with existing methods: (1) Robustness: By utilizing membrane potential as a memory list, we pioneer the introduction of Source-Free Domain Adaptation into SNN for the first time. This enables SpGesture to mitigate the accuracy degradation caused by distribution shifts. (2) High Accuracy: With a novel Spiking Jaccard Attention, SpGesture enhances the SNNs' ability to represent sEMG features, leading to a notable rise in system accuracy. To validate SpGesture's performance, we collected a new sEMG gesture dataset which has different forearm postures, where SpGesture achieved the highest accuracy among the baselines ($89.26\%$). Moreover, the actual deployment on the CPU demonstrated a system latency below 100ms, well within real-time requirements. This impressive performance showcases SpGesture's potential to enhance the applicability of sEMG in real-world scenarios. The code is available at https://github.com/guoweiyu/SpGesture/.
Updated: 2024-10-30 12:56:44
标题: SpGesture:无源领域自适应的基于sEMG的手势识别与Jaccard注意力脉冲神经网络
摘要: 基于表面肌电图(sEMG)的手势识别为可穿戴设备提供了一种自然和直观的交互模式。尽管基于sEMG的手势识别模型取得了显著进展,但现有方法通常存在高计算延迟和增加的能量消耗的问题。此外,sEMG信号的固有不稳定性,结合其对真实世界环境中分布变化的敏感性,损害了模型的鲁棒性。为了应对这些挑战,我们提出了一个基于尖峰神经网络的新型SpGesture框架,与现有方法相比具有几个独特的优点:(1)鲁棒性:通过利用膜电位作为记忆列表,我们首次将无源领域适应引入到SNN中。这使得SpGesture能够减轻由分布变化导致的准确度降低。(2)高准确度:通过一种新颖的尖峰Jaccard注意力,SpGesture增强了SNN对表示sEMG特征的能力,从而显著提高系统的准确性。为了验证SpGesture的性能,我们收集了一个新的sEMG手势数据集,其中包含不同的前臂姿势,在此数据集中,SpGesture在基线中取得了最高的准确率(89.26%)。此外,在CPU上的实际部署展示了系统延迟低于100ms,完全满足实时需求。这一令人印象深刻的性能展示了SpGesture在增强sEMG在真实场景中的适用性方面的潜力。代码可在https://github.com/guoweiyu/SpGesture/获取。
更新时间: 2024-10-30 12:56:44
领域: cs.HC,cs.AI,eess.SP
Higher-order Cross-structural Embedding Model for Time Series Analysis
Time series analysis has gained significant attention due to its critical applications in diverse fields such as healthcare, finance, and sensor networks. The complexity and non-stationarity of time series make it challenging to capture the interaction patterns across different timestamps. Current approaches struggle to model higher-order interactions within time series, and focus on learning temporal or spatial dependencies separately, which limits performance in downstream tasks. To address these gaps, we propose Higher-order Cross-structural Embedding Model for Time Series (High-TS), a novel framework that jointly models both temporal and spatial perspectives by combining multiscale Transformer with Topological Deep Learning (TDL). Meanwhile, High-TS utilizes contrastive learning to integrate these two structures for generating robust and discriminative representations. Extensive experiments show that High-TS outperforms state-of-the-art methods in various time series tasks and demonstrate the importance of higher-order cross-structural information in improving model performance.
Updated: 2024-10-30 12:51:14
标题: 高阶交叉结构嵌入模型用于时间序列分析
摘要: 时间序列分析因其在医疗保健、金融和传感器网络等多个领域的重要应用而受到重视。时间序列的复杂性和非平稳性使得捕捉不同时间戳之间的交互模式具有挑战性。当前方法往往难以建模时间序列中的高阶交互,并且主要关注分别学习时间或空间依赖关系,这限制了下游任务的性能。为了弥补这些缺口,我们提出了一种新颖的框架——时间序列的高阶交叉结构嵌入模型(High-TS),它通过将多尺度Transformer与拓扑深度学习(TDL)相结合,共同建模时间和空间视角。同时,High-TS利用对比学习来整合这两种结构,生成稳健且有区分性的表示。大量实验证明,High-TS在各种时间序列任务中优于最先进的方法,并展示了高阶交叉结构信息在提高模型性能方面的重要性。
更新时间: 2024-10-30 12:51:14
领域: cs.LG,cs.AI
Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering
Multi-view clustering is an important machine learning task for multi-media data, encompassing various domains such as images, videos, and texts. Moreover, with the growing abundance of graph data, the significance of multi-view graph clustering (MVGC) has become evident. Most existing methods focus on graph neural networks (GNNs) to extract information from both graph structure and feature data to learn distinguishable node representations. However, traditional GNNs are designed with the assumption of homophilous graphs, making them unsuitable for widely prevalent heterophilous graphs. Several techniques have been introduced to enhance GNNs for heterophilous graphs. While these methods partially mitigate the heterophilous graph issue, they often neglect the advantages of traditional GNNs, such as their simplicity, interpretability, and efficiency. In this paper, we propose a novel multi-view graph clustering method based on dual-optimized adaptive graph reconstruction, named DOAGC. It mainly aims to reconstruct the graph structure adapted to traditional GNNs to deal with heterophilous graph issues while maintaining the advantages of traditional GNNs. Specifically, we first develop an adaptive graph reconstruction mechanism that accounts for node correlation and original structural information. To further optimize the reconstruction graph, we design a dual optimization strategy and demonstrate the feasibility of our optimization strategy through mutual information theory. Numerous experiments demonstrate that DOAGC effectively mitigates the heterophilous graph problem.
Updated: 2024-10-30 12:50:21
标题: 多视图图聚类的双重优化自适应图重建
摘要: 多视图聚类是多媒体数据的重要机器学习任务,涵盖了诸如图像、视频和文本等各个领域。此外,随着图数据的日益丰富,多视图图聚类(MVGC)的重要性变得明显。大多数现有方法侧重于图神经网络(GNNs),以从图结构和特征数据中提取信息来学习可区分的节点表示。然而,传统的GNNs是在同质图的假设下设计的,因此不适用于普遍存在的异质图。已经引入了几种技术来增强GNNs用于异质图。虽然这些方法在一定程度上缓解了异质图的问题,但它们常常忽略了传统GNNs的优势,比如简单性、可解释性和效率。在本文中,我们提出了一种基于双优化自适应图重构的新型多视图图聚类方法,命名为DOAGC。它主要旨在重构适应传统GNNs的图结构,以解决异质图的问题,同时保持传统GNNs的优势。具体而言,我们首先开发了一个考虑节点相关性和原始结构信息的自适应图重构机制。为了进一步优化重构图,我们设计了一个双优化策略,并通过互信息理论证明了我们的优化策略的可行性。大量实验证明,DOAGC有效地缓解了异质图问题。
更新时间: 2024-10-30 12:50:21
领域: cs.LG
Dynamical loss functions shape landscape topography and improve learning in artificial neural networks
Dynamical loss functions are derived from standard loss functions used in supervised classification tasks, but they are modified such that the contribution from each class periodically increases and decreases. These oscillations globally alter the loss landscape without affecting the global minima. In this paper, we demonstrate how to transform cross-entropy and mean squared error into dynamical loss functions. We begin by discussing the impact of increasing the size of the neural network or the learning rate on the learning process. Building on this intuition, we propose several versions of dynamical loss functions and show how they significantly improve validation accuracy for networks of varying sizes. Finally, we explore how the landscape of these dynamical loss functions evolves during training, highlighting the emergence of instabilities that may be linked to edge-of-instability minimization.
Updated: 2024-10-30 12:47:04
标题: 动力损失函数塑造景观地形并改善人工神经网络的学习
摘要: 动态损失函数是从用于监督分类任务的标准损失函数中导出的,但它们经过修改,使得每个类别的贡献周期性增加和减少。这些振荡在全局范围内改变了损失景观,而不会影响全局最小值。在本文中,我们展示了如何将交叉熵和均方误差转化为动态损失函数。我们首先讨论增加神经网络的规模或学习速率对学习过程的影响。基于这种直觉,我们提出了几个版本的动态损失函数,并展示了它们如何显著改善不同规模网络的验证准确性。最后,我们探讨了这些动态损失函数的景观在训练过程中如何演变,突出了可能与边缘不稳定性最小化相关的不稳定性的出现。
更新时间: 2024-10-30 12:47:04
领域: cs.LG
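A minimal sketch of a dynamical loss in the sense of the abstract above: a cross-entropy whose per-class weights oscillate over training steps. The sinusoidal schedule and phase offsets are assumptions of this sketch, chosen to match the "periodically increases and decreases" description; strictly positive weights leave the global minima unchanged:

```python
import numpy as np

def dynamical_cross_entropy(logits, labels, step, period=100.0, amp=0.5):
    # Cross-entropy in which each class's contribution oscillates over
    # training steps; phases are offset per class so classes take turns
    # being emphasized. With amp < 1, all weights stay strictly positive.
    n_classes = logits.shape[1]
    phases = 2 * np.pi * np.arange(n_classes) / n_classes
    w = 1.0 + amp * np.sin(2 * np.pi * step / period + phases)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_example = -w[labels] * logp[np.arange(len(labels)), labels]
    return per_example.mean()

rng = np.random.default_rng(4)
logits = rng.normal(size=(32, 10))
labels = rng.integers(0, 10, size=32)
print([round(dynamical_cross_entropy(logits, labels, t), 3) for t in (0, 25, 50)])
```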
PDSR: Efficient UAV Deployment for Swift and Accurate Post-Disaster Search and Rescue
This paper introduces a comprehensive framework for Post-Disaster Search and Rescue (PDSR), aiming to optimize search and rescue operations leveraging Unmanned Aerial Vehicles (UAVs). The primary goal is to improve the precision and availability of sensing capabilities, particularly in various catastrophic scenarios. Central to this concept is the rapid deployment of UAV swarms equipped with diverse sensing, communication, and intelligence capabilities, functioning as an integrated system that incorporates multiple technologies and approaches for efficient detection of individuals buried beneath rubble or debris following a disaster. Within this framework, we propose architectural solution and address associated challenges to ensure optimal performance in real-world disaster scenarios. The proposed framework aims to achieve complete coverage of damaged areas significantly faster than traditional methods using a multi-tier swarm architecture. Furthermore, integrating multi-modal sensing data with machine learning for data fusion could enhance detection accuracy, ensuring precise identification of survivors.
Updated: 2024-10-30 12:46:15
标题: PDSR:高效的无人机部署,实现灾后搜救的迅速和准确
摘要: 本文介绍了一个全面的后灾难搜索和救援(PDSR)框架,旨在通过利用无人机(UAVs)来优化搜索和救援行动。主要目标是改善传感能力的精度和可用性,特别是在各种灾难场景中。这一概念的核心是快速部署配备各种传感、通信和智能能力的无人机群体,作为一个整合系统,整合多种技术和方法,以有效地检测灾难后埋在瓦砾或残骸下的个人。在这个框架内,我们提出了架构解决方案,并解决了相关挑战,以确保在现实世界的灾难场景中实现最佳性能。所提出的框架旨在通过使用多层次群体架构比传统方法更快地实现对受损区域的完全覆盖。此外,将多模态传感数据与机器学习进行数据融合,可以提高检测准确性,确保对幸存者进行精确识别。
更新时间: 2024-10-30 12:46:15
领域: cs.RO,cs.AI,cs.SY,eess.SY
DisenTS: Disentangled Channel Evolving Pattern Modeling for Multivariate Time Series Forecasting
Multivariate time series forecasting plays a crucial role in various real-world applications. Significant efforts have been made to integrate advanced network architectures and training strategies that enhance the capture of temporal dependencies, thereby improving forecasting accuracy. On the other hand, mainstream approaches typically utilize a single unified model with simplistic channel-mixing embedding or cross-channel attention operations to account for the critical intricate inter-channel dependencies. Moreover, some methods even trade capacity for robust prediction based on the channel-independent assumption. Nonetheless, as time series data may display distinct evolving patterns due to the unique characteristics of each channel (including multiple strong seasonalities and trend changes), the unified modeling methods could yield suboptimal results. To this end, we propose DisenTS, a tailored framework for modeling disentangled channel evolving patterns in general multivariate time series forecasting. The central idea of DisenTS is to model the potential diverse patterns within the multivariate time series data in a decoupled manner. Technically, the framework employs multiple distinct forecasting models, each tasked with uncovering a unique evolving pattern. To guide the learning process without supervision of pattern partition, we introduce a novel Forecaster Aware Gate (FAG) module that generates the routing signals adaptively according to both the forecasters' states and input series' characteristics. The forecasters' states are derived from the Linear Weight Approximation (LWA) strategy, which quantizes the complex deep neural networks into compact matrices. Additionally, the Similarity Constraint (SC) is further proposed to guide each model to specialize in an underlying pattern by minimizing the mutual information between the representations.
Updated: 2024-10-30 12:46:14
Domains: cs.LG
Scale Equivariant Graph Metanetworks
This paper pertains to an emerging machine learning paradigm: learning higher-order functions, i.e. functions whose inputs are functions themselves, $\textit{particularly when these inputs are Neural Networks (NNs)}$. With the growing interest in architectures that process NNs, a recurring design principle has permeated the field: adhering to the permutation symmetries arising from the connectionist structure of NNs. $\textit{However, are these the sole symmetries present in NN parameterizations}$? Zooming into most practical activation functions (e.g. sine, ReLU, tanh) answers this question negatively and gives rise to intriguing new symmetries, which we collectively refer to as $\textit{scaling symmetries}$, that is, non-zero scalar multiplications and divisions of weights and biases. In this work, we propose $\textit{Scale Equivariant Graph MetaNetworks - ScaleGMNs}$, a framework that adapts the Graph Metanetwork (message-passing) paradigm by incorporating scaling symmetries and thus rendering neuron and edge representations equivariant to valid scalings. We introduce novel building blocks, of independent technical interest, that allow for equivariance or invariance with respect to individual scalar multipliers or their product and use them in all components of ScaleGMN. Furthermore, we prove that, under certain expressivity conditions, ScaleGMN can simulate the forward and backward pass of any input feedforward neural network. Experimental results demonstrate that our method advances the state-of-the-art performance for several datasets and activation functions, highlighting the power of scaling symmetries as an inductive bias for NN processing. The source code is publicly available at https://github.com/jkalogero/scalegmn.
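The scaling symmetry the abstract refers to is easy to verify numerically: in a two-layer ReLU network, multiplying each hidden neuron's incoming weights and bias by a positive scalar and dividing its outgoing weights by the same scalar leaves the function unchanged. A quick self-contained check (not from the paper):

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)
x = rng.normal(size=8)

relu = lambda z: np.maximum(z, 0.0)
f = lambda W1, b1, W2: W2 @ relu(W1 @ x + b1) + b2

c = rng.uniform(0.1, 10.0, size=16)  # one positive scale per hidden neuron
print(np.allclose(f(W1, b1, W2), f(c[:, None] * W1, c * b1, W2 / c[None, :])))
# True: the parameterizations differ, but the network function does not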
Updated: 2024-10-30 12:45:18
Domains: cs.LG
Graph Integration for Diffusion-Based Manifold Alignment
Data from individual observations can originate from various sources or modalities but are often intrinsically linked. Multimodal data integration can enrich information content compared to single-source data. Manifold alignment is a form of data integration that seeks a shared, underlying low-dimensional representation of multiple data sources that emphasizes similarities between alternative representations of the same entities. Semi-supervised manifold alignment relies on partially known correspondences between domains, either through shared features or through other known associations. In this paper, we introduce two semi-supervised manifold alignment methods. The first method, Shortest Paths on the Union of Domains (SPUD), forms a unified graph structure using known correspondences to establish graph edges. By learning inter-domain geodesic distances, SPUD creates a global, multi-domain structure. The second method, MASH (Manifold Alignment via Stochastic Hopping), learns local geometry within each domain and forms a joint diffusion operator using known correspondences to iteratively learn new inter-domain correspondences through a random-walk approach. Through the diffusion process, MASH forms a coupling matrix that links heterogeneous domains into a unified structure. We compare SPUD and MASH with existing semi-supervised manifold alignment methods and show that they outperform competing methods in aligning true correspondences and cross-domain classification. In addition, we show how these methods can be applied to transfer label information between domains.
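As a rough illustration of the SPUD construction (neighbourhood sizes and the anchor edge weight below are arbitrary assumptions), one can build a kNN graph per domain, join the graphs through the known correspondences, and read inter-domain geodesics off the union graph:

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
A, B = rng.normal(size=(50, 5)), rng.normal(size=(50, 3))  # two domains
ga = kneighbors_graph(A, 5, mode="distance").toarray()
gb = kneighbors_graph(B, 5, mode="distance").toarray()

n = len(A)
union = np.zeros((2 * n, 2 * n))              # zero entries = absent edges
union[:n, :n], union[n:, n:] = ga, gb
for i in range(10):                           # first 10 points are known correspondences
    union[i, n + i] = union[n + i, i] = 1e-3  # assumed anchor edge weight

geo = shortest_path(union, directed=False)    # (2n, 2n) cross-domain geodesic distances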
Updated: 2024-10-30 12:43:44
Domains: stat.ML,cs.LG
Learning Structure-Aware Representations of Dependent Types
Agda is a dependently-typed programming language and a proof assistant, pivotal in proof formalization and programming language theory. This paper extends the Agda ecosystem into machine learning territory, and, vice versa, makes Agda-related resources available to machine learning practitioners. We introduce and release a novel dataset of Agda program-proofs that is elaborate and extensive enough to support various machine learning applications -- the first of its kind. Leveraging the dataset's ultra-high resolution, which details proof states at the sub-type level, we propose a novel neural architecture targeted at faithfully representing dependently-typed programs on the basis of structural rather than nominal principles. We instantiate and evaluate our architecture in a premise selection setup, where it achieves promising initial results, surpassing strong baselines.
Updated: 2024-10-30 12:40:30
Domains: cs.LG,cs.PL
Algebraic Positional Encodings
We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds its desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyper-parameter optimizations or "task search" of any kind. Code is available at https://github.com/konstantinosKokos/ape.
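A toy instance of "positions as orthogonal operators" (a 2-D rotation here; the paper's construction is more general) shows how the algebra of the source domain is preserved: composing the operators for positions n and m yields the operator for position n + m.

import numpy as np

def P(n, theta=0.1):  # position n interpreted as an orthogonal operator
    c, s = np.cos(n * theta), np.sin(n * theta)
    return np.array([[c, -s], [s, c]])

print(np.allclose(P(3) @ P(4), P(7)))  # True: sequence structure is respected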
Updated: 2024-10-30 12:38:18
Domains: cs.LG,cs.AI
einspace: Searching for Neural Architectures from Fundamental Operations
Neural architecture search (NAS) finds high-performing networks for a given task. Yet the results of NAS have been fairly prosaic; it has not, for example, produced a shift from convolutional structures to transformers. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar. Our space is versatile, supporting architectures of various sizes and complexities, while also containing diverse network operations which allow it to model convolutions, attention components and more. It contains many existing competitive architectures, and provides flexibility for discovering new ones. Using this search space, we perform experiments to find novel architectures as well as improvements on existing ones on the diverse Unseen NAS datasets. We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines. We believe that this work is an important advancement towards a transformative NAS paradigm where search space expressivity and strategic search initialisation play key roles.
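This is not the einspace grammar itself, but a toy probabilistic context-free grammar sampler in the same spirit, showing how candidate architectures can be drawn from weighted production rules:

import random

RULES = {  # nonterminal -> list of (production, probability); toy grammar
    "NET":   [(["BLOCK"], 0.3), (["BLOCK", "NET"], 0.7)],
    "BLOCK": [(["conv3x3"], 0.4), (["attention"], 0.3), (["mlp"], 0.3)],
}

def sample(symbol="NET"):
    if symbol not in RULES:  # terminal: a concrete operation
        return [symbol]
    productions, probs = zip(*RULES[symbol])
    chosen = random.choices(productions, weights=probs)[0]
    return [op for s in chosen for op in sample(s)]

print(sample())  # e.g. ['attention', 'conv3x3', 'mlp']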
Updated: 2024-10-30 12:35:56
Domains: cs.LG,cs.AI,cs.CV,stat.ML
Dynamic Threshold-based Two-layer Online Unsupervised Anomaly Detector
The proliferation of the Internet of Things (IoT) has heightened the vulnerability to cyber threats, making it imperative to develop Anomaly Detection Systems (ADSs) capable of adapting to emerging or novel attacks. Prior research has predominantly concentrated on offline unsupervised learning techniques to protect ADSs, which are impractical for real-world applications. Furthermore, these studies often rely heavily on the assumption of known legitimate behaviors and fall short of meeting the interpretability requirements in security contexts, thereby hindering their practical adoption. In response, this paper introduces Adaptive NAD, a comprehensive framework aimed at enhancing and interpreting online unsupervised anomaly detection within security domains. We propose an interpretable two-layer anomaly detection approach that generates dependable, high-confidence pseudo-labels. Subsequently, we incorporate an online learning mechanism that updates Adaptive NAD using an innovative threshold adjustment method to accommodate new threats. Experimental findings reveal that Adaptive NAD surpasses state-of-the-art solutions by achieving improvements of over 5.4% and 23.0% in SPAUC on the CIC-Darknet2020 and CIC-DoHBrw-2020 datasets, respectively. The code for Adaptive NAD is publicly available at https://github.com/MyLearnCodeSpace/Adaptive-NAD.
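A schematic version of the threshold-and-pseudo-label loop described above; the EMA update and the confidence margin are assumptions, not the paper's exact adjustment rule:

import numpy as np

class DynamicThreshold:
    def __init__(self, threshold=0.5, alpha=0.05, margin=0.2):
        self.threshold, self.alpha, self.margin = threshold, alpha, margin

    def step(self, scores):
        # only samples far from the boundary yield high-confidence pseudo-labels
        benign = scores < self.threshold - self.margin
        attack = scores > self.threshold + self.margin
        # drift the threshold toward the recent score level (assumed EMA rule)
        self.threshold += self.alpha * (float(np.median(scores)) - self.threshold)
        return benign, attack  # used to update the two detection layers online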
Updated: 2024-10-30 12:26:02
Domains: cs.LG,eess.SP
Scalable Sampling for High Utility Patterns
Discovering valuable insights from data through meaningful associations is a crucial task. However, it becomes challenging when trying to identify representative patterns in quantitative databases, especially with large datasets, as enumeration-based strategies struggle due to the vast search space involved. To tackle this challenge, output space sampling methods have emerged as a promising solution thanks to their ability to discover valuable patterns with reduced computational overhead. However, existing sampling methods often encounter limitations when dealing with large quantitative databases, resulting in scalability-related challenges. In this work, we propose a novel high utility pattern sampling algorithm and its on-disk version, both designed for large quantitative databases and based on two original theorems. Our approach ensures both the interactivity required for user-centered methods and strong statistical guarantees through random sampling. Thanks to our method, users can instantly discover relevant and representative utility patterns, facilitating efficient exploration of the database within seconds. To demonstrate the value of our approach, we present a compelling use case involving archaeological knowledge graph sub-profile discovery. Experiments on semantic and non-semantic quantitative databases show that our approach outperforms the state-of-the-art methods.
Updated: 2024-10-30 12:22:54
Domains: cs.DB,cs.LG,60: Probability theory,G.3; E.1; E.2; F.2
Selective Reincarnation: Offline-to-Online Multi-Agent Reinforcement Learning
'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.
Updated: 2024-10-30 12:19:51
Domains: cs.AI,cs.LG,cs.MA
A Study of Secure Algorithms for Vertical Federated Learning: Take Secure Logistic Regression as an Example
In the era of big data, more and more companies build services with machine learning techniques. However, it is costly for companies to collect data and extract helpful handcrafted features on their own. Although pooling data with other companies can boost model performance, this approach may be prohibited by law. In other words, finding the balance between sharing data with others and protecting data from privacy leakage is a crucial topic worthy of close attention. This paper focuses on distributed data and conducts secure model training tasks under a vertical federated learning scheme. Here, secure means that the whole process is executed in the encrypted domain, which relieves the privacy concern.
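The vertical split itself can be sketched in a few lines: each party holds disjoint feature columns for the same users and computes a partial logit locally, so only aggregates cross organizational boundaries. The paper performs this exchange in the encrypted domain, which this plaintext illustration deliberately omits:

import numpy as np

rng = np.random.default_rng(0)
Xa, Xb = rng.normal(size=(100, 3)), rng.normal(size=(100, 2))  # party A / party B features
wa, wb = rng.normal(size=3), rng.normal(size=2)                # local weight shards

partial_a = Xa @ wa                # computed inside party A
partial_b = Xb @ wb                # computed inside party B
logits = partial_a + partial_b     # aggregate (encrypted in the paper's scheme)
p = 1.0 / (1.0 + np.exp(-logits))  # joint prediction; raw features never shared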
Updated: 2024-10-30 12:17:35
Domains: cs.CR,cs.LG
Questionable practices in machine learning
Evaluating modern ML models is hard. The strong incentive for researchers and companies to report a state-of-the-art result on some metric often leads to questionable research practices (QRPs): bad practices which fall short of outright research fraud. We describe 44 such practices which can undermine reported results, giving examples where possible. Our list emphasises the evaluation of large language models (LLMs) on public benchmarks. We also discuss "irreproducible research practices", i.e. decisions that make it difficult or impossible for other researchers to reproduce, build on or audit previous research.
Updated: 2024-10-30 12:14:35
Domains: cs.LG,cs.CL,cs.CY
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning
Real-world data deviating from the independent and identically distributed (i.i.d.) assumption of in-distribution training data poses security threats to deep networks, thus advancing out-of-distribution (OOD) detection algorithms. Detection methods in generative language models (GLMs) mainly focus on uncertainty estimation and embedding distance measurement, with the latter proven to be most effective in traditional linguistic tasks like summarization and translation. However, another complex generative scenario, mathematical reasoning, poses significant challenges to embedding-based methods because of the high density of its output space; at the same time, this very property causes larger discrepancies in the embedding shift trajectories of different samples in latent space. Hence, we propose a trajectory-based method, TV score, which uses trajectory volatility for OOD detection in mathematical reasoning. Experiments show that our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios and can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
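A minimal sketch of the trajectory-volatility idea (the exact statistic is an assumption): track a sample's hidden state across layers and score the variability of its layer-to-layer shifts.

import numpy as np

def trajectory_volatility(hidden_states):
    # hidden_states: (num_layers, dim) embeddings of one sample
    steps = np.diff(hidden_states, axis=0)      # layer-to-layer embedding shifts
    step_norms = np.linalg.norm(steps, axis=1)
    return float(np.std(step_norms))            # assumed volatility proxy

traj = np.cumsum(np.random.default_rng(0).normal(size=(24, 64)), axis=0)
score = trajectory_volatility(traj)  # compared against a validation threshold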
Updated: 2024-10-30 12:10:42
Domains: cs.CL,cs.AI,cs.LG
Retrieval-Augmented Generation with Estimation of Source Reliability
Retrieval-augmented generation (RAG) addresses key limitations of large language models (LLMs), such as hallucinations and outdated knowledge, by incorporating external databases. These databases typically consult multiple sources to encompass up-to-date and various information. However, standard RAG methods often overlook the heterogeneous source reliability in the multi-source database and retrieve documents solely based on relevance, making them prone to propagating misinformation. To address this, we propose Reliability-Aware RAG (RA-RAG) which estimates the reliability of multiple sources and incorporates this information into both retrieval and aggregation processes. Specifically, it iteratively estimates source reliability and true answers for a set of queries with no labelling. Then, it selectively retrieves relevant documents from a few reliable sources and aggregates them using weighted majority voting, where the selective retrieval ensures scalability while not compromising the performance. We also introduce a benchmark designed to reflect real-world scenarios with heterogeneous source reliability and demonstrate the effectiveness of RA-RAG compared to a set of baselines.
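The iterate-reliability-and-answers loop can be sketched as follows; the update rules are assumptions in the spirit of the abstract, not the paper's algorithm. Sources vote on queries, a reliability-weighted consensus is formed, and reliabilities are re-estimated from agreement with that consensus:

from collections import Counter

def estimate(votes, iters=10):
    # votes[s][q] = answer of source s on query q (no labels needed)
    sources = list(votes)
    queries = list(next(iter(votes.values())))
    rel = {s: 1.0 for s in sources}
    for _ in range(iters):
        consensus = {}
        for q in queries:  # reliability-weighted majority vote
            tally = Counter()
            for s in sources:
                tally[votes[s][q]] += rel[s]
            consensus[q] = tally.most_common(1)[0][0]
        for s in sources:  # agreement with consensus -> new reliability
            rel[s] = sum(votes[s][q] == consensus[q] for q in queries) / len(queries)
    return rel, consensus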
Updated: 2024-10-30 12:09:29
Domains: cs.LG
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation
Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we first identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Second, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.
Updated: 2024-10-30 12:08:43
Domains: cs.LG,cs.AI
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting special system tokens like [/INST] and employing demo-level random search from a collected demo pool. These simple techniques result in surprisingly effective jailbreaking against aligned LLMs (even with advanced defenses). For example, our method achieves >80% (mostly >95%) ASRs on Llama-2-7B and Llama-3-8B without multiple restarts, even if the models are enhanced by strong defenses such as perplexity detection and/or SmoothLLM, which is challenging for suffix-based jailbreaking. In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs. Our code is available at https://github.com/sail-sg/I-FSJ.
Updated: 2024-10-30 12:08:42
Domains: cs.CL,cs.AI,cs.CR,cs.LG
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix. This matrix is decomposed into a product of down-projection and up-projection matrices, with the bottleneck dimensionality being crucial for reducing the number of learnable parameters, as exemplified by prevalent methods like LoRA and Adapter. However, these low-rank strategies typically employ a fixed bottleneck dimensionality, which limits their flexibility in handling layer-wise variations. To address this limitation, we propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix. SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix. We utilize Householder transformations to construct orthogonal matrices that efficiently mimic the unitary matrices, requiring only a vector. The diagonal values are learned in a layer-wise manner, allowing them to flexibly capture the unique properties of each layer. This approach enables the generation of adaptation matrices with varying ranks across different layers, providing greater flexibility in adapting pre-trained models. Experiments on standard downstream vision tasks demonstrate that our method achieves promising fine-tuning performance.
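A compact sketch of the SVD-style update described above (shapes and usage are assumptions): each Householder reflection I - 2*v*v^T/||v||^2 is an orthogonal matrix built from a single learnable vector, standing in for the unitary factors around a layer-wise learnable diagonal.

import torch

def householder(v):
    v = v / v.norm()
    return torch.eye(v.numel()) - 2.0 * torch.outer(v, v)

d = 64
v_left = torch.randn(d, requires_grad=True)   # one vector per orthogonal factor
v_right = torch.randn(d, requires_grad=True)
sigma = torch.randn(d, requires_grad=True)    # layer-wise learnable scaling values

delta_w = householder(v_left) @ torch.diag(sigma) @ householder(v_right)
# adapted layer: y = x @ (W_frozen + delta_w).T, with only 3*d trainable params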
Updated: 2024-10-30 12:08:30
Domains: cs.CV,cs.AI
SpiroActive: Active Learning for Efficient Data Acquisition for Spirometry
Respiratory illnesses are a significant global health burden: chronic obstructive pulmonary disease (COPD) alone is the seventh leading cause of poor health worldwide and the third leading cause of death, responsible for 3.23 million deaths in 2019, making early identification and diagnosis essential for effective mitigation. Among the diagnostic tools employed, spirometry plays a crucial role in detecting respiratory abnormalities. However, conventional clinical spirometry methods often entail considerable costs and practical limitations like the need for specialized equipment, trained personnel, and a dedicated clinical setting, making them less accessible. To address these challenges, wearable spirometry technologies have emerged as promising alternatives, offering accurate, cost-effective, and convenient solutions. The development of machine learning models for wearable spirometry heavily relies on the availability of high-quality ground truth spirometry data, which is a laborious and expensive endeavor. In this research, we propose using active learning, a sub-field of machine learning, to mitigate the challenges associated with data collection and labeling. By strategically selecting samples from the ground truth spirometer, we can mitigate the need for resource-intensive data collection. We present evidence that models trained on small subsets obtained through active learning achieve comparable or better results than models trained on the complete dataset.
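A generic uncertainty-sampling loop of the kind the abstract motivates; the paper's actual model and query strategy are not specified here, and tree-ensemble disagreement is just one common uncertainty proxy:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 10)), rng.normal(size=500)  # stand-in wearable features
labeled = list(range(10))                                # small seed set
pool = [i for i in range(500) if i not in labeled]

for _ in range(5):  # each round buys one expensive spirometry label
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[labeled], y[labeled])
    per_tree = np.stack([t.predict(X[pool]) for t in model.estimators_])
    pick = pool.pop(int(per_tree.std(axis=0).argmax()))  # most disputed sample
    labeled.append(pick)                                 # query ground truth here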
Updated: 2024-10-30 12:07:30
Domains: cs.LG,cs.AI,cs.HC
MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering
Studying protein mutations within amino acid sequences holds tremendous significance in life sciences. Protein language models (PLMs) have demonstrated strong capabilities in broad biological applications. However, due to their architectural design and lack of supervision, PLMs model mutations only implicitly, through evolutionary plausibility, which makes them unsatisfactory as explainable and engineerable tools in real-world studies. To address these issues, we present MutaPLM, a unified framework for interpreting and navigating protein mutations with protein language models. MutaPLM introduces a protein delta network that captures explicit protein mutation representations within a unified feature space, and a transfer learning pipeline with a chain-of-thought (CoT) strategy to harvest protein mutation knowledge from biomedical texts. We also construct MutaDescribe, the first large-scale protein mutation dataset with rich textual annotations, which provides cross-modal supervision signals. Through comprehensive experiments, we demonstrate that MutaPLM excels at providing human-understandable explanations for mutational effects and prioritizing novel mutations with desirable properties. Our code, model, and data are open-sourced at https://github.com/PharMolix/MutaPLM.
Updated: 2024-10-30 12:05:51
Domains: cs.LG,q-bio.BM,68T07
ELBOing Stein: Variational Bayes with Stein Mixture Inference
Stein variational gradient descent (SVGD) [Liu and Wang, 2016] performs approximate Bayesian inference by representing the posterior with a set of particles. However, SVGD suffers from variance collapse, i.e. poor predictions due to underestimating uncertainty [Ba et al., 2021], even for moderately-dimensional models such as small Bayesian neural networks (BNNs). To address this issue, we generalize SVGD by letting each particle parameterize a component distribution in a mixture model. Our method, Stein Mixture Inference (SMI), optimizes a lower bound to the evidence (ELBO) and introduces user-specified guides parameterized by particles. SMI extends the Nonlinear SVGD framework [Wang and Liu, 2019] to the case of variational Bayes. SMI effectively avoids variance collapse, judging by a previously described test developed for this purpose, and performs well on standard data sets. In addition, SMI requires considerably fewer particles than SVGD to accurately estimate uncertainty for small BNNs. The synergistic combination of NSVGD, ELBO optimization and user-specified guides establishes a promising approach towards variational Bayesian inference in the case of tall and wide data.
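For orientation, this is the standard SVGD particle update (Liu and Wang, 2016) that SMI generalizes, with kernel k and target density p; SMI instead lets each particle parameterize a mixture component and optimizes an ELBO:

\[
  x_i \leftarrow x_i + \epsilon\,\hat{\phi}(x_i), \qquad
  \hat{\phi}(x) = \frac{1}{n}\sum_{j=1}^{n}
  \Big[ k(x_j, x)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x) \Big].
\]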
Updated: 2024-10-30 12:05:12
Domains: cs.LG,stat.ML
KALAM: toolKit for Automating high-Level synthesis of Analog computing systeMs
Diverse computing paradigms have emerged to meet the growing needs for intelligent energy-efficient systems. The Margin Propagation (MP) framework, being one such initiative in the analog computing domain, stands out due to its scalability across biasing conditions, temperatures, and diminishing process technology nodes. However, the lack of digital-like automation tools for designing analog systems (including MP-based analog systems) hinders their adoption for designing large systems. The inherent scalability and modularity of MP systems present a unique opportunity in this regard. This paper introduces KALAM (toolKit for Automating high-Level synthesis of Analog computing systeMs), which leverages factor graphs as the foundational paradigm for synthesizing MP-based analog computing systems. Factor graphs are the basis of various signal processing tasks and, when coupled with MP, can be used to design scalable and energy-efficient analog signal processors. Using Python scripting language, the KALAM automation flow translates an input factor graph to its equivalent SPICE-compatible circuit netlist that can be used to validate the intended functionality. KALAM also allows the integration of design optimization strategies such as precision tuning, variable elimination, and mathematical simplification. We demonstrate KALAM's versatility for tasks such as Bayesian inference, Low-Density Parity Check (LDPC) decoding, and Artificial Neural Networks (ANN). Simulation results of the netlists align closely with software implementations, affirming the efficacy of our proposed automation tool.
Updated: 2024-10-30 12:04:22
Domains: eess.SY,cs.AR,cs.ET,cs.LG,cs.SY,eess.SP
From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection
This paper introduces a novel approach that leverages Large Language Models (LLMs) and Generative Agents to enhance time series forecasting by reasoning across both text and time series data. With language as a medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning to evaluate predictions. This enables the model to analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic of news and the robustness of the agent's output. By integrating selected news events with time series data, we fine-tune a pre-trained LLM to predict sequences of digits in time series. The results demonstrate significant improvements in forecasting accuracy, suggesting a potential paradigm shift in time series forecasting through the effective utilization of unstructured news data.
Updated: 2024-10-30 12:04:18
Domains: cs.AI
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Despite the success of Instruction Tuning (IT) in training large language models (LLMs) to perform arbitrary user-specified tasks, these models often still leverage spurious or biased features learned from their training data, leading to undesired behaviours when deploying them in new contexts. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across several experimental settings, we show that focus-tuned models can be adaptively steered by focusing on different features at inference-time: for instance, robustness can be improved by focusing on task-causal features and ignoring spurious features, and social bias can be mitigated by ignoring demographic categories. Furthermore, FIT can steer behaviour in new contexts, generalising under distribution shift and to new unseen features at inference time, and thereby facilitating more robust, fair, and controllable LLM applications in real-world environments.
Updated: 2024-10-30 12:01:48
Domains: cs.LG,cs.AI,cs.CL
Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections
Despite the remarkable capabilities demonstrated by Graph Neural Networks (GNNs) in graph-related tasks, recent research has revealed the fairness vulnerabilities in GNNs when facing malicious adversarial attacks. However, all existing fairness attacks require manipulating the connectivity between existing nodes, which may be prohibited in reality. To this end, we introduce a Node Injection-based Fairness Attack (NIFA), exploring the vulnerabilities of GNN fairness in such a more realistic setting. In detail, NIFA first designs two insightful principles for node injection operations, namely the uncertainty-maximization principle and homophily-increase principle, and then optimizes injected nodes' feature matrix to further ensure the effectiveness of fairness attacks. Comprehensive experiments on three real-world datasets consistently demonstrate that NIFA can significantly undermine the fairness of mainstream GNNs, even including fairness-aware GNNs, by injecting merely 1% of nodes. We sincerely hope that our work can stimulate increasing attention from researchers on the vulnerability of GNN fairness, and encourage the development of corresponding defense mechanisms. Our code and data are released at: https://github.com/CGCL-codes/NIFA.
Updated: 2024-10-30 11:56:49
Domains: cs.LG
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense
Multilingual large language models (LLMs) have gained prominence, but concerns arise regarding their reliability beyond English. This study addresses the gap in cross-lingual semantic evaluation by introducing a novel benchmark for cross-lingual sense disambiguation, StingrayBench. In this paper, we demonstrate using false friends -- words that are orthographically similar but have completely different meanings in two languages -- as a possible approach to pinpoint the limitation of cross-lingual sense disambiguation in LLMs. We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German; and challenge LLMs to distinguish the use of them in context. In our analysis of various models, we observe they tend to be biased toward higher-resource languages. We also propose new metrics for quantifying the cross-lingual sense bias and comprehension based on our benchmark. Our work contributes to developing more diverse and inclusive language modeling, promoting fairer access for the wider multilingual community.
Updated: 2024-10-30 11:56:17
Domains: cs.CL,cs.AI
Certifiably Robust Policies for Uncertain Parametric Environments
We present a data-driven approach for producing policies that are provably robust across unknown stochastic environments. Existing approaches can learn models of a single environment as an interval Markov decision process (IMDP) and produce a robust policy with a probably approximately correct (PAC) guarantee on its performance. However, these are unable to reason about the impact of environmental parameters underlying the uncertainty. We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse IMDPs for a set of unknown sample environments induced by parameters. The key challenge is then to produce meaningful performance guarantees that combine the two layers of uncertainty: (1) multiple environments induced by parameters with an unknown distribution; (2) unknown induced environments which are approximated by IMDPs. We present a novel approach based on scenario optimisation that yields a single PAC guarantee quantifying the risk level for which a specified performance level can be assured in unseen environments, plus a means to trade-off risk and performance. We implement and evaluate our framework using multiple robust policy generation methods on a range of benchmarks. We show that our approach produces tight bounds on a policy's performance with high confidence.
Updated: 2024-10-30 11:55:41
Domains: cs.LG,cs.AI,cs.SY,eess.SY
PAC-Bayes-Chernoff bounds for unbounded losses
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cram\'er-Chernoff bounds to the PAC-Bayesian setting. The proof technique relies on controlling the tails of certain random variables involving the Cram\'er transform of the loss. Our approach naturally leverages properties of Cram\'er-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds. We highlight several applications of the main theorem. Firstly, we show that our bound recovers and generalizes previous results. Additionally, our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new \textit{model-dependent} assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. Notably, many of these bounds can be minimized to obtain distributions beyond the Gibbs posterior and provide novel theoretical coverage to existing regularization techniques.
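For readers who want the standard objects behind the name (textbook definitions, not the paper's new bound): Cramér-Chernoff bounds are built from the cumulant generating function of a loss L and its Legendre (Cramér) transform,

\[
  \Lambda(\lambda) = \log \mathbb{E}\left[e^{\lambda L}\right], \qquad
  \Lambda^{*}(t) = \sup_{\lambda > 0}\big(\lambda t - \Lambda(\lambda)\big),
\]

and, as the abstract notes, the paper's proof technique controls the tails of random variables involving this transform.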
Updated: 2024-10-30 11:49:21
Domains: stat.ML,cs.LG
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
Binarization, which converts weight parameters to binary values, has emerged as an effective strategy to reduce the size of large language models (LLMs). However, typical binarization techniques significantly diminish linguistic effectiveness of LLMs. To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS). Unlike conventional methods, BinaryMoS employs multiple scaling experts for binary weights, dynamically merging these experts for each token to adaptively generate scaling factors. This token-adaptive approach boosts the representational power of binarized LLMs by enabling contextual adjustments to the values of binary weights. Moreover, because this adaptive process only involves the scaling factors rather than the entire weight matrix, BinaryMoS maintains compression efficiency similar to traditional static binarization methods. Our experimental results reveal that BinaryMoS surpasses conventional binarization techniques in various natural language processing tasks and even outperforms 2-bit quantization methods, all while maintaining similar model size to static binarization techniques.
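A sketch of token-adaptive binarization as the abstract describes it; the module shapes and the router are assumptions, and training would additionally need a straight-through estimator for the sign, which is omitted here:

import torch
import torch.nn as nn

class BinaryMoSLinear(nn.Module):
    def __init__(self, d_in, d_out, num_experts=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in))
        self.scales = nn.Parameter(torch.randn(num_experts, d_out))  # scaling experts
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):  # x: (batch, tokens, d_in)
        gate = torch.softmax(self.router(x), dim=-1)  # (B, T, E), per-token mixing
        scale = gate @ self.scales                    # (B, T, d_out) merged scales
        binary = torch.sign(self.weight)              # 1-bit weights
        return (x @ binary.t()) * scale               # token-adaptive rescaling

out = BinaryMoSLinear(64, 128)(torch.randn(2, 16, 64))  # -> (2, 16, 128)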
Updated: 2024-10-30 11:47:40
Domains: cs.LG
DiffLight: A Partial Rewards Conditioned Diffusion Model for Traffic Signal Control with Missing Data
The application of reinforcement learning in traffic signal control (TSC) has been extensively researched and yielded notable achievements. However, most existing works for TSC assume that traffic data from all surrounding intersections is fully and continuously available through sensors. In real-world applications, this assumption often fails due to sensor malfunctions or data loss, making TSC with missing data a critical challenge. To meet the needs of practical applications, we introduce DiffLight, a novel conditional diffusion model for TSC under data-missing scenarios in the offline setting. Specifically, we integrate two essential sub-tasks, i.e., traffic data imputation and decision-making, by leveraging a Partial Rewards Conditioned Diffusion (PRCD) model to prevent missing rewards from interfering with the learning process. Meanwhile, to effectively capture the spatial-temporal dependencies among intersections, we design a Spatial-Temporal transFormer (STFormer) architecture. In addition, we propose a Diffusion Communication Mechanism (DCM) to promote better communication and control performance under data-missing scenarios. Extensive experiments on five datasets with various data-missing scenarios demonstrate that DiffLight is an effective controller to address TSC with missing data. The code of DiffLight is released at https://github.com/lokol5579/DiffLight-release.
Updated: 2024-10-30 11:47:40
领域: eess.SY,cs.AI,cs.LG,cs.SY
Thoughtful Adoption of NLP for Civic Participation: Understanding Differences Among Policymakers
Natural language processing (NLP) tools have the potential to boost civic participation and enhance democratic processes because they can significantly increase governments' capacity to gather and analyze citizen opinions. However, their adoption in government remains limited, and harnessing their benefits while preventing unintended consequences remains a challenge. While prior work has focused on improving NLP performance, this work examines how different internal government stakeholders influence NLP tools' thoughtful adoption. We interviewed seven politicians (politically appointed officials as heads of government institutions) and thirteen public servants (career government employees who design and administrate policy interventions), inquiring how they choose whether and how to use NLP tools to support civic participation processes. The interviews suggest that policymakers across both groups focused on their needs for career advancement and the need to showcase the legitimacy and fairness of their work when considering NLP tool adoption and use. Because these needs vary between politicians and public servants, their preferred NLP features and tool designs also differ. Interestingly, despite their differing needs and opinions, neither group clearly identifies who should advocate for NLP adoption to enhance civic participation or address the unintended consequences of a poorly considered adoption. This lack of clarity in responsibility might have caused the governments' low adoption of NLP tools. We discuss how these findings reveal new insights for future HCI research. They inform the design of NLP tools for increasing civic participation efficiency and capacity, the design of other tools and methods that ensure thoughtful adoption of AI tools in government, and the design of NLP tools for collaborative use among users with different incentives and needs.
Updated: 2024-10-30 11:46:26
Domains: cs.HC,cs.AI
OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have mismatching requirements, such as needing access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching -- often by significant margins -- in 17 out of 21 datasets.
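A Sinkhorn-style sketch of the optimal-transport adjustment (the cost, the entropic regularization strength, and the uniform row marginals are assumptions): redistribute zero-shot prediction mass so the column marginals match the estimated downstream label distribution.

import numpy as np

def ot_adjust(probs, label_dist, eps=0.1, iters=200):
    # probs: (n, c) zero-shot scores; label_dist: (c,) downstream estimate
    K = np.exp(np.log(probs + 1e-9) / eps)  # Gibbs kernel for cost -log p
    u = np.ones(len(probs))
    for _ in range(iters):                  # Sinkhorn scaling iterations
        v = label_dist / (K.T @ u)
        u = (1.0 / len(probs)) / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return plan.argmax(axis=1)              # adjusted hard predictions

preds = ot_adjust(np.random.dirichlet(np.ones(5), size=100), np.full(5, 0.2))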
Updated: 2024-10-30 11:40:40
Domains: cs.LG,cs.AI
An Individual Identity-Driven Framework for Animal Re-Identification
Reliable re-identification of individuals within large wildlife populations is crucial for biological studies, ecological research, and wildlife conservation. Classic computer vision techniques offer a promising direction for Animal Re-identification (Animal ReID), but their backbones' closed-set nature limits their applicability and generalizability. Despite the demonstrated effectiveness of vision-language models like CLIP in re-identifying persons and vehicles, their application to Animal ReID remains limited due to unique challenges, such as the various visual representations of animals, including variations in poses and forms. To address these limitations, we leverage CLIP's cross-modal capabilities to introduce a two-stage framework, the \textbf{Indiv}idual \textbf{A}nimal \textbf{ID}entity-Driven (IndivAID) framework, specifically designed for Animal ReID. In the first stage, IndivAID trains a text description generator by extracting individual semantic information from each image, generating both image-specific and individual-specific textual descriptions that fully capture the diverse visual concepts of each individual across animal images. In the second stage, IndivAID refines its learning of visual concepts by dynamically incorporating individual-specific textual descriptions with an integrated attention module to further highlight discriminative features of individuals for Animal ReID. Evaluation against state-of-the-art methods across eight benchmark datasets and a real-world Stoat dataset demonstrates IndivAID's effectiveness and applicability. Code is available at \url{https://github.com/ywu840/IndivAID}.
Updated: 2024-10-30 11:34:55
Domains: cs.CV,cs.LG,68T45
BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios
NL2SQL (Natural Language to Structured Query Language) transformation has seen wide adoption in Business Intelligence (BI) applications in recent years. However, existing NL2SQL benchmarks are not suitable for production BI scenarios, as they are not designed for common business intelligence questions. To address this gap, we have developed a new benchmark focused on typical NL questions in industrial BI scenarios. We discuss the challenges of constructing a BI-focused benchmark and the shortcomings of existing benchmarks. Additionally, we introduce question categories in our benchmark that reflect common BI inquiries. Lastly, we propose two novel semantic similarity evaluation metrics for assessing NL2SQL capabilities in BI applications and services.
Updated: 2024-10-30 11:33:03
标题: BIS: 面向商业智能场景的NL2SQL服务评估基准
摘要: NL2SQL(自然语言到结构化查询语言)转换近年来在商业智能(BI)应用中得到广泛应用。然而,现有的NL2SQL基准测试不适用于生产BI场景,因为它们并不针对常见的商业智能问题而设计。为了弥补这一差距,我们开发了一个新的基准测试,专注于工业BI场景中的典型NL问题。我们讨论了构建以BI为重点的基准测试所面临的挑战以及现有基准测试的不足之处。此外,我们在我们的基准测试中引入了反映常见BI查询的问题类别。最后,我们提出了两种用于评估NL2SQL在BI应用和服务中能力的新颖语义相似度评估指标。
更新时间: 2024-10-30 11:33:03
领域: cs.AI
Simulation-Free Training of Neural ODEs on Paired Data
In this work, we investigate a method for simulation-free training of Neural Ordinary Differential Equations (NODEs) for learning deterministic mappings between paired data. Despite the analogy of NODEs to continuous-depth residual networks, their application in typical supervised learning tasks has not been popular, mainly due to the large number of function evaluations required by ODE solvers and numerical instability in gradient estimation. To alleviate this problem, we employ the flow matching framework for simulation-free training of NODEs, which directly regresses the parameterized dynamics function to a predefined target velocity field. Contrary to generative tasks, however, we show that applying flow matching directly between paired data can often lead to an ill-defined flow that breaks the coupling of the data pairs (e.g., due to crossing trajectories). We propose a simple extension that applies flow matching in the embedding space of data pairs, where the embeddings are learned jointly with the dynamics function to ensure the validity of the flow, which is also easier to learn. We demonstrate the effectiveness of our method on both regression and classification tasks, where our method outperforms existing NODEs with a significantly lower number of function evaluations. The code is available at https://github.com/seminkim/simulation-free-node.
Updated: 2024-10-30 11:18:27
标题: 在配对数据上无需仿真的神经ODE训练
摘要: 在这项工作中,我们研究了一种无需模拟训练神经普通微分方程(NODEs)的方法,用于学习成对数据之间的确定性映射。尽管NODEs被类比为连续深度残差网络,但它们在典型的监督学习任务中的应用并不流行,主要是由于ODE求解器需要大量的函数评估以及梯度估计中的数值不稳定性。为了缓解这个问题,我们采用了流匹配框架来无需模拟训练NODEs,该框架直接将参数化的动力学函数回归到预定义的目标速度场。然而,与生成任务相反,我们发现直接在成对数据之间应用流匹配通常会导致一个不明确定的流,破坏数据对的耦合(例如,由于交叉轨迹)。我们提出了一个简单的扩展,在数据对的嵌入空间中应用流匹配,其中嵌入与动态函数一起学习,以确保流的有效性,并且更容易学习。我们在回归和分类任务上展示了我们方法的有效性,我们的方法在比现有NODEs更少的函数评估数量下表现更好。代码可在https://github.com/seminkim/simulation-free-node获取。
更新时间: 2024-10-30 11:18:27
领域: cs.LG
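A minimal sketch of the training objective described in the abstract above: flow matching regresses a parameterized dynamics function onto the constant velocity of the straight path between paired points, optionally in a jointly learned embedding space. `vf` and `enc` are assumed callables, and the linear interpolant is one common target choice, not necessarily the paper's exact one.

```python
import torch

def flow_matching_loss(vf, x0, x1, enc=lambda x: x):
    """Simulation-free NODE training: no ODE solver is called; the dynamics
    vf(z_t, t) is regressed onto the target velocity of the interpolant."""
    z0, z1 = enc(x0), enc(x1)                 # optional learned pair embedding
    t = torch.rand(z0.shape[0], *([1] * (z0.dim() - 1)), device=z0.device)
    zt = (1 - t) * z0 + t * z1                # point on the straight path
    target_v = z1 - z0                        # its (constant) velocity
    return ((vf(zt, t) - target_v) ** 2).mean()
```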
Self-optimization in distributed manufacturing systems using Modular State-based Stackelberg Games
In this study, we introduce Modular State-based Stackelberg Games (Mod-SbSG), a novel game structure developed for distributed self-learning in modular manufacturing systems. Mod-SbSG enhances cooperative decision-making among self-learning agents within production systems by integrating State-based Potential Games (SbPG) with Stackelberg games. This hierarchical structure assigns more important modules of the manufacturing system a first-mover advantage, while less important modules respond optimally to the leaders' decisions. This decision-making process differs from typical multi-agent learning algorithms in manufacturing systems, where decisions are made simultaneously. We provide convergence guarantees for the novel game structure and design learning algorithms to account for the hierarchical game structure. We further analyse the effects of single-leader/multiple-follower and multiple-leader/multiple-follower scenarios within a Mod-SbSG. To assess its effectiveness, we implement and test Mod-SbSG in an industrial control setting using two laboratory-scale testbeds featuring sequential and serial-parallel processes. Compared to the vanilla SbPG, the proposed approach delivers promising results, reducing overflow by 97.1% and, in some cases, preventing overflow entirely. Additionally, it decreases power consumption by 5-13% while satisfying the production demand, which significantly improves potential (global objective) values.
Updated: 2024-10-30 11:09:31
标题: 分布式制造系统中使用模块化状态Stackelberg博弈进行自我优化
摘要: 在这项研究中,我们介绍了Modular State-based Stackelberg Games(Mod-SbSG),这是一种为模块化制造系统中的分布式自学习开发的新型游戏结构。Mod-SbSG通过将基于状态的潜在博弈(SbPG)与斯塔克尔贝格博弈相结合,增强了生产系统中自学习代理之间的合作决策能力。这种分层结构为制造系统的重要模块分配了先行者优势,而不太重要的模块则对领导者的决策做出最优响应。这种决策过程不同于制造系统中典型的多智能体学习算法,其中决策是同时进行的。我们为这种新颖的游戏结构提供了收敛保证,并设计了学习算法来考虑分层游戏结构。我们进一步分析了在Mod-SbSG中单领导者/多从属者和多领导者/多从属者情景的影响。为了评估其有效性,我们在工业控制环境中使用两个实验室规模的测试平台实施和测试了Mod-SbSG,这两个平台分别展示了顺序和串行-并行过程。与普通的SbPG相比,所提出的方法取得了令人满意的结果,将溢出减少了97.1%,在某些情况下甚至完全阻止了溢出。此外,它还将能耗降低了5-13%,同时满足生产需求,显著提高了潜在(全局目标)价值。
更新时间: 2024-10-30 11:09:31
领域: cs.AI,cs.GT,cs.LG
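The hierarchical decision order described above can be illustrated with a tiny best-response sketch: the leader anticipates the follower's optimal reaction before committing, unlike simultaneous-move learning. Action sets and utilities are assumed iterables/callables introduced only for illustration.

```python
def stackelberg_step(leader_actions, follower_actions, u_leader, u_follower, state):
    """Leader-first decision making: the leader picks the action whose
    anticipated follower best response maximizes the leader's utility."""
    def best_response(a_l):
        return max(follower_actions, key=lambda a_f: u_follower(state, a_l, a_f))
    a_l = max(leader_actions, key=lambda a: u_leader(state, a, best_response(a)))
    return a_l, best_response(a_l)
```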
CopRA: A Progressive LoRA Training Strategy
Low-Rank Adaptation (LoRA) is a parameter-efficient technique for rapidly fine-tuning foundation models. In standard LoRA training dynamics, models tend to quickly converge to a local optimum near the initialization. However, this local optimum may not be ideal for out-of-distribution data or tasks such as merging and pruning. In this work, we propose a novel progressive training strategy for LoRA with random layer dropping. This strategy also optimizes the Shapley value of LoRA parameters in each layer, treating each layer as a player in a cooperative game. We refer to this method as Cooperative LoRA (CopRA). Our experimental results demonstrate that parameters trained with CopRA exhibit linear mode connectivity, which enables efficient model merging. This also paves the way for federated learning and multi-task learning via LoRA merging. Additionally, by optimizing the Shapley value, CopRA shows superior performance in pruning tasks.
Updated: 2024-10-30 11:07:09
标题: CopRA:一种渐进式LoRA训练策略
摘要: 低秩适应(LoRA)是一种参数高效的技术,用于快速微调基础模型。在标准的LoRA训练动态中,模型往往会迅速收敛到初始化附近的局部最优解。然而,这个局部最优解可能并不适合于超出分布数据或合并和剪枝等任务。在这项工作中,我们提出了一种新颖的逐步训练策略,即带有随机层丢弃的LoRA。这种策略还优化了每层LoRA参数的Shapley值,将每一层视为合作博弈中的玩家。我们将这种方法称为合作LoRA(CopRA)。我们的实验结果表明,通过CopRA训练的参数表现出线性模式连接性,从而实现了高效的模型合并。这也为通过LoRA合并进行联邦学习和多任务学习铺平了道路。此外,通过优化Shapley值,CopRA在剪枝任务中表现出优越性能。
更新时间: 2024-10-30 11:07:09
领域: cs.LG
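A sketch of the random-layer-dropping step described above. Treating each LoRA layer as a player that must perform well inside random coalitions is what links the procedure to Shapley values; the `enabled` switch on each LoRA module and the annealing of the drop probability over training are assumptions about the implementation.

```python
import random

def copra_step(model, lora_layers, batch, loss_fn, optimizer, p_drop):
    """One progressive step: train with a random coalition of LoRA layers."""
    for layer in lora_layers:
        layer.enabled = random.random() > p_drop   # drop layer with prob. p_drop
    loss = loss_fn(model(batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()   # anneal p_drop toward 0 as training progresses
```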
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$, with a small additional term due to heterogeneity, where $|\mathcal{S}|$ is the number of states, $|\mathcal{A}|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$'s communication complexity only marginally increases with the number of agents.
Updated: 2024-10-30 11:05:50
标题: Federated UCBVI:具有异构代理的通信高效联邦遗憾最小化
摘要: 在这篇论文中,我们提出了联邦式上界置信区间值迭代算法($\texttt{Fed-UCBVI}$),这是$\texttt{UCBVI}$算法(Azar等人,2017年)在联邦学习框架下的一种新颖扩展。我们证明了$\texttt{Fed-UCBVI}$的后悔度按$\tilde{\mathcal{O}}(\sqrt{H^3 |\mathcal{S}| |\mathcal{A}| T / M})$的规模增长,另有一个由异质性引起的小附加项,其中$|\mathcal{S}|$是状态的数量,$|\mathcal{A}|$是动作的数量,$H$是每一集的长度,$M$是代理数,$T$是集数。值得注意的是,在单代理设置中,这个上界在多项对数因子范围内与极小极大下界相匹配,而在多代理场景中,$\texttt{Fed-UCBVI}$具有线性加速。为了进行分析,我们引入了一种新的异质性度量,这可能具有独立的理论意义。此外,我们展示了与现有的联邦式强化学习方法不同,$\texttt{Fed-UCBVI}$的通信复杂度仅随代理数量的增加而略微增长。
更新时间: 2024-10-30 11:05:50
领域: cs.LG,stat.ML
The Evolution Of The Digital Inheritance: Legal, Technical, And Practical Dimensions Of Cryptocurrency Transfer Through Succession In French-Inspired Legal Systems
In recent years, cryptocurrencies have enjoyed increased popularity in all domains. In this context, it is important to understand how these digital assets can be transmitted, both legally and efficiently, in the event of the death of their owner. The present paper analyses the mechanisms of cryptocurrencies, addressing from a technical point of view aspects related to blockchain technology, virtual wallets and cryptographic keys, as well as various types of operations involving this type of virtual currency. The study also examines the legal aspects related to cryptocurrencies, with an emphasis on the diversity of their status in different global jurisdictions as well as the impact on inheritance planning. The case studies present tangible examples of successions with cryptocurrencies as their main object, complementing the discussion of the main challenges heirs face in the transfer process. In this way, the paper offers possible solutions and recommendations for inheritance planning with cryptocurrencies as its main object, including the legal and fiscal aspects that must be taken into account when planning a digital succession.
Updated: 2024-10-30 11:05:31
标题: 数字继承的演变:在受法国法启发的法律体系中通过继承转移加密货币的法律、技术与实践维度
摘要: 在近年来,加密货币在各个领域都越来越受到欢迎。因此,在这种情况下,了解这些数字资产在所有者去世时如何合法和高效地传输是很重要的。本文分析了加密货币的机制,从技术角度分析了与区块链技术、虚拟钱包或加密密钥相关的方面,以及关于这种虚拟货币的各种类型操作。研究还检验了与加密货币相关的法律方面,重点放在它们在不同全球司法管辖区的地位多样性以及对继承规划的影响上。案例研究提供了与加密货币为主要对象的继承相关的具体例子,从而补充了与继承人在转移过程中面临的主要挑战相关的论述。通过这种方式,本文提供了与加密货币作为主要对象的继承规划相关的可能解决方案和建议,包括在规划数字继承时必须考虑的法律和税收方面。
更新时间: 2024-10-30 11:05:31
领域: cs.CY,cs.CR
Fisher Flow Matching for Generative Modeling over Discrete Data
Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive, with more recent alternatives based on diffusion or flow-matching falling short of their impressive performance in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the $\textit{Fisher-Rao metric}$. As a result, we demonstrate discrete data itself can be continuously reparameterised to points on the positive orthant of the $d$-hypersphere $\mathbb{S}^d_+$, which allows us to define flows that map any source distribution to target in a principled manner by transporting mass along (closed-form) geodesics of $\mathbb{S}^d_+$. Furthermore, the learned flows in Fisher-Flow can be further bootstrapped by leveraging Riemannian optimal transport leading to improved training dynamics. We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. We evaluate Fisher-Flow on an array of synthetic and diverse real-world benchmarks, including designing DNA Promoter, and DNA Enhancer sequences. Empirically, we find that Fisher-Flow improves over prior diffusion and flow-matching models on these benchmarks.
Updated: 2024-10-30 11:01:10
标题: 费舍尔流匹配用于离散数据生成建模
摘要: 最近,基于离散数据的生成建模已经取得了许多成功案例,应用范围涵盖了语言建模、生物序列设计和图结构分子数据。离散数据的主导生成建模范式仍然是自回归的,而基于扩散或流匹配的更近期的替代方法在连续数据环境中的表现并不如其在图像或视频生成方面的令人印象深刻的性能。在这项工作中,我们介绍了Fisher-Flow,一种用于离散数据的新型流匹配模型。Fisher-Flow以显式的几何视角考虑,将离散数据上的分类分布视为驻留在具有其自然黎曼度量的统计流形上的点:Fisher-Rao度量。因此,我们证明离散数据本身可以连续地重新参数化为正超球体$\mathbb{S}^d_+$上的点,这使我们能够通过沿$\mathbb{S}^d_+$的(封闭形式的)测地线沿线传输质量以一种合理的方式定义将任何源分布映射到目标分布的流。此外,Fisher-Flow中学习到的流可以通过利用黎曼最优输运进一步引导,从而改善训练动态。我们证明由Fisher-Flow引发的梯度流在减少前向KL散度方面是最优的。我们在一系列合成和多样的真实世界基准测试上评估了Fisher-Flow,包括设计DNA启动子和DNA增强子序列。实证结果表明,Fisher-Flow在这些基准测试中优于先前的扩散和流匹配模型。
更新时间: 2024-10-30 11:01:10
领域: cs.LG,cs.AI
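The geometric core of the entry above admits a compact sketch: a categorical distribution maps to the positive orthant of the hypersphere via the square-root reparameterization, and Fisher-Rao geodesics become great-circle arcs with a closed form (spherical linear interpolation). The training target velocity would be the time derivative of this path; only the reparameterization and geodesic are shown here.

```python
import torch

def to_sphere(p):
    """Square-root map: categorical p (last dim sums to 1) -> unit vector
    on the positive orthant of the hypersphere."""
    return torch.sqrt(p)

def geodesic(x0, x1, t):
    """Closed-form great-circle geodesic (slerp) between unit vectors."""
    cos = (x0 * x1).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.arccos(cos)
    return (torch.sin((1 - t) * theta) * x0 + torch.sin(t * theta) * x1) / torch.sin(theta)
```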
YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems
Accurate vehicle detection is essential for the development of intelligent transportation systems, autonomous driving, and traffic monitoring. This paper presents a detailed analysis of YOLO11, the latest advancement in the YOLO series of deep learning models, focusing exclusively on vehicle detection tasks. Building upon the success of its predecessors, YOLO11 introduces architectural improvements designed to enhance detection speed, accuracy, and robustness in complex environments. Using a comprehensive dataset comprising multiple vehicle types (cars, trucks, buses, motorcycles, and bicycles), we evaluate YOLO11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP). Our findings demonstrate that YOLO11 surpasses previous versions (YOLOv8 and YOLOv10) in detecting smaller and more occluded vehicles while maintaining a competitive inference time, making it well-suited for real-time applications. Comparative analysis shows significant improvements in the detection of complex vehicle geometries, further contributing to the development of efficient and scalable vehicle detection systems. This research highlights YOLO11's potential to enhance autonomous vehicle performance and traffic monitoring systems, offering insights for future developments in the field.
Updated: 2024-10-30 10:57:46
标题: YOLOv11用于车辆检测:智能交通系统中的进展、性能和应用
摘要: 准确的车辆检测对于智能交通系统、自动驾驶和交通监控的发展至关重要。本文详细分析了YOLO系列深度学习模型中最新的进展YOLO11,专注于车辆检测任务。借鉴其前辈的成功,YOLO11引入了旨在增强检测速度、准确性和鲁棒性的架构改进,适用于复杂环境。使用包含多种车辆类型-汽车、卡车、公共汽车、摩托车和自行车的综合数据集,我们评估了YOLO11的性能,使用精度、召回率、F1分数和平均精度(mAP)等指标。我们的研究结果表明,YOLO11在检测较小和更遮挡的车辆方面超越了之前的版本(YOLOv8和YOLOv10),同时保持了竞争力的推断时间,使其非常适用于实时应用。比较分析显示在复杂车辆几何形状检测方面取得了显著改进,进一步促进了高效和可扩展的车辆检测系统的发展。这项研究突出了YOLO11在增强自动驾驶车辆性能和交通监控系统方面的潜力,为未来领域的发展提供了见解。
更新时间: 2024-10-30 10:57:46
领域: cs.CV,cs.AI
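For orientation, vehicle detection with a YOLO11 checkpoint might look as follows; this assumes the public Ultralytics Python API and the "yolo11n.pt" weight name, neither of which comes from the paper itself.

```python
from ultralytics import YOLO  # assumed public API, not from the paper

model = YOLO("yolo11n.pt")  # pretrained YOLO11 nano weights (assumed name)
# Restrict to COCO vehicle classes: bicycle=1, car=2, motorcycle=3, bus=5, truck=7
results = model.predict("traffic.jpg", classes=[1, 2, 3, 5, 7], conf=0.25)
for r in results:
    for box in r.boxes:
        print(int(box.cls), float(box.conf), box.xyxy.tolist())
```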
Enhancing Neural Network Representations with Prior Knowledge-Based Normalization
Deep learning models face persistent challenges in training, particularly due to internal covariate shift and label shift. While single-mode normalization methods like Batch Normalization partially address these issues, they are constrained by batch size dependencies and limiting distributional assumptions. Multi-mode normalization techniques mitigate these limitations but struggle with computational demands when handling diverse Gaussian distributions. In this paper, we introduce a new approach to multi-mode normalization that leverages prior knowledge to improve neural network representations. Our method organizes data into predefined structures, or "contexts", prior to training and normalizes based on these contexts, with two variants: Context Normalization (CN) and Context Normalization - Extended (CN-X). When contexts are unavailable, we introduce Adaptive Context Normalization (ACN), which dynamically builds contexts in the latent space during training. Across tasks in image classification, domain adaptation, and image generation, our methods demonstrate superior convergence and performance.
Updated: 2024-10-30 10:55:01
标题: 利用基于先验知识的归一化增强神经网络表示
摘要: 深度学习模型在训练过程中面临持续挑战,特别是由于内部协变量漂移和标签漂移。虽然像Batch Normalization这样的单模态归一化方法部分地解决了这些问题,但它们受到批量大小依赖性和限制分布假设的约束。多模态归一化技术缓解了这些限制,但在处理不同的高斯分布时会遇到计算需求的困难。在本文中,我们介绍了一种新的多模态归一化方法,利用先验知识来改进神经网络表示。我们的方法在训练之前将数据组织成预定义的结构或“上下文”,并基于这些上下文进行归一化,有两种变体:上下文归一化(CN)和上下文归一化-扩展(CN-X)。当上下文不可用时,我们引入了自适应上下文归一化(ACN),在训练过程中动态在潜在空间中构建上下文。在图像分类、领域适应和图像生成任务中,我们的方法展示了更优越的收敛性和性能。
更新时间: 2024-10-30 10:55:01
领域: cs.LG,cs.AI,cs.NE
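A minimal sketch of the context-based normalization idea above: samples are normalized with the statistics of their prior-knowledge context rather than of the whole batch. The learned per-context affine parameters and the adaptive latent-space variant (ACN) are omitted.

```python
import torch

def context_normalize(x, context_ids, num_contexts, eps=1e-5):
    """Normalize each sample using the mean/std of its context group."""
    out = torch.empty_like(x)
    for c in range(num_contexts):
        mask = context_ids == c
        if mask.any():
            xc = x[mask]
            mu = xc.mean(dim=0, keepdim=True)
            std = xc.std(dim=0, unbiased=False, keepdim=True)
            out[mask] = (xc - mu) / (std + eps)
    return out
```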
Sui Lutris: A Blockchain Combining Broadcast and Consensus
Sui Lutris is the first smart-contract platform to sustainably achieve sub-second finality. It achieves this significant decrease by employing consensusless agreement not only for simple payments but for a large variety of transactions. Unlike prior work, Sui Lutris neither compromises expressiveness nor throughput and can run perpetually without restarts. Sui Lutris achieves this by safely integrating consensusless agreement with a high-throughput consensus protocol that is invoked out of the critical finality path but ensures that when a transaction is at risk of inconsistent concurrent accesses, its settlement is delayed until the total ordering is resolved. Building such a hybrid architecture is especially delicate during reconfiguration events, where the system needs to preserve the safety of the consensusless path without compromising the long-term liveness of potentially misconfigured clients. We thus develop a novel reconfiguration protocol, the first to provably show the safe and efficient reconfiguration of a consensusless blockchain. Sui Lutris is currently running in production and underpins the Sui smart-contract platform. Combined with the use of Objects instead of accounts, it enables the safe execution of smart contracts that expose objects as a first-class resource. In our experiments Sui Lutris achieves latency lower than 0.5 seconds for throughput up to 5,000 certificates per second (150k ops/s with transaction blocks), compared to the state-of-the-art real-world consensus latencies of 3 seconds. Furthermore, it gracefully handles validator crash-recovery and does not suffer visible performance degradation during reconfiguration.
Updated: 2024-10-30 10:49:30
标题: Sui Lutris:结合广播和共识的区块链
摘要: Sui Lutris是第一个可以持续实现次秒终局性的智能合约平台。它通过不仅对简单支付、而且对多种交易采用无共识协议,实现了这一显著降低。与先前的工作不同,Sui Lutris既不会损害表达能力,也不会降低吞吐量,并且可以在不重启的情况下持续运行。Sui Lutris通过安全地将无共识协议与高吞吐量共识协议集成在一起来实现这一点,后者在关键的终局路径之外被调用,但确保当交易存在不一致并发访问的风险时,其结算会延迟到总排序被解决为止。构建这样的混合架构在重新配置事件期间尤为微妙,系统需要保持无共识路径的安全,同时又不损害潜在配置错误的客户端的长期活性。因此,我们开发了一种新颖的重新配置协议,首次可证明地展示了无共识区块链的安全且高效的重新配置。Sui Lutris目前正在生产中运行,并支撑着Sui智能合约平台。结合使用对象(Object)而非账户,它实现了将对象作为一等资源暴露的智能合约的安全执行。在我们的实验中,与现实世界中3秒的最新共识延迟相比,Sui Lutris实现了低于0.5秒的延迟,吞吐量高达每秒5,000个证书(使用事务块时为150k ops/s)。此外,它可以优雅地处理验证者的崩溃恢复,并且在重新配置期间不会出现明显的性能下降。
更新时间: 2024-10-30 10:49:30
领域: cs.DC,cs.CR
VPO: Leveraging the Number of Votes in Preference Optimization
Direct Preference Optimization (DPO) trains a language model using human preference data, bypassing the explicit reward modeling phase of Reinforcement Learning from Human Feedback (RLHF). By iterating over sentence pairs in a preference dataset, DPO enhances generation quality by increasing the likelihood of producing preferred sentences over less favored ones. Preference datasets are typically created by selecting preferred sentences through a voting process involving multiple individuals, as opinions can vary due to the subjective nature of human preferences. While the number of votes offers insight into whether a sentence pair is clearly preferable or controversial, current methods do not fully leverage this information. In this paper, we introduce a technique that leverages user voting data to better align with diverse subjective preferences. We employ the Bayesian Minimum Mean Square Error (Bayesian MMSE) estimator to model the probability that one generation is preferable to another. Using this estimated probability as a target, we develop the Vote-based Preference Optimization (VPO) framework, which incorporates the number of votes on both sides to distinguish between controversial and obvious generation pairs. We show that previous algorithms, such as DPO and Identity Preference Optimization (IPO), can be extended using the proposed framework, termed VDPO and VIPO. Our experiments demonstrate that these proposed algorithms outperform various existing methods, including their base algorithms.
Updated: 2024-10-30 10:39:34
标题: VPO: 利用偏好优化中的选票数量
摘要: 直接偏好优化(DPO)使用人类偏好数据训练语言模型,绕过了强化学习从人类反馈中学习(RLHF)的明确奖励建模阶段。通过在偏好数据集中迭代句子对,DPO通过增加生成优选句子的可能性而提高生成质量。偏好数据集通常是通过选择首选句子的投票过程创建的,涉及多个个体,因为由于人类偏好的主观性质,意见可能会有所不同。虽然投票次数可以揭示一对句子是否明显更受偏好还是具有争议性,但当前方法并未充分利用这些信息。在本文中,我们引入了一种利用用户投票数据更好地与多样化主观偏好对齐的技术。我们使用贝叶斯最小均方误差(Bayesian MMSE)估计器来建模一个生成优于另一个的概率。利用这个估计概率作为目标,我们开发了基于投票的偏好优化(VPO)框架,该框架整合了双方的投票数,以区分具有争议性和明显的生成对。我们展示了先前的算法,如DPO和身份偏好优化(IPO),可以使用提出的框架进行扩展,称为VDPO和VIPO。我们的实验表明,这些提出的算法优于各种现有方法,包括它们的基本算法。
更新时间: 2024-10-30 10:39:34
领域: cs.LG,cs.AI,cs.CL
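The vote-aware objective above can be sketched as a soft-target variant of DPO: instead of pushing the implicit reward margin toward "always prefer the winner", it is regressed onto a per-pair probability estimated from vote counts. The Laplace-style posterior mean below stands in for the paper's Bayesian MMSE estimator and is an assumption; `votes_w` and `votes_l` are float tensors of vote counts.

```python
import torch
import torch.nn.functional as F

def vdpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, votes_w, votes_l, beta=0.1):
    """DPO-style loss with a vote-derived soft preference target."""
    p_target = (votes_w + 1.0) / (votes_w + votes_l + 2.0)  # posterior mean, uniform prior
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return F.binary_cross_entropy(torch.sigmoid(margin), p_target)
```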
Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector
Visual Language Models (VLMs) are vulnerable to adversarial attacks, especially those from adversarial images, which is, however, under-explored in the literature. To facilitate research on this critical safety problem, we first construct a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR), given that existing datasets are either small-scale or only contain limited types of harmful responses. With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector, distilled from the hidden states of VLMs and called the attacking direction, to detect adversarial images against benign ones in the input. Extensive experiments with two victim VLMs, LLaVA and MiniGPT-4, demonstrate the effectiveness, efficiency, and cross-model transferability of our proposed method. Our code is available at https://github.com/mob-scu/RADAR-NEARSIDE
Updated: 2024-10-30 10:33:10
标题: 通过单一向量实现视觉-语言模型的有效和高效的对抗性检测
摘要: 视觉语言模型(VLMs)容易受到对抗攻击,特别是来自对抗性图像的攻击,然而这方面的研究在文献中尚未得到充分探讨。为了促进对这一关键安全问题的研究,我们首先构建了一个新的大规模具有多样有害响应的对抗性图像数据集(RADAR),考虑到现有数据集要么规模较小,要么只包含有限类型的有害响应。借助新的RADAR数据集,我们进一步开发了一种新颖有效的基于嵌入式对抗图像检测(NEARSIDE)方法,该方法利用从VLMs的隐藏状态中提取的单个向量,即攻击方向,实现了对输入中对抗性图像与良性图像的检测。通过对两个受害VLMs,LLaVA和MiniGPT-4进行广泛实验,充分展示了我们提出的方法的有效性、效率和跨模型的可转移性。我们的代码可在https://github.com/mob-scu/RADAR-NEARSIDE 上找到。
更新时间: 2024-10-30 10:33:10
领域: cs.CV,cs.CL,cs.CR
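Detection with a single vector, as the abstract describes, can be sketched as follows; reading "distilled" as a mean hidden-state difference is an assumption, as is the thresholded projection.

```python
import torch

def attacking_direction(h_adv, h_benign):
    """One plausible single-vector detector: the mean difference between
    hidden states of adversarial and benign inputs."""
    return h_adv.mean(0) - h_benign.mean(0)

def is_adversarial(h, direction, threshold=0.0):
    """Flag inputs whose hidden state projects beyond a tuned threshold."""
    return (h @ direction) > threshold
```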
Generalization Bounds via Conditional $f$-Information
In this work, we introduce novel information-theoretic generalization bounds using the conditional $f$-information framework, an extension of the traditional conditional mutual information (MI) framework. We provide a generic approach to derive generalization bounds via $f$-information in the supersample setting, applicable to both bounded and unbounded loss functions. Unlike previous MI-based bounds, our proof strategy does not rely on upper bounding the cumulant-generating function (CGF) in the variational formula of MI. Instead, we set the CGF or its upper bound to zero by carefully selecting the measurable function invoked in the variational formula. Although some of our techniques are partially inspired by recent advances in the coin-betting framework (e.g., Jang et al. (2023)), our results are independent of any previous findings from regret guarantees of online gambling algorithms. Additionally, our newly derived MI-based bound recovers many previous results and improves our understanding of their potential limitations. Finally, we empirically compare various $f$-information measures for generalization, demonstrating the improvement of our new bounds over the previous bounds.
Updated: 2024-10-30 10:33:07
标题: 通过条件 $f$-信息的泛化界限
摘要: 在这项工作中,我们引入了使用条件$f$-信息框架的新颖信息论泛化界限,这是传统条件互信息(MI)框架的扩展。我们提供了一个通用方法,通过超样本设置中的$f$-信息来推导泛化界限,适用于有界和无界损失函数。与先前基于MI的界限不同,我们的证明策略并不依赖于在MI的变分公式中上界估计生成函数(CGF)。相反,我们通过精心选择在变分公式中调用的可测函数,将CGF或其上界设为零。尽管我们的一些技术部分受到最近硬币投注框架的进展启发(例如,Jang等人(2023)),但我们的结果与在线赌博算法的遗憾保证没有任何前期发现有关。此外,我们新推导的基于MI的界限恢复了许多先前的结果,并提高了我们对其潜在局限性的理解。最后,我们通过实证比较各种$f$-信息度量来进行泛化,展示了我们的新界限相对于先前界限的改进。
更新时间: 2024-10-30 10:33:07
领域: stat.ML,cs.IT,cs.LG,math.IT
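For orientation, the standard definitions behind the entry above (the paper's exact conventions may differ): for a convex $f$ with $f(1)=0$,

$$D_f(P\,\|\,Q)=\mathbb{E}_{Q}\!\left[f\!\left(\tfrac{dP}{dQ}\right)\right],\qquad I_f(X;Y)=D_f\!\left(P_{X,Y}\,\big\|\,P_X\otimes P_Y\right),$$

and the conditional version used in the supersample setting averages over the conditioning variable $Z$:

$$I_f(X;Y\mid Z)=\mathbb{E}_{Z}\!\left[D_f\!\left(P_{X,Y\mid Z}\,\big\|\,P_{X\mid Z}\otimes P_{Y\mid Z}\right)\right].$$

Choosing $f(t)=t\log t$ recovers (conditional) mutual information.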
Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge. However, it has not led to considerable improvements over non-curriculum models. We assess whether theoretical linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies, creating age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually. Comparing the success of three objective curricula (Growing, Inwards and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture, we find that fine-grained acquisition-inspired curricula can outperform non-curriculum baselines, and that the performance benefits of curriculum strategies in SSLMs can be obtained by specifying fine-grained, language-specific curricula that precisely replicate language acquisition theories.
Updated: 2024-10-30 10:31:54
标题: Less is More: 使用认知合理的课程学习策略预训练跨语言小规模语言模型
摘要: 课程学习一直是一种流行的策略,用来提高小规模语言模型(SSLMs)在BabyLM挑战中的认知可信度。然而,它并没有带来比非课程模型更显著的改进。我们评估了理论语言习得理论是否可以用来指定更精细的课程学习策略,创建四个语言家族的婴儿导向语料库的按年龄排序,以实现跨语言的SSLMs和受习得启发的课程。通过比较三种客观课程(Growing、Inwards和MMM),这些课程精确地复制了在标准SSLM架构上对习得理论的预测,我们发现细粒度的习得启发课程可以胜过非课程基线,并且在SSLMs中课程策略的绩效优势可以通过指定精细的语言特定课程来获得,这些课程精确地复制了语言习得理论。
更新时间: 2024-10-30 10:31:54
领域: cs.CL,cs.AI
L3Cube-IndicQuest: A Benchmark Question Answering Dataset for Evaluating Knowledge of LLMs in Indic Context
Large Language Models (LLMs) have made significant progress in incorporating Indic languages within multilingual models. However, it is crucial to quantitatively assess whether these languages perform comparably to globally dominant ones, such as English. Currently, there is a lack of benchmark datasets specifically designed to evaluate the regional knowledge of LLMs in various Indic languages. In this paper, we present the L3Cube-IndicQuest, a gold-standard factual question-answering benchmark dataset designed to evaluate how well multilingual LLMs capture regional knowledge across various Indic languages. The dataset contains 200 question-answer pairs, each for English and 19 Indic languages, covering five domains specific to the Indic region. We aim for this dataset to serve as a benchmark, providing ground truth for evaluating the performance of LLMs in understanding and representing knowledge relevant to the Indian context. The IndicQuest can be used for both reference-based evaluation and LLM-as-a-judge evaluation. The dataset is shared publicly at https://github.com/l3cube-pune/indic-nlp .
Updated: 2024-10-30 10:30:57
标题: L3Cube-IndicQuest:一个用于评估Indic语境中LLM知识的基准问答数据集
摘要: 大型语言模型(LLMs)在将印度语言纳入多语言模型中取得了显著进展。然而,关键在于定量评估这些语言是否与全球主导语言(如英语)表现相当。目前,缺乏专门设计用于评估LLMs在各种印度语言中的区域知识的基准数据集。在本文中,我们提出了L3Cube-IndicQuest,这是一个设计用于评估多语言LLMs在各种印度语言中捕捉区域知识能力的金标准事实问答基准数据集。该数据集包含200个问题-答案对,每个对应英语和19种印度语言,涵盖印度地区特定的五个领域。我们希望该数据集可以作为一个基准,为评估LLMs在理解和代表与印度背景相关的知识表现提供基准。IndicQuest可用于基于参考的评估和LLM作为评判者的评估。该数据集已公开共享在https://github.com/l3cube-pune/indic-nlp。
更新时间: 2024-10-30 10:30:57
领域: cs.CL,cs.LG
Enhancing Preference-based Linear Bandits via Human Response Time
Interactive preference learning systems present humans with queries as pairs of options; humans then select their preferred choice, allowing the system to infer preferences from these binary choices. While binary choice feedback is simple and widely used, it offers limited information about preference strength. To address this, we leverage human response times, which inversely correlate with preference strength, as complementary information. We introduce a computationally efficient method based on the EZ-diffusion model, combining choices and response times to estimate the underlying human utility function. Theoretical and empirical comparisons with traditional choice-only estimators show that for queries where humans have strong preferences (i.e., "easy" queries), response times provide valuable complementary information and enhance utility estimates. We integrate this estimator into preference-based linear bandits for fixed-budget best-arm identification. Simulations on three real-world datasets demonstrate that incorporating response times significantly accelerates preference learning.
Updated: 2024-10-30 10:28:36
标题: 通过人类响应时间增强基于偏好的线性赌博机算法
摘要: 交互式偏好学习系统向人类提供选项对作为查询;人类然后选择他们更喜欢的选择,使系统能够从这些二元选择中推断偏好。虽然二元选择反馈简单且广泛使用,但它对偏好强度提供的信息有限。为了解决这个问题,我们利用人类反应时间作为补充信息,因为它与偏好强度呈负相关。我们引入了一种基于EZ扩散模型的计算高效方法,结合选择和反应时间来估计潜在的人类效用函数。理论和实证比较表明,在人类有强烈偏好的查询(即“容易”查询)中,反应时间提供了有价值的补充信息,并增强了效用估计。我们将这种估计器整合到基于偏好的线性赌博机中,用于固定预算的最佳臂识别。对三个真实数据集的模拟表明,整合反应时间显著加速了偏好学习。
更新时间: 2024-10-30 10:28:36
领域: cs.LG,cs.AI,cs.HC,econ.EM,stat.ML
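The drift rate that the EZ-diffusion model recovers from accuracy and response-time variance has a closed form; the sketch below reproduces the standard equation attributed to Wagenmakers et al. (2007) from memory and should be checked against the original before use. A larger |drift| indicates a stronger preference, which is the extra signal combined with binary choices.

```python
import math

def ez_drift(p_correct, var_rt, s=0.1):
    """Closed-form EZ-diffusion drift rate from choice accuracy and the
    variance of response times (s is the conventional scaling parameter)."""
    p = min(max(p_correct, 1e-4), 1 - 1e-4)   # keep the logit finite
    L = math.log(p / (1 - p))                 # logit of accuracy
    x = L * (p * p * L - p * L + p - 0.5) / var_rt
    return math.copysign(s * abs(x) ** 0.25, p - 0.5)
```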
Graph Neural Flows for Unveiling Systemic Interactions Among Irregularly Sampled Time Series
Interacting systems are prevalent in nature. It is challenging to accurately predict the dynamics of the system if its constituent components are analyzed independently. We develop a graph-based model that unveils the systemic interactions of time series observed at irregular time points, by using a directed acyclic graph to model the conditional dependencies (a form of causal notation) of the system components and learning this graph in tandem with a continuous-time model that parameterizes the solution curves of ordinary differential equations (ODEs). Our technique, a graph neural flow, leads to substantial enhancements over non-graph-based methods, as well as graph-based methods without the modeling of conditional dependencies. We validate our approach on several tasks, including time series classification and forecasting, to demonstrate its efficacy.
Updated: 2024-10-30 10:25:43
标题: 图神经流用于揭示不规则采样时间序列之间的系统性相互作用
摘要: 相互作用系统在自然界中很常见。如果将系统的组成部分独立分析,要准确预测系统的动态是具有挑战性的。我们发展了一种基于图的模型,通过使用有向无环图来建模系统组件的条件依赖关系(一种因果符号形式),揭示在不规则时间点观察到的时间序列的系统相互作用。我们同时学习这个图和一个参数化常微分方程解曲线的连续时间模型,这个技术被称为图神经流(graph neural flow)。与非基于图的方法以及没有建模条件依赖关系的基于图的方法相比,我们的技术带来了显著的改进。我们在多个任务上验证了我们的方法,包括时间序列分类和预测,以展示其有效性。
更新时间: 2024-10-30 10:25:43
领域: cs.LG,cs.CL
Stealing User Prompts from Mixture of Experts
Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice-Routing to fully disclose a victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that we can extract the entire prompt using $O({VM}^2)$ queries (with vocabulary size $V$ and prompt length $M$) or 100 queries on average per token in the setting we consider. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
Updated: 2024-10-30 10:25:35
标题: 从专家混合中窃取用户提示
摘要: Mixture-of-Experts(MoE)模型通过将每个令牌路由到每层的少数专家来提高密集语言模型的效率和可扩展性。在本文中,我们展示了一个对手可以安排他们的查询出现在与受害者的查询相同的示例批次中,从而利用专家选择路由完全披露受害者提示的方法。我们成功地在一个两层Mixtral模型上展示了这种攻击的有效性,利用了torch.topk CUDA实现的绑定处理行为。我们的结果表明,在我们考虑的设置中,我们可以使用$O({VM}^2)$个查询(其中$V$是词汇量,$M$是提示长度)或平均每个令牌100个查询来提取整个提示。这是第一个利用架构缺陷来提取用户提示的攻击,引入了一类新的LLM漏洞。
更新时间: 2024-10-30 10:25:35
领域: cs.CR,cs.AI,cs.CL,cs.LG
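The routing dependence the attack exploits is easy to see in miniature: under expert-choice routing, each expert picks its top-k tokens from the whole batch, so whether a victim token keeps its slot depends on the co-batched tokens, and on exact score ties it depends on torch.topk's implementation-specific tie-breaking.

```python
import torch

def expert_choice_route(affinity, capacity):
    """affinity: (num_experts, num_tokens) router scores over the batch.
    Each expert keeps its `capacity` highest-scoring tokens."""
    _, chosen = torch.topk(affinity, k=capacity, dim=1)
    return chosen

scores = torch.tensor([[1.0, 1.0, 1.0, 0.5]])   # three tokens tie for one expert
print(expert_choice_route(scores, capacity=2))  # which tied token is dropped is
                                                # tie-breaking (device) dependent
```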
Adaptive Paradigm Synergy: Can a Cross-Paradigm Objective Enhance Long-Tailed Learning?
Self-supervised learning (SSL) has achieved impressive results across several computer vision tasks, even rivaling supervised methods. However, its performance degrades on real-world datasets with long-tailed distributions due to difficulties in capturing inherent class imbalances. Although supervised long-tailed learning offers significant insights, the absence of labels in SSL prevents direct transfer of these strategies. To bridge this gap, we introduce Adaptive Paradigm Synergy (APS), a cross-paradigm objective that seeks to unify the strengths of both paradigms. Our approach reexamines contrastive learning from a spatial structure perspective, dynamically adjusting the uniformity of latent space structure through adaptive temperature tuning. Furthermore, we draw on a re-weighting strategy from supervised learning to compensate for the shortcomings of temperature adjustment in explicit quantity perception. Extensive experiments on commonly used long-tailed datasets demonstrate that APS improves performance effectively and efficiently. Our findings reveal the potential for deeper integration between supervised and self-supervised learning, paving the way for robust models that handle real-world class imbalance.
Updated: 2024-10-30 10:25:22
标题: 自适应范式协同:跨范式目标是否可以增强长尾学习?
摘要: 自监督学习(SSL)在几个计算机视觉任务中取得了令人印象深刻的成果,甚至与监督方法媲美。然而,在长尾分布的真实世界数据集上,其性能会下降,原因是难以捕捉固有的类别不平衡。尽管监督长尾学习提供了重要见解,但SSL中缺乏标签阻碍了这些策略的直接传递。为了弥合这一差距,我们引入了自适应范式协同(APS),这是一个跨范式目标,旨在统一两种范式的优势。我们的方法从空间结构的角度重新审视对比学习,通过自适应温度调节动态调整潜在空间结构的均匀性。此外,我们借鉴了监督学习中的重新加权策略,以弥补显式数量感知中温度调整的缺点。对常用的长尾数据集进行的大量实验表明,APS有效且高效地提高了性能。我们的研究结果揭示了监督学习和自监督学习之间更深层次整合的潜力,为处理真实世界类别不平衡的强大模型铺平了道路。
更新时间: 2024-10-30 10:25:22
领域: cs.CV,cs.AI
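One way to read "adaptive temperature tuning" above is a per-sample temperature inside an InfoNCE-style loss, which modulates how strongly each sample is spread over the latent sphere; the tuning rule itself is the paper's contribution and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_adaptive_tau(z, pos_idx, tau):
    """InfoNCE with a per-sample temperature tau (shape [N])."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau.unsqueeze(1)       # row-wise temperature scaling
    sim.fill_diagonal_(float('-inf'))        # exclude self-similarity
    return F.cross_entropy(sim, pos_idx)     # positives as class targets
```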
SFA-UNet: More Attention to Multi-Scale Contrast and Contextual Information in Infrared Small Object Segmentation
Computer vision researchers have extensively worked on fundamental infrared visual recognition for the past few decades. Among various approaches, deep learning has emerged as the most promising candidate. However, Infrared Small Object Segmentation (ISOS) remains a major focus due to several challenges, including: 1) the lack of effective utilization of local contrast and global contextual information; 2) the potential loss of small objects in deep models; and 3) the struggle to capture fine-grained details while ignoring noise. To address these challenges, we propose a modified U-Net architecture, named SFA-UNet, that combines Scharr Convolution (SC) and Fast Fourier Convolution (FFC), together with vertical and horizontal Attention Gates (AG), into UNet. SFA-UNet utilizes double convolution layers with the addition of SC and FFC in its encoder and decoder layers. SC helps to learn the foreground-to-background contrast information whereas FFC provides multi-scale contextual information while mitigating the small-object vanishing problem. Additionally, the introduction of vertical AGs in encoder layers enhances the model's focus on the targeted object by ignoring irrelevant regions. We evaluated the proposed approach on the publicly available SIRST and IRSTD datasets and achieved superior performance, outperforming existing state-of-the-art methods by an average of 0.75% (variance 0.025) across all combined metrics over multiple runs.
Updated: 2024-10-30 10:21:23
标题: SFA-UNet:红外小目标分割中更多关注多尺度对比和上下文信息
摘要: 计算机视觉研究人员在过去几十年里广泛致力于基本红外视觉识别。在各种方法中,深度学习已被认为是最有前景的候选者。然而,红外小物体分割(ISOS)仍然是一个主要关注点,因为存在一些挑战,包括:1)未能有效利用局部对比和全局上下文信息;2)在深度模型中可能会丢失小物体;3)难以捕捉细粒度细节并忽略噪音。为了解决这些挑战,我们提出了一种改进的U-Net架构,名为SFA-UNet,通过将Scharr卷积(SC)和快速傅里叶卷积(FFC)与UNet中的垂直和水平注意门(AG)相结合。SFA-UNet在其编码器和解码器层中使用双卷积层,并添加了SC和FFC。SC有助于学习前景到背景的对比信息,而FFC提供多尺度的上下文信息,同时减轻小物体消失的问题。此外,在编码器层引入垂直AG增强了模型对目标对象的关注,忽略了不相关区域。我们在公开可用的SIRST和IRSTD数据集上评估了提出的方法,并与现有最先进方法相比,在多次运行中,所有综合指标的平均值提高了0.75%,方差为0.025,表现出卓越的性能。
更新时间: 2024-10-30 10:21:23
领域: cs.CV,cs.AI
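The Scharr Convolution component has a simple fixed-kernel core, sketched below; how SFA-UNet combines it with learnable convolutions is the paper's design and is not shown.

```python
import torch
import torch.nn.functional as F

def scharr_edges(x):
    """Apply fixed horizontal/vertical Scharr kernels to a (N,1,H,W) map,
    emphasizing local foreground-to-background contrast."""
    gx = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]])
    k = torch.stack([gx, gx.t().contiguous()]).unsqueeze(1)  # (2,1,3,3)
    return F.conv2d(x, k.to(x.dtype), padding=1)             # -> (N,2,H,W)
```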
Block Sparse Bayesian Learning: A Diversified Scheme
This paper introduces a novel prior, the Diversified Block Sparse Prior, to characterize the widespread block sparsity phenomenon in real-world data. By allowing diversification on intra-block variance and inter-block correlation matrices, we effectively address the sensitivity issue of existing block sparse learning methods to pre-defined block information, which enables adaptive block estimation while mitigating the risk of overfitting. Based on this, a diversified block sparse Bayesian learning method (DivSBL) is proposed, utilizing the EM algorithm and a dual ascent method for hyperparameter estimation. Moreover, we establish the global and local optimality theory of our model. Experiments validate the advantages of DivSBL over existing algorithms.
Updated: 2024-10-30 10:13:21
标题: 稀疏区块贝叶斯学习:一种多样化方案
摘要: 本文介绍了一种称为Diversified Block Sparse Prior的新型先验,用于表征现实世界数据中普遍存在的块稀疏现象。通过允许对块内方差和块间相关性矩阵进行多样化处理,我们有效地解决了现有块稀疏学习方法对预定义块信息的敏感性问题,从而实现了自适应块估计并减轻了过拟合的风险。基于此,提出了一种多样化块稀疏贝叶斯学习方法(DivSBL),利用EM算法和双升算法进行超参数估计。此外,我们建立了模型的全局和局部最优性理论。实验证实了DivSBL相对于现有算法的优势。
更新时间: 2024-10-30 10:13:21
领域: cs.LG,math.OC
VISAGE: Video Synthesis using Action Graphs for Surgery
Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.
Updated: 2024-10-30 10:13:18
标题: 视觉感知:利用手术动作图像合成视频
摘要: 外科数据科学(SDS)是一种分析手术前、手术中和手术后患者数据以改善手术结果和技能的领域。然而,手术数据稀缺、异质且复杂,这限制了现有机器学习方法的适用性。在这项工作中,我们引入了腹腔镜手术中未来视频生成的新任务。这一任务可以增强和丰富现有的手术数据,并实现各种应用,如模拟、分析和机器辅助手术。最终,这不仅涉及理解手术过程的当前状态,还准确预测手术程序的动态和常常不可预测的特性。我们提出的方法VISAGE(使用动作图进行手术视频合成)利用动作场景图的力量捕捉腹腔镜手术程序的顺序性质,并利用扩散模型合成时间上连贯的视频序列。VISAGE仅凭借单个初始帧和动作图三元组就可以预测未来帧。通过动作图融入领域特定知识,VISAGE确保生成的视频符合真实腹腔镜手术中观察到的期望视觉和运动模式。我们的实验结果展示了腹腔镜手术程序的高保真视频生成,从而实现了SDS中的各种应用。
更新时间: 2024-10-30 10:13:18
领域: cs.CV,cs.AI,cs.LG
Stealth edits to large language models
We reveal the theoretical foundations of techniques for editing large language models, and present new methods which can do so without requiring retraining. Our theoretical insights show that a single metric (a measure of the intrinsic dimension of the model's features) can be used to assess a model's editability and reveals its previously unrecognised susceptibility to malicious stealth attacks. This metric is fundamental to predicting the success of a variety of editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these as stealth editing methods, because they directly update a model's weights to specify its response to specific known hallucinating prompts without affecting other model behaviour. By carefully applying our theoretical insights, we are able to introduce a new jet-pack network block which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. We also reveal the vulnerability of language models to stealth attacks: a small change to a model's weights which fixes its response to a single attacker-chosen prompt. Stealth attacks are computationally simple, do not require access to or knowledge of the model's training data, and therefore represent a potent yet previously unrecognised threat to redistributed foundation models. Extensive experimental results illustrate and support our methods and their theoretical underpinnings. Demos and source code are available at https://github.com/qinghua-zhou/stealth-edits.
Updated: 2024-10-30 10:12:24
标题: 大型语言模型的隐形编辑
摘要: 我们揭示了编辑大型语言模型的技术的理论基础,并提出了新的方法,可以在无需重新训练的情况下进行编辑。我们的理论洞见显示,可以使用单一度量(模型特征的内在维度的度量)来评估模型的可编辑性,并揭示其先前未被认识到的对恶意隐身攻击的敏感性。这个度量对于预测各种编辑方法的成功至关重要,并揭示了不同家族的编辑方法之间的新桥梁。我们统称这些方法为隐身编辑方法,因为它们直接更新模型的权重,以指定其对特定已知幻觉提示的响应,而不影响其他模型行为。通过精心应用我们的理论洞见,我们能够引入一个新的喷气背包网络块,该块经过优化,可以用于高度选择性的模型编辑,仅使用标准网络操作,并可插入到现有网络中。我们还揭示了语言模型对隐身攻击的脆弱性:对模型权重进行微小更改,以修正其对单个攻击者选择的提示的响应。隐身攻击计算简单,不需要访问或了解模型的训练数据,因此代表了一种强大但以前未被认识到的对分布式基础模型的威胁。大量实验结果说明和支持我们的方法及其理论基础。演示和源代码可在https://github.com/qinghua-zhou/stealth-edits 上找到。
更新时间: 2024-10-30 10:12:24
领域: cs.AI,cs.LG,68T07, 68T50, 68W40,I.2.7; F.2.0
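To make "directly update a model's weights for one known prompt" concrete, here is a generic rank-one editing primitive; it is illustrative only and is not the paper's jet-pack block or its selectivity mechanism.

```python
import torch

def rank_one_edit(W, h_trigger, v_target):
    """After the update, an input with feature h_trigger maps to v_target,
    while directions orthogonal to h_trigger are left untouched."""
    h_unit = h_trigger / h_trigger.norm()
    residual = v_target - W @ h_trigger
    return W + residual.unsqueeze(1) @ h_unit.unsqueeze(0) / h_trigger.norm()
```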
Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations
Retrieval-augmented generation (RAG) has emerged as a critical mechanism in contemporary NLP to support Large Language Models (LLMs) in systematically accessing richer factual context. However, the integration of RAG mechanisms brings its inherent challenges, as LLMs need to deal with potentially noisy contexts. Recent studies have shown that LLMs still struggle to critically analyse RAG-based in-context information, a limitation that may lead to incorrect inferences and hallucinations. In this paper, we investigate how to elicit critical reasoning in RAG via contrastive explanations. In particular, we propose Contrastive-RAG (C-RAG), a framework that (i) retrieves relevant documents given a query, (ii) selects and exemplifies relevant passages, and (iii) generates explanations that explicitly contrast the relevance of the passages to (iv) support the final answer. We show the impact of C-RAG by building contrastive reasoning demonstrations from LLMs to instruct smaller models for retrieval-augmented tasks. Extensive experiments demonstrate that C-RAG improves state-of-the-art RAG models while (a) requiring significantly fewer prompts and demonstrations and (b) being robust to perturbations in the retrieved documents.
Updated: 2024-10-30 10:11:53
标题: 通过对比解释引发检索增强语言模型中的关键推理
摘要: 检索增强生成(RAG)已成为当代自然语言处理中的关键机制,以支持大型语言模型(LLMs)系统地访问更丰富的事实背景。然而,RAG机制的整合带来了固有的挑战,因为LLMs需要处理潜在的嘈杂上下文。最近的研究表明,LLMs仍然难以批判性地分析基于RAG的上下文信息,这一限制可能导致不正确的推断和幻觉。在本文中,我们研究了如何通过对比解释来引发RAG中的批判性推理。特别是,我们提出了对比-RAG(C-RAG)框架,该框架(i)在给定查询时检索相关文档,(ii)选择并举例相关段落,(iii)生成明确对比相关性的解释,以支持最终答案。我们展示了C-RAG从LLMs构建对比推理演示对于指导较小模型进行检索增强任务的影响。广泛的实验表明,C-RAG提高了最先进的RAG模型,同时(a)需要显著较少的提示和演示,(b)对检索文档中的扰动具有稳健性。
更新时间: 2024-10-30 10:11:53
领域: cs.CL,cs.AI
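The four-step pipeline above can be sketched as chained LLM calls; `retriever` and `llm` are assumed callables and the prompt wording is illustrative, not the paper's.

```python
def contrastive_rag(query, retriever, llm):
    """C-RAG sketch: retrieve, select, explain by contrast, then answer."""
    docs = retriever(query)                                              # (i)
    passages = llm(f"Select the passages relevant to: {query}\n{docs}")  # (ii)
    explanation = llm(
        "Contrast why each passage does or does not support an answer.\n"
        f"Question: {query}\nPassages: {passages}")                      # (iii)
    return llm(f"Question: {query}\nReasoning: {explanation}\nAnswer:")  # (iv)
```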
Zero-Shot Reinforcement Learning from Low Quality Data
Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase. Methods leveraging successor measures and successor features have shown strong performance in this setting, but require access to large heterogenous datasets for pre-training which cannot be expected for most real problems. Here, we explore how the performance of zero-shot RL methods degrades when trained on small homogeneous datasets, and propose fixes inspired by conservatism, a well-established feature of performant single-task offline RL algorithms. We evaluate our proposals across various datasets, domains and tasks, and show that conservative zero-shot RL algorithms outperform their non-conservative counterparts on low quality datasets, and perform no worse on high quality datasets. Somewhat surprisingly, our proposals also outperform baselines that get to see the task during training. Our code is available via https://enjeeneer.io/projects/zero-shot-rl/ .
Updated: 2024-10-30 10:11:03
标题: 零样本强化学习从低质量数据中
摘要: 零样本强化学习(RL)承诺在离线、无奖励的预训练阶段后,提供可以在环境中执行任何任务的代理。利用继任者度量和继任者特征的方法在这种情况下表现出色,但需要访问大型异构数据集进行预训练,这对大多数真实问题来说是不可预期的。在这里,我们探讨了零样本RL方法在训练在小型同质数据集时性能如何下降,并提出了受保守主义启发的修复方法,这是性能良好的单任务离线RL算法的一个基本特征。我们评估了我们的提议在各种数据集、领域和任务上的表现,并展示了保守的零样本RL算法在低质量数据集上优于非保守的对应算法,并且在高质量数据集上表现不差。令人惊讶的是,我们的提议还优于在训练过程中看到任务的基线。我们的代码可以通过https://enjeeneer.io/projects/zero-shot-rl/ 获取。
更新时间: 2024-10-30 10:11:03
领域: cs.LG,cs.AI
Data subsampling for Poisson regression with pth-root-link
We develop and analyze data subsampling techniques for Poisson regression, the standard model for count data $y\in\mathbb{N}$. In particular, we consider the Poisson generalized linear model with ID- and square root-link functions. We consider the method of coresets, which are small weighted subsets that approximate the loss function of Poisson regression up to a factor of $1\pm\varepsilon$. We show $\Omega(n)$ lower bounds against coresets for Poisson regression that continue to hold against arbitrary data reduction techniques up to logarithmic factors. By introducing a novel complexity parameter and a domain shifting approach, we show that sublinear coresets with $1\pm\varepsilon$ approximation guarantee exist when the complexity parameter is small. In particular, the dependence on the number of input points can be reduced to polylogarithmic. We show that the dependence on other input parameters can also be bounded sublinearly, though not always logarithmically. In particular, we show that the square root-link admits an $O(\log(y_{\max}))$ dependence, where $y_{\max}$ denotes the largest count presented in the data, while the ID-link requires a $\Theta(\sqrt{y_{\max}/\log(y_{\max})})$ dependence. As an auxiliary result for proving the tightness of the bound with respect to $y_{\max}$ in the case of the ID-link, we show an improved bound on the principal branch of the Lambert $W_0$ function, which may be of independent interest. We further show the limitations of our analysis when $p$th degree root-link functions for $p\geq 3$ are considered, which indicate that other analytical or computational methods would be required if such a generalization is even possible.
Updated: 2024-10-30 10:09:05
标题: 使用pth-root-link的泊松回归数据子采样
摘要: 我们开发并分析了用于泊松回归的数据子采样技术,这是计数数据$y\in\mathbb{N}$的标准模型。具体而言,我们考虑具有ID和平方根连接函数的泊松广义线性模型。我们考虑核心集方法,这些核心集是小型加权子集,可以近似泊松回归的损失函数,误差因子为$1\pm\varepsilon$。我们展示了针对泊松回归的核心集的$\Omega(n)$下界,这些下界仍然适用于对数因子的任意数据减少技术。通过引入一种新颖的复杂性参数和域转移方法,我们展示了当复杂性参数较小时,具有$1\pm\varepsilon$近似保证的次线性核心集是存在的。特别是,对输入点数的依赖性可以降低到多对数级别。我们展示了对其他输入参数的依赖性也可以以次线性的方式限制,尽管不总是对数级别。特别是,我们展示了平方根连接具有$O(\log(y_{\max}))$依赖性,其中$y_{\max}$表示数据中出现的最大计数,而ID连接则需要$\Theta(\sqrt{y_{\max}/\log(y_{\max})})$的依赖性。作为证明ID连接情况下与$y_{\max}$相关界的紧密性的辅助结果,我们展示了Lambert $W_0$函数的主支的改进界,这可能是独立感兴趣的。我们进一步展示了当考虑$p\geq 3$的$p$次根连接函数时,我们分析的局限性,这表明如果这样的泛化甚至可能存在,那么可能需要其他分析或计算方法。
更新时间: 2024-10-30 10:09:05
领域: cs.LG,cs.DS,stat.ML
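For orientation, one standard way to write the model consistent with the abstract's links (the paper's exact parameterization may differ): with the $p$th-root link the mean is $\mu_i=(x_i^\top\beta)^p$, so $p=1$ is the ID-link and $p=2$ the square-root link, and the loss a coreset must approximate is, dropping the data-only $\log(y_i!)$ terms,

$$\mathcal{L}(\beta)=\sum_{i=1}^{n}\left[(x_i^\top\beta)^p - p\,y_i\log(x_i^\top\beta)\right],$$

a coreset being a small weighted subset whose weighted loss stays within a factor $1\pm\varepsilon$ of $\mathcal{L}(\beta)$ for all feasible $\beta$.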
Conditioned quantum-assisted deep generative surrogate for particle-calorimeter interactions
Particle collisions at accelerators such as the Large Hadron Collider, recorded and analyzed by experiments such as ATLAS and CMS, enable exquisite measurements of the Standard Model and searches for new phenomena. Simulations of collision events at these detectors have played a pivotal role in shaping the design of future experiments and analyzing ongoing ones. However, the quest for accuracy in Large Hadron Collider (LHC) collisions comes at an imposing computational cost, with projections estimating the need for millions of CPU-years annually during the High Luminosity LHC (HL-LHC) run \cite{collaboration2022atlas}. Simulating a single LHC event with \textsc{Geant4} currently devours around 1000 CPU seconds, with simulations of the calorimeter subdetectors in particular imposing substantial computational demands \cite{rousseau2023experimental}. To address this challenge, we propose a conditioned quantum-assisted deep generative model. Our model integrates a conditioned variational autoencoder (VAE) on the exterior with a conditioned Restricted Boltzmann Machine (RBM) in the latent space, providing enhanced expressiveness compared to conventional VAEs. The RBM nodes and connections are meticulously engineered to enable the use of qubits and couplers on D-Wave's Pegasus-structured \textit{Advantage} quantum annealer (QA) for sampling. We introduce a novel method for conditioning the quantum-assisted RBM using \textit{flux biases}. We further propose a novel adaptive mapping to estimate the effective inverse temperature in quantum annealers. The effectiveness of our framework is illustrated using Dataset 2 of the CaloChallenge \cite{calochallenge}.
Updated: 2024-10-30 10:08:03
标题: 经过训练的量子辅助深度生成替代模型用于粒子-量热器相互作用
摘要: 在粒子加速器中发生的粒子碰撞,如大型强子对撞机(Large Hadron Collider,LHC)中记录并由ATLAS和CMS等实验进行分析,使得标准模型的精确测量和新现象的探索成为可能。在这些探测器中模拟碰撞事件在塑造未来实验设计和分析正在进行的实验方面发挥了关键作用。然而,在LHC对撞中追求精确度的过程中需要巨大的计算成本,根据预测,在高亮度LHC(HL-LHC)运行期间每年可能需要数百万CPU年的计算资源。目前,使用Geant4模拟单个LHC事件需要约1000 CPU秒,特别是模拟量热计数器子探测器需要大量的计算资源。为了解决这一挑战,我们提出了一种条件量子辅助深度生成模型。我们的模型在外部集成了一个条件变分自编码器(VAE),并在潜在空间中使用一个条件受限玻尔兹曼机(RBM),相比传统的VAE具有更强的表达能力。RBM的节点和连接被精心设计,以实现在D-Wave的Pegasus结构Advantage量子退火器(QA)上使用比特和耦合器进行采样。我们引入了一种新颖的方法,通过flux偏置来对量子辅助的RBM进行调节。我们进一步提出了一种新颖的自适应映射方法来估计量子退火器中的有效反温度。我们的框架的有效性通过CaloChallenge的数据集2进行了展示。
更新时间: 2024-10-30 10:08:03
领域: cs.LG,cs.AI,hep-ph,physics.comp-ph,physics.ins-det
Towards Population Scale Testis Volume Segmentation in DIXON MRI
Testis size is known to be one of the main predictors of male fertility, usually assessed in clinical workup via palpation or imaging. Despite its potential, population-level evaluation of testicular volume using imaging remains underexplored. Previous studies, limited by small and biased datasets, have demonstrated the feasibility of machine learning for testis volume segmentation. This paper presents an evaluation of segmentation methods for testicular volume using Magnetic Resonance Imaging data from the UK Biobank. The best model achieves a median Dice score of $0.87$, compared to a median Dice score of $0.83$ for human interrater reliability on the same dataset, enabling large-scale annotation on a population scale for the first time. Our overall aim is to provide a trained model, comparative baseline methods, and annotated training data to enhance accessibility and reproducibility in testis MRI segmentation research.
Updated: 2024-10-30 10:03:55
标题: 朝向人口规模的DIXON MRI睾丸体积分割
摘要: 睾丸大小被认为是男性生育能力的主要预测因素之一,通常通过触诊或成像在临床工作中进行评估。尽管具有潜力,但利用成像对睾丸容积进行人群水平评估仍未得到充分探索。先前的研究受限于小而有偏见的数据集,已经证明了机器学习在睾丸容积分割方面的可行性。本文提出了使用英国生物样本库(UK Biobank)的磁共振成像数据对睾丸容积分割方法的评估。最佳模型实现了中值Dice分数为0.87,而人类评分者在同一数据集上的中值Dice分数为0.83,首次实现了在人口规模上的大规模注释。我们的总体目标是提供经过训练的模型、比较基准方法和已标注的训练数据,以增强睾丸MRI分割研究的可访问性和可重复性。
更新时间: 2024-10-30 10:03:55
领域: eess.IV,cs.CV,cs.LG
Investigations into Proof Structures
We introduce and elaborate a novel formalism for the manipulation and analysis of proofs as objects in a global manner. In this first approach the formalism is restricted to first-order problems characterized by condensed detachment. It is applied in an exemplary manner to a coherent and comprehensive formal reconstruction and analysis of historical proofs of a widely-studied problem due to {\L}ukasiewicz. The underlying approach opens the door towards new systematic ways of generating lemmas in the course of proof search, with the effect of reducing the search effort and finding shorter proofs. Among the numerous reported experiments along this line, a proof of {\L}ukasiewicz's problem was automatically discovered that is much shorter than any proof found before by man or machine.
Updated: 2024-10-30 09:55:46
标题: 对证明结构的研究
摘要: 我们引入并详细阐述了一种新颖的形式主义,用于全局方式处理和分析证明作为对象。在这种第一种方法中,形式主义被限制在由紧凝脱离特征化的一阶问题上。它以示范性方式应用于对{\L}ukasiewicz的一个广泛研究的问题的历史证明的一致和全面的形式重建和分析。这种基本方法打开了新的系统生成引理的方式,以减少搜索工作并找到更短的证明。在沿着这条线报告的众多实验中,一个{\L}ukasiewicz问题的证明被自动发现,比以往任何人或机器发现的证明都要短。
更新时间: 2024-10-30 09:55:46
领域: cs.LO,cs.AI
AtGCN: A Graph Convolutional Network For Ataxic Gait Detection
Video-based gait analysis can be defined as the task of diagnosing pathologies, such as ataxia, using videos of patients walking in front of a camera. This paper presents a graph convolution network called AtGCN for detecting ataxic gait and identifying its severity using 2D videos. The problem is especially challenging as the deviation of an ataxic gait from a healthy gait is very subtle. The datasets for ataxic gait detection are also quite small, with the largest dataset having only 149 videos. The paper addresses the first problem using special spatiotemporal graph convolution that successfully captures important gait-related features. To handle the small dataset size, a deep spatiotemporal graph convolution network pre-trained on an action recognition dataset is systematically truncated and then fine-tuned on the ataxia dataset to obtain the AtGCN model. The paper also presents an augmentation strategy that segments a video sequence into multiple gait cycles. The proposed AtGCN model then operates on a graph of body part locations belonging to a single gait cycle. The evaluation results support the strength of the proposed AtGCN model, as it outperforms the state-of-the-art in detection and severity prediction with an accuracy of 93.46% and a MAE of 0.4169, respectively.
Updated: 2024-10-30 09:55:30
标题: AtGCN:一种用于共济失调步态检测的图卷积网络
摘要: 基于视频的步态分析可以定义为使用病人在摄像机前行走的视频来诊断病理,如共济失调。本文介绍了一种名为AtGCN的图卷积网络,用于使用2D视频检测共济失调步态并确定其严重程度。这个问题尤其具有挑战性,因为共济失调步态与健康步态的偏差非常微妙。共济失调步态检测的数据集也非常小,最大的数据集只有149个视频。本文通过成功捕捉重要的步态相关特征,使用特殊的时空图卷积来解决第一个问题。为了处理小的数据集大小,一个在动作识别数据集上预训练的深度时空图卷积网络被系统地截断,然后在共济失调数据集上进行微调,得到AtGCN模型。该论文还提出了一种增强策略,将视频序列分割成多个步态周期。提出的AtGCN模型然后在属于单个步态周期的身体部位位置图上运行。评估结果支持提出的AtGCN模型的强大性能,因为它在检测和严重程度预测中的准确率分别为93.46%和0.4169的MAE,优于现有技术。
更新时间: 2024-10-30 09:55:30
领域: cs.CV,cs.LG
On the Worst Prompt Performance of Large Language Models
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited. These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.
Updated: 2024-10-30 09:48:52
标题: 大型语言模型最糟糕提示表现研究
摘要: 大型语言模型(LLMs)的性能对提示的措辞非常敏感,这引发了对它们在现实场景中可靠性的重大担忧。现有研究通常将提示分为任务级别指令和案例级别输入,并主要关注评估和改进对任务级别指令变化的鲁棒性。然而,这种设置未能充分解决真实用户查询的多样性,并假设存在任务特定的数据集。为了解决这些限制,我们引入了RobustAlpacaEval,这是一个由语义等效的案例级别查询组成的新基准,强调使用最差提示性能来衡量模型性能的下限的重要性。在RobustAlpacaEval上进行了大量实验,使用ChatGPT和来自Llama、Mistral和Gemma系列的六个开源LLMs揭示了模型性能的显著变异性;例如,Llama-2-70B-chat模型的最差和最佳性能之间的差异达到了45.48%,其最差性能下降至9.38%。我们进一步说明了从模型无关和模型相关的视角识别最差提示的困难,强调了缺乏表征最差提示的捷径。我们还尝试使用现有的提示工程和提示一致性方法来提高最差提示的性能,但发现它们的影响有限。这些发现强调了需要创建更具弹性的LLMs,以在各种提示下保持高性能。数据和代码可在https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs 上获得。
更新时间: 2024-10-30 09:48:52
领域: cs.CL,cs.AI
Exploiting Phonological Similarities between African Languages to achieve Speech to Speech Translation
This paper presents a pilot study on direct speech-to-speech translation (S2ST) by leveraging linguistic similarities among selected African languages within the same phylum, particularly in cases where traditional data annotation is expensive or impractical. We propose a segment-based model that maps speech segments both within and across language phyla, effectively eliminating the need for large paired datasets. By utilizing paired segments and guided diffusion, our model enables translation between any two languages in the dataset. We evaluate the model on a proprietary dataset from the Kenya Broadcasting Corporation (KBC), which includes five languages: Swahili, Luo, Kikuyu, Nandi, and English. The model demonstrates competitive performance in segment pairing and translation quality, particularly for languages within the same phylum. Our experiments reveal that segment length significantly influences translation accuracy, with average-length segments yielding the highest pairing quality. Comparative analyses with traditional cascaded ASR-MT techniques show that the proposed model delivers nearly comparable translation performance. This study underscores the potential of exploiting linguistic similarities within language groups to perform efficient S2ST, especially in low-resource language contexts.
Updated: 2024-10-30 09:44:52
标题: 利用非洲语言之间的音韵相似性实现语音到语音翻译
摘要: 本文介绍了一项关于直接语音到语音翻译(S2ST)的试点研究,通过利用同一门类中所选非洲语言之间的语言相似性,特别是在传统数据标注昂贵或不切实际的情况下。我们提出了一个基于片段的模型,可以在语言门类内外映射语音片段,有效消除了大型配对数据集的需求。通过利用配对片段和引导扩散,我们的模型可以在数据集中的任意两种语言之间进行翻译。我们在肯尼亚广播公司(KBC)的专有数据集上对模型进行评估,该数据集包括斯瓦希里语、罗语、吉库尤语、南迪语和英语。该模型在片段配对和翻译质量方面表现出竞争力,特别是对于同一门类内的语言。我们的实验表明,片段长度显著影响翻译准确性,平均长度的片段产生最高的配对质量。与传统级联ASR-MT技术的比较分析表明,所提出的模型提供了几乎可以媲美的翻译性能。这项研究强调了利用语言群体内的语言相似性进行高效S2ST的潜力,特别是在资源匮乏的语言环境中。
更新时间: 2024-10-30 09:44:52
领域: eess.AS,cs.AI,cs.CL
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications ranging from computer vision to graph learning. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning motivates the use of learning-based approaches to estimate SPD matrices with neural networks in a data-driven fashion. However, designing effective neural architectures for SPD learning is challenging, particularly when the task requires additional structural constraints, such as element-wise sparsity. Current approaches either do not ensure that the output meets all desired properties or lack expressivity. In this paper, we introduce SpodNet, a novel and generic learning module that guarantees SPD outputs and supports additional structural constraints. Notably, it solves the challenging task of learning jointly SPD and sparse matrices. Our experiments illustrate the versatility and relevance of SpodNet layers for such applications.
Updated: 2024-10-30 09:42:36
标题: 舒尔正定网络:在带有结构的SPD锥体中进行深度学习
摘要: 在对称正定(SPD)锥体中估计矩阵对于许多应用都很有意义,从计算机视觉到图学习等应用广泛。虽然存在各种基于凸优化的估计器,但由于其基于模型的方法,它们在表达能力上仍然存在限制。深度学习的成功激发了使用基于学习的方法以数据驱动的方式估计SPD矩阵。然而,为SPD学习设计有效的神经架构具有挑战性,特别是当任务需要额外的结构约束,如逐元素稀疏性时。当前的方法要么不能确保输出满足所有期望的属性,要么缺乏表达能力。在本文中,我们介绍了SpodNet,一个全新的通用学习模块,它保证SPD输出并支持额外的结构约束。值得注意的是,它解决了联合学习SPD和稀疏矩阵的具有挑战性的任务。我们的实验展示了SpodNet层对这类应用的多功能性和相关性。
更新时间: 2024-10-30 09:42:36
领域: cs.LG,stat.ML
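For readers unfamiliar with SPD-constrained outputs, the following minimal PyTorch sketch illustrates the generic factorization trick (output L L^T + eps*I from an unconstrained network head). It is not the SpodNet module itself, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class SPDHead(nn.Module):
    """Map a feature vector to a guaranteed-SPD matrix via a
    lower-triangular factor L: output = L L^T + eps * I."""

    def __init__(self, in_dim: int, mat_dim: int, eps: float = 1e-4):
        super().__init__()
        self.mat_dim, self.eps = mat_dim, eps
        self.fc = nn.Linear(in_dim, mat_dim * mat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        L = self.fc(x).view(-1, self.mat_dim, self.mat_dim)
        L = torch.tril(L)  # keep only the lower triangle
        eye = torch.eye(self.mat_dim, device=x.device)
        # L L^T is positive semi-definite; eps * I makes it strictly SPD.
        return L @ L.transpose(-1, -2) + self.eps * eye

head = SPDHead(in_dim=16, mat_dim=5)
S = head(torch.randn(8, 16))               # (8, 5, 5), each slice SPD
print(torch.linalg.eigvalsh(S[0]).min())   # strictly positive
```

Note that this simple trick does not by itself enforce element-wise sparsity, which is the additional structural constraint SpodNet targets.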
AMARO: All Heavy-Atom Transferable Neural Network Potentials of Protein Thermodynamics
All-atom molecular simulations offer detailed insights into macromolecular phenomena, but their substantial computational cost hinders the exploration of complex biological processes. We introduce Advanced Machine-learning Atomic Representation Omni-force-field (AMARO), a new neural network potential (NNP) that combines an O(3)-equivariant message-passing neural network architecture, TensorNet, with a coarse-graining map that excludes hydrogen atoms. AMARO demonstrates the feasibility of training coarser NNP, without prior energy terms, to run stable protein dynamics with scalability and generalization capabilities.
Updated: 2024-10-30 09:42:12
标题: AMARO: 蛋白质热力学的全重原子可转移神经网络势能
摘要: 全原子分子模拟提供了对大分子现象的详细洞察,但它们巨大的计算成本阻碍了对复杂生物过程的探索。我们引入了Advanced Machine-learning Atomic Representation Omni-force-field (AMARO),这是一个新的神经网络势能(NNP),它将O(3)等变消息传递神经网络架构TensorNet与一个排除氢原子的粗粒化映射结合起来。AMARO展示了训练更粗糙的NNP的可行性,而不需要先验能量项,以稳定地运行蛋白质动力学,并具有可扩展性和泛化能力。
更新时间: 2024-10-30 09:42:12
领域: q-bio.BM,cs.LG,physics.bio-ph,physics.comp-ph
Hyperparameter Optimization in Machine Learning
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence, and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often unsatisfactory and becomes unfeasible when the number of hyperparameters is large. Automating the search is an important step towards automating machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples and insights into the state-of-the-art. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, bandit-, model- and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields such as meta-learning and neural architecture search, and conclude with open questions and future research directions.
Updated: 2024-10-30 09:39:22
标题: 机器学习中的超参数优化
摘要: 超参数是控制机器学习算法行为的配置变量。它们在机器学习和人工智能中随处可见,其值的选择决定了基于这些技术的系统的有效性。手动超参数搜索通常不尽如人意,在超参数数量较大时变得不可行。自动化搜索是实现自动化机器学习的重要一步,使研究人员和实践者都摆脱了通过试错找到一组好的超参数的负担。在这项调查中,我们提供了超参数优化的统一处理,为读者提供了现有技术的示例和见解。我们涵盖了自动化超参数搜索的主要技术家族,通常被称为超参数优化或调整,包括随机和准随机搜索,赌徒、模型和基于梯度的方法。我们进一步讨论了扩展,包括在线、约束和多目标制定,触及与其他领域如元学习和神经结构搜索的联系,并最后提出了一些未解之谜和未来研究方向。
更新时间: 2024-10-30 09:39:22
领域: stat.ML,cs.LG
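As a concrete reference point for the simplest family covered by the survey, here is a minimal random-search sketch (assuming scikit-learn is available; the search space and trial budget are hand-written and illustrative):

```python
import random
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
# Each hyperparameter is drawn independently from its own distribution.
space = {
    "n_estimators": lambda: random.randint(50, 400),
    "max_depth": lambda: random.choice([None, 4, 8, 16]),
    "max_features": lambda: random.uniform(0.1, 1.0),
}

best_score, best_cfg = -1.0, None
for _ in range(20):  # trial budget
    cfg = {k: draw() for k, draw in space.items()}
    model = RandomForestClassifier(random_state=0, **cfg)
    score = cross_val_score(model, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_cfg = score, cfg
print(best_score, best_cfg)
```

Model-based (Bayesian) and bandit-based methods replace the independent draws above with a surrogate model or an early-stopping schedule over trials.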
Dataset of polarimetric images of mechanically generated water surface waves coupled with surface elevation records by wave gauges linear array
Effective spatio-temporal measurements of water surface elevation (water waves) in laboratory experiments are essential for scientific and engineering research. Existing techniques are often cumbersome, computationally heavy and generally suffer from limited wavenumber/frequency response. To address these challenges, a novel method was developed, using a polarization-filter-equipped camera as the main sensor and Machine Learning (ML) algorithms for data processing [1,2]. Training and evaluation of the developed method were based on an in-house supervised dataset. Here we present this supervised dataset of polarimetric images of the water surface coupled with water surface elevation measurements made by a linear array of resistance-type wave gauges (WG). The water waves were mechanically generated in a laboratory wave basin, and the polarimetric images were captured under an artificial light source. Meticulous camera and WG calibration and instrument synchronization supported high spatio-temporal resolution. The dataset covers several wavefield conditions, from simple monochromatic wave trains of various steepness to an irregular wavefield with a JONSWAP-prescribed spectral shape and several wave-breaking scenarios. The dataset contains measurements repeated at several camera positions relative to the wavefield propagation direction.
Updated: 2024-10-30 09:35:27
标题: 机械产生的水面波极化图像数据集与波浪计测线性阵列的水面高程记录相结合
摘要: 在实验室实验中,有效地测量水面高程(水波)的时空特性对科学研究和工程研究至关重要。现有的技术通常繁琐、计算量大,并且通常受到波数/频率响应有限的限制。为了解决这些挑战,开发了一种新方法,利用装备偏振滤光片的相机作为主要传感器,并使用机器学习(ML)算法进行数据处理。开发的方法训练和评估基于内部制作的监督数据集。在这里,我们介绍了这个监督数据集,其中包括水面的偏振图像和由一排电阻波浪计(WG)测量的水面高程。水波是在实验室波浪池中机械产生的,偏振图像是在人工光源下拍摄的。精心的相机和WG的校准和仪器同步支持高时空分辨率。数据集涵盖了几种波场条件,从各种陡峭的简单单色波列,到JONSWAP规定的光谱形状不规则波场和几种波浪破碎情景。数据集包含在几个相机位置相对于波场传播方向重复的测量。
更新时间: 2024-10-30 09:35:27
领域: physics.ao-ph,cs.LG
The Adoption and Efficacy of Large Language Models: Evidence From Consumer Complaints in the Financial Industry
Large Language Models (LLMs) are reshaping consumer decision-making, particularly in communication with firms, yet our understanding of their impact remains limited. This research explores the effect of LLMs on consumer complaints submitted to the Consumer Financial Protection Bureau from 2015 to 2024, documenting the adoption of LLMs for drafting complaints and evaluating the likelihood of obtaining relief from financial firms. Utilizing a leading AI detection tool, we analyzed over 1 million complaints and identified a significant increase in LLM usage following the release of ChatGPT. We establish a causal relationship between LLM usage and an increased likelihood of obtaining relief by employing instrumental variables to address endogeneity in LLM adoption. Experimental data further support this link, demonstrating that LLMs enhance the clarity and persuasiveness of consumer narratives. Our findings suggest that facilitating access to LLMs can help firms better understand consumer concerns and level the playing field among consumers. This underscores the importance of policies promoting technological accessibility, enabling all consumers to effectively voice their concerns.
Updated: 2024-10-30 09:29:11
标题: 大型语言模型的采用和有效性:来自金融行业消费者投诉的证据
摘要: 大型语言模型(LLMs)正在重塑消费者决策,特别是在与公司的沟通中,然而我们对它们的影响仍知之甚少。本研究探讨了LLMs对2015年至2024年提交给消费者金融保护局的投诉的影响,记录了LLMs在起草投诉和评估从金融公司获得救济的采用情况。利用一种领先的人工智能检测工具,我们分析了超过100万份投诉,并发现在ChatGPT发布后LLM的使用量显著增加。我们通过采用工具变量来解决LLMs采用中的内生性问题,建立了LLM使用与获得救济机会增加之间的因果关系。实验数据进一步支持了这一联系,表明LLMs增强了消费者叙述的清晰度和说服力。我们的研究结果表明,促进对LLMs的访问可以帮助公司更好地理解消费者的关注点,并在消费者之间实现公平竞争。这强调了促进技术可访问性的政策的重要性,使所有消费者都能有效地表达他们的关切。
更新时间: 2024-10-30 09:29:11
领域: cs.HC,cs.AI,cs.CL
Trade-Offs of Diagonal Fisher Information Matrix Estimators
The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It elucidates insightful theories and useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds of the variances and instantiate them in neural networks for regression and classification. We navigate trade-offs for both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity with respect to different parameter groups and should not be neglected when estimating the Fisher information.
Updated: 2024-10-30 09:29:10
标题: 对角费舍尔信息矩阵估计器的权衡
摘要: 费舍尔信息矩阵可用于表征神经网络参数空间的局部几何特性。它阐明了深刻的理论和有用的工具,可以帮助理解和优化神经网络。由于计算成本高,实践者通常使用随机估计器,并仅评估对角线条目。我们研究了两种流行的估计器,它们的准确性和样本复杂性取决于它们相关的方差。我们推导了方差的界限,并在用于回归和分类的神经网络中具体化了它们。我们通过分析和数值研究来权衡这两种估计器的折衷。我们发现方差数量取决于不同参数组的非线性,并且在估计费舍尔信息时不应忽视它们。
更新时间: 2024-10-30 09:29:10
领域: cs.LG,stat.ML
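The two popular diagonal estimators the abstract refers to are commonly instantiated as averages of per-example squared gradients, using either the observed labels ("empirical Fisher") or labels sampled from the model's own predictions (a Monte Carlo estimator of the true Fisher). The PyTorch sketch below is one such simplified instantiation, not the paper's exact estimators:

```python
import torch
import torch.nn.functional as F

def sq_grad(model, xi, yi):
    """Squared gradient of the log-likelihood for one example."""
    model.zero_grad()
    F.cross_entropy(model(xi), yi).backward()
    return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

def diag_fisher(model, x, y, sample_labels=False):
    """Diagonal Fisher estimate: mean of per-example squared gradients.
    sample_labels=False -> empirical Fisher (observed labels);
    sample_labels=True  -> labels drawn from the model's predictions."""
    acc = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for xi, yi in zip(x.split(1), y.split(1)):
        if sample_labels:
            with torch.no_grad():
                yi = torch.distributions.Categorical(logits=model(xi)).sample()
        for n, g2 in sq_grad(model, xi, yi).items():
            acc[n] += g2 / len(x)
    return acc

model = torch.nn.Linear(4, 3)
x, y = torch.randn(10, 4), torch.randint(0, 3, (10,))
F_emp = diag_fisher(model, x, y)
F_mc = diag_fisher(model, x, y, sample_labels=True)
```

The paper's analysis concerns exactly how the variance of such estimators differs between the two label choices.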
Danoliteracy of Generative, Large Language Models
The language technology moonshot moment of Generative, Large Language Models (GLLMs) was not limited to English: These models brought a surge of technological applications, investments and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were until recently difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency, across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark is found to produce a robust ranking that correlates to human feedback at $\rho \sim 0.8$ with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one strong underlying factor explaining $95\%$ of scenario performance variance for GLLMs in Danish, suggesting a $g$ factor of model consistency in language adaptation.
Updated: 2024-10-30 09:18:31
标题: 生成式大语言模型的丹麦语言能力(Danoliteracy)
摘要: 大规模生成式语言模型(GLLMs)的语言技术飞跃时刻并不局限于英语:这些模型也为低资源语言带来了技术应用、投资和炒作的激增。然而,直到最近,这些模型在丹麦语等语言中的能力很难通过定性演示以外的方式进行验证,因为缺乏适用的评估语料库。我们提出了一个GLLM基准来评估Danoliteracy,即丹麦语言和文化能力,在包括丹麦公民测试和生成式社交媒体问答等八种不同场景下。这个规模有限的基准发现产生了一个稳健的排名,与人类反馈的相关性为ρ约为0.8,其中GPT-4和Claude Opus模型获得了最高排名。通过跨场景分析这些模型结果,我们发现一个强大的潜在因素解释了GLLMs在丹麦语中$95\%$的场景表现差异,表明了语言适应中模型一致性的g因素。
更新时间: 2024-10-30 09:18:31
领域: cs.CL,cs.AI,cs.LG,I.2.7
Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents
Effective research ideation is a critical step for scientific research. However, the exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions. Recent developments in large language models (LLMs) suggest a promising avenue for automating the generation of novel research ideas. However, existing methods for idea generation either trivially prompt LLMs or directly expose LLMs to extensive literature without indicating useful information. Inspired by the research process of human researchers, we propose a Chain-of-Ideas (CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development in a research domain. This organization facilitates LLMs to capture the current advancements in research, thereby enhancing their ideation capabilities. Furthermore, we propose Idea Arena, an evaluation protocol that can comprehensively evaluate idea generation methods from different perspectives, aligning closely with the preferences of human researchers. Experimental results indicate that the CoI agent consistently outperforms other methods and shows comparable quality as humans in research idea generation. Moreover, our CoI agent is budget-friendly, with a minimum cost of $0.50 to generate a candidate idea and its corresponding experimental design.
Updated: 2024-10-30 09:17:59
标题: 思路链:通过LLM代理的新思路开发彻底改革研究
摘要: 有效的研究构思是科学研究中的关键步骤。然而,科学文献的指数增长使得研究人员难以跟上最新进展并确定有意义的研究方向。大型语言模型(LLMs)的最新发展表明自动生成新颖研究想法的前景看好。然而,现有的构思方法要么简单地提示LLMs,要么直接将LLMs暴露于大量文献中,而没有指示有用信息。受人类研究者研究过程启发,我们提出了链式思想(CoI)代理,这是一种基于LLM的代理,它将相关文献组织成链式结构,有效地反映研究领域的逐步发展。这种组织方式有助于LLMs捕捉研究的最新进展,从而增强其构思能力。此外,我们提出了Idea Arena,一个评估协议,可以全面评估不同视角的构思方法,与人类研究者的偏好密切契合。实验结果表明,CoI代理始终优于其他方法,并在研究构思生成方面显示出与人类相当的质量。此外,我们的CoI代理成本友好,生成候选想法及其相应实验设计的最低成本为0.50美元。
更新时间: 2024-10-30 09:17:59
领域: cs.AI,cs.CL
HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge, making them adaptable and cost-effective for various applications. However, the growing reliance on these systems also introduces potential security risks. In this work, we reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG), which enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. When the RAG system encounters target questions, it generates the attacker's pre-determined answers instead of the correct ones, undermining the integrity and trustworthiness of the system. We formalize HijackRAG as an optimization problem and propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge. Extensive experiments on multiple benchmark datasets show that HijackRAG consistently achieves high attack success rates, outperforming existing baseline attacks. Furthermore, we demonstrate that the attack is transferable across different retriever models, underscoring the widespread risk it poses to RAG systems. Lastly, our exploration of various defense mechanisms reveals that they are insufficient to counter HijackRAG, emphasizing the urgent need for more robust security measures to protect RAG systems in real-world deployments.
Updated: 2024-10-30 09:15:51
标题: HijackRAG: 针对检索增强大型语言模型的劫持攻击
摘要: 检索增强生成(RAG)系统通过整合外部知识增强大型语言模型(LLMs),使它们适用于各种应用,具有灵活性和成本效益。然而,对这些系统日益依赖也引入了潜在的安全风险。在这项工作中,我们揭示了一种新的漏洞,即检索提示劫持攻击(HijackRAG),使攻击者能够通过向知识数据库注入恶意文本来操纵RAG系统的检索机制。当RAG系统遇到目标问题时,它会生成攻击者预先确定的答案,而不是正确的答案,从而破坏了系统的完整性和可信度。我们将HijackRAG形式化为一个优化问题,并提出了针对不同攻击者知识水平的黑盒和白盒攻击策略。在多个基准数据集上进行的广泛实验表明,HijackRAG始终能够实现高攻击成功率,优于现有的基准攻击。此外,我们证明这种攻击可以在不同的检索模型之间转移,突显了它对RAG系统带来的广泛风险。最后,我们对各种防御机制的探索表明它们不足以抵御HijackRAG,强调了在实际部署中保护RAG系统所需更加强大的安全措施的紧迫性。
更新时间: 2024-10-30 09:15:51
领域: cs.CR,cs.AI,cs.IR
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion
Large Language Models (LLMs) have been widely used in code completion, and researchers are focusing on scaling up LLMs to improve their accuracy. However, larger LLMs will increase the response time of code completion and decrease the developers' productivity. In this paper, we propose a lightweight and effective LLM for code completion named aiXcoder-7B. Compared to existing LLMs, aiXcoder-7B achieves higher code completion accuracy while having smaller scales (i.e., 7 billion parameters). We attribute the superiority of aiXcoder-7B to three key factors: (1) Multi-objective training. We employ three training objectives, one of which is our proposed Structured Fill-In-the-Middle (SFIM). SFIM considers the syntax structures in code and effectively improves the performance of LLMs for code. (2) Diverse data sampling strategies. They consider inter-file relationships and enhance the capability of LLMs in understanding cross-file contexts. (3) Extensive high-quality data. We establish a rigorous data collection pipeline and consume a total of 1.2 trillion unique tokens for training aiXcoder-7B. This vast volume of data enables aiXcoder-7B to learn a broad distribution of code. We evaluate aiXcoder-7B in five popular code completion benchmarks and a new benchmark collected by this paper. The results show that aiXcoder-7B outperforms the latest six LLMs with similar sizes and even surpasses four larger LLMs (e.g., StarCoder2-15B and CodeLlama-34B), positioning aiXcoder-7B as a lightweight and effective LLM for academia and industry. Finally, we summarize three valuable insights for helping practitioners train the next generations of LLMs for code. aiXcoder-7B has been open-sourced and gained significant attention. As of the submission date, aiXcoder-7B has received 2,193 GitHub Stars.
Updated: 2024-10-30 09:10:35
标题: aiXcoder-7B:一种轻便有效的大型语言模型,用于代码补全
摘要: 大型语言模型(LLMs)已被广泛应用于代码完成,并且研究人员正致力于扩大LLMs的规模以提高其准确性。然而,更大规模的LLMs将增加代码完成的响应时间,降低开发人员的生产力。在本文中,我们提出了一个轻量级且有效的LLM,用于代码完成,名为aiXcoder-7B。与现有的LLMs相比,aiXcoder-7B在拥有更小规模(即70亿参数)的同时实现了更高的代码完成准确性。我们将aiXcoder-7B的优越性归因于三个关键因素:(1)多目标训练。我们采用三个训练目标,其中之一是我们提出的结构化填充中间(SFIM)。 SFIM考虑了代码中的语法结构,并有效提高了LLMs在代码中的性能。(2)多样化的数据采样策略。它们考虑文件之间的关系,并增强了LLMs在理解跨文件上下文方面的能力。 (3)大量高质量数据。我们建立了一个严格的数据收集管道,并在训练aiXcoder-7B时使用了总共1.2万亿个唯一标记。这一庞大的数据量使aiXcoder-7B能够学习代码的广泛分布。我们在五个流行的代码完成基准测试和本文收集的一个新基准测试中评估了aiXcoder-7B。结果显示,aiXcoder-7B优于最新的六个具有相似规模的LLMs,甚至胜过四个更大的LLMs(例如StarCoder2-15B和CodeLlama-34B),将aiXcoder-7B定位为学术界和工业界的轻量级和有效的LLM。最后,我们总结了三个有价值的见解,以帮助从业者训练下一代LLMs用于代码。aiXcoder-7B已经开源并受到了重大关注。截至提交日期,aiXcoder-7B已获得了2,193个GitHub星标。
更新时间: 2024-10-30 09:10:35
领域: cs.CL,cs.AI,cs.SE
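SFIM is a structure-aware variant of fill-in-the-middle (FIM) training. The sketch below shows only plain character-level FIM formatting to make the objective concrete; the sentinel-token names are illustrative (borrowed from common open code models), and aiXcoder's SFIM additionally picks the masked span along syntax-tree boundaries rather than at random offsets:

```python
import random

FIM_PRE, FIM_SUF, FIM_MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_example(code: str, rng: random.Random) -> str:
    """Split code into (prefix, middle, suffix) at random offsets and
    build a prefix-suffix-middle (PSM) training string; the model is
    trained to generate `middle` after the <fim_middle> sentinel."""
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return f"{FIM_PRE}{prefix}{FIM_SUF}{suffix}{FIM_MID}{middle}"

rng = random.Random(0)
print(fim_example("def add(a, b):\n    return a + b\n", rng))
```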
Improving Generalization and Convergence by Enhancing Implicit Regularization
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Updated: 2024-10-30 08:58:04
标题: 通过增强隐式正则化来提高泛化性能和收敛性
摘要: 在这项工作中,我们提出了一种隐式正则化增强(IRE)框架,以加速在深度学习中发现平坦解的过程,从而提高泛化能力和收敛速度。具体来说,IRE将平坦和尖锐方向的动态解耦,从而在沿着平坦方向降低尖锐度的同时保持尖锐方向的训练稳定性。我们展示了IRE可以与通用基本优化器结合使用,而不会引入显著的计算负担。实验证明,IRE在各种基准数据集(CIFAR-10/100,ImageNet)和模型(ResNets和ViTs)的图像分类任务中始终提高泛化性能。令人惊讶的是,IRE在Llama模型(规模从60M到229M)的预训练中与AdamW相比实现了2倍的加速,在包括Wikitext-103、Minipile和Openwebtext在内的数据集上。此外,我们提供了理论保证,表明IRE可以显著加速向Sharpness-aware Minimization(SAM)中的平坦极小值的收敛。
更新时间: 2024-10-30 08:58:04
领域: cs.LG,math.OC,stat.ML
Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches
Combustion instability in gas turbines and rocket engines, as one of the most challenging problems in combustion research, arises from the complex interactions among flames, which are also influenced by chemical reactions, heat and mass transfer, and acoustics. Identifying and understanding combustion instability is essential to ensure the safe and reliable operation of many combustion systems, where exploring and classifying the dynamical behaviors of complex flame systems is a core task. To facilitate fundamental studies, the present work concerns dynamical mode recognition of coupled flame oscillators made of flickering buoyant diffusion flames, which have gained increasing attention in recent years but are not sufficiently understood. The time series data of flame oscillators are generated by fully validated reacting flow simulations. Due to limitations of expertise-based models, a data-driven approach is adopted. In this study, a nonlinear dimensional reduction model of variational autoencoder (VAE) is used to project the simulation data onto a 2-dimensional latent space. Based on the phase trajectories in latent space, both supervised and unsupervised classifiers are proposed for datasets with and without known labels, respectively. For labeled datasets, we establish the Wasserstein-distance-based classifier (WDC) for mode recognition; for unlabeled datasets, we develop a novel unsupervised classifier (GMM-DTWC) combining dynamic time warping (DTW) and Gaussian mixture model (GMM). Through comparing with conventional approaches for dimensionality reduction and classification, the proposed supervised and unsupervised VAE-based approaches exhibit a prominent performance for distinguishing dynamical modes, implying their potential extension to dynamical mode recognition of complex combustion problems.
Updated: 2024-10-30 08:57:32
标题: 通过监督和无监督学习方法识别耦合火焰振荡器的动态模式
摘要: 燃气涡轮和火箭发动机中的燃烧不稳定性是燃烧研究中最具挑战性的问题之一,源于火焰之间复杂的相互作用,受化学反应、热量和质量传递以及声学的影响。识别和理解燃烧不稳定性对于确保许多燃烧系统的安全可靠运行至关重要,其中探索和分类复杂火焰系统的动态行为是核心任务。为了促进基础研究,本研究关注由闪烁的浮力扩散火焰构成的耦合火焰振荡器的动态模式识别,这在近年来引起了越来越多的关注,但尚未得到充分理解。火焰振荡器的时间序列数据是通过完全验证的反应流模拟生成的。由于基于专业知识的模型的局限性,采用了数据驱动的方法。在这项研究中,采用非线性降维模型变分自动编码器(VAE)将模拟数据投影到一个二维潜在空间。基于潜在空间中的相位轨迹,提出了分别针对已知标签和未知标签数据集的监督和无监督分类器。对于有标签的数据集,我们建立了基于Wasserstein距离的分类器(WDC)用于模式识别;对于无标签的数据集,我们开发了一种结合动态时间扭曲(DTW)和高斯混合模型(GMM)的新型无监督分类器(GMM-DTWC)。通过与传统的降维和分类方法进行比较,提出的监督和无监督基于VAE的方法在区分动态模式方面表现出卓越的性能,暗示其潜在扩展到复杂燃烧问题的动态模式识别。
更新时间: 2024-10-30 08:57:32
领域: cs.LG
CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models
This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), focusing on incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs' resilience against the issues. In particular, we devise a guaranteed polynomial time ranking algorithm that robustifies several existing models, such as the classic Bradley-Terry-Luce (BTL) (Bradley and Terry, 1952) model and certain generalizations of it. To the best of our knowledge, our present work is the first to propose an algorithm that provably recovers an $\epsilon$-optimal ranking with high probability while allowing as large as $O(n)$ perturbed pairwise comparison results per model response. Furthermore, we show robust recovery results in the partially observed setting. Our experiments confirm that our algorithms handle adversarial noise and unobserved comparisons well in both general and LLM preference dataset settings. This work contributes to the development and scaling of more reliable and ethically aligned AI models by equipping the dataset curation pipeline with the ability to handle missing and maliciously manipulated inputs.
Updated: 2024-10-30 08:54:38
标题: CURATRON:用于对齐大型语言模型的完整和稳健偏好数据
摘要: 这篇论文讨论了通过偏好学习(PL)来解决大型语言模型(LLMs)与人类价值观之间的挑战,重点关注偏好数据集中的不完整和损坏数据。我们提出了一种新颖的方法,可以强化和完全重新校准这些数据集中的价值观,以增强LLMs对问题的抵抗力。特别地,我们设计了一种保证多项式时间排序算法,可以强化几种现有模型,如经典的Bradley-Terry-Luce(BTL)(Bradley和Terry,1952年)模型及其某些推广形式。据我们所知,我们目前的工作是第一个提出一种算法,可以证明以高概率恢复出一个$\epsilon$-最优排序,同时允许每个模型响应中最多$O(n)$个扰动的成对比较结果。此外,我们展示了在部分观察设置下的强健恢复结果。我们的实验证实,我们的算法在一般和LLM偏好数据集设置中很好地处理对抗性噪声和未观察到的比较。这项工作通过为数据集筛选管道提供处理缺失和恶意操纵输入的能力,有助于开发和扩展更可靠和道德对齐的人工智能模型。
更新时间: 2024-10-30 08:54:38
领域: cs.AI,cs.CL
Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients
Federated fine-tuning for Large Language Models (LLMs) has recently gained attention due to the heavy communication overhead of transmitting large model updates. Low Rank Adaptation (LoRA) has been proposed as a solution, yet its application in federated learning is complicated by discordance in aggregation. Existing methods addressing this discordance often suffer from performance degradation at low ranks in heterogeneous data settings. In response, we introduce LoRA-A2 (Low Rank Adaptation with Alternating freeze and Adaptive rank selection), which demonstrates robustness in challenging settings with low ranks and high data heterogeneity. Our experimental findings reveal that LoRA-A2 maintains performance even under extreme heterogeneity and low rank conditions, achieving up to a 99.8% reduction in uploaded parameters compared to full fine-tuning without compromising performance. This adaptive mechanism boosts robustness and communication efficiency in federated fine-tuning, enabling the practical deployment of LLMs in resource-constrained environments.
Updated: 2024-10-30 08:48:21
标题: 朝向具有异构客户的稳健高效的联邦低秩适应
摘要: 最近,由于传输大型模型更新的沉重通信开销,联合微调大型语言模型(LLMs)引起了关注。低秩适应(LoRA)被提出作为解决方案,然而在联合学习中应用LoRA受到聚合不一致的复杂性的影响。解决这种不一致性的现有方法在异构数据设置中处于低秩时往往出现性能下降。为此,我们介绍了LoRA-A2(交替冻结和自适应秩选择的低秩适应),在低秩和高数据异质性的挑战性环境中展现出稳健性。我们的实验结果显示,与完全微调相比,LoRA-A2即使在极端异质性和低秩条件下也能保持性能,上传参数量减少高达99.8%而不影响性能。这种自适应机制提高了联合微调中的稳健性和通信效率,使LLMs能够在资源受限环境中得到实际部署。
更新时间: 2024-10-30 08:48:21
领域: cs.LG,cs.AI,cs.DC
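To ground the discussion, here is a minimal generic LoRA linear layer in PyTorch. It shows why only the low-rank factors need to be communicated in federated fine-tuning; LoRA-A2's alternating freeze and adaptive rank selection are not shown, and the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha/r) * x A^T B^T. Only A (r x d_in) and
    B (d_out x r) are trainable; the pretrained weight stays frozen,
    so a federated client only uploads the small factors."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
out = layer(torch.randn(2, 64))  # trainable parameters: A and B only
```

The aggregation discordance the abstract mentions arises because averaging A and B separately across clients is not the same as averaging the products B A.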
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.
Updated: 2024-10-30 08:47:36
标题: 理解Transformer在序列建模中的表现能力和机制
摘要: 我们对Transformer在对长、稀疏和复杂记忆进行序列建模时的逼近特性进行了系统研究。我们调查了Transformer的不同组件(如点积自注意力、位置编码和前馈层)对其表达能力的影响机制,并通过建立明确的逼近率研究它们的组合效应。我们的研究揭示了Transformer中关键参数(如层数和注意力头数)的作用。这些理论洞察力在实验中得到验证,并为替代架构提供自然建议。
更新时间: 2024-10-30 08:47:36
领域: cs.LG,stat.ML
A Kernel Perspective on Distillation-based Collaborative Learning
Over the past decade, there has been growing interest in collaborative learning that can enhance the AI models of multiple parties. However, it is still challenging to enhance their performance without sharing private data and models from individual parties. One recent promising approach is to develop distillation-based algorithms that exploit unlabeled public data, but the results are still unsatisfactory in both theory and practice. To tackle this problem, we rigorously analyze a representative distillation-based algorithm from the perspective of kernel regression. This work provides the first theoretical results to prove the (nearly) minimax optimality of the nonparametric collaborative learning algorithm that does not directly share local data or models in massively distributed statistically heterogeneous environments. Inspired by our theoretical results, we also propose a practical distillation-based collaborative learning algorithm based on neural network architecture. Our algorithm successfully bridges the gap between our theoretical assumptions and practical settings with neural networks through feature kernel matching. We simulate various regression tasks to verify our theory and demonstrate the practical feasibility of our proposed algorithm.
Updated: 2024-10-30 08:45:50
标题: 基于蒸馏的协作学习的核心视角
摘要: 在过去的十年中,人们对增强多方合作学习的人工智能模型表现出了日益浓厚的兴趣。然而,要在不共享个体方的私人数据和模型的情况下增强它们的性能仍然具有挑战性。最近一种有前途的方法是开发基于蒸馏的算法,利用未标记的公共数据,但结果在理论和实践中仍然不尽人意。为了解决这个问题,我们从核回归的角度严格分析了一个代表性的基于蒸馏的算法。这项工作提供了第一个理论结果,证明了在极其分布统计异构的环境中,在不直接共享本地数据或模型的情况下,非参数化合作学习算法的(几乎)极小最优性。受到我们的理论结果的启发,我们还提出了一种基于神经网络架构的实用蒸馏型合作学习算法。我们的算法通过特征核匹配成功地弥合了我们的理论假设和神经网络在实践设置中的差距。我们模拟了各种回归任务来验证我们的理论,并展示了我们提出的算法的实践可行性。
更新时间: 2024-10-30 08:45:50
领域: cs.LG
Universality of the $π^2/6$ Pathway in Avoiding Model Collapse
Researchers in empirical machine learning recently spotlighted their fears of so-called Model Collapse. They imagined a discard workflow, where an initial generative model is trained with real data, after which the real data are discarded, and subsequently, the model generates synthetic data on which a new model is trained. They came to the conclusion that models degenerate as model-fitting generations proceed. However, other researchers considered an augment workflow, where the original real data continue to be used in each generation of training, augmented by synthetic data from models fit in all earlier generations. Empirical results on canonical datasets and learning procedures confirmed the occurrence of model collapse under the discard workflow and avoidance of model collapse under the augment workflow. Under the augment workflow, theoretical evidence also confirmed avoidance in particular instances; specifically, Gerstgrasser et al. (2024) found that for classical Linear Regression, test risk at any later generation is bounded by a moderate multiple, viz. pi-squared-over-6 of the test risk of training with the original real data alone. Some commentators questioned the generality of theoretical conclusions based on the generative model assumed in Gerstgrasser et al. (2024): could similar conclusions be reached for other task/model pairings? In this work, we demonstrate the universality of the pi-squared-over-6 augment risk bound across a large family of canonical statistical models, offering key insights into exactly why collapse happens under the discard workflow and is avoided under the augment workflow. In the process, we provide a framework that is able to accommodate a large variety of workflows (beyond discard and augment), thereby enabling an experimenter to judge the comparative merits of multiple different workflows by simulating a simple Gaussian process.
Updated: 2024-10-30 08:44:10
标题: 避免模型崩溃中$π^2/6$路径的普遍性
摘要: 近期实证机器学习研究人员关注了所谓的模型崩溃现象。他们设想了一个丢弃工作流程,在该工作流程中,首先使用真实数据训练一个生成模型,然后丢弃真实数据,随后模型生成合成数据,新模型在这些数据上进行训练。他们得出结论,随着模型拟合的世代进行,模型会退化。然而,其他研究人员考虑了一个增强工作流程,在这个工作流程中,原始真实数据在每一代训练中继续被使用,并通过在所有早期世代拟合的模型生成的合成数据进行增强。经过对经典数据集和学习程序的实证结果验证,发现在丢弃工作流程下模型崩溃的发生,而在增强工作流程下模型崩溃被避免。在增强工作流程下,理论证据也证实了在特定情况下的避免;具体来说,Gerstgrasser等人(2024年)发现对于经典的线性回归,任何后续世代的测试风险都受到一个适度倍数的限制,即pi平方除以6与仅使用原始真实数据进行训练时的测试风险。一些评论家质疑了基于Gerstgrasser等人(2024年)所假设的生成模型的理论结论的普遍性:对于其他任务/模型配对是否可以得出类似的结论?在这项工作中,我们展示了pi平方除以6增强风险边界在大量经典统计模型家族中的普遍性,为为什么在丢弃工作流程下会发生崩溃并在增强工作流程下被避免提供了关键见解。在这个过程中,我们提供了一个能够容纳多种工作流程(超出丢弃和增强)的框架,从而使实验者可以通过模拟一个简单的高斯过程来判断多种不同工作流程的相对优点。
更新时间: 2024-10-30 08:44:10
领域: cs.LG,cs.AI,cs.ET,math.ST,stat.ML,stat.TH
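A toy simulation makes the discard/augment distinction concrete. The sketch below fits a Gaussian rather than a regression model, so it only illustrates the two workflows, not the paper's pi-squared-over-6 test-risk bound; all settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, gens = 100, 30
real = rng.normal(0.0, 1.0, n)       # original real data, true mean 0, std 1

def fit(sample):                     # the "model" is a fitted Gaussian
    return sample.mean(), sample.std()

# Discard workflow: each generation trains only on the previous model's output.
mu, sigma = fit(real)
for _ in range(gens):
    synthetic = rng.normal(mu, sigma, n)
    mu, sigma = fit(synthetic)       # estimation errors compound generation by generation

# Augment workflow: real data are kept; synthetic data accumulate alongside them.
pool = real.copy()
mu_a, sigma_a = fit(pool)
for _ in range(gens):
    pool = np.concatenate([pool, rng.normal(mu_a, sigma_a, n)])
    mu_a, sigma_a = fit(pool)

print("discard:", mu, sigma)         # typically drifts away from (0, 1)
print("augment:", mu_a, sigma_a)     # stays close to (0, 1)
```

Re-running with different seeds shows the discard chain's variance collapsing or drifting, while the augment chain remains anchored by the retained real data, which is the behavior the bounded-risk result formalizes.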
Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation
Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes. However, current LLM-based approaches struggle to fully leverage user behavior sequences, resulting in suboptimal preference modeling for personalized recommendations. In this study, we propose a novel Counterfactual Fine-Tuning (CFT) method to address this issue by explicitly emphasizing the role of behavior sequences when generating recommendations. Specifically, we employ counterfactual reasoning to identify the causal effects of behavior sequences on model output and introduce a task that directly fits the ground-truth labels based on these effects, achieving the goal of explicit emphasis. Additionally, we develop a token-level weighting mechanism to adjust the emphasis strength for different item tokens, reflecting the diminishing influence of behavior sequences from earlier to later tokens during predicting an item. Extensive experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling. Our codes are available at https://github.com/itsmeyjt/CFT.
Updated: 2024-10-30 08:41:13
标题: 因果关系增强的LLMs个性化推荐行为序列建模
摘要: 最近推荐系统的进展集中在利用大型语言模型(LLMs)来改进用户偏好建模,取得了令人期待的成果。然而,目前基于LLM的方法在充分利用用户行为序列方面存在困难,导致个性化推荐的偏好建模不够优化。在本研究中,我们提出了一种新颖的反事实微调(CFT)方法,通过明确强调生成推荐时行为序列的作用来解决这个问题。具体来说,我们采用反事实推理来确定行为序列对模型输出的因果效应,并引入一项任务,根据这些效应直接拟合基于真实标签的目标,实现明确强调的目的。此外,我们开发了一个基于标记级别的权重机制,用于调整不同物品标记的强调力度,反映了在预测物品时从较早到较晚的标记中行为序列的影响逐渐减弱。在真实世界数据集上进行的大量实验证明了CFT有效地改进了行为序列建模。我们的代码可在https://github.com/itsmeyjt/CFT 上找到。
更新时间: 2024-10-30 08:41:13
领域: cs.IR,cs.AI
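As a hedged illustration of the token-level weighting idea (the paper's exact scheme may differ), one can down-weight later item tokens with a simple geometric schedule, reflecting the decaying influence of the behavior sequence:

```python
import torch
import torch.nn.functional as F

def weighted_token_ce(logits, targets, gamma: float = 0.9):
    """Token-weighted cross-entropy for one predicted item.
    logits: (T, V) per-token scores, targets: (T,) token ids.
    Weights 1, gamma, gamma^2, ... emphasize earlier tokens; the
    geometric schedule here is illustrative, not the paper's."""
    T = targets.shape[0]
    weights = gamma ** torch.arange(T, dtype=logits.dtype)
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_token).sum() / weights.sum()

loss = weighted_token_ce(torch.randn(5, 100), torch.randint(0, 100, (5,)))
```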
Mapping the DeFi Crime Landscape: An Evidence-based Picture
Decentralized finance (DeFi) has been the target of numerous profit-driven crimes, but the prevalence and cumulative impact of these crimes have not yet been assessed. This study provides a comprehensive assessment of profit-driven crimes targeting the DeFi sector. We collected data on 1153 crime events from 2017 to 2022. Of these, 1,048 were related to DeFi (the main focus of this study) and 105 to centralized finance (CeFi). The findings show that the entire cryptoasset industry has suffered a minimum loss of US$30B, with two thirds related to CeFi and one third to DeFi. Focusing on DeFi, a taxonomy was developed to clarify the similarities and differences among these crimes. All events were mapped onto the DeFi stack to assess the impacted technical layers, and the financial damages were quantified to gauge their scale. The results highlight that during an attack, a DeFi actor (an entity developing a DeFi technology) can serve as a direct target (due to technical vulnerabilities or exploitation of human risks), as a perpetrator (through malicious uses of contracts or market manipulations), or as an intermediary (by being imitated through, for example, phishing scams). The findings also show that DeFi actors are the first victims of crimes targeting the DeFi industry: 52.2% of events targeted them, primarily due to technical vulnerabilities at the protocol layer, and these events accounted for 83% of all financial damages. Alternatively, in 40.7% of events, DeFi actors were themselves malicious perpetrators, predominantly misusing contracts at the cryptoasset layer (e.g., rug pull scams). However, these events accounted for only 17% of all financial damages. The study offers a preliminary assessment of the size and scope of crime events within the DeFi sector and highlights the vulnerable position of DeFi actors in the ecosystem.
Updated: 2024-10-30 08:40:51
标题: 绘制 DeFi 犯罪景观:证据为基础的画面
摘要: DeFi分散式金融一直是许多以盈利为目的的犯罪行为的目标,但这些犯罪的普遍程度和累积影响尚未被评估。本研究对针对DeFi行业的以盈利为目的的犯罪行为进行了全面评估。我们收集了2017年至2022年间的1153起犯罪事件数据。其中,有1048起与DeFi相关(本研究的主要焦点),105起与中心化金融(CeFi)相关。研究结果显示,整个加密资产行业至少损失了300亿美元,其中三分之二与CeFi相关,三分之一与DeFi相关。聚焦于DeFi,我们制定了一个分类法,以澄清这些犯罪行为之间的相似之处和不同之处。所有事件被映射到DeFi堆栈上,以评估受影响的技术层,并量化金融损失以衡量其规模。研究结果突出显示,在一次攻击中,DeFi行为者(开发DeFi技术的实体)可以作为直接目标(由于技术漏洞或利用人为风险而成为目标)、作为作恶者(通过恶意使用合约或操纵市场)或作为中介(例如通过仿冒,如网络钓鱼诈骗)。研究结果还显示,DeFi行为者是针对DeFi行业犯罪的第一受害者:52.2%的事件针对他们,主要是由于协议层的技术漏洞,这些事件占据了所有财务损失的83%。另外,在40.7%的事件中,DeFi行为者本身是恶意作恶者,主要是在加密资产层滥用合约(例如抽羊毛诈骗)。然而,这些事件仅占所有财务损失的17%。本研究对DeFi行业内的犯罪事件的规模和范围进行了初步评估,并突出了DeFi行为者在生态系统中的脆弱地位。
更新时间: 2024-10-30 08:40:51
领域: cs.CR,cs.CY
Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?
Equivariant Graph Neural Networks (GNNs) that incorporate E(3) symmetry have achieved significant success in various scientific applications. As one of the most successful models, EGNN leverages a simple scalarization technique to perform equivariant message passing over only Cartesian vectors (i.e., 1st-degree steerable vectors), enjoying greater efficiency and efficacy compared to equivariant GNNs using higher-degree steerable vectors. This success suggests that higher-degree representations might be unnecessary. In this paper, we disprove this hypothesis by exploring the expressivity of equivariant GNNs on symmetric structures, including $k$-fold rotations and regular polyhedra. We theoretically demonstrate that equivariant GNNs will always degenerate to a zero function if the degree of the output representations is fixed to 1 or other specific values. Based on this theoretical insight, we propose HEGNN, a high-degree version of EGNN to increase the expressivity by incorporating high-degree steerable vectors while maintaining EGNN's efficiency through the scalarization trick. Our extensive experiments demonstrate that HEGNN not only aligns with our theoretical analyses on toy datasets consisting of symmetric structures, but also shows substantial improvements on more complicated datasets such as $N$-body and MD17. Our theoretical findings and empirical results potentially open up new possibilities for the research of equivariant GNNs.
Updated: 2024-10-30 08:38:14
标题: 等变图神经网络中的高阶表示是否真的不必要?
摘要: 在各种科学应用中,结合E(3)对称性的等变图神经网络(GNNs)取得了显著的成功。作为最成功的模型之一,EGNN利用简单的标量化技术进行等变消息传递,仅对笛卡尔向量(即1阶可转向向量)进行操作,与使用更高阶可转向向量的等变GNNs相比,效率和功效更高。这一成功表明,更高阶的表示可能是不必要的。本文通过探索等变GNNs在对称结构上的表现力,包括$k$-fold旋转和正多面体,驳斥了这一假设。我们在理论上证明,如果将输出表示的阶数固定为1或其他特定值,等变GNNs将总是退化为零函数。基于这一理论洞见,我们提出HEGNN,这是EGNN的高阶版本,通过整合高阶可转向向量来增加表现力,同时通过标量化技巧保持EGNN的效率。我们的广泛实验表明,HEGNN不仅符合我们在由对称结构组成的玩具数据集上的理论分析,还在更复杂的数据集(如$N$-body和MD17)上显示出显著改进。我们的理论发现和实证结果可能为等变GNNs的研究开辟新的可能性。
更新时间: 2024-10-30 08:38:14
领域: cs.LG
MILP-StuDio: MILP Instance Generation via Block Structure Decomposition
Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures -- which are closely related to the problem formulations -- in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruptions of block structures and thus problem formulations. To address this challenge, we propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures. Specifically, MILP-StuDio begins by identifying the blocks in CCMs and decomposing the instances into block units, which serve as the building blocks of MILP instances. We then design three operators to construct new instances by removing, substituting, and appending block units in the original instances, enabling us to generate instances with flexible sizes. An appealing feature of MILP-StuDio is its strong ability to preserve the feasibility and computational hardness of the generated instances. Experiments on the commonly-used benchmarks demonstrate that using instances generated by MILP-StuDio is able to significantly reduce over 10% of the solving time for learning-based solvers.
Updated: 2024-10-30 08:33:27
标题: MILP-StuDio:通过块结构分解生成MILP实例
摘要: 混合整数线性规划(MILP)是一种应用广泛的数学形式之一。在实践中,改进MILP求解器的性能通常需要大量高质量的数据,这可能很难收集。因此,研究人员转向生成技术来生成额外的MILP实例。然而,现有方法没有考虑MILP的约束系数矩阵(CCM)中与问题形式密切相关的特定块结构。因此,由于块结构的破坏以及问题形式的影响,它们很容易生成计算上微不足道或不可行的实例。为了解决这一挑战,我们提出了一种新颖的MILP生成框架,称为块结构分解(MILP-StuDio),通过保留块结构生成高质量的实例。具体而言,MILP-StuDio首先通过识别CCM中的块并将实例分解为块单元,这些单元作为MILP实例的构建基块。然后,我们设计了三种运算符,通过从原始实例中移除、替换和附加块单元,来构建新实例,使我们能够生成具有灵活大小的实例。MILP-StuDio的一个吸引人特点是其强大的能力,能够保持生成实例的可行性和计算难度。在常用基准测试上的实验表明,使用MILP-StuDio生成的实例能够显著减少学习型求解器的解决时间超过10%。
更新时间: 2024-10-30 08:33:27
领域: cs.LG,cs.DM
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
This paper describes speech enhancement for real-time automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used for generating pseudo ground-truth data from a mixture. Based on this idea, a prior work proposed a dual-process system based on a cascade of WPE and minimum variance distortionless response (MVDR) beamforming asynchronously fine-tuned by block-online FastMNMF. To integrate the dereverberation capability into neural beamforming and make it fine-tunable at run time, we propose to use weighted power minimization distortionless response (WPD) beamforming, a unified version of WPE and minimum power distortionless response (MPDR), whose joint dereverberation and denoising filter is estimated using a DNN. We evaluated the impact of run-time adaptation under various conditions with different numbers of speakers, reverberation times, and signal-to-noise ratios (SNRs).
Updated: 2024-10-30 08:32:47
标题: 神经波束成形的运行时适应性用于稳健的语音去混响和降噪
摘要: 本文描述了实时环境下自动语音识别(ASR)的语音增强。这项任务的标准方法是使用可以有效在线工作的神经波束成形技术。它利用深度神经网络(DNN)从嘈杂的回声混合谱图中估计干净干燥语音的掩蔽,并计算用于波束成形的增强滤波器。然而,这种监督方法的性能在条件不匹配时会急剧下降。这要求对DNN进行运行时适应。虽然运行时需要用于调整的真实语音谱图数据不可用,但可以使用盲消混响和分离方法,如加权预测误差(WPE)和快速多通道非负矩阵分解(FastMNMF),从混合中生成伪地面真实数据。基于这个想法,之前的研究提出了一个基于WPE和最小方差无失真响应(MVDR)波束成形级联的双处理系统,通过块在线FastMNMF进行异步微调。为了将消混响能力整合到神经波束成形中,并使其在运行时可微调,我们提出使用加权功率最小化无失真响应(WPD)波束成形,这是WPE和最小功率无失真响应(MPDR)的统一版本,其联合消混响和降噪滤波器是使用DNN估计的。我们评估了在不同说话者数量、混响时间和信噪比(SNR)条件下运行时适应的影响。
更新时间: 2024-10-30 08:32:47
领域: cs.SD,cs.AI,cs.LG,eess.AS
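The MPDR filter that WPD unifies with WPE has the closed form w = R^{-1} d / (d^H R^{-1} d). A minimal NumPy sketch for a single narrowband frequency bin follows (the covariance and steering vector here are synthetic placeholders; WPD itself generalizes this to a convolutional filter that also dereverberates):

```python
import numpy as np

def mpdr_weights(R: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Minimum power distortionless response weights for one bin.
    R: (M, M) spatial covariance of the observed mixture,
    d: (M,) steering / relative transfer function vector."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

rng = np.random.default_rng(0)
M = 4                                        # number of microphones
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)           # Hermitian positive definite
d = np.ones(M) / np.sqrt(M)
w = mpdr_weights(R, d)
print(np.abs(w.conj() @ d))                  # distortionless: |w^H d| = 1
```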
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
This paper describes sound event localization and detection (SELD) for spatial audio recordings captured by first-order ambisonics (FOA) microphones. In this task, one may train a deep neural network (DNN) using FOA data annotated with the classes and directions of arrival (DOAs) of sound events. However, the performance of this approach is severely bounded by the amount of annotated data. To overcome this limitation, we propose a novel method of pretraining the feature extraction part of the DNN in a self-supervised manner. We use spatial audio-visual recordings abundantly available as virtual reality contents. Assuming that sound objects are concurrently observed by the FOA microphones and the omni-directional camera, we jointly train audio and visual encoders with contrastive learning such that the audio and visual embeddings of the same recording and DOA are made close. A key feature of our method is that the DOA-wise audio embeddings are jointly extracted from the raw audio data, while the DOA-wise visual embeddings are separately extracted from the local visual crops centered on the corresponding DOA. This encourages the latent features of the audio encoder to represent both the classes and DOAs of sound events. The experiment using the 20-hour DCASE2022 Task 3 dataset shows that 100 hours of non-annotated audio-visual recordings reduced the error score of SELD from 36.4 pts to 34.9 pts.
Updated: 2024-10-30 08:31:58
标题: 针对声音事件定位和检测的DOA感知音频-视觉自监督学习
摘要: 本文描述了用于由一阶环形声学(FOA)麦克风捕捉的空间音频录音的声音事件定位和检测(SELD)。在这个任务中,可以使用带有声音事件的类别和到达方向(DOAs)注释的FOA数据训练深度神经网络(DNN)。然而,这种方法的性能受到注释数据量的严重限制。为了克服这一限制,我们提出了一种新颖的方法,以自监督的方式对DNN的特征提取部分进行预训练。我们使用丰富的空间音频-视觉录音作为虚拟现实内容。假设声音对象同时被FOA麦克风和全向摄像头观察到,我们联合训练音频和视觉编码器,通过对比学习使相同录音和DOA的音频和视觉嵌入接近。我们方法的一个关键特点是,DOA-wise音频嵌入是从原始音频数据中联合提取的,而DOA-wise视觉嵌入是从以相应DOA为中心的局部视觉裁剪中分别提取的。这鼓励音频编码器的潜在特征表示声音事件的类别和DOAs。使用20小时的DCASE2022任务3数据集的实验表明,100小时的非注释音频-视觉录音将SELD的误差分数从36.4分降低到34.9分。
更新时间: 2024-10-30 08:31:58
领域: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS
Machine Learning Nonadiabatic Dynamics: Eliminating Phase Freedom of Nonadiabatic Couplings with the State-Interaction State-Averaged Spin-Restricted Ensemble-Referenced Kohn-Sham Approach
Excited-state molecular dynamics (ESMD) simulations near conical intersections (CIs) pose significant challenges when using machine learning potentials (MLPs). Although MLPs have gained recognition for their integration into mixed quantum-classical (MQC) methods, such as trajectory surface hopping (TSH), and their capacity to model correlated electron-nuclear dynamics efficiently, difficulties persist in managing nonadiabatic dynamics. Specifically, singularities at CIs and double-valued coupling elements result in discontinuities that disrupt the smoothness of predictive functions. Partial solutions to these challenges have been provided by learning diabatic Hamiltonians with phaseless loss functions. However, a definitive method for addressing the discontinuities caused by CIs and double-valued coupling elements has yet to be developed. Here, we introduce the phaseless coupling term, $\Delta^2$, derived from the square of the off-diagonal elements of the diabatic Hamiltonian in the SSR(2,2) formalism. This approach improves the stability and accuracy of the MLP model by addressing the issues arising from CI singularities and double-valued coupling functions. We apply this method to the penta-2,4-dieniminium cation (PSB3), demonstrating its effectiveness in improving MLP training for ML-based nonadiabatic dynamics. Our results show that the $\Delta^2$-based ML-ESMD method can reproduce ab initio ESMD simulations, underscoring its potential and efficiency for broader applications, particularly in large-scale and long-timescale ESMD simulations.
Updated: 2024-10-30 08:30:46
标题: 机器学习非绝热动力学:用状态相互作用状态平均自旋限制集合参考Kohn-Sham方法消除非绝热耦合的相位自由度
摘要: 激发态分子动力学(ESMD)模拟在锥形交叉点(CIs)附近时,使用机器学习势(MLPs)面临重大挑战。尽管MLPs因其融入混合量子-经典(MQC)方法(如轨迹表面跃迁(TSH))和高效建模相关电子-核动力学而得到认可,但在处理非绝热动力学方面仍存在困难。具体来说,在CIs处的奇异性和双值耦合元素导致不连续性,破坏了预测函数的平滑性。通过学习具有无相位损失函数的耦合哈密顿量提供了部分解决方案。然而,尚未开发出一种明确的方法来解决CIs和双值耦合元素引起的不连续性。在这里,我们引入了从SSR(2,2)形式的非对角元素的平方导出的无相位耦合项$\Delta^2$。该方法通过解决由CI奇异性和双值耦合函数引起的问题,改善了MLP模型的稳定性和准确性。我们将此方法应用于五乙-2,4-二烯亚铵阳离子(PSB3),展示其在改善基于ML的非绝热动力学的MLP训练中的有效性。我们的结果表明,基于$\Delta^2$的ML-ESMD方法可以重现从头算ESMD模拟,突显了其在更广泛应用中的潜力和效率,特别是在大规模和长时间尺度的ESMD模拟中。
更新时间: 2024-10-30 08:30:46
领域: physics.chem-ph,cs.LG
Solving Differential Equations with Constrained Learning
(Partial) differential equations (PDEs) are fundamental tools for describing natural phenomena, making their solution crucial in science and engineering. While traditional methods, such as the finite element method, provide reliable solutions, their accuracy is often tied to the use of computationally intensive fine meshes. Moreover, they do not naturally account for measurements or prior solutions, and any change in the problem parameters requires results to be fully recomputed. Neural network-based approaches, such as physics-informed neural networks and neural operators, offer a mesh-free alternative by directly fitting those models to the PDE solution. They can also integrate prior knowledge and tackle entire families of PDEs by simply aggregating additional training losses. Nevertheless, they are highly sensitive to hyperparameters such as collocation points and the weights associated with each loss. This paper addresses these challenges by developing a science-constrained learning (SCL) framework. It demonstrates that finding a (weak) solution of a PDE is equivalent to solving a constrained learning problem with worst-case losses. This explains the limitations of previous methods that minimize the expected value of aggregated losses. SCL also organically integrates structural constraints (e.g., invariances) and (partial) measurements or known solutions. The resulting constrained learning problems can be tackled using a practical algorithm that yields accurate solutions across a variety of PDEs, neural network architectures, and prior knowledge levels without extensive hyperparameter tuning and sometimes even at a lower computational cost.
Updated: 2024-10-30 08:20:39
标题: 使用受限学习解决微分方程
摘要: 偏微分方程(PDEs)是描述自然现象的基本工具,因此在科学和工程中解决它们的解决方案至关重要。传统方法,如有限元方法,提供可靠的解决方案,但其准确性通常取决于使用计算密集型的细网格。此外,它们并不自然地考虑测量或先前的解决方案,而且问题参数的任何改变都需要完全重新计算结果。基于神经网络的方法,如物理信息神经网络和神经算子,提供了一种无网格的替代方案,通过直接将这些模型拟合到PDE解决方案来实现。它们还可以整合先前的知识,并通过简单地聚合额外的训练损失来处理整个PDE族。然而,它们对超参数(如配点和与每个损失相关的权重)非常敏感。本文通过开发科学约束学习(SCL)框架来解决这些挑战。它证明找到PDE的(弱)解等同于解决一个带有最坏情况损失的约束学习问题。这解释了最小化聚合损失的预期值的先前方法的局限性。SCL还有机地整合了结构约束(例如不变性)和(部分)测量或已知解决方案。由此产生的约束学习问题可以使用一个实用算法来处理,该算法可以在各种PDE、神经网络架构和先前知识水平之间产生准确的解决方案,而无需进行广泛的超参数调整,有时甚至可以以更低的计算成本实现。
更新时间: 2024-10-30 08:20:39
领域: cs.LG,cs.CE
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program
Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries. This challenge has been formulated as a collaborative multi-armed bandit problem, requiring online learning of a low-rank reward matrix. Existing solutions often rely on heuristic combinations of offline matrix completion and exploration strategies. In this work, we propose a principled Bayesian approach using Thompson Sampling for this collaborative bandit problem. Our method leverages prior information through efficient Gibbs sampling for posterior inference over the low-rank matrix factors, enabling faster convergence. We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program. Our approach achieves a $16\%$ reduction in the number of calls compared to existing methods and a $47\%$ reduction compared to the deployed random policy. This efficiency gain translates to a potential increase in program capacity by $0.5-1.4$ million beneficiaries, granting them access to vital ante-natal and post-natal care information. Furthermore, we observe a $7\%$ and $29\%$ improvement in beneficiary retention (an extremely hard metric to impact) compared to state-of-the-art and deployed baselines, respectively. Synthetic simulations further demonstrate the superiority of our approach, particularly in low-data regimes and in effectively utilizing prior information. We also provide a theoretical analysis of our algorithm in a special setting using Eluder dimension.
Updated: 2024-10-30 08:19:24
标题: 基于汤普森采样的贝叶斯协作老虎机:改进孕产妇健康计划中的推广效果
摘要: 移动健康(mHealth)计划在优化向受益人发送自动健康信息电话的时机方面面临着一个关键挑战。这一挑战被表述为一个协作多臂老虎机问题,需要在线学习低秩奖励矩阵。现有的解决方案通常依赖于离线矩阵补全和探索策略的启发式组合。在这项工作中,我们提出了一个基于贝叶斯原理的方法,使用汤普森抽样来解决这个协作老虎机问题。我们的方法通过高效的吉布斯抽样进行后验推断,从而加速了对低秩矩阵因子的收敛。我们在来自世界上最大的产妇mHealth计划的真实数据集上展示了与现有技术基准相比的显著改进。我们的方法与现有方法相比,减少了呼叫次数16%,与部署的随机策略相比减少了47%。这种效率提升意味着计划容量可能增加0.5-1.4百万受益人,使他们能够获得重要的产前和产后保健信息。此外,与现有技术和已部署的基准相比,我们观察到受益人留存率提高了7%和29%(这是一个极难影响的度量)。合成模拟进一步展示了我们方法的优越性,特别是在低数据情况下和有效利用先验信息方面。我们还在一个特殊设置下使用Eluder维度对我们的算法进行了理论分析。
更新时间: 2024-10-30 08:19:24
领域: cs.LG
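For reference, here is textbook Thompson Sampling for a Bernoulli bandit with Beta priors. The paper's method instead runs Gibbs sampling over low-rank reward-matrix factors, so this is only a minimal illustration of the posterior-sampling principle:

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.35])       # unknown success rate per arm
alpha = np.ones(3)                        # Beta(1, 1) priors
beta = np.ones(3)

for t in range(2000):
    theta = rng.beta(alpha, beta)         # sample one plausible model per arm
    arm = int(np.argmax(theta))           # act greedily on the sample
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward                  # conjugate posterior update
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))             # posterior means approach true_p
```

Sampling from the posterior, rather than acting on its mean, is what gives Thompson Sampling its built-in exploration; the paper exploits the same principle with informative priors for faster convergence.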
Dual Contrastive Transformer for Hierarchical Preference Modeling in Sequential Recommendation
Sequential recommender systems (SRSs) aim to predict the subsequent items which may interest users via comprehensively modeling users' complex preference embedded in the sequence of user-item interactions. However, most existing SRSs often model users' single low-level preference based on item ID information while ignoring the high-level preference revealed by item attribute information, such as item category. Furthermore, they often utilize limited sequence context information to predict the next item while overlooking richer inter-item semantic relations. To this end, in this paper, we propose a novel hierarchical preference modeling framework to substantially model the complex low- and high-level preference dynamics for accurate sequential recommendation. Specifically, in the framework, a novel dual-transformer module and a novel dual contrastive learning scheme have been designed to discriminatively learn users' low- and high-level preference and to effectively enhance both low- and high-level preference learning respectively. In addition, a novel semantics-enhanced context embedding module has been devised to generate more informative context embedding for further improving the recommendation performance. Extensive experiments on six real-world datasets have demonstrated both the superiority of our proposed method over the state-of-the-art ones and the rationality of our design.
Updated: 2024-10-30 08:09:33
标题: 双重对比变压器用于序列推荐中的分层偏好建模
摘要: Sequential recommender systems (SRSs)旨在通过全面建模用户与物品交互序列中嵌入的复杂偏好,预测可能吸引用户的后续物品。然而,大多数现有的SRSs经常基于物品ID信息模拟用户的单一低级偏好,而忽略了物品属性信息(如物品类别)揭示的高级偏好。此外,它们经常利用有限的序列上下文信息来预测下一个物品,而忽视了更丰富的物品间语义关系。为此,在本文中,我们提出了一种新的层次偏好建模框架,用于准确的顺序推荐中实质性地建模复杂的低级和高级偏好动态。具体而言,在该框架中,设计了一种新颖的双变换器模块和一种新颖的双对比学习方案,用于区分性地学习用户的低级和高级偏好,并分别有效增强低级和高级偏好的学习。此外,设计了一种新颖的语义增强上下文嵌入模块,用于生成更具信息性的上下文嵌入,以进一步提高推荐性能。对六个真实世界数据集的大量实验证明了我们提出的方法优于现有技术的优越性和我们设计的合理性。
更新时间: 2024-10-30 08:09:33
领域: cs.IR,cs.AI
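The contrastive ingredient can be illustrated with a standard InfoNCE loss; a dual scheme as in the paper would apply such losses at both the item (low) and category (high) levels. A minimal PyTorch sketch, with illustrative dimensions:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """Row i of z1 should match row i of z2 and repel all other rows.
    z1, z2: (B, D) embeddings of two views of the same B examples."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                  # (B, B) similarity matrix
    targets = torch.arange(z1.shape[0])       # diagonal pairs are positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 64), torch.randn(32, 64))
```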
Theoretical Investigations and Practical Enhancements on Tail Task Risk Minimization in Meta Learning
Meta learning is a promising paradigm in the era of large models, and task distributional robustness has become an indispensable consideration in real-world scenarios. Recent advances have examined the effectiveness of tail task risk minimization in fast adaptation robustness improvement (Wang et al., 2023). This work contributes to more theoretical investigations and practical enhancements in the field. Specifically, we reduce the distributionally robust strategy to a max-min optimization problem, constitute the Stackelberg equilibrium as the solution concept, and estimate the convergence rate. In the presence of tail risk, we further derive the generalization bound, establish connections with estimated quantiles, and practically improve the studied strategy. Accordingly, extensive evaluations demonstrate the significance of our proposal and its scalability to multimodal large models in boosting robustness.
Updated: 2024-10-30 08:07:43
标题: 元学习中尾部任务风险最小化的理论研究和实际增强
摘要: 元学习是在大模型时代中的一个有前途的范式,任务分布鲁棒性已经成为现实场景中不可或缺的考虑因素。最近的进展已经检验了尾部任务风险最小化在快速适应稳健性改进中的有效性。这项工作为该领域提供了更多的理论研究和实际增强。具体而言,我们将分布稳健策略简化为一个最大-最小优化问题,构成斯塔克尔贝格均衡作为解决概念,并估计收敛速度。在尾部风险存在的情况下,我们进一步推导了泛化界限,建立了与估计分位数的联系,并实际改进了研究的策略。因此,广泛的评估证明了我们的提议的重要性以及其在提高稳健性方面对多模态大模型的可扩展性。
更新时间: 2024-10-30 08:07:43
领域: cs.LG
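Tail task risk can be illustrated with a CVaR-style objective that averages only the worst-performing tasks instead of all of them; this sketch is a simplification of the paper's max-min Stackelberg formulation, and the tail fraction alpha is illustrative:

```python
import torch

def tail_task_risk(task_losses: torch.Tensor, alpha: float = 0.3):
    """Average the worst alpha-fraction of per-task losses.
    Minimizing this focuses adaptation on the hardest tasks,
    rather than letting easy tasks mask poor tail performance."""
    k = max(1, int(alpha * task_losses.numel()))
    worst = torch.topk(task_losses, k).values
    return worst.mean()

losses = torch.tensor([0.2, 1.5, 0.4, 2.1, 0.3])
print(tail_task_risk(losses, alpha=0.4))   # mean of the two largest losses
```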
Parameter uncertainties for imperfect surrogate models in the low-noise regime
Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, the loss ignores misspecification, where models are imperfect. Parameter uncertainties from Bayesian regression are thus significantly underestimated and vanish in the large data limit. This is particularly problematic when building models of low-noise, or near-deterministic, calculations, as the main source of uncertainty is neglected. We analyze the generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show posterior distributions must cover every training point to avoid a divergent generalization error and design an ansatz that respects this constraint, which for linear models incurs minimal overhead. This is demonstrated on model problems before application to thousand dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors where existing schemes fail, allowing this important source of uncertainty to be incorporated in computational workflows.
Updated: 2024-10-30 08:05:41
标题: 低噪声环境下不完美代理模型的参数不确定性
摘要: 贝叶斯回归通过最小化期望损失来确定模型参数,这是真实泛化误差的一个上界。然而,这种损失忽略了错误规范,即模型是不完美的情况。因此,贝叶斯回归的参数不确定性被显著低估,并在大数据限制下消失。这在构建低噪声或接近确定性计算模型时尤为棘手,因为主要的不确定性源被忽略了。我们分析了错误规范、接近确定性的替代模型的泛化误差,这是科学和工程中广泛相关的情况。我们展示后验分布必须覆盖每个训练点,以避免发散的泛化误差,并设计了一种符合这一约束的方法,对线性模型而言,开销最小。这在模型问题上得到了证明,然后应用于原子机器学习中的千维数据集。我们的高效错误规范感知方案可以准确预测和限制测试错误,而现有方案失败,这使得这种重要的不确定性源可以纳入计算工作流程中。
更新时间: 2024-10-30 08:05:41
领域: stat.ML,cs.LG,physics.data-an
Contrastive Learning and Adversarial Disentanglement for Privacy-Preserving Task-Oriented Semantic Communications
Task-oriented semantic communication systems have emerged as a promising approach to achieving efficient and intelligent data transmission, where only information relevant to a specific task is communicated. However, existing methods struggle to fully disentangle task-relevant and task-irrelevant information, leading to privacy concerns and subpar performance. To address this, we propose an information-bottleneck method, named CLAD (contrastive learning and adversarial disentanglement). CLAD leverages contrastive learning to effectively capture task-relevant features while employing adversarial disentanglement to discard task-irrelevant information. Additionally, due to the lack of reliable and reproducible methods to gain insight into the informativeness and minimality of the encoded feature vectors, we introduce a new technique to compute the information retention index (IRI), a comparative metric used as a proxy for the mutual information between the encoded features and the input, reflecting the minimality of the encoded features. The IRI quantifies the minimality and informativeness of the encoded feature vectors across different task-oriented communication techniques. Our extensive experiments demonstrate that CLAD outperforms state-of-the-art baselines in terms of task performance, privacy preservation, and IRI. CLAD achieves a predictive performance improvement of around 2.5-3%, along with a 77-90% reduction in IRI and a 57-76% decrease in adversarial accuracy.
Updated: 2024-10-30 07:59:52
标题: 对比学习和对抗性解缠在隐私保护任务导向的语义通信中的应用
摘要: 面向任务的语义通信系统已经成为一种有前途的方法,可实现高效智能的数据传输,只传递与特定任务相关的信息。然而,现有方法难以完全解开任务相关和任务无关信息,导致隐私问题和性能不佳。为了解决这个问题,我们提出了一种信息瓶颈方法,称为CLAD(对比学习和对抗解缠)。CLAD利用对比学习有效地捕获任务相关特征,同时使用对抗解缠来丢弃任务无关信息。此外,由于缺乏可靠和可重复的方法来了解编码特征向量的信息量和极小性,我们引入了一种新技术来计算信息保留指数(IRI),这是一种比较度量,用作编码特征和输入之间的互信息的代理,反映编码特征的极小性。IRI量化了编码特征向量在不同面向任务的通信技术中的极小性和信息量。我们的大量实验证明,CLAD在任务性能、隐私保护和IRI方面优于最先进的基线。CLAD实现了约2.5-3%的预测性能改善,以及IRI减少了77-90%,对抗准确率降低了57-76%。
更新时间: 2024-10-30 07:59:52
领域: cs.LG,cs.AI,cs.CV,cs.IT,eess.IV,math.IT
MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner. However, in multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge. Mixture-of-LoRA (MoLoRA), which combines LoRA with sparse Mixture-of-Experts, mitigates some of these issues by promoting task-specific learning across experts. Despite this, MoLoRA remains inefficient in terms of training speed, parameter utilization, and overall multi-task performance. In this paper, we propose Mixture of Asymmetric Low-Rank Adaptation (MALoRA), a flexible fine-tuning framework that leverages asymmetric optimization across LoRA experts. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models. Additionally, MALoRA addresses overfitting issues commonly seen in high-rank configurations, enhancing performance stability. Extensive experiments across diverse multi-task learning scenarios demonstrate that MALoRA consistently outperforms all baseline methods in both inter-domain and intra-domain tasks.
Updated: 2024-10-30 07:53:52
标题: MALoRA:用于增强多任务学习的不对称低秩适应混合
摘要: Parameter-Efficient Fine-Tuning (PEFT)方法如LoRA显著改善了LLMs对下游任务的资源有效适应。然而,在多任务场景中,训练不平衡和跷跷板效应等挑战经常出现。将LoRA与稀疏的Mixture-of-Experts结合的Mixture-of-LoRA(MoLoRA)通过促进专家之间的特定任务学习来缓解一些问题。尽管如此,MoLoRA在训练速度、参数利用率和整体多任务性能方面仍然效率低下。在本文中,我们提出了一种灵活的微调框架Mixture of Asymmetric Low-Rank Adaptation(MALoRA),利用LoRA专家之间的不对称优化。MALoRA将可训练参数数量减少了30%至48%,训练速度提高了1.2倍,并且与单任务LoRA模型的计算效率相匹配。此外,MALoRA解决了高秩配置中常见的过拟合问题,增强了性能稳定性。在各种多任务学习场景中进行的广泛实验表明,MALoRA在跨域和内域任务中始终优于所有基线方法。
更新时间: 2024-10-30 07:53:52
领域: cs.CL,cs.LG,I.2.7
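The "asymmetric" idea in the MALoRA abstract can be pictured as sharing part of each expert's low-rank factorization. The sketch below is a guess for illustration only (shared down-projection A, per-expert up-projections B_i, a softmax router) and shows one way such a layer could be wired; it is not the authors' architecture.

import torch
import torch.nn as nn

class AsymmetricLoRAMoE(nn.Module):
    # Shared down-projection A, expert-specific up-projections B_i, softmax router.
    def __init__(self, d_in, d_out, rank=8, n_experts=4):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.ModuleList(
            nn.Linear(rank, d_out, bias=False) for _ in range(n_experts))
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x, base_out):
        gates = torch.softmax(self.router(x), dim=-1)        # (B, E)
        h = self.A(x)                                        # rank-r code, computed once
        delta = torch.stack([b(h) for b in self.B], dim=-1)  # (B, d_out, E)
        return base_out + (delta * gates.unsqueeze(1)).sum(-1)

Sharing A means the rank-r code is computed once per token regardless of the number of experts, which is the kind of parameter and speed saving asymmetric designs target.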
Unfolding Target Detection with State Space Model
Target detection is a fundamental task in radar sensing, serving as the precursor to any further processing for various applications. Numerous detection algorithms have been proposed. Classical methods based on signal processing, e.g., the most widely used CFAR, are challenging to tune and sensitive to environmental conditions. Deep learning-based methods can be more accurate and robust, yet usually lack interpretability and physical relevance. In this paper, we introduce a novel method that combines signal processing and deep learning by unfolding the CFAR detector with a state space model architecture. By reserving the CFAR pipeline yet turning its sophisticated configurations into trainable parameters, our method achieves high detection performance without manual parameter tuning, while preserving model interpretability. We implement a lightweight model of only 260K parameters and conduct real-world experiments for human target detection using FMCW radars. The results highlight the remarkable performance of the proposed method, outperforming CFAR and its variants by 10X in detection rate and false alarm rate. Our code is open-sourced here: https://github.com/aiot-lab/NeuroDet.
Updated: 2024-10-30 07:43:18
标题: 使用状态空间模型展开目标检测
摘要: 目标检测是雷达感知中的基本任务,作为各种应用的进一步处理的先导。已经提出了许多检测算法。基于信号处理的经典方法,如最广泛使用的CFAR,很难调整,对环境条件敏感。基于深度学习的方法可以更准确和稳健,但通常缺乏可解释性和物理相关性。在本文中,我们介绍了一种将信号处理和深度学习相结合的新方法,通过使用状态空间模型架构展开CFAR检测器。通过保留CFAR管道,将其复杂配置转化为可训练参数,我们的方法在不需要手动参数调整的情况下实现了高检测性能,同时保持了模型的可解释性。我们实现了一个仅有260K参数的轻量级模型,并使用FMCW雷达进行了人体目标检测的实际实验。结果突显了所提出方法的显著性能,在检测率和误报率方面超过CFAR及其变体10倍。我们的代码在此处开源:https://github.com/aiot-lab/NeuroDet。
更新时间: 2024-10-30 07:43:18
领域: eess.SP,cs.LG
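For readers unfamiliar with CFAR: the cell-averaging variant thresholds each range cell against a scaled average of its neighbors, skipping guard cells around the cell under test. Below is a hedged sketch of an "unfolded" version in which the threshold scale and the averaging window become trainable parameters; it is illustrative only, and the paper's NeuroDet model additionally wraps the pipeline in a state space architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftCFAR(nn.Module):
    def __init__(self, n_train=16, n_guard=4):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(1))        # trainable threshold scale
        w = torch.ones(1, 1, 2 * (n_train + n_guard) + 1)
        w[0, 0, n_train:n_train + 2 * n_guard + 1] = 0.0     # zero guard + test cells
        self.weight = nn.Parameter(w / w.sum())              # trainable averaging window

    def forward(self, power):                                # power: (B, N) range profile
        noise = F.conv1d(power.unsqueeze(1), self.weight,
                         padding=self.weight.shape[-1] // 2).squeeze(1)
        threshold = torch.exp(self.log_alpha) * noise
        return torch.sigmoid(power - threshold)              # soft detection map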
Reliability Assessment of Information Sources Based on Random Permutation Set
In pattern recognition, handling uncertainty is a critical challenge that significantly affects decision-making and classification accuracy. Dempster-Shafer Theory (DST) is an effective reasoning framework for addressing uncertainty, and the Random Permutation Set (RPS) extends DST by additionally considering the internal order of elements, forming a more ordered extension of DST. However, there is a lack of a transformation method based on permutation order between RPS and DST, as well as a sequence-based probability transformation method for RPS. Moreover, the reliability of RPS sources remains an issue that requires attention. To address these challenges, this paper proposes an RPS transformation approach and a probability transformation method tailored for RPS. On this basis, a reliability computation method for RPS sources, based on the RPS probability transformation, is introduced and applied to pattern recognition. Experimental results demonstrate that the proposed approach effectively bridges the gap between DST and RPS and achieves superior recognition accuracy in classification problems.
Updated: 2024-10-30 07:40:35
标题: 基于随机排列集的信息来源可靠性评估
摘要: 在模式识别中,处理不确定性是一个关键挑战,它显著影响决策和分类准确性。Dempster-Shafer理论(DST)是一种有效的推理框架,用于处理不确定性,而随机排列集(RPS)通过另外考虑元素的内部顺序,形成了DST的更有序的扩展。然而,缺乏基于排列顺序的RPS和DST之间的转换方法,以及针对RPS的基于序列的概率转换方法。此外,RPS来源的可靠性仍然是一个需要关注的问题。为了解决这些挑战,本文提出了一种适用于RPS的RPS转换方法和概率转换方法。在此基础上,基于RPS概率转换的可靠性计算方法被引入并应用于模式识别。实验结果表明,所提出的方法有效地弥合了DST和RPS之间的差距,并在分类问题中实现了优越的识别准确性。
更新时间: 2024-10-30 07:40:35
领域: cs.AI
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense -- falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce NotInject, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). To mitigate this, we propose InjecGuard, a novel prompt guard model that incorporates a new training strategy, Mitigating Over-defense for Free (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8%, offering a robust and open-source solution for detecting prompt injection attacks. The code and datasets are released at https://github.com/SaFoLab-WISC/InjecGuard.
Updated: 2024-10-30 07:39:42
标题: InjecGuard:提示注入防护模型中过度防御的基准测试与缓解
摘要: 提示注入(prompt injection)攻击对大型语言模型(LLMs)构成了严重威胁,可能导致目标劫持和数据泄露。提示守卫(prompt guard)模型虽然在防御方面有效,但存在过度防御的问题 -- 由于触发词偏见,误将良性输入标记为恶意输入。为解决这一问题,我们引入了评估数据集NotInject,系统地衡量各种提示守卫模型的过度防御情况。NotInject包含339个带有提示注入攻击中常见触发词的良性样本,可进行细粒度评估。我们的研究结果显示,最先进的模型存在过度防御问题,准确率降至接近随机猜测水平(60%)。为缓解这一问题,我们提出了InjecGuard,这是一种新颖的提示守卫模型,其中包含一种新的训练策略,即"免费减少过度防御"(MOF),可以显著减少对触发词的偏见。InjecGuard在包括NotInject在内的各种基准测试中表现出色,超过现有最佳模型30.8%,为检测提示注入攻击提供了稳健且开源的解决方案。代码和数据集已发布在https://github.com/SaFoLab-WISC/InjecGuard。
更新时间: 2024-10-30 07:39:42
领域: cs.CL,cs.AI,cs.CR
Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot
Goal-oriented chatbots are essential for automating user tasks, such as booking flights or making restaurant reservations. A key component of these systems is Dialogue State Tracking (DST), which interprets user intent and maintains the dialogue state. However, existing DST methods often rely on fixed ontologies and manually compiled slot values, limiting their adaptability to open-domain dialogues. We propose a novel approach that leverages instruction tuning and advanced prompt strategies to enhance DST performance, without relying on any predefined ontologies. Our method enables Large Language Model (LLM) to infer dialogue states through carefully designed prompts and includes an anti-hallucination mechanism to ensure accurate tracking in diverse conversation contexts. Additionally, we employ a Variational Graph Auto-Encoder (VGAE) to model and predict subsequent user intent. Our approach achieved state-of-the-art with a JGA of 42.57% outperforming existing ontology-less DST models, and performed well in open-domain real-world conversations. This work presents a significant advancement in creating more adaptive and accurate goal-oriented chatbots.
Updated: 2024-10-30 07:36:23
标题: 目标导向聊天机器人中对话状态跟踪的本体之外
摘要: 目标导向的聊天机器人对于自动化用户任务(如预订机票或预订餐厅)至关重要。这些系统的关键组成部分是对话状态跟踪(DST),它解释用户意图并维护对话状态。然而,现有的DST方法通常依赖固定的本体(ontology)和手动编制的槽值,限制了它们对开放域对话的适应性。我们提出了一种新颖的方法,利用指令微调和先进的提示策略来增强DST的性能,而不依赖任何预定义的本体。我们的方法通过精心设计的提示使大型语言模型(LLM)能够推断对话状态,并包括一种抗幻觉机制,以确保在多样的对话环境中进行准确跟踪。此外,我们采用变分图自编码器(VGAE)来建模和预测后续的用户意图。我们的方法取得了42.57%的JGA,达到最先进水平,超越了现有的无本体DST模型,并在开放域真实对话中表现良好。这项工作在创建更具适应性和准确性的目标导向聊天机器人方面取得了重大进展。
更新时间: 2024-10-30 07:36:23
领域: cs.CL,cs.AI
The Oscars of AI Theater: A Survey on Role-Playing with Language Models
This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving character consistency, behavioral alignment, and overall attractiveness. We provide a comprehensive taxonomy of the critical components in designing these systems, including data, models and alignment, agent architecture and evaluation. This survey not only outlines the current methodologies and challenges, such as managing dynamic personal profiles and achieving high-level persona consistency but also suggests avenues for future research in improving the depth and realism of role-playing applications. The goal is to guide future research by offering a structured overview of current methodologies and identifying potential areas for improvement. Related resources and papers are available at https://github.com/nuochenpku/Awesome-Role-Play-Papers.
Updated: 2024-10-30 07:33:59
标题: AI剧院的奥斯卡奖:关于语言模型角色扮演的调查
摘要: 这项调查探讨了角色扮演与语言模型领域的迅速发展,重点关注它们从早期基于角色的模型发展到由大型语言模型(LLMs)推动的先进角色驱动模拟。最初由于模型能力有限而局限于简单的角色一致性,角色扮演任务现在已扩展到包括角色一致性、行为对齐和整体吸引力在内的复杂角色表现。我们提供了设计这些系统关键组成部分的全面分类,包括数据、模型和对齐、代理体系结构和评估。这项调查不仅概述了当前的方法和挑战,如管理动态个人资料和实现高水平角色一致性,还提出了未来研究在改进角色扮演应用的深度和现实感方面的途径。目标是通过提供当前方法的结构化概述和确定改进潜在领域来指导未来研究。相关资源和论文可在https://github.com/nuochenpku/Awesome-Role-Play-Papers找到。
更新时间: 2024-10-30 07:33:59
领域: cs.AI,cs.CL
Self-Driving Car Racing: Application of Deep Reinforcement Learning
This paper explores the application of deep reinforcement learning (RL) techniques in the domain of autonomous self-driving car racing. Motivated by the rise of AI-driven mobility and autonomous racing events, the project aims to develop an AI agent that efficiently drives a simulated car in the OpenAI Gymnasium CarRacing environment. We investigate various RL algorithms, including Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and novel adaptations that incorporate transfer learning and recurrent neural networks (RNNs) for enhanced performance. The project demonstrates that while DQN provides a strong baseline for policy learning, integrating ResNet and LSTM models significantly improves the agent's ability to capture complex spatial and temporal dynamics. PPO, particularly in continuous action spaces, shows promising results for fine control, although challenges such as policy collapse remain. We compare the performance of these approaches and outline future research directions focused on improving computational efficiency and addressing model stability. Our findings contribute to the ongoing development of AI systems in autonomous driving and related control tasks.
Updated: 2024-10-30 07:32:25
标题: 自动驾驶汽车竞赛:深度强化学习的应用
摘要: 本文探讨了深度强化学习(RL)技术在自主无人驾驶汽车赛车领域的应用。受人工智能驱动的移动性和自主赛车活动的兴起的启发,该项目旨在开发一个能够有效驾驶模拟汽车在OpenAI Gymnasium CarRacing环境中的AI代理。我们研究了各种RL算法,包括深度Q网络(DQN)、近端策略优化(PPO)以及结合迁移学习和循环神经网络(RNNs)以提高性能的新颖适应方法。该项目表明,虽然DQN为策略学习提供了强大的基准线,但集成ResNet和LSTM模型显着提高了代理捕捉复杂的空间和时间动态的能力。尤其是在连续动作空间中,PPO显示出对精细控制有着令人期待的结果,尽管存在策略崩溃等挑战。我们比较了这些方法的性能,并概述了未来研究方向,重点是提高计算效率和解决模型稳定性的问题。我们的发现有助于自主驾驶和相关控制任务中人工智能系统的持续发展。
更新时间: 2024-10-30 07:32:25
领域: cs.AI
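The environment used in these experiments is available through the Gymnasium package; a minimal interaction loop looks like the sketch below, with a random policy standing in for the learned DQN/PPO agent. The exact version string (e.g. CarRacing-v2 vs. v3) depends on the installed release, and the Box2D extra is required.

import gymnasium as gym

# pip install "gymnasium[box2d]"
env = gym.make("CarRacing-v2", continuous=False)   # discrete actions suit DQN;
                                                   # the default continuous space suits PPO
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()             # stand-in for the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()
print("random-policy episode return:", total_reward)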
Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control
Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them compatible with existing LLM alignment theories. During policy fine-tuning, we extend preference-based alignment methods like Direct Preference Optimization (DPO) to align diffusion behaviors with continuous Q-functions. Our evaluation on the D4RL benchmark shows that EDA exceeds all baseline methods in overall performance. Notably, EDA maintains about 95\% of performance and still outperforms several baselines given only 1\% of Q-labelled data during fine-tuning.
Updated: 2024-10-30 07:31:43
标题: 将扩散行为与Q函数对齐,以实现高效的连续控制
摘要: 借鉴最近语言模型对齐的进展,我们将离线强化学习形式化为一个两阶段优化问题:首先在无奖励行为数据集上预训练表达丰富的生成策略,然后微调这些策略以与任务特定的注释(如Q值)对齐。这种策略使我们能够利用丰富多样的行为数据来增强泛化能力,并使用最少的注释实现对下游任务的快速适应。特别地,我们引入了用于解决连续控制问题的高效扩散对齐(EDA)。EDA利用扩散模型进行行为建模。然而,与先前的方法不同,我们将扩散策略表示为相对于动作输入的标量神经网络的导数。这种表示至关重要,因为它能够直接计算扩散模型的密度,使其与现有的LLM对齐理论兼容。在策略微调过程中,我们将基于偏好的对齐方法(如直接偏好优化,DPO)扩展到将扩散行为与连续的Q函数对齐。我们在D4RL基准测试上的评估结果显示,EDA在整体性能方面超越了所有基线方法。值得注意的是,在微调过程中仅提供1%的Q标记数据,EDA仍然保持大约95%的性能,并且仍然优于几个基线方法。
更新时间: 2024-10-30 07:31:43
领域: cs.LG
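The representational trick the abstract highlights, writing the diffusion policy as the derivative of a scalar network with respect to the action, can be sketched as follows. Dimensions, architecture, and function names are placeholders; the full method additionally handles noise scheduling and the alignment losses.

import torch
import torch.nn as nn

class ScalarField(nn.Module):
    # f(s, a, t) -> scalar; its gradient w.r.t. the action plays the role of
    # the diffusion score, which is what makes the policy density tractable.
    def __init__(self, s_dim=17, a_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a, t):
        return self.net(torch.cat([s, a, t], dim=-1)).squeeze(-1)

def action_score(f, s, a, t):
    a = a.requires_grad_(True)
    out = f(s, a, t).sum()
    return torch.autograd.grad(out, a, create_graph=True)[0]  # d f / d a

f = ScalarField()
s = torch.randn(4, 17)
a = torch.randn(4, 6)
t = torch.rand(4, 1)
score = action_score(f, s, a, t)                   # shape (4, 6)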
A Game-Theoretic Approach for Security Control Selection
Selecting the combination of security controls that will most effectively protect a system's assets is a difficult task. If the wrong controls are selected, the system may be left vulnerable to cyber-attacks that can impact the confidentiality, integrity and availability of critical data and services. In practical settings, it is not possible to select and implement every control possible. Instead considerations, such as budget, effectiveness, and dependencies among various controls, must be considered to choose a combination of security controls that best achieve a set of system security objectives. In this paper, we propose a game-theoretic approach for selecting effective combinations of security controls based on expected attacker profiles and a set budget. The control selection problem is set up as a two-person zero-sum one-shot game. Valid control combinations for selection are generated using an algebraic formalism to account for dependencies among selected controls. We demonstrate the proposed approach on an illustrative financial system used in government departments under four different scenarios. The results illustrate how a security analyst can use the proposed approach to guide and support decision-making in the control selection activity when developing secure systems.
Updated: 2024-10-30 07:29:48
标题: 安全控制选择的博弈论方法
摘要: 选择能够最有效保护系统资产的安全控制组合是一项困难的任务。如果选择了错误的控制措施,系统可能容易受到网络攻击,从而影响关键数据和服务的机密性、完整性和可用性。在实际环境中,不可能选择并实施所有可能的控制措施。相反,必须考虑预算、有效性以及各种控制措施之间的依赖关系,以选择最能实现一组系统安全目标的安全控制组合。在本文中,我们提出了一种基于预期攻击者画像和给定预算来选择有效安全控制组合的博弈论方法。控制选择问题被建模为一个两人零和单次博弈。有效的控制组合通过一种代数形式化方法生成,以刻画所选控制措施之间的依赖关系。我们在政府部门使用的一个示例金融系统上,在四种不同情景下演示了所提出的方法。结果说明了安全分析师在开发安全系统时,如何使用所提出的方法来指导和支持控制选择活动中的决策。
更新时间: 2024-10-30 07:29:48
领域: cs.CR,cs.GT
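A two-person zero-sum one-shot game is solved exactly by a standard linear program. The sketch below computes a defender's optimal mixed strategy with SciPy; the toy payoff matrix is a stand-in for the paper's control-selection game, where rows would be valid control combinations and columns attacker profiles.

import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    # Maximize the game value v subject to x^T A >= v for every attacker
    # column, sum(x) = 1, x >= 0. linprog minimizes, so we minimize -v.
    m, n = payoff.shape
    c = np.zeros(m + 1)                            # variables: x_1..x_m, v
    c[-1] = -1.0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))]) # v - x^T A[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                    # mixed strategy, game value

strategy, value = solve_zero_sum(np.array([[3.0, 1.0], [1.0, 2.0]]))
print(strategy, value)                             # (1/3, 2/3), value 5/3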
Noise Contrastive Alignment of Language Models with Explicit Rewards
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where rewards are implicitly defined rather than explicitly given. In this paper, we introduce a general framework for LM alignment, leveraging Noise Contrastive Estimation (NCE) to bridge the gap in handling reward datasets explicitly annotated with scalar evaluations. Our framework comprises two parallel algorithms, NCA and InfoNCA, both enabling the direct extraction of an LM policy from reward data as well as preference data. Notably, we show that the DPO loss is a special case of our proposed InfoNCA objective under pairwise preference settings, thereby integrating and extending current alignment theories. By comparing NCA and InfoNCA, we demonstrate that the well-observed decreasing-likelihood trend of DPO/InfoNCA is caused by their focus on adjusting relative likelihood across different responses. In contrast, NCA optimizes the absolute likelihood for each response, thereby effectively preventing the chosen likelihood from decreasing. We evaluate our methods in both reward and preference settings with Mistral-8*7B and 7B models. Experiments suggest that InfoNCA/NCA surpasses various preference baselines when reward datasets are available. We also find NCA significantly outperforms DPO in complex reasoning tasks like math and coding.
Updated: 2024-10-30 07:29:40
标题: 语言模型的噪声对比对齐与明确奖励
摘要: 在微调语言模型(LMs)时,用户意图通常被形式化为需要最大化的评估奖励。现有的对齐方法,如直接偏好优化(DPO),主要针对成对偏好数据定制,其中奖励是隐式定义的,而非显式给出的。在本文中,我们引入了一个通用的LM对齐框架,利用噪声对比估计(NCE)来弥合在处理显式标注了标量评估的奖励数据集方面的差距。我们的框架包括两个并行算法,NCA和InfoNCA,两者都能从奖励数据和偏好数据中直接提取LM策略。值得注意的是,我们证明了DPO损失是我们提出的InfoNCA目标在成对偏好设置下的特殊情况,从而整合并扩展了当前的对齐理论。通过比较NCA和InfoNCA,我们表明DPO/InfoNCA中广泛观察到的似然下降趋势,源于它们专注于调整不同响应之间的相对似然。相比之下,NCA优化每个响应的绝对似然,从而有效防止被选响应的似然下降。我们在奖励和偏好设置下使用Mistral-8*7B和7B模型评估了我们的方法。实验表明,当奖励数据可用时,InfoNCA/NCA超越了各种偏好基线。我们还发现,NCA在数学和编码等复杂推理任务中明显优于DPO。
更新时间: 2024-10-30 07:29:40
领域: cs.LG,cs.CL
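A hedged sketch of an InfoNCA-style objective as the abstract describes it: the softmax of scalar reward labels over K responses provides soft targets for the softmax of the model's implicit rewards beta*(log pi - log pi_ref), and with K=2 and one-hot targets it reduces to a DPO-like pairwise loss. Tensor shapes and hyperparameters here are assumptions.

import torch
import torch.nn.functional as F

def info_nca_loss(logp_model, logp_ref, rewards, alpha=1.0, beta=0.1):
    # logp_model, logp_ref, rewards: (B, K) for K responses per prompt
    implicit = beta * (logp_model - logp_ref)      # model's implicit reward
    target = F.softmax(rewards / alpha, dim=-1)    # soft labels from scalar rewards
    return -(target * F.log_softmax(implicit, dim=-1)).sum(-1).mean()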
An Overview of Causal Inference using Kernel Embeddings
Kernel embeddings have emerged as a powerful tool for representing probability measures in a variety of statistical inference problems. By mapping probability measures into a reproducing kernel Hilbert space (RKHS), kernel embeddings enable flexible representations of complex relationships between variables. They serve as a mechanism for efficiently transferring the representation of a distribution downstream to other tasks, such as hypothesis testing or causal effect estimation. In the context of causal inference, the main challenges include identifying causal associations and estimating the average treatment effect from observational data, where confounding variables may obscure direct cause-and-effect relationships. Kernel embeddings provide a robust nonparametric framework for addressing these challenges. They allow for the representations of distributions of observational data and their seamless transformation into representations of interventional distributions to estimate relevant causal quantities. We overview recent research that leverages the expressiveness of kernel embeddings in tandem with causal inference.
Updated: 2024-10-30 07:23:34
标题: 使用核嵌入进行因果推断的概述
摘要: 核嵌入已经成为在各种统计推断问题中表示概率测度的强大工具。通过将概率测度映射到再生核希尔伯特空间(RKHS),核嵌入使复杂变量之间的关系得以灵活表示。它们作为一种机制,可以有效地将分布的表示传递到其他任务,如假设检验或因果效应估计。在因果推断的背景下,主要挑战包括识别因果关联和从观测数据中估计平均处理效应,其中混杂变量可能掩盖直接因果关系。核嵌入为解决这些挑战提供了强大的非参数框架。它们允许对观测数据的分布进行表示,并将其无缝转换为干预分布的表示,以估计相关因果数量。我们概述了最近利用核嵌入与因果推断相结合的研究。
更新时间: 2024-10-30 07:23:34
领域: stat.ML,cs.LG,stat.ME
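As background for the survey above: a kernel mean embedding represents a distribution $P$ by $\mu_P = \mathbb{E}_{X \sim P}[k(X, \cdot)]$ in the RKHS, and the distance between two embeddings (the maximum mean discrepancy, MMD) compares distributions. A self-contained NumPy sketch of the empirical estimate:

import numpy as np

def rbf(x, y, sigma=1.0):
    d = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Squared RKHS distance between the two empirical mean embeddings,
    # estimated with the biased V-statistic.
    return rbf(x, x, sigma).mean() + rbf(y, y, sigma).mean() - 2 * rbf(x, y, sigma).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))     # "observational" sample
y = rng.normal(0.5, 1.0, size=(200, 2))     # shifted "interventional" sample
print(mmd2(x, y))                           # larger = more distinguishable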
$C^2M^3$: Cycle-Consistent Multi-Model Merging
In this paper, we present a novel data-free method for merging neural networks in weight space. Differently from most existing works, our method optimizes for the permutations of network neurons globally across all layers. This allows us to enforce cycle consistency of the permutations when merging $N \geq 3$ models, allowing circular compositions of permutations to be computed without accumulating error along the path. We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. We finally show that, when coupled with activation renormalization, our approach yields the best results in the task.
Updated: 2024-10-30 07:18:46
标题: $C^2M^3$: 周期一致的多模型合并
摘要: 在本文中,我们提出了一种新颖的无数据方法,用于在权重空间中合并神经网络。与大多数现有工作不同,我们的方法在全局范围内优化网络神经元的排列。这使我们能够在合并$N \geq 3$个模型时强制执行排列的循环一致性,从而能够计算循环排列而不沿路径累积误差。我们在定性和定量上证明了这种约束的必要性,展示了在合并跨越不同架构和数据集的情况下模型集时的好处。最后,我们展示了当与激活重正化结合时,我们的方法在任务中产生最佳结果。
更新时间: 2024-10-30 07:18:46
领域: cs.LG
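Pairwise neuron matching, the building block that $C^2M^3$ makes cycle-consistent, reduces to a linear assignment problem. The sketch below recovers a hidden-unit permutation between two toy weight matrices; the paper's contribution, jointly optimizing all pairwise permutations so that compositions around any cycle agree (e.g. the A-to-C permutation equals the A-to-B one followed by B-to-C), is not shown.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_neurons(w_a, w_b):
    # Permutation of model B's hidden units that best aligns them with
    # model A's, by maximizing the dot product between weight rows.
    cost = -w_a @ w_b.T
    rows, cols = linear_sum_assignment(cost)
    return cols                                # cols[i] = unit of B matched to unit i of A

rng = np.random.default_rng(0)
w_a = rng.normal(size=(64, 128))
w_b = w_a[rng.permutation(64)]                 # B is A with shuffled hidden units
perm = match_neurons(w_a, w_b)
print(np.allclose(w_b[perm], w_a))             # True: the shuffle is recovered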
SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving
In recent years, motion planning for urban self-driving cars (SDV) has become a popular problem due to its complex interaction of road components. To tackle this, many methods have relied on large-scale, human-sampled data processed through Imitation learning (IL). Although effective, IL alone cannot adequately handle safety and reliability concerns. Combining IL with Reinforcement learning (RL) by adding KL divergence between RL and IL policy to the RL loss can alleviate IL's weakness but suffer from over-conservation caused by covariate shift of IL. To address this limitation, we introduce a method that combines IL with RL using an implicit entropy-KL control that offers a simple way to reduce the over-conservation characteristic. In particular, we validate different challenging simulated urban scenarios from the unseen dataset, indicating that although IL can perform well in imitation tasks, our proposed method significantly improves robustness (over 17\% reduction in failures) and generates human-like driving behavior.
Updated: 2024-10-30 07:18:00
标题: SoftCTRL: 软保守型变压器强化学习在自动驾驶中的KL控制
摘要: 近年来,城市自动驾驶汽车(SDV)的运动规划因道路要素之间复杂的交互而成为一个热门问题。为了解决这个问题,许多方法依赖于通过模仿学习(IL)处理的大规模人类采样数据。尽管有效,但单独的IL不能充分处理安全性和可靠性问题。通过在RL损失中添加RL与IL策略之间的KL散度来将IL与强化学习(RL)相结合,可以缓解IL的弱点,但会因IL的协变量偏移而导致过度保守。为了解决这一限制,我们引入了一种方法,通过隐式熵-KL控制将IL与RL结合起来,提供了一种减少过度保守特性的简单途径。特别地,我们在来自未见数据集的多个具有挑战性的模拟城市场景中进行了验证,结果表明,尽管IL在模仿任务中表现良好,但我们提出的方法显著提高了鲁棒性(失败率减少超过17%),并产生类似人类的驾驶行为。
更新时间: 2024-10-30 07:18:00
领域: cs.RO,cs.AI
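The baseline the abstract starts from, an RL objective plus an explicit KL penalty toward the imitation policy, fits in a few lines. This is a discrete-action sketch with assumed shapes; SoftCTRL's contribution is replacing this explicit penalty with implicit entropy-KL control, which is not shown.

import torch
import torch.nn.functional as F

def kl_regularized_actor_loss(logits_rl, logits_il, q_values, beta=0.1):
    # logits_rl, logits_il, q_values: (B, A) over discrete actions
    log_pi = F.log_softmax(logits_rl, dim=-1)
    pi = log_pi.exp()
    rl_term = -(pi * q_values).sum(-1).mean()                  # maximize expected Q
    kl = (pi * (log_pi - F.log_softmax(logits_il, dim=-1))).sum(-1).mean()
    return rl_term + beta * kl                                 # KL anchors RL to IL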
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution
Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and are intractable for large datasets. These methods require efficient approximations, and although amortizing the process by learning a network to directly predict the desired output is a promising solution, training such models with exact labels is often infeasible. We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective. Through theoretical analysis of the label noise and experiments with various models and datasets, we show that this approach tolerates high noise levels and significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
Updated: 2024-10-30 07:17:45
标题: 随机摊销:加速特征和数据归因的统一方法
摘要: 在可解释机器学习中,许多任务,如数据估值和特征归因,对每个数据点执行昂贵的计算,对于大型数据集来说是难以处理的。这些方法需要高效的近似,虽然通过学习一个网络直接预测所需输出来分摊处理过程是一个有希望的解决方案,但使用精确标签训练这样的模型通常是不可行的。因此,我们探索了使用嘈杂标签训练摊销模型的方法,并发现这是廉价且令人惊讶地有效的。通过对标签噪声的理论分析以及对各种模型和数据集的实验,我们表明这种方法能够容忍高噪声水平,并显著加速几种特征归因和数据估值方法,通常比现有方法快上一个数量级。
更新时间: 2024-10-30 07:17:45
领域: cs.LG
Understanding Aggregations of Proper Learners in Multiclass Classification
Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left(\frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \epsilon) + \ln(1 / \delta)}{\epsilon}\right)$ samples on such classes, exceeding the lower bound of Daniely et al. (2015) by a factor of $\ln(1 / \epsilon)$. For multiclass learning in full generality -- i.e., for classes of finite DS dimension but possibly infinite Graph dimension -- we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.
Updated: 2024-10-30 07:12:02
标题: 理解多类分类中适当学习者的聚合
摘要: 多类别可学习性已知存在一个适当性障碍:有一些可学习的类别无法被任何适当的学习器学习。二元分类面临的学习性没有这样的障碍,但对于最优学习却存在类似的障碍,通常只能通过不适当的学习器来实现。幸运的是,最近在二元分类方面取得的进展表明,这一要求可以通过适当的学习器的聚合来满足,其中一些方法非常简单。这引发了一个自然的问题:在多类别分类中,简单的适当学习器的聚合能够克服适当性障碍的程度如何? 我们对具有有限图维度$d_G$的类别给出了肯定的答案。换句话说,我们证明了Hanneke、Larsen和Aden-Ali等人的最优二元学习器(适当推广到多类别设置)实现了样本复杂度$O\left(\frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$。这对ERM的样本复杂度形成了明显的改进。我们补充了一个下界,证明了对于一定图维度$d_G$的类别,ERM学习器的大多数需要$\Omega \left( \frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$个样本。此外,我们表明在这样的类别上,单个ERM需要$\Omega \left(\frac{d_G \ln(1 / \epsilon) + \ln(1 / \delta)}{\epsilon}\right)$个样本,超过了Daniely等人(2015年)的下界$\ln(1 / \epsilon)$倍。对于全面的多类别学习——即对于具有有限DS维度但可能有无限图维度的类别——我们通过展示一个可学习的类别,证明了任何有限数量的适当学习器的聚合无法将其学习到恒定误差。
更新时间: 2024-10-30 07:12:02
领域: cs.LG,math.ST,stat.ML,stat.TH
Metric Based Few-Shot Graph Classification
Many modern deep-learning techniques do not work without enormous datasets. At the same time, several fields demand methods working in scarcity of data. This problem is even more complex when the samples have varying structures, as in the case of graphs. Graph representation learning techniques have recently proven successful in a variety of domains. Nevertheless, the employed architectures perform miserably when faced with data scarcity. On the other hand, few-shot learning allows employing modern deep learning models in scarce data regimes without waiving their effectiveness. In this work, we tackle the problem of few-shot graph classification, showing that equipping a simple distance metric learning baseline with a state-of-the-art graph embedder allows to obtain competitive results on the task. While the simplicity of the architecture is enough to outperform more complex ones, it also allows straightforward additions. To this end, we show that additional improvements may be obtained by encouraging a task-conditioned embedding space. Finally, we propose a MixUp-based online data augmentation technique acting in the latent space and show its effectiveness on the task.
Updated: 2024-10-30 07:05:29
标题: 基于度量的少样本图分类
摘要: 许多现代深度学习技术在没有庞大数据集的情况下无法正常工作。同时,一些领域需要在数据稀缺的情况下运作。当样本具有不同结构时,如图形的情况下,这个问题变得更加复杂。图表示学习技术最近在各种领域表现出成功。然而,当面临数据稀缺时,所采用的体系结构表现不佳。另一方面,少样本学习允许在稀缺数据环境中使用现代深度学习模型而不降低它们的有效性。在这项工作中,我们解决了少样本图分类的问题,展示了将简单的距离度量学习基线与最先进的图嵌入器结合使用,可以在任务上获得竞争性结果。尽管这种体系结构的简单性足以胜过更复杂的体系结构,但它也允许简单地进行添加。为此,我们展示了通过鼓励任务条件化嵌入空间可以获得额外的改进。最后,我们提出了一种基于MixUp的在线数据增强技术,作用于潜在空间,并展示了它在任务上的有效性。
更新时间: 2024-10-30 07:05:29
领域: cs.LG,cs.AI
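The paper's online augmentation operates on encoder outputs rather than raw graphs; below is a minimal sketch of MixUp in the latent space, using the standard Beta-mixing convention (shapes assumed for illustration).

import torch

def latent_mixup(z, y, alpha=0.2):
    # z: (B, D) graph embeddings from the encoder, y: (B, C) one-hot labels
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(z.size(0))
    z_mix = lam * z + (1 - lam) * z[idx]       # convex combination of embeddings
    y_mix = lam * y + (1 - lam) * y[idx]       # matching soft labels
    return z_mix, y_mix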
FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction
Click-through rate (CTR) prediction plays as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information included in the textual features. Recently, the emergence of Pretrained Language Models(PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs often face challenges in capturing field-wise collaborative signals and distinguishing features with subtle textual differences. In this paper, to leverage the benefits of both paradigms and meanwhile overcome their limitations, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models(FLIP) for CTR prediction. Unlike most methods that solely rely on global views through instance-level contrastive learning, we design a novel jointly masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (IDs and tokens) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose to jointly finetune the ID-based model and PLM by adaptively combining the output of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible with various ID-based models and PLMs. The code is at \url{https://github.com/justarter/FLIP}.
Updated: 2024-10-30 07:04:25
标题: FLIP:基于ID模型和预训练语言模型的细粒度对齐用于CTR预测
摘要: 点击率(CTR)预测在各种个性化在线服务中扮演着核心功能模块的角色。传统的基于ID的CTR预测模型将表格模态的独热编码ID特征作为输入,通过特征交互建模捕获协同信号。但是独热编码丢弃了文本特征中包含的语义信息。最近,预训练语言模型(PLMs)的出现引发了另一种范式,该范式将通过硬提示模板获取的文本模态句子作为输入,并采用PLMs来提取语义知识。然而,PLMs经常难以捕捉字段级协同信号,也难以区分文本差异细微的特征。在本文中,为了发挥这两种范式的优势并同时克服它们的局限性,我们提出在基于ID的模型与预训练语言模型之间进行细粒度特征级对齐(FLIP),用于CTR预测。与大多数仅依赖实例级对比学习获得全局视图的方法不同,我们设计了一个新颖的联合掩码表格/语言建模任务,以学习表格ID和单词标记之间的细粒度对齐。具体而言,一种模态(ID和标记)的被掩码数据必须借助另一种模态来恢复,从而通过双模态之间充分的互信息提取建立特征级的交互与对齐。此外,我们提出通过自适应地结合两个模型的输出来联合微调基于ID的模型和PLM,从而在下游CTR预测任务中实现卓越性能。对三个真实数据集的广泛实验表明,FLIP优于SOTA基线,并且与各种基于ID的模型和PLMs高度兼容。代码位于\url{https://github.com/justarter/FLIP}。
更新时间: 2024-10-30 07:04:25
领域: cs.IR,cs.AI
Designing AI Personalities: Enhancing Human-Agent Interaction Through Thoughtful Persona Design
In the rapidly evolving field of artificial intelligence (AI) agents, designing the agent's characteristics is crucial for shaping user experience. This workshop aims to establish a research community focused on AI agent persona design for various contexts, such as in-car assistants, educational tools, and smart home environments. We will explore critical aspects of persona design, such as voice, embodiment, and demographics, and their impact on user satisfaction and engagement. Through discussions and hands-on activities, we aim to propose practices and standards that enhance the ecological validity of agent personas. Topics include the design of conversational interfaces, the influence of agent personas on user experience, and approaches for creating contextually appropriate AI agents. This workshop will provide a platform for building a community dedicated to developing AI agent personas that better fit diverse, everyday interactions.
Updated: 2024-10-30 06:58:59
标题: 设计AI人格:通过周到的人格设计增强人与智能体的交互
摘要: 在快速发展的人工智能(AI)代理领域,设计代理的特征对塑造用户体验至关重要。本研讨会旨在建立一个专注于各种情境下AI代理人物设计的研究社区,如汽车助手、教育工具和智能家居环境。我们将探讨人物设计的关键方面,如语音、体现和人口统计数据,以及它们对用户满意度和参与度的影响。通过讨论和实践活动,我们的目标是提出增强代理人物生态效度的实践和标准。主题包括对话界面的设计,代理人物对用户体验的影响,以及创建情境适宜的AI代理的方法。本研讨会将为建立一个致力于开发更适合各种日常互动的AI代理人物的社区提供平台。
更新时间: 2024-10-30 06:58:59
领域: cs.HC,cs.AI
TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face two critical challenges: (1) They hinge on long training using a huge volume of real data. (2) They routinely lead to quality degradation for generation, especially in text-image alignment. This paper proposes a novel training-efficient Latent Consistency Model (TLCM) to overcome these challenges. Our method first accelerates LDMs via data-free multistep latent consistency distillation (MLCD), and then data-free latent consistency distillation is proposed to efficiently guarantee the inter-segment consistency in MLCD. Furthermore, we introduce bags of techniques, e.g., distribution matching, adversarial learning, and preference learning, to enhance TLCM's performance at few-step inference without any real data. TLCM demonstrates a high level of flexibility by enabling adjustment of sampling steps within the range of 2 to 8 while still producing competitive outputs compared to full-step approaches. Notably, TLCM enjoys the data-free merit by employing synthetic data from the teacher for distillation. With just 70 training hours on an A100 GPU, a 3-step TLCM distilled from SDXL achieves an impressive CLIP Score of 33.68 and an Aesthetic Score of 5.97 on the MSCOCO-2017 5K benchmark, surpassing various accelerated models and even outperforming the teacher model in human preference metrics. We also demonstrate the versatility of TLCMs in applications including image style transfer, controllable generation, and Chinese-to-image generation.
Updated: 2024-10-30 06:49:52
标题: TLCM:训练高效的图像生成潜在一致性模型,步骤为2-8步
摘要: 将潜在扩散模型(LDMs)蒸馏为能够快速采样的模型正吸引着越来越多的研究兴趣。然而,现有大多数方法面临两个关键挑战:(1)它们依赖于使用大量真实数据的长时间训练。(2)它们通常会导致生成质量下降,特别是在文本-图像对齐方面。本文提出了一种新颖的训练高效的潜在一致性模型(TLCM)来克服这些挑战。我们的方法首先通过无数据多步潜在一致性蒸馏(MLCD)加速LDMs,然后提出无数据潜在一致性蒸馏,以有效保证MLCD中的分段间一致性。此外,我们引入了一系列技术,例如分布匹配、对抗学习和偏好学习,以在不使用任何真实数据的情况下增强TLCM在少步推断时的性能。TLCM可在2到8步范围内调整采样步数,同时仍产生与全步方法相竞争的输出,展现出高度的灵活性。值得注意的是,TLCM利用教师模型生成的合成数据进行蒸馏,享有无需真实数据的优点。在A100 GPU上仅训练70小时,从SDXL蒸馏出的3步TLCM在MSCOCO-2017 5K基准测试中取得了33.68的CLIP得分和5.97的审美得分,令人印象深刻,超越了各种加速模型,甚至在人类偏好指标上优于教师模型。我们还展示了TLCM在图像风格迁移、可控生成和中文到图像生成等应用中的多功能性。
更新时间: 2024-10-30 06:49:52
领域: cs.CV,cs.AI
MIXAD: Memory-Induced Explainable Time Series Anomaly Detection
For modern industrial applications, accurately detecting and diagnosing anomalies in multivariate time series data is essential. Despite such need, most state-of-the-art methods often prioritize detection performance over model interpretability. Addressing this gap, we introduce MIXAD (Memory-Induced Explainable Time Series Anomaly Detection), a model designed for interpretable anomaly detection. MIXAD leverages a memory network alongside spatiotemporal processing units to understand the intricate dynamics and topological structures inherent in sensor relationships. We also introduce a novel anomaly scoring method that detects significant shifts in memory activation patterns during anomalies. Our approach not only ensures decent detection performance but also outperforms state-of-the-art baselines by 34.30% and 34.51% in interpretability metrics.
Updated: 2024-10-30 06:46:23
标题: MIXAD:记忆引起的可解释时间序列异常检测
摘要: 对于现代工业应用,准确地检测和诊断多变量时间序列数据中的异常是至关重要的。尽管有这样的需求,大多数最先进的方法通常更注重检测性能而非模型可解释性。针对这一差距,我们介绍了MIXAD(Memory-Induced Explainable Time Series Anomaly Detection),这是一种专为可解释异常检测而设计的模型。MIXAD利用记忆网络以及时空处理单元来理解传感器关系中固有的复杂动态和拓扑结构。我们还引入了一种新颖的异常评分方法,用于检测异常期间记忆激活模式的显著变化。我们的方法不仅确保了良好的检测性能,而且在可解释性指标方面比最先进的基线表现提高了34.30%和34.51%。
更新时间: 2024-10-30 06:46:23
领域: cs.LG
R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
Retrieval augmented generation (RAG) has been applied in many scenarios to augment large language models (LLMs) with external documents provided by retrievers. However, a semantic gap exists between LLMs and retrievers due to differences in their training objectives and architectures. This misalignment forces LLMs to passively accept the documents provided by the retrievers, leading to incomprehension in the generation process, where the LLMs are burdened with the task of distinguishing these documents using their inherent knowledge. This paper proposes R$^2$AG, a novel enhanced RAG framework to fill this gap by incorporating Retrieval information into Retrieval Augmented Generation. Specifically, R$^2$AG utilizes the nuanced features from the retrievers and employs a R$^2$-Former to capture retrieval information. Then, a retrieval-aware prompting strategy is designed to integrate retrieval information into LLMs' generation. Notably, R$^2$AG suits low-source scenarios where LLMs and retrievers are frozen. Extensive experiments across five datasets validate the effectiveness, robustness, and efficiency of R$^2$AG. Our analysis reveals that retrieval information serves as an anchor to aid LLMs in the generation process, thereby filling the semantic gap.
Updated: 2024-10-30 06:41:45
标题: R^2AG:将检索信息融入检索增强生成
摘要: 检索增强生成(RAG)已应用于许多场景,以检索器提供的外部文档来增强大型语言模型(LLMs)。然而,由于LLMs和检索器在训练目标和架构上的差异,二者之间存在语义差距。这种不匹配迫使LLMs被动地接受检索器提供的文档,导致生成过程中的理解困难,LLMs不得不依靠其固有知识来区分这些文档。本文提出了R$^2$AG,这是一个新颖的增强型RAG框架,通过将检索信息整合到检索增强生成中来填补这一差距。具体来说,R$^2$AG利用检索器的细微特征,并采用R$^2$-Former来捕获检索信息。然后,设计了一种检索感知的提示策略,将检索信息整合到LLMs的生成中。值得注意的是,R$^2$AG适用于LLMs和检索器均被冻结的低资源场景。对五个数据集进行的广泛实验验证了R$^2$AG的有效性、鲁棒性和高效性。我们的分析表明,检索信息作为锚点,在生成过程中辅助LLMs,从而填补了语义差距。
更新时间: 2024-10-30 06:41:45
领域: cs.CL,cs.AI,cs.IR
st-DTPM: Spatial-Temporal Guided Diffusion Transformer Probabilistic Model for Delayed Scan PET Image Prediction
PET imaging is widely employed for observing biological metabolic activities within the human body. However, numerous benign conditions can cause increased uptake of radiopharmaceuticals, confounding differentiation from malignant tumors. Several studies have indicated that dual-time PET imaging holds promise in distinguishing between malignant and benign tumor processes. Nevertheless, the hour-long distribution period of radiopharmaceuticals post-injection complicates the determination of optimal timing for the second scan, presenting challenges in both practical applications and research. Notably, we have identified that delay time PET imaging can be framed as an image-to-image conversion problem. Motivated by this insight, we propose a novel spatial-temporal guided diffusion transformer probabilistic model (st-DTPM) to solve dual-time PET imaging prediction problem. Specifically, this architecture leverages the U-net framework that integrates patch-wise features of CNN and pixel-wise relevance of Transformer to obtain local and global information. And then employs a conditional DDPM model for image synthesis. Furthermore, on spatial condition, we concatenate early scan PET images and noisy PET images on every denoising step to guide the spatial distribution of denoising sampling. On temporal condition, we convert diffusion time steps and delay time to a universal time vector, then embed it to each layer of model architecture to further improve the accuracy of predictions. Experimental results demonstrated the superiority of our method over alternative approaches in preserving image quality and structural information, thereby affirming its efficacy in predictive task.
Updated: 2024-10-30 06:37:55
标题: st-DTPM:空间-时间引导扩散变换器概率模型用于延迟扫描PET图像预测
摘要: PET成像广泛应用于观察人体内的生物代谢活动。然而,许多良性疾病可以引起放射性药物的增加吸收,使其与恶性肿瘤难以区分。一些研究表明,双时PET成像有望区分恶性和良性肿瘤过程。然而,放射性药物注射后长达一小时的分布期间使确定第二次扫描的最佳时机变得复杂,从而在实际应用和研究中带来挑战。值得注意的是,我们发现延迟时间PET成像可以被视为图像到图像转换问题。受此启发,我们提出了一种新颖的空间-时间引导扩散变换器概率模型(st-DTPM)来解决双时PET成像预测问题。具体来说,该架构利用U-net框架,将CNN的基于补丁的特征和Transformer的像素级相关性相结合,以获取局部和全局信息。然后采用条件DDPM模型进行图像合成。此外,在空间条件下,我们在每个去噪步骤中连接早期扫描PET图像和嘈杂的PET图像,以指导去噪采样的空间分布。在时间条件下,我们将扩散时间步和延迟时间转换为通用时间向量,然后将其嵌入到模型架构的每一层中,以进一步提高预测的准确性。实验结果表明,我们的方法在保留图像质量和结构信息方面优于替代方法,从而证实了其在预测任务中的有效性。
更新时间: 2024-10-30 06:37:55
领域: eess.IV,cs.AI,cs.CV
BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning
Continual Learning (CL) is the process of learning ceaselessly a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. Compared to them, however, CL methods for graph data (graph CL) are relatively underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) high complexity in implementation and evaluation due to the dependency. In this paper, regarding (a) we define four standard incremental settings (task-, class-, domain-, and time-incremental) for node-, link-, and graph-level problems, extending the previously explored scope. Regarding (b), we provide 35 benchmark scenarios based on 24 real-world graphs. Regarding (c), we develop BeGin, an easy and fool-proof framework for graph CL. BeGin is easily extended since it is modularized with reusable modules for data processing, algorithm design, and evaluation. Especially, the evaluation module is completely separated from user code to eliminate potential mistakes. Regarding benchmark results, we cover 3x more combinations of incremental settings and levels of problems than the latest benchmark. All assets for the benchmark framework are publicly available at https://github.com/ShinhwanKang/BeGin.
Updated: 2024-10-30 06:37:32
标题: BeGin:广泛的基准场景和用于图连续学习的易于使用的框架
摘要: 持续学习(CL)是不断学习一系列任务的过程。大多数现有的CL方法处理独立数据(例如图像和文本),针对这些数据有许多基准框架和在标准实验设置下的结果。然而,与它们相比,用于图数据的CL方法(图CL)相对未被充分探索,因为(a)缺乏标准实验设置,特别是关于如何处理实例之间的依赖性,(b)缺乏基准数据集和场景,以及(c)由于依赖性而在实施和评估中具有较高的复杂性。在本文中,关于(a),我们为节点-、链接-和图级问题定义了四种标准增量设置(任务增量、类增量、领域增量和时间增量),扩展了之前探索的范围。关于(b),我们提供了基于24个真实图的35个基准场景。关于(c),我们开发了BeGin,一个简单且易于使用的图CL框架。BeGin易于扩展,因为它是通过可重用的数据处理、算法设计和评估模块进行模块化的。特别是,评估模块完全与用户代码分离,以消除潜在错误。关于基准结果,我们覆盖了比最新基准更多的增量设置组合和问题级别。基准框架的所有资产都可以在https://github.com/ShinhwanKang/BeGin 上公开获取。
更新时间: 2024-10-30 06:37:32
领域: cs.LG,cs.AI
Transductive Learning Is Compact
We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to any proper metric loss function (e.g., any norm on $\mathbb{R}^d$) and any continuous loss on a compact space (e.g., cross-entropy, squared loss). For realizable learning with improper metric losses, we show that exact compactness of sample complexity can fail, and provide matching upper and lower bounds of a factor of 2 on the extent to which such sample complexities can differ. We conjecture that larger gaps are possible for the agnostic case. Furthermore, invoking the equivalence between sample complexities in the PAC and transductive models (up to lower order factors, in the realizable case) permits us to directly port our results to the PAC model, revealing an almost-exact form of compactness holding broadly in PAC learning.
Updated: 2024-10-30 06:33:33
标题: 转导学习是紧致的
摘要: 我们证明了一个在具有一般损失函数类的监督学习中广泛成立的紧致性结果:当且仅当假设类$H$的所有有限投影均可以样本复杂度$m$学习时,$H$本身才可以转导样本复杂度$m$学习。我们证明,对于任何适当的度量损失函数(例如$\mathbb{R}^d$上的任何范数)和紧致空间上的任何连续损失(例如交叉熵、平方损失),这种精确形式的紧致性对于可实现与不可知学习均成立。对于使用不适当度量损失的可实现学习,我们证明样本复杂度的精确紧致性可能失败,并给出了因子为2的匹配上下界,以刻画此类样本复杂度可能相差的程度。我们推测在不可知情形下可能存在更大的差距。此外,利用PAC模型与转导模型之间样本复杂度的等价性(在可实现情形下,至多相差低阶因子),我们可以将结果直接移植到PAC模型,揭示出一种在PAC学习中广泛成立的几乎精确的紧致性形式。
更新时间: 2024-10-30 06:33:33
领域: cs.LG,cs.CC,cs.DS,cs.LO,stat.ML
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality data from vast datasets to curate small yet effective datasets has emerged as a critical challenge. In this paper, we introduce SHED, an automated dataset refinement framework based on Shapley value for instruction fine-tuning. SHED eliminates the need for human intervention or the use of commercial LLMs. Moreover, the datasets curated through SHED exhibit transferability, indicating they can be reused across different LLMs with consistently high performance. We conduct extensive experiments to evaluate the datasets curated by SHED. The results demonstrate SHED's superiority over state-of-the-art methods across various tasks and LLMs; notably, datasets comprising only 10% of the original data selected by SHED achieve performance comparable to or surpassing that of the full datasets.
Updated: 2024-10-30 06:33:02
标题: SHED:用于指令微调的基于Shapley值的自动数据集精化
摘要: 预先训练的大型语言模型(LLMs)可以通过微调适应许多下游任务,并根据人类偏好进行定制。最近的研究发现,LLMs 可以仅使用少量高质量数据就实现理想的性能,这表明这些庞大数据集中的大量数据是多余的,甚至是有害的。从庞大数据集中识别高质量数据,筛选出小而有效的数据集已经成为一个关键挑战。在本文中,我们介绍了基于 Shapley 值的自动数据集精化框架 SHED 用于指导微调。SHED 消除了人工干预或使用商业LLMs的需求。此外,通过 SHED 筛选的数据集表现出可传递性,表明它们可以在不同的LLMs之间被重复使用,并且表现始终高效。我们进行了大量实验来评估 SHED 筛选的数据集。结果表明,SHED 在各种任务和LLMs上优于最先进的方法;值得注意的是,SHED 选定的仅占原始数据10%的数据集表现可以媲美或优于完整数据集的性能。
更新时间: 2024-10-30 06:33:02
领域: cs.CL,cs.LG
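For intuition, the Shapley value of a data point is its average marginal contribution to a utility function over random orderings of the dataset; a brute-force Monte Carlo estimator is sketched below. In SHED's setting the utility would be validation performance after fine-tuning on the subset, which is far too expensive to query like this, so the paper works with proxies and clustering; the sketch shows only the underlying quantity, not the paper's algorithm.

import numpy as np

def mc_shapley(points, utility, n_perms=200, rng=None):
    # Average marginal gain in utility when each point joins a random prefix.
    rng = rng or np.random.default_rng(0)
    n = len(points)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility([])
        subset = []
        for i in perm:
            subset.append(points[i])
            cur = utility(subset)
            values[i] += cur - prev
            prev = cur
    return values / n_perms

# toy additive utility: each point's Shapley value equals the point itself
vals = mc_shapley([1.0, 2.0, 3.0], utility=lambda s: sum(s))
print(vals)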
Extensional Properties of Recurrent Neural Networks
A property of a recurrent neural network (RNN) is called \emph{extensional} if, loosely speaking, it is a property of the function computed by the RNN rather than a property of the RNN algorithm. Many properties of interest in RNNs are extensional, for example, robustness against small changes of input or good clustering of inputs. Given an RNN, it is natural to ask whether it has such a property. We give a negative answer to the general question about testing extensional properties of RNNs. Namely, we prove a version of Rice's theorem for RNNs: any nontrivial extensional property of RNNs is undecidable.
Updated: 2024-10-30 06:29:02
标题: 循环神经网络的外延性质
摘要: 循环神经网络(RNN)的一个特性被称为\emph{外延性},简而言之,它是由RNN计算的函数的特性,而不是RNN算法的特性。在RNN中感兴趣的许多特性都是外延性的,例如对输入的微小变化具有鲁棒性或良好的输入聚类。给定一个RNN,自然地会问它是否具有这样的特性。我们对RNN的外延性特性进行测试的一般问题给出了否定答案。换句话说,我们证明了RNN的一个版本的Rice定理:RNN的任何非平凡外延性特性都是不可判定的。
更新时间: 2024-10-30 06:29:02
领域: cs.NE,cs.LG
Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots
Stochastic differential equations (SDEs) are a fundamental tool for modelling dynamic processes, including gene regulatory networks (GRNs), contaminant transport, financial markets, and image generation. However, learning the underlying SDE from observational data is a challenging task, especially when individual trajectories are not observable. Motivated by burgeoning research in single-cell datasets, we present the first comprehensive approach for jointly estimating the drift and diffusion of an SDE from its temporal marginals. Assuming linear drift and additive diffusion, we prove that these parameters are identifiable from marginals if and only if the initial distribution is not invariant to a class of generalized rotations, a condition that is satisfied by most distributions. We further prove that the causal graph of any SDE with additive diffusion can be recovered from the SDE parameters. To complement this theory, we adapt entropy-regularized optimal transport to handle anisotropic diffusion, and introduce APPEX (Alternating Projection Parameter Estimation from $X_0$), an iterative algorithm designed to estimate the drift, diffusion, and causal graph of an additive noise SDE, solely from temporal marginals. We show that each of these steps are asymptotically optimal with respect to the Kullback-Leibler divergence, and demonstrate APPEX's effectiveness on simulated data from linear additive noise SDEs.
Updated: 2024-10-30 06:28:21
标题: 从时间快照中识别漂移、扩散和因果结构
摘要: 随机微分方程(SDEs)是建模动态过程的基本工具,应用包括基因调控网络(GRNs)、污染物传输、金融市场和图像生成。然而,从观测数据中学习潜在的SDE是一项具有挑战性的任务,特别是当个体轨迹不可观测时。受单细胞数据集研究兴起的推动,我们提出了第一个从SDE的时间边际分布联合估计其漂移和扩散的完整方法。在线性漂移和加性扩散的假设下,我们证明:当且仅当初始分布对某一类广义旋转不具有不变性时,这些参数才可从边际分布中识别,而大多数分布都满足这一条件。我们进一步证明,任何具有加性扩散的SDE的因果图都可以从SDE参数中恢复。为了补充这一理论,我们将熵正则化的最优传输方法调整为可处理各向异性扩散,并引入了APPEX(Alternating Projection Parameter Estimation from $X_0$),这是一种仅从时间边际分布估计加性噪声SDE的漂移、扩散和因果图的迭代算法。我们证明其中每一步在Kullback-Leibler散度意义下都是渐近最优的,并在来自线性加性噪声SDE的模拟数据上展示了APPEX的有效性。
更新时间: 2024-10-30 06:28:21
领域: stat.ML,cs.LG,math.ST,stat.TH
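Concretely, the "linear drift, additive diffusion" class studied above is conventionally written as (notation assumed here, not taken from the paper):

$\mathrm{d}X_t = A X_t\,\mathrm{d}t + G\,\mathrm{d}W_t, \qquad D = GG^\top,$

where the causal graph corresponds to the sparsity pattern of the drift matrix $A$; APPEX estimates $A$ and $D$ from marginal snapshots of $X_t$ alone, without access to individual trajectories.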
Offline Behavior Distillation
Massive reinforcement learning (RL) data are typically collected to train policies offline without the need for interactions, but the large data volume can cause training inefficiencies. To tackle this issue, we formulate offline behavior distillation (OBD), which synthesizes limited expert behavioral data from sub-optimal RL data, enabling rapid policy learning. We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy. Due to intractable bi-level optimization, the OBD objective is difficult to minimize to small values, which deteriorates PBC by its distillation performance guarantee with quadratic discount complexity $\mathcal{O}(1/(1-\gamma)^2)$. We theoretically establish the equivalence between the policy performance and action-value weighted decision difference, and introduce action-value weighted PBC (Av-PBC) as a more effective OBD objective. By optimizing the weighted decision difference, Av-PBC achieves a superior distillation guarantee with linear discount complexity $\mathcal{O}(1/(1-\gamma))$. Extensive experiments on multiple D4RL datasets reveal that Av-PBC offers significant improvements in OBD performance, fast distillation convergence speed, and robust cross-architecture/optimizer generalization.
Updated: 2024-10-30 06:28:09
标题: 线下行为蒸馏
摘要: 大规模强化学习(RL)数据通常用于在无需交互的情况下离线训练策略,但庞大的数据量可能导致训练效率低下。为了解决这个问题,我们提出了离线行为蒸馏(OBD),它从次优RL数据中合成有限的专家行为数据,从而实现快速策略学习。我们提出了两个朴素的OBD目标,DBC和PBC,通过在蒸馏数据上训练的策略与离线数据或接近专家策略之间的决策差异来衡量蒸馏性能。由于双层优化难以求解,OBD目标难以最小化到较小的值,这使得PBC的蒸馏性能保证带有二次折扣复杂度$\mathcal{O}(1/(1-\gamma)^2)$,从而有所恶化。我们在理论上建立了策略性能与动作值加权决策差异之间的等价性,并引入了动作值加权PBC(Av-PBC)作为更有效的OBD目标。通过优化加权决策差异,Av-PBC实现了具有线性折扣复杂度$\mathcal{O}(1/(1-\gamma))$的更优蒸馏保证。对多个D4RL数据集进行的广泛实验表明,Av-PBC在OBD性能、蒸馏收敛速度以及跨架构/优化器泛化的鲁棒性方面均有显著提升。
更新时间: 2024-10-30 06:28:09
领域: cs.LG,cs.AI
Empowering Persian LLMs for Instruction Following: A Novel Dataset and Training Approach
Instruction-tuned large language models have demonstrated remarkable capabilities in following human instructions across various domains. However, their proficiency remains notably deficient in many low-resource languages. To address this challenge, we begin by introducing FarsInstruct a comprehensive instruction dataset designed to enhance the instruction following ability of large language models specifically for the Persian language a significant yet underrepresented language globally. FarsInstruct encompasses a wide range of task types and datasets, each containing a mix of straightforward to complex manual written instructions, as well as translations from the Public Pool of Prompts, ensuring a rich linguistic and cultural representation. Furthermore, we introduce Co-CoLA, a framework designed to enhance the multi-task adaptability of LoRA-tuned models. Through extensive experimental analyses, our study showcases the effectiveness of the FarsInstruct dataset coupled with training by the Co-CoLA framework, in improving the performance of large language models within the Persian context. As of the current writing, FarsInstruct comprises 197 templates across 21 distinct datasets, and we intend to update it consistently, thus augmenting its applicability.
Updated: 2024-10-30 06:26:22
标题: 赋能波斯语大语言模型以遵循指令:一个新颖的数据集与训练方法
摘要: 经指令微调的大型语言模型已展示出在各个领域遵循人类指令的显著能力。然而,在许多低资源语言中,它们的熟练程度仍明显不足。为了应对这一挑战,我们首先介绍了FarsInstruct,这是一个旨在增强大型语言模型针对波斯语的指令遵循能力的全面指令数据集;波斯语是一种重要但在全球范围内代表性不足的语言。FarsInstruct涵盖广泛的任务类型和数据集,每个数据集都包含从简单到复杂的人工撰写指令,以及来自Public Pool of Prompts的翻译,确保了丰富的语言和文化表现。此外,我们介绍了Co-CoLA,这是一个旨在增强LoRA微调模型多任务适应能力的框架。通过广泛的实验分析,我们的研究展示了FarsInstruct数据集与Co-CoLA框架训练相结合,在改善大型语言模型于波斯语语境下表现方面的有效性。截至撰写本文时,FarsInstruct包含21个不同数据集中的197个模板,我们计划持续更新它,从而增强其适用性。
更新时间: 2024-10-30 06:26:22
领域: cs.CL,cs.AI
Impact of Code Transformation on Detection of Smart Contract Vulnerabilities
While smart contracts are foundational elements of blockchain applications, their inherent susceptibility to security vulnerabilities poses a significant challenge. Existing training datasets employed for vulnerability detection tools may be limited, potentially compromising their efficacy. This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets and evaluates current detection methods. The approach centers around semantic-preserving code transformation, a technique that modifies the source code structure without altering its semantic meaning. The transformed code snippets are inserted into all potential locations within benign smart contract code, creating new vulnerable contract versions. This method aims to generate a wider variety of vulnerable codes, including those that can bypass detection by current analysis tools. The paper experiments evaluate the method's effectiveness using tools like Slither, Mythril, and CrossFuzz, focusing on metrics like the number of generated vulnerable samples and the false negative rate in detecting these vulnerabilities. The improved results show that many newly created vulnerabilities can bypass tools and the false reporting rate goes up to 100% and increases dataset size minimum by 2.5X.
Updated: 2024-10-30 06:23:13
标题: 代码转换对智能合约漏洞检测的影响
摘要: 智能合约是区块链应用的基础元素,但其固有的易受安全漏洞影响的特性构成了一个重大挑战。用于漏洞检测工具的现有训练数据集可能存在限制,可能会损害它们的功效。本文提出了一种改进智能合约漏洞数据集数量和质量的方法,并评估了当前的检测方法。该方法围绕语义保留代码转换展开,这是一种修改源代码结构而不改变其语义含义的技术。转换后的代码片段被插入到良性智能合约代码的所有潜在位置中,创建新的有漏洞的合约版本。该方法旨在生成更多种类的有漏洞代码,包括那些可以绕过当前分析工具检测的代码。本文中的实验评估了该方法的有效性,使用了像Slither、Mythril和CrossFuzz这样的工具,关注生成的有漏洞样本数量和检测这些漏洞的假阴性率等指标。改进的结果显示,许多新创建的漏洞可以绕过工具,虚报率达到100%,并将数据集大小最少增加2.5倍。
更新时间: 2024-10-30 06:23:13
领域: cs.CR,cs.SE
Balancing Cost and Effectiveness of Synthetic Data Generation Strategies for LLMs
As large language models (LLMs) are applied to more use cases, creating high quality, task-specific datasets for fine-tuning becomes a bottleneck for model improvement. Using high quality human data has been the most common approach to unlock model performance, but is prohibitively expensive in many scenarios. Several alternative methods have also emerged, such as generating synthetic or hybrid data, but the effectiveness of these approaches remain unclear, especially in resource-constrained scenarios and tasks that are not easily verified. To investigate this, we group various synthetic data generation strategies into three representative categories -- Answer Augmentation, Question Rephrase and New Question -- and study the performance of student LLMs trained under various constraints, namely seed instruction set size and query budget. We demonstrate that these strategies are not equally effective across settings. Notably, the optimal data generation strategy depends strongly on the ratio between the available teacher query budget and the size of the seed instruction set. When this ratio is low, generating new answers to existing questions proves most effective, but as this ratio increases, generating new questions becomes optimal. Across all tasks, we find that choice of augmentation method and other design choices matter substantially more in low to mid data regimes than in high data regimes. We provide a practical framework for selecting the appropriate augmentation method across settings, taking into account additional factors such as the scalability of each method, the importance of verifying synthetic data, and the use of different LLMs for synthetic data generation.
Updated: 2024-10-30 06:12:49
标题: 平衡LLMs的合成数据生成策略的成本和效果
摘要: 随着大型语言模型(LLMs)被应用于更多用例,为了进行微调而创建高质量、特定任务的数据集成为模型改进的瓶颈。使用高质量的人工数据是解锁模型性能的最常见方法,但在许多情况下成本过高。也出现了几种替代方法,如生成合成数据或混合数据,但这些方法的有效性仍不清楚,特别是在资源受限的情况下以及不容易验证的任务中。为了研究这一问题,我们将各种合成数据生成策略分为三个代表性类别--回答增强、问题重述和新问题--并研究在不同约束条件下训练的学生LLMs的性能,即种子指令集大小和查询预算。我们证明这些策略在不同设置下并不同样有效。值得注意的是,最佳的数据生成策略强烈依赖于可用的教师查询预算与种子指令集大小之间的比例。当这个比例较低时,生成现有问题的新答案被证明是最有效的,但随着这个比例的增加,生成新问题变得最佳。在所有任务中,我们发现在低至中等数据范围内,增强方法的选择和其他设计选择比在高数据范围内更为重要。我们提供了一个实用框架,用于在不同设置中选择适当的增强方法,考虑到其他因素,如每种方法的可扩展性、验证合成数据的重要性以及使用不同的LLMs进行合成数据生成。
更新时间: 2024-10-30 06:12:49
领域: cs.CL,cs.LG
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
With the rapid growth in model size, fine-tuning the large pre-trained language model has become increasingly difficult due to its extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, as they are crucial for gradient calculation. Notably, neural networks are usually trained using stochastic gradient descent. We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones. By replacing the linear operation with our approximated one in transformers, we can achieve up to 2.7$\times$ peak memory reduction with almost no accuracy drop and enables up to $6.4\times$ larger batch size. Under the same hardware, WTA-CRS enables better down-streaming task performance by applying larger models and/or faster training speed with larger batch sizes.
Updated: 2024-10-30 06:12:05
标题: 胜者通吃的列行抽样用于语言模型的内存高效适应
摘要: 随着模型规模的迅速增长,由于大型预训练语言模型的广泛内存使用,微调已变得越来越困难。先前的研究通常侧重于减少网络中可训练参数的数量。虽然模型参数确实会影响内存使用,但在训练过程中主要的内存瓶颈是存储特征映射,也称为激活,因为它们对梯度计算至关重要。值得注意的是,神经网络通常是使用随机梯度下降进行训练的。我们认为,在随机优化中,只要梯度估计器是无偏的并且具有合理的方差,模型就可以处理嘈杂的梯度。基于这一动机,我们提出了一种新的无偏估计器家族,称为WTA-CRS,用于矩阵乘法,具有降低方差,仅需要存储用于计算梯度的子采样激活。我们的工作提供了在调整变压器时,我们提出的估计器相对于现有估计器具有更低方差的理论和实验证据。通过在变压器中用我们的近似线性运算替换线性运算,我们可以实现高达2.7倍的峰值内存降低,几乎不损失精度,并使批量大小增加高达6.4倍。在相同的硬件条件下,WTA-CRS使得通过应用更大的模型和/或更快的训练速度以及更大的批量大小来实现更好的下游任务性能成为可能。
更新时间: 2024-10-30 06:12:05
领域: cs.LG,cs.CL
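The core mechanism above is easy to sketch: an unbiased estimate of a matrix product from a sub-sample of the inner dimension. Below is a minimal NumPy sketch of plain column-row sampling (CRS), the building block that WTA-CRS refines with its winner-take-all weighting; the function name and the norm-based sampling distribution are illustrative choices, not the paper's exact design.

```python
import numpy as np

def crs_matmul(A, B, s, rng=None):
    """Unbiased column-row sampling estimator of A @ B.

    Samples s of the k inner-dimension indices with probability
    p_i proportional to ||A[:, i]|| * ||B[i, :]|| and rescales by
    1 / (s * p_i), so E[estimate] = A @ B. A plain-CRS sketch of
    the idea behind WTA-CRS; the paper's winner-take-all refinement
    further reduces variance and is not reproduced here.
    """
    rng = np.random.default_rng(rng)
    k = A.shape[1]
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(k, size=s, replace=True, p=p)
    scale = 1.0 / (s * p[idx])            # importance weights
    return (A[:, idx] * scale) @ B[idx, :]

A = np.random.randn(64, 512)
B = np.random.randn(512, 32)
est = crs_matmul(A, B, s=128, rng=0)
print(np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B))
```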
Enhancing binary classification: A new stacking method via leveraging computational geometry
Stacking, a potent ensemble learning method, leverages a meta-model to harness the strengths of multiple base models, thereby enhancing prediction accuracy. Traditional stacking techniques typically utilize established learning models, such as logistic regression, as the meta-model. This paper introduces a novel approach that integrates computational geometry techniques, specifically solving the maximum weighted rectangle problem, to develop a new meta-model for binary classification. Our method is evaluated on multiple open datasets, with statistical analysis showing its stability and demonstrating improvements in accuracy compared to current state-of-the-art stacking methods with out-of-fold predictions. This new stacking method also boasts two significant advantages: enhanced interpretability and the elimination of hyperparameter tuning for the meta-model, thus increasing its practicality. These merits make our method highly applicable not only in stacking ensemble learning but also in various real-world applications, such as hospital health evaluation scoring and bank credit scoring systems, offering a fresh evaluation perspective.
Updated: 2024-10-30 06:11:08
标题: 增强二元分类:通过利用计算几何学的新堆叠方法
摘要: Stacking是一种强大的集成学习方法,利用元模型来利用多个基本模型的优势,从而提高预测准确性。传统的堆叠技术通常利用已建立的学习模型,如逻辑回归,作为元模型。本文介绍了一种集成计算几何技术的新方法,特别是解决最大加权矩形问题,以开发用于二元分类的新元模型。我们的方法在多个开放数据集上进行评估,统计分析显示其稳定性,并展示了与当前最先进的堆叠方法相比的准确性提高,其中包括离线预测。这种新的堆叠方法还具有两个显着优势:增强的可解释性和元模型的超参数调整的消除,从而提高了其实用性。这些优点使我们的方法不仅在堆叠集成学习中高度适用,而且在各种实际应用中,如医院健康评估评分和银行信用评分系统中,提供了一种新的评估视角。
更新时间: 2024-10-30 06:11:08
领域: cs.LG,cs.CG,68T05, 68U05,I.3.6; G.2.1
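For readers unfamiliar with out-of-fold stacking, the scaffold the paper plugs its geometric meta-model into looks roughly like the following scikit-learn sketch. The logistic-regression meta-model below is only a stand-in for the maximum-weighted-rectangle meta-model, which is not reproduced here; models and data are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Out-of-fold stacking scaffold: each base model contributes one
# leakage-free meta-feature column built from cross-validated
# predictions on the training data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
base_models = [RandomForestClassifier(random_state=0),
               SVC(probability=True, random_state=0)]

meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The paper replaces this meta-model with one derived from the
# maximum weighted rectangle problem.
meta_model = LogisticRegression().fit(meta_features, y)
print(meta_model.score(meta_features, y))
```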
Improving Causal Reasoning in Large Language Models: A Survey
Causal reasoning (CR) is a crucial aspect of intelligence, essential for problem-solving, decision-making, and understanding the world. While large language models (LLMs) can generate rationales for their outputs, their ability to reliably perform causal reasoning remains uncertain, often falling short in tasks requiring a deep understanding of causality. In this survey, we provide a comprehensive review of research aimed at enhancing LLMs for causal reasoning. We categorize existing methods based on the role of LLMs: either as reasoning engines or as helpers providing knowledge or data to traditional CR methods, followed by a detailed discussion of the methodologies in each category. We then evaluate the performance of LLMs on various causal reasoning tasks, providing key findings and in-depth analysis. Finally, we provide insights from current studies and highlight promising directions for future research. We aim for this work to serve as a comprehensive resource, fostering further advancements in causal reasoning with LLMs. Resources are available at https://github.com/chendl02/Awesome-LLM-causal-reasoning.
Updated: 2024-10-30 06:10:14
标题: 改进大型语言模型中的因果推理:一项调查
摘要: 因果推理(CR)是智能的关键方面,对问题解决、决策制定和对世界的理解至关重要。虽然大型语言模型(LLMs)可以为其输出生成合理性,但它们在可靠地执行因果推理方面仍存在不确定性,通常在需要深入理解因果关系的任务中表现不佳。在本调查中,我们对旨在增强LLMs用于因果推理的研究进行了全面回顾。我们根据LLMs的角色将现有方法进行分类:作为推理引擎或作为为传统CR方法提供知识或数据的助手,然后详细讨论每个类别中的方法学。然后,我们评估LLMs在各种因果推理任务中的表现,提供关键发现和深入分析。最后,我们提供来自当前研究的见解,并突出未来研究的有希望的方向。我们的目标是使这项工作成为一个全面的资源,促进LLMs在因果推理方面的进一步发展。资源可在https://github.com/chendl02/Awesome-LLM-causal-reasoning中找到。
更新时间: 2024-10-30 06:10:14
领域: cs.AI,cs.CL
Community search signatures as foundation features for human-centered geospatial modeling
Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used in time series modeling across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In zip codes with a population greater than 3000 that cover over 95% of the contiguous US population, our models for predicting missing values in a 20% set of holdout counties achieve an average $R^2$ score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state of the art methods using satellite imagery features.
Updated: 2024-10-30 06:09:22
标题: 社区搜索签名作为以人为中心的地理空间建模的基础特征
摘要: 汇总的相对搜索频率提供了一个独特的综合信号,反映了人们的习惯、关注点、兴趣、意图和一般信息需求,这些信息在其他可获得的数据集中找不到。时间搜索趋势已成功应用于各种领域的时间序列建模,如传染病、失业率和零售销售。然而,大多数现有应用需要整理专门的数据集,包括个别关键词、查询或查询群集,搜索数据需要与感兴趣的结果变量在时间上对齐。我们提出了一种新方法,用于在社区级别生成聚合和匿名化的搜索兴趣表示,作为地理空间建模的基础特征。我们使用跨多个领域的空间数据集对这些特征进行基准测试。在覆盖了95%以上美国连续区人口的人口超过3000的邮政编码中,我们用于预测保留县20%数据缺失值的模型在21个健康变量中实现平均$R^2$得分为0.74,在6个人口统计和环境变量中为0.80。我们的结果表明,这些搜索特征可以用于空间预测,无需严格的时间对齐,而且生成的模型优于使用卫星图像特征的空间插值和最先进方法。
更新时间: 2024-10-30 06:09:22
领域: cs.LG
Conformal Classification with Equalized Coverage for Adaptively Selected Groups
This paper introduces a conformal inference method to evaluate uncertainty in classification by generating prediction sets with valid coverage conditional on adaptively chosen features. These features are carefully selected to reflect potential model limitations or biases. This can be useful to find a practical compromise between efficiency -- by providing informative predictions -- and algorithmic fairness -- by ensuring equalized coverage for the most sensitive groups. We demonstrate the validity and effectiveness of this method on simulated and real data sets.
Updated: 2024-10-30 05:52:09
标题: 自适应选定组的等覆盖度符合分类
摘要: 本文介绍了一种共形推断方法,用于评估分类中的不确定性:通过生成在自适应选择的特征条件下具有有效覆盖率的预测集。这些特征被精心选择,以反映潜在的模型局限性或偏见。这有助于在效率(提供信息性的预测)与算法公平(确保对最敏感群体的覆盖率均等)之间找到实际的折中。我们在模拟和真实数据集上展示了这种方法的合理性和有效性。
更新时间: 2024-10-30 05:52:09
领域: stat.ML,cs.LG
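A hedged sketch of the underlying construction: split-conformal classification calibrated separately within groups, so that coverage is (approximately) equalized across them. The paper's contribution is selecting those groups adaptively from features that reflect model limitations; in the sketch below the groups are assumed given, and all names are illustrative.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, cal_groups,
                   test_probs, test_groups, alpha=0.1):
    """Split-conformal prediction sets with per-group calibration.

    Calibrates the nonconformity-score quantile separately within
    each group so coverage is (approximately) equalized across
    groups. A sketch: the paper selects groups adaptively, which
    is not shown here.
    """
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    sets = []
    for probs, g in zip(test_probs, test_groups):
        s_g = scores[cal_groups == g]
        n = len(s_g)
        q = np.quantile(s_g, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
        sets.append(np.where(1.0 - probs <= q)[0])
    return sets

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=200)   # toy softmax outputs
cal_labels = rng.integers(0, 3, size=200)
cal_groups = rng.integers(0, 2, size=200)
test_probs = rng.dirichlet(np.ones(3), size=5)
print(conformal_sets(cal_probs, cal_labels, cal_groups,
                     test_probs, rng.integers(0, 2, size=5)))
```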
Identifiability Analysis of Linear ODE Systems with Hidden Confounders
The identifiability analysis of linear Ordinary Differential Equation (ODE) systems is a necessary prerequisite for making reliable causal inferences about these systems. While identifiability has been well studied in scenarios where the system is fully observable, the conditions for identifiability remain unexplored when latent variables interact with the system. This paper aims to address this gap by presenting a systematic analysis of identifiability in linear ODE systems incorporating hidden confounders. Specifically, we investigate two cases of such systems. In the first case, latent confounders exhibit no causal relationships, yet their evolution adheres to specific functional forms, such as polynomial functions of time $t$. Subsequently, we extend this analysis to encompass scenarios where hidden confounders exhibit causal dependencies, with the causal structure of latent variables described by a Directed Acyclic Graph (DAG). The second case represents a more intricate variation of the first case, prompting a more comprehensive identifiability analysis. Accordingly, we conduct detailed identifiability analyses of the second system under various observation conditions, including both continuous and discrete observations from single or multiple trajectories. To validate our theoretical results, we perform a series of simulations, which support and substantiate our findings.
Updated: 2024-10-30 05:46:38
标题: 线性ODE系统中具有隐藏混淆因素的可辨识性分析
摘要: 线性常微分方程(ODE)系统的可识别性分析对于对这些系统进行可靠因果推断是必要的先决条件。虽然在系统完全可观测的情况下可识别性已经得到深入研究,但当潜变量与系统交互时,可识别性的条件仍未被探讨。本文旨在填补这一空白,通过对包含隐藏混杂因素的线性ODE系统的可识别性进行系统分析。具体来说,我们研究了这种系统的两种情况。在第一种情况下,潜在混杂因素没有因果关系,但它们的演变遵循特定的功能形式,比如时间t的多项式函数。随后,我们将这种分析扩展到包括隐藏混杂因素展现因果依赖的情况,其中潜在变量的因果结构由有向无环图(DAG)描述。第二种情况是对第一种情况的更复杂变体,需要进行更全面的可识别性分析。因此,我们在各种观测条件下对第二个系统进行了详细的可识别性分析,包括来自单个或多个轨迹的连续和离散观测。为了验证我们的理论结果,我们进行了一系列模拟实验,这些实验支持和证实了我们的发现。
更新时间: 2024-10-30 05:46:38
领域: stat.ML,cs.LG
Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses, particularly when subjected to "jailbreak" attacks. Research on jailbreak has highlighted the safety issues of LLMs. However, prior studies have predominantly focused on single-turn dialogue, ignoring the potential complexities and risks presented by multi-turn dialogue, a crucial mode through which humans derive information from LLMs. In this paper, we argue that humans could exploit multi-turn dialogue to induce LLMs into generating harmful information. LLMs may fail to reject cautionary or borderline unsafe queries, even when each turn of a multi-turn dialogue is steered toward a single malicious purpose. Therefore, by decomposing an unsafe query into several sub-queries spread across a multi-turn dialogue, we induced LLMs to answer harmful sub-questions incrementally, culminating in an overall harmful response. Our experiments, conducted across a wide range of LLMs, indicate current inadequacies in the safety mechanisms of LLMs in multi-turn dialogue. Our findings expose vulnerabilities of LLMs in complex scenarios involving multi-turn dialogue, presenting new challenges for the safety of LLMs.
Updated: 2024-10-30 05:43:51
标题: 说出轮次不当:多轮对话中大型语言模型的安全漏洞
摘要: 大型语言模型(LLMs)已被证明在经历“越狱”时会生成非法或不道德的回应。对越狱的研究凸显了LLMs的安全问题。然而,先前的研究主要集中在单轮对话上,忽略了多轮对话可能带来的复杂性和风险,这是人类从LLMs获取信息的关键方式。本文认为人类可以利用多轮对话诱使LLMs生成有害信息。即使在多轮对话中每个轮次都是为了一个恶意目的而服务的,LLMs可能不会拒绝警告性或边缘不安全的查询。因此,通过将一个不安全的查询分解为多个子查询用于多轮对话,我们逐步诱使LLMs回答有害的子问题,最终导致整体有害的回应。我们在一系列LLMs上进行的实验表明,当前的LLMs在多轮对话中的安全机制存在不足。我们的发现揭示了LLMs在涉及多轮对话的复杂场景中的脆弱性,为LLMs的安全提出了新的挑战。
更新时间: 2024-10-30 05:43:51
领域: cs.CL,cs.AI
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm
This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal dual-based policy gradient algorithm that adeptly manages the constraints while ensuring a low regret guarantee toward achieving a global optimal policy. In particular, our proposed algorithm achieves $\tilde{\mathcal{O}}({T}^{4/5})$ objective regret and $\tilde{\mathcal{O}}({T}^{4/5})$ constraint violation bounds.
Updated: 2024-10-30 05:42:06
标题: 通过原始-对偶策略梯度算法学习无限时间跨度平均奖励受限MDPs的一般参数化策略
摘要: 本文探讨了无限时间跨度平均奖励受限马尔可夫决策过程(CMDPs)的领域。据我们所知,这项工作是第一次深入研究具有一般政策参数化的平均奖励CMDPs的遗憾和约束违规分析。为了解决这一挑战,我们提出了一种基于原始对偶的政策梯度算法,能够熟练地管理约束,同时确保在实现全局最优政策方面具有低遗憾保证。特别是,我们提出的算法实现了$ \tilde{\mathcal{O}}({T}^{4/5}) $的目标遗憾和$ \tilde{\mathcal{O}}({T}^{4/5}) $的约束违规界限。
更新时间: 2024-10-30 05:42:06
领域: cs.LG,cs.AI
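The primal-dual template behind such algorithms is compact enough to sketch: ascend on the policy parameters along the Lagrangian gradient, and ascend on the multiplier along the constraint violation, projecting the multiplier to stay non-negative. The learning rates and gradient estimates below are placeholders, not the paper's tuned schedule or its $\tilde{\mathcal{O}}(T^{4/5})$ analysis.

```python
import numpy as np

def primal_dual_step(theta, lam, grad_reward, grad_cost,
                     avg_cost, budget, eta_theta=1e-2, eta_lam=1e-2):
    """One primal-dual update on the Lagrangian
    L(theta, lam) = J_r(theta) - lam * (J_c(theta) - budget).

    grad_reward / grad_cost are policy-gradient estimates of the
    average reward and average constraint cost; a schematic sketch
    under these assumptions, not the paper's full algorithm.
    """
    theta = theta + eta_theta * (grad_reward - lam * grad_cost)  # primal ascent
    lam = max(0.0, lam + eta_lam * (avg_cost - budget))          # dual ascent
    return theta, lam

theta, lam = np.zeros(4), 0.0
theta, lam = primal_dual_step(theta, lam,
                              grad_reward=np.ones(4),
                              grad_cost=0.5 * np.ones(4),
                              avg_cost=1.2, budget=1.0)
print(theta, lam)
```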
Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization
State recognition of the environment and objects, such as the open/closed state of doors and the on/off state of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have been based on training neural networks from manual annotations, preparing special sensors for the recognition, or manually programming feature extraction from point clouds or raw images. In contrast, we propose a robotic state recognition method using a pre-trained vision-language model capable of Image-to-Text Retrieval (ITR) tasks. We prepare several kinds of language prompts in advance, calculate the similarity between these prompts and the current image by ITR, and perform state recognition. By applying an optimal weighting to each prompt using black-box optimization, state recognition can be performed with higher accuracy. Experiments show that this approach enables a variety of state recognition tasks simply by preparing multiple prompts, without retraining neural networks or manual programming. In addition, since only prompts and their weights need to be prepared for each recognizer, there is no need to prepare multiple models, which facilitates resource management. Through language alone, it is possible to recognize states that have so far been challenging: the open/closed state of transparent doors, whether water is running from a faucet, and even the qualitative state of whether a kitchen is clean.
Updated: 2024-10-30 05:34:52
标题: 用预训练的视觉-语言模型和黑盒优化进行机器人状态识别的图像到文本检索任务
摘要: 环境和物体的状态识别,例如门的开合状态和灯的开关状态,对于执行日常生活支持和安全任务的机器人是不可或缺的。到目前为止,状态识别方法基于从手动注释训练神经网络、为识别准备特殊传感器,或手动编程从点云或原始图像中提取特征。相比之下,我们提出了一种使用预训练的视觉语言模型的机器人状态识别方法,该模型能够进行图像到文本检索(ITR)任务。我们提前准备了几种语言提示,通过ITR计算这些提示与当前图像之间的相似度,并进行状态识别。通过使用黑盒优化对每个提示应用最佳加权,可以实现更高准确度的状态识别。实验证明,这一理论通过简单准备多个提示而无需重新训练神经网络或手动编程,就能实现多种状态识别。此外,由于每个识别器只需要准备提示和其权重,不需要准备多个模型,这有利于资源管理。通过语言,可以识别透明门的开合状态,水龙头是否流水的状态,甚至厨房是否清洁等迄今为止具有挑战性的定性状态。
更新时间: 2024-10-30 05:34:52
领域: cs.RO,cs.AI,cs.CV
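A schematic of the recognition step: score each candidate state by a weighted sum of similarities between the current image embedding and that state's prompt embeddings, with weights found by black-box optimization on a few labeled frames. The embeddings below are random stand-ins for a real vision-language model such as CLIP; all names, prompts, and weights are illustrative.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recognize_state(image_emb, prompt_embs, weights):
    """Pick the state whose weighted prompt similarities are highest.

    prompt_embs: dict state -> list of prompt embeddings that would
    come from a pretrained vision-language model (e.g., CLIP);
    weights: same shape, tuned by black-box optimization (CMA-ES,
    random search, ...) on a few labeled frames.
    """
    def score(state):
        return sum(w * cosine(image_emb, p)
                   for p, w in zip(prompt_embs[state], weights[state]))
    return max(prompt_embs, key=score)

rng = np.random.default_rng(0)
d = 16
prompt_embs = {"door open": [rng.standard_normal(d) for _ in range(2)],
               "door closed": [rng.standard_normal(d) for _ in range(2)]}
weights = {state: [1.0, 0.5] for state in prompt_embs}
print(recognize_state(rng.standard_normal(d), prompt_embs, weights))
```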
PACER: Physics Informed Uncertainty Aware Climate Emulator
Climate models serve as critical tools for evaluating the effects of climate change and projecting future climate scenarios. However, their reliance on numerical simulations of physical equations renders them computationally intensive and inefficient. While deep learning methodologies have made significant progress in weather forecasting, they are still unstable for climate emulation tasks. Here, we propose PACER, a lightweight 684K-parameter Physics Informed Uncertainty Aware Climate Emulator. PACER emulates temperature and precipitation stably for 86 years while only being trained on greenhouse gas emissions data. We incorporate the fundamental physical law of advection-diffusion in PACER, accounting for boundary conditions and empirically estimating the diffusion coefficient and flow velocities from emissions data. PACER has been trained on 15 climate models provided by ClimateSet, outperforming baselines on most of the climate models and setting a new state of the art on a climate diagnostic task.
Updated: 2024-10-30 05:33:12
标题: PACER:物理信息不确定性感知气候模拟器
摘要: 气候模型是评估气候变化影响和预测未来气候情景的关键工具。然而,对物理方程的数值模拟依赖使它们在计算上消耗巨大且效率低下。虽然深度学习方法在天气预报方面取得了显著进展,但对气候仿真任务仍不稳定。在这里,我们提出了PACER,一个轻量级684K参数的物理信息不确定性感知气候仿真器。PACER在仅训练温室气体排放数据的情况下,稳定地模拟了86年的温度和降水情况。我们在PACER中结合了平流-扩散的基本物理定律,考虑了边界条件,并从排放数据中经验地估计了扩散系数和流速。PACER已经在ClimateSet提供的15个气候模型上进行了训练,在大多数气候模型上优于基线,并推动了气候诊断任务的新的技术水平。
更新时间: 2024-10-30 05:33:12
领域: physics.ao-ph,cs.AI,cs.LG
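To make the embedded physics concrete, here is a one-dimensional explicit finite-difference step for the advection-diffusion equation $u_t = D u_{xx} - v u_x$ on a periodic grid. PACER itself works on global climate fields with proper boundary conditions and learns $D$ and $v$ from emissions data; this is only the textbook kernel the paper builds on.

```python
import numpy as np

def advect_diffuse_step(u, D, v, dx, dt):
    """One explicit finite-difference step of du/dt = D u_xx - v u_x
    on a periodic 1D grid (central differences in space)."""
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)
    return u + dt * (D * u_xx - v * u_x)

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
u = np.exp(-((x - np.pi) ** 2))     # initial temperature blob
# Explicit stability requires roughly D * dt / dx^2 < 0.5.
for _ in range(100):
    u = advect_diffuse_step(u, D=0.01, v=0.5, dx=x[1] - x[0], dt=0.01)
print(u.max())
```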
LeanAgent: Lifelong Learning for Formal Theorem Proving
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
Updated: 2024-10-30 05:20:25
标题: LeanAgent:形式定理证明的终身学习
摘要: 大型语言模型(LLMs)在数学推理任务中取得了成功,例如与Lean等交互式证明助手集成时的形式定理证明。现有方法涉及在特定数据集上训练或微调LLM,以便在特定领域(如本科水平数学)表现良好。这些方法在推广到高级数学方面方面上存在困难。一个根本的局限是这些方法在静态领域上操作,未能捕捉到数学家通常跨多个领域和项目同时或循环工作的方式。我们提出了LeanAgent,这是一个新颖的生命周期学习框架,用于定理证明,可以持续泛化并改进不断扩大的数学知识,而不会忘记之前学到的知识。LeanAgent引入了几个关键创新,包括一个课程学习策略,可以以数学难度为优化学习轨迹,一个动态数据库,用于高效管理不断演变的数学知识,以及渐进式训练,以平衡稳定性和可塑性。LeanAgent成功证明了162个之前人类未证明的定理,涵盖了23个不同的Lean存储库,其中许多属于高级数学。它比静态LLM基准表现得更好,证明了抽象代数和代数拓扑等领域中具有挑战性的定理,同时展示了从基本概念到高级主题的学习清晰的进展。此外,我们分析了LeanAgent在关键的终身学习指标上的卓越表现。LeanAgent在稳定性和向后转移方面取得了异常的分数,即学习新任务可以提高先前学习任务的表现。这强调了LeanAgent的持续泛化和改进,解释了其卓越的定理证明表现。
更新时间: 2024-10-30 05:20:25
领域: cs.LG,cs.AI,cs.LO
CIDGMed: Causal Inference-Driven Medication Recommendation with Enhanced Dual-Granularity Learning
Medication recommendation aims to integrate patients' long-term health records to provide accurate and safe medication combinations for specific health states. Existing methods often fail to deeply explore the true causal relationships between diseases/procedures and medications, resulting in biased recommendations. Additionally, in medication representation learning, the relationships between information at different granularities of medications, coarse-grained (medication itself) and fine-grained (molecular level), are not effectively integrated, leading to biases in representation learning. To address these limitations, we propose the Causal Inference-driven Dual-Granularity Medication Recommendation method (CIDGMed). Our approach leverages causal inference to uncover the relationships between diseases/procedures and medications, thereby enhancing the rationality and interpretability of recommendations. By integrating coarse-grained medication effects with fine-grained molecular structure information, CIDGMed provides a comprehensive representation of medications. Additionally, we employ a bias correction model during the prediction phase to further refine recommendations, ensuring both accuracy and safety. Through extensive experiments, CIDGMed significantly outperforms current state-of-the-art models across multiple metrics, achieving a 2.54% increase in accuracy, a 3.65% reduction in side effects, and a 39.42% improvement in time efficiency. Additionally, we demonstrate the rationale of CIDGMed through a case study.
Updated: 2024-10-30 05:18:03
标题: CIDGMed:具有增强双粒度学习的因果推断驱动药物推荐
摘要: 药物推荐旨在整合患者的长期健康记录,为特定健康状态提供准确安全的药物组合。现有方法往往未能深入探索疾病/程序与药物之间真实的因果关系,导致偏见推荐。此外,在药物表示学习中,药物在不同颗粒度信息之间的关系,即粗粒度(药物本身)和细粒度(分子水平),未能有效整合,导致表示学习中的偏见。为解决这些局限性,我们提出了基于因果推断的双颗粒度药物推荐方法(CIDGMed)。我们的方法利用因果推断揭示疾病/程序与药物之间的关系,从而提高推荐的合理性和可解释性。通过将粗颗粒度药物效果与细颗粒度分子结构信息整合,CIDGMed提供了药物的全面表示。此外,在预测阶段我们采用偏见校正模型进一步优化推荐,确保准确性和安全性。通过大量实验证明,CIDGMed在多个指标上明显优于当前的最先进模型,准确率提高了2.54%,副作用减少了3.65%,时间效率提高了39.42%。此外,我们通过一个案例研究展示了CIDGMed的合理性。
更新时间: 2024-10-30 05:18:03
领域: cs.IR,cs.AI
Exactly Minimax-Optimal Locally Differentially Private Sampling
The sampling problem under local differential privacy has recently been studied with potential applications to generative models, but a fundamental analysis of its privacy-utility trade-off (PUT) remains incomplete. In this work, we define the fundamental PUT of private sampling in the minimax sense, using the f-divergence between original and sampling distributions as the utility measure. We characterize the exact PUT for both finite and continuous data spaces under some mild conditions on the data distributions, and propose sampling mechanisms that are universally optimal for all f-divergences. Our numerical experiments demonstrate the superiority of our mechanisms over baselines, in terms of theoretical utilities for finite data space and of empirical utilities for continuous data space.
Updated: 2024-10-30 05:13:18
标题: 确切的最小化最优局部差分隐私抽样
摘要: 最近,针对局部差分隐私下的采样问题进行了研究,可能应用于生成模型,但其隐私-效用权衡(PUT)的基本分析仍未完成。在这项工作中,我们以极大-极小的方式定义了私有采样的基本PUT,将原始分布和采样分布之间的f-散度作为效用度量。我们在一些数据分布的条件下对有限和连续数据空间的确切PUT进行了表征,并提出了在所有f-散度下都是最优的采样机制。我们的数值实验表明,相比基线,我们的机制在有限数据空间的理论效用和连续数据空间的实证效用方面具有优势。
更新时间: 2024-10-30 05:13:18
领域: cs.LG,cs.CR
An Iterative Algorithm for Regularized Non-negative Matrix Factorizations
We generalize the non-negative matrix factorization algorithm of Lee and Seung to accept a weighted norm, and to support ridge and Lasso regularization. We recast the Lee and Seung multiplicative update as an additive update which does not get stuck on zero values. We apply the companion R package rnnmf to the problem of finding a reduced rank representation of a database of cocktails.
Updated: 2024-10-30 05:12:06
标题: 一个用于正则化非负矩阵分解的迭代算法
摘要: 我们将李和Seung的非负矩阵分解算法推广为接受加权范数,并支持岭回归和Lasso正则化。我们将李和Seung的乘法更新重新构造为一个不会卡在零值上的加法更新。我们将伴侣R包rnnmf应用于解决一个鸡尾酒数据库的降维表示问题。
更新时间: 2024-10-30 05:12:06
领域: cs.LG,math.OC,stat.AP,stat.CO,90-10,G.1.6; J.4
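For reference, Lee-and-Seung-style multiplicative updates with a ridge term look as follows; this baseline sketch omits the paper's weighted norm, Lasso option, and the additive reformulation that avoids getting stuck at exact zeros (see the rnnmf package for the actual implementation).

```python
import numpy as np

def nmf_ridge(X, rank, lam=0.1, iters=200, eps=1e-9, rng=None):
    """Non-negative matrix factorization X ~ W @ H with a ridge
    penalty, via Lee-and-Seung-style multiplicative updates.
    A baseline sketch, not the paper's additive algorithm.
    """
    rng = np.random.default_rng(rng)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iters):
        # Ridge shows up as an extra lam * factor in each denominator.
        H *= (W.T @ X) / (W.T @ W @ H + lam * H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + lam * W + eps)
    return W, H

X = np.abs(np.random.default_rng(0).standard_normal((30, 20)))
W, H = nmf_ridge(X, rank=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```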
MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Graph Neural Networks (GNN) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges in terms of execution performance. To tackle this, distributed-memory solutions such as partitioning the graph to concurrently train multiple replicas of GNNs are used in practice. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies, due to irregularities in neighborhood minibatch sampling. This paper proposes practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs (using the popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about a 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer for various OGB datasets.
Updated: 2024-10-30 05:10:38
标题: MassiveGNN: 高效训练通过预取以用于大规模连接的分布式图网络
摘要: 图神经网络(GNN)在从图结构化数据中学习中不可或缺,然而它们不断增加的计算成本,尤其是在大规模连接的图上,给执行性能带来了重大挑战。为了解决这个问题,分布式内存解决方案,如将图分区以同时训练多个GNN的副本,正在实践中使用。然而,需要分区图的方法通常会由于邻域小批量采样中的不规则性而遭受通信开销和负载不平衡的问题,即使在最佳分区和通信策略下也是如此。 本文提出了一种实用的权衡方法,通过在最先进的Amazon DistDGL分布式GNN框架上开发参数化的连续预取和驱逐方案,来改善在分布式图上表示学习的采样和通信开销(使用流行的GraphSAGE架构),在美国国家能源研究科学计算中心(NERSC)Perlmutter超级计算机上展示了大约15-40%的端到端训练性能改善,针对各种OGB数据集。
更新时间: 2024-10-30 05:10:38
领域: cs.DC,cs.LG,cs.PF
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, evaluating these reasoning abilities has become increasingly challenging. Existing outcome-based benchmarks are beginning to saturate, becoming less effective in tracking meaningful progress. To address this, we present a process-based benchmark MR-Ben that demands a meta-reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. Our meta-reasoning paradigm is especially suited for system-2 slow thinking, mirroring the human cognitive process of carefully examining assumptions, conditions, calculations, and logic to identify mistakes. MR-Ben comprises 5,975 questions curated by human experts across a wide range of subjects, including physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, with models like the o1 series from OpenAI demonstrating strong performance by effectively scrutinizing the solution space, many other state-of-the-art models fall significantly behind on MR-Ben, exposing potential shortcomings in their training strategies and inference methodologies.
Updated: 2024-10-30 05:07:26
标题: MR-Ben:用于评估LLMs中系统2思维的元推理基准
摘要: 大型语言模型(LLMs)在问题解决和决策制定方面展现出越来越强的能力,这在很大程度上基于逐步的思维链推理过程。然而,评估这些推理能力变得越来越具有挑战性。现有的基于结果的基准正开始饱和,在跟踪有意义的进展方面变得不那么有效。为了解决这个问题,我们提出了一个基于过程的基准MR-Ben,它要求一种元推理技能:要求语言模型定位并分析自动生成的推理步骤中的潜在错误。我们的元推理范式特别适合系统2的慢思考,反映了人类仔细检查假设、条件、计算和逻辑以识别错误的认知过程。MR-Ben包含由人类专家精心策划的5,975个问题,涵盖物理、化学、逻辑、编程等广泛学科。通过我们为该基准设计的元推理评估指标,我们发现了当前LLMs(开源和闭源模型)的有趣局限性和弱点。例如,OpenAI的o1系列模型通过有效审视解空间表现出强劲性能,而许多其他最先进的模型在MR-Ben上明显落后,暴露出其训练策略和推理方法中的潜在缺陷。
更新时间: 2024-10-30 05:07:26
领域: cs.CL,cs.AI
Permutation Invariant Learning with High-Dimensional Particle Filters
Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity, largely due to the permutation dependence of gradient-based algorithms, where the order of training data impacts the learning outcome. In this work, we introduce a novel permutation-invariant learning framework based on high-dimensional particle filters. We theoretically demonstrate that particle filters are invariant to the sequential ordering of training minibatches or tasks, offering a principled solution to mitigate catastrophic forgetting and loss-of-plasticity. We develop an efficient particle filter for optimizing high-dimensional models, combining the strengths of Bayesian methods with gradient-based optimization. Through extensive experiments on continual supervised and reinforcement learning benchmarks, including SplitMNIST, SplitCIFAR100, and ProcGen, we empirically show that our method consistently improves performance, while reducing variance compared to standard baselines.
Updated: 2024-10-30 05:06:55
标题: 高维粒子滤波器的置换不变学习
摘要: 深度模型中的连续学习经常面临挑战,例如灾难性遗忘和可塑性丧失,这在很大程度上是由于基于梯度的算法对排列的依赖,训练数据的顺序会影响学习结果。在这项工作中,我们介绍了一种基于高维粒子滤波器的新型排列不变学习框架。我们理论上证明了粒子滤波器对于训练小批量或任务的顺序是不变的,提供了一个有原则的解决方案来减轻灾难性遗忘和可塑性丧失。我们开发了一种高效的粒子滤波器来优化高维模型,结合了贝叶斯方法和基于梯度的优化的优势。通过对连续监督学习和增强学习基准的广泛实验,包括SplitMNIST、SplitCIFAR100和ProcGen,我们在经验上展示了我们的方法在提高性能的同时降低了方差,与标准基线相比性能持续改善。
更新时间: 2024-10-30 05:06:55
领域: cs.LG,cs.AI
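The permutation-invariance claim can be seen in a toy version of the update: each minibatch multiplies every particle's weight by its likelihood, and a product of likelihoods does not depend on batch order. The sketch below filters over the weights of a linear regression; the paper's contribution, making this workable for high-dimensional networks, is not reproduced, and the noise scale and resampling rule are illustrative.

```python
import numpy as np

def particle_filter_update(particles, log_weights, X, y,
                           sigma=0.5, jitter=0.01, rng=None):
    """One minibatch update of a particle filter over model parameters.

    Reweights each parameter particle by the minibatch Gaussian
    likelihood; since the posterior is a product of per-batch
    likelihoods, the result is invariant to minibatch ordering.
    """
    rng = np.random.default_rng(rng)
    preds = X @ particles.T                        # (batch, n_particles)
    ll = -0.5 * ((y[:, None] - preds) ** 2).sum(0) / sigma**2
    log_weights = log_weights + ll
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    if 1.0 / (w ** 2).sum() < len(w) / 2:          # resample when ESS is low
        idx = rng.choice(len(w), size=len(w), p=w)
        particles = particles[idx] + jitter * rng.standard_normal(particles.shape)
        log_weights = np.zeros(len(w))
    return particles, log_weights

rng = np.random.default_rng(0)
particles = rng.standard_normal((256, 3))          # 256 particles, 3 weights
log_w = np.zeros(256)
X, y = rng.standard_normal((32, 3)), rng.standard_normal(32)
particles, log_w = particle_filter_update(particles, log_w, X, y, rng=1)
```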
On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World
Object detectors are vulnerable to backdoor attacks. In contrast to classifiers, detectors possess unique characteristics, architecturally and in task execution, and often operate in challenging conditions, for instance, detecting traffic signs in autonomous cars. Most existing knowledge, however, concerns attacks against classifiers, tested in the "digital domain". To address this critical gap, we conducted an extensive empirical study targeting multiple detector architectures and two challenging detection tasks in real-world settings: traffic signs and vehicles. Using diverse, methodically collected videos captured from driving cars and flying drones, incorporating physical object trigger deployments in authentic scenes, we investigated the viability of physical object-triggered backdoor attacks in application settings. Our findings revealed 8 key insights. Importantly, the prevalent "digital" data poisoning method for injecting backdoors into models does not lead to effective attacks against detectors in the real world, although proven effective in classification tasks. We construct a new, cost-efficient attack method, dubbed MORPHING, that incorporates the unique nature of detection tasks; it is remarkably successful in injecting physical object-triggered backdoors, even capable of poisoning with clean-label annotations or invisible triggers, without diminishing the success of physical object-triggered backdoors. We discovered that existing defenses are ill-equipped to safeguard detectors against such attacks. To underscore the severity of the threat and foster further research, we, for the first time, release an extensive video test set of real-world backdoor attacks. Our study not only establishes the credibility and seriousness of this threat but also serves as a clarion call to the research community to advance backdoor defenses in the context of object detection.
Updated: 2024-10-30 05:03:50
标题: 关于在物理世界中对物体检测器进行后门攻击的可信度
摘要: 目标检测器容易受到后门攻击的威胁。与分类器相比,检测器在体系结构和任务执行方面具有独特的特点,通常在具有挑战性的条件下运行,例如在自动驾驶汽车中检测交通标志。然而,我们对分类器的攻击了解较多,并在“数字领域”进行测试。 为填补这一关键差距,我们开展了一项广泛的实证研究,针对多个检测器架构和两个具有挑战性的检测任务,即交通标志和车辆,在真实环境中进行。利用从驾驶汽车和飞行无人机拍摄的多样化、系统地收集的视频,并在真实场景中部署物理对象触发器,我们调查了物理对象触发的后门攻击在应用场景中的可行性。 我们的研究结果揭示了8个关键见解。重要的是,“数字”数据污染方法用于向模型注入后门在现实世界中不会导致对检测器的有效攻击,尽管在分类任务中被证明有效。我们构建了一种新的、成本效益高的攻击方法,命名为MORPHING,结合了检测任务的独特性;我们在注入物理对象触发的后门方面取得了显著成功,甚至能够在不减弱物理对象触发的后门成功率的情况下污染带有干净标签注释或看不见触发器。我们发现目前的防御措施无法保护检测器免受此类攻击。为强调这种威胁的严重性并促进进一步研究,我们首次发布了大量实际后门攻击的视频测试集。我们的研究不仅确立了这种威胁的可信度和严重性,而且作为对研究界的号召,以推进目标检测背门防御的研究。
更新时间: 2024-10-30 05:03:50
领域: cs.CR
Geometric-Averaged Preference Optimization for Soft Preference Labels
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of learning loss based on the soft labels such that the loss would approach zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based methods and mitigate over-optimization and objective mismatch, which prior works suffer from. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than binary labels and significant improvements where modestly-confident labels are in the majority.
Updated: 2024-10-30 05:02:12
标题: 几何平均偏好优化软偏好标签
摘要: 许多用于将LLMs与人类偏好对齐的算法都假设人类偏好是二元的和确定性的。然而,人类偏好可能会因个体而异,因此应该以分布方式表示。在这项工作中,我们引入了分布式软偏好标签,并通过在损失函数中使用LLM输出可能性的加权几何平均来改进直接偏好优化(DPO)。这种方法根据软标签调整学习损失的规模,使得当响应更接近于平等偏好时,损失会接近零。这种简单修改可以轻松应用于任何基于DPO的方法,并缓解过度优化和客观不匹配的问题,前期工作中存在这些问题。我们的实验使用来自LLMs的人工智能反馈模拟软偏好标签,并表明几何平均在对齐研究的标准基准上始终改善性能。特别地,我们观察到比二元标签更可取的响应,并在大多数情况下出现稍微自信的标签时显著改善。
更新时间: 2024-10-30 05:02:12
领域: cs.LG,cs.AI,cs.CL
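One concrete reading of the weighted geometric average: in log space it scales the usual DPO margin by $(2\hat{p}-1)$, so near-equally-preferred pairs contribute almost no gradient. The PyTorch sketch below implements that reading; it is an illustration of the idea under this interpretation, not the paper's exact objective, and the tensors stand in for summed token log-probabilities.

```python
import torch
import torch.nn.functional as F

def soft_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, p_soft, beta=0.1):
    """DPO-style loss with a soft preference label p_soft in [0.5, 1].

    A weighted geometric average of the two response likelihoods
    reduces, in log space, to scaling the DPO margin by
    (2 * p_soft - 1): pairs labeled ~0.5 are effectively ignored.
    A sketch under this reading of the paper.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * (2 * p_soft - 1) * margin).mean()

# Toy stand-ins for summed token log-probs of chosen/rejected responses.
logp_w = torch.tensor([-12.0, -9.0])
logp_l = torch.tensor([-13.5, -9.2])
ref_w, ref_l = torch.tensor([-12.5, -9.5]), torch.tensor([-13.0, -9.4])
print(soft_dpo_loss(logp_w, logp_l, ref_w, ref_l,
                    p_soft=torch.tensor([0.9, 0.55])))
```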
Opportunities and Challenges of Generative-AI in Finance
Gen-AI techniques can improve understanding of context and nuance in language modeling, translate between languages, handle large volumes of data, provide fast, low-latency responses, and be fine-tuned for various tasks and domains. In this manuscript, we present a comprehensive overview of the applications of Gen-AI techniques in the finance domain. In particular, we present the opportunities and challenges associated with the usage of Gen-AI techniques. We also illustrate the various methodologies which can be used to train Gen-AI techniques and present the various application areas of Gen-AI technologies in the finance ecosystem. To the best of our knowledge, this work represents the most comprehensive summarization of Gen-AI techniques within the financial domain. The analysis is designed to give a deep overview of areas marked for substantial advancement while simultaneously pinpointing those warranting future prioritization. We also hope that this work will serve as a conduit between finance and other domains, thus fostering the cross-pollination of innovative concepts and practices.
Updated: 2024-10-30 04:52:42
标题: 金融领域中生成式人工智能的机遇与挑战
摘要: Gen-AI技术能够提高对语境和细微差别的理解,在语言建模、语言翻译、处理大量数据、提供快速、低延迟的响应,并可以针对各种任务和领域进行调优。 在这篇稿件中,我们全面介绍了Gen-AI技术在金融领域的应用。特别是,我们介绍了使用Gen-AI技术所涉及的机遇和挑战。我们还说明了可以用来训练Gen-AI技术的各种方法,并展示了Gen-AI技术在金融生态系统中的各种应用领域。 据我们所知,这项工作代表了金融领域中Gen-AI技术最全面的总结。这项分析旨在深入了解标记为重要进展领域的领域,同时强调未来需要优先考虑的领域。我们也希望这项工作能够成为金融和其他领域之间的桥梁,促进创新概念和实践的跨领域交流。
更新时间: 2024-10-30 04:52:42
领域: cs.AI
Choice between Partial Trajectories
As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. This approach requires a model of choice behavior that the agent can use to interpret the data. For choices between partial trajectories of states and actions, previous models assume choice probabilities to be determined by the partial return or the cumulative advantage. We consider an alternative model based instead on the bootstrapped return, which adds to the partial return an estimate of the future return. Benefits of the bootstrapped return model stem from its treatment of human beliefs. Unlike partial return, choices based on bootstrapped return reflect human beliefs about the environment. Further, while recovering the reward function from choices based on cumulative advantage requires that those beliefs are correct, doing so from choices based on bootstrapped return does not. To motivate the bootstrapped return model, we formulate axioms and prove an Alignment Theorem. This result formalizes how, for a general class of human preferences, such models are able to disentangle goals from beliefs. This ensures recovery of an aligned reward function when learning from choices based on bootstrapped return. The bootstrapped return model also affords greater robustness to choice behavior. Even when choices are based on partial return, learning via a bootstrapped return model recovers an aligned reward function. The same holds with choices based on the cumulative advantage if the human and the agent both adhere to correct and consistent beliefs about the environment. On the other hand, if choices are based on bootstrapped return, learning via partial return or cumulative advantage models does not generally produce an aligned reward function.
Updated: 2024-10-30 04:52:22
标题: 选择部分轨迹
摘要: 随着人工智能代理生成越来越复杂的行为,手动编码人类偏好以指导这些代理变得更具挑战性。为了解决这个问题,有人建议代理代替从人类选择数据中学习偏好。这种方法需要一个选择行为模型,代理可以用来解释数据。对于状态和动作的部分轨迹之间的选择,先前的模型假定选择概率由部分回报或累积优势确定。 我们考虑一个基于自举回报的替代模型,它在部分回报基础上增加了对未来回报的估计。自举回报模型的好处源于其对人类信念的处理。与部分回报不同,基于自举回报的选择反映了人类对环境的信念。此外,虽然从基于累积优势的选择中恢复奖励函数需要这些信念是正确的,但从基于自举回报的选择中恢复奖励函数则不需要。 为了推动自举回报模型,我们制定了公理并证明了一个对齐定理。这个结果形式化了如何对于一般类别的人类偏好,这些模型能够将目标与信念分离开来。这确保了在从基于自举回报的选择中学习时恢复一个对齐的奖励函数。 自举回报模型还为选择行为提供了更大的稳健性。即使选择是基于部分回报,通过自举回报模型学习可以恢复一个对齐的奖励函数。如果人类和代理都遵循关于环境的正确和一致的信念,那么基于累积优势的选择也是如此。另一方面,如果选择是基于自举回报,那么通过部分回报或累积优势模型学习通常不会产生一个对齐的奖励函数。
更新时间: 2024-10-30 04:52:22
领域: cs.LG,cs.AI
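In code, the bootstrapped-return choice model is a Bradley-Terry (logistic) choice over partial returns augmented with a value estimate at each trajectory's final state. The sketch below uses a generic $\hat{V}$ standing in for the human's beliefs about future return; function names and the unit temperature are illustrative assumptions, not the paper's formal model.

```python
import numpy as np

def bootstrapped_return(rewards, v_hat_end, gamma=0.99):
    """Partial return plus an estimate of the return-to-go from the
    trajectory's final state: the quantity the bootstrapped-return
    choice model assumes humans compare."""
    T = len(rewards)
    partial = sum(gamma**t * r for t, r in enumerate(rewards))
    return partial + gamma**T * v_hat_end

def choice_prob(rew1, v1, rew2, v2):
    """Bradley-Terry choice between two partial trajectories under
    the bootstrapped-return model; v1/v2 stand in for the human's
    beliefs about the environment."""
    g1, g2 = bootstrapped_return(rew1, v1), bootstrapped_return(rew2, v2)
    return 1.0 / (1.0 + np.exp(-(g1 - g2)))

print(choice_prob([1.0, 0.0], v1=2.0, rew2=[0.5, 0.5], v2=0.5))
```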
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Neural codec language models have achieved state-of-the-art performance in text-to-speech (TTS) synthesis, leveraging scalable architectures like autoregressive transformers and large-scale speech datasets. By framing voice cloning as a prompt continuation task, these models excel at cloning voices from short audio samples. However, this approach is limited in its ability to handle numerous or lengthy speech excerpts, since the concatenation of source and target speech must fall within the maximum context length which is determined during training. In this work, we introduce Lina-Speech, a model that replaces traditional self-attention mechanisms with emerging recurrent architectures like Gated Linear Attention (GLA). Building on the success of initial-state tuning on RWKV, we extend this technique to voice cloning, enabling the use of multiple speech samples and full utilization of the context window in synthesis. This approach is fast, easy to deploy, and achieves performance comparable to fine-tuned baselines when the dataset size ranges from 3 to 15 minutes. Notably, Lina-Speech matches or outperforms state-of-the-art baseline models, including some with a parameter count up to four times higher or trained in an end-to-end style. We release our code and checkpoints. Audio samples are available at https://theodorblackbird.github.io/blog/demo_lina/.
Updated: 2024-10-30 04:50:40
标题: Lina-Speech:门控线性注意力是一种快速且参数高效的文本转语音合成学习器
摘要: 神经编解码语言模型在文本转语音(TTS)合成中取得了最先进的性能,利用了可扩展的架构,如自回归变压器和大规模语音数据集。通过将语音克隆视为一个提示继续任务,这些模型擅长从短音频样本中克隆声音。然而,这种方法在处理大量或较长的演讲摘录时能力有限,因为源和目标语音的连接必须在训练期间确定的最大上下文长度内。在这项工作中,我们引入了Lina-Speech,该模型用新兴的循环架构(如门控线性注意力(GLA))取代了传统的自注意力机制。在RWKV上的初始状态调整取得成功后,我们将这种技术扩展到语音克隆,使其能够使用多个语音样本并充分利用合成中的上下文窗口。这种方法快速、易于部署,并在数据集大小范围从3到15分钟时实现与微调基线相当的性能。值得注意的是,Lina-Speech与最先进的基线模型相匹配或优于一些参数计数高达四倍或以端到端方式训练的模型。我们发布了我们的代码和检查点。音频样本可在https://theodorblackbird.github.io/blog/demo_lina/ 获取。
更新时间: 2024-10-30 04:50:40
领域: eess.AS,cs.AI,cs.SD
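The recurrence that makes this practical is small: a matrix-valued state that is decayed elementwise by a data-dependent gate and updated by an outer product, so each decoding step is O(1) in sequence length. Below is a single-head NumPy sketch of the GLA recurrence, omitting normalization, multi-head structure, and the chunkwise-parallel training form; shapes and gating are illustrative.

```python
import numpy as np

def gated_linear_attention(Q, K, V, G):
    """Recurrent form of gated linear attention.

    State S (d_k x d_v) is decayed elementwise by a gate in (0, 1)
    and updated with an outer product; the output reads S with the
    query. A minimal single-head sketch of the GLA recurrence.
    """
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        S = G[t][:, None] * S + np.outer(K[t], V[t])  # gated decay + write
        out[t] = S.T @ Q[t]                           # read with the query
    return out

rng = np.random.default_rng(0)
T, d_k, d_v = 6, 8, 4
Q, K = rng.standard_normal((T, d_k)), rng.standard_normal((T, d_k))
V = rng.standard_normal((T, d_v))
G = 1 / (1 + np.exp(-rng.standard_normal((T, d_k))))  # gates in (0, 1)
print(gated_linear_attention(Q, K, V, G).shape)
```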
Multi-Task Interactive Robot Fleet Learning with Visual World Models
Recent advancements in large-scale multi-task robot learning offer the potential for deploying robot fleets in household and industrial settings, enabling them to perform diverse tasks across various environments. However, AI-enabled robots often face challenges with generalization and robustness when exposed to real-world variability and uncertainty. We introduce Sirius-Fleet, a multi-task interactive robot fleet learning framework to address these challenges. Sirius-Fleet monitors robot performance during deployment and involves humans to correct the robot's actions when necessary. We employ a visual world model to predict the outcomes of future actions and build anomaly predictors to predict whether they will likely result in anomalies. As the robot autonomy improves, the anomaly predictors automatically adapt their prediction criteria, leading to fewer requests for human intervention and gradually reducing human workload over time. Evaluations on large-scale benchmarks demonstrate Sirius-Fleet's effectiveness in improving multi-task policy performance and monitoring accuracy. We demonstrate Sirius-Fleet's performance in both RoboCasa in simulation and Mutex in the real world, two diverse, large-scale multi-task benchmarks. More information is available on the project website: https://ut-austin-rpl.github.io/sirius-fleet
Updated: 2024-10-30 04:49:39
标题: 多任务交互式机器人舰队学习与视觉世界模型
摘要: 最近在大规模多任务机器人学习方面取得的进展为在家庭和工业环境中部署机器人舰队提供了潜力,使它们能够在不同环境中执行多样化任务。然而,当AI启用的机器人面临真实世界的变化和不确定性时,往往面临泛化和稳健性方面的挑战。我们引入了Sirius-Fleet,一个多任务交互式机器人舰队学习框架,以解决这些挑战。Sirius-Fleet在部署过程中监控机器人的性能,并在必要时涉及人类来纠正机器人的行动。我们利用视觉世界模型来预测未来行动的结果,并构建异常预测器来预测它们是否可能导致异常。随着机器人自主性的提高,异常预测器会自动调整其预测标准,从而减少对人类干预的请求,并逐渐减少人类工作量。在大规模基准测试上的评估表明,Sirius-Fleet在提高多任务策略性能和监控准确性方面的有效性。我们在仿真中展示了Sirius-Fleet在RoboCasa和真实世界中的表现,这两个多样化、大规模多任务基准测试。有关更多信息,请访问项目网站:https://ut-austin-rpl.github.io/sirius-fleet
更新时间: 2024-10-30 04:49:39
领域: cs.RO,cs.AI
Reward Difference Optimization For Sample Reweighting In Offline RLHF
With the rapid advances in Large Language Models (LLMs), aligning LLMs with human preferences becomes increasingly important. Although Reinforcement Learning with Human Feedback (RLHF) proves effective, it is complicated and highly resource-intensive. As such, offline RLHF has been introduced as an alternative solution, which directly optimizes LLMs with ranking losses on a fixed preference dataset. Current offline RLHF only captures the "ordinal relationship" between responses, overlooking the crucial aspect of how much one is preferred over the others. To address this issue, we propose a simple yet effective solution called Reward Difference Optimization, RDO for short. Specifically, we introduce reward difference coefficients to reweight sample pairs in offline RLHF. We then develop a difference model that captures rich interactions between a pair of responses for predicting these difference coefficients. Experiments with 7B LLMs on the HH and TL;DR datasets substantiate the effectiveness of our method in both automatic metrics and human evaluation, thereby highlighting its potential for aligning LLMs with human intent and values.
Updated: 2024-10-30 04:47:00
标题: Offline RLHF中的样本重新加权奖励差异优化
摘要: 随着大语言模型(LLMs)的快速发展,将LLMs与人类偏好对齐变得越来越重要。尽管使用人类反馈的强化学习(RLHF)被证明是有效的,但它复杂且资源密集。因此,离线RLHF被引入作为一种替代解决方案,直接在固定偏好数据集上使用排名损失优化LLMs。当前的离线RLHF只捕捉了响应之间的“顺序关系”,忽略了一个响应相对于其他响应的偏好程度这一关键方面。为了解决这个问题,我们提出了一种简单而有效的解决方案,称为奖励差异优化(RDO)。具体来说,我们引入奖励差异系数来重新加权离线RLHF中的样本对。然后,我们开发了一个差异模型,捕捉一对响应之间丰富的交互以预测这些差异系数。在HH和TL;DR数据集上对7B LLMs进行的实验证实了我们的方法在自动指标和人类评估中的有效性,从而突显了其将LLMs与人类意图和价值对齐的潜力。
更新时间: 2024-10-30 04:47:00
领域: cs.CL,cs.AI
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
Accurately quantifying uncertainty in large language models (LLMs) is crucial for their reliable deployment, especially in high-stakes applications. Current state-of-the-art methods for measuring semantic uncertainty in LLMs rely on strict bidirectional entailment criteria between multiple generated responses and also depend on sequence likelihoods. While effective, these approaches often overestimate uncertainty due to their sensitivity to minor wording differences, additional correct information, and non-important words in the sequence. We propose a novel approach that leverages semantic embeddings to achieve smoother and more robust estimation of semantic uncertainty in LLMs. By capturing semantic similarities without depending on sequence likelihoods, our method inherently reduces any biases introduced by irrelevant words in the answers. Furthermore, we introduce an amortised version of our approach by explicitly modelling semantics as latent variables in a joint probabilistic model. This allows for uncertainty estimation in the embedding space with a single forward pass, significantly reducing computational overhead compared to existing multi-pass methods. Experiments across multiple question-answering datasets and frontier LLMs demonstrate that our embedding-based methods provide more accurate and nuanced uncertainty quantification than traditional approaches.
Updated: 2024-10-30 04:41:46
标题: 通过语义嵌入改进大型语言模型中的不确定性量化
摘要: 准确量化大型语言模型(LLMs)中的不确定性对于它们的可靠部署至关重要,特别是在高风险应用中。目前用于测量LLMs中语义不确定性的最先进方法依赖于多个生成的响应之间严格的双向蕴涵标准,并且还依赖于序列可能性。尽管有效,这些方法通常会由于对微小措辞差异、额外的正确信息和序列中的无关紧要的词语敏感而高估不确定性。我们提出了一种新颖的方法,利用语义嵌入来实现对LLMs中语义不确定性的更平滑和更健壮的估计。通过捕获语义相似性而不依赖于序列可能性,我们的方法本质上减少了答案中无关词语引入的任何偏见。此外,我们引入了我们方法的一种摊销版本,通过在联合概率模型中明确地对语义建模为潜变量。这允许在嵌入空间中进行不确定性估计,只需进行一次前向传递,与现有的多次传递方法相比,显著减少了计算开销。跨多个问答数据集和前沿LLMs进行的实验表明,我们基于嵌入的方法比传统方法提供更准确和更细致的不确定性量化。
更新时间: 2024-10-30 04:41:46
领域: cs.LG,cs.AI,cs.CL
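A minimal sketch of the embedding-based idea: sample several answers, embed them with any sentence encoder, and score uncertainty by average pairwise cosine dissimilarity, so paraphrases of one meaning stay low-uncertainty. The paper's estimator and its amortised latent-variable version differ in detail; this only illustrates the principle, with random vectors standing in for real embeddings.

```python
import numpy as np

def semantic_uncertainty(embeddings):
    """Uncertainty as mean pairwise cosine dissimilarity among
    embeddings of N sampled responses: tight semantic clusters give
    low scores regardless of wording differences."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E.T
    n = len(E)
    mean_off_diag = (sims.sum() - n) / (n * (n - 1))
    return 1.0 - mean_off_diag

rng = np.random.default_rng(0)
# Eight near-identical "answers" vs. eight unrelated ones.
consistent = rng.standard_normal((1, 32)) + 0.05 * rng.standard_normal((8, 32))
scattered = rng.standard_normal((8, 32))
print(semantic_uncertainty(consistent), semantic_uncertainty(scattered))
```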
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the scale of MLLMs continues to grow, parameter-efficient finetuning becomes increasingly critical. However, most existing parameter-efficient approaches focus only on single modalities and often overlook the multimodal characteristics during finetuning. In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs. M$^2$PT effectively integrates visual and textual prompts into the vision encoder and language processor respectively during finetuning, facilitating the extraction and alignment of features across modalities. Empirical results on various multimodal evaluation datasets demonstrate the superior performance of our approach compared to several state-of-the-art baselines. A comprehensive set of ablation studies validates the effectiveness of our prompt design and the efficiency of our approach.
Updated: 2024-10-30 04:38:52
标题: M$^2$PT:用于零-shot指令学习的多模态提示调整
摘要: 多模态大型语言模型(MLLMs)在各个领域展现出卓越的性能,越来越强调增强它们在不同模态下未见任务的零射泛化能力。指令微调已经成为一种有效的策略,通过在多样的多模态任务上微调预训练模型实现零射泛化。随着MLLMs规模的不断扩大,参数高效微调变得越来越关键。然而,大多数现有的参数高效方法仅关注单一模态,并且在微调过程中经常忽视多模态特征。在这项工作中,我们引入了一种新颖的多模态提示微调(M$^2$PT)方法,用于高效微调MLLMs。M$^2$PT在微调过程中有效地将视觉和文本提示分别整合到视觉编码器和语言处理器中,促进跨模态特征的提取和对齐。在各种多模态评估数据集上的实证结果表明,与几种最先进的基线方法相比,我们的方法表现出更好的性能。一系列全面的消融研究验证了我们的提示设计的有效性和我们方法的效率。
更新时间: 2024-10-30 04:38:52
领域: cs.AI,cs.CL,cs.LG
Linear Transformers are Versatile In-Context Learners
Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that each layer of a linear transformer maintains a weight vector for an implicit linear regression problem and can be interpreted as performing a variant of preconditioned gradient descent. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm, surpassing or matching in performance many reasonable baselines. We analyze this algorithm and show that it is a novel approach incorporating momentum and adaptive rescaling based on noise levels. Our findings show that even linear transformers possess the surprising ability to discover sophisticated optimization strategies.
Updated: 2024-10-30 04:27:00
标题: 线性变换器是多功能的上下文学习器
摘要: 最近的研究表明,变换器,尤其是线性注意力模型,在前向推断步骤中会对上下文中提供的数据隐式地执行类似梯度下降的算法。然而,它们处理更复杂问题的能力尚未被探索。在本文中,我们证明线性变换器的每一层都为一个隐式线性回归问题维护一个权重向量,并且可以被解释为执行一种预条件梯度下降的变体。我们还研究了在训练数据被不同程度噪声破坏的挑战性场景中使用线性变换器的情况。值得注意的是,我们证明对于这个问题,线性变换器发现了一种精巧而高效的优化算法,其性能超越或匹敌许多合理的基准方法。我们分析了这种算法,并表明它是一种结合了动量和基于噪声水平的自适应重缩放的新颖方法。我们的发现表明,即使是线性变换器也具有发现复杂优化策略的惊人能力。
更新时间: 2024-10-30 04:27:00
领域: cs.LG
Byzantine-Robust Federated Learning: An Overview With Focus on Developing Sybil-based Attacks to Backdoor Augmented Secure Aggregation Protocols
Federated Learning (FL) paradigms enable large numbers of clients to collaboratively train Machine Learning models on private data. However, due to their multi-party nature, traditional FL schemes are left vulnerable to Byzantine attacks that attempt to hurt model performance by injecting malicious backdoors. A wide variety of prevention methods have been proposed to protect frameworks from such attacks. This paper provides an exhaustive and updated taxonomy of existing methods and frameworks, before zooming in and conducting an in-depth analysis of the strengths and weaknesses of the Robustness of Federated Learning (RoFL) protocol. From there, we propose two novel Sybil-based attacks that take advantage of vulnerabilities in RoFL. Finally, we conclude with comprehensive proposals for future testing, describe and detail the implementation of the proposed attacks, and offer directions for improvements in the RoFL protocol as well as Byzantine-robust frameworks as a whole.
Updated: 2024-10-30 04:20:22
标题: 拜占庭强韧联邦学习:以发展基于赛伯尔攻击的后门增强安全聚合协议为重点的概述
摘要: 联合学习(FL)范式使大量客户端能够共同在私人数据上训练机器学习模型。然而,由于它们的多方性质,传统的FL方案容易受到拜占庭攻击的威胁,这些攻击试图通过注入恶意后门来损害模型性能。为了保护框架免受此类攻击,已经提出了各种防范方法。本文提供了现有方法和框架的详尽和最新的分类,然后着重对Federated Learning(RoFL)协议的鲁棒性进行深入分析,分析了其优势和劣势。在此基础上,我们提出了两种利用RoFL中的漏洞的新型Sybil攻击。最后,我们提出了未来测试的全面建议,描述和详细实现了提出的攻击,并为改进RoFL协议以及整体拜占庭鲁棒框架提供了方向。
更新时间: 2024-10-30 04:20:22
领域: cs.LG,cs.CR
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Can we generate a control policy for an agent using just one demonstration of desired behaviors as a prompt, as effortlessly as creating an image from a textual description? In this paper, we present Make-An-Agent, a novel policy parameter generator that leverages the power of conditional diffusion models for behavior-to-policy generation. Guided by behavior embeddings that encode trajectory information, our policy generator synthesizes latent parameter representations, which can then be decoded into policy networks. Trained on policy network checkpoints and their corresponding trajectories, our generation model demonstrates remarkable versatility and scalability on multiple tasks and has a strong generalization ability on unseen tasks to output well-performed policies with only few-shot demonstrations as inputs. We showcase its efficacy and efficiency on various domains and tasks, including varying objectives, behaviors, and even across different robot manipulators. Beyond simulation, we directly deploy policies generated by Make-An-Agent onto real-world robots on locomotion tasks. Project page: https://cheryyunl.github.io/make-an-agent/
Updated: 2024-10-30 04:16:58
标题: 生成代理:一个具有行为提示扩散的可泛化策略网络生成器
摘要: 我们能否仅通过一次所需行为的演示来生成一个代理的控制策略,就像从文本描述中创建图像一样轻松?在本文中,我们提出了Make-An-Agent,这是一个新颖的策略参数生成器,利用条件扩散模型的能力进行行为到策略的生成。受到编码轨迹信息的行为嵌入的指导,我们的策略生成器合成潜在参数表示,然后可以解码成策略网络。通过对策略网络检查点及其对应轨迹进行训练,我们的生成模型在多个任务上展现出了显著的多样性和可扩展性,并且在看不见的任务上具有强大的泛化能力,仅使用少量演示作为输入输出表现良好的策略。我们展示了它在各个领域和任务上的有效性和效率,包括不同目标、行为,甚至跨不同的机器人操作器。除了模拟,我们还直接将Make-An-Agent生成的策略部署到现实世界的机器人上,进行运动任务。项目页面:https://cheryyunl.github.io/make-an-agent/
更新时间: 2024-10-30 04:16:58
领域: cs.AI
2D-OOB: Attributing Data Contribution Through Joint Valuation Framework
Data valuation has emerged as a powerful framework for quantifying each datum's contribution to the training of a machine learning model. However, it is crucial to recognize that the quality of cells within a single data point can vary greatly in practice. For example, even in the case of an abnormal data point, not all cells are necessarily noisy. The single scalar score assigned by existing data valuation methods blurs the distinction between noisy and clean cells of a data point, making it challenging to interpret the data values. In this paper, we propose 2D-OOB, an out-of-bag estimation framework for jointly determining helpful (or detrimental) samples as well as the particular cells that drive them. Our comprehensive experiments demonstrate that 2D-OOB achieves state-of-the-art performance across multiple use cases while being exponentially faster. Specifically, 2D-OOB shows promising results in detecting and rectifying fine-grained outliers at the cell level, and localizing backdoor triggers in data poisoning attacks.
Updated: 2024-10-30 04:10:12
标题: 2D-OOB:通过联合估值框架归因数据贡献
摘要: 数据估值已经成为一个强大的框架,用于量化每个数据对机器学习模型训练的贡献。然而,在实践中,必须认识到单个数据点内的单元格质量可能会有很大差异。例如,即使在异常数据点的情况下,也并非所有单元格都一定是嘈杂的。现有数据估值方法分配的单一标量分数模糊了数据点的嘈杂和干净单元格之间的区别,使得难以解释数据值。在本文中,我们提出了2D-OOB,一个用于共同确定有益(或有害)样本以及驱动它们的特定单元格的袋外估计框架。我们的全面实验表明,2D-OOB在多种用例中实现了最先进的性能,同时速度指数级增长。具体来说,2D-OOB在单元格级别检测和纠正精细异常值,以及在数据污染攻击中定位后门触发器方面展现出有希望的结果。
更新时间: 2024-10-30 04:10:12
领域: cs.LG,cs.AI
Real-Time Recurrent Learning using Trace Units in Reinforcement Learning
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments. For agents that learn online and continually interact with the environment, it is desirable to train RNNs with real-time recurrent learning (RTRL); unfortunately, RTRL is prohibitively expensive for standard RNNs. A promising direction is to use linear recurrent architectures (LRUs), where dense recurrent weights are replaced with a complex-valued diagonal, making RTRL efficient. In this work, we build on these insights to provide a lightweight but effective approach for training RNNs in online RL. We introduce Recurrent Trace Units (RTUs), a small modification on LRUs that we nonetheless find to have significant performance benefits over LRUs when trained with RTRL. We find RTUs significantly outperform other recurrent architectures across several partially observable environments while using significantly less computation.
Updated: 2024-10-30 04:07:51
标题: 使用迹单位在强化学习中进行实时递归学习
摘要: 循环神经网络(RNNs)被用于在部分可观测的环境中学习表示。对于在线学习并持续与环境交互的代理,使用实时循环学习(RTRL)来训练RNNs是理想的;不幸的是,对于标准RNNs来说,RTRL的成本过高。一个有希望的方向是使用线性循环结构(LRUs),其中密集的循环权重被复数值对角线取代,使得RTRL更加高效。在这项工作中,我们基于这些见解提出了一种轻量但有效的方法来训练在线RL中的RNNs。我们引入了循环跟踪单元(RTUs),这是对LRUs的一个小修改,然而我们发现当使用RTRL训练时,RTUs相比LRUs具有显著的性能优势。我们发现RTUs在使用较少计算资源的情况下,在多个部分可观测的环境中明显优于其他循环结构。
更新时间: 2024-10-30 04:07:51
领域: cs.LG,cs.AI
Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Networks (CNN) across various computer vision tasks. However, akin to CNN, ViTs are vulnerable to backdoor attacks, where the adversary embeds the backdoor into the victim model, causing it to make wrong predictions about testing samples containing a specific trigger. Existing backdoor attacks against ViTs have the limitation of failing to strike an optimal balance between attack stealthiness and attack effectiveness. In this work, we propose an Attention Gradient-based Erosion Backdoor (AGEB) targeted at ViTs. Considering the attention mechanism of ViTs, AGEB selectively erodes pixels in areas of maximal attention gradient, embedding a covert backdoor trigger. Unlike previous backdoor attacks against ViTs, AGEB achieves an optimal balance between attack stealthiness and attack effectiveness, ensuring the trigger remains invisible to human detection while preserving the model's accuracy on clean samples. Extensive experimental evaluations across various ViT architectures and datasets confirm the effectiveness of AGEB, achieving a remarkable Attack Success Rate (ASR) without diminishing Clean Data Accuracy (CDA). Furthermore, the stealthiness of AGEB is rigorously validated, demonstrating minimal visual discrepancies between the clean and the triggered images.
Updated: 2024-10-30 04:06:12
标题: 通过基于注意力梯度的图像侵蚀对视觉变换器进行背门攻击
摘要: Vision Transformers(ViTs)在各种计算机视觉任务中表现优于传统的卷积神经网络(CNN)。然而,类似于CNN,ViTs容易受到后门攻击的影响,其中对手将后门嵌入受害模型中,导致其对包含特定触发器的测试样本进行错误预测。现有针对ViTs的后门攻击存在一个问题,即未能在攻击的潜在性和有效性之间实现最佳平衡。 在这项工作中,我们提出了一种基于注意力梯度的侵蚀后门攻击(AGEB),针对ViTs。考虑到ViTs的注意力机制,AGEB选择性地侵蚀在最大注意力梯度区域的像素,嵌入一个隐秘的后门触发器。与以往针对ViTs的后门攻击不同,AGEB实现了攻击潜在性和有效性之间的最佳平衡,确保触发器对人类检测保持不可见,同时在干净样本上保持模型的准确性。通过对各种ViT架构和数据集进行广泛的实验评估,证实了AGEB的有效性,实现了显著的攻击成功率(ASR)而不降低干净数据的准确性。此外,AGEB的潜在性也经过了严格验证,展示了干净图像和触发图像之间的最小视觉差异。
更新时间: 2024-10-30 04:06:12
领域: cs.CV,cs.AI
A Concentration Bound for TD(0) with Function Approximation
We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
Updated: 2024-10-30 04:00:16
标题: 使用函数逼近的TD(0)算法的集中度界限
摘要: 我们推导出一种类型的浓度界限,对于具有线性函数逼近的TD(0),对于某个$n_0$,对所有$n \geq n_0$都成立。我们使用来自潜在马尔可夫链的单个样本路径的在线TD学习进行工作。这使得我们的分析与离线TD学习或具有从马尔可夫链的稳态分布中获得独立样本的TD学习显着不同。我们将TD(0)视为一种收敛随机逼近算法,其中既有鞅噪声又有马尔可夫噪声。通过使用泊松方程处理马尔可夫噪声,通过使用弛缓浓度不等式的概念处理迭代的有界性几乎肯定保证的缺失。
更新时间: 2024-10-30 04:00:16
领域: cs.LG,cs.SY,eess.SY,stat.ML
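For orientation, the algorithm the bound concerns is the classic online update $\theta_{n+1} = \theta_n + a_n \phi(s_n)\big(r_n + \gamma \phi(s_{n+1})^\top \theta_n - \phi(s_n)^\top \theta_n\big)$, run along a single sample path. A toy sketch follows; the constant step size and the random three-state chain are illustrative, not the paper's setting.

```python
import numpy as np

def td0_update(theta, phi_s, phi_next, reward, gamma=0.95, alpha=0.05):
    """One online TD(0) step with linear function approximation:
    theta tracks V(s) ~ phi(s)^T theta. The parenthesized quantity
    is the TD error the concentration bound concerns."""
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    return theta + alpha * td_error * phi_s

# Toy 3-state chain with one-hot features, following one trajectory.
rng = np.random.default_rng(0)
phi = np.eye(3)
theta = np.zeros(3)
s = 0
for _ in range(1000):
    s_next = rng.integers(0, 3)          # stand-in Markov transition
    r = 1.0 if s_next == 2 else 0.0
    theta = td0_update(theta, phi[s], phi[s_next], r)
    s = s_next
print(theta)
```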
Is Function Similarity Over-Engineered? Building a Benchmark
Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen functions can reduce the time it takes to understand a new file. However, given the complexity of assembly, and the NP-hard nature of determining function equivalence, this task is extremely difficult. Common approaches often use sophisticated disassembly and decompilation tools, graph analysis, and other expensive pre-processing steps to perform function similarity searches over some corpus. In this work, we identify a number of discrepancies between the current research environment and the underlying application need. To remedy this, we build a new benchmark, REFuSE-Bench, for binary function similarity detection, consisting of high-quality datasets and tests that better reflect real-world use cases. In doing so, we address issues like data duplication and accurate labeling, experiment with real malware, and perform the first serious evaluation of ML binary function similarity models on Windows data. Our benchmark reveals that a new, simple baseline, one which looks at only the raw bytes of a function and requires no disassembly or other pre-processing, is able to achieve state-of-the-art performance in multiple settings. Our findings challenge conventional assumptions that complex models with highly-engineered features are being used to their full potential, and demonstrate that simpler approaches can provide significant value.
Updated: 2024-10-30 03:59:46
标题: 函数相似性是否过度设计?构建一个基准
摘要: 二进制分析是许多关键安全任务的核心组成部分,包括逆向工程、恶意软件分析和漏洞检测。手动分析通常耗时,但识别常用或先前见过的函数可以减少理解新文件所需的时间。然而,鉴于汇编的复杂性和确定函数等价性的NP难性质,这项任务非常困难。常见的方法通常使用复杂的反汇编和反编译工具、图分析和其他昂贵的预处理步骤,来在某个语料库上执行函数相似性搜索。在这项工作中,我们指出了当前研究环境与底层应用需求之间的一些差异。为了解决这个问题,我们构建了一个新的基准REFuSE-Bench,用于二进制函数相似性检测,包括高质量的数据集和更能反映真实用例的测试。在此过程中,我们解决了数据重复和准确标注等问题,使用真实恶意软件进行实验,并首次在Windows数据上对机器学习二进制函数相似性模型进行了严格评估。我们的基准测试显示,一个新的、简单的基线(仅查看函数的原始字节,不需要反汇编或其他预处理)能够在多种设置中实现最先进的性能。我们的发现挑战了"具有高度工程化特征的复杂模型已被充分发挥潜力"这一传统假设,并证明了更简单的方法可以提供显著价值。
更新时间: 2024-10-30 03:59:46
领域: cs.LG,cs.CR
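As a sketch of the kind of raw-bytes baseline the benchmark above surfaced, the snippet below counts byte 2-grams of a function body in a fixed-size vector and compares functions by cosine similarity; this is a hedged illustration of the "no disassembly, no pre-processing" idea, not the paper's exact model.

import numpy as np

def byte_signature(func_bytes: bytes, n: int = 2) -> np.ndarray:
    # Count byte n-grams over the raw function body; no disassembly,
    # decompilation, or graph analysis is involved.
    sig = np.zeros(256 ** n)
    for i in range(len(func_bytes) - n + 1):
        sig[int.from_bytes(func_bytes[i:i + n], "big")] += 1
    norm = np.linalg.norm(sig)
    return sig / norm if norm else sig

def similarity(a: bytes, b: bytes) -> float:
    return float(byte_signature(a) @ byte_signature(b))  # cosine similarity

# Two x86-64 prologues differing by a single NOP score as highly similar:
print(similarity(b"\x55\x48\x89\xe5\xc3", b"\x55\x48\x89\xe5\x90\xc3"))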
Designing Algorithmic Recommendations to Achieve Human-AI Complementarity
Algorithms frequently assist, rather than replace, human decision-makers. However, the design and analysis of algorithms often focus on predicting outcomes and do not explicitly model their effect on human decisions. This discrepancy between the design and role of algorithmic assistants becomes particularly concerning in light of empirical evidence that suggests that algorithmic assistants again and again fail to improve human decisions. In this article, we formalize the design of recommendation algorithms that assist human decision-makers without making restrictive ex-ante assumptions about how recommendations affect decisions. We formulate an algorithmic-design problem that leverages the potential-outcomes framework from causal inference to model the effect of recommendations on a human decision-maker's binary treatment choice. Within this model, we introduce a monotonicity assumption that leads to an intuitive classification of human responses to the algorithm. Under this assumption, we can express the human's response to algorithmic recommendations in terms of their compliance with the algorithm and the active decision they would take if the algorithm sends no recommendation. We showcase the utility of our framework using an online experiment that simulates a hiring task. We argue that our approach can make sense of the relative performance of different recommendation algorithms in the experiment and can help design solutions that realize human-AI complementarity. Finally, we leverage our approach to derive minimax optimal recommendation algorithms that can be implemented with machine learning using limited training data.
Updated: 2024-10-30 03:56:34
标题: 设计算法推荐以实现人工智能与人类互补性
摘要: 算法通常是协助而非取代人类决策者。然而,算法的设计和分析往往侧重于预测结果,而并未明确建模其对人类决策的影响。鉴于经验证据表明算法助手一再未能改善人类决策,算法助手的设计与其实际角色之间的这种脱节尤为令人担忧。在本文中,我们形式化了协助人类决策者的推荐算法的设计,而不对推荐如何影响决策做出限制性的先验假设。我们构建了一个算法设计问题,利用因果推断中的潜在结果框架来建模推荐对人类决策者二元处理选择的影响。在这个模型中,我们引入了一个单调性假设,从而对人类对算法的反应进行直观分类。在该假设下,我们可以用人类对算法的依从性,以及在算法不发出推荐时他们会采取的主动决策,来刻画人类对算法推荐的响应。我们利用一个模拟招聘任务的在线实验展示了我们框架的实用性。我们认为,我们的方法可以解释实验中不同推荐算法的相对表现,并有助于设计实现人类与人工智能互补性的解决方案。最后,我们利用该方法推导出极小化极大(minimax)最优的推荐算法,这些算法可以在有限训练数据下通过机器学习来实现。
更新时间: 2024-10-30 03:56:34
领域: cs.HC,cs.LG,econ.EM,stat.ML
Dynamic PET Image Prediction Using a Network Combining Reversible and Irreversible Modules
Dynamic positron emission tomography (PET) images can reveal the distribution of tracers in the organism and the dynamic processes involved in biochemical reactions, and it is widely used in clinical practice. Despite the high effectiveness of dynamic PET imaging in studying the kinetics and metabolic processes of radiotracers, prolonged scan times can cause discomfort for both patients and medical personnel. This study proposes a dynamic frame prediction method for dynamic PET imaging, reducing dynamic PET scanning time by applying a multi-module deep learning framework composed of reversible and irreversible modules. The network can predict kinetic parameter images based on the early frames of dynamic PET images, and then generate complete dynamic PET images. In validation experiments with simulated data, our network demonstrated good predictive performance for kinetic parameters and was able to reconstruct high-quality dynamic PET images. Additionally, in clinical data experiments, the network exhibited good generalization performance, indicating that the proposed method has promising clinical application prospects.
Updated: 2024-10-30 03:52:21
标题: 使用结合可逆和不可逆模块的网络进行动态PET图像预测
摘要: 动态正电子发射断层扫描(PET)图像可以显示示踪剂在机体内的分布以及生物化学反应中涉及的动态过程,广泛应用于临床实践。尽管动态PET成像在研究放射性示踪剂的动力学和代谢过程方面具有高效性,但长时间的扫描可能会给患者和医护人员带来不适。本研究提出了一种动态PET成像的动态帧预测方法,通过应用由可逆和不可逆模块组成的多模块深度学习框架,减少动态PET扫描时间。该网络可以基于动态PET图像的早期帧预测动力学参数图像,然后生成完整的动态PET图像。在模拟数据的验证实验中,我们的网络展示了对动力学参数的良好预测性能,并能够重建高质量的动态PET图像。此外,在临床数据实验中,网络表现出良好的泛化性能,表明所提出的方法具有良好的临床应用前景。
更新时间: 2024-10-30 03:52:21
领域: eess.IV,cs.LG
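The entry above does not spell out the module design, so the sketch below shows a generic additive-coupling reversible block of the sort such frameworks typically mix with ordinary (irreversible) layers; the point is that inputs are exactly recoverable from outputs, so activations need not be stored for backpropagation. The layer sizes are assumptions.

import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    # Additive coupling (RevNet-style); the paper's exact layout may differ.
    def __init__(self, ch: int):
        super().__init__()
        conv = lambda: nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1))
        self.F, self.G = conv(), conv()

    def forward(self, x1, x2):
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return y1, y2

    def inverse(self, y1, y2):        # exact reconstruction of the inputs
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return x1, x2

blk = ReversibleBlock(8)
x1, x2 = torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16)
y1, y2 = blk(x1, x2)
r1, r2 = blk.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))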
Calibrating Practical Privacy Risks for Differentially Private Machine Learning
Differential privacy quantifies privacy through the privacy budget $\epsilon$, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has highlighted that with the same theoretical $\epsilon$ setting, the likelihood-ratio-based membership inference (LiRA) attacking success rate (ASR) may vary according to specific datasets and models, which might be a better indicator for evaluating real-world privacy risks. Inspired by this practical privacy measure, we study the approaches that can lower the attacking success rate to allow for more flexible privacy budget settings in model training. We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility. We use the SHAP and LIME model explainer to evaluate feature sensitivities and develop feature-masking strategies. Our findings demonstrate that the LiRA $ASR^M$ on model $M$ can properly indicate the inherent privacy risk of a dataset for modeling, and it's possible to modify datasets to enable the use of larger theoretical $\epsilon$ settings to achieve equivalent practical privacy protection. We have conducted extensive experiments to show the inherent link between ASR and the dataset's privacy risk. By carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection and relaxed $\epsilon$ settings. The implementation details are shared online at the provided GitHub URL \url{https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/}.
Updated: 2024-10-30 03:52:01
标题: 校准差分隐私机器学习的实际隐私风险
摘要: 差分隐私通过隐私预算$\epsilon$来量化隐私,然而其实际解释因模型和数据集的差异而变得复杂。最近关于差分隐私机器学习和成员推断的研究突显出,在相同的理论$\epsilon$设置下,基于似然比的成员推断(LiRA)攻击成功率(ASR)可能会根据特定数据集和模型而变化,这可能是评估现实世界隐私风险的更好指标。受到这种实际隐私度量的启发,我们研究了可以降低攻击成功率的方法,以允许在模型训练中设置更灵活的隐私预算。我们发现通过选择性地抑制隐私敏感特征,可以在不影响特定应用数据效用的情况下实现更低的ASR值。我们使用SHAP和LIME模型解释器来评估特征敏感性,并开发特征掩码策略。我们的研究结果表明,模型$M$上的LiRA $ASR^M$可以恰当地指示数据集用于建模的固有隐私风险,并且可以修改数据集,以便在使用更大的理论$\epsilon$设置的同时实现等效的实际隐私保护。我们进行了大量实验,展示了ASR与数据集隐私风险之间的固有联系。通过精心选择要屏蔽的特征,我们可以在等效的实际隐私保护和放宽的$\epsilon$设置下保留更多数据效用。实现细节在\url{https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/}上共享。
更新时间: 2024-10-30 03:52:01
领域: cs.LG,cs.CR
The Impact of Quantum-Safe Cryptography (QSC) on Website Response
Modern web traffic relies on 2048-bit RSA encryption to secure our data in transit. Rapid advances in Quantum Computing pose a grave challenge by allowing hackers to break this encryption in hours. In August of 2024, the National Institute of Standards and Technology published Quantum-Safe Cryptography (QSC) standards, including CRYSTALS-Kyber for general encryption and CRYSTALS-Dilithium, FALCON, and SPHINCS+ for digital signatures. Despite this proactive approach, the slow adoption of encryption protocols remains a concern, leaving a significant portion of data vulnerable to interception. In this context, this study aims to evaluate the impact of NIST's Quantum-Resistant Cryptographic Algorithms on website response times, particularly focusing on SSL handshake time and total download time under varying network conditions. By assessing the performance of these algorithms, this research seeks to provide empirical evidence and a reusable framework for validating the efficacy of QSC in real-world scenarios. It was found that the QSC algorithms outperformed the classical algorithm under normal and congested network conditions. There was also found to be an improvement in the total download time for larger file sizes, and a better performance by QSC under higher latency and packet loss conditions. Therefore, this study recommends that websites switch to QSC when the standards are ratified. These insights are crucial for accelerating the adoption of QSC and ensuring the security of data in the face of quantum computing threats.
Updated: 2024-10-30 03:44:46
标题: 量子安全加密(QSC)对网站响应的影响
摘要: 现代网络流量依赖于2048位RSA加密来保护我们的数据传输安全。量子计算的快速进步带来了严峻的挑战,使黑客能够在几小时内破解这种加密。2024年8月,美国国家标准与技术研究所发布了量子安全密码学(QSC)标准,包括CRYSTALS-Kyber用于一般加密,以及CRYSTALS-Dilithium、FALCON和SPHINCS+用于数字签名。尽管采取了这种积极的方法,加密协议的缓慢采用仍然是一个问题,导致大量数据容易被拦截。在这种背景下,本研究旨在评估NIST的量子抗性密码算法对网站响应时间的影响,特别关注SSL握手时间和在不同网络条件下的总下载时间。通过评估这些算法的性能,本研究旨在为验证QSC在现实场景中的有效性提供经验证据和可重复使用的框架。研究发现,在正常和拥挤的网络条件下,QSC算法优于经典算法。对于较大的文件大小,总下载时间也有所改善,并且在高延迟和丢包条件下,QSC表现更好。因此,本研究建议网站在标准得到认可时转向QSC。这些见解对于加速QSC的采用并确保数据安全面对量子计算威胁至关重要。
更新时间: 2024-10-30 03:44:46
领域: cs.CR,cs.NI
A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are `bound' together to produce a new vector in the same space. VSAs support the commutativity and associativity of this binding operation, along with an inverse operation, allowing one to construct symbolic-style manipulations over real-valued vectors. Most VSAs were developed before deep learning and automatic differentiation became popular and instead focused on efficacy in hand-designed systems. In this work, we introduce the Hadamard-derived linear Binding (HLB), which is designed to have favorable computational efficiency and efficacy in classic VSA tasks, and to perform well in differentiable systems. Code is available at https://github.com/FutureComputing4AI/Hadamard-derived-Linear-Binding
Updated: 2024-10-30 03:42:59
标题: 一个沃尔什哈达马衍生的线性矢量符号架构
摘要: 向量符号体系结构(VSAs)是发展神经符号人工智能的一种方法,其中$\mathbb{R}^d$中的两个向量被"绑定"在一起,产生同一空间中的新向量。VSAs支持此绑定操作的可交换性和可结合性,以及一个逆操作,从而允许在实值向量上构建符号风格的操作。大多数VSAs是在深度学习和自动微分流行之前开发的,因而侧重于在手工设计系统中的有效性。在这项工作中,我们介绍了Hadamard派生的线性绑定(HLB),它旨在兼具良好的计算效率、在经典VSA任务中的有效性,以及在可微系统中的良好表现。代码可在https://github.com/FutureComputing4AI/Hadamard-derived-Linear-Binding 上找到。
更新时间: 2024-10-30 03:42:59
领域: cs.AI,cs.LG
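The abstract leaves HLB's exact operator to the paper, but one plausible Walsh-Hadamard-derived binding, shown below purely for intuition, multiplies vectors elementwise in the Hadamard domain. This toy construction is commutative, associative, linear in each argument, and exactly invertible, which are the properties the entry emphasizes; it should not be read as the authors' HLB.

import numpy as np
from scipy.linalg import hadamard

d = 64                              # hadamard() needs a power-of-two dimension
H = hadamard(d) / np.sqrt(d)        # orthonormal Walsh-Hadamard transform

def bind(a, b):
    # Elementwise product in the Hadamard domain: commutative and associative
    # because elementwise multiplication is, and linear in each argument.
    return H.T @ ((H @ a) * (H @ b))

def unbind(c, a):
    return H.T @ ((H @ c) / (H @ a))  # exact inverse when H @ a has no zeros

rng = np.random.default_rng(0)
a, b = rng.standard_normal(d), rng.standard_normal(d)
print(np.allclose(unbind(bind(a, b), a), b))  # True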
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representation. Despite their promising results, representations from human videos are inevitably subject to distribution shifts and lack the dynamics information crucial for task completion. We first evaluate various pre-trained representations in terms of their correlation to the downstream robotic manipulation tasks (i.e., manipulation centricity). Interestingly, we find that the "manipulation centricity" is a strong indicator of success rates when applied to downstream tasks. Drawing from these findings, we propose Manipulation Centric Representation (MCR), a foundation representation learning framework capturing both visual features and the dynamics information such as actions and proprioceptions of manipulation tasks to improve manipulation centricity. Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss. Empirical results across 4 simulation domains with 20 tasks verify that MCR outperforms the strongest baseline method by 14.8%. Moreover, MCR boosts the performance of data-efficient learning with a UR5e arm on 3 real-world tasks by 76.9%. Project website: https://robots-pretrain-robots.github.io/.
Updated: 2024-10-30 03:33:08
标题: 机器人预训练机器人:大型机器人数据集中的基于操作的机器人表示
摘要: 视觉表示的预训练已经提高了机器人学习的效率。由于缺乏大规模领域内的机器人数据集,先前的工作利用野外人类视频来预训练机器人视觉表示。尽管它们取得了令人期待的结果,但来自人类视频的表示不可避免地受到分布转移的影响,缺乏对任务完成至关重要的动态信息。我们首先评估各种预训练表示与下游机器人操作任务(即操作中心性)之间的相关性。有趣的是,我们发现“操作中心性”是应用于下游任务时成功率的强有力指标。基于这些发现,我们提出了操作中心性表示(MCR),这是一个基础表示学习框架,捕捉视觉特征和动态信息(如操作和操控任务的本体感知)以提高操作中心性。具体来说,我们在DROID机器人数据集上预训练一个视觉编码器,并利用与运动相关的数据,如机器人本体感知状态和动作。我们引入了一种新颖的对比损失,将视觉观察与机器人的本体感知状态-动作动态对齐,结合类似行为克隆(BC)的演员损失来在预训练期间预测动作,以及时间对比损失。通过对20个任务的4个模拟领域的实证结果验证,MCR的表现优于最强基线方法14.8%。此外,MCR通过UR5e机械臂在3个真实世界任务上提高了76.9%的数据有效学习性能。项目网站:https://robots-pretrain-robots.github.io/。
更新时间: 2024-10-30 03:33:08
领域: cs.RO,cs.AI,cs.CV
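The contrastive ingredient described in the entry above can be illustrated with a generic InfoNCE loss in which matched (visual, proprioceptive state-action) embedding pairs are positives and all other pairs in the batch are negatives; the encoders, temperature, and batch construction here are assumptions, not the paper's exact objective.

import torch
import torch.nn.functional as F

def align_loss(visual_z, dyn_z, tau=0.1):
    # InfoNCE-style alignment: row i of visual_z should match row i of dyn_z
    # (embeddings of the same timestep) and repel every other row.
    v = F.normalize(visual_z, dim=-1)
    d = F.normalize(dyn_z, dim=-1)
    logits = v @ d.T / tau
    target = torch.arange(len(v))
    return F.cross_entropy(logits, target)

print(align_loss(torch.randn(8, 32), torch.randn(8, 32)))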
Training-Free Exponential Context Extension via Cascading KV Cache
The transformer's context window is vital for tasks such as few-shot learning and conditional generation as it preserves previous tokens for active memory. However, as the context lengths increase, the computational costs grow quadratically, hindering the deployment of large language models (LLMs) in real-world, long sequence scenarios. Although some recent key-value caching (KV Cache) methods offer linear inference complexity, they naively manage the stored context, prematurely evicting tokens and losing valuable information. Moreover, they lack an optimized prefill/prompt stage strategy, resulting in higher latency than even quadratic attention for realistic context sizes. In response, we introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens, enabling the model to maintain longer context histories without increasing the cache size. Our approach outperforms linear caching baselines across key benchmarks, including streaming perplexity, question answering, book summarization, and passkey retrieval, where it retains better retrieval accuracy at 1M tokens after four doublings of the cache size of 65K. Additionally, our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens. These innovations not only enhance the computational efficiency of LLMs but also pave the way for their effective deployment in resource-constrained environments, enabling large-scale, real-time applications with significantly reduced latency.
Updated: 2024-10-30 03:31:09
标题: 无需训练的指数上下文扩展方法:通过级联键值缓存实现
摘要: 变压器的上下文窗口对于少样本学习和条件生成等任务至关重要,因为它保留了先前的标记作为活跃记忆。然而,随着上下文长度的增加,计算成本呈二次增长,阻碍了大型语言模型(LLMs)在真实世界长序列场景中的部署。尽管一些最近的键值缓存(KV缓存)方法提供线性推理复杂度,但它们简单粗暴地管理存储的上下文,过早地驱逐标记并丢失宝贵信息。此外,它们缺乏优化的预填(prompt)阶段策略,导致在现实上下文长度下的延迟甚至高于二次注意力。作为回应,我们引入了一种新颖的机制,利用级联子缓存缓冲区选择性地保留最相关的标记,使模型能够在不增加缓存大小的情况下维持更长的上下文历史。我们的方法在关键基准测试中优于线性缓存基线,包括流式困惑度、问答、书籍摘要和密钥(passkey)检索;在缓存大小为65K、再经过四次翻倍后,于1M标记处保持更好的检索准确性。此外,与在1M标记上的Flash Attention相比,我们的方法将预填阶段的延迟降低了6.8倍。这些创新不仅提高了LLMs的计算效率,还为它们在资源受限环境中的有效部署铺平了道路,实现了延迟显著降低的大规模实时应用。
更新时间: 2024-10-30 03:31:09
领域: cs.CL,cs.AI,cs.LG
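As rough intuition for the cascading buffers in the entry above, the hypothetical sketch below passes every token evicted from one fixed-size sub-cache to the next, which accepts only every other arrival, so the retained history span grows with each buffer while total memory stays fixed. The real method scores tokens for relevance instead of this toy alternation, and caches keys/values rather than token ids.

class CascadingCache:
    def __init__(self, n_buffers=4, cap=4):
        self.buffers = [[] for _ in range(n_buffers)]
        self.cap = cap
        self.toggle = [False] * n_buffers    # per-level accept/skip state

    def add(self, token, level=0):
        if level >= len(self.buffers):
            return                           # finally discarded
        if level > 0:                        # deeper buffers sub-sample
            self.toggle[level] = not self.toggle[level]
            if not self.toggle[level]:
                return
        buf = self.buffers[level]
        buf.append(token)
        if len(buf) > self.cap:
            self.add(buf.pop(0), level + 1)  # cascade the evicted token

cache = CascadingCache()
for t in range(64):
    cache.add(t)
print(cache.buffers)  # deeper buffers hold sparser, older context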
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Machine learning (ML) for text classification has been widely used in various domains, such as toxicity detection, chatbot consulting, and review analysis. These applications can significantly impact ethics, economics, and human behavior, raising serious concerns about trusting ML decisions. Several studies indicate that traditional metrics, such as model confidence and accuracy, are insufficient to build human trust in ML models. These models often learn spurious correlations during training and predict based on them during inference. In the real world, where such correlations are absent, their performance can deteriorate significantly. To avoid this, a common practice is to test whether predictions are reasonable. Along with this, a challenge known as the trustworthiness oracle problem has been introduced. Due to the lack of automated trustworthiness oracles, the assessment requires manual validation of the decision process disclosed by explanation methods, which is time-consuming and not scalable. We propose TOKI, the first automated trustworthiness oracle generation method for text classifiers, which automatically checks whether the prediction-contributing words are related to the predicted class using explanation methods and word embeddings. To demonstrate its practical usefulness, we introduce a novel adversarial attack method targeting trustworthiness issues identified by TOKI. We compare TOKI with a naive baseline based solely on model confidence using human-created ground truths of 6,000 predictions. We also compare TOKI-guided adversarial attack method with A2T, a SOTA adversarial attack method. Results show that relying on prediction uncertainty cannot distinguish between trustworthy and untrustworthy predictions, TOKI achieves 142% higher accuracy than the naive baseline, and TOKI-guided adversarial attack method is more effective with fewer perturbations than A2T.
Updated: 2024-10-30 03:26:37
标题: 机器学习文本分类器的自动生成可信赖的信任度Oracle
摘要: 文本分类的机器学习(ML)已被广泛应用于各个领域,如毒性检测、聊天机器人咨询和评论分析。这些应用可以显著影响道德、经济和人类行为,引起了对信任ML决策的严重关注。几项研究表明,传统指标,如模型置信度和准确性,不足以建立人类对ML模型的信任。这些模型在训练过程中通常学习到虚假相关性,并在推断过程中基于这些相关性进行预测。在现实世界中,当这种相关性不存在时,它们的性能可能会显著下降。为了避免这种情况,一个常见做法是测试预测是否合理。与此同时,一种称为可信度神谕问题的挑战也随之出现。由于缺乏自动化的可信度神谕,评估需要对解释方法披露的决策过程进行手动验证,这既耗时又不可扩展。我们提出了TOKI,这是首个面向文本分类器的自动化可信度神谕生成方法,它使用解释方法和词嵌入自动检查对预测有贡献的词语与预测类别是否相关。为了展示其实用性,我们引入了一种针对TOKI所识别的可信度问题的新颖对抗攻击方法。我们使用人工创建的6,000个预测的真实标签,将TOKI与仅基于模型置信度的朴素基线进行比较。我们还将TOKI引导的对抗攻击方法与最先进的对抗攻击方法A2T进行比较。结果表明,依赖预测不确定性无法区分可信和不可信的预测;TOKI的准确性比朴素基线高出142%;并且TOKI引导的对抗攻击方法比A2T更有效,且所需扰动更少。
更新时间: 2024-10-30 03:26:37
领域: cs.SE,cs.CL,cs.CR
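The core check in the entry above can be sketched as scoring, in an embedding space, the similarity between each word an explainer credits for the prediction and the predicted class name; the related_to_class helper, the toy embedding table, and the 0.4 threshold below are illustrative assumptions, not TOKI's actual components.

import numpy as np

def related_to_class(contributing_words, class_name, emb, threshold=0.4):
    # Flag a prediction as suspect when none of its contributing words is
    # semantically close to the predicted class in embedding space.
    sims = {w: float(emb(w) @ emb(class_name)) for w in contributing_words}
    return max(sims.values()) >= threshold, sims

# Toy random table standing in for real word vectors such as GloVe:
rng = np.random.default_rng(0)
words = ["toxic", "insult", "weather", "hate"]
table = {w: rng.standard_normal(8) for w in words}
emb = lambda w: table[w] / np.linalg.norm(table[w])
print(related_to_class(["insult", "weather"], "toxic", emb))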
$\textbf{EMOS}$: $\textbf{E}$mbodiment-aware Heterogeneous $\textbf{M}$ulti-robot $\textbf{O}$perating $\textbf{S}$ystem with LLM Agents
Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone. Current large-language-model-based multi-agent systems (LLM-based MAS) have shown success in areas like software development and operating systems, but applying these systems to robot control presents unique challenges. In particular, the capabilities of each agent in a multi-robot system are inherently tied to the physical composition of the robots, rather than predefined roles. To address this issue, we introduce a novel multi-agent framework designed to enable effective collaboration among heterogeneous robots with varying embodiments and capabilities, along with a new benchmark named Habitat-MAS. One of our key designs is $\textit{Robot Resume}$: Instead of adopting human-designed role play, we propose a self-prompted approach, where agents comprehend robot URDF files and call robot kinematics tools to generate descriptions of their physics capabilities to guide their behavior in task planning and action execution. The Habitat-MAS benchmark is designed to assess how a multi-agent framework handles tasks that require embodiment-aware reasoning, which includes 1) manipulation, 2) perception, 3) navigation, and 4) comprehensive multi-floor object rearrangement. The experimental results indicate that the robot's resume and the hierarchical design of our multi-agent system are essential for the effective operation of the heterogeneous multi-robot system within this intricate problem context.
Updated: 2024-10-30 03:20:01
标题: EMOS:具有LLM智能体的具身感知异构多机器人操作系统
摘要: 异构多机器人系统(HMRS)已经成为解决单个机器人无法独立完成的复杂任务的一种强大方法。当前基于大型语言模型的多智能体系统(LLM-based MAS)在软件开发和操作系统等领域取得了成功,但将这些系统应用于机器人控制提出了独特的挑战。特别是,在多机器人系统中,每个智能体的能力与机器人的物理构成密切相关,而不是由预定义的角色决定。为了解决这个问题,我们引入了一个新颖的多智能体框架,旨在实现具有不同具身形态和能力的异构机器人之间的有效协作,并提出了一个名为Habitat-MAS的新基准。我们的一个关键设计是"机器人简历":我们提出了一种自我提示的方法,而非采用人为设计的角色扮演,即智能体理解机器人的URDF文件,并调用机器人运动学工具生成对自身物理能力的描述,以指导其在任务规划和动作执行中的行为。Habitat-MAS基准旨在评估多智能体框架如何处理需要具身感知推理的任务,包括1)操纵,2)感知,3)导航和4)全面的多层物体重新排列。实验结果表明,机器人简历和我们多智能体系统的分层设计,对于异构多机器人系统在这一复杂问题情境下的有效运行至关重要。
更新时间: 2024-10-30 03:20:01
领域: cs.RO,cs.AI,cs.MA,I.2.7; I.2.8; I.2.9; I.2.10
A Benchmark for AI-based Weather Data Assimilation
Recent advancements in Artificial Intelligence (AI) have led to the development of several Large Weather Models (LWMs) that rival State-Of-The-Art (SOTA) Numerical Weather Prediction (NWP) systems. Until now, these models have still relied on traditional NWP-generated analysis fields as input and are far from autonomous. Currently, scientists are increasingly focusing on developing data-driven data assimilation (DA) models for LWMs. To expedite advancements in this field and facilitate the operationalization of data-driven end-to-end weather forecasting systems, we propose DABench, a benchmark constructed by simulated observations, real-world observations, and ERA5 reanalysis. DABench contributes four standard features: (1) sparse and noisy observations provided for both simulated and real-world experiments; (2) a Skillful pre-trained Transformer-based weather prediction model, Sformer, designed to generate background fields while rigorously assessing the impact of assimilation outcomes on predictions; (3) standardized evaluation metrics for the model comparison; (4) a strong DA baseline, 4DVarFormerV2. Our experimental results demonstrate that the end-to-end weather forecasting system, integrating 4DVarFormerV2 and Sformer, can assimilate real-world observations, thereby facilitating a stable DA cycle lasting one year and achieving a skillful forecasting lead time of up to 7 days. The proposed DABench will significantly advance research in AI-based DA, AI-based weather forecasting, and related domains.
Updated: 2024-10-30 03:19:39
标题: 基于人工智能的气象数据同化基准
摘要: 人工智能(AI)的最新进展催生了若干可与最先进数值天气预报(NWP)系统相媲美的大型天气模型(LWMs)。到目前为止,这些模型仍然依赖传统NWP生成的分析场作为输入,远未实现自主。目前,科学家们越来越专注于为LWMs开发数据驱动的数据同化(DA)模型。为了加速该领域的进展并促进数据驱动的端到端天气预报系统的业务化,我们提出了DABench,这是一个由模拟观测、真实观测和ERA5再分析构建的基准。DABench提供了四个标准特性:(1)为模拟和真实实验提供稀疏且带噪声的观测数据;(2)一个性能优良的、预训练的基于Transformer的天气预测模型Sformer,用于生成背景场,并严格评估同化结果对预测的影响;(3)用于模型比较的标准化评估指标;(4)一个强大的DA基线4DVarFormerV2。我们的实验结果表明,集成4DVarFormerV2和Sformer的端到端天气预报系统可以同化真实观测数据,从而维持长达一年的稳定DA循环,并实现最长7天的有效预报时效。所提出的DABench将显著推动基于AI的DA、基于AI的天气预报及相关领域的研究。
更新时间: 2024-10-30 03:19:39
领域: cs.LG,cs.CV,physics.ao-ph
Integration of Large Language Models and Federated Learning
As the parameter size of Large Language Models (LLMs) continues to expand, there is an urgent need to address the scarcity of high-quality data. In response, existing research has attempted to make a breakthrough by incorporating Federated Learning (FL) into LLMs. Conversely, considering the outstanding performance of LLMs in task generalization, researchers have also tried applying LLMs within FL to tackle challenges in relevant domains. The complementarity between LLMs and FL has already ignited widespread research interest. In this paper, we aim to deeply explore the integration of LLMs and FL. We propose a research framework, dividing the fusion of LLMs and FL into three parts: the combination of LLM sub-technologies with FL, the integration of FL sub-technologies with LLMs, and the overall merger of LLMs and FL. We first provide a comprehensive review of the current state of research in the domain of LLMs combined with FL, including their typical applications, integration advantages, challenges faced, and future directions for resolution. Subsequently, we discuss the practical applications of the combination of LLMs and FL in critical scenarios such as healthcare, finance, and education, and provide new perspectives and insights into future research directions for LLMs and FL.
Updated: 2024-10-30 03:04:21
标题: 大型语言模型与联邦学习的整合
摘要: 随着大型语言模型(LLMs)的参数规模不断扩大,迫切需要解决高质量数据的稀缺问题。作为回应,现有研究已尝试通过将联邦学习(FL)整合到LLMs中来取得突破。相反地,考虑到LLMs在任务泛化中的出色表现,研究人员还尝试将LLMs应用于FL以解决相关领域的挑战。LLMs和FL之间的互补性已经引发了广泛的研究兴趣。本文旨在深入探讨LLMs和FL的整合。我们提出了一个研究框架,将LLMs和FL的融合分为三个部分:LLM子技术与FL的结合,FL子技术与LLMs的整合,以及LLMs和FL的整体合并。我们首先全面回顾了LLMs与FL结合领域的当前研究状态,包括它们的典型应用、整合优势、面临的挑战以及未来解决方向。随后,我们讨论了LLMs和FL组合在关键场景(如医疗保健、金融和教育)中的实际应用,并为LLMs和FL的未来研究方向提供新的观点和见解。
更新时间: 2024-10-30 03:04:21
领域: cs.LG,cs.AI,cs.CL
Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization
The clustering method based on the anchor graph has gained significant attention due to its exceptional clustering performance and ability to process large-scale data. One common approach is to learn bipartite graphs with K-connected components, helping avoid the need for post-processing. However, this method has strict parameter requirements and may not always get K-connected components. To address this issue, an alternative approach is to directly obtain the cluster label matrix by performing non-negative matrix factorization (NMF) on the anchor graph. Nevertheless, existing multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrix and often overlook the inter-view information. We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views. This approach allows us to consider inter-view information comprehensively. The decomposed tensors, namely the sample indicator tensor and the anchor indicator tensor, enhance the interpretability of the factorization. Extensive experiments validate the effectiveness of this method.
Updated: 2024-10-30 03:03:58
标题: 可解释的基于锚图张量分解的多视图聚类
摘要: 基于锚图的聚类方法因其出色的聚类性能和处理大规模数据的能力而备受关注。一种常见的方法是学习具有K个连接组件的二部图,有助于避免后处理的需要。然而,这种方法有严格的参数要求,可能并不总是得到K个连接组件。为了解决这个问题,另一种替代方法是通过在锚图上执行非负矩阵分解(NMF)直接获得聚类标签矩阵。然而,基于锚图分解的现有多视图聚类方法缺乏对分解矩阵的充分解释性,并经常忽视视图间的信息。我们通过使用非负张量分解来分解将来自多个视图的锚图组合在一起的锚图张量来解决这一限制。这种方法使我们能够全面考虑视图间的信息。分解后的张量,即样本指示器张量和锚点指示器张量,增强了分解的可解释性。大量实验证实了这种方法的有效性。
更新时间: 2024-10-30 03:03:58
领域: cs.LG
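For reference, the base operation the method above generalizes, reading cluster labels directly off a non-negative factorization of an anchor graph, can be sketched with scikit-learn's (matrix, not tensor) NMF; the random anchor graph below is only a stand-in, and the multi-view tensor factorization in the paper is more involved.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
B = np.abs(rng.standard_normal((100, 10)))   # stand-in anchor graph: samples x anchors
B /= B.sum(axis=1, keepdims=True)            # rows as anchor affinities

k = 3
U = NMF(n_components=k, init="nndsvda", max_iter=500).fit_transform(B)
labels = U.argmax(axis=1)                    # labels read off directly, no post-processing
print(np.bincount(labels))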
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a two-stage instruction tuning method that significantly boosts the performance of RAG. For effective retrieval, we introduce a dense retriever optimized for conversational QA, which yields results comparable to the alternative state-of-the-art query rewriting models, while substantially reducing deployment costs. We also present the ChatRAG Bench, which encompasses ten datasets covering comprehensive evaluations on RAG, table-related QA, arithmetic calculations, and scenarios involving unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a weaker foundation model than GPT-4, can slightly outperform GPT-4-0613 (score: 53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on the ChatRAG Bench, without relying on any synthetic data from OpenAI GPT models. Notably, the Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09, achieving a 4.4% improvement. To advance research in this field, we open-sourced the model weights, instruction tuning data, ChatRAG Bench, and retriever for the community: https://chatqa-project.github.io/.
Updated: 2024-10-30 02:58:14
标题: ChatQA:在对话问答和RAG上超越GPT-4
摘要: 在这项工作中,我们介绍了ChatQA,这是一套在检索增强生成(RAG)和对话问答(QA)方面优于GPT-4的模型。为了增强生成,我们提出了一种两阶段指导调整方法,可以显著提升RAG的性能。为了有效检索,我们引入了一种针对对话QA优化的密集检索器,其结果与备选的最新查询重写模型相媲美,同时大大降低了部署成本。我们还推出了ChatRAG Bench,涵盖了十个数据集,对RAG、与表格相关的QA、算术计算和涉及无法回答的问题的情景进行了全面评估。我们的ChatQA-1.0-70B(得分:54.14),建立在Llama2上,比GPT-4更弱的基础模型,可以在ChatRAG Bench上略微优于GPT-4-0613(得分:53.90)和GPT-4-Turbo-2024-04-09(得分:54.03),而无需依赖来自OpenAI GPT模型的任何合成数据。值得注意的是,Llama3-ChatQA-1.5-70B模型超过了GPT-4-Turbo-2024-04-09的准确性,实现了4.4%的改进。为了推动这一领域的研究,我们向社区开放了模型权重、指导调整数据、ChatRAG Bench和检索器:https://chatqa-project.github.io/。
更新时间: 2024-10-30 02:58:14
领域: cs.CL,cs.AI,cs.IR,cs.LG
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics
Enabling humanoid robots to clean rooms has long been a pursued dream within humanoid research communities. However, many tasks require multi-humanoid collaboration, such as carrying large and heavy furniture together. Given the scarcity of motion capture data on multi-humanoid collaboration and the efficiency challenges associated with multi-agent learning, these tasks cannot be straightforwardly addressed using training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a framework designed to tackle the challenge of multi-humanoid object transportation problem through a two-phase learning paradigm: individual skill learning and subsequent policy transfer. First, a single humanoid character learns to interact with objects through imitation learning from human motion priors. Then, the humanoid learns to collaborate with others by considering the shared dynamics of the manipulated object using centralized training and decentralized execution (CTDE) multi-agent RL algorithms. When one agent interacts with the object, resulting in specific object dynamics changes, the other agents learn to respond appropriately, thereby achieving implicit communication and coordination between teammates. Unlike previous approaches that relied on tracking-based methods for multi-humanoid HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-humanoid interactions, and can be seamlessly extended to include more participants and a wide range of object types.
Updated: 2024-10-30 02:58:10
标题: CooHOI:利用操纵物体动力学学习合作人-物体交互
摘要: 让人形机器人能够打扫房间一直是人形机器人研究社区长期追求的梦想。然而,许多任务需要多个人形机器人协作,例如一起搬运大型且沉重的家具。鉴于多人形协作的动作捕捉数据稀缺,以及多智能体学习固有的效率挑战,这些任务无法简单地通过为单智能体场景设计的训练范式来解决。在本文中,我们介绍了合作式人-物交互(CooHOI)框架,旨在通过两阶段学习范式解决多人形机器人搬运物体的难题:先进行个体技能学习,再进行策略迁移。首先,单个人形角色通过对人类运动先验的模仿学习来学会与物体交互。然后,人形角色利用集中训练、分散执行(CTDE)的多智能体强化学习算法,通过考虑被操纵物体的共享动力学来学习与他人协作。当一个智能体与物体交互并引起特定的物体动力学变化时,其他智能体学会做出适当响应,从而实现队友之间的隐式沟通与协调。与先前依赖基于跟踪方法的多人形HOI方法不同,CooHOI本质上是高效的,不依赖多人形交互的动作捕捉数据,并且可以无缝扩展到更多参与者和各种物体类型。
更新时间: 2024-10-30 02:58:10
领域: cs.RO,cs.AI
Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation
Continual Imitation Learning (CiL) involves extracting and accumulating task knowledge from demonstrations across multiple stages and tasks to achieve a multi-task policy. With recent advancements in foundation models, there has been a growing interest in adapter-based CiL approaches, where adapters are established parameter-efficiently for tasks newly demonstrated. While these approaches isolate parameters for specific tasks and tend to mitigate catastrophic forgetting, they limit knowledge sharing among different demonstrations. We introduce IsCiL, an adapter-based CiL framework that addresses this limitation of knowledge sharing by incrementally learning shareable skills from different demonstrations, thus enabling sample-efficient task adaptation using the skills particularly in non-stationary CiL environments. In IsCiL, demonstrations are mapped into the state embedding space, where proper skills can be retrieved upon input states through prototype-based memory. These retrievable skills are incrementally learned on their corresponding adapters. Our CiL experiments with complex tasks in Franka-Kitchen and Meta-World demonstrate robust performance of IsCiL in both task adaptation and sample-efficiency. We also show a simple extension of IsCiL for task unlearning scenarios.
Updated: 2024-10-30 02:57:35
标题: Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation (持续任务适应的高效可检索技能的增量学习)
摘要: 持续模仿学习(CiL)涉及从多个阶段和任务的演示中提取和积累任务知识,以实现多任务策略。随着基础模型的最新进展,基于适配器的CiL方法越来越受到关注,其中适配器以参数高效的方式为新演示的任务建立。虽然这些方法隔离了特定任务的参数并且倾向于减轻灾难性遗忘,但它们限制了不同演示之间的知识共享。我们引入了IsCiL,一个基于适配器的CiL框架,通过从不同演示中逐步学习可共享的技能来解决这一知识共享的局限性,从而尤其在非平稳的CiL环境中利用这些技能实现样本高效的任务适应。在IsCiL中,演示被映射到状态嵌入空间,在该空间中可以通过基于原型的记忆根据输入状态检索出合适的技能。这些可检索的技能在其相应的适配器上逐步学习。我们在Franka-Kitchen和Meta-World中进行的复杂任务CiL实验,展示了IsCiL在任务适应和样本效率方面的稳健性能。我们还展示了IsCiL面向任务遗忘(unlearning)场景的简单扩展。
更新时间: 2024-10-30 02:57:35
领域: cs.LG,cs.AI
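The retrieval step in the entry above can be pictured as a nearest-prototype lookup in the state embedding space, as in this hedged sketch; the prototypes, skill names, and 2-D embeddings are toy stand-ins for the paper's adapter-backed memory.

import numpy as np

def retrieve_skill(state_emb, prototypes, skills):
    # Prototype-based memory: pick the skill (and hence its adapter) whose
    # prototype lies closest to the current state embedding.
    dists = np.linalg.norm(prototypes - state_emb, axis=1)
    return skills[int(dists.argmin())]

prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
skills = ["pick", "place", "open-drawer"]
print(retrieve_skill(np.array([0.9, 1.2]), prototypes, skills))  # "place"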
Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality
The denoising diffusion probabilistic model (DDPM) has emerged as a mainstream generative model in generative AI. While sharp convergence guarantees have been established for the DDPM, the iteration complexity is, in general, proportional to the ambient data dimension, resulting in overly conservative theory that fails to explain its practical efficiency. This has motivated the recent work Li and Yan (2024a) to investigate how the DDPM can achieve sampling speed-ups through automatic exploitation of intrinsic low dimensionality of data. We strengthen this line of work by demonstrating, in some sense, optimal adaptivity to unknown low dimensionality. For a broad class of data distributions with intrinsic dimension $k$, we prove that the iteration complexity of the DDPM scales nearly linearly with $k$, which is optimal when using KL divergence to measure distributional discrepancy. Notably, our work is closely aligned with the independent concurrent work Potaptchik et al. (2024) -- posted two weeks prior to ours -- in establishing nearly linear-$k$ convergence guarantees for the DDPM.
Updated: 2024-10-30 02:55:47
标题: 去噪扩散概率模型对未知低维度最适应。
摘要: 去噪扩散概率模型(DDPM)已经成为生成式人工智能中的主流生成模型。虽然已经为DDPM建立了尖锐的收敛保证,但迭代复杂度通常与环境数据维度成正比,导致理论过于保守,无法解释其实际效率。这促使最近的Li和Yan(2024a)研究DDPM如何通过自动利用数据固有的低维性来实现采样加速。我们通过在某种意义上证明对未知低维度的最优自适应性来加强这一研究方向。对于具有固有维度$k$的广泛数据分布,我们证明了DDPM的迭代复杂度与$k$近似线性相关,这在使用KL散度衡量分布差异时是最优的。值得注意的是,我们的工作与独立同期的工作Potaptchik等人(2024)(比我们早两周发布)在建立DDPM的近线性$k$收敛保证方面高度一致。
更新时间: 2024-10-30 02:55:47
领域: cs.LG,cs.NA,eess.SP,math.NA,math.ST,stat.ML,stat.TH
Reweighting Local Minima with Tilted SAM
Sharpness-Aware Minimization (SAM) has been demonstrated to improve the generalization performance of overparameterized models by seeking flat minima on the loss landscape through optimizing model parameters that incur the largest loss within a neighborhood. Nevertheless, such min-max formulations are computationally challenging especially when the problem is highly non-convex. Additionally, focusing only on the worst-case local solution while ignoring potentially many other local solutions may be suboptimal when searching for flat minima. In this work, we propose Tilted SAM (TSAM), a generalization of SAM inspired by exponential tilting that effectively assigns higher priority to local solutions that are flatter and that incur larger losses. TSAM is parameterized by a tilt hyperparameter t and reduces to SAM as t approaches infinity. We prove that (1) the TSAM objective is smoother than SAM and thus easier to optimize; and (2) TSAM explicitly favors flatter minima as t increases. This is desirable as flatter minima could have better generalization properties for certain tasks. We develop algorithms motivated by the discretization of Hamiltonian dynamics to solve TSAM. Empirically, TSAM arrives at flatter local minima and results in superior test performance than the baselines of SAM and ERM across a range of image and text tasks.
Updated: 2024-10-30 02:49:48
标题: 使用倾斜SAM对局部极小值进行重新加权
摘要: 锐度感知最小化(SAM)已被证明可以通过优化在邻域内产生最大损失的模型参数来寻找损失景观上的平坦最小值,从而提高过度参数化模型的泛化性能。然而,这种极小-极大的公式在问题高度非凸时尤其具有挑战性。此外,仅关注最坏情况下的局部解而忽略可能存在的许多其他局部解可能在搜索平坦最小值时是次优的。在这项工作中,我们提出了倾斜SAM(TSAM),这是SAM的一个泛化,受指数倾斜启发,有效地将更高的优先级分配给更平坦且产生更大损失的局部解。TSAM由倾斜超参数t参数化,并在t趋于无穷大时退化为SAM。我们证明了(1)TSAM目标比SAM更平滑,因此更容易优化;以及(2)TSAM在t增加时明确偏爱更平坦的最小值。这是令人期待的,因为更平坦的最小值对某些任务可能具有更好的泛化特性。我们开发了受哈密顿动力学离散化启发的算法来解决TSAM。经验上,TSAM可以达到更平坦的局部最小值,并在一系列图像和文本任务中产生比SAM和ERM基线更优越的测试性能。
更新时间: 2024-10-30 02:49:48
领域: cs.LG
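The abstract above describes the objective only qualitatively; one natural way to write an exponentially tilted SAM objective consistent with that description (an assumption about the exact form, not the paper's equation) is

\min_{\theta}\; \frac{1}{t}\,\log\, \mathbb{E}_{\|\epsilon\| \le \rho}\!\left[ e^{\, t \, L(\theta + \epsilon)} \right],

which reweights perturbations within the neighborhood according to how large their loss is. Since the log-average-exp converges to the maximum, this recovers the SAM objective $\min_{\theta} \max_{\|\epsilon\| \le \rho} L(\theta + \epsilon)$ as $t \to \infty$, matching the limiting behavior stated in the abstract.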
Late Breaking Results: Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks
This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and device architectures. We focus on accelerating the technology level of STCO using AI techniques, by employing graph neural network (GNN)-based approaches for both TCAD simulation and cell library characterization, which are interconnected through a unified compact model, collectively achieving over a 100X speedup over traditional methods. These advancements enable comprehensive STCO iterations with runtime speedups ranging from 1.9X to 14.1X and supports both emerging and traditional technologies.
Updated: 2024-10-30 02:44:09
标题: 最新进展:基于图神经网络的新兴技术快速系统技术协同优化框架
摘要: 本文提出了一种快速系统技术协同优化(STCO)框架,用于优化下一代集成电路设计的功耗、性能和面积(PPA),以应对新材料和器件架构带来的挑战和机遇。我们专注于利用人工智能技术加速STCO的技术水平,通过采用基于图神经网络(GNN)的方法来进行TCAD模拟和单元库特性化,这两者通过统一的紧凑模型相互连接,共同实现传统方法的100倍以上的加速。这些进展使得可以以1.9倍到14.1倍不等的速度提升进行全面的STCO迭代,并支持新兴和传统技术。
更新时间: 2024-10-30 02:44:09
领域: cs.ET,cs.AI
FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation
Training data privacy has been a top concern in AI modeling. While methods like differentially private learning allow data contributors to quantify acceptable privacy loss, model utility is often significantly damaged. In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments. In controlled data access, authorized model builders work in a restricted environment to access sensitive data, which can fully preserve data utility with reduced risk of data leak. However, unlike differential privacy, there is no quantitative measure for individual data contributors to tell their privacy risk before participating in a machine learning task. We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task. The demo source code will be available at \url{https://github.com/RhincodonE/demo_privacy_scoring}.
Updated: 2024-10-30 02:41:26
标题: FT-PrivacyScore:用于机器学习参与的个性化隐私评分服务
摘要: 训练数据隐私一直是AI建模中的首要关注点。虽然像差分隐私学习这样的方法可以让数据贡献者量化可接受的隐私损失,但模型效用往往会受到严重损害。在实践中,受控数据访问仍然是许多工业和研究环境中保护数据隐私的主流方法。在受控数据访问中,授权的模型构建者在受限环境中访问敏感数据,这可以完全保留数据效用并降低数据泄露风险。然而,与差分隐私不同,没有量化措施可以让各个数据贡献者在参与机器学习任务之前评估他们的隐私风险。我们开发了演示原型FT-PrivacyScore,以展示参与模型微调任务的隐私风险可以高效且定量地估计。演示源代码将在\url{https://github.com/RhincodonE/demo_privacy_scoring}上提供。
更新时间: 2024-10-30 02:41:26
领域: cs.LG,cs.CR
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two objects in an image, often neglecting world knowledge and lacking general language capabilities. In this paper, we propose a Large Language-and-Vision Assistant for Visual Spatial Description, named LLaVA-VSD, which is designed for the classification, description, and open-ended description of visual spatial relationships. Specifically, the model first constructs a VSD instruction-following dataset using given figure-caption pairs for the three tasks. It then employs LoRA to fine-tune a Large Language and Vision Assistant for VSD, which has 13 billion parameters and supports high-resolution images. Finally, a large language model (Qwen-2) is used to refine the generated sentences, enhancing their diversity and accuracy. LLaVA-VSD demonstrates excellent multimodal conversational capabilities and can follow open-ended instructions to assist with inquiries about object relationships in images.
Updated: 2024-10-30 02:38:29
标题: LLaVA-VSD:大规模语言与视觉助手用于视觉空间描述
摘要: 可视空间描述(VSD)旨在生成描述图像中物体之间空间关系的文本。传统的可视空间关系分类(VSRC)方法通常输出图像中两个物体之间的空间关系,往往忽略世界知识并缺乏通用语言能力。在本文中,我们提出了一个大型语言和视觉助手,名为LLaVA-VSD,旨在对可视空间关系进行分类、描述和开放式描述。具体而言,该模型首先利用给定的图像标题对构建一个VSD指令遵循数据集,用于三项任务。然后,它采用LoRA来微调一个具有130亿参数并支持高分辨率图像的大型语言和视觉助手,用于VSD。最后,一个大型语言模型(Qwen-2)用于完善生成的句子,增强其多样性和准确性。LLaVA-VSD展示出出色的多模式对话能力,并能够遵循开放式指令,帮助查询图像中物体之间的关系。
更新时间: 2024-10-30 02:38:29
领域: cs.CV,cs.AI
WaveRoRA: Wavelet Rotary Route Attention for Multivariate Time Series Forecasting
In recent years, Transformer-based models (Transformers) have achieved significant success in multivariate time series forecasting (MTSF). However, previous works focus on extracting features either from the time domain or the frequency domain, which inadequately captures the trends and periodic characteristics. To address this issue, we propose a wavelet learning framework to model complex temporal dependencies of the time series data. The wavelet domain integrates both time and frequency information, allowing for the analysis of local characteristics of signals at different scales. Additionally, the Softmax self-attention mechanism used by Transformers has quadratic complexity, which leads to excessive computational costs when capturing long-term dependencies. Therefore, we propose a novel attention mechanism: Rotary Route Attention (RoRA). Unlike Softmax attention, RoRA utilizes rotary position embeddings to inject relative positional information to sequence tokens and introduces a small number of routing tokens $r$ to aggregate information from the $KV$ matrices and redistribute it to the $Q$ matrix, offering linear complexity. We further propose WaveRoRA, which leverages RoRA to capture inter-series dependencies in the wavelet domain. We conduct extensive experiments on eight real-world datasets. The results indicate that WaveRoRA outperforms existing state-of-the-art models while maintaining lower computational costs.
Updated: 2024-10-30 02:36:55
标题: WaveRoRA:用于多变量时间序列预测的小波旋转路由注意力模型
摘要: 最近几年,基于Transformer的模型在多变量时间序列预测(MTSF)方面取得了显著成功。然而,先前的研究集中在从时域或频域提取特征,这不足以捕捉趋势和周期特征。为了解决这个问题,我们提出了一个小波学习框架来建模时间序列数据的复杂时间依赖关系。小波域整合了时间和频率信息,允许在不同尺度上分析信号的局部特征。此外,Transformer使用的Softmax自注意机制具有二次复杂度,这导致在捕捉长期依赖关系时计算成本过高。因此,我们提出了一种新颖的注意机制:Rotary Route Attention(RoRA)。与Softmax注意不同,RoRA利用旋转位置嵌入将相对位置信息注入序列标记,并引入少量路由标记$r$来聚合来自$KV$矩阵的信息并将其重新分配给$Q$矩阵,提供线性复杂度。我们进一步提出了WaveRoRA,利用RoRA在小波域中捕捉不同时间序列之间的依赖关系。我们在八个真实世界数据集上进行了大量实验。实验结果表明,WaveRoRA在保持较低计算成本的同时优于现有的最先进模型。
更新时间: 2024-10-30 02:36:55
领域: cs.LG
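The routing mechanism in the WaveRoRA entry above can be sketched in a few lines of PyTorch: a small set of learned routing tokens first attends over the full key/value sequence, then the queries attend over the routed summaries, giving O(nr) rather than O(n^2) cost. Rotary position embeddings and the wavelet-domain processing are omitted, so this is a structural sketch under stated assumptions, not the paper's module.

import torch
import torch.nn as nn

class RoutingAttention(nn.Module):
    def __init__(self, dim: int, r: int = 8):
        super().__init__()
        self.route = nn.Parameter(torch.randn(r, dim) / dim ** 0.5)

    def forward(self, q, k, v):  # each (B, n, d)
        scale = q.size(-1) ** 0.5
        a1 = torch.softmax(self.route @ k.transpose(1, 2) / scale, dim=-1)
        routed = a1 @ v                                  # (B, r, d): aggregate KV
        a2 = torch.softmax(q @ routed.transpose(1, 2) / scale, dim=-1)
        return a2 @ routed                               # (B, n, d): redistribute to Q

x = torch.randn(2, 128, 64)
print(RoutingAttention(64)(x, x, x).shape)  # torch.Size([2, 128, 64])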
Improving Apple Object Detection with Occlusion-Enhanced Distillation
Apples growing in natural environments often face severe visual obstructions from leaves and branches. This significantly increases the risk of false detections in object detection tasks, thereby escalating the challenge. Addressing this issue, we introduce a technique called "Occlusion-Enhanced Distillation" (OED). This approach utilizes occlusion information to regularize the learning of semantically aligned features on occluded datasets and employs Exponential Moving Average (EMA) to enhance training stability. Specifically, we first design an occlusion-enhanced dataset that integrates Grounding DINO and SAM methods to extract occluding elements such as leaves and branches from each sample, creating occlusion examples that reflect the natural growth state of fruits. Additionally, we propose a multi-scale knowledge distillation strategy, where the student network uses images with increased occlusions as inputs, while the teacher network employs images without natural occlusions. Through this setup, the strategy guides the student network to learn from the teacher across scales of semantic and local features alignment, effectively narrowing the feature distance between occluded and non-occluded targets and enhancing the robustness of object detection. Lastly, to improve the stability of the student network, we introduce the EMA strategy, which aids the student network in learning more generalized feature expressions that are less affected by the noise of individual image occlusions. Our method significantly outperforms current state-of-the-art techniques through extensive comparative experiments.
Updated: 2024-10-30 02:36:18
标题: 通过遮挡增强蒸馏来改善苹果物体检测
摘要: 在自然环境中生长的苹果通常面临来自叶子和树枝的严重视觉遮挡。这显著增加了目标检测任务中误检测的风险,从而加大了挑战。为解决这一问题,我们引入了一种名为“遮挡增强蒸馏”(OED)的技术。这种方法利用遮挡信息来规范学习在遮挡数据集上的语义对齐特征,并采用指数移动平均(EMA)来增强训练稳定性。具体来说,我们首先设计了一个遮挡增强数据集,该数据集集成了Grounding DINO和SAM方法,从每个样本中提取叶子和树枝等遮挡元素,创建了反映水果自然生长状态的遮挡示例。此外,我们提出了一种多尺度知识蒸馏策略,其中学生网络使用增加遮挡的图像作为输入,而教师网络使用没有自然遮挡的图像。通过这种设置,该策略引导学生网络从教师网络中跨语义和局部特征对齐尺度学习,有效缩小遮挡和非遮挡目标之间的特征距离,并增强目标检测的鲁棒性。最后,为了提高学生网络的稳定性,我们引入了EMA策略,该策略帮助学生网络学习更一般化的特征表达,减少受到单个图像遮挡噪声的影响。我们的方法通过大量比较实验显著优于当前最先进的技术。
更新时间: 2024-10-30 02:36:18
领域: cs.CV,cs.AI
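The EMA ingredient mentioned in the entry above is standard and compact enough to show; the sketch keeps an exponentially averaged copy of the student's weights, which drifts smoothly across batches and is therefore less affected by the noise of individual occluded images. The 0.999 decay is a typical value, assumed here rather than taken from the paper.

import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # ema <- decay * ema + (1 - decay) * student, parameter by parameter.
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)

student = torch.nn.Linear(4, 2)
ema = torch.nn.Linear(4, 2)
ema.load_state_dict(student.state_dict())  # start from identical weights
ema_update(ema, student)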
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents) that helps address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes adapted to tasks of varying complexities. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against human physicians. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant improvement of up to 4.2% (p < 0.05) compared to previous methods' best performances. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy improvement of 11.8%. Our code can be found at https://github.com/mitmedialab/MDAgents.
Updated: 2024-10-30 02:32:43
标题: MDAgents:用于医疗决策的LLMs自适应协作
摘要: 基础模型正在成为医学中的宝贵工具。然而,尽管它们很有潜力,如何最好地利用大型语言模型(LLMs)完成复杂医学任务仍然是一个悬而未决的问题。我们引入了一个新颖的多智能体框架,名为医学决策智能体(MDAgents),通过自动为LLM团队分配协作结构来帮助填补这一空白。所分配的单独或群体协作结构针对手头的医学任务量身定制,模拟了适应不同复杂度任务的现实世界医学决策过程。我们使用最先进的LLMs,在一系列真实世界医学知识和医学诊断基准上评估我们的框架和基线方法,包括将LLMs的医学复杂度分类与人类医生进行比较。MDAgents在需要理解医学知识和多模态推理的十个基准任务中的七个上表现最佳,相比于先前方法的最佳表现,显示出高达4.2%的显著改进(p < 0.05)。消融研究显示,MDAgents能够有效判定医学复杂度,从而在各种医学任务中兼顾效率和准确性。值得注意的是,在群体协作中结合主持人审查和外部医学知识,使平均准确率提高了11.8%。我们的代码可以在https://github.com/mitmedialab/MDAgents找到。
更新时间: 2024-10-30 02:32:43
领域: cs.CL,cs.AI,cs.LG
The Prevalence of Neural Collapse in Neural Multivariate Regression
Recently it has been observed that neural networks exhibit Neural Collapse (NC) during the final stage of training for the classification problem. We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the $n$ principal components of the feature vectors, where $n$ is the dimension of the targets (for univariate regression, $n=1$); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM), in which the last layer feature vectors are treated as free variables when minimizing the loss function. We show that when the regularization parameters in the UFM model are strictly positive, then (NRC1)-(NRC3) also emerge as solutions in the UFM optimization problem. We also show that if the regularization parameters are equal to zero, then there is no collapse. To our knowledge, this is the first empirical and theoretical study of neural collapse in the context of regression. This extension is significant not only because it broadens the applicability of neural collapse to a new category of problems but also because it suggests that the phenomena of neural collapse could be a universal behavior in deep learning.
Updated: 2024-10-30 02:32:21
标题: 《神经多变量回归中神经崩溃的普遍性》
摘要: 最近观察到神经网络在分类问题的训练最后阶段表现出神经崩溃(NC)。我们通过实证方法展示,多变量回归,在模仿学习和其他应用中所采用的方法,表现出神经回归崩溃(NRC),一种新形式的神经崩溃:(NRC1)最后一层特征向量坍缩到由特征向量的$n$个主成分张成的子空间,其中$n$是目标的维度(对于单变量回归,$n=1$);(NRC2)最后一层特征向量也坍缩到由最后一层权重向量张成的子空间;(NRC3)权重向量的格拉姆矩阵收敛到一个特定的函数形式,该函数形式取决于目标的协方差矩阵。在经验上确定了(NRC1)-(NRC3)在各种数据集和网络架构中普遍存在后,我们通过将回归任务建模在无约束特征模型(UFM)的背景下来解释这些现象,在该模型中,最后一层特征向量在最小化损失函数时被视为自由变量。我们表明,当UFM模型中的正则化参数严格为正时,(NRC1)-(NRC3)也会作为UFM优化问题的解出现。我们还表明,如果正则化参数等于零,则不会发生崩溃。据我们所知,这是关于回归领域神经崩溃的第一项经验和理论研究。这个拓展不仅因为它将神经崩溃的适用性扩展到一个新的问题类别而显著,而且因为它暗示了神经崩溃现象可能是深度学习中的普遍行为。
更新时间: 2024-10-30 02:32:21
领域: cs.LG,cs.AI
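(NRC1) from the entry above can be probed empirically by measuring how much last-layer feature variance falls inside the top-$n$ principal subspace, as in this sketch; values near 1 indicate collapse. The synthetic rank-2 features below are a stand-in for features harvested from a trained regressor.

import numpy as np

def nrc1_score(features: np.ndarray, n_targets: int) -> float:
    # Fraction of (centered) feature variance captured by the top-n
    # principal components; a value near 1.0 signals (NRC1)-style collapse.
    centered = features - features.mean(axis=0)
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    return float(var[:n_targets].sum() / var.sum())

rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 64))
feats += 1e-3 * rng.standard_normal(feats.shape)   # tiny off-subspace noise
print(nrc1_score(feats, n_targets=2))              # close to 1.0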
Anchor-free Clustering based on Anchor Graph Factorization
Anchor-based methods are a pivotal approach in handling clustering of large-scale data. However, these methods typically entail two distinct stages: selecting anchor points and constructing an anchor graph. This bifurcation, along with the initialization of anchor points, significantly influences the overall performance of the algorithm. To mitigate these issues, we introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF). AFCAGF innovates in learning the anchor graph, requiring only the computation of pairwise distances between samples. This process, achievable through straightforward optimization, circumvents the necessity for explicit selection of anchor points. More concretely, our approach enhances the Fuzzy k-means clustering algorithm (FKM), introducing a new manifold learning technique that obviates the need for initializing cluster centers. Additionally, we evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples. Employing Non-negative Matrix Factorization (NMF) on this anchor graph allows for the direct derivation of cluster labels, thereby eliminating the requirement for further post-processing steps. To solve the method proposed, we implement an alternating optimization algorithm that ensures convergence. Empirical evaluations on various real-world datasets underscore the superior efficacy of our algorithm compared to traditional approaches.
Updated: 2024-10-30 02:32:03
标题: 基于锚图因式分解的无锚聚类
摘要: 锚点方法是处理大规模数据聚类的一个关键方法。然而,这些方法通常包括两个明显的阶段:选择锚点和构建锚图。这种分叉,以及锚点的初始化,显著影响算法的整体性能。为了缓解这些问题,我们引入了一种名为基于锚图因子分解的无锚点聚类方法(AFCAGF)。AFCAGF在学习锚图方面进行了创新,只需要计算样本之间的成对距离。这个过程通过简单的优化实现,避免了明确选择锚点的必要性。更具体地说,我们的方法增强了模糊K均值聚类算法(FKM),引入了一种新的流形学习技术,消除了初始化聚类中心的需要。此外,我们将FKM中聚类中心和样本之间的成员矩阵的概念演变为一个包含多个锚点和样本的锚图。在这个锚图上应用非负矩阵分解(NMF)允许直接推导出聚类标签,从而消除了进一步后处理步骤的要求。为了解决所提出的方法,我们实现了一种交替优化算法,确保收敛。对各种真实世界数据集的实证评估突显了我们的算法相对于传统方法的卓越效果。
更新时间: 2024-10-30 02:32:03
领域: cs.LG
SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms
Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as unnatural user experience, complex deployment, and high costs. Ballistocardiography~(BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalization across populations. Additionally, transferring to BCG faces difficulty ensuring model robustness when migrating from other data sources. To address these issues, we introduce SleepNetZero, a zero-shot learning based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets~(12393 records from 9637 different subjects), achieving an accuracy of 0.803 and a Cohen's Kappa of 0.718. SleepNetZero was also deployed in a real prototype~(monitoring pads) and tested in actual hospital settings~(265 users), demonstrating an accuracy of 0.697 and a Cohen's Kappa of 0.589. To the best of our knowledge, this work represents the first known reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring.
Updated: 2024-10-30 02:25:47
标题: SleepNetZero:基于心冲击图的神经网络实现零负担零样本可靠睡眠分期
摘要: 睡眠监测在保持良好健康方面起着至关重要的作用,睡眠分期是监测过程中的一项关键指标。利用脑电图(EEG)和心电图(ECG)等医疗传感器的传统方法可以有效,但往往存在用户体验不自然、部署复杂和成本高的问题。心冲击图(BCG)作为一种压电传感器信号,提供了一种非侵入性、用户友好且易于部署的长期家庭监测替代方案。然而,由于BCG可用的睡眠监测数据有限,可靠的基于BCG的睡眠分期具有挑战性。有限的训练数据集阻碍了模型在不同人群中的泛化。此外,从其他数据源迁移到BCG时,确保模型的稳健性也面临困难。为了解决这些问题,我们提出了基于零样本学习的睡眠分期方法SleepNetZero。为了应对泛化挑战,我们提出了一系列BCG特征提取方法,将BCG成分与PSG中相应的呼吸、心脏和体动通道对齐。这使得模型可以在人群多样的大规模PSG数据集上进行训练。针对迁移挑战,我们采用数据增强技术,显著增强了泛化能力。我们在大型数据集上进行了广泛的训练和测试(来自9637名不同受试者的12393条记录),实现了0.803的准确率和0.718的Cohen's Kappa。SleepNetZero还被部署在实际原型(监测垫)中,并在真实医院环境中进行了测试(265名用户),取得了0.697的准确率和0.589的Cohen's Kappa。据我们所知,这项工作是已知首个可靠的基于BCG的睡眠分期工作,标志着家庭健康监测迈出了重要一步。
更新时间: 2024-10-30 02:25:47
领域: eess.SP,cs.LG
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared to conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Lastly, a comparative analysis of each role, potential applications, prospective opportunities, and challenges of the LLM-enhanced RL are discussed. By proposing this taxonomy, we aim to provide a framework for researchers to effectively leverage LLMs in the RL field, potentially accelerating RL applications in complex applications such as robotics, autonomous driving, and energy systems.
Updated: 2024-10-30 02:22:46
标题: 对大型语言模型增强强化学习的调查:概念、分类和方法
摘要: 借助广泛的预训练知识和高水平的通用能力,大型语言模型(LLMs)成为增强强化学习(RL)的一个有前途的途径,可以在多任务学习、样本效率和高级任务规划等方面提升。在这项调查中,我们对LLM增强的RL现有文献进行了全面回顾,并总结了与传统RL方法相比的特点,旨在澄清未来研究的范围和方向。利用经典的代理-环境交互范式,我们提出了一个结构化分类法,系统地对LLMs在RL中的功能进行分类,包括四个角色:信息处理器、奖励设计者、决策者和生成器。对于每个角色,我们总结了方法论,分析了缓解的具体RL挑战,并提供了未来方向的见解。最后,讨论了每个角色、潜在应用、前景机会和LLM增强RL的挑战的比较分析。通过提出这一分类法,我们旨在为研究人员提供一个框架,有效地利用LLMs在RL领域,潜在地加速RL在复杂应用中的应用,如机器人技术、自动驾驶和能源系统。
更新时间: 2024-10-30 02:22:46
领域: cs.LG,cs.AI,cs.CL,cs.RO
Comparing Template-based and Template-free Language Model Probing
The differences between cloze-task language model (LM) probing with 1) expert-made templates and 2) naturally-occurring text have often been overlooked. Here, we evaluate 16 different LMs on 10 probing English datasets -- 4 template-based and 6 template-free -- in general and biomedical domains to answer the following research questions: (RQ1) Do model rankings differ between the two approaches? (RQ2) Do models' absolute scores differ between the two approaches? (RQ3) Do the answers to RQ1 and RQ2 differ between general and domain-specific models? Our findings are: 1) Template-free and template-based approaches often rank models differently, except for the top domain-specific models. 2) Scores decrease by up to 42% Acc@1 when comparing parallel template-free and template-based prompts. 3) Perplexity is negatively correlated with accuracy in the template-free approach, but, counter-intuitively, they are positively correlated for template-based probing. 4) Models tend to predict the same answers frequently across prompts for template-based probing, which is less common when employing template-free techniques.
Updated: 2024-10-30 02:16:34
标题: 比较基于模板和无模板语言模型探测
摘要: 使用1) 专家制作的模板和2) 自然出现的文本进行完形填空任务语言模型(LM)探测之间的差异经常被忽视。在这里,我们在10个英语探测数据集(4个基于模板,6个无模板)上评估了16种不同的LM,涵盖通用和生物医学领域,以回答以下研究问题:(RQ1)两种方法下的模型排名是否不同?(RQ2)模型的绝对得分在两种方法之间是否不同?(RQ3)RQ1和RQ2的答案在通用模型和特定领域模型之间是否不同?我们的发现是:1)除了顶尖的特定领域模型外,无模板和基于模板的方法经常给出不同的模型排名。2)在比较平行的无模板和基于模板的提示时,Acc@1得分最多下降42%。3)在无模板方法中,困惑度与准确性呈负相关;但与直觉相反,在基于模板的探测中两者呈正相关。4)在基于模板的探测中,模型往往在不同提示下频繁预测相同的答案,而在使用无模板技术时这种情况较少见。
更新时间: 2024-10-30 02:16:34
领域: cs.CL,cs.LG
Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation
Argumentative essay generation (AEG) aims to generate complete texts on specific controversial topics or debates. Although current AEG methods can generate individual opinions, they often overlook the high-level connections between these opinions. This often leaves the generated results mired in logical confusion, unable to prove their own arguments effectively. The generated essay may present evidence that contradicts the claims, or fail to assemble the claims into a logical flow. In this paper, we present a unified two-stage framework: Proof-Enhancement and Self-Annotation (PESA) for AEG with a focus on logical enhancement. Specifically, we first construct pseudo-labels for logical information, claims and grounds, using a large language model. We then propose a tree planning approach that introduces proof principles and ensures logical consistency. Extensive experimental results show that, benefiting from proof principle guidance, PESA generates argumentative essays with better logical validity and persuasiveness than strong baseline models.
Updated: 2024-10-30 02:13:39
标题: 证明你的观点!:将证据增强原则引入论证性文章生成
摘要: Argumentative Essay Generation (AEG)旨在针对特定有争议的话题或辩论生成完整的文本。尽管当前的AEG方法可以生成个别意见,但它们经常忽视这些意见之间的高层连接。这经常导致生成的结果深陷逻辑混乱,无法有效证明自己的论点。生成的论文可能提供与声明相矛盾的证据,或者它们可能无法将声明组合成逻辑流。在本文中,我们提出了一个统一的两阶段框架:Proof-Enhancement and Self-Annotation (PESA),重点是逻辑增强。具体而言,我们首先利用一个大型语言模型构建逻辑信息、声明和依据的伪标签。然后,我们提出了一种树规划方法,引入证明原则并确保逻辑一致性。广泛的实验结果表明,受益于证明原则指导,PESA生成的论辩文章在逻辑有效性和说服力方面优于强基线模型。
更新时间: 2024-10-30 02:13:39
领域: cs.CL,cs.AI
Adaptive Transfer Clustering: A Unified Framework
We propose a general transfer learning framework for clustering given a main dataset and an auxiliary one about the same subjects. The two datasets may reflect similar but different latent grouping structures of the subjects. We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy, by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models including Gaussian mixture models, stochastic block models, and latent class models. A theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real data experiments confirm our method's effectiveness in various scenarios.
Updated: 2024-10-30 02:11:35
标题: 自适应转移聚类:一个统一框架
摘要: 我们提出了一个通用的迁移学习框架,用于聚类分析给定一个关于相同主题的主数据集和一个辅助数据集。这两个数据集可能反映了主题的类似但不同的潜在分组结构。我们提出了一种自适应迁移聚类(ATC)算法,通过优化估计的偏差-方差分解,在未知差异存在的情况下自动利用共性。它适用于包括高斯混合模型、随机块模型和潜在类模型在内的广泛类别的统计模型。理论分析证明了在高斯混合模型下ATC的最佳性,并明确量化了迁移的好处。大量模拟和真实数据实验证实了我们方法在各种场景下的有效性。
更新时间: 2024-10-30 02:11:35
领域: stat.ME,cs.LG,math.ST,stat.ML,stat.TH,62F35, 62C20
Consistency Diffusion Bridge Models
Diffusion models (DMs) have become the dominant paradigm of generative modeling in a variety of domains by learning stochastic processes from noise to data. Recently, denoising diffusion bridge models (DDBMs), a new formulation of generative modeling that builds stochastic processes between fixed data endpoints based on a reference diffusion process, have achieved empirical success across tasks with coupled data distribution, such as image-to-image translation. However, DDBM's sampling process typically requires hundreds of network evaluations to achieve decent performance, which may impede their practical deployment due to high computational demands. In this work, inspired by the recent advance of consistency models in DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. Based on a dedicated general-form ODE solver, we propose two paradigms: consistency bridge distillation and consistency bridge training, which are flexible to apply on DDBMs with broad design choices. Experimental results show that our proposed method could sample $4\times$ to $50\times$ faster than the base DDBM and produce better visual quality given the same number of steps in various tasks with pixel resolution ranging from $64 \times 64$ to $256 \times 256$, as well as supporting downstream tasks such as semantic interpolation in the data space.
Updated: 2024-10-30 02:04:23
标题: 一致性扩散桥模型
摘要: 扩散模型(DMs)已成为在各种领域中学习随机过程从噪声到数据的生成建模的主导范式。最近,扩散去噪桥模型(DDBMs),一种新的生成建模形式,在固定数据端点之间构建基于参考扩散过程的随机过程,已在具有耦合数据分布的任务中取得了实证成功,如图像到图像的翻译。然而,DDBM的采样过程通常需要数百次网络评估才能达到良好的性能,这可能会由于高计算需求而妨碍它们的实际部署。在这项工作中,受到DMs中一致性模型的最新进展的启发,我们通过学习DDBMs的概率流常微分方程(PF-ODE)的一致性函数来解决这个问题,该函数直接预测了在ODE轨迹上的任意点处的起始步骤的解。基于专用的一般形式ODE求解器,我们提出了两种范式:一致性桥蒸馏和一致性桥训练,可以灵活应用于具有广泛设计选择的DDBMs。实验结果表明,我们提出的方法在各种任务中能够比基本DDBM快4倍到50倍,并在像素分辨率从64×64到256×256的情况下产生更好的视觉质量,同时支持数据空间内的语义插值等下游任务。
更新时间: 2024-10-30 02:04:23
领域: cs.LG,cs.CV
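The core idea above, learning a consistency function along the PF-ODE trajectory, can be sketched generically. In the toy PyTorch step below, `ode_step` is a stand-in Euler step and the teacher plays the role of an EMA copy; the DDBM bridge parameterization and the paper's dedicated general-form ODE solver are not reproduced.

```python
# A minimal, generic consistency-distillation step: enforce the self-consistency
# property f(x_t, t) ≈ f(x_{t'}, t') for adjacent points on an ODE trajectory.
import torch
import torch.nn as nn

class ConsistencyNet(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def ode_step(x, t, t_next, drift):
    # Placeholder probability-flow step (plain Euler), not the paper's solver.
    return x + (t_next - t)[:, None] * drift(x, t)

student = ConsistencyNet()
teacher = ConsistencyNet()           # would be an EMA copy of the student
teacher.load_state_dict(student.state_dict())
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

x_t = torch.randn(32, 2)             # toy noisy points on the trajectory
t = torch.rand(32) * 0.9 + 0.05
t_next = t - 0.05                    # one solver step toward the start point

with torch.no_grad():
    x_next = ode_step(x_t, t, t_next, teacher)   # teacher walks the ODE
    target = teacher(x_next, t_next)

loss = ((student(x_t, t) - target) ** 2).mean()  # consistency loss
opt.zero_grad(); loss.backward(); opt.step()
```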
A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth
Sonar based audio classification techniques are a growing area of research in the field of underwater acoustics. Usually, underwater noise picked up by passive sonar transducers contains all types of signals that travel through the ocean and is transformed into spectrographic images. As a result, the corresponding spectrograms intended to display the temporal-frequency data of a certain object often include the tonal regions of abundant extraneous noise that can effectively interfere with a 'contact'. So, a majority of spectrographic samples extracted from underwater audio signals are rendered unusable due to their clutter and lack the required distinguishability between different objects. With limited clean true data for supervised training, creating classification models for these audio signals is severely bottlenecked. This paper derives several new techniques to combat this problem by developing a novel Score-CAM based denoiser to extract an object's signature from noisy spectrographic data without being given any ground truth data. In particular, this paper proposes a novel generative adversarial network architecture for learning and producing spectrographic training data in similar distributions to low-feature spectrogram inputs. In addition, this paper also proposes a generalizable class activation mapping based denoiser for different distributions of acoustic data, even real-world data distributions. Utilizing these novel architectures and proposed denoising techniques, these experiments demonstrate state-of-the-art noise reduction accuracy and improved classification accuracy over current audio classification standards. As such, this approach has applications not only to audio data but for countless data distributions used all around the world for machine learning.
Updated: 2024-10-30 02:02:40
标题: 一种基于评分CAM的新型去噪器,用于在没有基准真值的情况下提取光谱特征签名
摘要: 基于声纳的音频分类技术是水声学领域中一个不断发展的研究领域。通常,被被动声纳传感器捕捉到的水下噪音包含通过海洋传播的所有类型信号,并被转换成频谱图像。因此,旨在显示特定对象的时频数据的相应频谱图通常包括大量杂音的音调区域,这些杂音可以有效干扰“接触”。因此,由水下音频信号提取的大多数频谱图样本由于混乱而变得无用,缺乏不同对象之间所需的可区分性。由于受限于有限的干净真实数据进行监督训练,为这些音频信号创建分类模型变得严重受阻。 本文提出了几种新技术来解决这一问题,通过开发一种基于 Score-CAM 的新型去噪器,从嘈杂的频谱数据中提取对象的特征,而无需提供任何地面真实数据。具体来说,本文提出了一种新颖的生成对抗网络架构,用于学习和生成类似于低特征频谱图输入的频谱训练数据分布。此外,本文还提出了一种基于一般化类激活映射的去噪器,适用于不同分布的声学数据,甚至适用于真实世界的数据分布。利用这些新颖的架构和提出的去噪技术,这些实验展示了比当前音频分类标准更高水平的噪声降低准确性和改进的分类准确性。因此,这种方法不仅适用于音频数据,而且适用于世界各地用于机器学习的无数数据分布。
更新时间: 2024-10-30 02:02:40
领域: cs.SD,cs.LG,eess.AS
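Since the denoiser above builds on Score-CAM, a compact version of the underlying class activation mapping may help: each activation map is upsampled, normalized, used to mask the input, and weighted by the class score of the masked forward pass. The tiny CNN and random input below are placeholders; the paper's GAN-based data generator and denoising pipeline are not shown.

```python
# A compact Score-CAM sketch on a stand-in spectrogram classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model = nn.Sequential(conv, head)

def score_cam(x, target_class):
    with torch.no_grad():
        acts = conv(x)                                   # (1, C, H, W)
        cam = torch.zeros_like(x[0, 0])
        for c in range(acts.shape[1]):
            a = acts[:, c:c+1]
            a = F.interpolate(a, size=x.shape[-2:], mode="bilinear",
                              align_corners=False)
            a = (a - a.min()) / (a.max() - a.min() + 1e-8)   # normalize to [0,1]
            # class score of the input masked by this activation map
            score = torch.softmax(model(x * a), dim=1)[0, target_class]
            cam += score * a[0, 0]                       # score-weighted sum
        return torch.relu(cam)                           # keep positive evidence

spectrogram = torch.randn(1, 1, 64, 64)                  # stand-in spectrogram
saliency = score_cam(spectrogram, target_class=1)
print(saliency.shape)  # torch.Size([64, 64])
```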
Self-Supervised Graph Embedding Clustering
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. However, it combines the K-means clustering and dimensionality reduction processes for optimization, leading to limitations in the clustering effect due to the introduced hyperparameters and the initialization of clustering centers. Moreover, maintaining class balance during clustering remains challenging. To overcome these issues, we propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework. Specifically, we establish a connection between K-means and the manifold structure, allowing us to perform K-means without explicitly defining centroids. Additionally, we use this centroid-free K-means to generate labels in low-dimensional space and subsequently utilize the label information to determine the similarity between samples. This approach ensures consistency between the manifold structure and the labels. Our model effectively achieves one-step clustering without the need for redundant balancing hyperparameters. Notably, we have discovered that maximizing the $\ell_{2,1}$-norm naturally maintains class balance during clustering, a result that we have theoretically proven. Finally, experiments on multiple datasets demonstrate that the clustering results of Our-LPP and Our-MFA exhibit excellent and reliable performance.
Updated: 2024-10-30 02:01:17
标题: 自监督图嵌入聚类
摘要: K-means单步降维聚类方法在处理聚类任务中的维度灾难方面取得了一些进展。然而,它将K-means聚类和降维过程结合起来进行优化,导致由于引入超参数和聚类中心的初始化而在聚类效果上存在局限性。此外,在聚类过程中保持类别平衡仍然具有挑战性。为了克服这些问题,我们提出了一个统一框架,将流形学习与K-means集成在一起,形成了自监督图嵌入框架。具体地,我们建立了K-means和流形结构之间的联系,使我们能够在不明确定义质心的情况下执行K-means。此外,我们利用这种无质心的K-means在低维空间生成标签,并随后利用标签信息来确定样本之间的相似性。这种方法确保了流形结构与标签之间的一致性。我们的模型有效地实现了一步聚类,无需冗余的平衡超参数。值得注意的是,我们发现最大化$\ell_{2,1}$-范数自然地在聚类过程中保持了类别平衡,这是我们在理论上证明的结果。最后,对多个数据集的实验表明,我们的LPP和MFA聚类结果表现出卓越可靠的性能。
更新时间: 2024-10-30 02:01:17
领域: cs.LG
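The claim above that maximizing the $\ell_{2,1}$-norm maintains class balance can be checked on a toy example. Under the column-wise convention $\|Y\|_{2,1}=\sum_j \|Y_{:,j}\|_2=\sum_j \sqrt{n_j}$ for a one-hot cluster indicator $Y$, concavity of the square root makes equal cluster sizes the maximizer; the exact matrix the paper applies the norm to may differ, so this only illustrates the balancing effect.

```python
# Small numpy check: the column-wise l_{2,1}-norm of a one-hot cluster
# indicator, sum_j sqrt(n_j), is largest for balanced cluster sizes.
import numpy as np

def l21_columns(Y):
    return np.linalg.norm(Y, axis=0).sum()   # sum of column l2 norms

def indicator(sizes):
    n, k = sum(sizes), len(sizes)
    Y = np.zeros((n, k))
    row = 0
    for j, s in enumerate(sizes):
        Y[row:row + s, j] = 1.0
        row += s
    return Y

for sizes in [(30, 30, 30), (50, 25, 15), (88, 1, 1)]:
    print(sizes, "->", round(l21_columns(indicator(sizes)), 3))
# Balanced (30,30,30) gives 3*sqrt(30) ≈ 16.43, strictly larger than the
# skewed partitions of the same 90 points.
```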
Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis
Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance enhancements by extending unimodal TSF to multimodality, evidenced by over 15% mean squared error reduction in general, and up to 40% in domains with rich textual data. More importantly, our datasets and library revolutionize broader applications, impacts, research topics to advance TSA. The dataset and library are available at https://github.com/AdityaLab/Time-MMD and https://github.com/AdityaLab/MM-TSFlib.
Updated: 2024-10-30 02:00:10
标题: Time-MMD:一种新的用于时间序列分析的多领域多模态数据集
摘要: 时间序列数据在各种实际领域中是无处不在的。虽然实际时间序列分析(TSA)需要人类专家将数字序列数据与多模态领域特定知识整合,但大多数现有的TSA模型仅依赖数字数据,忽视了超出数字序列的信息的重要性。这种忽视是由于文本序列数据的未开发潜力和缺乏全面高质量的多模态数据集。为了克服这一障碍,我们引入了Time-MMD,这是第一个涵盖9个主要数据领域的多领域多模态时间序列数据集。Time-MMD确保了细粒度的模态对齐,消除了数据污染,并提供了高可用性。此外,我们开发了MM-TSFlib,这是第一个多模态时间序列预测(TSF)库,通过基于Time-MMD的多模态TSF评估实现无缝地进行深入分析。通过MM-TSFlib在Time-MMD上进行的广泛实验表明,通过将单模态TSF扩展到多模态,可以显著提高性能,一般情况下均方误差减少超过15%,在具有丰富文本数据的领域甚至高达40%。更重要的是,我们的数据集和库彻底改变了更广泛的应用、影响和研究主题,推动了TSA的进步。数据集和库可在https://github.com/AdityaLab/Time-MMD 和https://github.com/AdityaLab/MM-TSFlib 上获得。
更新时间: 2024-10-30 02:00:10
领域: cs.LG,cs.CL
Exploring the Role of Token in Transformer-based Time Series Forecasting
Transformer-based methods are a mainstream approach for solving time series forecasting (TSF). These methods use temporal or variable tokens from observable data to make predictions. However, most focus on optimizing the model structure, with few studies paying attention to the role of tokens for predictions. The role is crucial since a model that distinguishes useful tokens from useless ones will predict more effectively. In this paper, we explore this issue. Through theoretical analyses, we find that the gradients mainly depend on tokens that contribute to the predicted series, called positive tokens. Based on this finding, we explore what helps models select these positive tokens. Through a series of experiments, we obtain three observations: i) positional encoding (PE) helps the model identify positive tokens; ii) as the network depth increases, the PE information gradually weakens, affecting the model's ability to identify positive tokens in deeper layers; iii) both enhancing PE in the deeper layers and using semantic-based PE can improve the model's ability to identify positive tokens, thus boosting performance. Inspired by these findings, we design temporal positional encoding (T-PE) for temporal tokens and variable positional encoding (V-PE) for variable tokens. To utilize T-PE and V-PE, we propose T2B-PE, a Transformer-based dual-branch framework. Extensive experiments demonstrate that T2B-PE has superior robustness and effectiveness.
Updated: 2024-10-30 01:49:45
标题: 探讨标记在基于Transformer的时间序列预测中的作用
摘要: 基于Transformer的方法是解决时间序列预测(TSF)的主流方法。这些方法利用可观察数据中的时间或变量标记来进行预测。然而,大多数方法侧重于优化模型结构,很少有研究关注标记在预测中的作用。这一作用至关重要,因为一个能够区分有用标记和无用标记的模型将更有效地进行预测。在本文中,我们探讨了这一问题。通过理论分析,我们发现梯度主要依赖于对预测序列有贡献的标记,称为正标记。基于这一发现,我们探讨了帮助模型选择这些正标记的方法。通过一系列实验,我们得出了三个观察结果:i)位置编码(PE)有助于模型识别正标记;ii)随着网络深度的增加,PE信息逐渐减弱,影响模型在更深层次上识别正标记的能力;iii)增强深层的PE和使用基于语义的PE都可以提高模型识别正标记的能力,从而提高性能。受到这些发现的启发,我们为时间标记设计了时间位置编码(T-PE),为变量标记设计了变量位置编码(V-PE)。为了利用T-PE和V-PE,我们提出了T2B-PE,一个基于Transformer的双分支框架。大量实验证明T2B-PE具有更优越的鲁棒性和有效性。
更新时间: 2024-10-30 01:49:45
领域: cs.AI
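One way to picture observations (ii) and (iii) above is to re-inject positional information at deeper layers so the model can keep identifying positive tokens. The sketch below uses the standard sinusoidal encoding and an arbitrary re-injection weight; the paper's actual T-PE and V-PE designs, including semantic-based PE, are not reproduced.

```python
# Re-injecting sinusoidal PE at every layer of a toy temporal-token encoder,
# motivated by the observation that PE information weakens with depth.
import torch

def sinusoidal_pe(seq_len, d_model):
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]
    i = torch.arange(0, d_model, 2, dtype=torch.float32)[None, :]
    angles = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

seq_len, d_model, n_layers = 96, 64, 4
tokens = torch.randn(seq_len, d_model)        # temporal tokens of one series
pe = sinusoidal_pe(seq_len, d_model)

x = tokens + pe                               # usual input-level PE
layers = [torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=False)
          for _ in range(n_layers)]
for layer in layers:
    x = layer(x.unsqueeze(1)).squeeze(1)      # (seq, 1, d) -> (seq, d)
    x = x + 0.1 * pe                          # re-inject PE (weight is arbitrary)
print(x.shape)  # torch.Size([96, 64])
```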
Score-based Causal Representation Learning: Linear and General Transformations
This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure recovering the true latent causal variables and the latent causal graph underlying them. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to ancestors for general causal models and perfect latent graph recovery for sufficiently non-linear causal models. Secondly, it focuses on general transformations and shows that two stochastic hard interventions per node suffice for identifiability. Notably, one does not need to know which pair of interventional environments have the same node intervened. Finally, the theoretical results are empirically validated via experiments on structured synthetic data and image data.
Updated: 2024-10-30 01:47:27
标题: 基于分数的因果表征学习:线性和一般变换
摘要: 本文讨论了基于干预的因果表示学习(CRL)在一般非参数潜在因果模型和将潜在变量映射到观测变量的未知转换下的情况。研究了线性和一般转换。本文同时涉及可识别性和可实现性方面。可识别性指确定算法不可知条件,以确保恢复真实的潜在因果变量和潜在的因果图。可实现性涉及算法方面,着重设计能够实现可识别性保证的算法。通过在得分函数(即密度函数对数的梯度)和CRL之间建立新的联系,本文设计了一类基于得分的算法,确保了可识别性和可实现性。首先,本文关注线性转换,并展示了每个节点进行一次随机硬干预足以保证可识别性。它还为软干预提供了部分可识别性保证,包括对一般因果模型的祖先进行可识别性以及对足够非线性因果模型的完美潜在图恢复。其次,它关注一般转换,并展示每个节点进行两次随机硬干预足以确保可识别性。值得注意的是,不需要知道哪一对干预环境具有相同的节点进行干预。最后,通过在结构化合成数据和图像数据上进行实验证实了理论结果。
更新时间: 2024-10-30 01:47:27
领域: cs.LG,stat.ML
It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss
Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular iterative algorithm used to train machine learning models while formally guaranteeing the privacy of users. However, the privacy analysis of DP-SGD makes the unrealistic assumption that all intermediate iterates (aka internal state) of the algorithm are released since, in practice, only the final trained model, i.e., the final iterate of the algorithm is released. In this hidden state setting, prior work has provided tighter analyses, albeit only when the loss function is constrained, e.g., strongly convex and smooth or linear. On the other hand, the privacy leakage observed empirically from hidden state DP-SGD, even when using non-convex loss functions, suggests that there is in fact a gap between the theoretical privacy analysis and the privacy guarantees achieved in practice. Therefore, it remains an open question whether hidden state privacy amplification for DP-SGD is possible for all (possibly non-convex) loss functions in general. In this work, we design a counter-example and show, both theoretically and empirically, that a hidden state privacy amplification result for DP-SGD for all loss functions in general is not possible. By carefully constructing a loss function for DP-SGD, we show that for specific loss functions, the final iterate of DP-SGD alone leaks as much information as the sequence of all iterates combined. Furthermore, we empirically verify this result by evaluating the privacy leakage from the final iterate of DP-SGD with our loss function and show that this exactly matches the theoretical upper bound guaranteed by DP. Therefore, we show that the current privacy analysis for DP-SGD is tight for general loss functions and conclude that no privacy amplification is possible for DP-SGD in general for all (possibly non-convex) loss functions.
Updated: 2024-10-30 01:41:44
标题: 这是我们的损失:对于非凸损失下的隐藏状态差分隐私随机梯度下降(DP-SGD)没有隐私放大效应
摘要: 差分隐私随机梯度下降(DP-SGD)是一种流行的迭代算法,用于训练机器学习模型,同时正式保证用户的隐私。然而,DP-SGD的隐私分析做出了不切实际的假设,即算法的所有中间迭代(即内部状态)都被发布,而实际上只有最终训练模型,即算法的最终迭代被发布。在这种隐藏状态设置下,先前的研究提供了更紧密的分析,尽管仅在损失函数受限时,例如强凸平滑或线性。另一方面,即使在使用非凸损失函数时,从隐藏状态DP-SGD中观察到的隐私泄漏也表明,理论隐私分析与实践中实现的隐私保证之间实际上存在差距。因此,隐藏状态下对DP-SGD的隐私增强是否适用于一般(可能是非凸)损失函数仍然是一个开放性问题。 在这项工作中,我们设计了一个反例,并在理论上和实证上展示了,一般情况下,对所有损失函数进行DP-SGD的隐私增强结果是不可能的。通过精心构建DP-SGD的损失函数,我们展示了对于特定损失函数,仅最终迭代的DP-SGD泄露的信息量与所有迭代序列的组合相同。此外,我们通过评估使用我们的损失函数从DP-SGD的最终迭代中泄漏的隐私,并展示这与DP所保证的理论上限完全匹配。因此,我们展示了DP-SGD的当前隐私分析对于一般损失函数是紧密的,并得出结论,一般情况下对所有(可能是非凸)损失函数进行DP-SGD的隐私增强是不可能的。
更新时间: 2024-10-30 01:41:44
领域: cs.LG,cs.CR
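For readers unfamiliar with the algorithm whose hidden-state privacy is analyzed above, a minimal DP-SGD step is sketched below: clip each per-example gradient to an $L_2$ bound, sum, add Gaussian noise, and descend. Hyperparameters are illustrative, and the paper's counter-example loss function is not reproduced.

```python
# One DP-SGD step: per-example gradient clipping plus Gaussian noise.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip=1.0, sigma=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                       # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        scale = torch.clamp(clip / (norm + 1e-12), max=1.0)  # clip l2 norm
        for s, p in zip(summed, params):
            s += p.grad * scale
    n = len(xs)
    with torch.no_grad():
        for s, p in zip(summed, params):
            noisy = (s + sigma * clip * torch.randn_like(s)) / n
            p -= lr * noisy                        # noisy gradient step

model = torch.nn.Linear(3, 1)
xs, ys = torch.randn(8, 3), torch.randn(8, 1)
dp_sgd_step(model, torch.nn.functional.mse_loss, xs, ys)
```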
Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models
Parameter-efficient fine-tuning (PEFT) methods are increasingly used with pre-trained language models (PLMs) for continual learning (CL). These methods typically involve training a PEFT module for each new task and employing similarity-based selection to route modules during inference. However, they face two major limitations: 1) interference during module training with already learned modules and 2) suboptimal routing when composing modules. In this paper, we present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization. L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks. We evaluate our method in two CL setups using various benchmarks. Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
Updated: 2024-10-30 01:38:27
标题: 在使用语言模型进行不断学习时学习为动态适配器组合进行路由
摘要: Parameter-efficient fine-tuning (PEFT)方法越来越被广泛应用于预训练语言模型(PLMs)的继续学习(CL)中。这些方法通常涉及为每个新任务训练一个PEFT模块,并在推断过程中利用基于相似性的选择来路由模块。然而,它们面临两个主要限制:1)在模块训练过程中与已学习的模块发生干扰,2)当组合模块时路由不够优化。在本文中,我们提出了L2R,一种通过隔离新PEFT模块的训练来确保其任务专业化的方法。然后,L2R通过训练一个利用包含先前看到的任务示例的小内存的路由器网络来学习组合学习的模块。我们在两个CL设置中使用各种基准评估了我们的方法。我们的结果表明,与其他方法相比,L2R提供了一种有效的PEFT模块组合,从而提高了泛化性能。
更新时间: 2024-10-30 01:38:27
领域: cs.LG,cs.AI,cs.CL
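A hedged sketch of the routing idea above: embed the input, compare it against a small memory of examples from previously seen tasks, and mix the task-specialized adapters by the resulting similarities. L2R's learned router network and PEFT training procedure are not reproduced; the adapters and memory below are random stand-ins.

```python
# Memory-based soft routing over task-specialized adapters.
import torch

torch.manual_seed(0)
d, n_tasks = 16, 3
adapters = [torch.nn.Linear(d, d) for _ in range(n_tasks)]   # one per task
memory = {t: torch.randn(5, d) for t in range(n_tasks)}      # few examples/task

def route(h):
    # similarity of the hidden state to each task's memory (cosine, max-pooled)
    sims = torch.stack([
        torch.nn.functional.cosine_similarity(h[None, :], memory[t], dim=-1).max()
        for t in range(n_tasks)
    ])
    weights = torch.softmax(sims / 0.1, dim=0)   # temperature 0.1 (arbitrary)
    return sum(w * a(h) for w, a in zip(weights, adapters))

h = torch.randn(d)
print(route(h).shape)  # torch.Size([16])
```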
DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach
Temporal Knowledge Graph (TKG) representation learning aims to map temporally evolving entities and relations to embedded representations in a continuous low-dimensional vector space. However, existing approaches cannot capture the temporal evolution of high-order correlations in TKGs. To this end, we propose a Deep Evolutionary Clustering jointed temporal knowledge graph Representation Learning approach (DECRL). Specifically, a deep evolutionary clustering module is proposed to capture the temporal evolution of high-order correlations among entities. Furthermore, a cluster-aware unsupervised alignment mechanism is introduced to ensure the precise one-to-one alignment of soft overlapping clusters across timestamps, thereby maintaining the temporal smoothness of clusters. In addition, an implicit correlation encoder is introduced to capture latent correlations between any pair of clusters under the guidance of a global graph. Extensive experiments on seven real-world datasets demonstrate that DECRL achieves the state-of-the-art performances, outperforming the best baseline by an average of 9.53%, 12.98%, 10.42%, and 14.68% in MRR, Hits@1, Hits@3, and Hits@10, respectively.
Updated: 2024-10-30 01:36:06
标题: DECRL: 一种深度进化聚类联合时态知识图表示学习方法
摘要: 时间知识图表示学习旨在将时间演变的实体和关系映射到连续低维向量空间中的嵌入表示。然而,现有方法无法捕捉时间知识图中高阶相关性的时间演变。因此,我们提出了一种深度进化聚类联合时间知识图表示学习方法(DECRL)。具体而言,提出了一个深度进化聚类模块,用于捕捉实体之间的高阶相关性的时间演变。此外,引入了一个集群感知的无监督对齐机制,以确保跨时间戳的软重叠集群的精确一对一对齐,从而保持集群的时间平滑性。此外,引入了一个隐式相关性编码器,以捕捉在全局图的指导下任意一对集群之间的潜在相关性。对七个真实世界数据集的广泛实验表明,DECRL实现了最先进的性能,在MRR、Hits@1、Hits@3和Hits@10方面分别平均优于最佳基线9.53%、12.98%、10.42%和14.68%。
更新时间: 2024-10-30 01:36:06
领域: cs.LG,cs.AI
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data. However, current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity, failing to achieve zero base-novel regions and adequate sparsity in base and novel domains. To address these deficiencies, we propose a Non-Negative Generalized Category Discovery (NN-GCD) framework. It employs Symmetric Non-negative Matrix Factorization (SNMF) as a mathematical medium to prove the equivalence of optimal K-means with optimal SNMF, and the equivalence of SNMF solver with non-negative contrastive learning (NCL) optimization. Utilizing these theoretical equivalences, it reframes the optimization of $\bar{A}$ and K-means clustering as an NCL optimization problem. Moreover, to satisfy the non-negative constraints and make a GCD model converge to a near-optimal region, we propose a GELU activation function and an NMF NCE loss. To transition $\bar{A}$ from a suboptimal state to the desired $\bar{A}^*$, we introduce a hybrid sparse regularization approach to impose sparsity constraints. Experimental results show NN-GCD outperforms state-of-the-art methods on GCD benchmarks, achieving an average accuracy of 66.1\% on the Semantic Shift Benchmark, surpassing prior counterparts by 4.7\%.
Updated: 2024-10-30 01:34:11
标题: 通过非负矩阵分解对广义类别发现进行全新探索
摘要: 广义类别发现(GCD)旨在利用标记的基础数据对基础和新颖图像进行分类。然而,目前的方法未能充分解决基于余弦相似度的共现矩阵$ \bar{A} $的固有优化问题,无法实现零基础-新颖区域和基础和新颖领域中的适当稀疏性。为了解决这些不足,我们提出了一个非负广义类别发现(NN-GCD)框架。它采用对称非负矩阵分解(SNMF)作为数学介质,证明了最优K-means与最优SNMF的等价性,以及SNMF求解器与非负对比学习(NCL)优化的等价性。利用这些理论等价性,它将$ \bar{A} $和K-means聚类的优化重新构建为一个NCL优化问题。此外,为了满足非负约束并使GCD模型收敛到一个接近最优的区域,我们提出了一个GELU激活函数和一个NMF NCE损失。为了将$ \bar{A} $从次优状态过渡到期望的$ \bar{A}^* $,我们引入了一种混合稀疏正则化方法来施加稀疏约束。实验结果显示,NN-GCD在GCD基准测试中优于最先进的方法,达到了语义转变基准测试的平均准确率为66.1%,超过先前的对手4.7%。
更新时间: 2024-10-30 01:34:11
领域: cs.CV,cs.AI
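Because SNMF is the mathematical medium above, a minimal symmetric NMF solver may make it concrete: factor $A \approx H H^\top$ with $H \ge 0$ via damped multiplicative updates. The damping factor follows a common SNMF heuristic; the paper's NCL reformulation, GELU activation, and NMF NCE loss are not shown.

```python
# Minimal symmetric NMF via damped multiplicative updates, A ≈ H Hᵀ, H ≥ 0.
import numpy as np

def snmf(A, k, iters=200, beta=0.5, eps=1e-9):
    n = A.shape[0]
    rng = np.random.default_rng(0)
    H = rng.random((n, k))
    for _ in range(iters):
        numer = A @ H
        denom = H @ (H.T @ H) + eps
        H = H * (1 - beta + beta * numer / denom)  # damped multiplicative step
    return H

# Toy co-occurrence matrix with two blocks (two latent categories).
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
H = snmf(A, k=2)
print(np.argmax(H, axis=1))  # rows cluster into the two blocks
```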
Towards Universal Mesh Movement Networks
Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and struggle to handle scenarios with complex boundary geometries. However, existing learning-based methods require re-training from scratch given a different PDE type or boundary geometry, which limits their applicability, and also often suffer from robustness issues in the form of inverted elements. In this paper, we introduce the Universal Mesh Movement Network (UM2N), which -- once trained -- can be applied in a non-intrusive, zero-shot manner to move meshes with different size distributions and structures, for solvers applicable to different PDE types and boundary geometries. UM2N consists of a Graph Transformer (GT) encoder for extracting features and a Graph Attention Network (GAT) based decoder for moving the mesh. We evaluate our method on advection and Navier-Stokes based examples, as well as a real-world tsunami simulation case. Our method outperforms existing learning-based mesh movement methods in terms of the benchmarks described above. In comparison to the conventional sophisticated Monge-Amp\`ere PDE-solver based method, our approach not only significantly accelerates mesh movement, but also proves effective in scenarios where the conventional method fails. Our project page is at https://erizmr.github.io/UM2N/.
Updated: 2024-10-30 01:33:44
标题: 朝向通用网格移动网络
摘要: 准确而高效地求解复杂的偏微分方程(PDEs)是所有科学和工程学科中一个重要且具有挑战性的问题。网格移动方法能够在不增加总体网格自由度数量的情况下提高数值解的准确性。传统的复杂网格移动方法非常昂贵,难以处理具有复杂边界几何形状的情况。然而,现有的基于学习的方法在面对不同的PDE类型或边界几何时需要从头开始重新训练,限制了它们的适用性,并且经常遇到元素倒置的鲁棒性问题。在本文中,我们介绍了通用网格移动网络(UM2N),一旦训练完成,就可以以一种非侵入式、零样本的方式移动具有不同大小分布和结构的网格,用于适用于不同PDE类型和边界几何的求解器。UM2N由用于提取特征的图变换器(GT)编码器和基于图注意力网络(GAT)的解码器组成,用于移动网格。我们在平流和Navier-Stokes示例以及一个实际的海啸模拟案例上评估了我们的方法。我们的方法在上述基准测试方面优于现有的基于学习的网格移动方法。与传统的复杂蒙日-安培PDE求解器方法相比,我们的方法不仅显著加速了网格移动,而且在传统方法失败的情况下也证明有效。我们的项目页面位于https://erizmr.github.io/UM2N/。
更新时间: 2024-10-30 01:33:44
领域: math.NA,cs.AI,cs.CE,cs.LG,cs.NA
Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
Due to the impressive zero-shot capabilities, pre-trained vision-language models (e.g. CLIP), have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we have observed a phenomenon wherein adversarial perturbations induce shifts in text-guided attention. Building upon this observation, we propose a simple yet effective strategy: Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR). This framework incorporates two components: the Attention Refinement module and the Attention-based Model Constraint module. Our goal is to maintain the generalization of the CLIP model and enhance its adversarial robustness: The Attention Refinement module aligns the text-guided attention obtained from the target model via adversarial examples with the text-guided attention acquired from the original model via clean examples. This alignment enhances the model's robustness. Additionally, the Attention-based Model Constraint module acquires text-guided attention from both the target and original models using clean examples. Its objective is to maintain model performance on clean samples while enhancing overall robustness. The experiments validate that our method yields a 9.58% enhancement in zero-shot robust accuracy over the current state-of-the-art techniques across 16 datasets. Our code is available at https://github.com/zhyblue424/TGA-ZSR.
Updated: 2024-10-30 01:22:55
标题: 文本引导的注意力是视觉语言模型零样本鲁棒性所需的一切
摘要: 由于令人印象深刻的零样本能力,预训练的视觉语言模型(例如CLIP)已经在各个领域引起了广泛关注和采用。然而,已经观察到CLIP容易受到对抗性示例的影响。通过实验分析,我们观察到了一种现象,即对抗性扰动会引起文本引导关注的变化。基于这一观察,我们提出了一种简单而有效的策略:用于零样本鲁棒性的文本引导注意力(TGA-ZSR)。该框架包括两个组件:注意力细化模块和基于注意力的模型约束模块。我们的目标是保持CLIP模型的泛化能力并增强其对抗性鲁棒性:注意力细化模块通过对抗性示例对目标模型获取的文本引导注意力与通过干净示例获取的文本引导注意力进行对齐。这种对齐增强了模型的鲁棒性。此外,基于注意力的模型约束模块使用干净示例从目标模型和原始模型获取文本引导注意力。其目标是在保持模型在干净样本上的性能的同时增强整体鲁棒性。实验证明,我们的方法在16个数据集上比当前最先进的技术提高了9.58%的零样本鲁棒性准确性。我们的代码可在https://github.com/zhyblue424/TGA-ZSR获取。
更新时间: 2024-10-30 01:22:55
领域: cs.CV,cs.AI
DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts given a few images for reusing the acquired concepts in a novel context. With massive efforts being dedicated to personalized generation, a promising extension is personalized editing, namely to edit an image using personalized concepts, which can provide a more precise guidance signal than traditional textual guidance. To address this, a straightforward solution is to incorporate a personalized diffusion model with a text-driven editing framework. However, such a solution often shows unsatisfactory editability on the source image. To address this, we propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods. Specifically, we enhance the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective. Moreover, we identify a mode trapping issue with EDSD, and propose a mode shifting regularization with spatial feature guided sampling to avoid such an issue. We further employ two key modifications to the Delta Denoising Score framework that enable high-fidelity local editing with personalized concepts. Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient.
Updated: 2024-10-30 01:16:45
标题: 梦境导航器:使用个性化扩散模型增强源图像条件编辑性
摘要: 最近的文本到图像个性化方法在教授扩散模型用户指定的概念方面表现出很大的潜力,只需几张图像即可在新颖的背景下重用所获得的概念。随着大量努力致力于个性化生成,一个有前途的延伸是个性化编辑,即使用个性化概念编辑图像,这可以提供比传统文本指导更精确的指导信号。为了解决这个问题,一个简单的解决方案是将个性化扩散模型与文本驱动的编辑框架结合起来。然而,这样的解决方案通常在源图像上显示出不理想的可编辑性。为了解决这个问题,我们提出了DreamSteerer,这是一种用于增强现有T2I个性化方法的插件方法。具体地,我们通过一种新颖的可编辑性驱动评分蒸馏(EDSD)目标来增强个性化扩散模型对源图像的可编辑性。此外,我们发现了EDSD存在模式困扰问题,并提出了一种模式转移正则化与空间特征引导采样相结合的方法,以避免这种问题。我们进一步对Delta Denoising Score框架进行了两个关键修改,以实现具有个性化概念的高保真局部编辑。大量实验验证了DreamSteerer可以显著提高几个T2I个性化基线的可编辑性,同时具有高效计算的特点。
更新时间: 2024-10-30 01:16:45
领域: cs.CV,cs.LG
Still More Shades of Null: An Evaluation Suite for Responsible Missing Value Imputation
Data missingness is a practical challenge of sustained interest to the scientific community. In this paper, we present Shades-of-Null, an evaluation suite for responsible missing value imputation. Our work is novel in two ways: (i) we model realistic and socially-salient missingness scenarios that go beyond Rubin's classic Missing Completely at Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR) settings, to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test); (ii) we evaluate imputers holistically, based on imputation quality, as well as on the predictive performance, fairness and stability of the models that are trained and tested on the data post-imputation. We use Shades-of-Null to conduct a large-scale empirical study involving 23,940 experimental pipelines, and find that while there is no single best-performing imputation approach for all missingness types, interesting trade-offs arise between predictive performance, fairness and stability, based on the combination of missingness scenario, imputer choice, and the architecture of the predictive model. We make Shades-of-Null publicly available, to enable researchers to rigorously evaluate missing value imputation methods on a wide range of metrics in plausible and socially meaningful scenarios.
Updated: 2024-10-30 01:06:42
标题: 更多的空值色调:一套用于负责任缺失值插补的评估套件
摘要: 数据缺失是科学界持续关注的实际挑战。在本文中,我们提出了Shades-of-Null,这是一个用于负责任地缺失值插补的评估套件。我们的工作在两个方面是新颖的,一是我们建模了现实和具有社会意义的缺失情景,超越了Rubin经典的完全随机缺失(MCAR)、随机缺失(MAR)和非随机缺失(MNAR)设置,包括多机制缺失(当数据中存在不同的缺失模式时)和缺失转移(当缺失机制在训练和测试之间发生变化时);二是我们综合评估插补器,基于插补质量以及在插补后训练和测试的模型的预测性能、公平性和稳定性。 我们使用Shades-of-Null进行了一项大规模实证研究,涉及23,940个实验管线,并发现虽然没有适用于所有缺失类型的最佳插补方法,但在预测性能、公平性和稳定性之间存在有趣的权衡,这取决于缺失情景、插补器选择和预测模型的架构的组合。我们公开提供Shades-of-Null,以便研究人员在合理且具有社会意义的情景中严格评估缺失值插补方法的广泛指标。
更新时间: 2024-10-30 01:06:42
领域: cs.AI,cs.CY,cs.LG
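As a reference point for the missingness scenarios above, the sketch below generates the three classic Rubin mechanisms on a toy table. The feature names and thresholds are made up; Shades-of-Null's multi-mechanism and missingness-shift settings compose patterns like these.

```python
# Illustrative generators for MCAR, MAR, and MNAR missingness.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(20, 70, 1000),
                   "income": rng.normal(50_000, 15_000, 1000)})

mcar = df.copy()   # MCAR: missingness independent of everything
mcar.loc[rng.random(len(df)) < 0.2, "income"] = np.nan

mar = df.copy()    # MAR: missingness depends on an observed column (age)
mar.loc[(df["age"] > 55) & (rng.random(len(df)) < 0.5), "income"] = np.nan

mnar = df.copy()   # MNAR: missingness depends on the missing value itself
mnar.loc[(df["income"] > 65_000) & (rng.random(len(df)) < 0.5), "income"] = np.nan

print(mcar["income"].isna().mean(), mar["income"].isna().mean(),
      mnar["income"].isna().mean())
```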
Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion
This paper considers the problem of sampling from a non-log-concave distribution, based on queries of its unnormalized density. It first describes a framework, Denoising Diffusion Monte Carlo (DDMC), based on the simulation of a denoising diffusion process with its score function approximated by a generic Monte Carlo estimator. DDMC is an oracle-based meta-algorithm, where its oracle is the assumed access to samples that generate a Monte Carlo score estimator. Then we provide an implementation of this oracle, based on rejection sampling, and this turns DDMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e. a performance guarantee for DDMC, without assuming the target distribution to be log-concave or satisfying any isoperimetric inequality. Then we prove that ZOD-MC admits an inverse polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low dimensional distributions, ZOD-MC is a very efficient sampler, with performance exceeding the latest samplers, including RDMC and RSDMC, which are also based on denoising diffusion. Last, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly higher barriers between modes or discontinuities in non-convex potentials.
Updated: 2024-10-30 01:05:20
标题: 非对数凹分布的零阶采样方法:通过去噪扩散缓解亚稳态
摘要: 本文考虑从非对数凹分布中抽样的问题,基于其未归一化密度的查询。首先描述了一个基于模拟去噪扩散过程的框架,称为去噪扩散蒙特卡洛(DDMC),其中其得分函数由通用蒙特卡洛估计器逼近。DDMC是一个基于oracle的元算法,其oracle是假定可以访问生成蒙特卡洛得分估计器的样本。然后我们提供了一个基于拒绝抽样的实现这个oracle的方法,这将DDMC转变为一个真正的算法,称为零阶扩散蒙特卡洛(ZOD-MC)。我们首先构建了一个通用框架,即为DDMC提供了一个性能保证,而不假设目标分布是对数凹的或满足任何等周不等式。然后我们证明ZOD-MC对所需抽样精度具有反多项式依赖性,尽管仍受到维度诅咒的影响。因此,对于低维分布,ZOD-MC是一个非常高效的抽样器,性能超过了最新的抽样器,包括同样基于去噪扩散的RDMC和RSDMC。最后,我们通过实验证明ZOD-MC对模式之间越来越高的障碍或非凸势函数中的不连续性具有鲁棒性。
更新时间: 2024-10-30 01:05:20
领域: stat.ML,cs.LG,math.PR,math.ST,stat.ME,stat.TH
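The oracle implementation mentioned above is rejection sampling from the unnormalized density. A one-dimensional toy version follows; the bimodal target, Gaussian proposal, and envelope constant $M$ are illustrative choices, not the paper's setup.

```python
# Rejection sampling from an unnormalized, non-log-concave (bimodal) density.
import numpy as np

rng = np.random.default_rng(0)

def unnorm_density(x):              # bimodal target, queried as a black box
    return np.exp(-(x - 2) ** 2) + np.exp(-(x + 2) ** 2)

def proposal_pdf(x, scale=3.0):     # N(0, scale^2) proposal
    return np.exp(-x ** 2 / (2 * scale ** 2)) / (scale * np.sqrt(2 * np.pi))

M = 10.0                            # envelope: unnorm_density <= M * proposal

def rejection_sample(n):
    out = []
    while len(out) < n:
        x = rng.normal(0, 3.0)
        if rng.random() < unnorm_density(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

samples = rejection_sample(2000)
print(samples.mean(), (samples > 0).mean())  # ~0 mean, ~half in each mode
```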
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
This report examines the fine-tuning of Large Language Models (LLMs), integrating theoretical insights with practical applications. It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. A comparison of fine-tuning methodologies, including supervised, unsupervised, and instruction-based approaches, highlights their applicability to different tasks. The report introduces a structured seven-stage pipeline for fine-tuning LLMs, spanning data preparation, model initialization, hyperparameter tuning, and model deployment. Emphasis is placed on managing imbalanced datasets and optimization techniques. Parameter-efficient methods like Low-Rank Adaptation (LoRA) and Half Fine-Tuning are explored for balancing computational efficiency with performance. Advanced techniques such as memory fine-tuning, Mixture of Experts (MoE), and Mixture of Agents (MoA) are discussed for leveraging specialized networks and multi-agent collaboration. The report also examines novel approaches like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), which align LLMs with human preferences, alongside pruning and routing optimizations to improve efficiency. Further sections cover validation frameworks, post-deployment monitoring, and inference optimization, with attention to deploying LLMs on distributed and cloud-based platforms. Emerging areas such as multimodal LLMs, fine-tuning for audio and speech, and challenges related to scalability, privacy, and accountability are also addressed. This report offers actionable insights for researchers and practitioners navigating LLM fine-tuning in an evolving landscape.
Updated: 2024-10-30 01:04:15
标题: 《从基础到突破:微调LLMs的终极指南:技术、研究、最佳实践、应用研究挑战和机遇的全面审查》
摘要: 这份报告探讨了大型语言模型(LLMs)的微调,将理论洞察与实际应用相结合。它概述了LLMs从传统自然语言处理(NLP)模型演变到在人工智能中发挥关键作用的历史发展。对微调方法论的比较,包括有监督、无监督和基于指令的方法,突出它们在不同任务中的适用性。报告引入了一个结构化的七阶段微调LLMs的流程,涵盖数据准备、模型初始化、超参数调整和模型部署。重点放在处理不平衡数据集和优化技术上。探索了像低秩适应(LoRA)和半微调这样的参数高效方法,以平衡计算效率和性能。讨论了高级技术,如内存微调、专家混合(MoE)和代理人混合(MoA),用于利用专业网络和多代理协作。报告还探讨了新颖方法,如近端策略优化(PPO)和直接偏好优化(DPO),以及修剪和路由优化,以提高效率。进一步的部分涵盖了验证框架、部署后监控和推理优化,重点关注在分布式和基于云的平台上部署LLMs。还讨论了新兴领域,如多模式LLMs、用于音频和语音的微调,以及与可扩展性、隐私和问责相关的挑战。这份报告为研究人员和从业者提供了在不断变化的环境中进行LLMs微调的可操作见解。
更新时间: 2024-10-30 01:04:15
领域: cs.LG,cs.CL
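To ground one of the parameter-efficient methods surveyed above, here is a minimal LoRA layer: a frozen weight plus a trainable low-rank update $BA$ scaled by $\alpha/r$. Dimensions and hyperparameters are illustrative.

```python
# Minimal Low-Rank Adaptation (LoRA) linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)          # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))  # zero init: no drift at start
        self.scale = alpha / r
    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable parameters vs. 589824 frozen
```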
Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models
Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive performance. However, previous research mainly considered calibrating based on English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. In this paper, we set out to investigate calibrating the pruning of multilingual language models for monolingual applications. We present the first comprehensive empirical study, comparing different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques. Our results offer practical suggestions, for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks. Through further analysis of latent subspaces, pruning masks, and individual neurons within pruned models, we find that while pruning generally preserves strong language-specific features, it may fail to retain language-specific neuron activation patterns and subtle, language-agnostic features associated with knowledge and reasoning that are needed for complex tasks.
Updated: 2024-10-30 00:53:43
标题: 研究语言特定的校准方法用于修剪多语言大型语言模型
摘要: 最近对大型语言模型(LLM)修剪的进展表明,在训练后和无需重新训练的情况下,可以实现最先进的压缩结果,同时保持高预测性能。然而,以往的研究主要考虑基于英语文本进行校准,尽管现代LLM具有多语言性质,并且经常在非英语语言中使用。在本文中,我们着手研究校准多语言语言模型修剪以用于单语应用。我们提出了第一个全面的实证研究,比较了在各种语言、任务、模型和最先进的修剪技术下,用不同的校准语言对多语言模型进行修剪。我们的结果提供了实用建议,例如,在目标语言中进行校准可以有效保留语言建模能力,但不一定有助于下游任务。通过进一步分析潜在子空间、修剪掩模和修剪模型内的个别神经元,我们发现,虽然修剪通常保留强大的语言特定特征,但可能无法保留语言特定的神经元激活模式和与知识和推理相关的微妙、与语言无关的特征,这些特征对于复杂任务是必需的。
更新时间: 2024-10-30 00:53:43
领域: cs.CL,cs.AI,cs.LG
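A hedged sketch of what "calibrating" pruning means in practice, in the style of activation-aware criteria such as Wanda: score each weight by $|W|$ times the norm of the corresponding input activation computed on calibration text, then drop the lowest-scoring weights. Which language the calibration activations come from is exactly the choice studied above; the tensors below are random stand-ins.

```python
# Activation-aware (Wanda-style) pruning of one linear layer using
# calibration activations; real activations would come from calibration text.
import torch

torch.manual_seed(0)
W = torch.randn(64, 128)                      # one linear layer's weight
calib_acts = torch.randn(256, 128)            # activations from calibration data

act_norm = calib_acts.norm(dim=0)             # per-input-feature l2 norm
scores = W.abs() * act_norm[None, :]          # importance = |W| * ||activation||

sparsity = 0.5
threshold = scores.flatten().kthvalue(int(sparsity * scores.numel())).values
pruned_W = torch.where(scores > threshold, W, torch.zeros_like(W))
print((pruned_W == 0).float().mean())         # ~0.5 of weights removed
```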
FISC: Federated Domain Generalization via Interpolative Style Transfer and Contrastive Learning
Federated Learning (FL) shows promise in preserving privacy and enabling collaborative learning. However, most current solutions focus on private data collected from a single domain. A significant challenge arises when client data comes from diverse domains (i.e., domain shift), leading to poor performance on unseen domains. Existing Federated Domain Generalization approaches address this problem but assume each client holds data for an entire domain, limiting their practicality in real-world scenarios with domain-based heterogeneity and client sampling. To overcome this, we introduce FISC, a novel FL domain generalization paradigm that handles more complex domain distributions across clients. FISC enables learning across domains by extracting an interpolative style from local styles and employing contrastive learning. This strategy gives clients multi-domain representations and unbiased convergent targets. Empirical results on multiple datasets, including PACS, Office-Home, and IWildCam, show FISC outperforms state-of-the-art (SOTA) methods. Our method achieves accuracy improvements ranging from 3.64% to 57.22% on unseen domains. Our code is available at https://anonymous.4open.science/r/FISC-AAAI-16107.
Updated: 2024-10-30 00:50:23
标题: FISC:通过插值式风格迁移和对比学习实现联合领域泛化
摘要: Federated Learning(FL)显示出在维护隐私和实现协作学习方面的潜力。然而,大多数当前的解决方案侧重于从单个领域收集的私人数据。当客户数据来自不同领域(即领域转移)时,会出现重大挑战,导致在未知领域上表现不佳。现有的联邦领域泛化方法解决了这个问题,但假设每个客户端持有整个领域的数据,从而限制了它们在领域异质性和客户抽样的实际应用中的实用性。 为了克服这一挑战,我们引入了FISC,一种处理客户端之间更复杂领域分布的新颖FL领域泛化范式。FISC通过从本地样式中提取插值样式并采用对比学习来实现跨领域学习。这种策略为客户端提供了多领域表示和无偏收敛目标。在多个数据集上的实证结果,包括PACS、Office-Home和IWildCam,显示FISC的性能优于最先进的方法。我们的方法在未知领域上实现了3.64%到57.22%的精度改进。我们的代码可在https://anonymous.4open.science/r/FISC-AAAI-16107找到。
更新时间: 2024-10-30 00:50:23
领域: cs.LG,cs.CV,cs.DC
Efficient Feature Extraction and Classification Architecture for MRI-Based Brain Tumor Detection
Uncontrolled cell division in the brain is what gives rise to brain tumors. If the tumor size increases by more than half, there is little hope for the patient's recovery. This emphasizes the need for rapid and precise brain tumor diagnosis. When it comes to analyzing, diagnosing, and planning therapy for brain tumors, MRI imaging plays a crucial role. A brain tumor's development history is crucial information for doctors to have. When it comes to distinguishing between human soft tissues, MRI scans are superior. In order to get reliable classification results from MRI scans quickly, deep learning is one of the most practical methods. Early human illness diagnosis has been demonstrated to be more accurate when deep learning methods are used. In the case of diagnosing a brain tumor, where even a little misdiagnosis might have serious consequences, accuracy is especially important. Detecting brain tumors in medical images is still a difficult task, as brain MRIs are notoriously imprecise in revealing the presence or absence of tumors. Using MRI scans of the brain, a Convolutional Neural Network (CNN) was trained to identify the presence of a tumor in this research. Results from the CNN model showed an accuracy of 99.17%. The features learned by the CNN model were also extracted. In order to evaluate the CNN model's capability for processing images, we fed these features to the following machine learning models: KNN, Logistic Regression, SVM, Random Forest, Naive Bayes, and Perceptron. The CNN and machine learning models were also evaluated using the standard metrics of Precision, Recall, Specificity, and F1 score. Such model assistance can enhance the accuracy of the doctor's diagnosis in identifying the existence of a tumor and treating the patient.
Updated: 2024-10-30 00:47:32
标题: MRI基于脑肿瘤检测的高效特征提取和分类架构
摘要: 大脑中的细胞不受控制的分裂是导致脑肿瘤的原因。如果肿瘤大小增加超过一半,患者的康复希望很小。这强调了对快速和精确的脑肿瘤诊断的需要。在分析、诊断和规划脑肿瘤治疗时,MRI成像发挥着关键作用。脑肿瘤的发展历史对医生来说是关键信息。在区分人类软组织方面,MRI扫描表现出优势。为了从MRI扫描中快速获得可靠的分类结果,深度学习是最实用的方法之一。早期诊断人类疾病时,使用深度学习方法被证明更准确。在诊断脑肿瘤的情况下,即使有一点误诊也可能导致严重后果,准确性尤为重要。在医学图像中揭示脑肿瘤仍然是一个困难的任务。脑MRI在揭示肿瘤的存在或缺席方面是臭名昭著的不精确。在这项研究中,使用脑部MRI扫描训练了一个卷积神经网络(CNN)来识别肿瘤的存在。CNN模型的结果显示了99.17%的准确性。还检索了CNN模型的特征。为了评估CNN模型处理图像的能力,我们通过以下机器学习模型应用了这些特征:KNN、逻辑回归、支持向量机(SVM)、随机森林、朴素贝叶斯和感知。使用标准度量指标Precision、Recall、Specificity和F1分数评估了CNN和机器学习模型。医生诊断的重要性增强了CNN模型在识别肿瘤存在并治疗患者方面的准确性。
更新时间: 2024-10-30 00:47:32
领域: eess.IV,cs.AI,cs.CV
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. In this paper, we establish exact characterizations of transformers with hard attention (in which all attention is focused on exactly one position) and attention masking (in which each position only attends to positions on one side). With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic (LTL), which defines exactly the star-free languages. A key technique is the use of Boolean RASP as a convenient intermediate language between transformers and LTL. We then take numerous results known for LTL and apply them to transformers, showing how position embeddings, strict masking, and depth all increase expressive power.
Updated: 2024-10-30 00:44:17
标题: Masked Hard-Attention Transformers准确识别无星语言
摘要: transformers的表达能力可以通过它们识别形式语言类别的能力来研究,这些输入的大小是不受限制的。在这篇论文中,我们建立了对具有硬注意力(所有注意力都集中在一个位置上)和注意力屏蔽(每个位置只关注一侧位置)的transformers的精确描述。在严格的屏蔽(每个位置不能关注自身)和没有位置嵌入的情况下,这些transformers在表达上等价于线性时态逻辑(LTL),它确切地定义了无星语言。一个关键技术是使用布尔RASP作为transformers和LTL之间方便的中间语言。然后,我们将已知的许多LTL结果应用于transformers,展示了位置嵌入、严格屏蔽和深度如何增加表达能力。
更新时间: 2024-10-30 00:44:17
领域: cs.FL,cs.LG,cs.LO
CoGS: Model Agnostic Causality Constrained Counterfactual Explanations using goal-directed ASP
Machine learning models are increasingly used in critical areas such as loan approvals and hiring, yet they often function as black boxes, obscuring their decision-making processes. Transparency is crucial, as individuals need explanations to understand decisions, particularly if the decisions result in an undesired outcome. Our work introduces CoGS (Counterfactual Generation with s(CASP)), a model-agnostic framework capable of generating counterfactual explanations for classification models. CoGS leverages the goal-directed Answer Set Programming system s(CASP) to compute realistic and causally consistent modifications to feature values, accounting for causal dependencies between them. By using rule-based machine learning algorithms (RBML), notably the FOLD-SE algorithm, CoGS extracts the underlying logic of a statistical model to generate counterfactual solutions. By tracing a step-by-step path from an undesired outcome to a desired one, CoGS offers interpretable and actionable explanations of the changes required to achieve the desired outcome. We present details of the CoGS framework along with its evaluation.
Updated: 2024-10-30 00:43:01
标题: CoGS: 使用面向目标的ASP的模型不可知因果约束的反事实解释
摘要: 机器学习模型越来越多地应用于关键领域,如贷款批准和招聘,然而它们通常作为黑盒子工作,掩盖了它们的决策过程。透明度至关重要,因为个人需要解释来理解决策,特别是如果决策导致不良结果。我们的工作介绍了CoGS(使用s(CASP)生成反事实的框架),这是一个模型无关的框架,能够为分类模型生成反事实解释。CoGS利用目标导向的答案集编程系统s(CASP)来计算特征值的现实和因果一致的修改,考虑到它们之间的因果依赖关系。通过使用基于规则的机器学习算法(RBML),尤其是FOLD-SE算法,CoGS提取统计模型的潜在逻辑来生成反事实解决方案。通过从不良结果到期望结果的逐步路径追踪,CoGS提供了可解释和可操作的解释,说明实现期望结果所需的变化。我们提供了CoGS框架的详细信息以及其评估结果。
更新时间: 2024-10-30 00:43:01
领域: cs.AI
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large networks pose significant computational challenges during inference. To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models. Compared to existing structured matrices, the BLAST matrix offers substantial flexibility, as it can represent various types of structures that are either learned from data or computed from pre-existing weight matrices. We demonstrate the efficiency of using the BLAST matrix for compressing both language and vision tasks, showing that (i) for medium-sized models such as ViT and GPT-2, training with BLAST weights boosts performance while reducing complexity by 70% and 40%, respectively; and (ii) for large foundation models such as Llama-7B and DiT-XL, the BLAST matrix achieves a 2x compression while exhibiting the lowest performance degradation among all tested structured matrices. Our code is available at https://github.com/changwoolee/BLAST.
Updated: 2024-10-30 00:38:11
标题: BLAST:用于高效深度神经网络推理的块级自适应结构化矩阵
摘要: 大规模基础模型在语言和视觉任务中表现出卓越的性能。然而,在这些大型网络中涉及的许多密集矩阵-向量操作在推断过程中带来了重大的计算挑战。为了解决这些挑战,我们引入了块级自适应结构(BLAST)矩阵,旨在学习并利用深度学习模型中线性层权重矩阵中普遍存在的高效结构。与现有的结构化矩阵相比,BLAST矩阵具有相当大的灵活性,因为它可以表示从数据中学习或从预先存在的权重矩阵计算得出的各种类型的结构。我们展示了使用BLAST矩阵压缩语言和视觉任务的效率,结果显示:(i)对于中等大小的模型,如ViT和GPT-2,使用BLAST权重进行训练可以提升性能,同时分别降低了70%和40%的复杂性;(ii)对于大型基础模型,如Llama-7B和DiT-XL,BLAST矩阵实现了2倍的压缩,同时在所有测试的结构化矩阵中表现出最低的性能降级。我们的代码可在https://github.com/changwoolee/BLAST找到。
更新时间: 2024-10-30 00:38:11
领域: cs.LG,cs.AI,stat.ML
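A toy block-structured matrix in the spirit of BLAST above: the weight is a grid of blocks, each stored as a low-rank factorization $U_{ij} V_{ij}^\top$, making the matrix-vector product cheap. BLAST's actual parameterization and its learning procedure are richer than this sketch.

```python
# Block low-rank weight: each (i, j) block is U @ V.T, so a matvec costs
# O(block * rank) per block instead of dense O(block^2).
import numpy as np

rng = np.random.default_rng(0)
b, nb, r = 32, 4, 4                        # block size, blocks per side, rank

factors = [[(rng.normal(size=(b, r)), rng.normal(size=(b, r)))
            for _ in range(nb)] for _ in range(nb)]

def block_lowrank_matvec(factors, x):
    y = np.zeros(nb * b)
    for i in range(nb):
        for j in range(nb):
            U, V = factors[i][j]
            xj = x[j * b:(j + 1) * b]
            y[i * b:(i + 1) * b] += U @ (V.T @ xj)   # low-rank block matvec
    return y

x = rng.normal(size=(nb * b,))
dense = np.block([[U @ V.T for (U, V) in row] for row in factors])
print(np.allclose(block_lowrank_matvec(factors, x), dense @ x))  # True
```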
Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence
This booklet, "Unlocking the Wisdom of LLM Collaborative Intelligence," introduces the comprehensive work "The Path to Artificial General Intelligence." Through ten aphorisms, it distills the core principles of LLM Collaborative Intelligence (LCI) as a promising framework toward achieving AGI. The booklet also offers titles, abstracts, and introductions from the main chapters, along with the first two chapters in full. The second edition, released this week, includes significant enhancements to Chapters 6 to 9 and a revised preface addressing Yann LeCun's skepticism about AGI. LeCun argues that LLMs lack memory, planning, and grounding, but we propose that LCI's collaborative architecture, involving multimodal LLMs with executive, legislative, and judicial roles, overcomes these limitations. Chapters on SocraSynth, EVINCE, consciousness modeling, and behavior modeling demonstrate that collaborative LLMs with checks and balances can achieve intelligence beyond any single model's capability. By combining complementary strengths, such as world modeling and advanced sensory capabilities, LCI enables models to work together and perceive reality beyond human limitations. As with human institutions, progress depends on cooperation, not isolation. Collaborative LLMs may unlock new levels of intelligence, paving the way toward AGI.
Updated: 2024-10-30 00:15:20
标题: 解锁大型语言模型的智慧:通往人工通用智能之路简介
摘要: 这本小册子《解锁LLM协作智能的智慧》介绍了全面的作品《通往人工通用智能之路》。通过十句格言,它概括了LLM协作智能(LCI)的核心原则,作为实现AGI的一个有希望的框架。该小册子还提供了主要章节的标题、摘要和介绍,以及前两章的全部内容。本周发布的第二版对第6至9章进行了重大增强,并修订了前言,针对Yann LeCun对AGI的怀疑。LeCun认为LLM缺乏记忆、规划和基础,但我们提出LCI的协作架构,涉及具有执行、立法和司法角色的多模态LLM,可以克服这些限制。关于SocraSynth、EVINCE、意识建模和行为建模的章节表明,具有制衡的协作LLM可以实现超越任何单一模型能力的智能。通过结合互补的优势,如世界建模和先进的感官能力,LCI使模型能够共同工作,并超越人类的现实限制。与人类机构一样,进步取决于合作,而不是孤立。协作LLM可能会解锁新的智能水平,为AGI铺平道路。
更新时间: 2024-10-30 00:15:20
领域: cs.AI,I.2.7
Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences
We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using $f$-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy inputs. We instantiate ARS on deep image classification to certify predictions against adversarial examples of bounded $L_{\infty}$ norm. In the $L_{\infty}$ threat model, ARS enables flexible adaptation through high-dimensional input-dependent masking. We design adaptivity benchmarks, based on CIFAR-10 and CelebA, and show that ARS improves standard test accuracy by $1$ to $15\%$ points. On ImageNet, ARS improves certified test accuracy by up to $1.6\%$ points over standard RS without adaptivity. Our code is available at https://github.com/ubc-systopia/adaptive-randomized-smoothing .
Updated: 2024-10-30 00:14:07
标题: 自适应随机平滑:多步防御的认证对抗强健性
摘要: 我们提出了自适应随机平滑(ARS)来对我们的测试时间自适应模型的预测进行认证,以抵御对抗样本。ARS通过使用$f$-差分隐私来扩展随机平滑的分析,以认证多步自适应组合。首次,我们的理论涵盖了对具有噪声输入的一般和高维函数的声音自适应组合。我们在深度图像分类上实例化ARS,以对抗性例子的$L_{\infty}$范数进行认证预测。在$L_{\infty}$威胁模型中,ARS通过高维输入相关的遮蔽实现了灵活的适应性。我们设计了基于CIFAR-10和CelebA的适应性基准,并展示ARS将标准测试准确率提高了$1$到$15\%$。在ImageNet上,ARS将认证测试准确率提高了高达标准RS的$1.6\%$。我们的代码可在https://github.com/ubc-systopia/adaptive-randomized-smoothing找到。
更新时间: 2024-10-30 00:14:07
领域: cs.LG,cs.CR
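For contrast with the adaptive scheme above, the standard (non-adaptive) randomized-smoothing prediction of Cohen et al. is easy to sketch: classify many Gaussian-noised copies of the input and take the majority vote. ARS's $f$-DP composition and input-dependent masking are not shown; the model and input below are placeholders.

```python
# Plain randomized-smoothing prediction: majority vote over noised copies.
import torch

def smoothed_predict(model, x, sigma=0.25, n=1000):
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    preds = model(noisy).argmax(dim=1)
    return torch.mode(preds).values.item()   # majority-vote class

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x = torch.randn(3, 8, 8)                     # stand-in image
print(smoothed_predict(model, x))
```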
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
Updated: 2024-10-30 00:03:10
标题: 草图矩匹配(Sketchy Moment Matching):面向微调的快速且可证明的数据选择
摘要: 我们从基本的角度重新审视了现代微调背景下的数据选择。将低维度中方差最小化的经典智慧延伸到高维度微调,我们的泛化分析揭示了另外减少由低秩逼近引起的偏差的重要性。受到高维度理论中方差-偏差权衡的启发,我们引入了一种可扩展的数据选择方案,名为Sketchy Moment Matching(SkMM),包含两个阶段。首先,通过梯度草图探索信息性低维子空间$\mathcal{S}$来控制偏差;然后通过原始和选择的数据集之间的矩匹配来减少$\mathcal{S}$上的方差。理论上,我们证明梯度草图快速且可靠:通过在$\mathcal{S}$上减少方差选择$n$个样本能保持快速的泛化率$O(\dim(\mathcal{S})/n)$,与参数维度无关。在实验上,我们通过合成实验具体化了方差-偏差平衡,并展示了SkMM在真实视觉任务的微调中的有效性。
更新时间: 2024-10-30 00:03:10
领域: cs.LG,stat.ML
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but presents the challenge of balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but with a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves the highest win rate of 65% on TutorEval and around 60% win rates across other different datasets, outperforming standard BoN with the same computational cost and showcasing its scalability and alignment efficacy.
Updated: 2024-10-30 00:02:08
标题: TreeBoN:通过推测性树搜索和最佳N采样增强推断时间对齐
摘要: 推断时间对齐增强了大型语言模型的性能,而无需额外的训练或微调,但在保持计算效率和高质量输出方面存在挑战。作为一种简单而强大的方法,最佳N(BoN)抽样生成多个响应并选择最佳响应,从而实现了性能提升,但代价是高计算成本。我们提出了TreeBoN,这是一个将一种推测树搜索策略集成到最佳N(BoN)抽样中的新框架。TreeBoN保持一组父节点,迭代地分支和修剪低质量响应,从而降低计算开销同时保持高输出质量。我们的方法还利用来自直接偏好优化(DPO)的令牌级奖励来指导树扩展和修剪低质量路径。我们使用AlpacaFarm、HH-RLHF、UltraFeedback、GSM8K和TutorEval数据集评估了TreeBoN,展示了一致的改进。具体而言,TreeBoN在TutorEval上实现了65%的最高胜率,在其他不同数据集上也实现了约60%的胜率,超越了具有相同计算成本的标准BoN,并展示了其可扩展性和对齐有效性。
更新时间: 2024-10-30 00:02:08
领域: cs.CL,cs.AI,cs.LG
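The baseline that TreeBoN improves on is plain Best-of-N sampling, sketched below with placeholder `generate` and `reward` functions; TreeBoN replaces the flat loop with iterative branching and pruning of weak partial responses guided by token-level DPO rewards.

```python
# Best-of-N sampling: draw N candidates, score each with a reward model,
# return the best. `generate` and `reward` are hypothetical stand-ins.
import random

random.seed(0)

def generate(prompt):
    # stand-in for sampling one response from an LLM
    return prompt + " -> response#" + str(random.randint(0, 9999))

def reward(response):
    # stand-in for a learned reward model's scalar score
    return random.random()

def best_of_n(prompt, n=8):
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

print(best_of_n("Explain inference-time alignment."))
```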