Arxiv Day: Article

Gem: Gaussian Mixture Model Embeddings for Numerical Feature Distributions

Embeddings are now used to underpin a wide variety of data management tasks, including entity resolution, dataset search and semantic type detection. Such applications often involve datasets with numerical columns, but there has been more emphasis placed on the semantics of categorical data in embeddings than on the distinctive features of numerical data. In this paper, we propose a method called Gem (Gaussian mixture model embeddings) that creates embeddings that build on numerical value distributions from columns. The proposed method specializes a Gaussian Mixture Model (GMM) to identify and cluster columns with similar value distributions. We introduce a signature mechanism that generates a probability matrix for each column, indicating its likelihood of belonging to specific Gaussian components, which can be used for different applications, such as to determine semantic types. Finally, we generate embeddings for three numerical data properties: distributional, statistical, and contextual. Our core method focuses solely on numerical columns without using table names or neighboring columns for context. However, the method can be combined with other types of evidence, and we later integrate attribute names with the Gaussian embeddings to evaluate the method's contribution to improving overall performance. We compare Gem with several baseline methods for numeric only and numeric + context tasks, showing that Gem consistently outperforms the baselines on four benchmark datasets.

Updated: 2024-10-09 23:40:58

标题: Gem: 高斯混合模型嵌入用于数字特征分布

摘要: 嵌入现在被用于支撑各种数据管理任务，包括实体解析、数据集搜索和语义类型检测。这些应用通常涉及具有数字列的数据集，但在嵌入中对分类数据的语义更加强调，而对数字数据的独特特征的关注较少。在本文中，我们提出了一种名为Gem（高斯混合模型嵌入）的方法，该方法创建嵌入，基于从列中获取的数字值分布。所提出的方法专门针对高斯混合模型（GMM），以识别和聚类具有类似值分布的列。我们引入了一种签名机制，为每个列生成概率矩阵，指示其属于特定高斯分量的可能性，可用于不同的应用，例如确定语义类型。最后，我们生成三种数字数据属性的嵌入：分布、统计和上下文。我们的核心方法仅专注于数字列，而不使用表名或相邻列作为上下文。然而，该方法可以与其他类型的证据结合使用，我们稍后将属性名称与高斯嵌入集成，以评估该方法对改进整体性能的贡献。我们将Gem与几种基准方法进行了比较，用于仅数字和数字+上下文任务，结果显示Gem在四个基准数据集上始终优于基准方法。

更新时间: 2024-10-09 23:40:58

领域: cs.DB,cs.LG

下载: http://arxiv.org/abs/2410.07485v1

Extraction Propagation

We consider the problem of learning to map large instances, such as sequences and images, to outputs. Since training one large neural network end to end with backpropagation is plagued by vanishing gradients and degradation, we develop a novel neural network architecture called Extraction propagation, which works by training, in parallel, many small neural networks which interact with one another. We note that the performance of Extraction propagation is only conjectured as we have yet to implement it. We do, however, back the algorithm with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.

Updated: 2024-10-09 23:25:27

标题: 提取传播

摘要: 我们考虑学习将大实例（如序列和图像）映射到输出的问题。由于使用反向传播训练一个大型神经网络容易出现梯度消失和退化的问题，我们开发了一种新颖的神经网络架构，称为提取传播，它通过训练许多小型神经网络并相互交互来工作。我们注意到，提取传播的性能仅仅是推测性的，因为我们尚未实施它。然而，我们用一些理论支持这个算法。本文的一个早期版本被命名为“融合编码器网络”，详细介绍了一个略有不同的架构。

更新时间: 2024-10-09 23:25:27

领域: cs.LG

下载: http://arxiv.org/abs/2402.15883v3

Exploring the design space of deep-learning-based weather forecasting systems

Despite tremendous progress in developing deep-learning-based weather forecasting systems, their design space, including the impact of different design choices, is yet to be well understood. This paper aims to fill this knowledge gap by systematically analyzing these choices including architecture, problem formulation, pretraining scheme, use of image-based pretrained models, loss functions, noise injection, multi-step inputs, additional static masks, multi-step finetuning (including larger stride models), as well as training on a larger dataset. We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models, along with grid-invariant architectures, including graph-based and operator-based models. Our results show that fixed-grid architectures outperform grid-invariant architectures, indicating a need for further architectural developments in grid-invariant models such as neural operators. We therefore propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures. We further show that multi-step fine-tuning is essential for most deep-learning models to work well in practice, which has been a common practice in the past. Pretraining objectives degrade performance in comparison to supervised training, while image-based pretrained models provide useful inductive biases in some cases in comparison to training the model from scratch. Interestingly, we see a strong positive effect of using a larger dataset when training a smaller model as compared to training on a smaller dataset for longer. Larger models, on the other hand, primarily benefit from just an increase in the computational budget. We believe that these results will aid in the design of better weather forecasting systems in the future.

Updated: 2024-10-09 22:25:50

标题: 探索基于深度学习的天气预报系统的设计空间

摘要: 尽管在发展基于深度学习的天气预报系统方面取得了巨大进展，但包括不同设计选择的影响在内的设计空间仍未被充分理解。本文旨在通过系统分析这些选择，包括体系结构、问题表述、预训练方案、使用基于图像的预训练模型、损失函数、噪声注入、多步输入、额外的静态蒙版、多步微调（包括更大步长模型）以及在更大数据集上训练等，来填补这一知识空白。我们研究了固定网格结构，如UNet，完全卷积结构和基于Transformer的模型，以及与网格无关的架构，包括基于图形和基于运算符的模型。我们的结果显示，固定网格结构优于与网格无关的架构，这表明需要在神经算子等网格无关模型中进一步发展架构。因此，我们提出了一个混合系统，将固定网格模型的强大性能与网格无关的架构的灵活性相结合。我们进一步表明，对于大多数深度学习模型在实践中表现良好，多步微调是必不可少的，这在过去是一种常见做法。与监督训练相比，预训练目标会降低性能，而在某些情况下，与从头开始训练模型相比，基于图像的预训练模型会提供有用的归纳偏差。有趣的是，我们发现当训练较小的模型时，使用更大的数据集会比在较小数据集上更长时间训练效果更好。另一方面，较大模型主要受益于仅增加计算预算。我们相信这些结果将有助于未来设计更好的天气预报系统。

更新时间: 2024-10-09 22:25:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07472v1

Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage the Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods like histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differs from classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs.

Updated: 2024-10-09 22:23:20

标题: 使用LLMs进行密度估计：一项关于上下文学习轨迹的几何研究

摘要: 大型语言模型(LLMs)展示了在各种任务中执行上下文学习的卓越能力，包括时间序列预测。本研究调查了LLMs从上下文中观察到的数据中估计概率密度函数(PDFs)的能力；这种密度估计(DE)是许多概率建模问题的基本任务。我们利用密集主成分分析(InPCA)来可视化和分析LLaMA-2模型的上下文学习动态。我们的主要发现是，这些LLMs在低维InPCA空间中都遵循类似的学习轨迹，与传统的密度估计方法如直方图和高斯核密度估计(KDE)不同。我们将LLaMA的上下文DE过程解释为具有自适应核宽度和形状的KDE。这个定制的核模型捕捉了LLaMA行为的大部分，尽管只有两个参数。我们进一步推测为什么LLaMA的核宽度和形状与经典算法不同，从而提供对LLMs中上下文概率推理机制的洞察。

更新时间: 2024-10-09 22:23:20

领域: cs.LG,cs.CL,stat.ML

下载: http://arxiv.org/abs/2410.05218v2

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in benchmarking code efficiency is a hurdle across varying hardware specifications for popular interpreted languages such as Python. In this paper, we present ECCO, a reproducible benchmark for evaluating program efficiency via two paradigms: natural language (NL) based code generation and history-based code editing. On ECCO, we adapt and thoroughly investigate the three most promising existing LLM-based approaches: in-context learning, iterative refinement with execution or NL feedback, and fine-tuning conditioned on execution and editing history. While most methods degrade functional correctness and moderately increase program efficiency, we find that adding execution information often helps maintain functional correctness, and NL feedback enhances more on efficiency. We release our benchmark to support future work on LLM-based generation of efficient code.

Updated: 2024-10-09 22:20:40

标题: ECCO：我们能否在不牺牲功能正确性的情况下提高模型生成代码的效率？

摘要: 尽管大型语言模型(LLMs)在生成功能正确的程序方面取得了巨大成功，但是在确保正确性的同时，使模型生成高效解决方案仍然是一个挑战。此外，对于流行的解释型语言如Python，基准代码效率的不可靠性是一个跨不同硬件规格的障碍。在本文中，我们提出了ECCO，一个可重现的基准测试，用于通过两种范式评估程序的效率：自然语言(NL)生成的代码和基于历史的代码编辑。在ECCO上，我们对三种最有前景的基于LLM的方法进行了适应和深入研究：上下文学习、迭代改进与执行或NL反馈、以及根据执行和编辑历史进行微调。虽然大多数方法会降低功能正确性并适度提高程序效率，但我们发现添加执行信息通常有助于保持功能正确性，而NL反馈更有助于提高效率。我们发布了我们的基准测试，以支持未来基于LLM的高效代码生成的工作。

更新时间: 2024-10-09 22:20:40

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.14044v2

Systematic Feature Design for Cycle Life Prediction of Lithium-Ion Batteries During Formation

Optimization of the formation step in lithium-ion battery manufacturing is challenging due to limited physical understanding of solid electrolyte interphase formation and the long testing time (~100 days) for cells to reach the end of life. We propose a systematic feature design framework that requires minimal domain knowledge for accurate cycle life prediction during formation. Two simple Q(V) features designed from our framework, extracted from formation data without any additional diagnostic cycles, achieved a median of 9.20% error for cycle life prediction, outperforming thousands of autoML models using pre-defined features. We attribute the strong performance of our designed features to their physical origins - the voltage ranges identified by our framework capture the effects of formation temperature and microscopic particle resistance heterogeneity. By designing highly interpretable features, our approach can accelerate formation research, leveraging the interplay between data-driven feature design and mechanistic understanding.

Updated: 2024-10-09 21:58:54

标题: 锂离子电池形成过程中循环寿命预测的系统特征设计

摘要: 锂离子电池制造中形成步骤的优化挑战重重，因为固体电解质界面形成的物理理解有限，同时电池需要长达约100天的测试时间才能达到寿命终点。我们提出了一个系统的特征设计框架，可以在形成过程中准确预测循环寿命，而且只需要最少的领域知识。我们的框架设计了两个简单的从形成数据中提取的Q(V)特征，无需额外的诊断循环，循环寿命预测的中位误差为9.20%，优于使用预定义特征的数千个autoML模型。我们将我们设计的特征的强大性能归因于它们的物理起源 - 我们的框架所识别的电压范围捕捉了形成温度和微观粒子阻抗异质性的影响。通过设计高度可解释的特征，我们的方法可以加速形成研究，充分利用数据驱动的特征设计和机制理解之间的相互作用。

更新时间: 2024-10-09 21:58:54

领域: cs.LG,stat.AP,15-04,I.2.6

下载: http://arxiv.org/abs/2410.07458v1

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computation and storage issues-particularly for continuous action spaces. To address these challenges, we propose Artificial-Replay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. We show that Artificial-Replay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on (i) K-armed bandits and (ii) continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of Artificial-Replayin reducing computation and space complexity, including for base algorithms that do not satisfy IIData.

Updated: 2024-10-09 21:48:01

标题: 人工回放：一种在赌博算法中利用历史数据的元算法

摘要: 大多数实际部署的赌博算法存在于离线和在线设置之间的某个地方，其中一些历史数据可提前获得，而其他数据则动态在线收集。如何最好地将历史数据纳入“热启动”赌博算法是一个开放的问题：简单地使用所有历史样本初始化奖励估计可能会受到虚假数据和数据覆盖不平衡的影响，导致计算和存储问题-尤其是对于连续动作空间。为了解决这些挑战，我们提出了一种称为“人工回放”的元算法，用于将历史数据纳入任何任意的基础赌博算法。我们展示了人工回放相比于完整热启动方法仅使用历史数据的一部分，同时仍实现了基础算法的相同遗憾，前提是满足无关数据的独立性（IIData），这是我们引入的一种新颖且广泛适用的属性。我们通过对（i）K臂赌博和（ii）连续组合赌博的实验，以及使用真实偷猎数据对绿色安全领域进行建模，来补充这些理论结果。我们的结果显示了人工回放在减少计算和空间复杂度方面的实际益处，包括对于不满足IIData的基础算法。

更新时间: 2024-10-09 21:48:01

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2210.00025v3

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

A key challenge in interpretability is to decompose model activations into meaningful features. Sparse autoencoders (SAEs) have emerged as a promising tool for this task. However, a central problem in evaluating the quality of SAEs is the absence of ground truth features to serve as an evaluation gold standard. Current evaluation methods for SAEs are therefore confronted with a significant trade-off: SAEs can either leverage toy models or other proxies with predefined ground truth features; or they use extensive prior knowledge of realistic task circuits. The former limits the generalizability of the evaluation results, while the latter limits the range of models and tasks that can be used for evaluations. We introduce SAGE: Scalable Autoencoder Ground-truth Evaluation, a ground truth evaluation framework for SAEs that scales to large state-of-the-art SAEs and models. We demonstrate that our method can automatically identify task-specific activations and compute ground truth features at these points. Compared to previous methods we reduce the training overhead by introducing a novel reconstruction method that allows to apply residual stream SAEs to sublayer activations. This eliminates the need for SAEs trained on every task-specific activation location. Then we validate the scalability of our framework, by evaluating SAEs on novel tasks on Pythia70M, GPT-2 Small, and Gemma-2-2. Our framework therefore paves the way for generalizable, large-scale evaluations of SAEs in interpretability research.

Updated: 2024-10-09 21:42:39

标题: SAGE：用于大型稀疏自编码器的可扩展地面真实性评估

摘要: 在可解释性方面的一个关键挑战是将模型激活分解为有意义的特征。稀疏自动编码器（SAEs）已经成为这一任务的一个有前途的工具。然而，评估SAEs质量的一个核心问题是缺乏作为评估黄金标准的地面真实特征。因此，当前用于SAEs的评估方法面临着一个重要的权衡：SAEs可以利用玩具模型或其他具有预定义地面真实特征的替代物；或者它们使用现实任务电路的广泛先验知识。前者限制了评估结果的泛化能力，而后者限制了可以用于评估的模型和任务的范围。我们引入了SAGE：可扩展自动编码器地面真实评估，这是一个适用于大型最先进SAEs和模型的地面真实评估框架。我们证明了我们的方法可以自动识别任务特定的激活并在这些点计算地面真实特征。与以前的方法相比，我们通过引入一种新颖的重建方法来减少训练开销，该方法允许将残差流SAEs应用于次层激活。这消除了对在每个任务特定激活位置进行训练的SAEs的需求。然后，我们通过在Pythia70M、GPT-2 Small和Gemma-2-2上评估SAEs，验证了我们框架的可扩展性。因此，我们的框架为解释性研究中对SAEs进行泛化、大规模评估铺平了道路。

更新时间: 2024-10-09 21:42:39

领域: cs.LG

下载: http://arxiv.org/abs/2410.07456v1

A ripple in time: a discontinuity in American history

In this technical note we suggest a novel approach to discover temporal (related and unrelated to language dilation) and personality (authorship attribution) aspects in historical datasets. We exemplify our approach on the State of the Union addresses given by the past 42 US presidents: this dataset is known for its relatively small amount of data, and high variability of the size and style of texts. Nevertheless, we manage to achieve about 95\% accuracy on the authorship attribution task, and pin down the date of writing to a single presidential term.

Updated: 2024-10-09 21:40:46

标题: 时间中的涟漪：美国历史中的不连续性

摘要: 在这个技术说明中，我们提出了一种新颖的方法，用于发现历史数据集中的时间（与语言膨胀相关和不相关）和个性（作者归属）方面。我们以过去42位美国总统发表的国情咨文为例来展示我们的方法：这个数据集以数据量相对较小和文本大小和风格的高可变性而闻名。然而，我们设法在作者归属任务上实现约95\%的准确率，并确定写作日期为一个总统任期。

更新时间: 2024-10-09 21:40:46

领域: cs.CL,cs.AI,cs.LG,cs.SI,I.2.7; I.5.4; H.3.1; H.3.3

下载: http://arxiv.org/abs/2312.01185v6

Representation-Enhanced Neural Knowledge Integration with Application to Large-Scale Medical Ontology Learning

A large-scale knowledge graph enhances reproducibility in biomedical data discovery by providing a standardized, integrated framework that ensures consistent interpretation across diverse datasets. It improves generalizability by connecting data from various sources, enabling broader applicability of findings across different populations and conditions. Generating reliable knowledge graph, leveraging multi-source information from existing literature, however, is challenging especially with a large number of node sizes and heterogeneous relations. In this paper, we propose a general theoretically guaranteed statistical framework, called RENKI, to enable simultaneous learning of multiple relation types. RENKI generalizes various network models widely used in statistics and computer science. The proposed framework incorporates representation learning output into initial entity embedding of a neural network that approximates the score function for the knowledge graph and continuously trains the model to fit observed facts. We prove nonasymptotic bounds for in-sample and out-of-sample weighted MSEs in relation to the pseudo-dimension of the knowledge graph function class. Additionally, we provide pseudo-dimensions for score functions based on multilayer neural networks with ReLU activation function, in the scenarios when the embedding parameters either fixed or trainable. Finally, we complement our theoretical results with numerical studies and apply the method to learn a comprehensive medical knowledge graph combining a pretrained language model representation with knowledge graph links observed in several medical ontologies. The experiments justify our theoretical findings and demonstrate the effect of weighting in the presence of heterogeneous relations and the benefit of incorporating representation learning in nonparametric models.

Updated: 2024-10-09 21:38:48

标题: 增强表示的神经知识集成及其在大规模医学本体学习中的应用

摘要: 一个大规模的知识图谱通过提供一个标准化、集成的框架，增强了生物医学数据发现的可重复性，确保在不同数据集之间保持一致的解释。它通过连接来自各种来源的数据，提高了泛化能力，使研究结果更广泛地适用于不同的人群和条件。然而，生成可靠的知识图谱，利用现有文献中的多源信息，尤其是在节点数量众多和关系异质的情况下，是具有挑战性的。在本文中，我们提出了一个通用的理论保证的统计框架，称为RENKI，以实现多关系类型的同时学习。RENKI泛化了统计学和计算机科学中广泛使用的各种网络模型。提出的框架将表示学习输出与神经网络的初始实体嵌入相结合，近似知识图谱的评分函数，并持续训练模型以适应观察到的事实。我们证明了关于知识图谱函数类的伪维数的样本内和样本外加权均方误差的非渐近界。此外，我们提供了基于具有ReLU激活函数的多层神经网络的评分函数的伪维数，在嵌入参数固定或可训练的情况下。最后，我们通过数值研究补充了我们的理论结果，并将该方法应用于学习一个综合的医学知识图谱，结合了预先训练的语言模型表示和在几个医学本体中观察到的知识图谱链接。实验证明了我们的理论发现，并展示了在异质关系存在的情况下加权的效果以及将表示学习纳入非参数模型的益处。

更新时间: 2024-10-09 21:38:48

领域: stat.ME,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2410.07454v1

Collective variables of neural networks: empirical time evolution and scaling laws

This work presents a novel means for understanding learning dynamics and scaling relations in neural networks. We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network and how these can be improved through architecture scaling. These results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies. In testing on a wide range of architectures, we highlight the universal nature of training dynamics and further discuss how it can be used to understand the mechanisms behind learning in neural networks. We identify two such dominant mechanisms present throughout machine learning training. The first, information compression, is seen through a reduction in the entropy of the NTK spectrum during training, and occurs predominantly in small neural networks. The second, coined structure formation, is seen through an increasing entropy and thus, the creation of structure in the neural network representations beyond the prior established by the network at initialization. Due to the ubiquity of the latter in deep neural network architectures and its flexibility in the creation of feature-rich representations, we argue that this form of evolution of the network's entropy be considered the onset of a deep learning regime.

Updated: 2024-10-09 21:37:14

标题: 神经网络的集体变量：经验性时间演化和标度律

摘要: 这项工作提出了一种新颖的方法，用于理解神经网络中学习动态和缩放关系。我们展示了在经验神经切向核谱上的特定测量，特别是熵和迹，可以揭示神经网络学习到的表示以及通过架构缩放如何改善这些表示。这些结果首先在测试案例上进行展示，然后在更复杂的网络上展示，包括变压器、自动编码器、图神经网络和强化学习研究。在对各种架构进行测试时，我们强调了训练动态的普遍性，并进一步讨论了如何利用它来理解神经网络中学习背后的机制。我们确定了两种在机器学习训练过程中存在的主要机制。第一种是信息压缩，通过训练过程中神经切向核谱熵的减少来观察，并主要出现在小型神经网络中。第二种被称为结构形成，通过熵的增加来观察，从而在神经网络表示中创建结构，超越了初始化时网络建立的结构。由于后者在深度神经网络架构中的普遍性以及在创建特征丰富表示方面的灵活性，我们认为这种网络熵的演化应被视为深度学习制度的开始。

更新时间: 2024-10-09 21:37:14

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2410.07451v1

TinyLidarNet: 2D LiDAR-based End-to-End Deep Learning Model for F1TENTH Autonomous Racing

Prior research has demonstrated the effectiveness of end-to-end deep learning for robotic navigation, where the control signals are directly derived from raw sensory data. However, the majority of existing end-to-end navigation solutions are predominantly camera-based. In this paper, we introduce TinyLidarNet, a lightweight 2D LiDAR-based end-to-end deep learning model for autonomous racing. An F1TENTH vehicle using TinyLidarNet won 3rd place in the 12th F1TENTH Autonomous Grand Prix competition, demonstrating its competitive performance. We systematically analyze its performance on untrained tracks and computing requirements for real-time processing. We find that TinyLidarNet's 1D Convolutional Neural Network (CNN) based architecture significantly outperforms widely used Multi-Layer Perceptron (MLP) based architecture. In addition, we show that it can be processed in real-time on low-end micro-controller units (MCUs).

Updated: 2024-10-09 21:28:33

标题: TinyLidarNet：基于2D LiDAR的F1TENTH自主赛车端到端深度学习模型

摘要: 先前的研究已经证明了端到端深度学习在机器人导航中的有效性，其中控制信号直接从原始感知数据中获取。然而，现有的大多数端到端导航解决方案主要基于摄像头。本文介绍了TinyLidarNet，这是一个基于轻量级2D激光雷达的端到端深度学习模型，用于自主赛车。使用TinyLidarNet的F1TENTH车辆在第12届F1TENTH自主大奖赛中获得第三名，展示了其竞争性能。我们系统地分析了它在未经训练的赛道上的表现和实时处理的计算需求。我们发现，TinyLidarNet基于1D卷积神经网络（CNN）的架构明显优于广泛使用的基于多层感知器（MLP）的架构。此外，我们展示了它可以在低端微控制器单元（MCUs）上实时处理。

更新时间: 2024-10-09 21:28:33

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.07447v1

KACQ-DCNN: Uncertainty-Aware Interpretable Kolmogorov-Arnold Classical-Quantum Dual-Channel Neural Network for Heart Disease Detection

Heart failure remains a major global health challenge, contributing significantly to the 17.8 million annual deaths from cardiovascular disease, highlighting the need for improved diagnostic tools. Current heart disease prediction models based on classical machine learning face limitations, including poor handling of high-dimensional, imbalanced data, limited performance on small datasets, and a lack of uncertainty quantification, while also being difficult for healthcare professionals to interpret. To address these issues, we introduce KACQ-DCNN, a novel classical-quantum hybrid dual-channel neural network that replaces traditional multilayer perceptrons and convolutional layers with Kolmogorov-Arnold Networks (KANs). This approach enhances function approximation with learnable univariate activation functions, reducing model complexity and improving generalization. The KACQ-DCNN 4-qubit 1-layered model significantly outperforms 37 benchmark models across multiple metrics, achieving an accuracy of 92.03%, a macro-average precision, recall, and F1 score of 92.00%, and an ROC-AUC score of 94.77%. Ablation studies demonstrate the synergistic benefits of combining classical and quantum components with KAN. Additionally, explainability techniques like LIME and SHAP provide feature-level insights, improving model transparency, while uncertainty quantification via conformal prediction ensures robust probability estimates. These results suggest that KACQ-DCNN offers a promising path toward more accurate, interpretable, and reliable heart disease predictions, paving the way for advancements in cardiovascular healthcare.

Updated: 2024-10-09 21:26:49

标题: KACQ-DCNN：心脏病检测的不确定性感知可解释Kolmogorov-Arnold经典-量子双通道神经网络

摘要: 心力衰竭仍然是全球重大的健康挑战，对心血管疾病造成的每年1780万死亡人数有着显著的贡献，突显了对改进诊断工具的需求。目前基于经典机器学习的心脏疾病预测模型存在一些限制，包括对高维度、不平衡数据的处理不佳，对小数据集性能有限，以及缺乏不确定性量化，同时也难以供医疗专业人员解释。为了解决这些问题，我们引入了KACQ-DCNN，这是一种新颖的经典量子混合双通道神经网络，它用Kolmogorov-Arnold Networks (KANs) 替代了传统的多层感知器和卷积层。这种方法通过可学习的单变量激活函数增强了函数逼近，降低了模型复杂性，提高了泛化能力。KACQ-DCNN 4量子比特1层模型在多个指标上显著优于37个基准模型，实现了92.03%的准确度，92.00%的宏均值精度、召回率和F1分数，以及94.77%的ROC-AUC分数。消融研究表明了将经典和量子组件与KAN相结合的协同效益。此外，像LIME和SHAP这样的可解释性技术提供了特征级别的洞察，提高了模型的透明度，而通过符合预测进行不确定性量化确保了稳健的概率估计。这些结果表明，KACQ-DCNN为更准确、可解释和可靠的心脏疾病预测提供了一个有前途的途径，为心血管医疗的进步铺平了道路。

更新时间: 2024-10-09 21:26:49

领域: cs.LG

下载: http://arxiv.org/abs/2410.07446v1

Harnessing Generative AI for Economic Insights

We use generative AI to extract managerial expectations about their economic outlook from over 120,000 corporate conference call transcripts. The overall measure, AI Economy Score, robustly predicts future economic indicators such as GDP growth, production, and employment, both in the short term and to 10 quarters. This predictive power is incremental to that of existing measures, including survey forecasts. Moreover, industry and firm-level measures provide valuable information about sector-specific and individual firm activities. Our findings suggest that managerial expectations carry unique insights about economic activities, with implications for both macroeconomic and microeconomic decision-making.

Updated: 2024-10-09 21:25:56

标题: 利用生成式人工智能获取经济洞见

摘要: 我们利用生成式人工智能从超过12万份企业电话会议记录中提取管理人员对经济前景的期望。总体指标AI经济得分能够强有力地预测未来经济指标，如GDP增长、生产和就业，无论是短期还是未来10个季度。这种预测能力是现有指标（包括调查预测）之外的增量。此外，行业和公司级别的指标提供了有关特定行业和个别公司活动的宝贵信息。我们的研究结果表明，管理人员的期望提供了有关经济活动的独特见解，对宏观经济和微观经济决策都具有重要意义。

更新时间: 2024-10-09 21:25:56

领域: q-fin.CP,cs.LG,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2410.03897v2

Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Generalizing vision-based reinforcement learning (RL) agents to novel environments remains a difficult and open challenge. Current trends are to collect large-scale datasets or use data augmentation techniques to prevent overfitting and improve downstream generalization. However, the computational and data collection costs increase exponentially with the number of task variations and can destabilize the already difficult task of training RL agents. In this work, we take inspiration from recent advances in computational neuroscience and propose a model, Associative Latent DisentAnglement (ALDA), that builds on standard off-policy RL towards zero-shot generalization. Specifically, we revisit the role of latent disentanglement in RL and show how combining it with a model of associative memory achieves zero-shot generalization on difficult task variations without relying on data augmentation. Finally, we formally show that data augmentation techniques are a form of weak disentanglement and discuss the implications of this insight.

Updated: 2024-10-09 21:14:09

标题: 无数据增强的基于视觉的RL的零样本泛化

摘要: 将基于视觉的强化学习（RL）代理推广到新环境仍然是一个困难且尚未解决的挑战。当前的趋势是收集大规模数据集或使用数据增强技术来防止过拟合并提高下游泛化能力。然而，随着任务变化数量的增加，计算和数据收集成本呈指数级增长，可能会使训练RL代理变得更加困难。在这项工作中，我们受到计算神经科学最新进展的启发，提出了一种模型，即关联潜在分离（ALDA），该模型基于标准的离线策略RL朝向零样本泛化。具体而言，我们重新审视了RL中潜在分离的作用，并展示了如何将其与关联记忆模型结合，实现在困难的任务变化上实现零样本泛化，而无需依赖数据增强。最后，我们正式证明了数据增强技术是一种弱分离的形式，并讨论了这一洞察的影响。

更新时间: 2024-10-09 21:14:09

领域: cs.LG,cs.AI,cs.CV,cs.RO

下载: http://arxiv.org/abs/2410.07441v1

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection.

Updated: 2024-10-09 21:08:28

标题: 朝着强大的真实世界音频深度造假检测：填补解释能力差距

摘要: 人工智能操纵或生成的音频深度伪造迅速传播，对媒体诚信和选举安全构成严重挑战。当前基于人工智能的检测解决方案缺乏可解释性，在现实世界中表现不佳。本文介绍了一种新颖的可解释性方法，用于最先进的基于转换器的音频深度伪造检测器，并开源了一个新颖的真实世界通用性基准。通过缩小基于转换器的音频深度伪造检测器与传统方法之间的可解释性差距，我们的结果不仅在人类专家中建立了信任，还为释放公民情报的潜力铺平了道路，以克服音频深度伪造检测中的可扩展性问题。

更新时间: 2024-10-09 21:08:28

领域: cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.07436v1

Can Transformers Reason Logically? A Study in SAT Solving

We theoretically and empirically study the logical reasoning capabilities of LLMs in the context of the Boolean satisfiability (SAT) problem. First, we construct a decoder-only Transformer that can solve SAT using backtracking and deduction via Chain-of-Thought (CoT). We prove its correctness by showing trace equivalence to the well-known DPLL SAT-solving algorithm. Second, to support the implementation of this abstract construction, we design a compiler $\texttt{PARAT}$ that takes as input a procedural specification and outputs a transformer model implementing this specification. Third, rather than $\textit{programming}$ a transformer to reason, we evaluate empirically whether it can be $\textit{trained}$ to do so by learning directly from algorithmic traces ("reasoning paths") of the DPLL algorithm.

Updated: 2024-10-09 21:01:52

标题: 变压器可以逻辑推理吗？一项SAT求解研究

摘要: 我们在布尔可满足性（SAT）问题的背景下，从理论和实证两方面研究了LLMs的逻辑推理能力。首先，我们构建了一个仅解码器的Transformer，可以通过回溯和推理的方式使用Chain-of-Thought（CoT）解决SAT问题。我们通过展示其与著名的DPLL SAT求解算法的跟踪等价性来证明其正确性。其次，为了支持这一抽象构建的实现，我们设计了一个编译器PARAT，其以过程规范为输入，并输出一个实现此规范的Transformer模型。第三，与其将Transformer进行编程推理，我们通过实证评估来确定它能否通过直接从DPLL算法的算法迹（“推理路径”）中学习来进行训练以进行推理。

更新时间: 2024-10-09 21:01:52

领域: cs.LG,cs.AI,cs.LO

下载: http://arxiv.org/abs/2410.07432v1

EventFlow: Forecasting Continuous-Time Event Data with Flow Matching

Continuous-time event sequences, in which events occur at irregular intervals, are ubiquitous across a wide range of industrial and scientific domains. The contemporary modeling paradigm is to treat such data as realizations of a temporal point process, and in machine learning it is common to model temporal point processes in an autoregressive fashion using a neural network. While autoregressive models are successful in predicting the time of a single subsequent event, their performance can be unsatisfactory in forecasting longer horizons due to cascading errors. We propose EventFlow, a non-autoregressive generative model for temporal point processes. Our model builds on the flow matching framework in order to directly learn joint distributions over event times, side-stepping the autoregressive process. EventFlow is likelihood-free, easy to implement and sample from, and either matches or surpasses the performance of state-of-the-art models in both unconditional and conditional generation tasks on a set of standard benchmarks

Updated: 2024-10-09 20:57:00

标题: EventFlow：使用流匹配预测连续时间事件数据

摘要: 连续时间事件序列，在这种序列中事件发生在不规则的间隔内，广泛存在于各种工业和科学领域。当代建模范式是将这些数据视为时间点过程的实现，在机器学习中，通常使用神经网络以自回归方式对时间点过程进行建模。虽然自回归模型在预测单个随后事件的时间方面很成功，但由于级联错误，它们在预测更长时间跨度时的性能可能令人不满意。我们提出了EventFlow，这是一个非自回归的生成模型，用于时间点过程。我们的模型建立在流匹配框架之上，以直接学习事件时间的联合分布，避开了自回归过程。EventFlow是无似然函数的，易于实现和抽样，而且在一组标准基准测试中，在无条件和有条件生成任务中，它要么与或超过现有模型的性能。

更新时间: 2024-10-09 20:57:00

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.07430v1

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language models (LLMs) inference. The performance gains are driven by three key aspects: (1) leveraging a recurrent neural network (RNN) as the draft model conditioning on LLM's hidden states, (2) applying a dynamic tree attention algorithm over beam search results to eliminate duplicated prefixes in candidate sequences, and (3) training through knowledge distillation from the LLM. ReDrafter accelerates Vicuna inference in MT-Bench by up to 3.5x with a PyTorch implementation on Nvidia H100 GPUs. To demonstrate its practicality in production environments, we integrate ReDrafter into TensorRT-LLM, reaching up to 2.5x speedup on H100 GPUs. We also validated its effectiveness for on-device applications by implementing the approach in MLX and benchmarking performance on Metal GPUs in Apple Silicon chips, achieving up to 2.3x speedup.

Updated: 2024-10-09 20:54:02

标题: 大语言模型中快速推理解码的循环草稿者

摘要: 我们提出了一种先进的推测解码方法Recurrent Drafter（ReDrafter），它在大型语言模型（LLMs）推断中实现了最新的加速。性能提升主要由三个关键因素驱动：（1）利用循环神经网络（RNN）作为草稿模型，以LLM的隐藏状态为条件，（2）在波束搜索结果上应用动态树注意力算法，以消除候选序列中的重复前缀，（3）通过知识蒸馏从LLM中进行训练。ReDrafter 在 Nvidia H100 GPU 上的 PyTorch 实现中，将 Vicuna 推断在 MT-Bench 中加速了最多 3.5 倍。为了展示其在生产环境中的实用性，我们将 ReDrafter 集成到 TensorRT-LLM 中，在 H100 GPU 上实现了最多 2.5 倍的加速。我们还通过在 Apple Silicon 芯片上的 Metal GPU 上实施该方法，并在 MLX 中进行性能基准测试，实现了最多 2.3 倍的加速，验证了其在设备应用中的有效性。

更新时间: 2024-10-09 20:54:02

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2403.09919v4

Language Models as Hierarchy Encoders

Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincar\'e ball with a curvature that adapts to the embedding dimension, followed by training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.

Updated: 2024-10-09 20:51:58

标题: 语言模型作为层次编码器

摘要: 解释隐藏在语言中的层次结构是当前语言模型（LMs）的一个关键限制。虽然先前的研究已经隐式地利用这些层次来增强LMs，但对于它们的显式编码方法尚未被探索。为了解决这个问题，我们引入了一种新颖的方法，将基于变压器编码器的LMs重新训练为层次变压器编码器（HiTs），利用双曲空间的广阔特性。我们的方法将预训练LMs的输出嵌入空间置于一个适应嵌入维度的Poincaré球内，然后在双曲聚类和离心损失上进行训练。这些损失旨在有效地将相关实体（输入为文本）聚类并按层次组织它们。我们评估了HiTs与预训练LMs、标准微调LMs以及几种双曲嵌入基线之间的比较，重点关注它们在模拟传递推理、预测包含关系和在层次结构之间转移知识方面的能力。结果表明，HiTs在这些任务中始终优于所有基线，突出了我们重新训练的层次编码器的有效性和可转移性。

更新时间: 2024-10-09 20:51:58

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.11374v3

Gymnasium: A Standard Interface for Reinforcement Learning Environments

Reinforcement Learning (RL) is a continuously growing field that has the potential to revolutionize many areas of artificial intelligence. However, despite its promise, RL research is often hindered by the lack of standardization in environment and algorithm implementations. This makes it difficult for researchers to compare and build upon each other's work, slowing down progress in the field. Gymnasium is an open-source library that provides a standard API for RL environments, aiming to tackle this issue. Gymnasium's main feature is a set of abstractions that allow for wide interoperability between environments and training algorithms, making it easier for researchers to develop and test RL algorithms. In addition, Gymnasium provides a collection of easy-to-use environments, tools for easily customizing environments, and tools to ensure the reproducibility and robustness of RL research. Through this unified framework, Gymnasium significantly streamlines the process of developing and testing RL algorithms, enabling researchers to focus more on innovation and less on implementation details. By providing a standardized platform for RL research, Gymnasium helps to drive forward the field of reinforcement learning and unlock its full potential. Gymnasium is available online at https://github.com/Farama-Foundation/Gymnasium

Updated: 2024-10-09 20:48:15

标题: 体育馆：强化学习环境的标准界面

摘要: 强化学习（RL）是一个不断发展的领域，有潜力彻底改变人工智能的许多领域。然而，尽管有这样的前景，RL研究常常受到环境和算法实现的标准化不足的阻碍。这使得研究人员难以比较和借鉴彼此的工作，从而减缓了该领域的进展。Gymnasium是一个开源库，提供了一个标准的RL环境API，旨在解决这一问题。 Gymnasium的主要特点是一组抽象，允许环境和训练算法之间的广泛互操作性，使研究人员更容易开发和测试RL算法。此外，Gymnasium提供了一系列易于使用的环境，用于轻松定制环境的工具，以及确保RL研究的可重现性和稳健性的工具。通过这一统一框架，Gymnasium显著简化了开发和测试RL算法的过程，使研究人员更专注于创新而不是实现细节。通过为RL研究提供一个标准化平台，Gymnasium有助于推动强化学习领域的发展，释放其全部潜力。Gymnasium可以在线获取，网址为https://github.com/Farama-Foundation/Gymnasium。

更新时间: 2024-10-09 20:48:15

领域: cs.LG,cs.DL

下载: http://arxiv.org/abs/2407.17032v2

A Framework for SLO, Carbon, and Wastewater-Aware Sustainable FaaS Cloud Platform Management

Function-as-a-Service (FaaS) is a growing cloud computing paradigm that is expected to reduce the user cost of service over traditional serverful approaches. However, the environmental impact of FaaS has not received much attention. We investigate FaaS scheduling and scaling from a sustainability perspective in this work. We find that the service-level objectives (SLOs) of FaaS and carbon emissions conflict with each other. We also find that SLO-focused FaaS scheduling can exacerbate water use in a datacenter. We propose a novel sustainability-focused FaaS scheduling and scaling framework to co-optimize SLO performance, carbon emissions, and wastewater generation.

Updated: 2024-10-09 20:47:52

标题: 一个SLO、碳排放和废水感知可持续的FaaS云平台管理框架

摘要: Function-as-a-Service（FaaS）是一种不断增长的云计算范式，预计将降低用户服务成本，相较于传统的基于服务器的方法。然而，FaaS的环境影响并没有受到太多关注。在这项工作中，我们从可持续性的角度研究了FaaS的调度和扩展。我们发现FaaS的服务水平目标（SLOs）与碳排放之间存在冲突。我们还发现，以SLO为重点的FaaS调度可能会加剧数据中心的用水量。我们提出了一个新颖的以可持续性为重点的FaaS调度和扩展框架，以共同优化SLO性能、碳排放和废水生成。

更新时间: 2024-10-09 20:47:52

领域: cs.DC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.11875v1

CAFEEN: A Cooperative Approach for Energy Efficient NoCs with Multi-Agent Reinforcement Learning

In emerging high-performance Network-on-Chip (NoC) architectures, efficient power management is crucial to minimize energy consumption. We propose a novel framework called CAFEEN that employs both heuristic-based fine-grained and machine learning-based coarse-grained power-gating for energy-efficient NoCs. CAFEEN uses a fine-grained method to activate only essential NoC buffers during lower network loads. It switches to a coarse-grained method at peak loads to minimize compounding wake-up overhead using multi-agent reinforcement learning. Results show that CAFEEN adaptively balances power-efficiency with performance, reducing total energy by 2.60x for single application workloads and 4.37x for multi-application workloads, compared to state-of-the-art NoC power-gating frameworks.

Updated: 2024-10-09 20:42:55

标题: CAFEEN：一种采用多智能体强化学习的合作式能效NoCs方法

摘要: 在新兴的高性能网络芯片（NoC）架构中，高效的功耗管理对于最小化能量消耗至关重要。我们提出了一个名为CAFEEN的新颖框架，该框架采用基于启发式的细粒度和基于机器学习的粗粒度功耗门控，用于能效NoC。CAFEEN使用细粒度方法在网络负载较低时仅激活必要的NoC缓冲区。在峰值负载时切换到粗粒度方法，利用多智能体强化学习来最小化累积唤醒开销。结果显示，与最先进的NoC功耗门控框架相比，CAFEEN在单应用工作负载和多应用工作负载下将总能量减少了2.60倍和4.37倍，同时自适应平衡功耗效率和性能。

更新时间: 2024-10-09 20:42:55

领域: cs.LG,cs.AI,cs.AR

下载: http://arxiv.org/abs/2410.07426v1

Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks

An ability to share data, even in aggregated form, is critical to advancing both conventional and data science. However, insofar as such datasets are comprised of individuals, their membership in these datasets is often viewed as sensitive, with membership inference attacks (MIAs) threatening to violate their privacy. We propose a Bayesian game model for privacy-preserving publishing of data-sharing mechanism outputs (for example, summary statistics for sharing genomic data). In this game, the defender minimizes a combination of expected utility and privacy loss, with the latter being maximized by a Bayes-rational attacker. We propose a GAN-style algorithm to approximate a Bayes-Nash equilibrium of this game, and introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk that aims to optimally balance the defender's privacy and utility in a way that is robust to the attacker's heterogeneous preferences with respect to true and false positives. We demonstrate the properties of composition and post-processing for BGP risk and establish conditions under which BNGP and pure differential privacy (PDP) are equivalent. We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data. Theoretical analysis and empirical results demonstrate that our Bayesian game-theoretic method outperforms state-of-the-art approaches for privacy-preserving sharing of summary statistics.

Updated: 2024-10-09 20:29:04

标题: Bayes-Nash生成隐私保护对抗成员推理攻击

摘要: 能够共享数据的能力，即使是以聚合形式，对于推动传统和数据科学至关重要。然而，由于这些数据集由个人组成，他们在这些数据集中的成员身份通常被视为敏感，成员推断攻击(MIAs)威胁侵犯他们的隐私。我们提出了一个贝叶斯博弈模型，用于隐私保护发布数据共享机制的输出（例如，用于共享基因组数据的摘要统计信息）。在这个博弈中，防御者最小化预期效用和隐私损失的组合，后者由贝叶斯理性攻击者最大化。我们提出了一种类似GAN风格的算法来近似这个博弈的贝叶斯纳什均衡，并引入了贝叶斯纳什生成隐私（BNGP）和贝叶斯生成隐私（BGP）风险的概念，旨在以一种对抗者对真假阳性具有异质偏好的方式，最佳平衡防御者的隐私和效用。我们展示了BGP风险的合成和后处理特性，并建立了BNGP和纯差分隐私（PDP）等价的条件。我们将我们的方法应用于共享摘要统计数据，其中MIAs甚至可以从聚合数据中重新识别个人。理论分析和实证结果表明，我们的贝叶斯博弈论方法优于最先进的方法，用于隐私保护共享摘要统计数据。

更新时间: 2024-10-09 20:29:04

领域: cs.CR

下载: http://arxiv.org/abs/2410.07414v1

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.

Updated: 2024-10-09 20:20:47

标题: 通过扭曲的顺序蒙特卡洛逐步推理数学问题

摘要: 大语言模型（LLMs）的多步推理能力一直是一个持续的挑战。最近，验证已经显示出潜力，通过评估生成的输出来改善解决方案的一致性。然而，目前的验证方法存在采样效率低下的问题，需要大量样本才能实现令人满意的性能。此外，训练一个有效的验证器通常取决于广泛的过程监督，这是昂贵的。在本文中，我们通过引入基于扭曲顺序蒙特卡罗（TSMC）的新颖验证方法来解决这些限制。TSMC逐步改进其采样努力，将探索重点放在有前景的候选者上，从而更有效地生成高质量的解决方案。我们通过在部分解决方案上估计未来预期奖励来将TSMC应用于LLMs。这种方法导致了一个更直接的训练目标，消除了逐步人类注释的需要。我们在多个数学基准测试中经验性地展示了我们的方法的优势，并验证了我们方法和现有验证方法的理论分析。

更新时间: 2024-10-09 20:20:47

领域: cs.LG

下载: http://arxiv.org/abs/2410.01920v3

Learning responsibility allocations for multi-agent interactions: A differentiable optimization approach with control barrier functions

From autonomous driving to package delivery, ensuring safe yet efficient multi-agent interaction is challenging as the interaction dynamics are influenced by hard-to-model factors such as social norms and contextual cues. Understanding these influences can aid in the design and evaluation of socially-aware autonomous agents whose behaviors are aligned with human values. In this work, we seek to codify factors governing safe multi-agent interactions via the lens of responsibility, i.e., an agent's willingness to deviate from their desired control to accommodate safe interaction with others. Specifically, we propose a data-driven modeling approach based on control barrier functions and differentiable optimization that efficiently learns agents' responsibility allocation from data. We demonstrate on synthetic and real-world datasets that we can obtain an interpretable and quantitative understanding of how much agents adjust their behavior to ensure the safety of others given their current environment.

Updated: 2024-10-09 20:20:41

标题: 学习多智能体互动的责任分配：一种可微分优化方法与控制屏障函数

摘要: 从自动驾驶到包裹递送，确保安全而高效的多智能体互动是具有挑战性的，因为互动动态受到难以建模的因素的影响，如社会规范和情境提示。理解这些影响有助于设计和评估行为与人类价值观一致的具有社会意识的自主智能体。在这项工作中，我们通过责任的视角寻求对安全多智能体互动进行规范化的因素，即智能体愿意偏离其期望的控制以适应与他人的安全互动。具体来说，我们提出了一种基于控制屏障函数和可微优化的数据驱动建模方法，可以有效地从数据中学习智能体的责任分配。我们在合成和真实世界数据集上展示，我们可以获得关于智能体在当前环境下为确保他人安全而调整行为的可解释和定量理解。

更新时间: 2024-10-09 20:20:41

领域: eess.SY,cs.LG,cs.MA,cs.RO,cs.SY

下载: http://arxiv.org/abs/2410.07409v1

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

Reducing hallucination of Large Language Models (LLMs) is imperative for use in the sciences, where reliability and reproducibility are crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of hierarchical reasoning-and-acting (ReAct) agents that can dynamically and recursively interact with computational and experimental data on Materials Project (MP) and run atomistic simulations via high-throughput workflow interface. Without fine-tuning, LLaMP demonstrates strong tool usage ability to comprehend and integrate various modalities of materials science concepts, fetch relevant data stores on the fly, process higher-order data (such as crystal structure and elastic tensor), and streamline complex tasks in computational materials and chemistry. We propose a simple metric combining uncertainty and confidence estimates to evaluate the self-consistency of responses by LLaMP and vanilla LLMs. Our benchmark shows that LLaMP effectively mitigates the intrinsic bias in LLMs, counteracting the errors on bulk moduli, electronic bandgaps, and formation energies that seem to derive from mixed data sources. We also demonstrate LLaMP's capability to edit crystal structures and run annealing molecular dynamics simulations using pre-trained machine-learning force fields. The framework offers an intuitive and nearly hallucination-free approach to exploring and scaling materials informatics, and establishes a pathway for knowledge distillation and fine-tuning other language models. Code and live demo are available at https://github.com/chiang-yuan/llamp

Updated: 2024-10-09 20:13:51

标题: LLaMP：用于高保真度材料知识检索和提炼的大型语言模型的强大应用

摘要: 减少大型语言模型（LLMs）的幻觉对科学的使用至关重要，可靠性和可重复性至关重要。然而，LLMs固有地缺乏长期记忆，使得在特定领域文献和数据上对其进行微调成为一项非平凡的、临时的和不可避免的有偏见的任务。在这里，我们介绍LLaMP，这是一个多模式检索增强生成（RAG）框架，采用层次化推理和行动（ReAct）代理，可以动态地和递归地与材料项目（MP）上的计算和实验数据进行交互，并通过高通量工作流接口运行原子模拟。在不进行微调的情况下，LLaMP展示了强大的工具使用能力，能够理解和整合各种材料科学概念的多个模态，实时获取相关数据存储，处理更高级的数据（如晶体结构和弹性张量），并简化计算材料和化学中的复杂任务。我们提出了一个简单的度量标准，结合不确定性和置信度估计，以评估LLaMP和基础LLMs的回应的自一致性。我们的基准测试显示，LLaMP有效地减轻了LLMs中的固有偏见，抵消了似乎源自混合数据来源的体模量、电子带隙和形成能的误差。我们还展示了LLaMP编辑晶体结构并使用预训练的机器学习力场运行退火分子动力学模拟的能力。该框架提供了一种直观且几乎无幻觉的方法来探索和扩展材料信息学，并为知识蒸馏和微调其他语言模型建立了一条途径。代码和实时演示可在https://github.com/chiang-yuan/llamp 上找到。

更新时间: 2024-10-09 20:13:51

领域: cs.CL,cond-mat.mtrl-sci,cs.AI

下载: http://arxiv.org/abs/2401.17244v3

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Foundational models are able to generate text outputs given prompt instructions and text, audio, or image inputs. Recently these models have been combined to perform tasks on video, such as video summarization. Such video foundation models perform pre-training by aligning outputs from each modality-specific model into the same embedding space. Then the embeddings from each model are used within a language model, which is fine-tuned on a desired instruction set. Aligning each modality during pre-training is computationally expensive and prevents rapid testing of different base modality models. During fine-tuning, evaluation is carried out within in-domain videos where it is hard to understand the generalizability and data efficiency of these methods. To alleviate these issues we propose a plug-and-play video language model. It directly uses the texts generated from each input modality into the language model, avoiding pre-training alignment overhead. Instead of fine-tuning we leverage few-shot instruction adaptation strategies. We compare the performance versus the computational costs for our plug-and-play style method and baseline tuning methods. Finally, we explore the generalizability of each method during domain shift and present insights on what data is useful when training data is limited. Through this analysis, we present practical insights on how to leverage multi-modal foundational models for effective results given realistic compute and data limitations.

Updated: 2024-10-09 20:07:06

标题: 探索视频摘要的高效基础多模型模型

摘要: 基础模型能够根据提示指令和文本、音频或图像输入生成文本输出。最近，这些模型已被结合起来执行视频任务，例如视频摘要。这种视频基础模型通过将每个特定模态模型的输出对齐到相同的嵌入空间来进行预训练。然后，每个模型的嵌入被用于语言模型中，在所需的指令集上进行微调。在预训练期间对齐每个模态是计算昂贵的，阻止了对不同基础模态模型进行快速测试。在微调期间，评估是在领域内视频中进行的，很难理解这些方法的泛化能力和数据效率。为了缓解这些问题，我们提出了一种即插即用的视频语言模型。它直接使用从每个输入模态生成的文本进入语言模型，避免了预训练对齐开销。我们利用少量指令适应策略，而不是微调。我们比较了我们的即插即用风格方法和基准调整方法的性能与计算成本。最后，我们探讨了每种方法在领域转移期间的泛化能力，并提供了在训练数据有限时哪些数据是有用的见解。通过这种分析，我们提出了如何利用多模态基础模型以获得有效结果，考虑到现实计算和数据限制的实际见解。

更新时间: 2024-10-09 20:07:06

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.07405v1

Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation Models

Exploration remains a significant challenge in reinforcement learning, especially in environments where extrinsic rewards are sparse or non-existent. The recent rise of foundation models, such as CLIP, offers an opportunity to leverage pretrained, semantically rich embeddings that encapsulate broad and reusable knowledge. In this work we explore the potential of these foundation models not just to drive exploration, but also to analyze the critical role of the episodic novelty term in enhancing exploration effectiveness of the agent. We also investigate whether providing the intrinsic module with complete state information -- rather than just partial observations -- can improve exploration, despite the difficulties in handling small variations within large state spaces. Our experiments in the MiniGrid domain reveal that intrinsic modules can effectively utilize full state information, significantly increasing sample efficiency while learning an optimal policy. Moreover, we show that the embeddings provided by foundation models are sometimes even better than those constructed by the agent during training, further accelerating the learning process, especially when coupled with the episodic novelty term to enhance exploration.

Updated: 2024-10-09 20:05:45

标题: 使用预训练基础模型培养强化学习中的内在动机

摘要: 探索仍然是强化学习中的一个重大挑战，特别是在外部奖励稀缺或不存在的环境中。最近兴起的基础模型，如CLIP，为利用预训练的语义丰富的嵌入提供了机会，这些嵌入封装了广泛且可重复使用的知识。在这项工作中，我们探讨了这些基础模型的潜力，不仅可以推动探索，还可以分析情景新颖性项在增强代理探索效果中的关键作用。我们还调查了是否提供完整状态信息给内在模块--而不仅仅是部分观察--可以改善探索，尽管在处理大状态空间中的小变化时存在困难。我们在MiniGrid领域的实验表明，内在模块可以有效地利用完整的状态信息，显著提高样本效率，同时学习最优策略。此外，我们展示了基础模型提供的嵌入有时甚至比训练期间代理构建的嵌入更好，进一步加速学习过程，特别是当与情景新颖性项结合以增强探索时。

更新时间: 2024-10-09 20:05:45

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07404v1

Aligning AI-driven discovery with human intuition

As data-driven modeling of physical dynamical systems becomes more prevalent, a new challenge is emerging: making these models more compatible and aligned with existing human knowledge. AI-driven scientific modeling processes typically begin with identifying hidden state variables, then deriving governing equations, followed by predicting and analyzing future behaviors. The critical initial step of identification of an appropriate set of state variables remains challenging for two reasons. First, finding a compact set of meaningfully predictive variables is mathematically difficult and under-defined. A second reason is that variables found often lack physical significance, and are therefore difficult for human scientists to interpret. We propose a new general principle for distilling representations that are naturally more aligned with human intuition, without relying on prior physical knowledge. We demonstrate our approach on a number of experimental and simulated system where the variables generated by the AI closely resemble those chosen independently by human scientists. We suggest that this principle can help make human-AI collaboration more fruitful, as well as shed light on how humans make scientific modeling choices.

Updated: 2024-10-09 19:52:59

标题: 将AI驱动的发现与人类直觉对齐

摘要: 随着基于数据驱动的物理动态系统建模变得更加普遍，一个新的挑战正在出现：使这些模型更加兼容和与现有人类知识相一致。AI驱动的科学建模过程通常从识别隐藏的状态变量开始，然后推导控制方程，接着预测和分析未来行为。识别适当状态变量集合的关键初始步骤仍然具有挑战性，原因有两个。首先，找到具有意义的预测变量的紧凑集合在数学上是困难和不明确的。第二个原因是找到的变量通常缺乏物理意义，因此对于人类科学家来说难以解释。我们提出了一个新的通用原则，用于提炼更符合人类直觉的表达，而不依赖先前的物理知识。我们在多个实验和模拟系统上演示了我们的方法，其中由AI生成的变量与人类科学家独立选择的变量非常相似。我们认为这一原则可以帮助使人工智能与人类合作更加有成效，并揭示人类如何做出科学建模选择。

更新时间: 2024-10-09 19:52:59

领域: cs.LG

下载: http://arxiv.org/abs/2410.07397v1

TextLap: Customizing Language Models for Text-to-Layout Planning

Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the incredible ability of Large language models (LLMs) in both natural language understanding and generation, we believe that we could customize an LLM to help people create compelling graphical layouts starting with only text instructions from the user. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs as a graphic designer. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, for image generation and graphical design benchmarks.

Updated: 2024-10-09 19:51:38

标题: TextLap：为文本布局规划定制语言模型

摘要: 自动生成图形布局对于许多现实世界的应用至关重要，包括设计海报、传单、广告和图形用户界面。鉴于大型语言模型（LLMs）在自然语言理解和生成方面的令人难以置信的能力，我们相信我们可以定制一个LLM来帮助人们从用户的纯文本指令开始创建引人入胜的图形布局。我们称这种方法为TextLap（基于文本的布局规划）。它使用一个经过策划的基于指令的布局规划数据集（InsLap）来定制LLM作为图形设计师。我们展示了TextLap的有效性，并表明它在图像生成和图形设计基准测试中优于包括基于GPT-4的方法在内的强基线。

更新时间: 2024-10-09 19:51:38

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.12844v1

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions. Unlike offline alignment with a fixed dataset, online feedback collection from humans or AI on model generations typically leads to more capable reward models and better-aligned LLMs through an iterative process. However, achieving a globally accurate reward model requires systematic exploration to generate diverse responses that span the vast space of natural language. Random sampling from standard reward-maximizing LLMs alone is insufficient to fulfill this requirement. To address this issue, we propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions. By solving the inner-level problem with the reparameterized reward function, the resulting algorithm, named \textit{Self-Exploring Language Models} (SELM), eliminates the need for a separate RM and iteratively updates the LLM with a straightforward objective. Compared to \textit{Direct Preference Optimization} (DPO), the SELM objective reduces indiscriminate favor of unseen extrapolations and enhances exploration efficiency. Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, SELM significantly boosts the performance on instruction-following benchmarks such as MT-Bench and AlpacaEval 2.0, as well as various standard academic benchmarks in different settings. Our code and models are available at https://github.com/shenao-zhang/SELM.

Updated: 2024-10-09 19:49:05

标题: 自我探索语言模型：在线对齐的主动偏好引导

摘要: 偏好优化，特别是通过从人类反馈中进行强化学习（RLHF），已经取得了显著的成功，使大型语言模型（LLMs）能够符合人类意图。与固定数据集的离线对齐不同，从人类或AI收集模型生成的在线反馈通常通过迭代过程导致更能力强的奖励模型和更好对齐的LLMs。然而，实现全局准确的奖励模型需要系统地探索以生成跨越自然语言广阔空间的多样性响应。仅从标准奖励最大化的LLMs中随机采样是不足以满足这一要求的。为了解决这个问题，我们提出了一个双层目标，乐观地偏向于潜在高奖励响应，以积极探索超出分布区域。通过使用重新参数化奖励函数解决内层问题，得到的算法，称为“自我探索语言模型”（SELM），消除了对单独的RM的需要，并通过简单的目标迭代更新LLM。与“直接偏好优化”（DPO）相比，SELM的目标减少了对未见外推的不加区分的偏爱，并增强了探索效率。我们的实验结果表明，当在Zephyr-7B-SFT和Llama-3-8B-Instruct模型上进行微调时，SELM显著提升了诸如MT-Bench和AlpacaEval 2.0等遵循指令的基准测试性能，以及不同设置中的各种标准学术基准。我们的代码和模型可在https://github.com/shenao-zhang/SELM 上找到。

更新时间: 2024-10-09 19:49:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19332v2

Exploring Prompt Engineering: A Systematic Review with SWOT Analysis

In this paper, we conduct a comprehensive SWOT analysis of prompt engineering techniques within the realm of Large Language Models (LLMs). Emphasizing linguistic principles, we examine various techniques to identify their strengths, weaknesses, opportunities, and threats. Our findings provide insights into enhancing AI interactions and improving language model comprehension of human prompts. The analysis covers techniques including template-based approaches and fine-tuning, addressing the problems and challenges associated with each. The conclusion offers future research directions aimed at advancing the effectiveness of prompt engineering in optimizing human-machine communication.

Updated: 2024-10-09 19:48:35

标题: 探索提示工程：一项带有SWOT分析的系统性综述

摘要: 在这篇论文中，我们对大型语言模型（LLMs）领域内的即时工程技术进行了全面的SWOT分析。强调语言学原理，我们审查了各种技术，以识别它们的优势、劣势、机会和威胁。我们的研究结果提供了改进人工智能交互和提高语言模型对人类提示理解的见解。分析涵盖了包括基于模板的方法和微调在内的技术，解决了与每种方法相关的问题和挑战。结论提出了未来研究方向，旨在推进即时工程在优化人机沟通中的有效性。

更新时间: 2024-10-09 19:48:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.12843v1

LLM Embeddings Improve Test-time Adaptation to Tabular $Y|X$-Shifts

For tabular datasets, the change in the relationship between the label and covariates ($Y|X$-shifts) is common due to missing variables (a.k.a. confounders). Since it is impossible to generalize to a completely new and unknown domain, we study models that are easy to adapt to the target domain even with few labeled examples. We focus on building more informative representations of tabular data that can mitigate $Y|X$-shifts, and propose to leverage the prior world knowledge in LLMs by serializing (write down) the tabular data to encode it. We find LLM embeddings alone provide inconsistent improvements in robustness, but models trained on them can be well adapted/finetuned to the target domain even using 32 labeled observations. Our finding is based on a comprehensive and systematic study consisting of 7650 source-target pairs and benchmark against 261,000 model configurations trained by 22 algorithms. Our observation holds when ablating the size of accessible target data and different adaptation strategies. The code is available at https://github.com/namkoong-lab/LLM-Tabular-Shifts.

Updated: 2024-10-09 19:46:30

标题: LLM嵌入提高对表格$Y|X$-转移的测试时间适应性

摘要: 对于表格数据集，由于缺失变量（也称为混杂因素），标签和协变量之间的关系变化（$Y|X$-shifts）是常见的。由于不可能泛化到完全新的和未知的领域，我们研究了一种模型，即使只有少量标记示例，也容易适应目标领域。我们专注于构建更具信息量的表格数据表示，可以缓解$Y|X$-shifts，并建议利用LLMs中的先验世界知识，通过串行化（写入）表格数据来对其进行编码。我们发现仅使用LLM嵌入提供的改进在稳健性方面并不一致，但在这些模型上训练的模型可以很好地适应/微调到目标领域，即使只使用32个标记观测。我们的发现基于一个包含7650个源-目标对的全面系统研究，并与22种算法训练的261,000个模型配置进行了基准测试。我们的观察结果在消除可访问目标数据的大小和不同的适应策略时也成立。代码可以在https://github.com/namkoong-lab/LLM-Tabular-Shifts找到。

更新时间: 2024-10-09 19:46:30

领域: cs.LG,cs.AI,math.OC,stat.ML

下载: http://arxiv.org/abs/2410.07395v1

On the Hölder Stability of Multiset and Graph Neural Networks

Extensive research efforts have been put into characterizing and constructing maximally separating multiset and graph neural networks. However, recent empirical evidence suggests the notion of separation itself doesn't capture several interesting phenomena. On the one hand, the quality of this separation may be very weak, to the extent that the embeddings of "separable" objects might even be considered identical when using fixed finite precision. On the other hand, architectures which aren't capable of separation in theory, somehow achieve separation when taking the network to be wide enough. In this work, we address both of these issues, by proposing a novel pair-wise separation quality analysis framework which is based on an adaptation of Lipschitz and \Holder{} stability to parametric functions. The proposed framework, which we name \emph{\Holder{} in expectation}, allows for separation quality analysis, without restricting the analysis to embeddings that can separate all the input space simultaneously. We prove that common sum-based models are lower-\Holder{} in expectation, with an exponent that decays rapidly with the network's depth . Our analysis leads to adversarial examples of graphs which can be separated by three 1-WL iterations, but cannot be separated in practice by standard maximally powerful Message Passing Neural Networks (MPNNs). To remedy this, we propose two novel MPNNs with improved separation quality, one of which is lower Lipschitz in expectation. We show these MPNNs can easily classify our adversarial examples, and compare favorably with standard MPNNs on standard graph learning tasks.

Updated: 2024-10-09 19:44:36

标题: 关于多集合和图神经网络的Hölder稳定性

摘要: 广泛的研究工作已经投入到表征和构建最大分离的多集和图神经网络中。然而，最近的经验证据表明，分离本身并不能捕捉几个有趣的现象。一方面，这种分离的质量可能非常弱，以至于在使用固定的有限精度时，“可分离”对象的嵌入甚至可能被认为是相同的。另一方面，在理论上不能实现分离的架构，在将网络扩展到足够宽的情况下，却可以实现分离。在这项工作中，我们通过提出一种基于参数函数的Lipschitz和Holder稳定性调整的新型成对分离质量分析框架，解决了这两个问题。我们提出的框架，命名为“期望中的Holder”，允许进行分离质量分析，而不限制分析到能够同时分离所有输入空间的嵌入。我们证明了常见的基于求和的模型在期望中是较低的Holder，指数随着网络深度的增加而迅速衰减。我们的分析导致图的对抗性示例，可以通过三次1-WL迭代进行分离，但在实践中无法通过标准的最大功率消息传递神经网络(MPNNs)进行分离。为了解决这个问题，我们提出了两种改进了分离质量的新型MPNNs，其中一个在期望中是较低的Lipschitz。我们展示了这些MPNNs可以轻松对我们的对抗性示例进行分类，并在标准图学习任务上与标准MPNNs进行了有利的比较。

更新时间: 2024-10-09 19:44:36

领域: cs.LG

下载: http://arxiv.org/abs/2406.06984v2

Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp

As training datasets become increasingly drawn from unstructured, uncontrolled environments such as the web, researchers and industry practitioners have increasingly relied upon data filtering techniques to "filter out the noise" of web-scraped data. While datasets have been widely shown to reflect the biases and values of their creators, in this paper we contribute to an emerging body of research that assesses the filters used to create these datasets. We show that image-text data filtering also has biases and is value-laden, encoding specific notions of what is counted as "high-quality" data. In our work, we audit a standard approach of image-text CLIP-filtering on the academic benchmark DataComp's CommonPool by analyzing discrepancies of filtering through various annotation techniques across multiple modalities of image, text, and website source. We find that data relating to several imputed demographic groups -- such as LGBTQ+ people, older women, and younger men -- are associated with higher rates of exclusion. Moreover, we demonstrate cases of exclusion amplification: not only are certain marginalized groups already underrepresented in the unfiltered data, but CLIP-filtering excludes data from these groups at higher rates. The data-filtering step in the machine learning pipeline can therefore exacerbate representation disparities already present in the data-gathering step, especially when existing filters are designed to optimize a specifically-chosen downstream performance metric like zero-shot image classification accuracy. Finally, we show that the NSFW filter fails to remove sexually-explicit content from CommonPool, and that CLIP-filtering includes several categories of copyrighted content at high rates. Our conclusions point to a need for fundamental changes in dataset creation and filtering practices.

Updated: 2024-10-09 19:34:13

标题: 谁在里面，谁在外面？DataComp中多模式CLIP过滤的案例研究

摘要: 随着训练数据集越来越多地来自于无结构、无控制的环境，如网络，研究人员和行业从业者越来越依赖数据过滤技术来“过滤掉”网络抓取的数据中的噪音。虽然已经广泛显示数据集反映了其创建者的偏见和价值观，但在本文中，我们为评估用于创建这些数据集的过滤器所用的方法贡献了一部新兴研究。我们表明图像文本数据过滤也存在偏见和价值观，对什么被认为是“高质量”数据进行了特定概念的编码。在我们的工作中，通过分析在学术基准DataComp的CommonPool上通过多种模态的图像、文本和网站来源进行的各种注释技术的过滤差异，我们审计了图像文本CLIP过滤的标准方法。我们发现与几个被假定的人口群体相关的数据，如LGBTQ+人群、年长妇女和年轻男性，与排除的比率较高相关。此外，我们展示了排除放大的情况：不仅某些边缘化群体在未经过滤的数据中已经少数，而且CLIP过滤以更高的比率排除这些群体的数据。机器学习流程中的数据过滤步骤因此可能加剧已经存在于数据收集步骤中的代表性差距，特别是当现有的过滤器被设计为优化特定选择的下游性能指标，如零样本图像分类准确性。最后，我们发现NSFW过滤器无法从CommonPool中删除色情内容，并且CLIP过滤以较高的比率包含了几类受版权保护的内容。我们的结论指出了数据集创建和过滤实践需要进行根本性变革的必要性。

更新时间: 2024-10-09 19:34:13

领域: cs.CY,cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.08209v2

The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks

There is increasing interest in tracking the capabilities of general intelligence foundation models. This study benchmarks leading large language models and vision language models against human performance on the Wechsler Adult Intelligence Scale (WAIS-IV), a comprehensive, population-normed assessment of underlying human cognition and intellectual abilities, with a focus on the domains of VerbalComprehension (VCI), Working Memory (WMI), and Perceptual Reasoning (PRI). Most models demonstrated exceptional capabilities in the storage, retrieval, and manipulation of tokens such as arbitrary sequences of letters and numbers, with performance on the Working Memory Index (WMI) greater or equal to the 99.5th percentile when compared to human population normative ability. Performance on the Verbal Comprehension Index (VCI) which measures retrieval of acquired information, and linguistic understanding about the meaning of words and their relationships to each other, also demonstrated consistent performance at or above the 98th percentile. Despite these broad strengths, we observed consistently poor performance on the Perceptual Reasoning Index (PRI; range 0.1-10th percentile) from multimodal models indicating profound inability to interpret and reason on visual information. Smaller and older model versions consistently performed worse, indicating that training data, parameter count and advances in tuning are resulting in significant advances in cognitive ability.

Updated: 2024-10-09 19:22:26

标题: 生成型人工智能的认知能力：与人类基准的比较分析

摘要: 随着对普通智能基础模型能力的追踪兴趣日益增长，本研究对领先的大型语言模型和视觉语言模型在韦氏成人智力量表（WAIS-IV）上与人类表现进行了基准测试，该量表是对人类认知和智力能力的基础进行全面评估，并侧重于语言理解（VCI）、工作记忆（WMI）和感知推理（PRI）领域。大多数模型在存储、检索和操作诸如任意序列的字母和数字等令牌方面表现出卓越能力，与人类人口规范能力相比，工作记忆指数（WMI）的表现大于或等于99.5百分位。衡量获取信息的检索和对单词含义及其彼此关系的语言理解的语言理解指数（VCI）的表现也表现出持续在98百分位以上。尽管具有广泛的优势，我们观察到多模态模型在感知推理指数（PRI；范围0.1-10百分位）上表现持续糟糕，表明它们对于解释和理解视觉信息存在深刻的无能。较小和较旧的模型版本表现一致较差，表明训练数据、参数数量和调整的进步正在导致认知能力的显著提升。

更新时间: 2024-10-09 19:22:26

领域: cs.AI

下载: http://arxiv.org/abs/2410.07391v1

Web Retrieval Agents for Evidence-Based Misinformation Detection

This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the macro F1 of misinformation detection by as much as 20 percent compared to LLMs without search. We also conduct extensive analyses on the sources our system leverages and their biases, decisions in the construction of the system like the search tool and the knowledge base, the type of evidence needed and its impact on the results, and other parts of the overall process. By combining strong performance with in-depth understanding, we hope to provide building blocks for future search-enabled misinformation mitigation systems.

Updated: 2024-10-09 19:13:41

标题: 网络检索代理用于基于证据的虚假信息检测

摘要: 这篇论文开发了一种基于代理的自动事实核查方法，用于检测虚假信息。我们证明，将一款强大的LLM代理与一个在线网络搜索代理结合使用，比单独使用每个工具时产生更好的结果。我们的方法在多个模型中表现稳健，优于其它替代方案，并且与没有搜索功能的LLMs相比，虚假信息检测的宏观F1值提高了多达20％。我们还对系统利用的信息源及其偏见、系统构建中的决策（如搜索工具和知识库）、需要的证据类型及其对结果的影响以及整个流程的其他部分进行了广泛分析。通过将出色的性能与深入的理解相结合，我们希望为未来的搜索启用虚假信息缓解系统提供基础。

更新时间: 2024-10-09 19:13:41

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2409.00009v2

Siamese networks for Poincaré embeddings and the reconstruction of evolutionary trees

We present a method for reconstructing evolutionary trees from high-dimensional data, with a specific application to bird song spectrograms. We address the challenge of inferring phylogenetic relationships from phenotypic traits, like vocalizations, without predefined acoustic properties. Our approach combines two main components: Poincar\'e embeddings for dimensionality reduction and distance computation, and the neighbor joining algorithm for tree reconstruction. Unlike previous work, we employ Siamese networks to learn embeddings from only leaf node samples of the latent tree. We demonstrate our method's effectiveness on both synthetic data and spectrograms from six species of finches.

Updated: 2024-10-09 19:10:08

标题: Siamese网络用于Poincaré嵌入和重建进化树

摘要: 我们提出了一种从高维数据重建进化树的方法，特定应用于鸟类歌声频谱图。我们解决了从表型特征（如声音）推断系统发育关系的挑战，而无需预定义的声学属性。我们的方法结合了两个主要组件：用于降维和距离计算的Poincar\'e嵌入，以及用于树重建的邻居连接算法。与先前的工作不同，我们采用Siamese网络从潜在树的只有叶节点样本中学习嵌入。我们展示了我们的方法在合成数据和六种雀类的频谱图上的有效性。

更新时间: 2024-10-09 19:10:08

领域: q-bio.PE,cs.LG

下载: http://arxiv.org/abs/2410.07387v1

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct advancements of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.

Updated: 2024-10-09 19:09:30

标题: 临床实验室：在现实世界中为多部门临床诊断协调代理商

摘要: LLM在各种自然语言处理应用中取得了显著的性能进展。然而，在医学领域，LLM仍然很难满足准确性和可靠性的严格要求，并且在临床应用中面临许多挑战。现有用于评估由LLM驱动的医疗代理的临床诊断评估基准存在严重限制。首先，大多数现有的医疗评估基准都面临数据泄漏或污染的风险。其次，现有基准通常忽视了现代医疗实践中多个部门和专业的特点。第三，现有评估方法仅限于多项选择题，与现实世界的诊断场景不符。最后，现有评估方法缺乏对端到端真实临床场景的全面评估。这些基准的限制反过来阻碍了LLM和医疗代理的进展。为了解决这些限制，我们引入了ClinicalLab，一个全面的临床诊断代理对齐套件。ClinicalLab包括ClinicalBench，一个端到端多部门临床诊断评估基准，用于评估医疗代理和LLM。ClinicalBench基于涵盖24个部门和150种疾病的真实病例。ClinicalLab还包括四个用于评估LLM在临床诊断任务中有效性的新指标（ClinicalMetrics）。我们评估了17个LLM，并发现它们在不同部门之间的性能差异很大。基于这些发现，在ClinicalLab中，我们提出了ClinicalAgent，一个与现实世界临床诊断实践对齐的端到端临床代理。我们系统地研究了ClinicalAgent变体在ClinicalBench上的性能和适用场景。我们的研究结果表明，在设计医疗代理时与现代医疗实践对齐的重要性。

更新时间: 2024-10-09 19:09:30

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.13890v2

QuAILoRA: Quantization-Aware Initialization for LoRA

QLoRA reduces the memory-cost of fine-tuning a large language model (LLM) with LoRA by quantizing the base LLM. However, quantization introduces quantization errors that negatively impact model performance after fine-tuning. In this paper we introduce QuAILoRA, a quantization-aware initialization for LoRA that mitigates this negative impact by decreasing quantization errors at initialization. Our method spends a small amount of computational overhead to compute this quantization-aware initialization, without increasing the memory-cost of fine-tuning. We evaluate our method on several causal language modeling and downstream evaluation tasks using several different model sizes and families. We observe that almost all LLMs fined-tuned with QuAILoRA achieve better validation perplexity. When evaluated on downstream tasks, we find that QuAILoRA yields improvements proportional to the negative effect of quantization error. On average, applying QuAILoRA to 4-bit QLoRA models yields 75% of the validation perplexity decrease and 86% of the downstream task accuracy increase as doubling the quantization precision to 8-bit, without increasing GPU memory utilization during fine-tuning.

Updated: 2024-10-09 19:06:37

标题: QuAILoRA：LoRA的量化感知初始化

摘要: QLoRA通过对基础LLM进行量化来降低微调大型语言模型(LLM)的内存成本。然而，量化会引入量化误差，对微调后的模型性能产生负面影响。本文介绍了QuAILoRA，这是一种对LoRA进行量化感知初始化的方法，通过在初始化阶段减少量化误差来减轻这种负面影响。我们的方法在计算量上花费了少量计算开销来计算这种量化感知初始化，而不会增加微调的内存成本。我们在几个因果语言建模和下游评估任务上评估了我们的方法，使用了几种不同的模型大小和系列。我们观察到，几乎所有使用QuAILoRA进行微调的LLM都获得了更好的验证困惑度。在下游任务上进行评估时，我们发现QuAILoRA产生的改进与量化误差的负面影响成正比。平均而言，将QuAILoRA应用于4位QLoRA模型，可以获得75%的验证困惑度下降和86%的下游任务准确率增加，而不会增加GPU在微调期间的内存利用率。

更新时间: 2024-10-09 19:06:37

领域: cs.LG,cs.CL,68T50

下载: http://arxiv.org/abs/2410.14713v1

SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers

The performance of Transformer models has been enhanced by increasing the number of parameters and the length of the processed text. Consequently, fine-tuning the entire model becomes a memory-intensive process. High-performance methods for parameter-efficient fine-tuning (PEFT) typically work with Attention blocks and often overlook MLP blocks, which contain about half of the model parameters. We propose a new selective PEFT method, namely SparseGrad, that performs well on MLP blocks. We transfer layer gradients to a space where only about 1\% of the layer's elements remain significant. By converting gradients into a sparse structure, we reduce the number of updated parameters. We apply SparseGrad to fine-tune BERT and RoBERTa for the NLU task and LLaMa-2 for the Question-Answering task. In these experiments, with identical memory requirements, our method outperforms LoRA and MeProp, robust popular state-of-the-art PEFT approaches.

Updated: 2024-10-09 19:03:52

标题: SparseGrad：一种用于MLP层有效微调的选择性方法

摘要: Transformer 模型的性能已经通过增加参数数量和处理文本长度来提高。因此，对整个模型进行微调变成了一个占用内存的过程。高性能参数高效微调（PEFT）方法通常使用注意力块，并经常忽略包含大约一半模型参数的 MLP 块。我们提出了一种新的选择性 PEFT 方法，即 SparseGrad，在 MLP 块上表现良好。我们将层梯度转移到一个空间，其中仅保留大约 1% 的层元素是显著的。通过将梯度转换为稀疏结构，我们减少了更新参数的数量。我们将 SparseGrad 应用于微调 BERT 和 RoBERTa 用于 NLU 任务和 LLaMa-2 用于问答任务。在这些实验中，与相同的内存需求相比，我们的方法胜过 LoRA 和 MeProp，这是目前流行的最先进的 PEFT 方法。

更新时间: 2024-10-09 19:03:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.07383v1

Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders

The rapid growth of scientific literature makes it challenging for researchers to identify novel and impactful ideas, especially across disciplines. Modern artificial intelligence (AI) systems offer new approaches, potentially inspiring ideas not conceived by humans alone. But how compelling are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, which uses 58 million research papers and a large-language model to generate research ideas. We conduct a large-scale evaluation in which over 100 research group leaders - from natural sciences to humanities - ranked more than 4,400 personalized ideas based on their interest. This data allows us to predict research interest using (1) supervised neural networks trained on human evaluations, and (2) unsupervised zero-shot ranking with large-language models. Our results demonstrate how future systems can help generating compelling research ideas and foster unforeseen interdisciplinary collaborations.

Updated: 2024-10-09 18:58:13

标题: 使用知识图谱和LLMs进行有趣的科学思想生成：对100位研究小组领导者进行评估

摘要: 科学文献的快速增长使得研究人员很难识别新颖且有影响力的想法，特别是跨学科的。现代人工智能（AI）系统提供了新的方法，可能会激发人类无法想象的想法。但是这些由AI生成的想法有多么具有吸引力，我们如何提高它们的质量呢？在这里，我们介绍了SciMuse，它利用5800万篇研究论文和一个大型语言模型来生成研究想法。我们进行了大规模评估，在这个评估中，超过100位研究小组负责人 - 从自然科学到人文科学 - 根据他们的兴趣对4400多个个性化想法进行了排名。这些数据使我们能够使用（1）在人类评估上训练过的监督神经网络来预测研究兴趣，以及（2）使用大型语言模型进行无监督的零-shot排名。我们的结果展示了未来系统如何帮助生成引人注目的研究想法并促进意想不到的跨学科合作。

更新时间: 2024-10-09 18:58:13

领域: cs.AI,cs.CL,cs.DL,cs.LG

下载: http://arxiv.org/abs/2405.17044v2

Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge

Audio deepfake detection is crucial to combat the malicious use of AI-synthesized speech. Among many efforts undertaken by the community, the ASVspoof challenge has become one of the benchmarks to evaluate the generalizability and robustness of detection models. In this paper, we present Reality Defender's submission to the ASVspoof5 challenge, highlighting a novel pretraining strategy which significantly improves generalizability while maintaining low computational cost during training. Our system SLIM learns the style-linguistics dependency embeddings from various types of bonafide speech using self-supervised contrastive learning. The learned embeddings help to discriminate spoof from bonafide speech by focusing on the relationship between the style and linguistics aspects. We evaluated our system on ASVspoof5, ASV2019, and In-the-wild. Our submission achieved minDCF of 0.1499 and EER of 5.5% on ASVspoof5 Track 1, and EER of 7.4% and 10.8% on ASV2019 and In-the-wild respectively.

Updated: 2024-10-09 18:55:28

标题: 从真实中学习：Reality Defender对ASVspoof5挑战的提交

摘要: 音频深度伪造检测对于打击恶意使用AI合成的语音至关重要。在社区采取的诸多努力中，ASVspoof挑战已成为评估检测模型的泛化能力和鲁棒性的基准之一。本文介绍了Reality Defender对ASVspoof5挑战的提交，重点介绍了一种新颖的预训练策略，该策略在训练期间显著提高了泛化能力，同时保持低计算成本。我们的系统SLIM通过自监督对比学习从各种真实语音中学习风格-语言学依赖性嵌入。学习的嵌入有助于通过专注于风格和语言学方面的关系来区分伪造和真实语音。我们在ASVspoof5、ASV2019和In-the-wild上评估了我们的系统。我们的提交在ASVspoof5 Track 1上实现了0.1499的minDCF和5.5%的EER，以及在ASV2019和In-the-wild上分别实现了7.4%和10.8%的EER。

更新时间: 2024-10-09 18:55:28

领域: eess.AS,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.07379v1

Hyperbolic Machine Learning Moment Closures for the BGK Equations

We introduce a hyperbolic closure for the Grad moment expansion of the Bhatnagar-Gross-Krook's (BGK) kinetic model using a neural network (NN) trained on BGK's moment data. This closure is motivated by the exact closure for the free streaming limit that we derived in our paper on closures in transport \cite{Huang2022-RTE1}. The exact closure relates the gradient of the highest moment to the gradient of four lower moments. As with our past work, the model presented here learns the gradient of the highest moment in terms of the coefficients of gradients for all lower ones. By necessity, this means that the resulting hyperbolic system is not conservative in the highest moment. For stability, the output layers of the NN are designed to enforce hyperbolicity and Galilean invariance. This ensures the model can be run outside of the training window of the NN. Unlike our previous work on radiation transport that dealt with linear models, the BGK model's nonlinearity demanded advanced training tools. These comprised an optimal learning rate discovery, one cycle training, batch normalization in each neural layer, and the use of the \texttt{AdamW} optimizer. To address the non-conservative structure of the hyperbolic model, we adopt the FORCE numerical method to achieve robust solutions. This results in a comprehensive computing model combining learned closures with methods for solving hyperbolic models. The proposed model can capture accurate moment solutions across a broad spectrum of Knudsen numbers. Our paper details the multi-scale model construction and is run on a range of test problems.

Updated: 2024-10-09 18:41:33

标题: 超越机器学习矩封闭在BGK方程中的应用

摘要: 我们引入了一个双曲线闭合方法，用于对Bhatnagar-Gross-Krook（BGK）动力学模型的Grad矩展开进行闭合，该闭合方法使用神经网络（NN）对BGK的矩数据进行训练。这种闭合方法受到我们在输运闭合领域的研究中得出的对自由传输极限的精确闭合方法的启发。精确闭合方法将最高阶矩的梯度与四个较低阶矩的梯度联系起来。与我们过去的工作一样，这里提出的模型学习了最高阶矩的梯度，以较低阶矩的梯度系数表示。必然地，这意味着结果双曲线系统在最高阶矩方面不是守恒的。为了保持稳定性，NN的输出层被设计为强制执行双曲线性和伽利略不变性。这确保了模型可以在NN的训练窗口之外运行。与我们之前处理线性模型的辐射传输工作不同，BGK模型的非线性要求使用先进的训练工具。这些工具包括最佳学习率发现、单周期训练、每个神经层的批处理规范化，以及使用AdamW优化器。为了解决双曲线模型的非守恒结构，我们采用FORCE数值方法来实现稳健解决方案。这导致了一个综合的计算模型，结合了学习的闭合方法和解决双曲线模型的方法。提出的模型可以在广泛的Knudsen数范围内捕获准确的矩解。我们的论文详细介绍了多尺度模型构建，并在一系列测试问题上进行了运行。

更新时间: 2024-10-09 18:41:33

领域: math.NA,cs.LG,cs.NA,physics.comp-ph,82C32 (Primary), 82C40, 82C70 (Secondary)

下载: http://arxiv.org/abs/2401.04783v2

Linear combinations of Gaussian latents in generative models: interpolation and beyond

Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion, Flow Matching and Continuous Normalizing Flows have shown effectiveness across various modalities, and rely on Gaussian latent variables for generation. For search-based or creative applications that require additional control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches for performing such manipulations (e.g. interpolation or forming low-dimensional representations) only work well in special cases or are network or data-modality specific. We propose Combination of Gaussian variables (COG) as a general purpose interpolation method that is easy to implement yet outperforms recent sophisticated methods. Moreover, COG naturally addresses the broader task of forming general linear combinations of latent variables, allowing the construction of subspaces of the latent space, dramatically simplifying the creation of expressive low-dimensional spaces of high-dimensional objects.

Updated: 2024-10-09 18:39:43

标题: 生成模型中的高斯潜变量的线性组合：插值与更多

摘要: 从生成模型中采样已成为数据合成和增强等应用的关键工具。扩散、流匹配和连续归一化流已在各种模态下显示出有效性，并依赖于高斯潜变量进行生成。对于需要对生成过程进行额外控制的搜索型或创意型应用，直接操纵潜变量已变得普遍。然而，现有的用于执行此类操纵的方法（例如插值或形成低维表示）仅在特殊情况下有效，或者是网络或数据模态特定的。我们提出了高斯变量组合（COG）作为一个通用插值方法，易于实现，却胜过最近的复杂方法。此外，COG自然地解决了更广泛的任务，即形成潜变量的一般线性组合，允许构建潜空间的子空间，从而大大简化了对高维对象的表达性低维空间的创建。

更新时间: 2024-10-09 18:39:43

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2408.08558v3

An undetectable watermark for generative image models

We present the first undetectable watermarking scheme for generative image models. Undetectability ensures that no efficient adversary can distinguish between watermarked and un-watermarked images, even after making many adaptive queries. In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric. Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code (Christ and Gunn, 2024), a strategy which guarantees undetectability and robustness. We experimentally demonstrate that our watermarks are quality-preserving and robust using Stable Diffusion 2.1. Our experiments verify that, in contrast to every prior scheme we tested, our watermark does not degrade image quality. Our experiments also demonstrate robustness: existing watermark removal attacks fail to remove our watermark from images without significantly degrading the quality of the images. Finally, we find that we can robustly encode 512 bits in our watermark, and up to 2500 bits when the images are not subjected to watermark removal attacks. Our code is available at https://github.com/XuandongZhao/PRC-Watermark.

Updated: 2024-10-09 18:33:06

标题: 一种适用于生成图像模型的无法检测的水印

摘要: 我们提出了第一个适用于生成式图像模型的不可检测水印方案。不可检测性确保没有有效的对手可以区分带水印和不带水印的图像，即使在进行多次自适应查询后也是如此。特别地，不可检测水印不会在任何可计算度量下降低图像质量。我们的方案通过使用伪随机纠错码（Christ和Gunn，2024年）选择扩散模型的初始潜变量来实现，这种策略保证了不可检测性和鲁棒性。我们通过Stable Diffusion 2.1实验验证了我们的水印质量保持和鲁棒性。我们的实验验证了与我们测试过的每一种先前方案相比，我们的水印不会降低图像质量。我们的实验还展示了鲁棒性：现有的水印去除攻击无法从图像中去除我们的水印而不显著降低图像质量。最后，我们发现我们可以在我们的水印中鲁棒地编码512位，当图像没有经过水印去除攻击时，最多可以编码2500位。我们的代码可在https://github.com/XuandongZhao/PRC-Watermark中获得。

更新时间: 2024-10-09 18:33:06

领域: cs.CR,cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2410.07369v1

CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation

Existing automatic prompt engineering methods are typically designed for discriminative tasks, where new task prompts are iteratively refined with limited feedback from a single metric reflecting a single aspect. However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text. To address these challenges, we propose a novel multi-aspect Critique-Suggestion-guided automatic Prompt Optimization (CriSPO) approach. CriSPO introduces a critique-suggestion module as its core component. This module spontaneously discovers aspects, and compares generated and reference texts across these aspects, providing specific suggestions for prompt modification. These clear critiques and actionable suggestions guide a receptive optimizer module to make more substantial changes, exploring a broader and more effective search space. To further improve CriSPO with multi-metric optimization, we introduce an Automatic Suffix Tuning (AST) extension to enhance the performance of task prompts across multiple metrics. We evaluate CriSPO on 4 state-of-the-art LLMs across 4 summarization and 5 QA datasets. Extensive experiments show 3-4\% ROUGE score improvement on summarization and substantial improvement of various metrics on QA.

Updated: 2024-10-09 18:29:54

标题: CriSPO: 多方面批评建议引导的文本生成自动提示优化

摘要: 现有的自动提示工程方法通常设计用于区分性任务，在这些任务中，新任务提示会通过反映单个方面的单一指标得到有限反馈而被迭代地细化。然而，这些方法对于需要更加微妙指导以改进提示并优化生成文本的生成性任务来说并不是最佳选择，因为这些任务需要超出单一数值指标的更多指导。为了解决这些挑战，我们提出了一种新颖的多方面评论建议引导的自动提示优化（CriSPO）方法。CriSPO引入了一个评论建议模块作为其核心组件。该模块会自发地发现各个方面，并在这些方面上比较生成文本和参考文本，提供特定的提示以修改提示。这些明确的评论和可操作的建议会指导一个接受的优化器模块进行更加实质性的更改，探索一个更广泛且更有效的搜索空间。为了进一步改进CriSPO的多指标优化，我们引入了自动后缀调整（AST）扩展，以增强跨多个指标任务提示的性能。我们在4个最先进的LLM模型上评估了CriSPO，涵盖了4个总结和5个问答数据集。广泛的实验显示在总结任务上ROUGE分数有3-4%的提升，并在问答任务上各种指标有实质性的改进。

更新时间: 2024-10-09 18:29:54

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.02748v2

Rank Supervised Contrastive Learning for Time Series Classification

Recently, various contrastive learning techniques have been developed to categorize time series data and exhibit promising performance. A general paradigm is to utilize appropriate augmentations and construct feasible positive samples such that the encoder can yield robust and discriminative representations by mapping similar data points closer together in the feature space while pushing dissimilar data points farther apart. Despite its efficacy, the fine-grained relative similarity (e.g., rank) information of positive samples is largely ignored, especially when labeled samples are limited. To this end, we present Rank Supervised Contrastive Learning (RankSCL) to perform time series classification. Different from conventional contrastive learning frameworks, RankSCL augments raw data in a targeted way in the embedding space and adopts certain filtering rules to select more informative positive and negative pairs of samples. Moreover, a novel rank loss is developed to assign different weights for different levels of positive samples, enable the encoder to extract the fine-grained information of the same class, and produce a clear boundary among different classes. Thoroughly empirical studies on 128 UCR datasets and 30 UEA datasets demonstrate that the proposed RankSCL can achieve state-of-the-art performance compared to existing baseline methods.

Updated: 2024-10-09 18:29:17

标题: 等级监督对比学习用于时间序列分类

摘要: 最近，各种对比学习技术已被开发用于对时间序列数据进行分类，并展现出有希望的性能。一个通用的范例是利用适当的增强和构建可行的正样本，使编码器能够通过将相似的数据点在特征空间中映射得更近，而将不相似的数据点推得更远来产生稳健且有区分性的表示。尽管其有效性，正样本的细粒度相对相似性（例如，排名）信息在有限标记样本的情况下往往被忽视。为此，我们提出了一种称为Rank Supervised Contrastive Learning（RankSCL）的方法来执行时间序列分类。与传统的对比学习框架不同，RankSCL在嵌入空间中以有针对性的方式增强原始数据，并采用某些过滤规则来选择更具信息量的正负样本对。此外，还开发了一种新型的排名损失，以为不同级别的正样本分配不同权重，使编码器能够提取同一类的细粒度信息，并在不同类别之间产生清晰的边界。对128个UCR数据集和30个UEA数据集进行的彻底实证研究表明，所提出的RankSCL相比现有基线方法可以实现最先进的性能。

更新时间: 2024-10-09 18:29:17

领域: cs.LG

下载: http://arxiv.org/abs/2401.18057v2

Nuclear Norm Regularization for Deep Learning

Penalizing the nuclear norm of a function's Jacobian encourages it to locally behave like a low-rank linear map. Such functions vary locally along only a handful of directions, making the Jacobian nuclear norm a natural regularizer for machine learning problems. However, this regularizer is intractable for high-dimensional problems, as it requires computing a large Jacobian matrix and taking its singular value decomposition. We show how to efficiently penalize the Jacobian nuclear norm using techniques tailor-made for deep learning. We prove that for functions parametrized as compositions $f = g \circ h$, one may equivalently penalize the average squared Frobenius norm of $Jg$ and $Jh$. We then propose a denoising-style approximation that avoids the Jacobian computations altogether. Our method is simple, efficient, and accurate, enabling Jacobian nuclear norm regularization to scale to high-dimensional deep learning problems. We complement our theory with an empirical study of our regularizer's performance and investigate applications to denoising and representation learning.

Updated: 2024-10-09 18:25:15

标题: 深度学习的核范数正则化

摘要: 对函数的雅可比矩阵的核范数进行惩罚，鼓励其在局部行为像低秩线性映射。这样的函数在局部仅沿着少数方向变化，使得雅可比矩阵的核范数成为机器学习问题的自然正则化器。然而，对于高维问题来说，这种正则化器是难以处理的，因为需要计算一个庞大的雅可比矩阵并对其进行奇异值分解。我们展示了如何利用专门为深度学习设计的技术有效地惩罚雅可比矩阵的核范数。我们证明了对于参数化为组合 $f = g \circ h$ 的函数，可以等效地惩罚 $Jg$ 和 $Jh$ 的平均平方弗罗贝尼乌斯范数。然后我们提出了一种避免雅可比计算的去噪式近似方法。我们的方法简单、高效、准确，使得雅可比矩阵的核范数正则化能够应用于高维深度学习问题。我们通过实证研究补充了我们理论的性能，并探讨了去噪和表示学习的应用。

更新时间: 2024-10-09 18:25:15

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.14544v2

A Two-Model Approach for Humour Style Recognition

Humour, a fundamental aspect of human communication, manifests itself in various styles that significantly impact social interactions and mental health. Recognising different humour styles poses challenges due to the lack of established datasets and machine learning (ML) models. To address this gap, we present a new text dataset for humour style recognition, comprising 1463 instances across four styles (self-enhancing, self-deprecating, affiliative, and aggressive) and non-humorous text, with lengths ranging from 4 to 229 words. Our research employs various computational methods, including classic machine learning classifiers, text embedding models, and DistilBERT, to establish baseline performance. Additionally, we propose a two-model approach to enhance humour style recognition, particularly in distinguishing between affiliative and aggressive styles. Our method demonstrates an 11.61% improvement in f1-score for affiliative humour classification, with consistent improvements in the 14 models tested. Our findings contribute to the computational analysis of humour in text, offering new tools for studying humour in literature, social media, and other textual sources.

Updated: 2024-10-09 18:25:07

标题: 一种用于幽默风格识别的双模型方法

摘要: Humour, 一种人类交流中的基本要素，表现出各种风格，显著影响社交互动和心理健康。由于缺乏已建立的数据集和机器学习（ML）模型，识别不同的幽默风格存在挑战。为了填补这一空白，我们提出了一个新的文本数据集，用于幽默风格识别，包括1463个实例，涵盖四种风格（自我增强、自我贬低、亲和力和攻击性）和非幽默文本，长度从4到229个单词不等。我们的研究采用了各种计算方法，包括经典的机器学习分类器、文本嵌入模型和DistilBERT，以建立基准性能。此外，我们提出了一个两模型方法，以增强幽默风格识别，特别是在区分亲和力和攻击性风格方面。我们的方法在亲和力幽默分类的F1分数上表现出11.61%的改进，对14个测试模型一致改善。我们的研究结果有助于文本中幽默的计算分析，为研究文学、社交媒体和其他文本来源中的幽默提供新工具。

更新时间: 2024-10-09 18:25:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.12842v1

Fast leave-one-cluster-out cross-validation using clustered Network Information Criterion (NICc)

For prediction models developed on clustered data that do not account for cluster heterogeneity in model parameterization, it is crucial to use cluster-based validation to assess model generalizability on unseen clusters. This paper introduces a clustered estimator of the Network Information Criterion (NICc) to approximate leave-one-cluster-out deviance for standard prediction models with twice differentiable log-likelihood functions. The NICc serves as a fast alternative to cluster-based cross-validation. Stone (1977) proved that the Akaike Information Criterion (AIC) is asymptotically equivalent to leave-one-observation-out cross-validation for true parametric models with independent and identically distributed observations. Ripley (1996) noted that the Network Information Criterion (NIC), derived from Stone's proof, is a better approximation when the model is misspecified. For clustered data, we derived NICc by substituting the Fisher information matrix in the NIC with a clustering-adjusted estimator. The NICc imposes a greater penalty when the data exhibits stronger clustering, thereby allowing the NICc to better prevent over-parameterization. In a simulation study and an empirical example, we used standard regression to develop prediction models for clustered data with Gaussian or binomial responses. Compared to the commonly used AIC and BIC for standard regression, NICc provides a much more accurate approximation to leave-one-cluster-out deviance and results in more accurate model size and variable selection, as determined by cluster-based cross-validation, especially when the data exhibit strong clustering.

Updated: 2024-10-09 18:24:39

标题: 使用聚类网络信息准则（NICc）进行快速留一簇外交叉验证

摘要: 对于在集群数据上开发的预测模型，如果不考虑模型参数化中的集群异质性，那么使用基于集群的验证来评估模型在未见集群上的泛化能力至关重要。本文介绍了一种集群估计的网络信息准则（NICc），用于近似标准预测模型的留一集群出偏差，这些模型具有两次可微对数似然函数。NICc可作为基于集群的交叉验证的快速替代方法。Stone（1977）证明了阿卡奇克信息准则（AIC）在真实参数模型中与留一观测出交叉验证渐近等价，其中观测独立且同分布。Ripley（1996）指出，根据Stone的证明导出的网络信息准则（NIC）在模型错误指定时是一个更好的近似。对于集群数据，我们通过在NIC中用一个经过集群调整的估计器替换费舍尔信息矩阵来推导NICc。当数据表现出更强的集群性时，NICc会施加更大的惩罚，从而使NICc能更好地防止过度参数化。在模拟研究和实证例中，我们使用标准回归来为具有高斯或二项分布响应的集群数据开发预测模型。与常用的AIC和BIC相比，NICc对留一集群出偏差提供了更准确的近似，并通过基于集群的交叉验证确定了更准确的模型大小和变量选择，尤其是当数据表现出强烈的集群性时。

更新时间: 2024-10-09 18:24:39

领域: stat.ME,cs.LG,stat.CO,stat.ML

下载: http://arxiv.org/abs/2405.20400v2

Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing

Fluorescence lifetime imaging (FLI) is a widely used technique in the biomedical field for measuring the decay times of fluorescent molecules, providing insights into metabolic states, protein interactions, and ligand-receptor bindings. However, its broader application in fast biological processes, such as dynamic activity monitoring, and clinical use, such as in guided surgery, is limited by long data acquisition times and computationally demanding data processing. While deep learning has reduced post-processing times, time-resolved data acquisition remains a bottleneck for real-time applications. To address this, we propose a method to achieve real-time FLI using an FPGA-based hardware accelerator. Specifically, we implemented a GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with time-resolved cameras. The GRU model balances accurate processing with the resource constraints of FPGAs, which have limited DSP units and BRAM. The limited memory and computational resources on the FPGA require efficient scheduling of operations and memory allocation to deploy deep learning models for low-latency applications. We address these challenges by using STOMP, a queue-based discrete-event simulator that automates and optimizes task scheduling and memory management on hardware. By integrating a GRU-based Seq2Seq model and its compressed version, called Seq2SeqLite, generated through knowledge distillation, we were able to process multiple pixels in parallel, reducing latency compared to sequential processing. We explore various levels of parallelism to achieve an optimal balance between performance and resource utilization. Our results indicate that the proposed techniques achieved a 17.7x and 52.0x speedup over manual scheduling for the Seq2Seq model and the Seq2SeqLite model, respectively.

Updated: 2024-10-09 18:24:23

标题: 解锁实时荧光寿命成像：FPGA加速处理的多像素并行化

摘要: 荧光寿命成像（FLI）是生物医学领域中广泛使用的技术，用于测量荧光分子的衰减时间，提供关于代谢状态、蛋白质相互作用和配体-受体结合的见解。然而，在快速生物过程（如动态活动监测）和临床应用（如引导手术）中，由于长时间数据采集和计算量大的数据处理，其更广泛的应用受到限制。尽管深度学习已经减少了后期处理时间，但时间分辨数据采集仍然是实时应用的瓶颈。为解决这一问题，我们提出了一种使用基于FPGA的硬件加速器实现实时FLI的方法。具体而言，我们在与时间分辨相机兼容的FPGA板上实现了基于GRU的序列到序列（Seq2Seq）模型。GRU模型在精确处理和FPGA的资源限制之间取得平衡，FPGA具有有限的DSP单元和BRAM。FPGA上有限的内存和计算资源要求有效地调度操作和内存分配，以部署用于低延迟应用的深度学习模型。我们通过使用STOMP，一个基于队列的离散事件模拟器，自动化和优化硬件上的任务调度和内存管理来解决这些挑战。通过集成基于GRU的Seq2Seq模型及其经过知识蒸馏生成的压缩版本Seq2SeqLite，我们能够并行处理多个像素，与顺序处理相比降低延迟。我们探索各种级别的并行性，以实现性能和资源利用的最佳平衡。我们的结果表明，所提出的技术分别在Seq2Seq模型和Seq2SeqLite模型上实现了17.7倍和52.0倍的加速，相比手动调度。

更新时间: 2024-10-09 18:24:23

领域: physics.optics,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2410.07364v1

Improving the portability of predicting students performance models by using ontologies

One of the main current challenges in Educational Data Mining and Learning Analytics is the portability or transferability of predictive models obtained for a particular course so that they can be applied to other different courses. To handle this challenge, one of the foremost problems is the models excessive dependence on the low-level attributes used to train them, which reduces the models portability. To solve this issue, the use of high level attributes with more semantic meaning, such as ontologies, may be very useful. Along this line, we propose the utilization of an ontology that uses a taxonomy of actions that summarises students interactions with the Moodle learning management system. We compare the results of this proposed approach against our previous results when we used low-level raw attributes obtained directly from Moodle logs. The results indicate that the use of the proposed ontology improves the portability of the models in terms of predictive accuracy. The main contribution of this paper is to show that the ontological models obtained in one source course can be applied to other different target courses with similar usage levels without losing prediction accuracy.

Updated: 2024-10-09 18:18:54

标题: 通过使用本体论来提高预测学生表现模型的可移植性

摘要: 教育数据挖掘和学习分析领域当前面临的主要挑战之一是预测模型在特定课程中获得后的可移植性或转移性，以便能够应用到其他不同的课程中。为了解决这一挑战，其中一个最主要的问题是模型过度依赖用于训练它们的低级属性，这降低了模型的可移植性。为了解决这个问题，使用具有更多语义含义的高级属性，如本体论，可能非常有用。沿着这条线，我们提出利用一个本体论，其中使用了一个总结学生与Moodle学习管理系统的交互的行动分类法。我们将这种提出的方法的结果与我们以前直接从Moodle日志中获取的低级原始属性时的结果进行比较。结果表明，使用所提出的本体论可以提高模型的可移植性，即预测准确性。本文的主要贡献是表明，在一个来源课程中获得的本体模型可以应用于其他不同的目标课程，而不会失去预测准确性。

更新时间: 2024-10-09 18:18:54

领域: cs.AI

下载: http://arxiv.org/abs/2410.07358v1

Generating Origin-Destination Matrices in Neural Spatial Interaction Models

Agent-based models (ABMs) are proliferating as decision-making tools across policy areas in transportation, economics, and epidemiology. In these models, a central object of interest is the discrete origin-destination matrix which captures spatial interactions and agent trip counts between locations. Existing approaches resort to continuous approximations of this matrix and subsequent ad-hoc discretisations in order to perform ABM simulation and calibration. This impedes conditioning on partially observed summary statistics, fails to explore the multimodal matrix distribution over a discrete combinatorial support, and incurs discretisation errors. To address these challenges, we introduce a computationally efficient framework that scales linearly with the number of origin-destination pairs, operates directly on the discrete combinatorial space, and learns the agents' trip intensity through a neural differential equation that embeds spatial interactions. Our approach outperforms the prior art in terms of reconstruction error and ground truth matrix coverage, at a fraction of the computational cost. We demonstrate these benefits in large-scale spatial mobility ABMs in Cambridge, UK and Washington, DC, USA.

Updated: 2024-10-09 18:09:02

标题: 在神经空间交互模型中生成起始地-目的地矩阵

摘要: 基于代理的模型（ABMs）作为决策工具在交通、经济和流行病学等政策领域中不断增多。在这些模型中，一个关键的研究对象是离散的起点-终点矩阵，该矩阵捕捉了地理位置之间的空间交互作用和代理人之间的行程次数。现有方法倾向于对这个矩阵进行连续逼近，然后进行后续的临时离散化，以便进行ABM模拟和校准。这种方法阻碍了根据部分观察到的汇总统计数据进行条件设定，未能探索离散组合支持上的多模态矩阵分布，并且导致离散化误差。为了解决这些挑战，我们引入了一个计算效率高的框架，其与起点-终点对数量呈线性比例，直接在离散组合空间上运行，并通过嵌入空间交互的神经微分方程学习代理人的行程强度。我们的方法在重建误差和地面真实矩阵覆盖方面优于先前的技术，在计算成本的一小部分。我们在英国剑桥和美国华盛顿特区的大规模空间流动ABMs中展示了这些优势。

更新时间: 2024-10-09 18:09:02

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.07352v1

Modeling Causal Mechanisms with Diffusion Models for Interventional and Counterfactual Queries

We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach.

Updated: 2024-10-09 18:04:37

标题: 使用扩散模型对干预和反事实查询建模因果机制

摘要: 我们考虑在因果充分设置中，仅有观测数据和因果图可用的情况下回答观测、干预和反事实查询的问题。利用扩散模型的最新发展，我们引入了基于扩散的因果模型（DCM）来学习生成独特潜在编码的因果机制。这些编码使我们能够直接在干预下进行采样并进行反事实推理。扩散模型在这里是一个自然的选择，因为它们可以将每个节点编码为代表外生噪声的潜在表示。我们的实证评估显示，与现有的最先进方法相比，我们的方法在回答因果查询方面取得了显著的改进。此外，我们提供了理论结果，提供了一种分析一般编码器-解码器模型中反事实估计的方法，这对超出我们提出的方法的设置可能是有用的。

更新时间: 2024-10-09 18:04:37

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2302.00860v3

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods. To achieve this, we propose MoE++, a general and heterogeneous MoE framework that integrates both Feed-Forward Network~(FFN) and zero-computation experts. Specifically, we introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which correspond to discard, skip, and replace operations, respectively. This design offers three key advantages: (i) Low Computing Overhead: Unlike the uniform mixing mechanism for all tokens within vanilla MoE, MoE++ allows each token to engage with a dynamic number of FFNs, be adjusted by constant vectors, or even skip the MoE layer entirely. (ii) High Performance: By enabling simple tokens to utilize fewer FFN experts, MoE++ allows more experts to focus on challenging tokens, thereby unlocking greater performance potential than vanilla MoE. (iii) Deployment Friendly: Given that zero-computation experts have negligible parameters, we can deploy all zero-computation experts on each GPU, eliminating the significant communication overhead and expert load imbalance associated with FFN experts distributed across different GPUs. Moreover, we leverage gating residuals, enabling each token to consider the pathway taken in the previous layer when selecting the appropriate experts. Extensive experimental results demonstrate that MoE++ achieves better performance while delivering 1.1-2.1x expert forward throughput compared to a vanilla MoE model of the same size, which lays a solid foundation for developing advanced and efficient MoE-related models.

Updated: 2024-10-09 18:01:27

标题: MoE++：零计算专家加速专家混合方法

摘要: 在这项工作中，我们旨在同时增强Mixture-of-Experts（MoE）方法的效果和效率。为了实现这一目标，我们提出了MoE++，这是一个集成了前馈网络（FFN）和零计算专家的通用和异构MoE框架。具体来说，我们引入了三种类型的零计算专家：零专家、复制专家和常量专家，分别对应于丢弃、跳过和替换操作。这种设计提供了三个关键优势：（i）低计算开销：与普通MoE中所有令牌的统一混合机制不同，MoE++允许每个令牌与动态数量的FFN进行交互，通过常量向量进行调整，甚至完全跳过MoE层。（ii）高性能：通过使简单令牌利用更少的FFN专家，MoE++允许更多的专家专注于具有挑战性的令牌，从而释放比普通MoE更大的性能潜力。（iii）部署友好：由于零计算专家具有可忽略的参数，我们可以在每个GPU上部署所有零计算专家，消除了FFN专家分布在不同GPU上时的重要通信开销和专家负载不平衡。此外，我们利用门控残差，使每个令牌在选择合适的专家时考虑上一层中所采取的路径。广泛的实验结果表明，与相同大小的普通MoE模型相比，MoE++实现了更好的性能，同时提供了1.1-2.1倍的专家前向吞吐量，为开发先进和高效的MoE相关模型打下了坚实的基础。

更新时间: 2024-10-09 18:01:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07348v1

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

Despite significant advancements in caption generation, existing evaluation metrics often fail to capture the full quality or fine-grained details of captions. This is mainly due to their reliance on non-specific human-written references or noisy pre-training data. Still, finding an effective metric is crucial not only for captions evaluation but also for the generation phase. Metrics can indeed play a key role in the fine-tuning stage of captioning models, ultimately enhancing the quality of the generated captions. In this paper, we propose PAC-S++, a learnable metric that leverages the CLIP model, pre-trained on both web-collected and cleaned data and regularized through additional pairs of generated visual and textual positive samples. Exploiting this stronger and curated pre-training, we also apply PAC-S++ as a reward in the Self-Critical Sequence Training (SCST) stage typically employed to fine-tune captioning models. Extensive experiments on different image and video datasets highlight the effectiveness of PAC-S++ compared to popular metrics for the task, including its sensitivity to object hallucinations. Furthermore, we show that integrating PAC-S++ into the fine-tuning stage of a captioning model results in semantically richer captions with fewer repetitions and grammatical errors. Evaluations on out-of-domain benchmarks further demonstrate the efficacy of our fine-tuning approach in enhancing model capabilities. Source code and trained models are publicly available at: https://github.com/aimagelab/pacscore.

Updated: 2024-10-09 18:00:09

标题: 积极增强对比学习用于视觉和语言评估与训练

摘要: 尽管字幕生成取得了显著进展，但现有的评估指标通常无法捕捉字幕的完整质量或细致的细节。这主要是由于它们依赖于非特定的人类编写的参考资料或嘈杂的预训练数据。然而，找到一种有效的指标对于字幕评估和生成阶段都至关重要。指标确实可以在字幕模型的微调阶段发挥关键作用，最终提高生成字幕的质量。在本文中，我们提出了PAC-S++，这是一种可学习的指标，利用了在网络收集和清理数据上预训练的CLIP模型，并通过额外的生成的视觉和文本正样本对进行正则化。利用这种更强大和策划的预训练，我们还将PAC-S++应用为Self-Critical Sequence Training（SCST）阶段的奖励，通常用于微调字幕模型。对不同的图像和视频数据集进行的大量实验突显了PAC-S++相对于任务的流行指标的有效性，包括其对物体幻觉的敏感性。此外，我们展示了将PAC-S++整合到字幕模型的微调阶段会导致语义更丰富的字幕，减少重复和语法错误。对域外基准的评估进一步证明了我们微调方法在增强模型能力方面的功效。源代码和训练模型可公开获取：https://github.com/aimagelab/pacscore。

更新时间: 2024-10-09 18:00:09

领域: cs.CV,cs.AI,cs.CL,cs.MM

下载: http://arxiv.org/abs/2410.07336v1

MM-Ego: Towards Building Egocentric Multimodal LLMs

This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, as there is a lack of QA data for egocentric video understanding, we develop a data engine that efficiently generates 7M high-quality QA samples for egocentric videos ranging from 30 seconds to one hour long, based on human-annotated data. This is currently the largest egocentric QA dataset. Second, we contribute a challenging egocentric QA benchmark with 629 videos and 7,026 questions to evaluate the models' ability in recognizing and memorizing visual details across videos of varying lengths. We introduce a new de-biasing evaluation method to help mitigate the unavoidable language bias present in the models being evaluated. Third, we propose a specialized multimodal architecture featuring a novel "Memory Pointer Prompting" mechanism. This design includes a global glimpse step to gain an overarching understanding of the entire video and identify key visual information, followed by a fallback step that utilizes the key visual information to generate responses. This enables the model to more effectively comprehend extended video content. With the data, benchmark, and model, we successfully build MM-Ego, an egocentric multimodal LLM that shows powerful performance on egocentric video understanding.

Updated: 2024-10-09 17:59:59

标题: MM-Ego:朝着构建自我中心的多模态LLMs

摘要: 这项研究旨在全面探索建立一个用于自我中心视频理解的多模态基础模型。为实现这一目标，我们在三个方面工作。首先，由于自我中心视频理解缺乏问答数据，我们开发了一个数据引擎，根据人工标注数据，高效生成了700万个高质量的自我中心视频问答样本，时长从30秒到1小时不等。这是目前最大的自我中心问答数据集。其次，我们提供了一个具有挑战性的自我中心视频问答基准，包含629个视频和7026个问题，用于评估模型在识别和记忆不同长度视频中的视觉细节方面的能力。我们引入了一种新的去偏差评估方法，帮助减轻模型评估中不可避免的语言偏见。第三，我们提出了一个专门的多模态架构，具有一种新颖的“记忆指针提示”机制。该设计包括一个全局视觉步骤，以获得对整个视频的总体理解，并识别关键的视觉信息，随后是利用关键的视觉信息生成响应的后退步骤。这使得模型能更有效地理解扩展视频内容。借助数据、基准和模型，我们成功构建了MM-Ego，一个自我中心多模态LLM，在自我中心视频理解方面展现出强大的性能。

更新时间: 2024-10-09 17:59:59

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07177v1

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

Retrieval-Augmented Generation (RAG), while effective in integrating external knowledge to address the limitations of large language models (LLMs), can be undermined by imperfect retrieval, which may introduce irrelevant, misleading, or even malicious information. Despite its importance, previous studies have rarely explored the behavior of RAG through joint analysis on how errors from imperfect retrieval attribute and propagate, and how potential conflicts arise between the LLMs' internal knowledge and external sources. We find that imperfect retrieval augmentation might be inevitable and quite harmful, through controlled analysis under realistic conditions. We identify the knowledge conflicts between LLM-internal and external knowledge from retrieval as a bottleneck to overcome in the post-retrieval stage of RAG. To render LLMs resilient to imperfect retrieval, we propose Astute RAG, a novel RAG approach that adaptively elicits essential information from LLMs' internal knowledge, iteratively consolidates internal and external knowledge with source-awareness, and finalizes the answer according to information reliability. Our experiments using Gemini and Claude demonstrate that Astute RAG significantly outperforms previous robustness-enhanced RAG methods. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG under worst-case scenarios. Further analysis reveals that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems.

Updated: 2024-10-09 17:59:58

标题: Astute RAG: 克服大型语言模型的不完美检索增强和知识冲突

摘要: 检索增强生成（RAG）在整合外部知识以解决大型语言模型（LLMs）的局限性方面效果显著，但可能会受到不完美检索的影响，这可能会引入无关、误导性甚至恶意信息。尽管其重要性，先前的研究很少探讨RAG的行为，即错误如何从不完美检索中归因和传播，以及LLMs的内部知识与外部来源之间可能产生的冲突。我们发现，不完美检索增强可能是不可避免的，并且在现实条件下进行了受控分析后，发现其具有相当有害性。我们确定LLM-内部和检索外部知识之间的知识冲突是RAG后检索阶段要克服的瓶颈。为了使LLMs对不完美检索具有弹性，我们提出了一种新颖的RAG方法Astute RAG，通过自适应地从LLMs的内部知识中引出关键信息，通过源感知性迭代地巩固内部和外部知识，并根据信息可靠性最终确定答案。我们使用Gemini和Claude进行的实验表明，Astute RAG明显优于先前的增强鲁棒性的RAG方法。值得注意的是，Astute RAG是唯一在最坏情况下与或超过没有RAG的LLMs性能的方法。进一步分析表明，Astute RAG有效解决了知识冲突，提高了RAG系统的可靠性和可信度。

更新时间: 2024-10-09 17:59:58

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07176v1

Neural Circuit Architectural Priors for Quadruped Locomotion

Learning-based approaches to quadruped locomotion commonly adopt generic policy architectures like fully connected MLPs. As such architectures contain few inductive biases, it is common in practice to incorporate priors in the form of rewards, training curricula, imitation data, or trajectory generators. In nature, animals are born with priors in the form of their nervous system's architecture, which has been shaped by evolution to confer innate ability and efficient learning. For instance, a horse can walk within hours of birth and can quickly improve with practice. Such architectural priors can also be useful in ANN architectures for AI. In this work, we explore the advantages of a biologically inspired ANN architecture for quadruped locomotion based on neural circuits in the limbs and spinal cord of mammals. Our architecture achieves good initial performance and comparable final performance to MLPs, while using less data and orders of magnitude fewer parameters. Our architecture also exhibits better generalization to task variations, even admitting deployment on a physical robot without standard sim-to-real methods. This work shows that neural circuits can provide valuable architectural priors for locomotion and encourages future work in other sensorimotor skills.

Updated: 2024-10-09 17:59:45

标题: 四足动物运动的神经回路结构先验

摘要: 基于学习的四足动物运动方法通常采用全连接MLP等通用策略架构。由于此类架构缺乏归纳偏差，在实践中通常会引入先验，例如奖励、训练课程、模仿数据或轨迹生成器。在自然界中，动物出生时具有其神经系统架构的先验，这种架构经过演化塑造，赋予了天生的能力和高效的学习能力。例如，马出生后几小时就能行走，并且能够通过练习迅速提高。这种架构先验在人工神经网络架构中也可以发挥作用。在这项工作中，我们探讨了一种受生物启发的人工神经网络架构，用于基于哺乳动物四肢和脊髓的神经回路的四足动物运动。我们的架构实现了良好的初始性能，并且在使用更少的数据和数量级较少的参数的情况下实现了与MLP相当的最终性能。我们的架构还表现出更好的泛化能力，甚至可以在没有标准的模拟到真实方法的情况下部署到物理机器人上。这项工作表明神经回路可以为运动提供有价值的架构先验，并鼓励未来在其他感觉运动技能方面的研究。

更新时间: 2024-10-09 17:59:45

领域: q-bio.NC,cs.AI,cs.LG,cs.NE,cs.RO

下载: http://arxiv.org/abs/2410.07174v1

Do better language models have crisper vision?

How well do text-only Large Language Models (LLMs) grasp the visual world? As LLMs are increasingly used in computer vision, addressing this question becomes both fundamental and pertinent. However, existing studies have primarily focused on limited scenarios, such as their ability to generate visual content or cluster multimodal data. To this end, we propose the Visual Text Representation Benchmark (ViTeRB) to isolate key properties that make language models well-aligned with the visual world. With this, we identify large-scale decoder-based LLMs as ideal candidates for representing text in vision-centric contexts, counter to the current practice of utilizing text encoders. Building on these findings, we propose ShareLock, an ultra-lightweight CLIP-like model. By leveraging precomputable frozen features from strong vision and language models, ShareLock achieves an impressive 51% accuracy on ImageNet despite utilizing just 563k image-caption pairs. Moreover, training requires only 1 GPU hour (or 10 hours including the precomputation of features) - orders of magnitude less than prior methods. Code will be released.

Updated: 2024-10-09 17:59:33

标题: 更好的语言模型是否具有更清晰的视觉？

摘要: 文本唯一的大型语言模型（LLMs）有多好地理解视觉世界？随着LLMs在计算机视觉中的应用越来越广泛，解决这个问题变得基本和相关。然而，现有研究主要集中在有限的场景，如它们生成视觉内容或聚类多模态数据的能力。为此，我们提出了Visual Text Representation Benchmark（ViTeRB），以隔离使语言模型与视觉世界良好对齐的关键属性。通过这一举措，我们确定基于大规模解码器的LLMs是在视觉中心环境中代表文本的理想候选者，与当前利用文本编码器的做法相反。基于这些发现，我们提出了ShareLock，一个类似CLIP的超轻量级模型。通过利用强大的视觉和语言模型中可预先计算的冻结特征，ShareLock在ImageNet上实现了惊人的51％准确度，尽管仅利用了563k个图像标题对。此外，训练仅需要1个GPU小时（包括特征的预计算在内则需要10小时）-比之前的方法少几个数量级。代码将被发布。

更新时间: 2024-10-09 17:59:33

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.07173v1

Glider: Global and Local Instruction-Driven Expert Router

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to particular domains. This has enabled the creation of powerful and adaptive routing-based "Model MoErging" methods with the goal of using expert modules to create an aggregate system with improved performance or generalization. However, existing MoErging methods often prioritize generalization to unseen tasks at the expense of performance on held-in tasks, which limits its practical applicability in real-world deployment scenarios. We observe that current token-level routing mechanisms neglect the global semantic context of the input task. This token-wise independence hinders effective expert selection for held-in tasks, as routing decisions fail to incorporate the semantic properties of the task. To address this, we propose, Global and Local Instruction Driven Expert Router (GLIDER) that integrates a multi-scale routing mechanism, encompassing a semantic global router and a learned local router. The global router leverages LLM's advanced reasoning capabilities for semantic-related contexts to enhance expert selection. Given the input query and LLM, the router generates semantic task instructions that guide the retrieval of the most relevant experts across all layers. This global guidance is complemented by a local router that facilitates token-level routing decisions within each module, enabling finer control and enhanced performance on unseen tasks. Our experiments using T5-based models for T0 and FLAN tasks demonstrate that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks. We also perform ablations experiments to dive deeper into the components of GLIDER. Our experiments highlight the importance of our multi-scale routing that leverages LLM-driven semantic reasoning for MoErging methods.

Updated: 2024-10-09 17:59:14

标题: 滑翔机：全局和局部指令驱动的专家路由器

摘要: 性能良好的预训练模型的可用性导致了专门针对特定领域的微调专家模型的激增。这使得可以创建功能强大且适应性强的基于路由的“模型融合”方法，其目标是利用专家模块创建性能或泛化能力改进的整体系统。然而，现有的融合方法往往优先考虑对未见任务的泛化性，而牺牲了对保留任务的性能，从而限制了其在实际部署场景中的适用性。我们观察到当前的标记级路由机制忽视了输入任务的全局语义上下文。这种基于标记的独立性阻碍了对保留任务进行有效的专家选择，因为路由决策未能融入任务的语义属性。为了解决这个问题，我们提出了全局和局部指导驱动的专家路由器(GLIDER)，它整合了多尺度路由机制，包括语义全局路由器和学习的局部路由器。全局路由器利用LLM的先进推理能力来增强与语义相关的上下文以提高专家选择。给定输入查询和LLM，路由器生成指导检索所有层中最相关专家的语义任务指令。这种全局指导与一个局部路由器相辅相成，后者促进了每个模块内的标记级路由决策，实现对未见任务的更精细控制和增强性能。我们使用基于T5模型的T0和FLAN任务进行的实验表明，GLIDER在保留任务的性能方面取得了显著改进，同时在保留任务上保持了较强的泛化性能。我们还进行了消融实验，深入探讨了GLIDER的组成部分。我们的实验突出了我们的多尺度路由的重要性，利用LLM驱动的语义推理来促进融合方法。

更新时间: 2024-10-09 17:59:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.07172v1

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus on weight-driven initialization or learning of adaptive ranks during training. Both approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, in turn leading to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and re-distribute ranks among all weight matrices to explain the maximal amount of variance and continue the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.

Updated: 2024-10-09 17:59:06

标题: 一种初始化方法统治所有：通过解释方差适应进行微调

摘要: 基金会模型（FMs）是在大规模数据集上进行预训练，然后在特定应用的下游任务上进行微调的。最成功和最常用的微调方法是通过低秩适应（LoRA）更新预训练权重。LoRA引入新的权重矩阵，通常以均匀的等级分布随机初始化模型权重。最近的工作集中在权重驱动的初始化或在训练过程中学习自适应等级。这两种方法只进行了孤立的研究，导致收敛速度较慢或等级分布均匀，从而导致次优性能。我们提出通过在激活向量的小批量上计算奇异值分解，以数据驱动方式初始化新权重。然后，我们使用获得的右奇异向量初始化LoRA矩阵，并重新分配所有权重矩阵的等级以解释最大量的方差，并继续标准的LoRA微调过程。这导致我们的新方法解释方差适应（EVA）。我们将EVA应用于各种微调任务，从语言生成和理解到图像分类和强化学习。EVA表现出比竞争对手更快的收敛速度，并在每个领域的多个任务中获得最高的平均分数。

更新时间: 2024-10-09 17:59:06

领域: cs.LG,cs.AI,cs.CL,stat.ML

下载: http://arxiv.org/abs/2410.07170v1

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because they are usually applied in different domains, for different purposes, and built based on different inputs and outputs. Furthermore, existing evaluations tend to rely solely on a final success rate, making it difficult to pinpoint what ability is missing in LLMs and where the problem lies, which in turn blocks embodied agents from leveraging LLMs effectively and selectively. To address these limitations, we propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks and input-output specifications of LLM-based modules. Specifically, it allows us to unify 1) a broad set of embodied decision-making tasks involving both state and temporally extended goals, 2) four commonly-used LLM-based modules for decision making: goal interpretation, subgoal decomposition, action sequencing, and transition modeling, and 3) a collection of fine-grained metrics which break down evaluation into various types of errors, such as hallucination errors, affordance errors, various types of planning errors, etc. Overall, our benchmark offers a comprehensive assessment of LLMs' performance for different subtasks, pinpointing the strengths and weaknesses in LLM-powered embodied AI systems, and providing insights for effective and selective use of LLMs in embodied decision making.

Updated: 2024-10-09 17:59:00

标题: 具象化代理界面：基于具象化决策制定的LLM基准测试

摘要: 我们的目标是评估大型语言模型（LLMs）在具体决策中的应用。虽然已经有大量研究利用LLMs进行具体环境下的决策，但由于它们通常应用于不同的领域、不同的目的，以及基于不同的输入和输出，我们仍然缺乏对它们性能的系统性理解。此外，现有的评估往往仅依赖于最终的成功率，这使得很难确定LLMs中缺少哪种能力以及问题所在，进而阻碍了具体代理人有效和有选择地利用LLMs。为了解决这些限制，我们提出了一个通用接口（具体代理人接口），支持对基于LLM的模块的各种类型任务和输入输出规范进行形式化。具体而言，它允许我们统一1）涉及状态和时间延伸目标的广泛的具体决策任务，2）用于决策制定的四种常用的基于LLM的模块：目标解释、子目标分解、行动排序和转换建模，以及3）一系列细粒度指标，将评估细分为各种类型的错误，如幻觉错误、可供性错误、各种类型的规划错误等。总的来说，我们的基准测试提供了对LLMs在不同子任务中表现的全面评估，指出了LLM驱动的具体人工智能系统的优势和劣势，并为在具体决策中有效和有选择地使用LLMs提供了见解。

更新时间: 2024-10-09 17:59:00

领域: cs.CL,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.07166v1

A neural network-based approach to hybrid systems identification for control

We consider the problem of designing a machine learning-based model of an unknown dynamical system from a finite number of (state-input)-successor state data points, such that the model obtained is also suitable for optimal control design. We adopt a neural network (NN) architecture that, once suitably trained, yields a hybrid system with continuous piecewise-affine (PWA) dynamics that is differentiable with respect to the network's parameters, thereby enabling the use of derivative-based training procedures. We show that a careful choice of our NN's weights produces a hybrid system model with structural properties that are highly favorable when used as part of a finite horizon optimal control problem (OCP). Specifically, we rely on available results to establish that optimal solutions with strong local optimality guarantees can be computed via nonlinear programming (NLP), in contrast to classical OCPs for general hybrid systems which typically require mixed-integer optimization. Besides being well-suited for optimal control design, numerical simulations illustrate that our NN-based technique enjoys very similar performance to state-of-the-art system identification methods for hybrid systems and it is competitive on nonlinear benchmarks.

Updated: 2024-10-09 17:58:59

标题: 一个基于神经网络的方法用于控制混合系统识别

摘要: 我们考虑从有限数量的（状态-输入）-继任状态数据点设计一个基于机器学习的未知动态系统模型的问题，使得所得模型也适用于最优控制设计。我们采用一种神经网络（NN）结构，一旦经过适当训练，就会产生一个具有连续分段仿射（PWA）动态的混合系统，该系统对网络参数可微分，从而可使用基于导数的训练过程。我们展示了我们的神经网络权重的精心选择产生了一个具有极具优势结构特性的混合系统模型，当作为有限时间最优控制问题（OCP）的一部分时，这些特性非常有利。具体而言，我们依赖于已有的结果来建立，通过非线性规划（NLP）可以计算具有强局部最优性保证的最优解，与通常需要混合整数优化的一般混合系统的经典OCP相比。除了适用于最优控制设计外，数值模拟表明，我们基于神经网络的技术与最先进的混合系统系统辨识方法在性能上非常相似，并且在非线性基准测试中竞争力强。

更新时间: 2024-10-09 17:58:59

领域: eess.SY,cs.LG,cs.SY,math.OC

下载: http://arxiv.org/abs/2404.01814v2

Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning

In this work, we address the problem of large language model (LLM) unlearning, aiming to remove unwanted data influences and associated model capabilities (e.g., copyrighted data or harmful content generation) while preserving essential model utilities, without the need for retraining from scratch. Despite the growing need for LLM unlearning, a principled optimization framework remains lacking. To this end, we revisit the state-of-the-art approach, negative preference optimization (NPO), and identify the issue of reference model bias, which could undermine NPO's effectiveness, particularly when unlearning forget data of varying difficulty. Given that, we propose a simple yet effective unlearning optimization framework, called SimNPO, showing that 'simplicity' in removing the reliance on a reference model (through the lens of simple preference optimization) benefits unlearning. We also provide deeper insights into SimNPO's advantages, supported by analysis using mixtures of Markov chains. Furthermore, we present extensive experiments validating SimNPO's superiority over existing unlearning baselines in benchmarks like TOFU and MUSE, and robustness against relearning attacks. Codes are available at https://github.com/OPTML-Group/Unlearn-Simple.

Updated: 2024-10-09 17:58:12

标题: 简单为上：重新思考LLM遗忘的负面偏好优化

摘要: 在这项工作中，我们解决了大型语言模型（LLM）取消学习的问题，旨在消除不需要的数据影响和相关模型能力（例如，受版权保护的数据或有害内容生成），同时保留必要的模型效用，无需从头开始重新训练。尽管对LLM取消学习的需求不断增长，但缺乏一个有原则的优化框架。为此，我们重新审视了最先进的方法，负偏好优化（NPO），并确定了参考模型偏见的问题，这可能会削弱NPO的有效性，特别是在取消学习各种难度的遗忘数据时。鉴于此，我们提出了一个简单而有效的取消学习优化框架，称为SimNPO，表明通过简单偏好优化的视角删除对参考模型的依赖（有益于取消学习）。我们还提供了对SimNPO优势的更深入洞察，通过混合马尔可夫链的分析支持。此外，我们展示了在TOFU和MUSE等基准测试中验证SimNPO优越性的广泛实验，并展示了对重新学习攻击的强大抵抗力。代码可在https://github.com/OPTML-Group/Unlearn-Simple 上找到。

更新时间: 2024-10-09 17:58:12

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07163v1

A Blockchain and Artificial Intelligence based System for Halal Food Traceability

The demand of the halal food products is increasing rapidly around the world. The consumption of halal food product is just not among the Muslims but also among non-Muslims, due to the purity of the halal food products. However, there are several challenges that are faced by the halal food consumers. The challenges raise a doubt among the halal food consumers about the authenticity of the product being halal. Therefore, a solution that can address these issues and can establish trust between consumers and producers. Blockchain technology can provide a distributed ledger of an immutable record of the information. Artificial intelligence supports developing a solution for pattern identification. The proposed research utilizes blockchain an artificial intelligence-based system for developing a system that ensure the authenticity of the halal food products by providing the traceability related to all the operations and processes of the supply chain and sourcing the raw material. The proposed system has been tested with a local supermarket. The results and tests of the developed solution seemed effective and the testers expressed interest in real-world implementation of the proposed system.

Updated: 2024-10-09 17:57:01

标题: 基于区块链和人工智能的清真食品追溯系统

摘要: 全球清真食品产品的需求正在迅速增长。清真食品产品的消费不仅仅局限于穆斯林群体，也包括非穆斯林，这是因为清真食品产品的纯净性。然而，清真食品消费者面临着一些挑战。这些挑战引起了清真食品消费者对产品真实性的怀疑。因此，需要找到一个可以解决这些问题并建立消费者和生产者之间信任的解决方案。区块链技术可以提供一个不可更改的信息分布式账本。人工智能支持开发模式识别的解决方案。所提出的研究利用基于区块链和人工智能的系统开发了一个确保清真食品产品真实性的系统，通过提供与供应链所有操作和流程以及原材料采购相关的可追溯性。该系统已在一家当地超市进行了测试。开发的解决方案的结果和测试似乎有效，测试者对所提出的系统在真实世界中的实施表示了兴趣。

更新时间: 2024-10-09 17:57:01

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2410.07305v1

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs

In this paper, we approach an overlooked yet critical task Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the need for controllability in graph conditions. To address these challenges, we propose a graph context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling by combining personalized page rank and re-ranking based on vision-language features. Then, a Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process of diffusion. Finally, we propose graph classifier-free guidance, enabling controllable generation by varying the strength of graph guidance and multiple connected edges to a node. Extensive experiments conducted on three datasets from different domains demonstrate the effectiveness and controllability of our approach. The code is available at https://github.com/PeterGriffinJin/InstructG2I.

Updated: 2024-10-09 17:56:15

标题: InstructG2I：从多模态属性图中合成图像

摘要: 在本文中，我们探讨了一个被忽视但至关重要的任务Graph2Image：从多模态属性图（MMAGs）生成图像。这项任务由于图的规模爆炸、图实体之间的依赖关系以及图条件的可控性需求而面临着重大挑战。为了解决这些挑战，我们提出了一种名为InstructG2I的图上下文条件扩散模型。InstructG2I首先利用图结构和多模态信息，通过将个性化页面排名和基于视觉语言特征的重新排序相结合来进行信息邻居采样。然后，一个Graph-QFormer编码器将图节点自适应地编码成一组辅助图提示，以指导扩散的去噪过程。最后，我们提出了无需图分类器指导的方法，通过调整图引导的强度和对节点的多条连接边实现可控生成。在来自不同领域的三个数据集上进行的大量实验证明了我们方法的有效性和可控性。代码可在https://github.com/PeterGriffinJin/InstructG2I找到。

更新时间: 2024-10-09 17:56:15

领域: cs.AI,cs.CL,cs.CV,cs.LG,cs.SI

下载: http://arxiv.org/abs/2410.07157v1

CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition

Skeleton-based multi-entity action recognition is a challenging task aiming to identify interactive actions or group activities involving multiple diverse entities. Existing models for individuals often fall short in this task due to the inherent distribution discrepancies among entity skeletons, leading to suboptimal backbone optimization. To this end, we introduce a Convex Hull Adaptive Shift based multi-Entity action recognition method (CHASE), which mitigates inter-entity distribution gaps and unbiases subsequent backbones. Specifically, CHASE comprises a learnable parameterized network and an auxiliary objective. The parameterized network achieves plausible, sample-adaptive repositioning of skeleton sequences through two key components. First, the Implicit Convex Hull Constrained Adaptive Shift ensures that the new origin of the coordinate system is within the skeleton convex hull. Second, the Coefficient Learning Block provides a lightweight parameterization of the mapping from skeleton sequences to their specific coefficients in convex combinations. Moreover, to guide the optimization of this network for discrepancy minimization, we propose the Mini-batch Pair-wise Maximum Mean Discrepancy as the additional objective. CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies, thereby reducing data bias and improving the subsequent classifier's multi-entity action recognition performance. Extensive experiments on six datasets, including NTU Mutual 11/26, H2O, Assembly101, Collective Activity and Volleyball, consistently verify our approach by seamlessly adapting to single-entity backbones and boosting their performance in multi-entity scenarios. Our code is publicly available at https://github.com/Necolizer/CHASE .

Updated: 2024-10-09 17:55:43

标题: CHASE：学习基于骨架的多实体动作识别的凸壳自适应转移

摘要: 基于骨架的多实体动作识别是一个具有挑战性的任务，旨在识别涉及多个不同实体的互动动作或群体活动。现有的个体模型通常在这项任务中表现不佳，这是由于实体骨架之间固有的分布差异导致主干优化不佳。为此，我们引入了一种基于凸包自适应移位的多实体动作识别方法（CHASE），该方法缓解了实体之间的分布差距，并使后续主干网络无偏。具体地，CHASE包括一个可学习的参数化网络和一个辅助目标。参数化网络通过两个关键组件实现骨架序列的合理、样本自适应的重新定位。首先，隐式凸包约束自适应移位确保坐标系的新原点在骨架凸包内。其次，系数学习块提供了从骨架序列到它们特定系数的凸组合的映射的轻量化参数化。此外，为了指导这个网络的优化以减少差异，我们提出了迷你批量成对最大均值差异作为额外目标。CHASE作为一种样本自适应的归一化方法，用于缓解实体之间的分布差异，从而减少数据偏差并提高后续分类器的多实体动作识别性能。在包括NTU Mutual 11/26、H2O、Assembly101、Collective Activity和排球在内的六个数据集上进行的大量实验一致验证了我们的方法，通过无缝地适应单实体主干并提升其在多实体场景中的性能。我们的代码公开可用于https://github.com/Necolizer/CHASE。

更新时间: 2024-10-09 17:55:43

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.07153v1

Towards Interpreting Visual Information Processing in Vision-Language Models

Vision-Language Models (VLMs) are powerful tools for processing and understanding text and images. We study the processing of visual tokens in the language model component of LLaVA, a prominent VLM. Our approach focuses on analyzing the localization of object information, the evolution of visual token representations across layers, and the mechanism of integrating visual information for predictions. Through ablation studies, we demonstrated that object identification accuracy drops by over 70\% when object-specific tokens are removed. We observed that visual token representations become increasingly interpretable in the vocabulary space across layers, suggesting an alignment with textual tokens corresponding to image content. Finally, we found that the model extracts object information from these refined representations at the last token position for prediction, mirroring the process in text-only language models for factual association tasks. These findings provide crucial insights into how VLMs process and integrate visual information, bridging the gap between our understanding of language and vision models, and paving the way for more interpretable and controllable multimodal systems.

Updated: 2024-10-09 17:55:02

标题: 朝向解释视觉-语言模型中的视觉信息处理

摘要: 视觉-语言模型（VLMs）是处理和理解文本和图像的强大工具。我们研究了LLaVA中语言模型组件中对视觉标记的处理，LLaVA是一个著名的VLM。我们的方法主要集中在分析对象信息的定位、视觉标记在各层之间的表示演变以及整合视觉信息进行预测的机制。通过消融研究，我们发现当删除特定对象标记时，对象识别准确率下降超过70\%。我们观察到，随着层次的增加，视觉标记的表示在词汇空间中变得越来越可解释，表明与对应图像内容的文本标记存在一定的对齐。最后，我们发现模型从这些精细表示中提取对象信息用于预测，反映了仅包含文本的语言模型在事实关联任务中的处理过程。这些发现为我们了解VLMs如何处理和整合视觉信息提供了关键见解，弥合了我们对语言和视觉模型的理解之间的差距，并为更具可解释性和可控性的多模态系统铺平了道路。

更新时间: 2024-10-09 17:55:02

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.07149v1

Taking a turn for the better: Conversation redirection throughout the course of mental-health therapy

Mental-health therapy involves a complex conversation flow in which patients and therapists continuously negotiate what should be talked about next. For example, therapists might try to shift the conversation's direction to keep the therapeutic process on track and avoid stagnation, or patients might push the discussion towards issues they want to focus on. How do such patient and therapist redirections relate to the development and quality of their relationship? To answer this question, we introduce a probabilistic measure of the extent to which a certain utterance immediately redirects the flow of the conversation, accounting for both the intention and the actual realization of such a change. We apply this new measure to characterize the development of patient-therapist relationships over multiple sessions in a very large, widely-used online therapy platform. Our analysis reveals that (1) patient control of the conversation's direction generally increases relative to that of the therapist as their relationship progresses; and (2) patients who have less control in the first few sessions are significantly more likely to eventually express dissatisfaction with their therapist and terminate the relationship.

Updated: 2024-10-09 17:54:41

标题: 走向更好的转变：精神健康治疗过程中的对话重定向

摘要: 心理健康治疗涉及一种复杂的对话流程，其中患者和治疗师不断协商接下来应该谈论什么。例如，治疗师可能试图转变对话的方向，以保持治疗过程的顺利进行，避免停滞，或者患者可能推动讨论朝着他们想要关注的问题发展。这种患者和治疗师的重新定向如何与他们的关系发展和质量相关？为了回答这个问题，我们引入了一个概率测量，用于衡量某种话语立即重新定向对话流的程度，考虑到这种改变的意图和实际实现。我们将这个新的度量应用于表征在一个非常大型、广泛使用的在线治疗平台上的多个会话中患者和治疗师关系的发展。我们的分析显示，（1）随着他们的关系发展，患者对话的方向控制一般会相对增加，而不是治疗师的；（2）在最初几个会话中控制较少的患者更有可能最终表达对治疗师的不满并终止关系。

更新时间: 2024-10-09 17:54:41

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2410.07147v1

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

One essential advantage of recurrent neural networks (RNNs) over transformer-based language models is their linear computational complexity concerning the sequence length, which makes them much faster in handling long sequences during inference. However, most publicly available RNNs (e.g., Mamba and RWKV) are trained on sequences with less than 10K tokens, and their effectiveness in longer contexts remains largely unsatisfying so far. In this paper, we study the cause of the inability to process long context for RNNs and suggest critical mitigations. We examine two practical concerns when applying state-of-the-art RNNs to long contexts: (1) the inability to extrapolate to inputs longer than the training length and (2) the upper bound of memory capacity. Addressing the first concern, we first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training. With controlled experiments, we attribute this to overfitting due to the recurrent state being overparameterized for the training length. For the second concern, we train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval. Then, three SC mitigation methods are proposed to improve Mamba-2's length generalizability, allowing the model to process more than 1M tokens without SC. We also find that the recurrent state capacity in passkey retrieval scales exponentially to the state size, and we empirically train a Mamba-2 370M with near-perfect passkey retrieval accuracy on 256K context length. This suggests a promising future for RNN-based long-context modeling.

Updated: 2024-10-09 17:54:28

标题: 填充的曼巴：基于RNN的长上下文建模的国家崩溃和国家能力

摘要: 循环神经网络（RNNs）相对于基于transformer的语言模型的一个重要优势是其在序列长度方面的线性计算复杂性，这使它们在推理过程中处理长序列时更快。然而，大多数公开可用的RNNs（如Mamba和RWKV）是在少于10K个标记的序列上训练的，并且它们在更长上下文中的有效性到目前为止仍然令人不满意。本文研究了RNN无法处理长上下文的原因，并提出了关键的缓解方法。我们在将最先进的RNNs应用于长上下文时考虑了两个实际问题：（1）无法对超出训练长度的输入进行外推和（2）内存容量的上限。针对第一个问题，我们首先研究了*状态坍塌*（SC），这是一种导致在训练过程中未遇到的序列长度上性能严重下降的现象。通过控制实验，我们将此归因于由于循环状态对于训练长度过度参数化而导致的过拟合。针对第二个问题，我们在长文档上训练了一系列Mamba-2模型，以经验估计在语言建模和通行证检索中的循环状态容量。然后，提出了三种SC缓解方法以提高Mamba-2的长度泛化能力，使模型能够处理超过1M个标记而无SC。我们还发现通行证检索中的循环状态容量与状态大小呈指数关系，并在256K上下文长度上经验训练了一款具有近乎完美通行证检索准确性的370M Mamba-2。这表明了基于RNN的长上下文建模的光明未来。

更新时间: 2024-10-09 17:54:28

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07145v1

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. Achieving high win rates on these benchmarks can significantly boost the promotional impact of newly released language models. This promotional benefit may motivate tricks, such as manipulating model output length or style to game win rates, even though several mechanisms have been developed to control length and disentangle style to reduce gameability. Nonetheless, we show that even a "null model" that always outputs a constant response (irrelevant to input instructions) can cheat automatic benchmarks and achieve top-ranked win rates: an 86.5% LC win rate on AlpacaEval 2.0; an 83.0 score on Arena-Hard-Auto; and a 9.55 score on MT-Bench. Moreover, the crafted cheating outputs are transferable because we assume that the instructions of these benchmarks (e.g., 805 samples of AlpacaEval 2.0) are private and cannot be accessed. While our experiments are primarily proof-of-concept, an adversary could use LLMs to generate more imperceptible cheating responses, unethically benefiting from high win rates and promotional impact. Our findings call for the development of anti-cheating mechanisms for reliable automatic benchmarks. The code is available at https://github.com/sail-sg/Cheating-LLM-Benchmarks.

Updated: 2024-10-09 17:53:06

标题: 作弊自动LLM基准测试：空模型实现高胜率

摘要: 自动LLM基准测试，如AlpacaEval 2.0，Arena-Hard-Auto和MT-Bench，由于其与人类评估相比具有成本效益和可扩展性，已经变得流行用于评估语言模型。在这些基准测试中取得高胜率可以显著提升新发布语言模型的宣传影响。这种宣传效益可能激励一些技巧，例如操纵模型输出长度或样式以达到提高胜率的目的，尽管已经开发了几种机制来控制长度和解开样式以减少游戏性。然而，我们展示了即使一个“空模型”总是输出一个固定的响应（与输入指令无关），也可以欺骗自动基准测试并获得排名靠前的胜率：在AlpacaEval 2.0上获得86.5％的LC胜率；在Arena-Hard-Auto上获得83.0分；在MT-Bench上获得9.55分。此外，精心制作的作弊输出是可转移的，因为我们假设这些基准测试的指令（例如AlpacaEval 2.0的805个样本）是私有的，无法访问。虽然我们的实验主要是概念验证，但攻击者可以利用LLMs生成更不可察觉的作弊响应，从高胜率和宣传影响中不道德地获益。我们的研究结果呼吁开发可靠自动基准测试的反作弊机制。代码可在https://github.com/sail-sg/Cheating-LLM-Benchmarks找到。

更新时间: 2024-10-09 17:53:06

领域: cs.CL,cs.AI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.07137v1

The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making

As large language models (LLMs) become increasingly integrated into society, their alignment with human morals is crucial. To better understand this alignment, we created a large corpus of human- and LLM-generated responses to various moral scenarios. We found a misalignment between human and LLM moral assessments; although both LLMs and humans tended to reject morally complex utilitarian dilemmas, LLMs were more sensitive to personal framing. We then conducted a quantitative user study involving 230 participants (N=230), who evaluated these responses by determining whether they were AI-generated and assessed their agreement with the responses. Human evaluators preferred LLMs' assessments in moral scenarios, though a systematic anti-AI bias was observed: participants were less likely to agree with judgments they believed to be machine-generated. Statistical and NLP-based analyses revealed subtle linguistic differences in responses, influencing detection and agreement. Overall, our findings highlight the complexities of human-AI perception in morally charged decision-making.

Updated: 2024-10-09 17:52:00

标题: 道德图灵测试：评估人类在道德决策中与LLM的一致性

摘要: 随着大型语言模型（LLMs）越来越多地融入社会，它们与人类道德的一致性至关重要。为了更好地理解这种一致性，我们创建了一个包含人类和LLM生成的响应的大型语料库，涉及各种道德情境。我们发现人类和LLM的道德评估存在不一致；尽管LLMs和人类都倾向于拒绝道德复杂的功利主义困境，但LLMs更加敏感于个人框架。接着，我们进行了一项涉及230名参与者（N=230）的定量用户研究，他们通过判断这些响应是否是由人工智能生成的来评估它们，并评估他们对这些响应的同意程度。人类评估者更倾向于LLMs在道德情境中的评估，尽管观察到一种系统性的反对人工智能的偏见：参与者更不太可能同意他们认为是机器生成的判断。统计和基于自然语言处理的分析揭示了响应中微妙的语言差异，影响了检测和同意程度。总的来说，我们的研究结果突显了人类与人工智能在道德决策中的感知复杂性。

更新时间: 2024-10-09 17:52:00

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2410.07304v1

ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models

Assessing the quality of outputs generated by generative models, such as large language models and vision language models, presents notable challenges. Traditional methods for evaluation typically rely on either human assessments, which are resource-intensive, or automatic metrics that often show a low correlation with human judgment. Another common approach is to use deep learning systems, which not only consume a substantial amount of compute and time but also require extensive training data. In this study, we introduce a tuning-free framework called ReFeR, designed to evaluate generative outputs, including both text and images, by leveraging a 2-level hierarchy of LLMs and VLMs themselves. We rigorously evaluate our framework, ReFeR, across four diverse evaluation tasks. The framework not only improves the accuracy of these evaluations, surpassing previous benchmarks but also generates constructive feedback. Interestingly, the framework is also applicable to reasoning tasks. Experiments on four reasoning tasks demonstrate superior collective reasoning abilities of the framework. We present two variants of the framework: ReFeR-Turbo, optimized for accelerated performance, and ReFeR-Lite, offering a more cost-effective solution. ReFeR-Lite is $\sim7.7\times$ more efficient while being comparably accurate to ReFeR-Turbo. We make code, data and PIP package publicly available. See this PIP URL https://pypi.org/project/refer-agents/ and this Git URL https://github.com/yaswanth-iitkgp/ReFeR_Code .

Updated: 2024-10-09 17:51:44

标题: ReFeR：通过模型层次结构改进评估和推理

摘要: 评估生成模型产生的输出质量，如大型语言模型和视觉语言模型，面临着显著挑战。传统的评估方法通常依赖于人工评估或自动指标，但这些方法往往与人类判断之间的相关性较低。另一种常见的方法是使用深度学习系统，这不仅消耗大量计算资源和时间，还需要大量的训练数据。在本研究中，我们介绍了一个名为ReFeR的无调优框架，旨在通过利用LLMs和VLMs自身的两级层次结构来评估生成输出，包括文本和图像。我们在四个不同的评估任务中对我们的框架ReFeR进行了严格评估。该框架不仅提高了这些评估的准确性，超越了以前的基准，还生成了建设性的反馈。有趣的是，该框架还适用于推理任务。对四个推理任务的实验展示了框架的优越集体推理能力。我们提出了框架的两个变体：ReFeR-Turbo，针对加速性能进行优化，和ReFeR-Lite，提供更具成本效益的解决方案。与ReFeR-Turbo相比，ReFeR-Lite更高效，准确性可比。我们公开提供了代码、数据和PIP包。请查看此PIP URLhttps://pypi.org/project/refer-agents/以及此Git URLhttps://github.com/yaswanth-iitkgp/ReFeR_Code。

更新时间: 2024-10-09 17:51:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.12877v2

Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication

During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have facilitated users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects restricts users' ability to spatially reference items in a shared immersive environment. To address this, we propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.

Updated: 2024-10-09 17:49:06

标题: Thing2Reality：将2D内容转换为条件化的多视图和3D高斯对象，用于XR通信

摘要: 在远程通信过程中，参与者经常分享数字和物理内容，例如产品设计、数字资产和环境，以增进彼此的理解。最近在增强通信方面取得的进展使用户能够快速创建和共享来自视频源的物理物体的数字2D副本，进入共享空间。然而，传统的数字物体的2D表示限制了用户在共享的沉浸式环境中空间参考物品的能力。为了解决这个问题，我们提出了Thing2Reality，这是一个增强现实（XR）通信平台，可以增强远程会议期间对数字和物理物品的即时讨论。通过Thing2Reality，用户可以快速将想法或物理物体实现在沉浸式环境中，并将它们分享为条件多视图渲染或3D高斯。Thing2Reality使用户能够以合作方式与远程物体互动或讨论概念。我们的用户研究表明，与和操作物体的3D表示显着增强了讨论的效率，有潜力增强对2D物品的讨论。

更新时间: 2024-10-09 17:49:06

领域: cs.HC,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.07119v1

The FIX Benchmark: Extracting Features Interpretable to eXperts

Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.

Updated: 2024-10-09 17:47:01

标题: FIX基准：提取专家可解释的特征

摘要: 基于特征的方法通常用于解释模型预测，但这些方法通常隐含地假设可解释的特征是readily available。然而，对于高维数据，这通常并非如此，即使对于领域专家来说，也很难数学地指定哪些特征是重要的。我们能否自动提取与专家知识一致的特征集合或组群？为了解决这一问题，我们提出了FIX（Features Interpretable to eXperts），这是一个衡量一组特征与专家知识一致程度的基准。与领域专家合作，我们提出FIXScore，这是一种统一的专家对齐度量，适用于宇宙学、心理学和医学领域的视觉、语言和时间序列数据模态。通过FIXScore，我们发现流行的基于特征的解释方法与专家指定的知识之间存在较差的对齐度，突显了需要新的方法来更好地识别专家可解释的特征。

更新时间: 2024-10-09 17:47:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.13684v2

VHELM: A Holistic Evaluation of Vision Language Models

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety. In doing so, we produce a comprehensive, multi-dimensional view of the capabilities of the VLMs across these important factors. In addition, we standardize the standard inference parameters, methods of prompting, and evaluation metrics to enable fair comparisons across models. Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast. Our initial run evaluates 22 VLMs on 21 existing datasets to provide a holistic snapshot of the models. We uncover new key findings, such as the fact that efficiency-focused models (e.g., Claude 3 Haiku or Gemini 1.5 Flash) perform significantly worse than their full models (e.g., Claude 3 Opus or Gemini 1.5 Pro) on the bias benchmark but not when evaluated on the other aspects. For transparency, we release the raw model generations and complete results on our website (https://crfm.stanford.edu/helm/vhelm/v2.0.1). VHELM is intended to be a living benchmark, and we hope to continue adding new datasets and models over time.

Updated: 2024-10-09 17:46:34

标题: VHELM：视觉语言模型的整体评估

摘要: 目前用于评估视觉语言模型（VLMs）的基准往往集中在它们的感知或问题解决能力上，而忽视了其他关键方面，如公平性、多语种性或毒性。此外，它们在评估程序和评估范围上存在差异，使得比较模型变得困难。为了解决这些问题，我们将HELM框架扩展到VLMs，提出了全面评估视觉语言模型（VHELM）。VHELM整合了各种数据集，涵盖了视觉感知、知识、推理、偏见、公平性、多语种性、稳健性、毒性和安全性等9个方面中的一个或多个。通过这样做，我们为VLMs在这些重要因素上的能力提供了全面的、多维度的视图。此外，我们标准化了标准推理参数、提示方法和评估指标，以便在模型之间进行公平比较。我们的框架旨在轻量化和自动化，使评估运行成本低廉且快速。我们的初步运行评估了22个VLMs在21个现有数据集上，以提供模型的全面快照。我们发现了一些新的关键发现，例如，以效率为重点的模型（如Claude 3 Haiku或Gemini 1.5 Flash）在偏见基准上表现明显较差，但在其他方面评估时并非如此。为了透明度，我们在我们的网站发布了原始模型生成和完整结果（https://crfm.stanford.edu/helm/vhelm/v2.0.1）。VHELM旨在成为一个动态的基准，我们希望随着时间的推移继续添加新的数据集和模型。

更新时间: 2024-10-09 17:46:34

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.07112v1

Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay

Machine learning models often suffer from catastrophic forgetting of previously learned knowledge when learning new classes. Various methods have been proposed to mitigate this issue. However, rehearsal-based learning, which retains samples from previous classes, typically achieves good performance but tends to memorize specific instances, struggling with Out-of-Distribution (OOD) generalization. This often leads to high forgetting rates and poor generalization. Surprisingly, the OOD generalization capabilities of these methods have been largely unexplored. In this paper, we highlight this issue and propose a simple yet effective strategy inspired by contrastive learning and data-centric principles to address it. We introduce Adaptive Contrastive Replay (ACR), a method that employs dual optimization to simultaneously train both the encoder and the classifier. ACR adaptively populates the replay buffer with misclassified samples while ensuring a balanced representation of classes and tasks. By refining the decision boundary in this way, ACR achieves a balance between stability and plasticity. Our method significantly outperforms previous approaches in terms of OOD generalization, achieving an improvement of 13.41\% on Split CIFAR-100, 9.91\% on Split Mini-ImageNet, and 5.98\% on Split Tiny-ImageNet.

Updated: 2024-10-09 17:45:47

标题: 持续学习：通过自适应对比重演减少遗忘，提高ODD泛化

摘要: 机器学习模型在学习新类别时经常遭受先前学习知识的灾难性遗忘。已经提出了各种方法来缓解这个问题。然而，基于复述的学习，即保留先前类别的样本，通常可以取得良好的性能，但往往会记忆特定实例，从而在面对分布之外（OOD）的泛化时遇到困难。这经常导致高遗忘率和泛化能力差。令人惊讶的是，这些方法的OOD泛化能力在很大程度上尚未被探索。在本文中，我们强调了这个问题，并提出了一种简单而有效的策略，灵感来自对比学习和数据中心原则以解决这个问题。我们引入了自适应对比复述（ACR），这种方法采用双重优化来同时训练编码器和分类器。ACR自适应地填充重复缓冲区，同时确保类别和任务的平衡表示。通过这种方式细化决策边界，ACR在稳定性和可塑性之间实现了平衡。我们的方法在OOD泛化方面明显优于先前的方法，在Split CIFAR-100上实现了13.41％的改善，在Split Mini-ImageNet上实现了9.91％的改善，在Split Tiny-ImageNet上实现了5.98％的改善。

更新时间: 2024-10-09 17:45:47

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2410.07110v1

Private prediction for large-scale synthetic text generation

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of potential applications. Our improvements come from an improved privacy analysis and a better private selection mechanism, which makes use of the equivalence between the softmax layer for sampling tokens in LLMs and the exponential mechanism. Furthermore, we introduce a novel use of public predictions via the sparse vector technique, in which we do not pay privacy costs for tokens that are predictable without sensitive data; we find this to be particularly effective for structured data.

Updated: 2024-10-09 17:45:07

标题: 大规模合成文本生成的私人预测

摘要: 我们提出了一种使用大型语言模型（LLMs）通过隐私预测生成差分私有合成文本的方法。在私有预测框架中，我们只需要输出的合成数据满足差分隐私保证。这与训练一个生成模型的方法形成对比，后者使用潜在敏感的用户提供的源数据，并寻求确保模型本身可以安全发布。我们使用预训练的LLM与源数据进行提示，但确保接下来的标记预测具有差分隐私保证。在这种范例中的先前工作报告了在合理的隐私水平下生成少量示例（<10），这种数据量仅对下游上下文学习或提示有用。相比之下，我们进行了改进，使我们能够生成成千上万个高质量的合成数据点，大大扩展了潜在应用的范围。我们的改进来自于改进的隐私分析和更好的私有选择机制，后者利用了在LLMs中对标记进行采样的softmax层和指数机制之间的等价性。此外，我们通过稀疏向量技术引入了一种公共预测的新用法，其中我们对于无需敏感数据即可预测的标记不支付隐私成本；我们发现这对于结构化数据特别有效。

更新时间: 2024-10-09 17:45:07

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2407.12108v2

Topologically Faithful Multi-class Segmentation in Medical Images

Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology, making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.

Updated: 2024-10-09 17:44:14

标题: 医学图像中的拓扑保真多类分割

摘要: 在医学图像分割中，拓扑准确性是一项非常重要的性质，对于下游应用如网络分析和血管或细胞计数的流模拟至关重要。最近，重要的方法论进展已经将代数拓扑的基础概念引入二值分割中。然而，在多类分割场景中，这些方法尚未得到充分探索，拓扑错误很常见。我们提出了一个通用的损失函数，用于拓扑忠实的多类分割，扩展了最近的Betti匹配概念，该概念基于持续条形码的诱导匹配。我们将N类分割问题投影到N个单类分割任务，这使我们能够使用1参数持久同调，从而使神经网络的训练在计算上可行。我们在一组包含高度不同拓扑特征的四个医学数据集上验证了我们的方法。我们的损失公式显著提高了心脏、细胞、动脉-静脉和Willis环分割的拓扑正确性。

更新时间: 2024-10-09 17:44:14

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.11001v2

Examining the Prevalence and Dynamics of AI-Generated Media in Art Subreddits

Broadly accessible generative AI models like Dall-E have made it possible for anyone to create compelling visual art. In online communities, the introduction of AI-generated content (AIGC) may impact community dynamics by shifting the kinds of content being posted or the responses to content suspected of being generated by AI. We take steps towards examining the potential impact of AIGC on art-related communities on Reddit. We distinguish between communities that disallow AI content and those without a direct policy. We look at image-based posts made to these communities that are transparently created by AI, or comments in these communities that suspect authors of using generative AI. We find that AI posts (and accusations) have played a very small part in these communities through the end of 2023, accounting for fewer than 0.2% of the image-based posts. Even as the absolute number of author-labelled AI posts dwindles over time, accusations of AI use remain more persistent. We show that AI content is more readily used by newcomers and may help increase participation if it aligns with community rules. However, the tone of comments suspecting AI use by others have become more negative over time, especially in communities that do not have explicit rules about AI. Overall, the results show the changing norms and interactions around AIGC in online communities designated for creativity.

Updated: 2024-10-09 17:41:13

标题: 审视艺术Subreddits中AI生成媒体的普及和动态

摘要: 广泛可访问的生成式人工智能模型，如Dall-E，使任何人都有可能创作引人入胜的视觉艺术。在在线社区中，引入AI生成的内容（AIGC）可能会通过改变发布的内容类型或对被怀疑是由AI生成的内容的回应，影响社区动态。我们着手研究AIGC对Reddit上的与艺术相关社区的潜在影响。我们区分了不允许AI内容和没有直接政策的社区。我们观察了这些社区发布的透明由AI创建的基于图像的帖子，或者怀疑作者使用生成式AI的评论。我们发现，直到2023年底，AI帖子（和指控）在这些社区中所占比例非常小，仅占基于图像的帖子的不到0.2%。即使随着时间的推移，标记为AI的帖子的绝对数量在减少，对AI使用的指控仍然更加持久。我们展示了AI内容更容易被新手使用，并且如果符合社区规则，可能有助于增加参与度。然而，随着时间的推移，对他人怀疑使用AI的评论的语气变得更加负面，尤其是在没有关于AI的明确规则的社区中。总的来说，结果显示了在线创作社区中围绕AIGC的规范和互动的变化。

更新时间: 2024-10-09 17:41:13

领域: cs.AI,cs.CY,cs.SI

下载: http://arxiv.org/abs/2410.07302v1

DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining

Large Language Models (LLMs) have shown remarkable ability to generalize effectively across numerous industry domains while executing a range of tasks. Many of these competencies are obtained from the data utilized during the pre-training phase of the Language Models (LMs). However, these models exhibit limitations when tasked with performing in specialized or low-resource industry domains. More recent approaches use LLMs for generating domain-specific synthetic data but most often they lack in truthfulness and complexity. Alternatively, in cases where domain data is available like healthcare and finance most of the LMs are proprietary necessitating the need for a scalable method to curate real world industry specific pre-training data. In this work, we propose an automated and scalable framework - DoPAMine:Domain-specific Pre-training Adaptation from seed-guided data Mining, to mine domain specific training data from a large data corpus for domain adaptation of a LM. The framework leverages the parametric knowledge of a LLM to generate diverse and representative seed data tailored to a specific domain which is then used to mine real world data from a large data corpus like Common Crawl. We evaluated our framework's performance in the continual pre-training (CPT) setting by training two domain specific 7B parameter LMs in healthcare and finance with data mined via DoPAMine. Our experiments show that DoPAMine boosts the performance of pre-trained LLMs on average by 4.9% and 5.1% in zero-shot and 5-shot settings respectively on healthcare tasks from MMLU, MedQA, MedMCQA and PubMedQA datasets, and 2.9% and 6.7% for zero-shot and 5-shot settings respectively on finance tasks from FiQA-SA, FPB and Headlines datasets when compared to the baseline.

Updated: 2024-10-09 17:39:59

标题: 多巴胺：基于领域特定的预训练自适应从种子引导的数据挖掘

摘要: 大型语言模型(LLMs)展现出了在执行各种任务时有效地在许多行业领域进行泛化的卓越能力。许多这些能力是从语言模型(LMs)的预训练阶段使用的数据中获得的。然而，当这些模型被要求在专门或低资源行业领域中执行时，它们会表现出一些限制。最近的一些方法使用LLMs生成特定领域的合成数据，但往往缺乏真实性和复杂性。相反，在像医疗保健和金融这样有领域数据可用的情况下，大多数LMs都是专有的，需要一种可扩展的方法来筛选真实世界的行业特定的预训练数据。在这项工作中，我们提出了一个自动化和可扩展的框架 - DoPAMine:从种子引导数据挖掘的领域特定预训练适应，从大型数据语料库中挖掘领域特定的训练数据，用于LM的领域适应。该框架利用LLM的参数化知识生成针对特定领域的多样化和代表性种子数据，然后利用这些数据从类似Common Crawl这样的大型数据语料库中挖掘真实世界的数据。我们在持续预训练(CPT)设置中评估了我们框架的性能，通过使用DoPAMine挖掘的数据训练了两个领域特定的7B参数LM，分别是医疗保健和金融领域。我们的实验结果显示，与基准相比，DoPAMine在MMLU、MedQA、MedMCQA和PubMedQA数据集的医疗保健任务中分别平均提高了4.9%和5.1%的零样本和5-shot设置的预训练LLMs的性能，以及在FiQA-SA、FPB和Headlines数据集的金融任务中分别提高了2.9%和6.7%的零样本和5-shot设置的性能。

更新时间: 2024-10-09 17:39:59

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00260v2

An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots

Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are the Natural Language Understanding platforms (NLUs), which enable them to comprehend and respond to user queries. Before deploying NLUs, there is a need to train them with labeled data. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets. This challenge arises because training SE chatbots requires specialized vocabulary and phrases not found in typical language datasets. Consequently, chatbot developers often resort to manually annotating user queries to gather the data necessary for training effective chatbots, a process that is both time-consuming and resource-intensive. Previous studies propose approaches to support chatbot practitioners in annotating users' posed queries. However, these approaches require human intervention to generate rules, called labeling functions (LFs), that identify and categorize user queries based on specific patterns in the data. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate the effectiveness of our approach by applying it to the queries of four diverse SE datasets (namely AskGit, MSA, Ask Ubuntu, and Stack Overflow) and measure the performance improvement gained from training the NLU on the queries labeled by the generated LFs. We find that the generated LFs effectively label data with AUC scores of up to 85.3%, and NLU's performance improvement of up to 27.2% across the studied datasets. Furthermore, our results show that the number of LFs used to generate LFs affects the labeling performance. We believe that our approach can save time and resources in labeling users' queries, allowing practitioners to focus on core chatbot functionalities.

Updated: 2024-10-09 17:34:14

标题: 一个用于软件工程聊天机器人标签函数自动生成的方法

摘要: 软件工程（SE）聊天机器人因其在增强开发流程中的作用而越来越受到关注。在聊天机器人的核心是自然语言理解平台（NLUs），使它们能够理解并回应用户查询。在部署NLUs之前，需要用标记数据对其进行训练。然而，由于高质量数据集的稀缺性，为SE聊天机器人获取此类标记数据是具有挑战性的。这一挑战的原因在于训练SE聊天机器人需要专门词汇和短语，而这些在典型语言数据集中找不到。因此，聊天机器人开发人员通常会手动注释用户查询，以收集训练有效聊天机器人所需的数据，这是一项既耗时又资源密集的过程。先前的研究提出了支持聊天机器人从业者注释用户提出的查询的方法。然而，这些方法需要人为干预来生成规则，称为标记函数（LFs），根据数据中的特定模式识别和分类用户查询。为了解决这个问题，我们提出了一种方法，通过从标记的用户查询中提取模式来自动生成LFs。我们通过将其应用于四个不同的SE数据集的查询（即AskGit、MSA、Ask Ubuntu和Stack Overflow）来评估我们方法的有效性，并测量通过对由生成的LFs标记的查询进行训练后NLU获得的性能改进。我们发现，生成的LFs有效地标记数据，AUC分数高达85.3％，NLU在研究的数据集中性能提高了最多27.2％。此外，我们的结果表明，用于生成LFs的LFs数量会影响标记性能。我们相信我们的方法可以节省标记用户查询的时间和资源，使从业者能够专注于核心聊天机器人功能。

更新时间: 2024-10-09 17:34:14

领域: cs.SE,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.07094v1

Collusion Detection with Graph Neural Networks

Collusion is a complex phenomenon in which companies secretly collaborate to engage in fraudulent practices. This paper presents an innovative methodology for detecting and predicting collusion patterns in different national markets using neural networks (NNs) and graph neural networks (GNNs). GNNs are particularly well suited to this task because they can exploit the inherent network structures present in collusion and many other economic problems. Our approach consists of two phases: In Phase I, we develop and train models on individual market datasets from Japan, the United States, two regions in Switzerland, Italy, and Brazil, focusing on predicting collusion in single markets. In Phase II, we extend the models' applicability through zero-shot learning, employing a transfer learning approach that can detect collusion in markets in which training data is unavailable. This phase also incorporates out-of-distribution (OOD) generalization to evaluate the models' performance on unseen datasets from other countries and regions. In our empirical study, we show that GNNs outperform NNs in detecting complex collusive patterns. This research contributes to the ongoing discourse on preventing collusion and optimizing detection methodologies, providing valuable guidance on the use of NNs and GNNs in economic applications to enhance market fairness and economic welfare.

Updated: 2024-10-09 17:31:41

标题: 使用图神经网络进行串通检测

摘要: 共谋是一种复杂现象，即公司秘密合作从事欺诈行为。本文提出了一种创新方法，利用神经网络（NNs）和图神经网络（GNNs）在不同国家市场中检测和预测共谋模式。GNNs特别适合这项任务，因为它们可以利用共谋和许多其他经济问题中存在的固有网络结构。我们的方法分为两个阶段：在第一阶段，我们在来自日本、美国、瑞士两个地区、意大利和巴西的单个市场数据集上开发和训练模型，重点预测单个市场的共谋。在第二阶段，我们通过零样本学习扩展模型的适用性，采用迁移学习方法，在没有训练数据的市场中检测共谋。这一阶段还包括对外分布（OOD）泛化，评估模型在其他国家和地区的未知数据集上的性能。在我们的实证研究中，我们展示了GNNs在检测复杂共谋模式方面胜过NNs。这项研究为预防共谋和优化检测方法的持续讨论做出了贡献，为在经济应用中利用NNs和GNNs增强市场公平和经济福祉提供了宝贵指导。

更新时间: 2024-10-09 17:31:41

领域: econ.EM,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.07091v1

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to evaluate and improve it. To address this gap, we first define the notion of medical safety in LLMs based on the Principles of Medical Ethics set forth by the American Medical Association. We then leverage this understanding to introduce MedSafetyBench, the first benchmark dataset designed to measure the medical safety of LLMs. We demonstrate the utility of MedSafetyBench by using it to evaluate and improve the medical safety of LLMs. Our results show that publicly-available medical LLMs do not meet standards of medical safety and that fine-tuning them using MedSafetyBench improves their medical safety while preserving their medical performance. By introducing this new benchmark dataset, our work enables a systematic study of the state of medical safety in LLMs and motivates future work in this area, paving the way to mitigate the safety risks of LLMs in medicine. The benchmark dataset and code are available at https://github.com/AI4LIFE-GROUP/med-safety-bench.

Updated: 2024-10-09 17:22:24

标题: MedSafetyBench：评估和改进大型语言模型的医疗安全性

摘要: 随着大型语言模型（LLMs）的不断发展和在医疗领域的应用，评估它们的医疗安全性变得越来越重要，因为这对个人和公共健康、患者安全和人权具有深远的影响。然而，在LLMs的背景下，对医疗安全的概念几乎没有理解，更不用说如何评估和改进了。为了解决这一差距，我们首先根据美国医学协会制定的医学伦理原则定义了LLMs中的医疗安全概念。然后利用这一理解引入了MedSafetyBench，这是第一个旨在衡量LLMs医疗安全性的基准数据集。我们展示了MedSafetyBench的实用性，通过使用它来评估和提高LLMs的医疗安全性。我们的结果表明，公开可用的医学LLMs不符合医疗安全标准，而使用MedSafetyBench对其进行微调可以提高其医疗安全性同时保持其医疗性能。通过引入这一新的基准数据集，我们的工作促进了对LLMs医疗安全状况的系统研究，并激励未来在这一领域的工作，为减轻LLMs在医学中的安全风险铺平道路。该基准数据集和代码可在https://github.com/AI4LIFE-GROUP/med-safety-bench上找到。

更新时间: 2024-10-09 17:22:24

领域: cs.AI

下载: http://arxiv.org/abs/2403.03744v5

ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using LLMs

Technical Q&A sites are valuable for software developers seeking knowledge, but the code snippets they provide are often uncompilable and incomplete due to unresolved types and missing libraries. This poses a challenge for users who wish to reuse or analyze these snippets. Existing methods either do not focus on creating compilable code or have low success rates. To address this, we propose ZS4C, a lightweight approach for zero-shot synthesis of compilable code from incomplete snippets using Large Language Models (LLMs). ZS4C operates in two stages: first, it uses an LLM, like GPT-3.5, to identify missing import statements in a snippet; second, it collaborates with a validator (e.g., compiler) to fix compilation errors caused by incorrect imports and syntax issues. We evaluated ZS4C on the StatType-SO benchmark and a new dataset, Python-SO, which includes 539 Python snippets from Stack Overflow across the 20 most popular Python libraries. ZS4C significantly outperforms existing methods, improving the compilation rate from 63% to 95.1% compared to the state-of-the-art SnR, marking a 50.1% improvement. On average, ZS4C can infer more accurate import statements (with an F1 score of 0.98) than SnR, with an improvement of 8.5% in the F1.

Updated: 2024-10-09 17:19:47

标题: ZS4C：使用LLMs进行不完整代码片段的零-shot可编译代码合成

摘要: 技术问答网站对寻求知识的软件开发人员非常有价值，但它们提供的代码片段通常由于未解决的类型和缺失的库而无法编译和不完整。这给希望重用或分析这些代码片段的用户带来了挑战。现有方法要么不专注于创建可编译的代码，要么成功率低。为了解决这个问题，我们提出了ZS4C，一种使用大型语言模型（LLMs）进行零射击合成可编译代码的轻量级方法。ZS4C分两个阶段进行：首先，它使用像GPT-3.5这样的LLM来识别代码片段中缺失的导入语句；其次，它与验证器（例如编译器）合作，修复由于错误的导入和语法问题导致的编译错误。我们在StatType-SO基准和一个新数据集Python-SO上评估了ZS4C，该数据集包括来自Stack Overflow的539个Python代码片段，涵盖了最流行的20个Python库。与SnR最新技术相比，ZS4C明显优于现有方法，将编译率从63％提高到95.1％，改进了50.1％。平均来看，ZS4C可以比SnR更准确地推断导入语句（F1分数为0.98），F1得分提高了8.5％。

更新时间: 2024-10-09 17:19:47

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2401.14279v2

Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning

Textual Attributed Graphs (TAGs) are crucial for modeling complex real-world systems, yet leveraging large language models (LLMs) for TAGs presents unique challenges due to the gap between sequential text processing and graph-structured data. We introduce AskGNN, a novel approach that bridges this gap by leveraging In-Context Learning (ICL) to integrate graph data and task-specific information into LLMs. AskGNN employs a Graph Neural Network (GNN)-powered structure-enhanced retriever to select labeled nodes across graphs, incorporating complex graph structures and their supervision signals. Our learning-to-retrieve algorithm optimizes the retriever to select example nodes that maximize LLM performance on graph. Experiments across three tasks and seven LLMs demonstrate AskGNN's superior effectiveness in graph task performance, opening new avenues for applying LLMs to graph-structured data without extensive fine-tuning.

Updated: 2024-10-09 17:19:12

标题: 让我们问问GNN：赋能大型语言模型进行图形上下文学习

摘要: 文本属性图（TAGs）对于建模复杂的现实世界系统至关重要，然而利用大型语言模型（LLMs）来处理TAGs存在独特的挑战，这是由于序列文本处理和图结构化数据之间的差距。我们引入了AskGNN，这是一种新颖的方法，通过利用上下文学习（ICL）将图数据和任务特定信息集成到LLMs中，从而弥合这一差距。AskGNN采用了一个由图神经网络（GNN）驱动的结构增强型检索器，以选择跨图的标记节点，并整合复杂的图结构和它们的监督信号。我们的学习检索算法优化了检索器，以选择最大化LLM在图上表现的示例节点。通过对三个任务和七个LLMs的实验，证明了AskGNN在图任务性能上具有卓越的有效性，为将LLMs应用于图结构化数据开辟了新的途径，而无需进行大量的微调。

更新时间: 2024-10-09 17:19:12

领域: cs.LG

下载: http://arxiv.org/abs/2410.07074v1

Towards xAI: Configuring RNN Weights using Domain Knowledge for MIMO Receive Processing

Deep learning is making a profound impact in the physical layer of wireless communications. Despite exhibiting outstanding empirical performance in tasks such as MIMO receive processing, the reasons behind the demonstrated superior performance improvement remain largely unclear. In this work, we advance the field of Explainable AI (xAI) in the physical layer of wireless communications utilizing signal processing principles. Specifically, we focus on the task of MIMO-OFDM receive processing (e.g., symbol detection) using reservoir computing (RC), a framework within recurrent neural networks (RNNs), which outperforms both conventional and other learning-based MIMO detectors. Our analysis provides a signal processing-based, first-principles understanding of the corresponding operation of the RC. Building on this fundamental understanding, we are able to systematically incorporate the domain knowledge of wireless systems (e.g., channel statistics) into the design of the underlying RNN by directly configuring the untrained RNN weights for MIMO-OFDM symbol detection. The introduced RNN weight configuration has been validated through extensive simulations demonstrating significant performance improvements. This establishes a foundation for explainable RC-based architectures in MIMO-OFDM receive processing and provides a roadmap for incorporating domain knowledge into the design of neural networks for NextG systems.

Updated: 2024-10-09 17:16:11

标题: 朝着可解释人工智能（xAI）的方向：利用领域知识配置循环神经网络权重以用于MIMO接收处理

摘要: 深度学习对无线通信物理层产生了深远影响。尽管在诸如MIMO接收处理等任务中表现出优异的经验性能，但所展示的卓越性能改进背后的原因仍然不清楚。在本研究中，我们在无线通信物理层利用信号处理原理推进了可解释人工智能（xAI）领域。具体而言，我们关注使用嵌入式计算（RC）的MIMO-OFDM接收处理任务（例如，符号检测），该框架位于循环神经网络（RNNs）中，优于传统和其他基于学习的MIMO检测器。我们的分析提供了基于信号处理的第一原则理解RC的相应运行。基于这种基本理解，我们能够系统地将无线系统领域知识（例如，信道统计）纳入未经训练的RNN权重的设计中，直接为MIMO-OFDM符号检测配置这些权重。引入的RNN权重配置经过广泛的模拟验证，证明了显著的性能改进。这为MIMO-OFDM接收处理中可解释的RC架构奠定了基础，并为将领域知识纳入神经网络设计的NextG系统提供了路线图。

更新时间: 2024-10-09 17:16:11

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2410.07072v1

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games. On grid-worlds, RA-DT outperforms baselines, while using only a fraction of their context length. Furthermore, we illuminate the limitations of current in-context RL methods on complex environments and discuss future directions. To facilitate future research, we release datasets for four of the considered environments.

Updated: 2024-10-09 17:15:30

标题: 检索增强决策转换器：上下文强化学习的外部记忆

摘要: In-context learning (ICL) 是指模型通过观察其上下文中的一些实例来学习新任务的能力。虽然在自然语言处理（NLP）中很常见，但最近也在强化学习（RL）环境中观察到了这种能力。然而，先前的上下文RL方法需要整个episode在agent的上下文中。鉴于复杂环境通常导致具有稀疏奖励的长episode，这些方法受限于简单环境和短episode。为了解决这些挑战，我们介绍了检索增强决策变换器（RA-DT）。RA-DT利用外部存储器机制存储过去的经验，从中检索出与当前情况相关的子轨迹。RA-DT中的检索组件不需要训练，可以完全与领域无关。我们在网格世界环境、机器人模拟和程序生成的视频游戏上评估了RA-DT的能力。在网格世界中，RA-DT优于基线方法，同时仅使用它们上下文长度的一小部分。此外，我们阐明了当前上下文RL方法在复杂环境中的局限性，并讨论了未来的方向。为了促进未来的研究，我们发布了四个考虑环境的数据集。

更新时间: 2024-10-09 17:15:30

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07071v1

ReIFE: Re-evaluating Instruction-Following Evaluation

The automatic evaluation of instruction following typically involves using large language models (LLMs) to assess response quality. However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLMs and the evaluation protocols. Therefore, we present a thorough meta-evaluation of instruction following, including 25 base LLMs and 15 recently proposed evaluation protocols, on 4 human-annotated datasets, assessing the evaluation accuracy of the LLM-evaluators. Our evaluation allows us to identify the best-performing base LLMs and evaluation protocols with a high degree of robustness. Moreover, our large-scale evaluation reveals: (1) Base LLM performance ranking remains largely consistent across evaluation protocols, with less capable LLMs showing greater improvement from protocol enhancements; (2) Robust evaluation of evaluation protocols requires many base LLMs with varying capability levels, as protocol effectiveness can depend on the base LLM used; (3) Evaluation results on different datasets are not always consistent, so a rigorous evaluation requires multiple datasets with distinctive features. We release our meta-evaluation suite ReIFE, which provides the codebase and evaluation result collection for more than 500 LLM-evaluator configurations, to support future research in instruction-following evaluation.

Updated: 2024-10-09 17:14:50

标题: ReIFE: 重新评估指令遵循评估

摘要: 指导性遵循的自动评估通常涉及使用大型语言模型（LLMs）来评估响应质量。然而，在两个维度上缺乏对这些基于LLM的评估器的综合评估：基础LLM和评估协议。因此，我们对指导性遵循进行了彻底的元评估，包括25个基础LLM和15个最近提出的评估协议，在4个人工标注的数据集上评估LLM评估器的评估准确性。我们的评估使我们能够确定表现最佳的基础LLM和评估协议，具有高度的稳健性。此外，我们的大规模评估揭示了：（1）基础LLM的性能排名在评估协议中基本保持一致，能力较差的LLM从协议改进中获得更大的改进；（2）评估协议的稳健评估需要许多具有不同能力水平的基础LLM，因为协议的有效性可能取决于使用的基础LLM；（3）不同数据集上的评估结果并不总是一致的，因此严格的评估需要多个具有独特特征的数据集。我们发布了我们的元评估套件ReIFE，提供了500多个LLM评估器配置的代码库和评估结果收集，以支持未来的指导性遵循评估研究。

更新时间: 2024-10-09 17:14:50

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.07069v1

Answering Questions in Stages: Prompt Chaining for Contract QA

Finding answers to legal questions about clauses in contracts is an important form of analysis in many legal workflows (e.g., understanding market trends, due diligence, risk mitigation) but more important is being able to do this at scale. Prior work showed that it is possible to use large language models with simple zero-shot prompts to generate structured answers to questions, which can later be incorporated into legal workflows. Such prompts, while effective on simple and straightforward clauses, fail to perform when the clauses are long and contain information not relevant to the question. In this paper, we propose two-stage prompt chaining to produce structured answers to multiple-choice and multiple-select questions and show that they are more effective than simple prompts on more nuanced legal text. We analyze situations where this technique works well and areas where further refinement is needed, especially when the underlying linguistic variations are more than can be captured by simply specifying possible answers. Finally, we discuss future research that seeks to refine this work by improving stage one results by making them more question-specific.

Updated: 2024-10-09 17:14:13

标题: 分阶段回答问题：合同问答的提示链接

摘要: 在许多法律工作流程中，寻找合同条款的法律问题的答案是一种重要的分析形式（例如，理解市场趋势、尽职调查、风险缓解），但更重要的是能够实现大规模的分析。先前的研究表明，可以使用大型语言模型和简单的零-shot提示来生成问题的结构化答案，这些答案可以后续纳入法律工作流程中。这种提示在简单直接的条款上有效，但在条款较长且包含与问题无关的信息时无法发挥作用。在本文中，我们提出了两阶段提示链接的方法，以生成多项选择和多项选择问题的结构化答案，并表明它们比简单提示在更微妙的法律文本上更有效。我们分析了这种技术运作良好的情况以及需要进一步改进的领域，特别是当底层语言变化超出简单指定可能答案的范围时。最后，我们讨论了未来研究的方向，该研究旨在通过使第一阶段结果更具问题特定性来完善这项工作。

更新时间: 2024-10-09 17:14:13

领域: cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.12840v1

Towards Generalisable Time Series Understanding Across Domains

In natural language processing and computer vision, self-supervised pre-training on large datasets unlocks foundational model capabilities across domains and tasks. However, this potential has not yet been realised in time series analysis, where existing methods disregard the heterogeneous nature of time series characteristics. Time series are prevalent in many domains, including medicine, engineering, natural sciences, and finance, but their characteristics vary significantly in terms of variate count, inter-variate relationships, temporal dynamics, and sampling frequency. This inherent heterogeneity across domains prevents effective pre-training on large time series corpora. To address this issue, we introduce OTiS, an open model for general time series analysis, that has been specifically designed to handle multi-domain heterogeneity. We propose a novel pre-training paradigm including a tokeniser with learnable domain-specific signatures, a dual masking strategy to capture temporal causality, and a normalised cross-correlation loss to model long-range dependencies. Our model is pre-trained on a large corpus of 640,187 samples and 11 billion time points spanning 8 distinct domains, enabling it to analyse time series from any (unseen) domain. In comprehensive experiments across 15 diverse applications - including classification, regression, and forecasting - OTiS showcases its ability to accurately capture domain-specific data characteristics and demonstrates its competitiveness against state-of-the-art baselines. Our code and pre-trained weights are publicly available at https://github.com/oetu/otis.

Updated: 2024-10-09 17:09:30

标题: 通往跨领域可推广的时间序列理解

摘要: 在自然语言处理和计算机视觉领域，对大型数据集进行自监督预训练可以释放跨领域和任务的基础模型能力。然而，这种潜力在时间序列分析中尚未得到实现，现有方法忽视了时间序列特征的异质性。时间序列在许多领域中普遍存在，包括医学、工程、自然科学和金融，但它们的特征在变量数量、变量间关系、时间动态和采样频率方面存在显著差异。跨领域的本质异质性阻碍了对大型时间序列语料库进行有效的预训练。为了解决这个问题，我们引入了OTiS，一个用于一般时间序列分析的开放模型，专门设计用于处理多领域的异质性。我们提出了一种新颖的预训练范式，包括具有可学习领域特定签名的分词器，用于捕捉时间因果关系的双重屏蔽策略，以及用于建模长程依赖性的归一化交叉相关损失。我们的模型在一个包含640,187个样本和110亿个时间点的大型语料库上进行了预训练，覆盖了8个不同领域，使其能够分析来自任何（未见过的）领域的时间序列。在包括分类、回归和预测在内的15个不同应用的全面实验中，OTiS展示了其准确捕捉领域特定数据特征的能力，并展示了其与最先进基线模型的竞争力。我们的代码和预训练权重可在https://github.com/oetu/otis 上公开获取。

更新时间: 2024-10-09 17:09:30

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.07299v1

Capturing Bias Diversity in LLMs

This paper presents research on enhancements to Large Language Models (LLMs) through the addition of diversity in its generated outputs. Our study introduces a configuration of multiple LLMs which demonstrates the diversities capable with a single LLM. By developing multiple customised instances of a GPT model, each reflecting biases in specific demographic characteristics including gender, age, and race, we propose, develop and evaluate a framework for a more nuanced and representative AI dialogue which we call BiasGPT. The customised GPT models will ultimately collaborate, merging their diverse perspectives on a topic into an integrated response that captures a broad spectrum of human experiences and viewpoints. In this paper, through experiments, we demonstrate the capabilities of a GPT model to embed different biases which, when combined, can open the possibilities of more inclusive AI technologies.

Updated: 2024-10-09 17:07:50

标题: 捕捉LLMs中的偏见多样性

摘要: 本文介绍了对大型语言模型（LLMs）进行增强的研究，通过增加其生成输出的多样性。我们的研究引入了多个LLMs的配置，展示了单个LLM所能展现的多样性。通过开发多个定制的GPT模型实例，每个模型反映了特定人口统计特征包括性别、年龄和种族的偏见，我们提出、开发和评估了一个更加细致和代表性的AI对话框架，我们称之为BiasGPT。这些定制的GPT模型最终将合作，将它们对某一主题的不同观点融合为一个整体回应，捕捉了广泛人类经验和观点的光谱。本文通过实验展示了GPT模型嵌入不同偏见的能力，当这些偏见结合在一起时，可以打开更加包容性的AI技术的可能性。

更新时间: 2024-10-09 17:07:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.12839v1

Population Transformer: Learning Population-level Representations of Neural Activity

We present a self-supervised framework that learns population-level codes for arbitrary ensembles of neural recordings at scale. We address two key challenges in scaling models with neural time-series data: sparse and variable electrode distribution across subjects and datasets. The Population Transformer (PopT) stacks on top of pretrained representations and enhances downstream decoding by enabling learned aggregation of multiple spatially-sparse data channels. The pretrained PopT lowers the amount of data required for downstream decoding experiments, while increasing accuracy, even on held-out subjects and tasks. Compared to end-to-end methods, this approach is computationally lightweight and more interpretable, while still retaining competitive performance. We further show how our framework is generalizable to multiple time-series embeddings and neural data modalities. Beyond decoding, we interpret the pretrained PopT and fine-tuned models to show how they can be used to extract neuroscience insights from massive amounts of data. We release our code as well as a pretrained PopT to enable off-the-shelf improvements in multi-channel intracranial data decoding and interpretability.

Updated: 2024-10-09 17:07:27

标题: 人口变换器：学习神经活动的人口级表征

摘要: 我们提出了一个自监督框架，可以在规模上学习任意神经记录集合的人口级编码。我们解决了使用神经时间序列数据扩展模型的两个关键挑战：不同受试者和数据集之间稀疏和可变的电极分布。人口变换器（PopT）堆叠在预训练表示的顶部，并通过启用学习到的多个空间稀疏数据通道的聚合来增强下游解码。预训练的PopT减少了下游解码实验所需的数据量，同时提高了精度，甚至在留出的受试者和任务上也是如此。与端到端方法相比，这种方法在计算上更轻量级，更易解释，同时仍保持竞争性能。我们进一步展示了我们的框架如何推广到多个时间序列嵌入和神经数据模态。除了解码外，我们解释了预训练的PopT和微调模型，展示了它们如何从大量数据中提取神经科学见解。我们发布了我们的代码以及一个预训练的PopT，以实现多通道颅内数据解码和可解释性的即插即用改进。

更新时间: 2024-10-09 17:07:27

领域: cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2406.03044v2

InAttention: Linear Context Scaling for Transformers

VRAM requirements for transformer models scale quadratically with context length due to the self-attention mechanism. In this paper we modify the decoder-only transformer, replacing self-attention with InAttention, which scales linearly with context length during inference by having tokens attend only to initial states. Benchmarking shows that InAttention significantly reduces VRAM usage during inference, enabling handling of long sequences on consumer GPUs. We corroborate that fine-tuning extends context length efficiently, improving performance on long sequences without high training costs. InAttention offers a scalable solution for long-range dependencies in transformer models, paving the way for further optimization.

Updated: 2024-10-09 17:05:15

标题: InAttention: Transformers的线性上下文缩放

摘要: 变压器模型的VRAM要求随着上下文长度的增加而呈二次方增长，这是由于自注意力机制导致的。本文中，我们修改了仅包含解码器的transformer模型，将自注意力机制替换为InAttention，在推断过程中，通过令令牌仅关注初始状态，实现了与上下文长度线性扩展。基准测试显示，InAttention在推断过程中显著减少了VRAM的使用量，使得消费级GPU能够处理长序列。我们证实，微调有效地扩展了上下文长度，提高了在长序列上的性能，而无需高昂的训练成本。InAttention为transformer模型中长距离依赖关系提供了可扩展的解决方案，为进一步优化铺平了道路。

更新时间: 2024-10-09 17:05:15

领域: cs.LG

下载: http://arxiv.org/abs/2410.07063v1

Online Epsilon Net and Piercing Set for Geometric Concepts

VC-dimension and $\varepsilon$-nets are key concepts in Statistical Learning Theory. Intuitively, VC-dimension is a measure of the size of a class of sets. The famous $\varepsilon$-net theorem, a fundamental result in Discrete Geometry, asserts that if the VC-dimension of a set system is bounded, then a small sample exists that intersects all sufficiently large sets. In online learning scenarios where data arrives sequentially, the VC-dimension helps to bound the complexity of the set system, and $\varepsilon$-nets ensure the selection of a small representative set. This sampling framework is crucial in various domains, including spatial data analysis, motion planning in dynamic environments, optimization of sensor networks, and feature extraction in computer vision, among others. Motivated by these applications, we study the online $\varepsilon$-net problem for geometric concepts with bounded VC-dimension. While the offline version of this problem has been extensively studied, surprisingly, there are no known theoretical results for the online version to date. We present the first deterministic online algorithm with an optimal competitive ratio for intervals in $\mathbb{R}$. Next, we give a randomized online algorithm with a near-optimal competitive ratio for axis-aligned boxes in $\mathbb{R}^d$, for $d\le 3$. Furthermore, we introduce a novel technique to analyze similar-sized objects of constant description complexity in $\mathbb{R}^d$, which may be of independent interest. Next, we focus on the continuous version of this problem, where ranges of the set system are geometric concepts in $\mathbb{R}^d$ arriving in an online manner, but the universe is the entire space, and the objective is to choose a small sample that intersects all the ranges.

Updated: 2024-10-09 16:58:36

标题: 在线 Epsilon 网和几何概念的穿透集

摘要: VC维和$\varepsilon$-网是统计学习理论中的关键概念。直观地说，VC维度是一组集合的大小度量。著名的$\varepsilon$-网定理是离散几何中的基本结果，它断言如果一个集合系统的VC维度有界，那么存在一个小样本，它与所有足够大的集合相交。在数据按顺序到达的在线学习场景中，VC维度有助于限制集合系统的复杂性，而$\varepsilon$-网确保选择一个小的代表性集合。这种抽样框架在各个领域中至关重要，包括空间数据分析、动态环境中的运动规划、传感器网络的优化以及计算机视觉中的特征提取等。受这些应用的启发，我们研究了具有有界VC维度的几何概念的在线$\varepsilon$-网问题。虽然该问题的离线版本已经得到广泛研究，但令人惊讶的是，迄今为止对于在线版本尚无已知的理论结果。我们提出了第一个针对$\mathbb{R}$中区间的在线确定性算法，具有最优的竞争比。接下来，我们为$\mathbb{R}^d$中轴对齐框的随机在线算法提供了一个接近最优的竞争比，其中$d\le 3$。此外，我们引入了一种新颖的技术，用于分析$\mathbb{R}^d$中具有恒定描述复杂性的尺寸相似对象，这可能是独立感兴趣的。接下来，我们关注该问题的连续版本，其中集合系统的范围是$\mathbb{R}^d$中的几何概念，按在线方式到达，但宇宙是整个空间，目标是选择一个小样本，它与所有范围相交。

更新时间: 2024-10-09 16:58:36

领域: cs.LG,cs.CG

下载: http://arxiv.org/abs/2410.07059v1

Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing

Large Language Models (LLMs) have recently revolutionized the NLP field, while they still fall short in some specific down-stream tasks. In the work, we focus on utilizing LLMs to perform machine translation, where we observe that two patterns of errors frequently occur and drastically affect the translation quality: language mismatch and repetition. The work sets out to explore the potential for mitigating these two issues by leveraging model editing methods, e.g., by locating Feed-Forward Network (FFN) neurons or something that are responsible for the errors and deactivating them in the inference time. We find that directly applying such methods either limited effect on the targeted errors or has significant negative side-effect on the general translation quality, indicating that the located components may also be crucial for ensuring machine translation with LLMs on the rails. To this end, we propose to refine the located components by fetching the intersection of the locating results under different language settings, filtering out the aforementioned information that is irrelevant to targeted errors. The experiment results empirically demonstrate that our methods can effectively reduce the language mismatch and repetition ratios and meanwhile enhance or keep the general translation quality in most cases.

Updated: 2024-10-09 16:51:21

标题: 通过模型编辑减轻LLM机器翻译中的语言不匹配和重复问题

摘要: 大型语言模型(LLMs)最近在自然语言处理领域引起了革命，但它们在一些特定的下游任务中仍存在不足。在这项工作中，我们专注于利用LLMs进行机器翻译，观察到两种错误模式经常发生并严重影响翻译质量：语言不匹配和重复。该工作旨在探索通过利用模型编辑方法来减轻这两个问题的潜力，例如，通过定位前馈网络(FFN)神经元或其他负责错误的部分，并在推理时将其停用。我们发现直接应用这些方法对目标错误有限的影响，或者对一般翻译质量有显著负面影响，这表明定位的组件对确保LLMs进行机器翻译也是至关重要的。因此，我们建议通过获取不同语言设置下的定位结果的交集来完善定位的组件，过滤掉与目标错误无关的信息。实验证明，我们的方法可以在大多数情况下有效减少语言不匹配和重复比例，同时增强或保持一般翻译质量。

更新时间: 2024-10-09 16:51:21

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.07054v1

SAGMAN: Stability Analysis of Graph Neural Networks on the Manifolds

Modern graph neural networks (GNNs) can be sensitive to changes in the input graph structure and node features, potentially resulting in unpredictable behavior and degraded performance. In this work, we introduce a spectral framework known as SAGMAN for examining the stability of GNNs. This framework assesses the distance distortions that arise from the nonlinear mappings of GNNs between the input and output manifolds: when two nearby nodes on the input manifold are mapped (through a GNN model) to two distant ones on the output manifold, it implies a large distance distortion and thus a poor GNN stability. We propose a distance-preserving graph dimension reduction (GDR) approach that utilizes spectral graph embedding and probabilistic graphical models (PGMs) to create low-dimensional input/output graph-based manifolds for meaningful stability analysis. Our empirical evaluations show that SAGMAN effectively assesses the stability of each node when subjected to various edge or feature perturbations, offering a scalable approach for evaluating the stability of GNNs, extending to applications within recommendation systems. Furthermore, we illustrate its utility in downstream tasks, notably in enhancing GNN stability and facilitating adversarial targeted attacks.

Updated: 2024-10-09 16:51:02

标题: SAGMAN：图神经网络在流形上的稳定性分析

摘要: 现代图神经网络（GNNs）可能对输入图结构和节点特征的变化敏感，可能导致不可预测的行为和性能下降。在这项工作中，我们介绍了一个称为SAGMAN的谱框架，用于检查GNN的稳定性。该框架评估了由于GNN在输入和输出流形之间的非线性映射而产生的距离失真：当输入流形上的两个相邻节点（通过GNN模型）映射到输出流形上的两个相距较远的节点时，意味着存在较大的距离失真，从而导致GNN的稳定性较差。我们提出了一种保持距离的图维度缩减（GDR）方法，利用谱图嵌入和概率图模型（PGM）来创建有意义的稳定性分析的低维输入/输出基于图的流形。我们的实证评估表明，SAGMAN在各种边缘或特征扰动下有效评估每个节点的稳定性，为评估GNN的稳定性提供了一种可扩展的方法，可以扩展到推荐系统中的应用。此外，我们展示了其在下游任务中的实用性，特别是在增强GNN的稳定性和促进有针对性的对抗攻击方面。

更新时间: 2024-10-09 16:51:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.08653v4

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio. We propose a joint contrastive training loss to improve the synchronization between visual and auditory occurrences. We present experiments on two datasets to evaluate the efficacy of our proposed model. The assessment of generation quality and alignment performance is carried out from various angles, encompassing both objective and subjective metrics. Our findings demonstrate that the proposed model outperforms the baseline in terms of quality and generation speed through introduction of our novel cross-modal easy fusion architectural block. Furthermore, the incorporation of the contrastive loss results in improvements in audio-visual alignment, particularly in the high-correlation video-to-audio generation task.

Updated: 2024-10-09 16:49:58

标题: CMMD：视频-音频条件建模的对比多模态扩散

摘要: 我们介绍了一种专为视频和音频双向条件生成定制的多模态扩散模型。我们提出了一种联合对比训练损失，以改善视觉和听觉事件之间的同步性。我们在两个数据集上进行实验，以评估我们提出的模型的有效性。从各个角度进行生成质量和对齐性能的评估，包括客观和主观指标。我们的研究结果表明，通过引入我们的新型跨模态易融合结构块，所提出的模型在质量和生成速度方面优于基线模型。此外，对比损失的引入导致音频-视觉对齐性的改善，特别是在高相关性的视频到音频生成任务中。

更新时间: 2024-10-09 16:49:58

领域: cs.LG,cs.CV,cs.MM,cs.SD,eess.AS

下载: http://arxiv.org/abs/2312.05412v2

Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference

Many areas of science rely on simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, popular LFI methods - such as Approximate Bayesian Computation or more recent machine learning techniques - do not necessarily lead to valid scientific inference because they do not guarantee confidence sets with nominal coverage in general settings. In addition, LFI currently lacks practical diagnostic tools to check the actual coverage of computed confidence sets across the entire parameter space. In this work, we propose a modular inference framework that bridges classical statistics and modern machine learning to provide (i) a practical approach for constructing confidence sets with near finite-sample validity at any value of the unknown parameters, and (ii) interpretable diagnostics for estimating empirical coverage across the entire parameter space. We refer to this framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo or bootstrap samples at fixed parameter settings. We study two likelihood-based test statistics (ACORE and BFF) and demonstrate their performance on high-dimensional complex data. Code is available at https://github.com/lee-group-cmu/lf2i.

Updated: 2024-10-09 16:47:08

标题: 无似然频率推断：弥合经典统计学和机器学习，以获得可靠的基于模拟器的推断

摘要: 许多科学领域依赖于隐式编码复杂系统无法处理的似然函数的模拟器。传统统计方法在这些所谓的无似然推断(LFI)设置中很难适用，特别是在非渐近和低维度的情况下。同时，流行的LFI方法 - 如近似贝叶斯计算或更近期的机器学习技术 - 不一定会导致有效的科学推断，因为它们在一般情况下不能保证名义覆盖率的置信区间。此外，LFI目前缺乏实用的诊断工具，无法检查计算的置信区间在整个参数空间中的实际覆盖率。在这项工作中，我们提出了一个模块化推断框架，它将传统统计学和现代机器学习相结合，提供(i)一种在未知参数的任何值附近构建具有近似有限样本有效性的置信区间的实用方法，以及(ii)用于估计整个参数空间中的经验覆盖率的可解释诊断。我们将这个框架称为无似然频率推断(LF2I)。任何定义了检验统计量的方法都可以利用LF2I创建有效的置信区间和诊断，而无需在固定参数设置下进行昂贵的蒙特卡洛或自助采样。我们研究了两个基于似然的检验统计量(ACORE和BFF)，并展示了它们在高维复杂数据上的性能。代码可在https://github.com/lee-group-cmu/lf2i找到。

更新时间: 2024-10-09 16:47:08

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2107.03920v9

Context-Augmented Code Generation Using Programming Knowledge Graphs

Large Language Models (LLMs) and Code-LLMs (CLLMs) have significantly improved code generation, but, they frequently face difficulties when dealing with challenging and complex problems. Retrieval-Augmented Generation (RAG) addresses this issue by retrieving and integrating external knowledge at the inference time. However, retrieval models often fail to find most relevant context, and generation models, with limited context capacity, can hallucinate when given irrelevant data. We present a novel framework that leverages a Programming Knowledge Graph (PKG) to semantically represent and retrieve code. This approach enables fine-grained code retrieval by focusing on the most relevant segments while reducing irrelevant context through a tree-pruning technique. PKG is coupled with a re-ranking mechanism to reduce even more hallucinations by selectively integrating non-RAG solutions. We propose two retrieval approaches-block-wise and function-wise-based on the PKG, optimizing context granularity. Evaluations on the HumanEval and MBPP benchmarks show our method improves pass@1 accuracy by up to 20%, and outperforms state-of-the-art models by up to 34% on MBPP. Our contributions include PKG-based retrieval, tree pruning to enhance retrieval precision, a re-ranking method for robust solution selection and a Fill-in-the-Middle (FIM) enhancer module for automatic code augmentation with relevant comments and docstrings.

Updated: 2024-10-09 16:35:41

标题: 利用编程知识图增强上下文的代码生成

摘要: 大型语言模型（LLMs）和代码-LLMs（CLLMs）已显著改进了代码生成，但在处理具有挑战性和复杂问题时经常面临困难。检索增强生成（RAG）通过在推理时检索和整合外部知识来解决这个问题。然而，检索模型通常无法找到最相关的上下文，而生成模型由于上下文容量有限，在提供无关数据时可能会产生幻觉。我们提出了一个新颖的框架，利用编程知识图（PKG）来语义化地表示和检索代码。这种方法通过聚焦于最相关的片段来实现细粒度的代码检索，同时通过树修剪技术减少无关上下文。PKG与重新排名机制相结合，通过选择性地集成非RAG解决方案来进一步减少幻觉。我们提出了基于PKG的两种检索方法：基于块和基于函数，优化上下文粒度。在HumanEval和MBPP基准测试上的评估表明，我们的方法将pass@1准确率提高了高达20％，在MBPP上优于最先进模型高达34％。我们的贡献包括基于PKG的检索、树修剪以增强检索精度、一种用于选择稳健解决方案的重新排名方法，以及一种用于自动代码增强的填充中间（FIM）增强模块，其中包括相关注释和文档字符串。

更新时间: 2024-10-09 16:35:41

领域: cs.SE,cs.AI,cs.IR

下载: http://arxiv.org/abs/2410.18251v1

Abstracting Situation Calculus Action Theories

We develop a general framework for agent abstraction based on the situation calculus and the ConGolog agent programming language. We assume that we have a high-level specification and a low-level specification of the agent, both represented as basic action theories. A refinement mapping specifies how each high-level action is implemented by a low-level ConGolog program and how each high-level fluent can be translated into a low-level formula. We define a notion of sound abstraction between such action theories in terms of the existence of a suitable bisimulation between their respective models. Sound abstractions have many useful properties that ensure that we can reason about the agent's actions (e.g., executability, projection, and planning) at the abstract level, and refine and concretely execute them at the low level. We also characterize the notion of complete abstraction where all actions (including exogenous ones) that the high level thinks can happen can in fact occur at the low level. To facilitate verifying that one has a sound/complete abstraction relative to a mapping, we provide a set of necessary and sufficient conditions. Finally, we identify a set of basic action theory constraints that ensure that for any low-level action sequence, there is a unique high-level action sequence that it refines. This allows us to track/monitor what the low-level agent is doing and describe it in abstract terms (i.e., provide high-level explanations, for instance, to a client or manager).

Updated: 2024-10-09 16:34:28

标题: 抽象化情境演算动作理论

摘要: 我们基于Situation Calculus和ConGolog代理编程语言开发了一个通用的代理抽象框架。我们假设我们有代理的高级规范和低级规范，两者都表示为基本的动作理论。通过细化映射，我们可以指定每个高级动作如何由低级ConGolog程序实现，以及如何将每个高级流畅翻译成低级公式。我们定义了在这种动作理论之间存在适当的双模拟时的声音抽象概念。声音抽象具有许多有用的属性，可以确保我们可以在抽象层面推理代理的动作（例如，可执行性、投影和规划），并在低层次上对其进行细化和具体执行。我们还表征了完全抽象的概念，即高层认为可能发生的所有动作（包括外部动作）实际上可以在低层发生。为了验证相对于映射是否具有声音/完全抽象，我们提供了一组必要和充分条件。最后，我们确定了一组基本的动作理论约束，确保对于任何低级动作序列，都存在一个唯一的高级动作序列对应。这使我们可以跟踪/监视低级代理正在做什么，并用抽象术语描述它（例如，向客户或经理提供高级解释）。

更新时间: 2024-10-09 16:34:28

领域: cs.LO,cs.AI,I.2.4

下载: http://arxiv.org/abs/2410.14712v1

Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention

Graph Neural Networks (GNNs) have become important tools for machine learning on graph-structured data. In this paper, we explore the synergistic combination of graph encoding, graph rewiring, and graph attention, by introducing Graph Attention with Stochastic Structures (GRASS), a novel GNN architecture. GRASS utilizes relative random walk probabilities (RRWP) encoding and a novel decomposed variant (D-RRWP) to efficiently capture structural information. It rewires the input graph by superimposing a random regular graph to enhance long-range information propagation. It also employs a novel additive attention mechanism tailored for graph-structured data. Our empirical evaluations demonstrate that GRASS achieves state-of-the-art performance on multiple benchmark datasets, including a 20.3% reduction in mean absolute error on the ZINC dataset.

Updated: 2024-10-09 16:32:11

标题: 更绿的草地：通过编码、重连和注意力增强GNNs

摘要: 图神经网络（GNNs）已成为处理图结构数据的重要工具。本文探讨了图编码、图重连和图注意力的协同组合，引入了一种新颖的GNN架构——具有随机结构的图注意力（GRASS）。GRASS利用相对随机游走概率（RRWP）编码和一种新颖的分解变体（D-RRWP）来有效捕捉结构信息。它通过叠加随机正则图来重连输入图，以增强远程信息传播。它还采用了一种针对图结构数据量身定制的新型加性注意力机制。我们的实证评估表明，GRASS在多个基准数据集上实现了最先进的性能，包括在ZINC数据集上平均绝对误差减少了20.3%。

更新时间: 2024-10-09 16:32:11

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2407.05649v3

Emergent properties with repeated examples

We study the performance of transformers as a function of the number of repetitions of training examples with algorithmically generated datasets. On three problems of mathematics: the greatest common divisor, modular multiplication, and matrix eigenvalues, we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training - repeated use of a small random subset of examples, along normal sampling on the rest of the training set - provides for faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.

Updated: 2024-10-09 16:28:23

标题: 重复示例的紧急属性

摘要: 我们研究了变压器在算法生成的数据集上训练示例的重复次数对性能的影响。在数学的三个问题上：最大公约数、模乘法和矩阵特征值，我们发现，在固定训练步数的情况下，模型在重复训练示例较少的数据集上训练要优于在单次使用示例较多的数据集上训练。我们还展示了两组训练-反复使用小随机子集示例，以及在其余训练集上进行常规抽样-可以提供更快的学习速度和更好的性能。这突出了重复的好处可能超过数据多样性的好处。这些数据集和问题提供了一个受控的环境，以阐明深度学习中广义化和记忆之间仍然不太理解的相互作用。

更新时间: 2024-10-09 16:28:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07041v1

A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs

Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit -- the graphon. We prove a Poincar\'e inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks.

Updated: 2024-10-09 16:28:15

标题: 一个关于大图信号采样的Poincaré不等式和一致性结果

摘要: 大规模图机器学习具有挑战性，因为学习模型的复杂性随着图的大小而增加。对图进行子抽样是一种可行的替代方法，但在图上进行抽样是非平凡的，因为图是非欧几里德的。现有的图抽样技术不仅需要计算大矩阵的谱，而且在图发生变化时需要重复这些计算，例如图的增长。在本文中，我们介绍了一种信号抽样理论，针对一种图极限——图拓扑。我们证明了图拓扑信号的Poincar\'e不等式，并展示了满足该不等式的节点子集的补集是图拓扑信号的Paley-Wiener空间的唯一抽样集。利用与谱聚类和高斯消元的联系，我们证明这种抽样集在一致性上是一致的，即在收敛的图序列上的唯一抽样集会收敛到图拓扑上的唯一抽样集。然后，我们提出了一个相关的图拓扑信号抽样算法用于大型图，并展示了它在图机器学习任务上的良好实证表现。

更新时间: 2024-10-09 16:28:15

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2311.10610v3

Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

In this paper, we address the challenge of heterogeneous data distributions in cross-silo federated learning by introducing a novel algorithm, which we term Cross-silo Robust Clustered Federated Learning (CS-RCFL). Our approach leverages the Wasserstein distance to construct ambiguity sets around each client's empirical distribution that capture possible distribution shifts in the local data, enabling evaluation of worst-case model performance. We then propose a model-agnostic integer fractional program to determine the optimal distributionally robust clustering of clients into coalitions so that possible biases in the local models caused by statistically heterogeneous client datasets are avoided, and analyze our method for linear and logistic regression models. Finally, we discuss a federated learning protocol that ensures the privacy of client distributions, a critical consideration, for instance, when clients are healthcare institutions. We evaluate our algorithm on synthetic and real-world healthcare data.

Updated: 2024-10-09 16:25:01

标题: 分布稳健的集群化联邦学习：以医疗为例研究

摘要: 在这篇论文中，我们针对跨存储 federated learning 中异构数据分布的挑战，引入了一种新颖的算法，我们将其称为 Cross-silo Robust Clustered Federated Learning (CS-RCFL)。我们的方法利用Wasserstein距离构建每个客户端经验分布周围的模糊集，捕捉本地数据可能的分布变化，从而评估最坏情况下模型的性能。然后，我们提出了一个与模型无关的整数分数规划，以确定客户端的最佳分布鲁棒聚类，避免由统计异构客户端数据集引起的本地模型可能的偏差，并分析我们的方法适用于线性和逻辑回归模型。最后，我们讨论了一个确保客户端分布隐私的联邦学习协议，这是一个关键考虑因素，例如当客户端是医疗机构时。我们对合成和真实世界的医疗保健数据评估了我们的算法。

更新时间: 2024-10-09 16:25:01

领域: cs.LG

下载: http://arxiv.org/abs/2410.07039v1

IterGen: Iterative Structured LLM Generation

Large Language Models (LLMs) are widely used for tasks such as natural language and code generation. Still, their outputs often suffer from issues like privacy violations, and semantically inaccurate code generation. Current libraries for LLM generation rely on left-to-right decoding without systematic support for backtracking, limiting the ability to correct or refine outputs mid-generation. To address this issue, we introduce IterGen, an intuitive framework for iterative, grammar-guided LLM generation that enables users to move both forward and backward within the generated output based on grammar symbols. By leveraging a symbol-to-position mapping, IterGen ensures efficient and structured generation while allowing for corrections during the process. We demonstrate IterGen's effectiveness in two important applications: reducing privacy leakage in LLM outputs and improving the accuracy of LLM-generated SQL queries. Our code is available at https://github.com/uiuc-arc/itergen

Updated: 2024-10-09 16:21:38

标题: IterGen：迭代式结构化LLM生成

摘要: 大型语言模型（LLMs）广泛用于自然语言和代码生成等任务。然而，它们的输出经常存在隐私侵犯和语义不准确的代码生成等问题。当前用于LLM生成的库依赖于从左到右的解码，没有系统支持回溯，限制了在生成过程中纠正或完善输出的能力。为了解决这个问题，我们引入了IterGen，一个直观的迭代、基于语法引导的LLM生成框架，使用户能够基于语法符号在生成的输出中向前和向后移动。通过利用符号到位置映射，IterGen确保高效和结构化的生成，同时允许在过程中进行纠正。我们展示了IterGen在两个重要应用中的有效性：减少LLM输出中的隐私泄漏和提高LLM生成的SQL查询的准确性。我们的代码可在https://github.com/uiuc-arc/itergen上找到。

更新时间: 2024-10-09 16:21:38

领域: cs.SE,cs.LG,cs.PL

下载: http://arxiv.org/abs/2410.07295v1

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Large Language Models (LLMs) demonstrate impressive capabilities across various domains, including role-playing, creative writing, mathematical reasoning, and coding. Despite these advancements, LLMs still encounter challenges with length control, frequently failing to adhere to specific length constraints due to their token-level operations and insufficient training on data with strict length limitations. We identify this issue as stemming from a lack of positional awareness and propose novel approaches--PositionID Prompting and PositionID Fine-Tuning--to address it. These methods enhance the model's ability to continuously monitor and manage text length during generation. Additionally, we introduce PositionID CP Prompting to enable LLMs to perform copy and paste operations accurately. Furthermore, we develop two benchmarks for evaluating length control and copy-paste abilities. Our experiments demonstrate that our methods significantly improve the model's adherence to length constraints and copy-paste accuracy without compromising response quality.

Updated: 2024-10-09 16:15:36

标题: PositionID：LLMs可以通过明确的位置感知控制长度，复制和粘贴

摘要: 大型语言模型（LLMs）展示了在各个领域的强大能力，包括角色扮演、创意写作、数学推理和编码。尽管取得了这些进展，LLMs仍然在长度控制方面遇到挑战，经常无法遵守特定长度约束，这是由于它们的令牌级操作和对具有严格长度限制的数据的不足训练所致。我们确定了这个问题源于缺乏位置意识，并提出了新颖的方法——PositionID提示和PositionID微调——来解决这个问题。这些方法增强了模型在生成过程中持续监控和管理文本长度的能力。此外，我们引入了PositionID CP提示，使LLMs能够准确执行复制和粘贴操作。此外，我们开发了两个用于评估长度控制和复制粘贴能力的基准。我们的实验表明，我们的方法显著提高了模型对长度约束和复制粘贴准确度的遵守，而不会损害响应质量。

更新时间: 2024-10-09 16:15:36

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.07035v1

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models

Diffusion models (DMs) have achieved remarkable success in text-to-image generation, but they also pose safety risks, such as the potential generation of harmful content and copyright violations. The techniques of machine unlearning, also known as concept erasing, have been developed to address these risks. However, these techniques remain vulnerable to adversarial prompt attacks, which can prompt DMs post-unlearning to regenerate undesired images containing concepts (such as nudity) meant to be erased. This work aims to enhance the robustness of concept erasing by integrating the principle of adversarial training (AT) into machine unlearning, resulting in the robust unlearning framework referred to as AdvUnlearn. However, achieving this effectively and efficiently is highly nontrivial. First, we find that a straightforward implementation of AT compromises DMs' image generation quality post-unlearning. To address this, we develop a utility-retaining regularization on an additional retain set, optimizing the trade-off between concept erasure robustness and model utility in AdvUnlearn. Moreover, we identify the text encoder as a more suitable module for robustification compared to UNet, ensuring unlearning effectiveness. And the acquired text encoder can serve as a plug-and-play robust unlearner for various DM types. Empirically, we perform extensive experiments to demonstrate the robustness advantage of AdvUnlearn across various DM unlearning scenarios, including the erasure of nudity, objects, and style concepts. In addition to robustness, AdvUnlearn also achieves a balanced tradeoff with model utility. To our knowledge, this is the first work to systematically explore robust DM unlearning through AT, setting it apart from existing methods that overlook robustness in concept erasing. Codes are available at: https://github.com/OPTML-Group/AdvUnlearn

Updated: 2024-10-09 16:12:40

标题: 扩散模型中鲁棒概念消除的对抗训练防御性遗忘

摘要: 扩散模型（DMs）在文本到图像生成方面取得了显著成功，但它们也带来了安全风险，如潜在生成有害内容和侵犯版权的可能性。机器未学习技术，也称为概念擦除，已经被开发用于解决这些风险。然而，这些技术仍然容易受到对抗性提示攻击，这可能会导致DMs在未学习后重新生成包含意图擦除的概念（如裸露）的不受欢迎的图像。本文旨在通过将对抗训练（AT）原则整合到机器未学习中，从而增强概念擦除的鲁棒性，形成称为AdvUnlearn的鲁棒未学习框架。然而，实现这一目标的方式既有效又高效是非常困难的。首先，我们发现AT的直接实现会损害DMs在未学习后的图像生成质量。为了解决这个问题，我们在额外的保留集上开发了一个保留效用正则化，优化概念擦除鲁棒性与模型效用之间的平衡。此外，我们确定文本编码器相对于UNet更适合用于提高鲁棒性，确保未学习的有效性。获取的文本编码器可以作为各种DM类型的即插即用的鲁棒未学习器。在经验上，我们进行了大量实验，展示了AdvUnlearn在各种DM未学习场景中的鲁棒性优势，包括擦除裸露、物体和风格概念。除了鲁棒性，AdvUnlearn还实现了与模型效用的平衡折衷。据我们所知，这是第一个通过AT系统地探索鲁棒DM未学习的工作，使其与现有方法区别开来，这些方法忽视了概念擦除中的鲁棒性。代码可在以下链接找到：https://github.com/OPTML-Group/AdvUnlearn

更新时间: 2024-10-09 16:12:40

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2405.15234v3

Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving finite-sum min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. However, existing convergence analyses of SEG focus on its with-replacement variants, while practical implementations of the method randomly reshuffle components and sequentially use them. Unlike the well-studied with-replacement variants, SEG with Random Reshuffling (SEG-RR) lacks established theoretical guarantees. In this work, we provide a convergence analysis of SEG-RR for three classes of VIPs: (i) strongly monotone, (ii) affine, and (iii) monotone. We derive conditions under which SEG-RR achieves a faster convergence rate than the uniform with-replacement sampling SEG. In the monotone setting, our analysis of SEG-RR guarantees convergence to an arbitrary accuracy without large batch sizes, a strong requirement needed in the classical with-replacement SEG. As a byproduct of our results, we provide convergence guarantees for Shuffle Once SEG (shuffles the data only at the beginning of the algorithm) and the Incremental Extragradient (does not shuffle the data). We supplement our analysis with experiments validating empirically the superior performance of SEG-RR over the classical with-replacement sampling SEG.

Updated: 2024-10-09 16:10:16

标题: 随机外梯度与随机重排列：改进了变分不等式的收敛性

摘要: 随机外梯度（SEG）方法是解决出现在各种机器学习任务中的有限和最小-最大优化和变分不等式问题（VIPs）的最流行算法之一。然而，现有的SEG收敛分析侧重于其带替换变体，而方法的实际实现会随机重排组件并依次使用它们。与经过深入研究的带替换变体不同，带随机重排的SEG（SEG-RR）缺乏已建立的理论保证。在这项工作中，我们为三类VIPs提供了SEG-RR的收敛分析：（i）强单调、（ii）仿射和（iii）单调。我们推导出SEG-RR实现更快收敛速度的条件，比均匀带替换采样的SEG更快。在单调设置中，我们的SEG-RR分析保证在不需要大批量大小的情况下收敛到任意精度，这是经典带替换SEG中所需的强要求。作为我们结果的副产品，我们为Shuffle Once SEG（算法仅在开始时对数据进行重排）和Incremental Extragradient（不对数据进行重排）提供了收敛保证。我们通过实验证明了SEG-RR相对于经典的带替换采样SEG在性能上的优越性。

更新时间: 2024-10-09 16:10:16

领域: math.OC,cs.GT,cs.LG,stat.ML

下载: http://arxiv.org/abs/2403.07148v2

Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform

In the era of large language models, parameter-efficient fine-tuning (PEFT) has been extensively studied. However, these approaches usually rely on the space domain, which encounters storage challenges especially when handling extensive adaptations or larger models. The frequency domain, in contrast, is more effective in compressing trainable parameters while maintaining the expressive capability. In this paper, we propose a novel Selective Discrete Cosine Transformation (sDCTFT) fine-tuning scheme to push this frontier. Its general idea is to exploit the superior energy compaction and decorrelation properties of DCT to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. Then, the weight change is partitioned over different levels of the discrete cosine spectrum, and the most critical frequency components in each partition are selected. Extensive experiments on four benchmark datasets demonstrate the superior accuracy, reduced computational cost, and lower storage requirements of the proposed method over the prior arts. For instance, when performing instruction tuning on the LLaMA3.1-8B model, sDCTFT outperforms LoRA with just 0.05M trainable parameters compared to LoRA's 38.2M, and surpasses FourierFT with 30\% less trainable parameters. The source code will be publicly available.

Updated: 2024-10-09 16:07:42

标题: 通过选择性离散余弦变换实现参数高效微调

摘要: 在大型语言模型时代，参数高效微调（PEFT）得到了广泛研究。然而，这些方法通常依赖于空间域，特别是在处理大规模适应或更大模型时会遇到存储挑战。相比之下，频域在压缩可训练参数的同时保持表达能力方面更加有效。本文提出了一种新颖的选择性离散余弦变换（sDCTFT）微调方案，以推动这一领域的发展。其基本思想是利用DCT的优越能量压缩和去相关性质来提高模型的效率和准确性。具体而言，它将低秩适应的权重变化投影到离散余弦空间中。然后，将权重变化分配到不同级别的离散余弦谱中，并选择每个分区中最关键的频率成分。对四个基准数据集进行的大量实验表明，所提出的方法相对于先前的技术具有更高的准确性、降低的计算成本和更低的存储需求。例如，在对LLaMA3.1-8B模型进行指导微调时，sDCTFT仅使用0.05M可训练参数，而LoRA使用38.2M可训练参数，超过FourierFT使用30\%更少的可训练参数。源代码将公开提供。

更新时间: 2024-10-09 16:07:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.09103v1

Do Contemporary CATE Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark

We present unexpected findings from a large-scale benchmark study evaluating Conditional Average Treatment Effect (CATE) estimation algorithms. By running 16 modern CATE models across 43,200 datasets, we find that: (a) 62\% of CATE estimates have a higher Mean Squared Error (MSE) than a trivial zero-effect predictor, rendering them ineffective; (b) in datasets with at least one useful CATE estimate, 80\% still have higher MSE than a constant-effect model; and (c) Orthogonality-based models outperform other models only 30\% of the time, despite widespread optimism about their performance. These findings expose significant limitations in current CATE models and suggest ample opportunities for further research. Our findings stem from a novel application of \textit{observational sampling}, originally developed to evaluate Average Treatment Effect (ATE) estimates from observational methods with experiment data. To adapt observational sampling for CATE evaluation, we introduce a statistical parameter, $Q$, equal to MSE minus a constant and preserves the ranking of models by their MSE. We then derive a family of sample statistics, collectively called $\hat{Q}$, that can be computed from real-world data. We prove that $\hat{Q}$ is a consistent estimator of $Q$ under mild technical conditions. When used in observational sampling, $\hat{Q}$ is unbiased and asymptotically selects the model with the smallest MSE. To ensure the benchmark reflects real-world heterogeneity, we handpick datasets where outcomes come from field rather than simulation. By combining the new observational sampling method, new statistics, and real-world datasets, the benchmark provides a unique perspective on CATE estimator performance and uncover gaps in capturing real-world heterogeneity.

Updated: 2024-10-09 16:04:40

标题: 当代CATE模型是否能捕捉现实世界的异质性？来自大规模基准测试的发现

摘要: 我们提出了一项大规模基准研究，评估条件平均处理效应（CATE）估计算法的意外发现。通过在43,200个数据集上运行16个现代CATE模型，我们发现：(a) 62％的CATE估计具有比微不足道的零效应预测器更高的均方误差（MSE），使它们无效；(b) 在至少有一个有用的CATE估计的数据集中，80％仍然比恒定效应模型具有更高的MSE；(c) 基于正交性的模型仅有30％的时间胜过其他模型，尽管对它们的表现普遍持乐观态度。这些发现揭示了当前CATE模型的显著局限性，并暗示了进一步研究的充足机会。我们的发现源自一种新颖的\textit{观测抽样}应用，最初是为了评估观测方法与实验数据的平均处理效应（ATE）估计值。为了将观测抽样用于CATE评估，我们引入一个统计参数$Q$，等于MSE减去一个常数，并保留模型按其MSE进行排名。然后，我们推导出一系列样本统计量，统称为$\hat{Q}$，可以从真实世界数据中计算得出。我们证明，在温和的技术条件下，$\hat{Q}$是$Q$的一个一致估计量。当用于观测抽样时，$\hat{Q}$是无偏的，并在渐近情况下选择具有最小MSE的模型。为了确保基准反映真实世界的异质性，我们手工挑选了结果来自领域而不是模拟的数据集。通过结合新的观测抽样方法、新的统计量和真实世界数据集，该基准提供了对CATE估计器性能的独特视角，并揭示了捕捉真实世界异质性的差距。

更新时间: 2024-10-09 16:04:40

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.07021v1

The Vital Role of Gradient Clipping in Byzantine-Resilient Distributed Learning

Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, the currently considered static clipping strategy exhibits mixed results: improving robustness against some attacks while being ineffective or detrimental against others. We address this gap by proposing a principled adaptive clipping strategy, termed Adaptive Robust Clipping (ARC). We show that ARC consistently enhances the empirical robustness of SOTA Robust-DGD methods, while preserving the theoretical robustness guarantees. Our analysis shows that ARC provably improves the asymptotic convergence guarantee of Robust-DGD in the case when the model is well-initialized. We validate this theoretical insight through an exhaustive set of experiments on benchmark image classification tasks. We observe that the improvement induced by ARC is more pronounced in highly heterogeneous and adversarial settings.

Updated: 2024-10-09 16:04:01

标题: 《梯度裁剪在拜占庭容错分布式学习中的重要作用》

摘要: 拜占庭弹性分布式机器学习旨在在存在行为不端或对抗性工作者的情况下实现稳健的学习性能。虽然最新技术（SOTA）的稳健分布式梯度下降（Robust-DGD）方法在理论上被证明是最佳的，但它们的实证成功通常依赖于预聚合梯度裁剪。然而，当前考虑的静态裁剪策略表现出了不同的结果：在一些攻击方面提高了稳健性，而在其他方面则无效或有害。我们通过提出一种有原则的自适应裁剪策略，称为自适应稳健裁剪（ARC），来解决这一差距。我们展示了ARC始终提高了SOTA Robust-DGD方法的实证稳健性，同时保持了理论稳健性保证。我们的分析表明，在模型良好初始化的情况下，ARC可以证明改进Robust-DGD的渐近收敛保证。我们通过对基准图像分类任务进行详尽的实验证实了这一理论洞察。我们观察到，ARC引起的改进在高度异质和对抗性环境中更为显著。

更新时间: 2024-10-09 16:04:01

领域: cs.LG

下载: http://arxiv.org/abs/2405.14432v4

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. We study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, a language model trained primarily on texts, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.

Updated: 2024-10-09 16:02:13

标题: LLMs学习动力系统的治理原则，揭示出一种上下文中的神经缩放定律

摘要: 预训练的大型语言模型(LLMs)在执行零-shot任务(包括时间序列预测)时表现出惊人的有效性。然而，由于模型的复杂性，理解这种能力背后的机制仍然具有极高的挑战性。本文研究LLMs在外推由物理利益原则控制演化的动态系统行为的能力。我们的研究结果显示，主要基于文本训练的语言模型LLaMA 2在无需微调或提示工程的情况下，能够准确预测动态系统时间序列。此外，学习的物理规律的准确性随着输入上下文窗口的长度增加而提高，揭示了一种上下文版本的神经缩放定律。在这过程中，我们提出了一种灵活高效的算法，可以直接从LLMs中提取多位数的概率密度函数。

更新时间: 2024-10-09 16:02:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.00795v4

Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization

Out-of-Distribution (OOD) generalization in machine learning is a burgeoning area of study. Its primary goal is to enhance the adaptability and resilience of machine learning models when faced with new, unseen, and potentially adversarial data that significantly diverges from their original training datasets. In this paper, we investigate time series OOD generalization via pre-trained Large Language Models (LLMs). We first propose a novel \textbf{T}ri-level learning framework for \textbf{T}ime \textbf{S}eries \textbf{O}OD generalization, termed TTSO, which considers both sample-level and group-level uncertainties. This formula offers a fresh theoretic perspective for formulating and analyzing OOD generalization problem. In addition, we provide a theoretical analysis to justify this method is well motivated. We then develop a stratified localization algorithm tailored for this tri-level optimization problem, theoretically demonstrating the guaranteed convergence of the proposed algorithm. Our analysis also reveals that the iteration complexity to obtain an $\epsilon$-stationary point is bounded by O($\frac{1}{\epsilon^{2}}$). Extensive experiments on real-world datasets have been conducted to elucidate the effectiveness of the proposed method.

Updated: 2024-10-09 16:00:21

标题: 三级导航器：LLM增强的时序OOD泛化的三级学习

摘要: 机器学习中的OOD（Out-of-Distribution）泛化是一个新兴的研究领域。其主要目标是增强机器学习模型在面对新的、未见过的、可能具有敌对性的数据时的适应能力和韧性，这些数据与它们原始训练数据明显不同。在本文中，我们通过预训练的大型语言模型（LLMs）研究时间序列OOD泛化。我们首先提出了一个新颖的三级学习框架，称为TTSO（Time Series OOD），该框架考虑了样本级和组级的不确定性。这个公式为制定和分析OOD泛化问题提供了新鲜的理论视角。此外，我们提供了理论分析来证明这种方法是合理的。然后，我们开发了一个专为这个三级优化问题量身定制的分层定位算法，理论上证明了所提出算法的收敛性。我们的分析还揭示了获得一个$\epsilon$-稳定点的迭代复杂度被界定为O($\frac{1}{\epsilon^{2}}$)。我们对真实数据集进行了大量实验，以阐明所提出方法的有效性。

更新时间: 2024-10-09 16:00:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07018v1

Optimizing Estimators of Squared Calibration Errors in Classification

In this work, we propose a mean-squared error-based risk that enables the comparison and optimization of estimators of squared calibration errors in practical settings. Improving the calibration of classifiers is crucial for enhancing the trustworthiness and interpretability of machine learning models, especially in sensitive decision-making scenarios. Although various calibration (error) estimators exist in the current literature, there is a lack of guidance on selecting the appropriate estimator and tuning its hyperparameters. By leveraging the bilinear structure of squared calibration errors, we reformulate calibration estimation as a regression problem with independent and identically distributed (i.i.d.) input pairs. This reformulation allows us to quantify the performance of different estimators even for the most challenging calibration criterion, known as canonical calibration. Our approach advocates for a training-validation-testing pipeline when estimating a calibration error on an evaluation dataset. We demonstrate the effectiveness of our pipeline by optimizing existing calibration estimators and comparing them with novel kernel ridge regression-based estimators on standard image classification tasks.

Updated: 2024-10-09 15:58:06

标题: 优化分类中平方校准误差的估计量

摘要: 在这项工作中，我们提出了一种基于均方误差的风险，可以在实际环境中比较和优化平方校准误差的估计器。改进分类器的校准对于增强机器学习模型的可靠性和可解释性至关重要，特别是在敏感的决策场景中。尽管当前文献中存在各种校准（误差）估计器，但缺乏选择适当估计器并调整其超参数的指导。通过利用平方校准误差的双线性结构，我们将校准估计重新制定为具有独立同分布（i.i.d.）输入对的回归问题。这种重新制定使我们能够量化不同估计器的性能，即使是最具挑战性的校准标准，即称为经典校准。我们的方法主张在评估数据集上估计校准误差时采用训练-验证-测试流程。我们通过优化现有的校准估计器，并将它们与基于新型核岭回归的估计器在标准图像分类任务中进行比较，展示了我们流程的有效性。

更新时间: 2024-10-09 15:58:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.07014v1

Causal Representation Learning in Temporal Data via Single-Parent Decoding

Scientific research often seeks to understand the causal structure underlying high-level variables in a system. For example, climate scientists study how phenomena, such as El Ni\~no, affect other climate processes at remote locations across the globe. However, scientists typically collect low-level measurements, such as geographically distributed temperature readings. From these, one needs to learn both a mapping to causally-relevant latent variables, such as a high-level representation of the El Ni\~no phenomenon and other processes, as well as the causal model over them. The challenge is that this task, called causal representation learning, is highly underdetermined from observational data alone, requiring other constraints during learning to resolve the indeterminacies. In this work, we consider a temporal model with a sparsity assumption, namely single-parent decoding: each observed low-level variable is only affected by a single latent variable. Such an assumption is reasonable in many scientific applications that require finding groups of low-level variables, such as extracting regions from geographically gridded measurement data in climate research or capturing brain regions from neural activity data. We demonstrate the identifiability of the resulting model and propose a differentiable method, Causal Discovery with Single-parent Decoding (CDSD), that simultaneously learns the underlying latents and a causal graph over them. We assess the validity of our theoretical results using simulated data and showcase the practical validity of our method in an application to real-world data from the climate science field.

Updated: 2024-10-09 15:57:50

标题: 在时间数据中通过单父节点解码进行因果表示学习

摘要: 科学研究通常旨在理解系统中高级变量背后的因果结构。例如，气候科学家研究现象，如厄尔尼诺，如何影响全球各地的其他气候过程。然而，科学家通常收集低级别的测量数据，如地理分布的温度读数。从这些数据中，我们需要学习到一种映射，将其转化为因果相关的潜在变量，例如厄尔尼诺现象和其他过程的高级表示，以及它们之间的因果模型。挑战在于这个任务，称为因果表示学习，仅凭观测数据就是高度欠定的，需要在学习过程中施加其他约束来解决不确定性。在这项工作中，我们考虑了一个具有稀疏假设的时间模型，即单亲解码：每个观察到的低级别变量只受到一个潜在变量的影响。这种假设在许多科学应用中是合理的，这些应用需要找到低级变量的组，例如在气候研究中从地理格点测量数据中提取区域，或者从神经活动数据中捕捉大脑区域。我们演示了所得模型的可辨识性，并提出了一种可微分方法，即具有单亲解码的因果发现（CDSD），该方法同时学习潜在变量和它们之间的因果图。我们使用模拟数据评估了我们理论结果的有效性，并展示了我们的方法在应用于气候科学领域的真实数据中的实际有效性。

更新时间: 2024-10-09 15:57:50

领域: cs.LG

下载: http://arxiv.org/abs/2410.07013v1

Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation

The patent domain is gaining attention in natural language processing research, offering practical applications in streamlining the patenting process and providing challenging benchmarks for large language models (LLMs). However, the generation of the description sections of patents, which constitute more than 90% of the patent document, has not been studied to date. We address this gap by introducing the task of outline-guided paper-to-patent generation, where an academic paper provides the technical specification of the invention and an outline conveys the desired patent structure. We present PAP2PAT, a new challenging benchmark of 1.8k patent-paper pairs with document outlines, collected using heuristics that reflect typical research lab practices. Our experiments with current open-weight LLMs and outline-guided chunk-based generation show that they can effectively use information from the paper but struggle with repetitions, likely due to the inherent repetitiveness of patent language. We release our data and code.

Updated: 2024-10-09 15:52:48

标题: Pap2Pat: 基于分块大纲引导生成的自动论文到专利草拟

摘要: 专利领域在自然语言处理研究中越来越受到关注，提供了在简化专利申请过程中提供实际应用的机会，并为大型语言模型（LLMs）提供了具有挑战性的基准。然而，专利描述部分的生成，其占专利文档的90%以上，迄今尚未得到研究。我们通过引入基于大纲指导的论文到专利生成任务来填补这一空白，其中学术论文提供了发明的技术规范，而大纲传达了所需的专利结构。我们提出了PAP2PAT，一个包含1.8k专利-论文对和文档大纲的新挑战性基准，使用反映典型研究实验室实践的启发式方法收集。我们对当前的开放权重LLMs和基于大纲指导的基于块的生成进行了实验，结果表明它们可以有效地利用论文中的信息，但在重复方面存在困难，可能是由于专利语言本身的重复性。我们发布了我们的数据和代码。

更新时间: 2024-10-09 15:52:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.07009v1

Through the Looking Glass: Mirror Schrödinger Bridges

Resampling from a target measure whose density is unknown is a fundamental problem in mathematical statistics and machine learning. A setting that dominates the machine learning literature consists of learning a map from an easy-to-sample prior, such as the Gaussian distribution, to a target measure. Under this model, samples from the prior are pushed forward to generate a new sample on the target measure, which is often difficult to sample from directly. In this paper, we propose a new model for conditional resampling called mirror Schr\"odinger bridges. Our key observation is that solving the Schr\"odinger bridge problem between a distribution and itself provides a natural way to produce new samples from conditional distributions, giving in-distribution variations of an input data point. We show how to efficiently solve this largely overlooked version of the Schr\"odinger bridge problem. We prove that our proposed method leads to significant algorithmic simplifications over existing alternatives, in addition to providing control over in-distribution variation. Empirically, we demonstrate how these benefits can be leveraged to produce proximal samples in a number of application domains.

Updated: 2024-10-09 15:48:56

标题: 穿越镜子：镜子薛定谔桥

摘要: 从一个密度未知的目标度量中重新采样是数学统计和机器学习中的一个基本问题。在机器学习文献中占主导地位的一种情境是学习从易于抽样的先验分布（如高斯分布）到目标度量的映射。在这个模型下，从先验中抽样被推送到目标度量上生成一个新的样本，而直接从目标度量中抽样通常是困难的。在本文中，我们提出了一个新的条件重新采样模型，称为镜子Schr\"odinger桥。我们的关键观察是解决一个分布与自身之间的Schr\"odinger桥问题提供了一种自然的方式来产生从条件分布中产生新样本，给出了输入数据点的分布内变异。我们展示了如何高效地解决这个被大多数人忽视的Schr\"odinger桥问题的版本。我们证明了我们提出的方法相比现有的替代方法导致了显著的算法简化，此外还提供了对分布内变异的控制。在经验上，我们演示了如何利用这些好处在许多应用领域中产生近似样本。

更新时间: 2024-10-09 15:48:56

领域: cs.LG

下载: http://arxiv.org/abs/2410.07003v1

CursorCore: Assist Programming through Aligning Anything

Large language models have been successfully applied to programming assistance tasks, such as code completion, code insertion, and instructional code editing. However, these applications remain insufficiently automated and struggle to effectively integrate various types of information during the programming process, including coding history, current code, and user instructions. In this work, we propose a new conversational framework that comprehensively integrates these information sources, collect data to train our models and evaluate their performance. Firstly, to thoroughly evaluate how well models align with different types of information and the quality of their outputs, we introduce a new benchmark, APEval (Assist Programming Eval), to comprehensively assess the performance of models in programming assistance tasks. Then, for data collection, we develop a data generation pipeline, Programming-Instruct, which synthesizes training data from diverse sources, such as GitHub and online judge platforms. This pipeline can automatically generate various types of messages throughout the programming process. Finally, using this pipeline, we generate 219K samples, fine-tune multiple models, and develop the CursorCore series. We show that CursorCore outperforms other models of comparable size. This framework unifies applications such as inline chat and automated editing, contributes to the advancement of coding assistants. Code, models and data are freely available at https://github.com/TechxGenus/CursorCore.

Updated: 2024-10-09 15:45:52

标题: CursorCore：通过对齐任何内容辅助编程

摘要: 大型语言模型已成功应用于编程辅助任务，如代码补全、代码插入和指导性代码编辑。然而，这些应用仍然不够自动化，难以有效整合编程过程中的各种信息，包括编码历史、当前代码和用户指令。在这项工作中，我们提出了一个新的对话框架，全面整合这些信息源，收集数据来训练我们的模型并评估它们的性能。首先，为了全面评估模型与不同类型信息的对齐程度和其输出质量，我们引入了一个新的基准，APEval（辅助编程评估），全面评估模型在编程辅助任务中的表现。然后，为了数据收集，我们开发了一个数据生成管道，Programming-Instruct，它从各种来源（如GitHub和在线评判平台）合成训练数据。这个管道可以自动生成编程过程中的各种类型消息。最后，利用这个管道，我们生成了219K个样本，微调多个模型，并开发了CursorCore系列。我们展示了CursorCore在性能上超过了其他大小相当的模型。这个框架统一了诸如内联聊天和自动编辑等应用，促进了编码助手的进步。代码、模型和数据可在https://github.com/TechxGenus/CursorCore免费获取。

更新时间: 2024-10-09 15:45:52

领域: cs.CL,cs.AI,cs.SE

下载: http://arxiv.org/abs/2410.07002v1

When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models

Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses ``You are a helpful assistant'' as part of its default system prompt. Despite current practices of adding personas to system prompts, it remains unclear how different personas affect a model's performance on objective tasks. In this study, we present a systematic evaluation of personas in system prompts. We curate a list of 162 roles covering 6 types of interpersonal relationships and 8 domains of expertise. Through extensive analysis of 4 popular families of LLMs and 2,410 factual questions, we demonstrate that adding personas in system prompts does not improve model performance across a range of questions compared to the control setting where no persona is added. Nevertheless, further analysis suggests that the gender, type, and domain of the persona can all influence the resulting prediction accuracies. We further experimented with a list of persona search strategies and found that, while aggregating results from the best persona for each question significantly improves prediction accuracy, automatically identifying the best persona is challenging, with predictions often performing no better than random selection. Overall, our findings suggest that while adding a persona may lead to performance gains in certain settings, the effect of each persona can be largely random. Code and data are available at https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles.

Updated: 2024-10-09 15:44:36

标题: 当“一个有用的助手”并不真正有帮助：系统提示中的角色并不能提高大型语言模型的性能

摘要: 提示是人类与大型语言模型（LLM）进行互动的主要方式。商业人工智能系统通常通过系统提示定义LLM的角色。例如，ChatGPT在其默认系统提示中使用“您是一个有帮助的助手”。尽管当前的做法是在系统提示中添加角色，但不清楚不同角色如何影响模型在客观任务上的表现。在这项研究中，我们对系统提示中的角色进行了系统评估。我们策划了一个包括6种人际关系和8个专业领域的162个角色列表。通过对4个流行的LLM系列和2,410个事实问题的广泛分析，我们证明在系统提示中添加角色并不会提高模型在一系列问题上的性能，与未添加角色的控制设置相比。然而，进一步分析表明，角色的性别、类型和领域都可能影响最终的预测准确性。我们进一步尝试了一系列角色搜索策略，并发现，虽然聚合每个问题的最佳角色的结果显著提高了预测准确性，但自动识别最佳角色具有挑战性，预测通常并不比随机选择更好。总的来说，我们的发现表明，虽然添加角色可能会在某些情况下带来性能提升，但每个角色的效果可能主要是随机的。代码和数据可在https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles找到。

更新时间: 2024-10-09 15:44:36

领域: cs.CL,cs.AI,cs.CY,cs.HC,cs.LG

下载: http://arxiv.org/abs/2311.10054v3

Can Your Generative Model Detect Out-of-Distribution Covariate Shift?

Detecting Out-of-Distribution (OOD) sensory data and covariate distribution shift aims to identify new test examples with different high-level image statistics to the captured, normal and In-Distribution (ID) set. Existing OOD detection literature largely focuses on semantic shift with little-to-no consensus over covariate shift. Generative models capture the ID data in an unsupervised manner, enabling them to effectively identify samples that deviate significantly from this learned distribution, irrespective of the downstream task. In this work, we elucidate the ability of generative models to detect and quantify domain-specific covariate shift through extensive analyses that involves a variety of models. To this end, we conjecture that it is sufficient to detect most occurring sensory faults (anomalies and deviations in global signals statistics) by solely modeling high-frequency signal-dependent and independent details. We propose a novel method, CovariateFlow, for OOD detection, specifically tailored to covariate heteroscedastic high-frequency image-components using conditional Normalizing Flows (cNFs). Our results on CIFAR10 vs. CIFAR10-C and ImageNet200 vs. ImageNet200-C demonstrate the effectiveness of the method by accurately detecting OOD covariate shift. This work contributes to enhancing the fidelity of imaging systems and aiding machine learning models in OOD detection in the presence of covariate shift.

Updated: 2024-10-09 15:44:35

标题: 您的生成模型能否检测到分布偏移的外部协变量？

摘要: 检测分布外（OOD）的感官数据和协变量分布转移旨在识别具有不同高层图像统计信息的新测试示例，以捕获的正常和分布内（ID）集合。现有的OOD检测文献主要集中在语义转移，对协变量转移几乎没有共识。生成模型以无监督的方式捕获ID数据，使它们能够有效地识别与这种学习分布显著偏离的样本，无论下游任务如何。在这项工作中，我们阐明了生成模型通过广泛的分析来检测和量化领域特定的协变量转移的能力，其中涉及各种模型。为此，我们推测仅通过对高频信号相关和独立细节进行建模，就足以检测大多数发生的感官错误（全局信号统计中的异常和偏差）。我们提出了一种新方法，CovariateFlow，用于OOD检测，专门针对协变量异方差高频图像组件，使用条件归一化流（cNFs）。我们在CIFAR10 vs. CIFAR10-C和ImageNet200 vs. ImageNet200-C上的结果表明，该方法通过准确检测OOD协变量转移的效果。这项工作有助于提高成像系统的可靠性，并在存在协变量转移的情况下帮助机器学习模型进行OOD检测。

更新时间: 2024-10-09 15:44:35

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.03043v2

Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax

Deep InfoMax (DIM) is a well-established method for self-supervised representation learning (SSRL) based on maximization of the mutual information between the input and the output of a deep neural network encoder. Despite the DIM and contrastive SSRL in general being well-explored, the task of learning representations conforming to a specific distribution (i.e., distribution matching, DM) is still under-addressed. Motivated by the importance of DM to several downstream tasks (including generative modeling, disentanglement, outliers detection and other), we enhance DIM to enable automatic matching of learned representations to a selected prior distribution. To achieve this, we propose injecting an independent noise into the normalized outputs of the encoder, while keeping the same InfoMax training objective. We show that such modification allows for learning uniformly and normally distributed representations, as well as representations of other absolutely continuous distributions. Our approach is tested on various downstream tasks. The results indicate a moderate trade-off between the performance on the downstream tasks and quality of DM.

Updated: 2024-10-09 15:40:04

标题: 通过注入噪音的深度信息最大化方法实现表示的高效分布匹配

摘要: Deep InfoMax（DIM）是一种基于最大化深度神经网络编码器的输入和输出之间的互信息的自监督表示学习（SSRL）方法。尽管DIM和对比SSRL一般都得到了很好的探索，但学习符合特定分布（即分布匹配，DM）的表示仍未得到充分解决。受到DM对几个下游任务（包括生成建模、解缠、异常值检测等）的重要性的启发，我们增强了DIM，使其能够自动将学习到的表示匹配到选定的先验分布。为实现这一目标，我们提出向编码器的归一化输出注入独立噪声，同时保持相同的InfoMax训练目标。我们表明，这种修改可以实现学习均匀分布和正态分布的表示，以及其他绝对连续分布的表示。我们的方法在各种下游任务上进行了测试。结果表明，在下游任务的性能和DM质量之间存在适度的权衡。

更新时间: 2024-10-09 15:40:04

领域: cs.LG,cs.IT,math.IT,stat.ML,94A16 (Primary) 68T07, 94A17 (Secondary),E.4; H.1.1

下载: http://arxiv.org/abs/2410.06993v1

NetDiff: Deep Graph Denoising Diffusion for Ad Hoc Network Topology Generation

This work introduces NetDiff, an expressive graph denoising diffusion probabilistic architecture that generates wireless ad hoc network link topologies. Such networks, with directional antennas, can achieve unmatched performance when the communication links are designed to provide good geometric properties, notably by reducing interference between these links while respecting diverse physical constraints. How to craft such a link assignment algorithm is yet a real problem. Deep graph generation offers multiple advantages compared to traditional approaches: it allows to relieve the network nodes of the communication burden caused by the search of viable links and to avoid resorting to heavy combinatorial methods to find a good link topology. Denoising diffusion also provides a built-in method to update the network over time. Given that graph neural networks sometimes tend to struggle with global, structural properties, we augment the popular graph transformer with cross-attentive modulation tokens in order to improve global control over the predicted topology. We also incorporate simple node and edge features, as well as additional loss terms, to facilitate the compliance with the network topology physical constraints. A network evolution algorithm based on partial diffusion is also proposed to maintain a stable network topology over time when the nodes move. Our results show that the generated links are realistic, present structural properties similar to the dataset graphs', and require only minor corrections and verification steps to be operational.

Updated: 2024-10-09 15:39:49

标题: NetDiff：用于自组织网络拓扑生成的深度图去噪扩散

摘要: 这项工作介绍了NetDiff，一种表达性图去噪扩散概率架构，用于生成无线自组网链路拓扑。在具有定向天线的网络中，当通信链路设计提供良好的几何特性时，可以实现无与伦比的性能，特别是通过减少这些链路之间的干扰同时尊重各种物理约束。如何设计这样的链路分配算法仍然是一个真正的问题。深度图生成相对传统方法具有多重优势：它可以减轻网络节点由于搜索可行链路而带来的通信负担，并避免使用繁重的组合方法来找到良好的链路拓扑。去噪扩散还提供了一种内置方法来随时间更新网络。鉴于图神经网络有时倾向于在全局、结构性质方面遇到困难，我们利用交叉注意力调制令牌来增强流行的图变换器，以提高对预测拓扑的全局控制。我们还结合了简单的节点和边特征，以及额外的损失项，以促进符合网络拓扑物理约束。还提出了基于部分扩散的网络演化算法，用于在节点移动时随时间维持稳定的网络拓扑。我们的结果显示，生成的链路是现实的，具有类似数据集图的结构性质，并且只需要进行轻微的校正和验证步骤即可投入运行。

更新时间: 2024-10-09 15:39:49

领域: cs.SI,cs.LG,cs.NI

下载: http://arxiv.org/abs/2410.08238v1

Fine-tuning can Help Detect Pretraining Data from Large Language Models

In the era of large language models (LLMs), detecting pretraining data has been increasingly important due to concerns about fair evaluation and ethical risks. Current methods differentiate members and non-members by designing scoring functions, like Perplexity and Min-k%. However, the diversity and complexity of training data magnifies the difficulty of distinguishing, leading to suboptimal performance in detecting pretraining data. In this paper, we first explore the benefits of unseen data, which can be easily collected after the release of the LLM. We find that the perplexities of LLMs perform differently for members and non-members, after fine-tuning with a small amount of previously unseen data. In light of this, we introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection. In particular, we propose to measure the deviation distance of current scores after fine-tuning on a small amount of unseen data within the same domain. In effect, using a few unseen data can largely decrease the scores of all non-members, leading to a larger deviation distance than members. Extensive experiments demonstrate the effectiveness of our method, significantly improving the AUC score on common benchmark datasets across various models.

Updated: 2024-10-09 15:36:42

标题: 微调可以帮助检测大型语言模型的预训练数据

摘要: 在大语言模型(LLMs)的时代，由于对公平评估和伦理风险的担忧，检测预训练数据变得越来越重要。当前的方法通过设计得分函数（如Perplexity和Min-k%）来区分成员和非成员。然而，训练数据的多样性和复杂性加大了区分的难度，导致在检测预训练数据方面表现不佳。在本文中，我们首先探讨了未见数据的好处，在LLM发布后可以轻松收集。我们发现，在用少量以前未见数据进行微调后，LLMs的困惑度对成员和非成员表现出不同的性能。基于此，我们提出了一种名为Fine-tuned Score Deviation（FSD）的新颖有效方法，改进了当前用于检测预训练数据的得分函数的性能。具体而言，我们建议在相同领域内用少量未见数据进行微调后，测量当前得分的偏差距离。实际上，使用少量未见数据可以大幅降低所有非成员的得分，导致比成员更大的偏差距离。大量实验证明了我们方法的有效性，在各种模型上显著提高了常见基准数据集的AUC分数。

更新时间: 2024-10-09 15:36:42

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.10880v1

MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models

The pretrain+fine-tune paradigm is foundational for deploying large language models (LLMs) across various downstream applications. Within this framework, Low-Rank Adaptation (LoRA) stands out for its parameter-efficient fine-tuning (PEFT), producing numerous reusable task-specific LoRA adapters. However, this approach requires explicit task intention selection, posing challenges for autonomous task sensing and switching during inference with multiple existing LoRA adapters embedded in a single LLM. In this work, we introduce MeteoRA (Multiple-tasks embedded LoRA), a scalable and efficient framework that reuses multiple task-specific LoRA adapters into the base LLM via a full-mode Mixture-of-Experts (MoE) architecture. This framework also includes novel MoE forward acceleration strategies to address the efficiency challenges of traditional MoE implementations. Our evaluation, using the LlaMA2-13B and LlaMA3-8B base models equipped with 28 existing LoRA adapters through MeteoRA, demonstrates equivalent performance with the traditional PEFT method. Moreover, the LLM equipped with MeteoRA achieves superior performance in handling composite tasks, effectively solving ten sequential problems in a single inference pass, thereby demonstrating the framework's enhanced capability for timely adapter switching.

Updated: 2024-10-09 15:33:10

标题: MeteoRA: 用于大型语言模型的多任务嵌入式 LoRA

摘要: 预训练+微调范式是在各种下游应用中部署大型语言模型（LLMs）的基础。在这个框架内，低秩适应（LoRA）以其参数高效的微调（PEFT）脱颖而出，产生了大量可重复使用的特定任务LoRA适配器。然而，这种方法需要明确的任务意图选择，在推断过程中与嵌入在单个LLM中的多个现有LoRA适配器进行自主任务感知和切换时带来挑战。在这项工作中，我们介绍了MeteoRA（多任务嵌入LoRA），这是一个可扩展和高效的框架，通过全模式专家混合（MoE）架构将多个特定任务的LoRA适配器重新利用到基础LLM中。该框架还包括新颖的MoE前向加速策略，以解决传统MoE实现的效率挑战。我们使用通过MeteoRA装备了28个现有LoRA适配器的LlaMA2-13B和LlaMA3-8B基础模型进行评估，证明了与传统PEFT方法相当的性能。此外，装备了MeteoRA的LLM在处理复合任务方面表现出更优异的性能，有效地在单次推断中解决十个连续问题，从而展示了该框架在及时适配器切换方面的增强能力。

更新时间: 2024-10-09 15:33:10

领域: cs.CL,cs.AI,I.2.7

下载: http://arxiv.org/abs/2405.13053v3

Symbolic Recovery of Differential Equations: The Identifiability Problem

Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations with the use of machine learning techniques. In contrast to classical methods which assume the structure of the equation to be known and focus on the estimation of specific parameters, these algorithms aim to learn the structure and the parameters simultaneously. While the uniqueness and, therefore, the identifiability of parameters of governing equations are a well-addressed problem in the field of parameter estimation, it has not been investigated for symbolic recovery. However, this problem should be even more present in this field since the algorithms aim to cover larger spaces of governing equations. In this paper, we investigate under which conditions a solution of a differential equation does not uniquely determine the equation itself. For various classes of differential equations, we provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation. We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely. Finally, we provide extensive numerical experiments showing that our algorithms can indeed guarantee the uniqueness of the learned governing differential equation, without assuming any knowledge about the analytic form of function, thereby ensuring the reliability of the learned equation.

Updated: 2024-10-09 15:27:08

标题: 微分方程的符号恢复：可辨识性问题

摘要: 符号恢复微分方程是一种雄心勃勃的尝试，旨在利用机器学习技术自动推导控制方程。与传统方法不同，传统方法假设方程的结构已知，并专注于估计特定参数，这些算法旨在同时学习结构和参数。虽然参数估计领域中控制方程参数的独特性和可识别性是一个被广泛讨论的问题，但这个问题尚未被用于符号恢复。然而，这个问题在这个领域中应该更加普遍，因为这些算法旨在涵盖更大空间的控制方程。在本文中，我们研究了在哪些条件下微分方程的解不能唯一确定方程本身。对于各种类别的微分方程，我们提供了函数唯一确定相应微分方程的必要和充分条件。然后，我们利用我们的结果设计了数值算法，旨在确定函数是否唯一解决微分方程。最后，我们提供了广泛的数值实验，显示我们的算法确实可以保证学习到的控制微分方程的唯一性，而不需要假设对函数的解析形式有任何知识，从而确保学习到的方程的可靠性。

更新时间: 2024-10-09 15:27:08

领域: cs.LG,math-ph,math.MP

下载: http://arxiv.org/abs/2210.08342v9

A Unified Generative Framework for Realistic Lidar Simulation in Autonomous Driving Systems

Simulation models for perception sensors are integral components of automotive simulators used for the virtual Verification and Validation (V\&V) of Autonomous Driving Systems (ADS). These models also serve as powerful tools for generating synthetic datasets to train deep learning-based perception models. Lidar is a widely used sensor type among the perception sensors for ADS due to its high precision in 3D environment scanning. However, developing realistic Lidar simulation models is a significant technical challenge. In particular, unrealistic models can result in a large gap between the synthesised and real-world point clouds, limiting their effectiveness in ADS applications. Recently, deep generative models have emerged as promising solutions to synthesise realistic sensory data. However, for Lidar simulation, deep generative models have been primarily hybridised with conventional algorithms, leaving unified generative approaches largely unexplored in the literature. Motivated by this research gap, we propose a unified generative framework to enhance Lidar simulation fidelity. Our proposed framework projects Lidar point clouds into depth-reflectance images via a lossless transformation, and employs our novel Controllable Lidar point cloud Generative model, CoLiGen, to translate the images. We extensively evaluate our CoLiGen model, comparing it with the state-of-the-art image-to-image translation models using various metrics to assess the realness, faithfulness, and performance of a downstream perception model. Our results show that CoLiGen exhibits superior performance across most metrics. The dataset and source code for this research are available at https://github.com/hamedhaghighi/CoLiGen.git.

Updated: 2024-10-09 15:26:25

标题: 一个统一的生成框架用于自动驾驶系统中逼真激光雷达模拟

摘要: 感知传感器的仿真模型是用于自动驾驶系统（ADS）的虚拟验证和验证（V＆V）的汽车模拟器的组成部分。这些模型还可以作为强大的工具，用于生成用于训练基于深度学习的感知模型的合成数据集。由于其在3D环境扫描中的高精度，激光雷达是ADS中广泛使用的传感器类型之一。然而，开发真实的激光雷达仿真模型是一个重要的技术挑战。特别是，不真实的模型可能导致合成和真实世界点云之间存在很大的差距，从而限制了它们在ADS应用中的有效性。最近，深度生成模型已经成为合成真实感知数据的有希望的解决方案。然而，对于激光雷达仿真，深度生成模型主要与传统算法混合，统一的生成方法在文献中基本未被探讨。受到这一研究空白的启发，我们提出了一个统一的生成框架来增强激光雷达仿真的保真度。我们提出的框架通过无损变换将激光雷达点云投影到深度-反射图像中，并采用我们的新颖的可控激光雷达点云生成模型CoLiGen来转换这些图像。我们广泛评估了我们的CoLiGen模型，将其与各种指标使用最先进的图像到图像翻译模型进行比较，以评估下游感知模型的真实性、忠实度和性能。我们的结果表明，CoLiGen在大多数指标上表现出优越性能。本研究的数据集和源代码可在https://github.com/hamedhaghighi/CoLiGen.git获取。

更新时间: 2024-10-09 15:26:25

领域: cs.CV,cs.LG,cs.RO,eess.IV

下载: http://arxiv.org/abs/2312.15817v2

GPT-4V Cannot Generate Radiology Reports Yet

GPT-4V's purported strong multimodal abilities raise interests in using it to automate radiology report writing, but there lacks thorough evaluations. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.

Updated: 2024-10-09 15:23:44

标题: GPT-4V目前无法生成放射学报告

摘要: GPT-4V被认为具有强大的多模态能力，引起了将其用于自动化放射学报告撰写的兴趣，但缺乏彻底的评估。在这项工作中，我们对GPT-4V在生成胸部X射线报告方面进行了系统评估，使用了MIMIC-CXR和IU X射线两个数据集。我们尝试通过不同的提示策略直接使用GPT-4V生成报告，发现它在词汇指标和临床效果指标方面都表现糟糕。为了理解低性能，我们将任务分解为两个步骤：1）医学图像推理步骤，即从图像中预测医学状况标签；2）报告综合步骤，即从（基准）状况生成报告。我们发现，GPT-4V在图像推理方面的性能在不同提示下始终较低。事实上，模型预测的标签分布保持不变，无论图像上出现哪些基准条件，这表明该模型无法有意义地解释胸部X射线。即使在报告综合中提供基准条件，其生成的报告也比经过微调的LLaMA-2更不正确，听起来也不太自然。总的来说，我们的发现对于将GPT-4V用于放射学工作流程的可行性提出了质疑。

更新时间: 2024-10-09 15:23:44

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.12176v2

Diffusion Density Estimators

We investigate the use of diffusion models as neural density estimators. The current approach to this problem involves converting the generative process to a smooth flow, known as the Probability Flow ODE. The log density at a given sample can be obtained by solving the ODE with a black-box solver. We introduce a new, highly parallelizable method that computes log densities without the need to solve a flow. Our approach is based on estimating a path integral by Monte Carlo, in a manner identical to the simulation-free training of diffusion models. We also study how different training parameters affect the accuracy of the density calculation, and offer insights into how these models can be made more scalable and efficient.

Updated: 2024-10-09 15:21:53

标题: 扩散密度估计器

摘要: 我们研究了扩散模型作为神经密度估计器的应用。目前解决这个问题的方法涉及将生成过程转换为平滑流，称为概率流ODE。通过使用黑匣子求解器解ODE，可以获得给定样本的对数密度。我们引入了一种新的高度可并行化的方法，可以计算对数密度而无需解流。我们的方法是基于通过蒙特卡洛估计路径积分，与无需模拟的扩散模型训练方法相同。我们还研究了不同训练参数如何影响密度计算的准确性，并提供了关于如何使这些模型更具可扩展性和效率的见解。

更新时间: 2024-10-09 15:21:53

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.06986v1

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.

Updated: 2024-10-09 15:18:57

标题: 稀疏自动编码器揭示了大型语言模型之间的通用特征空间

摘要: 我们研究了大型语言模型（LLMs）中的特征普适性，这是一个旨在了解不同模型如何在中间层的潜在空间中类似地表示概念的研究领域。展示特征的普适性可以使潜在表示的发现能够在几个模型中推广。然而，由于多义性，跨LLMs比较特征是具有挑战性的，其中个别神经元通常与多个特征而不是不同的特征对应。这使得在不同模型之间解开和匹配特征变得困难。为了解决这个问题，我们采用一种称为字典学习的方法，通过使用稀疏自动编码器（SAEs）将LLM激活转换为更可解释的空间，该空间由对应于单个特征的神经元构成。通过激活相关性匹配模型间的特征神经元后，我们应用诸如奇异值典型相关分析之类的表示空间相似性度量来分析这些SAE特征在不同LLMs中的情况。我们的实验揭示了各种LLMs中SAE特征空间的显著相似性，为特征普适性提供了新的证据。

更新时间: 2024-10-09 15:18:57

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.06981v1

Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification

Wildlife ReID involves utilizing visual technology to identify specific individuals of wild animals in different scenarios, holding significant importance for wildlife conservation, ecological research, and environmental monitoring. Existing wildlife ReID methods are predominantly tailored to specific species, exhibiting limited applicability. Although some approaches leverage extensively studied person ReID techniques, they struggle to address the unique challenges posed by wildlife. Therefore, in this paper, we present a unified, multi-species general framework for wildlife ReID. Given that high-frequency information is a consistent representation of unique features in various species, significantly aiding in identifying contours and details such as fur textures, we propose the Adaptive High-Frequency Transformer model with the goal of enhancing high-frequency information learning. To mitigate the inevitable high-frequency interference in the wilderness environment, we introduce an object-aware high-frequency selection strategy to adaptively capture more valuable high-frequency components. Notably, we unify the experimental settings of multiple wildlife datasets for ReID, achieving superior performance over state-of-the-art ReID methods. In domain generalization scenarios, our approach demonstrates robust generalization to unknown species.

Updated: 2024-10-09 15:16:30

标题: 适应性高频变压器用于多样化野生动物再识别

摘要: 野生动物个体识别（Wildlife ReID）涉及利用视觉技术在不同场景中识别特定的野生动物个体，对于野生动物保护、生态研究和环境监测具有重要意义。现有的野生动物ReID方法主要针对特定物种，适用性有限。虽然一些方法借鉴了广泛研究的人物ReID技术，但仍难以解决野生动物带来的独特挑战。因此，在本文中，我们提出了一个统一的、多物种的野生动物ReID框架。鉴于高频信息是各种物种独特特征的一致表示，极大地有助于识别轮廓和细节，如毛发纹理，我们提出了自适应高频变换器模型，旨在增强高频信息学习。为了减轻荒野环境中不可避免的高频干扰，我们引入了一种基于对象的高频选择策略，以自适应地捕获更有价值的高频成分。值得注意的是，我们统一了多个野生动物数据集的实验设置，相对于最先进的ReID方法，取得了卓越的性能。在域泛化场景中，我们的方法展现出对未知物种的强大泛化能力。

更新时间: 2024-10-09 15:16:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.06977v1

AdaRC: Mitigating Graph Structure Shifts during Test-Time

Powerful as they are, graph neural networks (GNNs) are known to be vulnerable to distribution shifts. Recently, test-time adaptation (TTA) has attracted attention due to its ability to adapt a pre-trained model to a target domain without re-accessing the source domain. However, existing TTA algorithms are primarily designed for attribute shifts in vision tasks, where samples are independent. These methods perform poorly on graph data that experience structure shifts, where node connectivity differs between source and target graphs. We attribute this performance gap to the distinct impact of node attribute shifts versus graph structure shifts: the latter significantly degrades the quality of node representations and blurs the boundaries between different node categories. To address structure shifts in graphs, we propose AdaRC, an innovative framework designed for effective and efficient adaptation to structure shifts by adjusting the hop-aggregation parameters in GNNs. To enhance the representation quality, we design a prediction-informed clustering loss to encourage the formation of distinct clusters for different node categories. Additionally, AdaRC seamlessly integrates with existing TTA algorithms, allowing it to handle attribute shifts effectively while improving overall performance under combined structure and attribute shifts. We validate the effectiveness of AdaRC on both synthetic and real-world datasets, demonstrating its robustness across various combinations of structure and attribute shifts.

Updated: 2024-10-09 15:15:40

标题: AdaRC：在测试时缓解图结构转移

摘要: 尽管图神经网络（GNNs）非常强大，但已知它们容易受到分布转移的影响。最近，测试时间适应（TTA）因其能够将预训练模型调整到目标域而不需要重新访问源域而受到关注。然而，现有的TTA算法主要设计用于视觉任务中属性转移，其中样本是独立的。这些方法在经历结构转移的图数据上表现不佳，其中节点连接在源图和目标图之间存在差异。我们将这种性能差距归因于节点属性转移与图结构转移的不同影响：后者显著降低了节点表示的质量，并模糊了不同节点类别之间的界限。为了解决图中的结构转移，我们提出了AdaRC，这是一个创新的框架，旨在通过调整GNNs中的跳跃聚合参数来有效和高效地适应结构转移。为了增强表示质量，我们设计了一个预测导向的聚类损失，以鼓励形成不同节点类别的明显聚类。此外，AdaRC可以无缝集成现有的TTA算法，使其能够有效处理属性转移，同时提高在结构和属性转移下的整体性能。我们验证了AdaRC在合成和真实数据集上的有效性，证明了其在各种结构和属性转移组合下的稳健性。

更新时间: 2024-10-09 15:15:40

领域: cs.LG

下载: http://arxiv.org/abs/2410.06976v1

LSTM networks provide efficient cyanobacterial blooms forecasting even with incomplete spatio-temporal data

Cyanobacteria are the most frequent dominant species of algal blooms in inland waters, threatening ecosystem function and water quality, especially when toxin-producing strains predominate. Enhanced by anthropogenic activities and global warming, cyanobacterial blooms are expected to increase in frequency and global distribution. Early warning systems (EWS) for cyanobacterial blooms development allow timely implementation of management measures, reducing the risks associated to these blooms. In this paper, we propose an effective EWS for cyanobacterial bloom forecasting, which uses 6 years of incomplete high-frequency spatio-temporal data from multiparametric probes, including phycocyanin (PC) fluorescence as a proxy for cyanobacteria. A probe agnostic and replicable method is proposed to pre-process the data and to generate time series specific for cyanobacterial bloom forecasting. Using these pre-processed data, six different non-site/species-specific predictive models were compared including the autoregressive and multivariate versions of Linear Regression, Random Forest, and Long-Term Short-Term (LSTM) neural networks. Results were analyzed for seven forecasting time horizons ranging from 4 to 28 days evaluated with a hybrid system that combined regression metrics (MSE, R2, MAPE) for PC values, classification metrics (Accuracy, F1, Kappa) for a proposed alarm level of 10 ug PC/L, and a forecasting-specific metric to measure prediction improvement over the displaced signal (skill). The multivariate version of LSTM showed the best and most consistent results across all forecasting horizons and metrics, achieving accuracies of up to 90% in predicting the proposed PC alarm level. Additionally, positive skill values indicated its outstanding effectiveness to forecast cyanobacterial blooms from 16 to 28 days in advance.

Updated: 2024-10-09 15:13:24

标题: LSTM网络即使在不完整的时空数据情况下也能有效地预测蓝藻水华

摘要: 蓝藻是内陆水体中最常见的优势藻类水华物种，威胁生态系统功能和水质，尤其是当产生毒素的菌株占优势时。由人类活动和全球变暖加剧，预计蓝藻水华将在频率和全球分布上增加。蓝藻水华发展的预警系统（EWS）可以及时实施管理措施，降低这些水华带来的风险。本文提出了一种有效的蓝藻水华预测EWS，利用了6年来自多参数探测器的不完整高频时空数据，其中包括藻蓝蛋白（PC）荧光作为蓝藻的代理。提出了一种探测器不可知和可复制的方法来预处理数据，并生成特定于蓝藻水华预测的时间序列。利用这些预处理数据，比较了六种不特定于地点/物种的预测模型，包括线性回归的自回归和多变量版本、随机森林和长期短期（LSTM）神经网络。结果针对从4天到28天的七个预测时间范围进行分析，评估了一个混合系统，结合了PC值的回归指标（MSE、R2、MAPE）、一个建议的10 ug PC/L警报级别的分类指标（准确度、F1、Kappa），以及一个专门用于测量预测改进的指标（技能）与被移位信号。多元化版本的LSTM在所有预测时间范围和指标上表现最好且最一致，可以在预测所提议的PC警报级别时达到高达90%的准确率。此外，积极的技能值表明其在提前16至28天预测蓝藻水华方面具有卓越的效果。

更新时间: 2024-10-09 15:13:24

领域: q-bio.QM,cs.LG,cs.NE

下载: http://arxiv.org/abs/2410.08237v1

Diagnosis of Malignant Lymphoma Cancer Using Hybrid Optimized Techniques Based on Dense Neural Networks

Lymphoma diagnosis, particularly distinguishing between subtypes, is critical for effective treatment but remains challenging due to the subtle morphological differences in histopathological images. This study presents a novel hybrid deep learning framework that combines DenseNet201 for feature extraction with a Dense Neural Network (DNN) for classification, optimized using the Harris Hawks Optimization (HHO) algorithm. The model was trained on a dataset of 15,000 biopsy images, spanning three lymphoma subtypes: Chronic Lymphocytic Leukemia (CLL), Follicular Lymphoma (FL), and Mantle Cell Lymphoma (MCL). Our approach achieved a testing accuracy of 99.33\%, demonstrating significant improvements in both accuracy and model interpretability. Comprehensive evaluation using precision, recall, F1-score, and ROC-AUC underscores the model's robustness and potential for clinical adoption. This framework offers a scalable solution for improving diagnostic accuracy and efficiency in oncology.

Updated: 2024-10-09 15:12:35

标题: 使用基于密集神经网络的混合优化技术诊断恶性淋巴瘤癌症

摘要: 淋巴瘤诊断，特别是区分亚型，对于有效治疗至关重要，但由于组织病理学图像中微妙的形态差异，仍然具有挑战性。本研究提出了一种新颖的混合深度学习框架，将DenseNet201用于特征提取，与Dense神经网络（DNN）结合进行分类，使用Harris Hawks Optimization（HHO）算法进行优化。该模型在一个包含15,000个活检图像的数据集上进行训练，涵盖三种淋巴瘤亚型：慢性淋巴细胞白血病（CLL）、滤泡性淋巴瘤（FL）和袖珍细胞淋巴瘤（MCL）。我们的方法实现了99.33\%的测试准确率，显示了准确性和模型可解释性方面的显著改进。使用精确度、召回率、F1分数和ROC-AUC进行全面评估，强调了模型的稳健性和临床采用潜力。该框架提供了一种可扩展的解决方案，用于提高肿瘤学诊断的准确性和效率。

更新时间: 2024-10-09 15:12:35

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06974v1

Personal Intelligence System UniLM: Hybrid On-Device Small Language Model and Server-Based Large Language Model for Malay Nusantara

In contexts with limited computational and data resources, high-resource language models often prove inadequate, particularly when addressing the specific needs of Malay languages. This paper introduces a Personal Intelligence System designed to efficiently integrate both on-device and server-based models. The system incorporates SLiM-34M for on-device processing, optimized for low memory and power usage, and MANYAK-1.3B for server-based tasks, allowing for scalable, high-performance language processing. The models achieve significant results across various tasks, such as machine translation, question-answering, and translate IndoMMLU. Particularly noteworthy is SLiM-34M's ability to achieve a high improvement in accuracy compared to other LLMs while using 2 times fewer pre-training tokens. This work challenges the prevailing assumption that large-scale computational resources are necessary to build effective language models, contributing to the development of resource-efficient models for the Malay language with the unique orchestration between SLiM-34M and MANYAK-1.3B.

Updated: 2024-10-09 15:11:13

标题: 个人智能系统UniLM：马来尼亚南大的混合式本地小语言模型和基于服务器的大型语言模型

摘要: 在计算和数据资源有限的情况下，高资源语言模型通常无法满足特别是在处理马来语言特定需求时。本文介绍了一种个人智能系统，旨在有效地整合设备内和基于服务器的模型。该系统集成了SLiM-34M用于设备内处理，针对低内存和功耗进行优化，以及MANYAK-1.3B用于基于服务器的任务，实现可伸缩的高性能语言处理。这些模型在各种任务中取得了显著的结果，如机器翻译、问答和IndoMMLU翻译。特别值得注意的是，与其他LLMs相比，SLiM-34M在使用2倍较少的预训练标记的情况下实现了高精度改进。这项工作挑战了普遍的观点，即必须拥有大规模计算资源才能构建有效的语言模型，为马来语言的资源高效模型的发展做出了贡献，通过SLiM-34M和MANYAK-1.3B之间的独特协调。

更新时间: 2024-10-09 15:11:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06973v1

The BRAVO Semantic Segmentation Challenge Results in UNCV2024

We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training. The challenge attracted nearly 100 submissions from international teams representing notable research institutions. The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.

Updated: 2024-10-09 15:09:47

标题: UNCV2024中的BRAVO语义分割挑战结果

摘要: 我们提出了统一的BRAVO挑战来评估语义分割模型在现实扰动和未知分布（OOD）场景下的可靠性。我们定义了两种可靠性类别：（1）语义可靠性，反映模型在面对各种扰动时的准确性和校准性；（2）OOD可靠性，衡量模型在训练期间未知的对象类别的检测能力。这一挑战吸引了来自国际知名研究机构的近100个提交。结果揭示了大规模预训练和最小化架构设计在开发稳健可靠的语义分割模型中的重要性。

更新时间: 2024-10-09 15:09:47

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.15107v2

DLGNet: Hyperedge Classification through Directed Line Graphs for Chemical Reactions

Graphs and hypergraphs provide powerful abstractions for modeling interactions among a set of entities of interest and have been attracting a growing interest in the literature thanks to many successful applications in several fields. In particular, they are rapidly expanding in domains such as chemistry and biology, especially in the areas of drug discovery and molecule generation. One of the areas witnessing the fasted growth is the chemical reactions field, where chemical reactions can be naturally encoded as directed hyperedges of a hypergraph. In this paper, we address the chemical reaction classification problem by introducing the notation of a Directed Line Graph (DGL) associated with a given directed hypergraph. On top of it, we build the Directed Line Graph Network (DLGNet), the first spectral-based Graph Neural Network (GNN) expressly designed to operate on a hypergraph via its DLG transformation. The foundation of DLGNet is a novel Hermitian matrix, the Directed Line Graph Laplacian, which compactly encodes the directionality of the interactions taking place within the directed hyperedges of the hypergraph thanks to the DLG representation. The Directed Line Graph Laplacian enjoys many desirable properties, including admitting an eigenvalue decomposition and being positive semidefinite, which make it well-suited for its adoption within a spectral-based GNN. Through extensive experiments on chemical reaction datasets, we show that DGLNet significantly outperforms the existing approaches, achieving on a collection of real-world datasets an average relative-percentage-difference improvement of 33.01%, with a maximum improvement of 37.71%.

Updated: 2024-10-09 15:07:53

标题: DLGNet：通过有向线图对化学反应进行超边分类

摘要: 图和超图为建模感兴趣的实体之间的交互提供了强大的抽象，由于在多个领域中的许多成功应用，它们在文献中引起了越来越多的关注。特别是在化学和生物学等领域，它们正在迅速扩展，尤其是在药物发现和分子生成领域。其中一个见证增长最快的领域是化学反应领域，化学反应可以自然地编码为超图的有向超边。在本文中，我们通过引入与给定有向超图相关联的有向线图（DGL）的符号，解决了化学反应分类问题。在此基础上，我们构建了Directed Line Graph Network（DLGNet），这是第一个专门设计用于通过其DLG转换在超图上运行的基于谱的图神经网络（GNN）。DLGNet的基础是一个新颖的Hermite矩阵，有向线图Laplacian，通过DLG表示紧凑地编码了在超图的有向超边内发生的交互的方向性。Directed Line Graph Laplacian具有许多理想的特性，包括允许特征值分解和是半正定的，这使得它非常适合在基于谱的GNN中采用。通过对化学反应数据集的广泛实验，我们表明DGLNet明显优于现有方法，在一系列真实数据集上，平均相对百分比差异改善了33.01％，最大改善为37.71％。

更新时间: 2024-10-09 15:07:53

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06969v1

$\texttt{ModSCAN}$: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities

Large vision-language models (LVLMs) have been rapidly developed and widely used in various fields, but the (potential) stereotypical bias in the model is largely unexplored. In this study, we present a pioneering measurement framework, $\texttt{ModSCAN}$, to $\underline{SCAN}$ the stereotypical bias within LVLMs from both vision and language $\underline{Mod}$alities. $\texttt{ModSCAN}$ examines stereotypical biases with respect to two typical stereotypical attributes (gender and race) across three kinds of scenarios: occupations, descriptors, and persona traits. Our findings suggest that 1) the currently popular LVLMs show significant stereotype biases, with CogVLM emerging as the most biased model; 2) these stereotypical biases may stem from the inherent biases in the training dataset and pre-trained models; 3) the utilization of specific prompt prefixes (from both vision and language modalities) performs well in reducing stereotypical biases. We believe our work can serve as the foundation for understanding and addressing stereotypical bias in LVLMs.

Updated: 2024-10-09 15:07:05

标题: $\texttt{ModSCAN}$：从视觉和语言模态中测量大型视觉-语言模型中的刻板偏见

摘要: 大型视觉-语言模型（LVLMs）已经迅速发展并广泛应用于各个领域，但模型中的（潜在）刻板印象偏差尚未得到深入探讨。在本研究中，我们提出了一个开创性的测量框架，$\texttt{ModSCAN}$，用于从视觉和语言模态的两个方面$\underline{SCAN}$ LVLMs中的刻板印象偏差。$\texttt{ModSCAN}$通过对职业、描述词和人物特征这三种典型情景中的两种刻板印象属性（性别和种族）进行测量，检查了刻板印象偏差。我们的研究结果表明：1）目前流行的LVLMs显示出显著的刻板印象偏差，其中CogVLM被确认为最具偏见的模型；2）这些刻板印象偏差可能源自训练数据集和预训练模型中固有的偏见；3）利用特定提示前缀（来自视觉和语言模态）在减少刻板印象偏差方面表现良好。我们相信我们的工作可以为理解和解决LVLMs中的刻板印象偏差奠定基础。

更新时间: 2024-10-09 15:07:05

领域: cs.CR,cs.CY

下载: http://arxiv.org/abs/2410.06967v1

KID-PPG: Knowledge Informed Deep Learning for Extracting Heart Rate from a Smartwatch

Accurate extraction of heart rate from photoplethysmography (PPG) signals remains challenging due to motion artifacts and signal degradation. Although deep learning methods trained as a data-driven inference problem offer promising solutions, they often underutilize existing knowledge from the medical and signal processing community. In this paper, we address three shortcomings of deep learning models: motion artifact removal, degradation assessment, and physiologically plausible analysis of the PPG signal. We propose KID-PPG, a knowledge-informed deep learning model that integrates expert knowledge through adaptive linear filtering, deep probabilistic inference, and data augmentation. We evaluate KID-PPG on the PPGDalia dataset, achieving an average mean absolute error of 2.85 beats per minute, surpassing existing reproducible methods. Our results demonstrate a significant performance improvement in heart rate tracking through the incorporation of prior knowledge into deep learning models. This approach shows promise in enhancing various biomedical applications by incorporating existing expert knowledge in deep learning models.

Updated: 2024-10-09 15:03:38

标题: KID-PPG：知识引导的深度学习用于从智能手表中提取心率

摘要: 从光电容积脉搏图（PPG）信号中准确提取心率仍然具有挑战性，这是由于运动伪影和信号退化。虽然训练为数据驱动推断问题的深度学习方法提供了有希望的解决方案，但它们通常未充分利用来自医学和信号处理社区的现有知识。在本文中，我们解决了深度学习模型的三个缺陷：运动伪影去除、信号退化评估以及PPG信号的生理合理分析。我们提出了KID-PPG，这是一种知识驱动的深度学习模型，通过自适应线性滤波、深度概率推断和数据增强来整合专业知识。我们在PPGDalia数据集上评估了KID-PPG，实现了平均每分钟2.85拍的平均绝对误差，超过了现有可重现方法。我们的结果表明，通过将先验知识纳入深度学习模型，可以显著提高心率跟踪的性能。这种方法显示了在深度学习模型中整合现有专业知识以增强各种生物医学应用的潜力。

更新时间: 2024-10-09 15:03:38

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2405.09559v2

Uncovering Factor Level Preferences to Improve Human-Model Alignment

Despite advancements in Large Language Model (LLM) alignment, understanding the reasons behind LLM preferences remains crucial for bridging the gap between desired and actual behavior. LLMs often exhibit biases or tendencies that diverge from human preferences, such as favoring certain writing styles or producing overly verbose outputs. However, current methods for evaluating preference alignment often lack explainability, relying on coarse-grained comparisons. To address this, we introduce PROFILE (PRObing Factors of InfLuence for Explainability), a novel framework that uncovers and quantifies the influence of specific factors driving preferences. PROFILE's factor level analysis explains the 'why' behind human-model alignment and misalignment, offering insights into the direction of model improvement. We apply PROFILE to analyze human and LLM preferences across three tasks: summarization, helpful response generation, and document-based question-answering. Our factor level analysis reveals a substantial discrepancy between human and LLM preferences in generation tasks, whereas LLMs show strong alignment with human preferences in evaluation tasks. We demonstrate how leveraging factor level insights, including addressing misaligned factors or exploiting the generation-evaluation gap, can improve alignment with human preferences. This work underscores the importance of explainable preference analysis and highlights PROFILE's potential to provide valuable training signals, driving further improvements in human-model alignment.

Updated: 2024-10-09 15:02:34

标题: 揭示因素级别偏好以改善人体模型对齐

摘要: 尽管大型语言模型（LLM）对齐方面取得了进展，但理解LLM偏好背后的原因仍然至关重要，以缩小所需行为和实际行为之间的差距。LLMs经常表现出偏见或倾向，与人类偏好相背离，例如偏爱某些写作风格或产生过于啰嗦的输出。然而，目前用于评估偏好对齐的方法往往缺乏解释性，依赖粗粒度的比较。为了解决这一问题，我们介绍了PROFILE（PRObing Factors of InfLuence for Explainability），这是一个新颖的框架，可以揭示和量化驱动偏好的具体因素的影响。PROFILE的因素级别分析解释了人类模型对齐和不对齐背后的原因，提供了改进模型方向的见解。我们将PROFILE应用于分析人类和LLM在三个任务中的偏好：摘要、有用的回复生成和基于文档的问题回答。我们的因素级别分析揭示了在生成任务中人类和LLM偏好之间存在显著差异，而在评估任务中，LLMs与人类偏好强烈对齐。我们展示了如何利用因素级别的见解，包括解决不对齐的因素或利用生成-评估差距，可以改善与人类偏好的对齐。这项工作强调了解释性偏好分析的重要性，并突显了PROFILE提供有价值的训练信号的潜力，推动进一步改善人类模型对齐。

更新时间: 2024-10-09 15:02:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06965v1

Self-Boosting Large Language Models with Synthetic Preference Data

Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses. However, collecting high-quality preference data is a resource-intensive and creativity-demanding process, especially for the continual improvement of LLMs. We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. SynPO employs an iterative mechanism wherein a self-prompt generator creates diverse prompts, and a response improver refines model responses progressively. This approach trains LLMs to autonomously learn the generative rewards for their own outputs and eliminates the need for large-scale annotation of prompts and human preferences. After four SynPO iterations, Llama3-8B and Mistral-7B show significant enhancements in instruction-following abilities, achieving over 22.1% win rate improvements on AlpacaEval 2.0 and ArenaHard. Simultaneously, SynPO improves the general performance of LLMs on various tasks, validated by a 3.2 to 5.0 average score increase on the well-recognized Open LLM leaderboard.

Updated: 2024-10-09 14:57:31

标题: 利用合成偏好数据自我提升的大型语言模型

摘要: 通过与人类偏好的对齐，大型语言模型（LLMs）在生成诚实、无害和有帮助的回应方面取得了显著进展。然而，收集高质量的偏好数据是一项资源密集型且需要创造力的过程，特别是对于持续改进LLMs。我们引入了SynPO，这是一种利用合成偏好数据进行模型对齐的自我增强范式。SynPO采用一种迭代机制，其中自我提示生成器创建多样化的提示，而响应改进者逐步完善模型回应。这种方法训练LLMs自主学习其输出的生成性奖励，并消除了对大规模提示和人类偏好的注释的需求。经过四次SynPO迭代，Llama3-8B和Mistral-7B在遵循指令能力方面表现出显著提升，在AlpacaEval 2.0和ArenaHard上实现了超过22.1%的胜率提高。同时，SynPO提高了LLMs在各种任务上的通用性能，通过公认的Open LLM排行榜上的3.2至5.0平均分数增加得到验证。

更新时间: 2024-10-09 14:57:31

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06961v1

A Stability Principle for Learning under Non-Stationarity

We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory and numerical experiments showcase the adaptivity of this approach to unknown non-stationarity. We prove regret bounds that are minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.

Updated: 2024-10-09 14:55:30

标题: 学习非平稳性下的稳定性原则

摘要: 我们开发了一个多功能的框架，用于在非稳态环境中进行统计学习。在每个时间段，我们的方法应用稳定性原则来选择一个回顾窗口，最大限度地利用历史数据，同时保持累积偏差在与随机误差相比可接受范围内。我们的理论和数值实验展示了这种方法对未知非稳态性的适应性。当总体损失是强凸的，或者仅是利普希茨时，我们证明了遗憾界在对数因子上是极小化最优的。在我们的分析中，有两个新颖的组成部分：函数之间的相似度度量和一种将非稳态数据序列划分为准稳态片段的分割技术。

更新时间: 2024-10-09 14:55:30

领域: cs.LG,cs.AI,math.OC,stat.ML,68T05, 90C15

下载: http://arxiv.org/abs/2310.18304v3

Support Vector Boosting Machine (SVBM): Enhancing Classification Performance with AdaBoost and Residual Connections

In traditional boosting algorithms, the focus on misclassified training samples emphasizes their importance based on difficulty during the learning process. While using a standard Support Vector Machine (SVM) as a weak learner in an AdaBoost framework can enhance model performance by concentrating on error samples, this approach introduces significant challenges. Specifically, SVMs, characterized by their stability and robustness, may require destabilization to fit the boosting paradigm, which in turn can constrain performance due to reliance on the weighted results from preceding iterations. To address these challenges, we propose the Support Vector Boosting Machine (SVBM), which integrates a novel subsampling process with SVM algorithms and residual connection techniques. This method updates sample weights by considering both the current model's predictions and the outputs from prior rounds, allowing for effective sparsity control. The SVBM framework enhances the ability to form complex decision boundaries, thereby improving classification performance. The MATLAB source code for SVBM can be accessed at https://github.com/junbolian/SVBM.

Updated: 2024-10-09 14:55:19

标题: 支持向量增强机器（SVBM）：通过AdaBoost和残差连接增强分类性能

摘要: 在传统的增强算法中，对于误分类的训练样本的关注强调了它们在学习过程中的困难性，因此强调了它们的重要性。在AdaBoost框架中使用标准支持向量机（SVM）作为弱学习器可以通过集中关注错误样本来提高模型性能，但是这种方法引入了显著的挑战。具体来说，以稳定性和鲁棒性为特点的SVM可能需要破坏以适应增强范式，这反过来可能会由于依赖前几轮迭代的加权结果而限制性能。为了解决这些挑战，我们提出了支持向量增强机（SVBM），它将一种新颖的子采样过程与SVM算法和残差连接技术相结合。该方法通过考虑当前模型的预测和先前轮次的输出来更新样本权重，从而实现有效的稀疏控制。SVBM框架增强了形成复杂决策边界的能力，从而提高了分类性能。SVBM的MATLAB源代码可以在https://github.com/junbolian/SVBM 上访问。

更新时间: 2024-10-09 14:55:19

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06957v1

Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability

As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear Discriminant Analysis (LDA) remains a popular choice for classification due to its interpretable nature, derived from its capacity to model class distributions and enhance class separation through linear combinations of features. However, real-world datasets often suffer from incomplete data, posing substantial challenges for both classification accuracy and model interpretability. In this paper, we introduce a novel and robust classification method, termed Weighted missing Linear Discriminant Analysis (WLDA), which extends LDA to handle datasets with missing values without the need for imputation. Our approach innovatively incorporates a weight matrix that penalizes missing entries, thereby refining parameter estimation directly on incomplete data. This methodology not only preserves the interpretability of LDA but also significantly enhances classification performance in scenarios plagued by missing data. We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability. Experimental results across various datasets demonstrate that WLDA consistently outperforms traditional methods, especially in challenging environments where missing values are prevalent in both training and test datasets. This advancement provides a critical tool for improving classification accuracy and maintaining model transparency in the face of incomplete data.

Updated: 2024-10-09 14:51:23

标题: 直接处理线性判别分析中的缺失数据以提高分类准确性和可解释性

摘要: 随着人工智能（AI）模型在关键实际应用中的应用扩展，确保这些模型的可解释性变得至关重要，特别是在敏感领域如医学和金融领域。线性判别分析（LDA）由于其可解释性而仍然是分类的热门选择，这种可解释性来源于其能够通过特征的线性组合来建模类别分布并增强类别分离能力。然而，现实世界的数据集往往存在数据不完整的问题，这给分类准确度和模型可解释性带来了重大挑战。在本文中，我们介绍了一种新颖而强大的分类方法，称为加权缺失线性判别分析（WLDA），它将LDA扩展到处理具有缺失值的数据集，而无需进行插补。我们的方法创新地将一个惩罚缺失条目的权重矩阵结合进来，从而直接在不完整数据上优化参数估计。这种方法不仅保留了LDA的可解释性，还显著提高了在存在缺失数据的场景中的分类性能。我们进行了深入的理论分析，建立了WLDA的性质，并对其可解释性进行了全面评估。跨越各种数据集的实验结果表明，WLDA一贯优于传统方法，特别是在训练和测试数据集中普遍存在缺失值的挑战性环境中。这一进步为提高分类准确性和保持模型透明度提供了关键工具。

更新时间: 2024-10-09 14:51:23

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2407.00710v3

Faithful Interpretation for Graph Neural Networks

Currently, attention mechanisms have garnered increasing attention in Graph Neural Networks (GNNs), such as Graph Attention Networks (GATs) and Graph Transformers (GTs). It is not only due to the commendable boost in performance they offer but also its capacity to provide a more lucid rationale for model behaviors, which are often viewed as inscrutable. However, Attention-based GNNs have demonstrated instability in interpretability when subjected to various sources of perturbations during both training and testing phases, including factors like additional edges or nodes. In this paper, we propose a solution to this problem by introducing a novel notion called Faithful Graph Attention-based Interpretation (FGAI). In particular, FGAI has four crucial properties regarding stability and sensitivity to interpretation and final output distribution. Built upon this notion, we propose an efficient methodology for obtaining FGAI, which can be viewed as an ad hoc modification to the canonical Attention-based GNNs. To validate our proposed solution, we introduce two novel metrics tailored for graph interpretation assessment. Experimental results demonstrate that FGAI exhibits superior stability and preserves the interpretability of attention under various forms of perturbations and randomness, which makes FGAI a more faithful and reliable explanation tool.

Updated: 2024-10-09 14:47:12

标题: 图神经网络的忠实解释

摘要: 目前，注意力机制在图神经网络（GNNs）中越来越受到关注，例如图注意力网络（GATs）和图变换器（GTs）。这不仅是因为它们提供的性能提升可观，而且还因为它们能够为模型行为提供更清晰的解释，这些行为通常被认为是晦涩难懂的。然而，基于注意力的GNNs在训练和测试阶段遭受各种扰动时，包括额外的边缘或节点，已经表现出解释能力的不稳定性。在本文中，我们提出了一种解决这个问题的方法，引入了一个名为忠实图注意力解释（FGAI）的新概念。具体来说，FGAI具有四个关键属性，涉及稳定性和对解释和最终输出分布的敏感性。基于这一概念，我们提出了一种有效的方法来获得FGAI，可以看作是对标准基于注意力的GNNs的一种临时修改。为了验证我们提出的解决方案，我们引入了两个针对图解释评估定制的新指标。实验结果表明，FGAI表现出更高的稳定性，并在各种形式的扰动和随机性下保持了注意力的可解释性，使得FGAI成为一种更忠实和可靠的解释工具。

更新时间: 2024-10-09 14:47:12

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06950v1

A Computational Harmonic Detection Algorithm to Detect Data Leakage through EM Emanation

Unintended electromagnetic emissions from electronic devices, known as EM emanations, pose significant security risks because they can be processed to recover the source signal's information content. Defense organizations typically use metal shielding to prevent data leakage, but this approach is costly and impractical for widespread use, especially in uncontrolled environments like government facilities in the wild. This is particularly relevant for IoT devices due to their large numbers and deployment in varied environments. This gives rise to a research need for an automated emanation detection method to monitor the facilities and take prompt steps when leakage is detected. To address this, in the preliminary version of this work [1], we collected emanation data from 3 types of HDMI cables and proposed a CNN-based detection method that provided 95% accuracy up to 22.5m. However, the CNN-based method has some limitations: hardware dependency, confusion among multiple sources, and struggle at low SNR. In this extended version, we augment the initial study by collecting emanation data from IoT devices, everyday electronic devices, and cables. Data analysis reveals that each device's emanation has a unique harmonic pattern with intermodulation products, in contrast to communication signals with fixed frequency bands, spectra, and modulation patterns. Leveraging this, we propose a harmonic-based detection method by developing a computational harmonic detector. The proposed method addresses the limitations of the CNN method and provides ~100 accuracy not only for HDMI emanation (compared to 95% in the earlier CNN-based method) but also for all other tested devices/cables in different environments.

Updated: 2024-10-09 14:40:15

标题: 一种用于检测通过电磁泄漏引起的数据泄露的计算谐波检测算法

摘要: 电子设备的意外电磁辐射，即电磁辐射，由于可以被处理以恢复源信号的信息内容，因此构成重要的安全风险。防御组织通常使用金属屏蔽来防止数据泄漏，但这种方法成本高且在广泛使用中不切实际，特别是在像政府设施之类的不受控制的环境中。由于物联网设备数量众多并部署在各种环境中，这对于IoT设备尤为重要。这导致了对一种自动辐射检测方法的研究需求，以监测设施并在检测到泄漏时采取及时措施。为了解决这个问题，在本工作的初步版本中，我们从3种类型的HDMI电缆收集了辐射数据，并提出了一种基于CNN的检测方法，其在22.5m范围内提供了95%的准确性。然而，基于CNN的方法存在一些局限性：硬件依赖性，多个源之间的混淆以及在低信噪比下的困难。在这个扩展版本中，我们通过从物联网设备、日常电子设备和电缆中收集辐射数据来补充初步研究。数据分析显示，每个设备的辐射具有独特的谐波模式和互调产物，与具有固定频段、频谱和调制模式的通信信号形成对比。利用这一点，我们提出了一种基于谐波的检测方法，通过开发计算谐波检测器。所提出的方法解决了CNN方法的局限性，并不仅为HDMI辐射提供了大约100%的准确性（与早期基于CNN的方法相比提高了5%），而且还适用于不同环境中的所有其他测试设备/电缆。

更新时间: 2024-10-09 14:40:15

领域: cs.CR,eess.SP

下载: http://arxiv.org/abs/2410.16316v1

AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation

Large Language Models (LLMs) leverage external tools primarily through generating the API request to enhance task completion efficiency. The accuracy of API request generation significantly determines the capability of LLMs to accomplish tasks. Due to the inherent hallucinations within the LLM, it is difficult to efficiently and accurately generate the correct API request. Current research uses prompt-based feedback to facilitate the LLM-based API request generation. However, existing methods lack factual information and are insufficiently detailed. To address these issues, we propose AutoFeedback, an LLM-based framework for efficient and accurate API request generation, with a Static Scanning Component (SSC) and a Dynamic Analysis Component (DAC). SSC incorporates errors detected in the API requests as pseudo-facts into the feedback, enriching the factual information. DAC retrieves information from API documentation, enhancing the level of detail in feedback. Based on this two components, Autofeedback implementes two feedback loops during the process of generating API requests by the LLM. Extensive experiments demonstrate that it significantly improves accuracy of API request generation and reduces the interaction cost. AutoFeedback achieves an accuracy of 100.00\% on a real-world API dataset and reduces the cost of interaction with GPT-3.5 Turbo by 23.44\%, and GPT-4 Turbo by 11.85\%.

Updated: 2024-10-09 14:38:28

标题: AutoFeedback: 一种基于LLM的高效准确API请求生成框架

摘要: 大型语言模型（LLMs）主要通过生成API请求来利用外部工具，以提高任务完成效率。 API请求生成的准确性显著影响LLMs完成任务的能力。由于LLM内在的幻觉，难以有效和准确地生成正确的API请求。当前研究使用基于提示的反馈来促进基于LLM的API请求生成。然而，现有方法缺乏事实信息，细节不足。为解决这些问题，我们提出了AutoFeedback，一种基于LLM的框架，用于高效和准确地生成API请求，具有静态扫描组件（SSC）和动态分析组件（DAC）。 SSC将检测到的API请求中的错误作为伪事实纳入反馈中，丰富事实信息。 DAC从API文档中检索信息，增强反馈的详细程度。基于这两个组件，AutoFeedback在LLM生成API请求过程中实施了两个反馈回路。大量实验证明，它显著提高了API请求生成的准确性，并降低了交互成本。 AutoFeedback在真实API数据集上实现了100.00％的准确性，并将与GPT-3.5 Turbo的交互成本降低了23.44％，与GPT-4 Turbo降低了11.85％。

更新时间: 2024-10-09 14:38:28

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2410.06943v1

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. Moreover, training can be made easier by incorporating high-quality external visual representations, rather than relying solely on the diffusion models to learn them independently. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs. For instance, our method can speed up SiT training by over 17.5$\times$, matching the performance (without classifier-free guidance) of a SiT-XL model trained for 7M steps in less than 400K steps. In terms of final generation quality, our approach achieves state-of-the-art results of FID=1.42 using classifier-free guidance with the guidance interval.

Updated: 2024-10-09 14:34:53

标题: 代表性调整用于生成：训练扩散变压器比您想象的要容易

摘要: 最近的研究表明，在（生成式）扩散模型中的去噪过程可以在模型内部引入有意义的（判别式）表示，尽管这些表示的质量仍然落后于最近自监督学习方法学到的表示。我们认为，在训练大规模扩散模型进行生成时，一个主要的瓶颈在于有效学习这些表示。此外，通过将高质量的外部视觉表示纳入训练中，而不是仅依赖于扩散模型独立学习，可以使训练变得更容易。我们通过引入一种称为REPresentation Alignment（REPA）的简单正则化来研究这一点，该正则化将去噪网络中的噪声输入隐藏状态的投影与来自外部预训练视觉编码器获得的干净图像表示进行对齐。结果令人震惊：我们的简单策略在应用于流行的扩散和基于流的变压器（如DiTs和SiTs）时，在训练效率和生成质量方面都取得了显著的改进。例如，我们的方法可以将SiT的训练加速超过17.5倍，与在不到400K步中训练了7M步的SiT-XL模型（无分类器指导）的性能相匹配。在最终的生成质量方面，我们的方法使用无分类器指导和指导间隔实现了FID=1.42的最新结果。

更新时间: 2024-10-09 14:34:53

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06940v1

Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models

Recent advancements in machine learning, particularly in Natural Language Processing (NLP), have led to the development of sophisticated models trained on extensive datasets, yet raising concerns about the potential leakage of sensitive information. In response, regulatory measures such as the European Union's General Data Protection Regulation (GDPR) have driven increasing interest in Machine Unlearning techniques, which enable models to selectively forget specific data entries. Early approaches primarily relied on pre-processing methods, while more recent research has shifted towards training-based unlearning techniques. Despite their effectiveness, most existing methods require access to the original training data, which is often inaccessible. Additionally, directly applying unlearning techniques bear the cost of undermining the model's expressive capabilities. To address these challenges, we introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components: A Knowledge Unlearning Induction module designed to remove specific knowledge through an unlearning loss; A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal; And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update. Experimental results demonstrate the efficacy of our ICU method in unlearning sensitive information while maintaining the model's overall performance, offering a promising solution for privacy-conscious machine learning applications.

Updated: 2024-10-09 14:30:08

标题: 学习同时忘却：一种面向生成语言模型的迭代式忘却框架

摘要: 最近在机器学习领域，特别是在自然语言处理（NLP）方面取得的进展，导致了在大量数据集上训练的复杂模型的发展，但也引起了对潜在敏感信息泄露的担忧。作为回应，诸如欧盟的《通用数据保护条例》（GDPR）等监管措施推动了对机器学习取消技术的越来越多的兴趣，这些技术使模型能够选择性地忘记特定数据条目。早期方法主要依赖于预处理方法，而最近的研究已经转向基于训练的取消技术。尽管它们有效，但大多数现有方法需要访问原始训练数据，而这往往是不可访问的。此外，直接应用取消技术会损害模型的表现能力。为了解决这些挑战，我们引入了迭代对比取消（ICU）框架，其中包括三个核心组件：一个知识取消诱导模块，旨在通过取消损失删除特定知识；一个对比学习增强模块，以保护模型的表现能力免受纯取消目标的影响；以及一个迭代取消精化模块，动态评估特定数据片段的取消程度并进行迭代更新。实验结果表明，我们的ICU方法在取消敏感信息的同时保持模型整体性能的有效性，为注重隐私的机器学习应用提供了一种有前途的解决方案。

更新时间: 2024-10-09 14:30:08

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.20271v2

Predicting Bitcoin Market Trends with Enhanced Technical Indicator Integration and Classification Models

Thanks to the high potential for profit, trading has become increasingly attractive to investors as the cryptocurrency and stock markets rapidly expand. However, because financial markets are intricate and dynamic, accurately predicting prices remains a significant challenge. The volatile nature of the cryptocurrency market makes it even harder for traders and investors to make decisions. This study presents a machine learning model based on classification to forecast the direction of the cryptocurrency market, i.e., whether prices will increase or decrease. The model is trained using historical data and important technical indicators such as the Moving Average Convergence Divergence, the Relative Strength Index, and Bollinger Bands. We illustrate our approach with an empirical study of the closing price of Bitcoin. Several simulations, including a confusion matrix and Receiver Operating Characteristic curve, are used to assess the model's performance, and the results show a buy/sell signal accuracy of over 92%. These findings demonstrate how machine learning models can assist investors and traders of cryptocurrencies in making wise/informed decisions in a very volatile market.

Updated: 2024-10-09 14:29:50

标题: 使用增强技术指标集成和分类模型预测比特币市场趋势

摘要: 由于加密货币和股票市场迅速扩大，交易因其高利润潜力而变得越来越吸引投资者。然而，由于金融市场复杂而动态，准确预测价格仍然是一个重大挑战。加密货币市场的波动性使交易员和投资者更难做出决策。本研究提出了一种基于分类的机器学习模型，用于预测加密货币市场的走势，即价格是上涨还是下跌。该模型使用历史数据和重要的技术指标进行训练，如移动平均收敛差离、相对强度指数和布林带。我们通过对比特币收盘价的实证研究来说明我们的方法。通过几个模拟，包括混淆矩阵和受试者工作特征曲线，来评估模型的性能，结果显示买入/卖出信号准确率超过92%。这些发现表明，机器学习模型可以帮助加密货币投资者和交易员在一个非常波动的市场中做出明智的决策。

更新时间: 2024-10-09 14:29:50

领域: cs.LG

下载: http://arxiv.org/abs/2410.06935v1

Gaitor: Learning a Unified Representation Across Gaits for Real-World Quadruped Locomotion

The current state-of-the-art in quadruped locomotion is able to produce a variety of complex motions. These methods either rely on switching between a discrete set of skills or learn a distribution across gaits using complex black-box models. Alternatively, we present Gaitor, which learns a disentangled and 2D representation across locomotion gaits. This learnt representation forms a planning space for closed-loop control delivering continuous gait transitions and perceptive terrain traversal. Gaitor's latent space is readily interpretable and we discover that during gait transitions, novel unseen gaits emerge. The latent space is disentangled with respect to footswing heights and lengths. This means that these gait characteristics can be varied independently in the 2D latent representation. Together with a simple terrain encoding and a learnt planner operating in the latent space, Gaitor can take motion commands including desired gait type and swing characteristics all while reacting to uneven terrain. We evaluate Gaitor in both simulation and the real world on the ANYmal C platform. To the best of our knowledge, this is the first work learning a unified and interpretable latent space for multiple gaits, resulting in continuous blending between different locomotion modes on a real quadruped robot. An overview of the methods and results in this paper is found at https://youtu.be/eVFQbRyilCA.

Updated: 2024-10-09 14:27:51

标题: Gaitor：学习实现四足动物在现实世界中步态统一表示

摘要: 目前四足动物运动学的最新技术能够产生各种复杂的动作。这些方法要么依赖于在一组离散技能之间切换，要么使用复杂的黑盒模型学习步态分布。相反，我们提出了Gaitor，它学习了在运动步态之间解耦的2D表示。这种学习表示形成了一个用于闭环控制的规划空间，实现了连续的步态过渡和感知地形遍历。Gaitor的潜在空间是容易解释的，我们发现在步态过渡过程中，新颖的未见步态会出现。潜在空间与脚摆高度和长度解耦。这意味着这些步态特征可以在2D潜在表示中独立变化。结合简单的地形编码和在潜在空间中运行的学习规划器，Gaitor可以接受包括期望步态类型和摆动特性在内的运动命令，同时对不平整地形做出反应。我们在ANYmal C平台的模拟和实际世界中评估了Gaitor。据我们所知，这是第一项为多种步态学习统一且可解释的潜在空间的工作，从而在真实四足机器人上实现不同运动模式之间的连续融合。本文中的方法和结果概述可在https://youtu.be/eVFQbRyilCA找到。

更新时间: 2024-10-09 14:27:51

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.19452v2

Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models

In this study, we propose LLM agents as a novel approach in behavioral strategy research, complementing simulations and laboratory experiments to advance our understanding of cognitive processes in decision-making. Specifically, we reproduce a human laboratory experiment in behavioral strategy using large language model (LLM) generated agents and investigate how LLM agents compare to observed human behavior. Our results show that LLM agents effectively reproduce search behavior and decision-making comparable to humans. Extending our experiment, we analyze LLM agents' simulated "thoughts," discovering that more forward-looking thoughts correlate with favoring exploitation over exploration to maximize wealth. We show how this new approach can be leveraged in behavioral strategy research and address limitations.

Updated: 2024-10-09 14:26:20

标题: 使用大型语言模型复制和扩展行为策略实验

摘要: 在这项研究中，我们提出LLM代理作为行为策略研究中的一种新方法，补充了模拟和实验室实验，以促进我们对决策中认知过程的理解。具体来说，我们使用大型语言模型（LLM）生成的代理重新现了一个人类实验室实验，在行为策略中探讨了LLM代理与观察到的人类行为之间的比较。我们的结果表明，LLM代理有效地重现了与人类相当的搜索行为和决策过程。扩展我们的实验，我们分析了LLM代理的模拟“思考”，发现更具前瞻性的思考与更倾向于利用而非探索以最大化财富的行为相关。我们展示了这种新方法如何可以在行为策略研究中发挥作用并解决局限性。

更新时间: 2024-10-09 14:26:20

领域: econ.GN,cs.AI,q-fin.EC

下载: http://arxiv.org/abs/2410.06932v1

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical applications often pose complex tasks for which no prior knowledge about good hyperparameters and reward functions is available, thus necessitating their derivation from scratch. Prior work has examined automatically tuning either hyperparameters or reward functions individually. We demonstrate empirically that an RL algorithm's hyperparameter configurations and reward function are often mutually dependent, meaning neither can be fully optimised without appropriate values for the other. We then propose a methodology for the combined optimisation of hyperparameters and the reward function. Furthermore, we include a variance penalty as an optimisation objective to improve the stability of learned policies. We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic on four environments. Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others, with only a minor increase in computational costs. This suggests that combined optimisation should be best practice.

Updated: 2024-10-09 14:24:24

标题: 结合超参数的自动优化和奖励形状

摘要: 近年来，深度强化学习（RL）取得了显著进展。然而，即使对专家来说，寻找合适的超参数配置和奖励函数仍然具有挑战性，并且性能在很大程度上取决于这些设计选择。此外，大多数RL研究是在已知基准上进行的，其中已经存在关于这些选择的知识。然而，新颖的实际应用通常提出了复杂的任务，其中没有关于良好超参数和奖励函数的先前知识可用，因此需要从头开始推导它们。先前的工作已经研究了自动调整超参数或奖励函数。我们通过实验证明，RL算法的超参数配置和奖励函数通常是相互依赖的，意味着在没有适当值的情况下，两者都无法完全优化。然后，我们提出了一种同时优化超参数和奖励函数的方法论。此外，我们将方差惩罚作为优化目标，以提高学习策略的稳定性。我们在四个环境中使用Proximal Policy Optimisation和Soft Actor-Critic进行了大量实验。我们的结果表明，结合优化显著改善了半数环境的基准性能，并在其他环境中取得了竞争性能，而计算成本只有轻微增加。这表明，结合优化应该是最佳实践。

更新时间: 2024-10-09 14:24:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.18293v2

RouteFinder: Towards Foundation Models for Vehicle Routing Problems

This paper introduces RouteFinder, a comprehensive foundation model framework to tackle different Vehicle Routing Problem (VRP) variants. Our core idea is that a foundation model for VRPs should be able to represent variants by treating each as a subset of a generalized problem equipped with different attributes. We propose a unified VRP environment capable of efficiently handling any attribute combination. The RouteFinder model leverages a modern transformer-based encoder and global attribute embeddings to improve task representation. Additionally, we introduce two reinforcement learning techniques to enhance multi-task performance: mixed batch training, which enables training on different variants at once, and multi-variant reward normalization to balance different reward scales. Finally, we propose efficient adapter layers that enable fine-tuning for new variants with unseen attributes. Extensive experiments on 24 VRP variants show RouteFinder achieves competitive results. Our code is openly available at https://github.com/ai4co/routefinder.

Updated: 2024-10-09 14:23:24

标题: RouteFinder: 为车辆路径问题建立基础模型

摘要: 本文介绍了RouteFinder，这是一个全面的基础模型框架，用于解决不同的车辆路径问题（VRP）变体。我们的核心思想是，VRP的基础模型应该能够通过将每个变体视为配备不同属性的广义问题的子集来表示变体。我们提出了一个统一的VRP环境，能够高效处理任何属性组合。RouteFinder模型利用了基于现代转换器的编码器和全局属性嵌入来改进任务表示。此外，我们引入了两种强化学习技术来增强多任务性能：混合批量训练，可以同时训练不同的变体，以及多变体奖励归一化，以平衡不同的奖励尺度。最后，我们提出了有效的适配器层，可实现对具有未见属性的新变体进行微调。对24种VRP变体的广泛实验表明，RouteFinder取得了竞争性的结果。我们的代码可以在https://github.com/ai4co/routefinder上公开获取。

更新时间: 2024-10-09 14:23:24

领域: cs.AI

下载: http://arxiv.org/abs/2406.15007v2

Estimating Exoplanet Mass using Machine Learning on Incomplete Datasets

The exoplanet archive is an incredible resource of information on the properties of discovered extrasolar planets, but statistical analysis has been limited by the number of missing values. One of the most informative bulk properties is planet mass, which is particularly challenging to measure with more than 70\% of discovered planets with no measured value. We compare the capabilities of five different machine learning algorithms that can utilize multidimensional incomplete datasets to estimate missing properties for imputing planet mass. The results are compared when using a partial subset of the archive with a complete set of six planet properties, and where all planet discoveries are leveraged in an incomplete set of six and eight planet properties. We find that imputation results improve with more data even when the additional data is incomplete, and allows a mass prediction for any planet regardless of which properties are known. Our favored algorithm is the newly developed $k$NN$\times$KDE, which can return a probability distribution for the imputed properties. The shape of this distribution can indicate the algorithm's level of confidence, and also inform on the underlying demographics of the exoplanet population. We demonstrate how the distributions can be interpreted with a series of examples for planets where the discovery was made with either the transit method, or radial velocity method. Finally, we test the generative capability of the $k$NN$\times$KDE to create a large synthetic population of planets based on the archive, and identify potential categories of planets from groups of properties in the multidimensional space. All codes are Open Source.

Updated: 2024-10-09 14:19:33

标题: 使用机器学习在不完整数据集上估计外行星质量

摘要: 外行星档案是一个关于已发现的系外行星性质的信息资源，但由于缺失数值的数量限制了统计分析的范围。其中一个最具信息量的整体性质是行星质量，这在测量时尤为具有挑战性，因为超过70％的已发现行星没有测量数值。我们比较了五种不同的机器学习算法的能力，这些算法可以利用多维不完整数据集来估计缺失的属性，以推测行星质量。当使用存档的部分子集，其中包含完整的六个行星属性，以及在不完整的六个和八个行星属性集中利用所有行星发现时，我们对结果进行了比较。我们发现，即使额外数据是不完整的，使用更多数据可以改善插补结果，并且可以为任何行星预测质量，而不管了解哪些属性。我们钟爱的算法是新开发的$k$NN$\times$KDE，它可以返回插补属性的概率分布。这个分布的形状可以指示算法的信心水平，并且还可以提供关于系外行星族群的基础人口统计信息。我们演示了如何通过一系列例子来解释这些分布，这些例子中的行星是通过凌日法或径向速度法发现的。最后，我们测试了$k$NN$\times$KDE的生成能力，根据存档创建了一个大规模的合成行星族群，并从多维空间中的属性组中识别潜在的行星类别。所有代码均为开源的。

更新时间: 2024-10-09 14:19:33

领域: astro-ph.EP,astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2410.06922v1

Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility

Recent works have shown theoretically and empirically that redundant data dimensions are a source of adversarial vulnerability. However, the inverse doesn't seem to hold in practice; employing dimension-reduction techniques doesn't exhibit robustness as expected. In this work, we consider classification tasks and characterize the data distribution as a low-dimensional manifold, with high/low variance features defining the on/off manifold direction. We argue that clean training experiences poor convergence in the off-manifold direction caused by the ill-conditioning in widely used first-order optimizers like gradient descent. The poor convergence then acts as a source of adversarial vulnerability when the dataset is inseparable in the on-manifold direction. We provide theoretical results for logistic regression and a 2-layer linear network on the considered data distribution. Furthermore, we advocate using second-order methods that are immune to ill-conditioning and lead to better robustness. We perform experiments and exhibit tremendous robustness improvements in clean training through long training and the employment of second-order methods, corroborating our framework. Additionally, we find the inclusion of batch-norm layers hinders such robustness gains. We attribute this to differing implicit biases between traditional and batch-normalized neural networks.

Updated: 2024-10-09 14:18:52

标题: 对抗性脆弱性作为在流形上不可分离性的结果

摘要: 最近的研究在理论和实证上表明，冗余的数据维度是对抗性脆弱性的一个来源。然而，在实践中似乎并不成立；采用降维技术并没有像预期的那样表现出鲁棒性。在这项工作中，我们考虑分类任务，并将数据分布描述为一个低维流形，高/低方差特征定义了流形方向的开/关。我们认为，在广泛使用的梯度下降等一阶优化器中存在的病态导致了在流形方向上的不良收敛，从而作为对抗性脆弱性的一个来源，特别是当数据集在流形方向上不可分时。我们为所考虑的数据分布上的逻辑回归和2层线性网络提供了理论结果。此外，我们主张使用对病态鲁棒且能够实现更好鲁棒性的二阶方法。我们进行实验，并通过长时间训练和使用二阶方法展示了干净训练的巨大鲁棒性改进，证实了我们的框架。此外，我们发现批归一化层的加入阻碍了这种鲁棒性的提升。我们将这归因于传统神经网络和批归一化神经网络之间的不同隐含偏差。

更新时间: 2024-10-09 14:18:52

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.06921v1

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Image-text representation learning forms a cornerstone in vision-language models, where pairs of images and textual descriptions are contrastively aligned in a shared embedding space. Since visual and textual concepts are naturally hierarchical, recent work has shown that hyperbolic space can serve as a high-potential manifold to learn vision-language representation with strong downstream performance. In this work, for the first time we show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs. We propose Compositional Entailment Learning for hyperbolic vision-language models. The idea is that an image is not only described by a sentence but is itself a composition of multiple object boxes, each with their own textual description. Such information can be obtained freely by extracting nouns from sentences and using openly available localized grounding models. We show how to hierarchically organize images, image boxes, and their textual descriptions through contrastive and entailment-based objectives. Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning, as well as recent hyperbolic alternatives, with better zero-shot and retrieval generalization and clearly stronger hierarchical performance.

Updated: 2024-10-09 14:12:50

标题: 超几何视觉语言模型的组合蕴含学习

摘要: 图文表示学习是视觉语言模型中的基石，其中图像和文本描述的配对在共享嵌入空间中进行对比对齐。由于视觉和文本概念在本质上是层次化的，最近的研究表明，双曲空间可以作为学习具有强大下游性能的视觉语言表示的高潜力流形。在这项工作中，我们首次展示了如何充分利用双曲嵌入的固有层次结构，超越单个图像-文本对。我们提出了用于双曲视觉语言模型的组合蕴涵学习。其核心思想是图像不仅由一个句子描述，而且本身是由多个物体框组成，每个框都有自己的文本描述。这样的信息可以通过从句子中提取名词并使用公开可用的本地化基础模型自由获取。我们展示了如何通过对比和蕴涵为目标，分层组织图像、图像框和它们的文本描述。经验评估表明，在使用数百万个图像-文本对训练的双曲视觉语言模型上，所提出的组合学习方法优于传统的欧几里得CLIP学习，以及最近的双曲替代方案，在零样本和检索泛化以及明显更强的层次性能方面表现更好。

更新时间: 2024-10-09 14:12:50

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06912v1

Combining Planning and Diffusion for Mobility with Unknown Dynamics

Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are incredibly complex, such as the interaction of an office chair (with a rotating base and five caster wheels) and the ground. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into an abstract high-level, obstacle-aware motion-planning problem that produces a waypoint sequence. We use a short-horizon, relative-motion diffusion policy to achieve the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially when the horizon increases, compared to a diffusion policy trained on long-horizon demonstrations or motion planning assuming a rigidly-attached object (success rate of 8 (versus 0 and 5 respectively) out of 10 runs). Importantly, our learned policy generalizes to new layouts, grasps, chairs, and flooring that induces more friction, without any further training, showing promise for other complex mobile manipulation problems. Project Page: https://yravan.github.io/plannerorderedpolicy/

Updated: 2024-10-09 14:12:28

标题: 结合规划和扩散在未知动态下的移动性

摘要: 操纵长期视野内的大型物体（如仓库中的推车）是可部署机器人系统的重要技能。大型物体需要移动操纵，涉及同时操纵、导航和移动物体。在许多实际情况中，物体动态非常复杂，例如办公椅（带有旋转底座和五个转轮）与地面的相互作用。我们提出了一种分层算法，用于长期视野内部分未知动态的机器人操纵问题。我们观察到，基于扩散的行为克隆对于具有未知动态的短期视野问题非常有效，因此我们将问题分解为一个抽象的高级、障碍感知的运动规划问题，生成一个路径点序列。我们使用短期视野、相对运动扩散策略来按顺序实现路径点。我们在一个需要推拉办公椅的Spot机器人上训练移动操纵策略。与在长期视野演示或假设刚性附着物体的运动规划上训练的扩散策略相比，我们的分层操纵策略表现更好，特别是当视野增加时（在10次运行中的成功率分别为8（与0和5相比）。重要的是，我们学习的策略可以泛化到新的布局、抓取、椅子和引起更多摩擦的地板上，无需进一步训练，显示了对其他复杂移动操纵问题的潜力。项目页面：https://yravan.github.io/plannerorderedpolicy/

更新时间: 2024-10-09 14:12:28

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.06911v1

Complexity Assessment of Analog and Digital Security Primitives Signals Using the Disentropy of Autocorrelation

The study of regularity in signals can be of great importance, typically in medicine to analyse electrocardiogram (ECG) or electromyography (EMG) signals, but also in climate studies, finance or security. In this work we focus on security primitives such as Physical Unclonable Functions (PUFs) or Pseudo-Random Number Generators (PRNGs). Such primitives must have a high level of complexity or entropy in their responses to guarantee enough security for their applications. There are several ways of assessing the complexity of their responses, especially in the binary domain. With the development of analog PUFs such as optical (photonic) PUFs, it would be useful to be able to assess their complexity in the analog domain when designing them, for example, before converting analog signals into binary. In this numerical study, we decided to explore the potential of the disentropy of autocorrelation as a measure of complexity for security primitives as PUFs, TRNGs or PRNGs with analog output or responses. We compare this metric to others used to assess regularities in analog signals such as Approximate Entropy (ApEn) and Fuzzy Entropy (FuzEn). We show that the disentropy of autocorrelation is able to differentiate between well-known PRNGs and non-optimised or bad PRNGs in the analog and binary domain with a better contrast than ApEn and FuzEn. Next, we show that the disentropy of autocorrelation is able to detect small patterns injected in PUFs responses and then we applied it to photonic PUFs simulations.

Updated: 2024-10-09 14:12:18

标题: 使用自相关的不确定性对模拟和数字安全基元信号的复杂性进行评估

摘要: 信号规律性的研究可能具有重要意义，通常用于医学领域分析心电图（ECG）或肌电图（EMG）信号，同时也用于气候研究、金融和安全领域。在本研究中，我们关注安全原语，如物理不可克隆函数（PUFs）或伪随机数生成器（PRNGs）。这些原语在响应中必须具有高复杂性或熵水平，以确保其应用的足够安全性。评估它们响应的复杂性有几种方法，特别是在二进制领域。随着模拟PUFs（如光学（光子）PUFs）的发展，能够在设计它们时评估其模拟域中的复杂性将非常有用，例如，在将模拟信号转换为二进制之前。在这项数字研究中，我们决定探索自相关的非熵作为评估PUFs、TRNGs或PRNGs等具有模拟输出或响应的安全原语复杂性的潜力。我们将这一度量与用于评估模拟信号规律性的其他度量（如近似熵（ApEn）和模糊熵（FuzEn））进行比较。我们展示了自相关的非熵能够在模拟和二进制领域中区分出众所周知的PRNGs和未优化或糟糕的PRNGs，其对比度比ApEn和FuzEn更好。接下来，我们展示了自相关的非熵能够检测注入到PUFs响应中的小模式，然后我们将其应用于光子PUFs模拟。

更新时间: 2024-10-09 14:12:18

领域: cs.CR

下载: http://arxiv.org/abs/2402.17488v2

Principal Orthogonal Latent Components Analysis (POLCA Net)

Representation learning is a pivotal area in the field of machine learning, focusing on the development of methods to automatically discover the representations or features needed for a given task from raw data. Unlike traditional feature engineering, which requires manual crafting of features, representation learning aims to learn features that are more useful and relevant for tasks such as classification, prediction, and clustering. We introduce Principal Orthogonal Latent Components Analysis Network (POLCA Net), an approach to mimic and extend PCA and LDA capabilities to non-linear domains. POLCA Net combines an autoencoder framework with a set of specialized loss functions to achieve effective dimensionality reduction, orthogonality, variance-based feature sorting, high-fidelity reconstructions, and additionally, when used with classification labels, a latent representation well suited for linear classifiers and low dimensional visualization of class distribution as well.

Updated: 2024-10-09 14:04:31

标题: 主正交潜在成分分析（POLCA Net）

摘要: 表征学习是机器学习领域的一个关键领域，重点是开发方法，从原始数据中自动发现为给定任务所需的表示或特征。与传统的特征工程不同，需要手动制作特征，表征学习旨在学习对于分类、预测和聚类等任务更有用和相关的特征。我们介绍了主正交潜在成分分析网络（POLCA Net），这是一种模仿和扩展PCA和LDA能力到非线性领域的方法。POLCA Net结合了自动编码器框架和一组专门的损失函数，以实现有效的降维、正交性、基于方差的特征排序、高保真重构，此外，当与分类标签一起使用时，还适合线性分类器和类别分布的低维可视化的潜在表示。

更新时间: 2024-10-09 14:04:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07289v1

Secure Software/Hardware Hybrid In-Field Testing for System-on-Chip

Modern Systems-on-Chip (SoCs) incorporate built-in self-test (BIST) modules deeply integrated into the device's intellectual property (IP) blocks. Such modules handle hardware faults and defects during device operation. As such, BIST results potentially reveal the internal structure and state of the device under test (DUT) and hence open attack vectors. So-called result compaction can overcome this vulnerability by hiding the BIST chain structure but introduces the issues of aliasing and invalid signatures. Software-BIST provides a flexible solution, that can tackle these issues, but suffers from limited observability and fault coverage. In this paper, we hence introduce a low-overhead software/hardware hybrid approach that overcomes the mentioned limitations. It relies on (a) keyed-hash message authentication code (KMAC) available on the SoC providing device-specific secure and valid signatures with zero aliasing and (b) the SoC processor for test scheduling hence increasing DUT availability. The proposed approach offers both on-chip- and remote-testing capabilities. We showcase a RISC-V-based SoC to demonstrate our approach, discussing system overhead and resulting compaction rates.

Updated: 2024-10-09 14:01:46

标题: 在现场测试中用于片上系统的安全软硬件混合测试

摘要: 现代系统芯片（SoCs）集成了内置自测试（BIST）模块，深度集成到设备的知识产权（IP）块中。这些模块在设备操作过程中处理硬件故障和缺陷。因此，BIST结果可能会揭示被测试设备（DUT）的内部结构和状态，从而打开攻击通道。所谓的结果压缩可以通过隐藏BIST链结构来克服这种脆弱性，但会引入别名和无效签名的问题。软件-BIST提供了一种灵活的解决方案，可以解决这些问题，但受限于有限的可观察性和故障覆盖。因此，在本文中，我们介绍一种低开销的软件/硬件混合方法，克服了上述限制。它依赖于（a）SoC上可用的密钥哈希消息认证码（KMAC）提供具有零别名的设备特定安全和有效签名，以及（b）SoC处理器用于测试调度，从而提高DUT的可用性。所提出的方法提供了芯片内和远程测试功能。我们展示了基于RISC-V的SoC来演示我们的方法，讨论系统开销和结果压缩率。

更新时间: 2024-10-09 14:01:46

领域: cs.AR,cs.CR

下载: http://arxiv.org/abs/2410.05109v2

Average Certified Radius is a Poor Metric for Randomized Smoothing

Randomized smoothing is a popular approach for providing certified robustness guarantees against adversarial attacks, and has become a very active area of research. Over the past years, the average certified radius (ACR) has emerged as the single most important metric for comparing methods and tracking progress in the field. However, in this work, we show that ACR is an exceptionally poor metric for evaluating robustness guarantees provided by randomized smoothing. We theoretically show not only that a trivial classifier can have arbitrarily large ACR, but also that ACR is much more sensitive to improvements on easy samples than on hard ones. Empirically, we confirm that existing training strategies that improve ACR reduce the model's robustness on hard samples. Further, we show that by focusing on easy samples, we can effectively replicate the increase in ACR. We develop strategies, including explicitly discarding hard samples, reweighing the dataset with certified radius, and extreme optimization for easy samples, to achieve state-of-the-art ACR, although these strategies ignore robustness for the general data distribution. Overall, our results suggest that ACR has introduced a strong undesired bias to the field, and better metrics are required to holistically evaluate randomized smoothing.

Updated: 2024-10-09 13:58:41

标题: 平均认证半径是随机平滑的一个糟糕度量标准

摘要: 随机平滑是提供对抗性攻击的认证鲁棒性保证的一种流行方法，已经成为一个非常活跃的研究领域。在过去几年中，平均认证半径（ACR）已经成为比较方法和跟踪领域进展最重要的指标。然而，在本研究中，我们展示了ACR是评估随机平滑提供的鲁棒性保证的一个异常糟糕的指标。我们理论上展示了一个微不足道的分类器不仅可以具有任意大的ACR，而且ACR对于在易样本上的改进更加敏感，而对于困难样本的改进则不那么敏感。从经验上看，我们确认现有的训练策略虽然可以改善ACR，但却会降低模型在困难样本上的鲁棒性。此外，我们展示了通过专注于易样本，我们可以有效地复制ACR的增加。我们制定了策略，包括明确丢弃困难样本、使用认证半径对数据集进行重新加权和极端优化易样本，以实现最先进的ACR，尽管这些策略忽略了一般数据分布的鲁棒性。总的来说，我们的结果表明ACR已经为该领域引入了一个强烈的不良偏见，需要更好的指标来全面评估随机平滑。

更新时间: 2024-10-09 13:58:41

领域: cs.LG

下载: http://arxiv.org/abs/2410.06895v1

Applying Quantum Autoencoders for Time Series Anomaly Detection

Anomaly detection is an important problem with applications in various domains such as fraud detection, pattern recognition or medical diagnosis. Several algorithms have been introduced using classical computing approaches. However, using quantum computing for solving anomaly detection problems in time series data is a widely unexplored research field. This paper explores the application of quantum autoencoders to time series anomaly detection. We investigate two primary techniques for classifying anomalies: (1) Analyzing the reconstruction error generated by the quantum autoencoder and (2) latent representation analysis. Our simulated experimental results, conducted across various ansaetze, demonstrate that quantum autoencoders consistently outperform classical deep learning-based autoencoders across multiple datasets. Specifically, quantum autoencoders achieve superior anomaly detection performance while utilizing 60-230 times fewer parameters and requiring five times fewer training iterations. In addition, we implement our quantum encoder on real quantum hardware. Our experimental results demonstrate that quantum autoencoders achieve anomaly detection performance on par with their simulated counterparts.

Updated: 2024-10-09 13:56:28

标题: 应用量子自编码器进行时间序列异常检测

摘要: 异常检测是一个重要的问题，在各个领域都有应用，比如欺诈检测、模式识别或医学诊断。已经提出了几种使用传统计算方法的算法。然而，在解决时间序列数据中的异常检测问题时，利用量子计算是一个广泛未开发的研究领域。本文探讨了量子自编码器在时间序列异常检测中的应用。我们研究了两种主要的分类异常技术：（1）分析量子自编码器生成的重构误差和（2）潜在表示分析。我们的模拟实验结果，跨多种ansaetze进行，表明量子自编码器在多个数据集上始终优于基于经典深度学习的自编码器。具体而言，量子自编码器在利用60-230倍更少的参数和需要五倍更少的训练迭代的情况下实现了优越的异常检测性能。此外，我们在真实量子硬件上实现了我们的量子编码器。我们的实验结果表明，量子自编码器在实现异常检测性能方面与其模拟对应物相当。

更新时间: 2024-10-09 13:56:28

领域: cs.LG,cs.AI,cs.ET,quant-ph

下载: http://arxiv.org/abs/2410.04154v2

Federated Impression for Learning with Distributed Heterogeneous Data

Standard deep learning-based classification approaches may not always be practical in real-world clinical applications, as they require a centralized collection of all samples. Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data, which can help mitigate privacy and data ownership issues. In FL, sub-optimal convergence caused by data heterogeneity is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers. Through experimentation in this study, we show that data heterogeneity leads to the phenomenon of catastrophic forgetting during local training. We propose FedImpres which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as federated impression. To achieve this, we distill the global model resulting from each communication round. Subsequently, we use the synthetic data alongside the local data to enhance the generalization of local training. Extensive experiments show that the proposed method achieves state-of-the-art performance on both the BloodMNIST and Retina datasets, which contain label imbalance and domain shift, with an improvement in classification accuracy of up to 20%.

Updated: 2024-10-09 13:55:01

标题: 分布异构数据学习中的联邦印象

摘要: 标准的基于深度学习的分类方法在真实世界的临床应用中并不总是实用的，因为它们需要集中收集所有样本。联邦学习（FL）提供了一种范例，可以从客户端分布的数据集中学习，而无需要求它们共享数据，这有助于缓解隐私和数据所有权问题。在FL中，由于不同健康中心的数据收集协议和患者人口统计数据的差异，数据异质性引起的次优收敛在来自不同健康中心的数据中很常见。通过本研究的实验，我们展示了数据异质性导致了本地训练中的灾难性遗忘现象。我们提出了FedImpres，通过恢复代表全局信息的合成数据作为联邦印象来缓解灾难性遗忘。为实现这一目标，我们提炼了每次通信轮次产生的全局模型。随后，我们使用合成数据和本地数据来增强本地训练的泛化能力。大量实验证明，所提出的方法在包含标签不平衡和领域转移的BloodMNIST和Retina数据集上取得了最先进的性能，分类准确率提高了高达20%。

更新时间: 2024-10-09 13:55:01

领域: cs.LG,cs.AI,cs.CV,cs.DC

下载: http://arxiv.org/abs/2409.07351v2

LightDE: A Lightweight Method for Eliminating Dangling Pointers

The widespread presence of Use-After-Free (UAF) vulnerabilities poses a serious threat to software security, with dangling pointers being considered the primary cause of these vulnerabilities. However, existing methods for defending against UAF vulnerabilities by eliminating dangling pointers need to interrupt the program's execution when encountering pointer assignment operations in order to store the memory addresses of the pointers in a specific data structure. This makes these methods not lightweight. To overcome this drawback, we propose a novel approach called LightDE. This method does not require storing the memory addresses of pointers during program execution. LightDE uses our proposed structure-sensitive pointer analysis method to determine which objects pointers point to and stores the pointing relationships in the program's data segment during program compilation. Since LightDE only needs to verify if pointers identified by the pointer analysis point to released objects when eliminating dangling pointers, it is very lightweight. Our experimental results show that LightDE can effectively defend against UAF vulnerabilities and the performance overhead it introduces is very low.

Updated: 2024-10-09 13:51:07

标题: LightDE：一种轻量级消除悬空指针的方法

摘要: 广泛存在的Use-After-Free（UAF）漏洞对软件安全构成严重威胁，悬空指针被认为是这些漏洞的主要原因。然而，现有的通过消除悬空指针来防御UAF漏洞的方法需要在遇到指针赋值操作时中断程序执行，以便将指针的内存地址存储在特定数据结构中。这使得这些方法不够轻量级。为了克服这一缺点，我们提出了一种名为LightDE的新方法。该方法在程序执行期间不需要存储指针的内存地址。LightDE利用我们提出的结构敏感指针分析方法来确定指针指向的对象，并在程序编译期间将指向关系存储在程序的数据段中。由于LightDE只需要验证指针分析识别的指针是否指向已释放的对象，因此非常轻量级。我们的实验结果表明，LightDE可以有效防御UAF漏洞，并且引入的性能开销非常低。

更新时间: 2024-10-09 13:51:07

领域: cs.CR

下载: http://arxiv.org/abs/2405.20697v3

Graph Fourier Neural Kernels (G-FuNK): Learning Solutions of Nonlinear Diffusive Parametric PDEs on Multiple Domains

Predicting time-dependent dynamics of complex systems governed by non-linear partial differential equations (PDEs) with varying parameters and domains is a challenging task motivated by applications across various fields. We introduce a novel family of neural operators based on our Graph Fourier Neural Kernels, designed to learn solution generators for nonlinear PDEs in which the highest-order term is diffusive, across multiple domains and parameters. G-FuNK combines components that are parameter- and domain-adapted with others that are not. The domain-adapted components are constructed using a weighted graph on the discretized domain, where the graph Laplacian approximates the highest-order diffusive term, ensuring boundary condition compliance and capturing the parameter and domain-specific behavior. Meanwhile, the learned components transfer across domains and parameters using our variant Fourier Neural Operators. This approach naturally embeds geometric and directional information, improving generalization to new test domains without need for retraining the network. To handle temporal dynamics, our method incorporates an integrated ODE solver to predict the evolution of the system. Experiments show G-FuNK's capability to accurately approximate heat, reaction diffusion, and cardiac electrophysiology equations across various geometries and anisotropic diffusivity fields. G-FuNK achieves low relative errors on unseen domains and fiber fields, significantly accelerating predictions compared to traditional finite-element solvers.

Updated: 2024-10-09 13:46:31

标题: 图傅立叶神经核（G-FuNK）：在多个域上学习非线性扩散参数PDE的解答

摘要: 预测由非线性偏微分方程（PDEs）控制的具有不同参数和域的复杂系统的时间依赖动态是一个具有挑战性的任务，受到各个领域应用的激励。我们引入了一种基于我们的图傅立叶神经核的新型神经算子家族，旨在学习非线性PDEs的解生成器，其中最高阶项是扩散的，跨多个域和参数。G-FuNK结合了适应参数和域的组件与不适应的其他组件。适应域的组件是使用在离散域上的加权图构建的，其中图拉普拉斯近似最高阶扩散项，确保边界条件的符合性并捕捉参数和特定域的行为。同时，学习的组件使用我们的变体傅立叶神经算子在域和参数之间传递。这种方法自然地嵌入了几何和方向信息，提高了对新测试域的泛化能力，无需重新训练网络。为了处理时间动态，我们的方法结合了一个整合的ODE求解器来预测系统的演变。实验表明，G-FuNK能够准确地近似各种几何和各向异性扩散场中的热、反应扩散和心脏电生理方程。与传统的有限元求解器相比，G-FuNK在看不见的领域和纤维场上实现了较低的相对误差，显著加速了预测过程。

更新时间: 2024-10-09 13:46:31

领域: cs.LG,cs.AI,math.SP,stat.ME,stat.ML

下载: http://arxiv.org/abs/2410.04655v2

Adaptive Refinement Protocols for Distributed Distribution Estimation under $\ell^p$-Losses

Consider the communication-constrained estimation of discrete distributions under $\ell^p$ losses, where each distributed terminal holds multiple independent samples and uses limited number of bits to describe the samples. We obtain the minimax optimal rates of the problem in most parameter regimes. An elbow effect of the optimal rates at $p=2$ is clearly identified. To show the optimal rates, we first design estimation protocols to achieve them. The key ingredient of these protocols is to introduce adaptive refinement mechanisms, which first generate rough estimate by partial information and then establish refined estimate in subsequent steps guided by the rough estimate. The protocols leverage successive refinement, sample compression and thresholding methods to achieve the optimal rates in different parameter regimes. The optimality of the protocols is shown by deriving compatible minimax lower bounds.

Updated: 2024-10-09 13:46:08

标题: 自适应细化协议用于$\ell^p$-损失下的分布估计

摘要: 考虑在$\ell^p$损失下通信受限的离散分布估计问题，其中每个分布终端持有多个独立样本，并使用有限数量的比特来描述这些样本。我们在大多数参数范围内获得了该问题的极小极大最优速率。明确地识别了$p=2$时最优速率的“拐点”效应。为展示最优速率，我们首先设计了估计协议来实现它们。这些协议的关键要素是引入自适应细化机制，首先通过部分信息生成粗略估计，然后在后续步骤中根据粗略估计建立精细估计。这些协议利用连续细化、样本压缩和阈值方法，在不同参数范围内实现最优速率。通过推导相容的极小极大下界来展示协议的最优性。

更新时间: 2024-10-09 13:46:08

领域: cs.LG,cs.IT,math.IT,math.ST,stat.TH

下载: http://arxiv.org/abs/2410.06884v1

Privately Counting Partially Ordered Data

We consider differentially private counting when each data point consists of $d$ bits satisfying a partial order. Our main technical contribution is a problem-specific $K$-norm mechanism that runs in time $O(d^2)$. Experiments show that, depending on the partial order in question, our solution dominates existing pure differentially private mechanisms, and can reduce their error by an order of magnitude or more.

Updated: 2024-10-09 13:43:35

标题: 私下统计部分有序数据

摘要: 我们考虑在每个数据点包含满足部分顺序的$d$位时的差分隐私计数。我们的主要技术贡献是一个问题特定的$K$-范数机制，运行时间为$O(d^2)$。实验表明，根据所讨论的部分顺序，我们的解决方案优于现有的纯差分隐私机制，并且可以将它们的误差降低一个数量级或更多。

更新时间: 2024-10-09 13:43:35

领域: cs.CR

下载: http://arxiv.org/abs/2410.06881v1

Noise is All You Need: Private Second-Order Convergence of Noisy SGD

Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness, machine unlearning, etc. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz continuity of the loss function, or even convexity) or prove only first-order convergence (and thus might end at a saddle point in the non-convex setting). At the same time, there has been progress in proving second-order convergence of the non-private version of ``noisy SGD'', as well as progress in designing algorithms that are more complex than DP-SGD and do guarantee second-order convergence. We revisit DP-SGD and show that ``noise is all you need'': the noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions, even for non-Lipschitz loss functions. Hence, we get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.

Updated: 2024-10-09 13:43:17

标题: 噪声就是你所需要的：有噪声SGD的私密二阶收敛

摘要: 私人优化是机器学习中一个备受关注的话题，具有差分私有随机梯度下降（DP-SGD）在理论和实践中发挥关键作用。此外，DP-SGD已被认为是一个强大工具，可应用于隐私以外的环境，包括鲁棒性、机器遗忘等。现有的DP-SGD分析要么做出相对强的假设（例如，损失函数的Lipschitz连续性，甚至是凸性），要么只能证明一阶收敛（因此在非凸设置中可能停留在鞍点）。同时，非私有版本的“有噪声SGD”已经取得了二阶收敛的进展，还有设计更复杂比DP-SGD更复杂的算法，可以保证二阶收敛。我们重新审视DP-SGD，并展示“噪声就是你需要的一切”：隐私所需的噪声已经意味着在标准平滑性假设下，即使对于非Lipschitz损失函数，也已经实现了二阶收敛。因此，我们基本上免费获得了二阶收敛：现代私有优化的主力军DP-SGD，在最小的假设下可以用来找到一个二阶稳定点。

更新时间: 2024-10-09 13:43:17

领域: cs.LG

下载: http://arxiv.org/abs/2410.06878v1

Representation Tuning

Activation engineering is becoming increasingly popular as a means of online control of large language models (LLMs). In this work, I extend the idea of active steering with vectors that represent a behavioral direction of interest to tuning those vectors directly into the model, obviating the need for online control. First, I identify activation vectors related to honesty in an open-source LLM (Llama- 2-13b-chat). Next, I demonstrate that model output can be made more or less honest by adding positive or negative multiples of these vectors to residual stream activations during generation. Then, I show that a similar effect can be achieved by fine-tuning the vectors directly into the model, by use of a dual loss function based on the cosine similarity of residual stream activations to the vectors combined with a standard token-based loss ("representation tuning"). Finally, I compare the generations in response to honesty-probing prompts from the resulting models to those from models fine-tuned with a token-based loss alone, and to those from the untuned model subjected to online steering. Overall, fine-tuning the vectors into the models using the cosine similarity plus token loss showed a stronger effect than online steering, and generalized better than using the standard loss, suggesting the potential utility of this approach as a safety measure. Code and data are available at https://github.com/cma1114/representation_tuning; tuned models are available at https://huggingface.co/collections/cackerman/ representation-tuning-66da1e5ab41cd1b824687d9f.

Updated: 2024-10-09 13:39:27

标题: 表现调整

摘要: 激活工程在在线控制大型语言模型（LLMs）方面越来越受欢迎。在这项工作中，我扩展了使用代表感兴趣的行为方向的向量进行主动引导的想法，直接将这些向量调整到模型中，避免了在线控制的需要。首先，我在一个开源LLM（Llama-2-13b-chat）中识别与诚实相关的激活向量。接下来，我展示了通过在生成过程中向残差流激活中添加这些向量的正负倍数，可以使模型的输出更加诚实或不诚实。然后，我展示了通过将这些向量直接调整到模型中，通过基于余弦相似度的双重损失函数与标准基于token的损失相结合（“表示调整”），也可以实现类似的效果。最后，我将根据对比结果模型对诚实探测提示的生成结果，将使用余弦相似度加token损失进行向量微调的模型与仅使用基于token的损失微调的模型以及经过在线控制的未调整模型进行比较。总体而言，使用余弦相似度加token损失将向量微调到模型中显示出比在线控制更强的效果，并且比使用标准损失更好地泛化，暗示了这种方法作为一种安全措施的潜在实用性。代码和数据可在https://github.com/cma1114/representation_tuning获得；调整后的模型可在https://huggingface.co/collections/cackerman/representation-tuning-66da1e5ab41cd1b824687d9f获得。

更新时间: 2024-10-09 13:39:27

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2409.06927v3

Group Shapley Value and Counterfactual Simulations in a Structural Model

We propose a variant of the Shapley value, the group Shapley value, to interpret counterfactual simulations in structural economic models by quantifying the importance of different components. Our framework compares two sets of parameters, partitioned into multiple groups, and applying group Shapley value decomposition yields unique additive contributions to the changes between these sets. The relative contributions sum to one, enabling us to generate an importance table that is as easily interpretable as a regression table. The group Shapley value can be characterized as the solution to a constrained weighted least squares problem. Using this property, we develop robust decomposition methods to address scenarios where inputs for the group Shapley value are missing. We first apply our methodology to a simple Roy model and then illustrate its usefulness by revisiting two published papers.

Updated: 2024-10-09 13:38:59

标题: 群体谢普利值和结构模型中的反事实模拟

摘要: 我们提出了Shapley值的一个变种，即组Shapley值，用于解释结构经济模型中的反事实模拟，通过量化不同组成部分的重要性。我们的框架比较了分成多个组的两组参数，并应用组Shapley值分解得出了这些组之间变化的独特累加贡献。相对贡献总和为一，使我们能够生成一个与回归表一样易于解释的重要性表。组Shapley值可以被描述为受限加权最小二乘问题的解。利用这一性质，我们开发了健壮的分解方法，以解决组Shapley值的输入缺失的情况。我们首先将我们的方法应用于一个简单的罗伊模型，然后通过重新审视两篇已发表的论文来说明其实用性。

更新时间: 2024-10-09 13:38:59

领域: econ.EM,cs.LG,stat.ME

下载: http://arxiv.org/abs/2410.06875v1

Differentially Private Deep Model-Based Reinforcement Learning

We address private deep offline reinforcement learning (RL), where the goal is to train a policy on standard control tasks that is differentially private (DP) with respect to individual trajectories in the dataset. To achieve this, we introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees. PriMORL first learns an ensemble of trajectory-level DP models of the environment from offline data. It then optimizes a policy on the penalized private model, without any further interaction with the system or access to the dataset. In addition to offering strong theoretical foundations, we demonstrate empirically that PriMORL enables the training of private RL agents on offline continuous control tasks with deep function approximations, whereas current methods are limited to simpler tabular and linear Markov Decision Processes (MDPs). We furthermore outline the trade-offs involved in achieving privacy in this setting.

Updated: 2024-10-09 13:31:25

标题: 差分隐私深度模型驱动的强化学习

摘要: 我们探讨了私人深度离线强化学习（RL），目标是在标准控制任务上训练一个政策，该政策在数据集中的个体轨迹方面是差分私有（DP）。为了实现这一目标，我们引入了PriMORL，这是一种具有形式差分隐私保证的基于模型的RL算法。PriMORL首先从离线数据中学习环境的轨迹级DP模型集合。然后，在惩罚私有模型的基础上优化政策，而无需进一步与系统互动或访问数据集。除了提供强大的理论基础外，我们还在实证上证明了PriMORL能够在具有深度函数逼近的离线连续控制任务上训练私有RL代理，而当前方法仅限于更简单的表格和线性马尔可夫决策过程（MDP）。此外，我们还概述了在这种设置中实现隐私的权衡。

更新时间: 2024-10-09 13:31:25

领域: cs.LG,cs.AI,cs.CR,stat.ML

下载: http://arxiv.org/abs/2402.05525v2

Students' Perceptions and Use of Generative AI Tools for Programming Across Different Computing Courses

Investigation of students' perceptions and opinions on the use of generative artificial intelligence (GenAI) in education is a topic gaining much interest. Studies addressing this are typically conducted with large heterogeneous groups, at one moment in time. However, how students perceive and use GenAI tools can potentially depend on many factors, including their background knowledge, familiarity with the tools, and the learning goals and policies of the courses they are taking. In this study we explore how students following computing courses use GenAI for programming-related tasks across different programs and courses: Bachelor and Master, in courses in which learning programming is the learning goal, courses that require programming as a means to achieve another goal, and in courses in which programming is optional, but can be useful. We are also interested in changes over time, since GenAI capabilities are changing at a fast pace, and users are adopting GenAI increasingly. We conducted three consecutive surveys (fall `23, winter `23, and spring `24) among students of all computing programs of a large European research university. We asked questions on the use in education, ethics, and job prospects, and we included specific questions on the (dis)allowed use of GenAI tools in the courses they were taking at the time. We received 264 responses, which we quantitatively and qualitatively analyzed, to find out how students have employed GenAI tools across 59 different computing courses, and whether the opinion of an average student about these tools evolves over time. Our study contributes to the emerging discussion of how to differentiate GenAI use across different courses, and how to align its use with the learning goals of a computing course.

Updated: 2024-10-09 13:24:06

标题: 学生对不同计算课程中用于编程的生成式人工智能工具的看法和使用情况

摘要: 学生对在教育中使用生成人工智能（GenAI）的看法和意见是一个备受关注的话题。通常，研究这一问题的研究是针对大型异质群体进行的，且只在某一时间点进行。然而，学生如何看待和使用GenAI工具可能取决于许多因素，包括他们的背景知识、对工具的熟悉程度以及他们所上课程的学习目标和政策。在这项研究中，我们探讨了在计算课程中学生如何使用GenAI进行与编程相关的任务，涵盖不同的程序和课程：学士和硕士课程，学习编程是学习目标的课程，需要编程作为实现其他目标的手段的课程，以及编程是可选的但可能有用的课程。我们还对时间变化感兴趣，因为GenAI的能力正在快速变化，用户对GenAI的采用也在增加。我们在一所欧洲大型研究型大学的所有计算机课程的学生中进行了三次连续调查（23年秋季，23年冬季和24年春季）。我们询问了关于教育中的使用、伦理和就业前景的问题，并包括了关于他们当时所上课程中GenAI工具的（不）允许使用的具体问题。我们收到了264份回复，我们对这些回复进行了定量和定性分析，以了解学生如何在59门不同的计算课程中使用GenAI工具，以及一般学生对这些工具的看法是否随时间而演变。我们的研究有助于探讨如何区分不同课程中的GenAI使用，并如何将其使用与计算机课程的学习目标保持一致的讨论。

更新时间: 2024-10-09 13:24:06

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2410.06865v1

Crafting desirable climate trajectories with RL explored socio-environmental simulations

Climate change poses an existential threat, necessitating effective climate policies to enact impactful change. Decisions in this domain are incredibly complex, involving conflicting entities and evidence. In the last decades, policymakers increasingly use simulations and computational methods to guide some of their decisions. Integrated Assessment Models (IAMs) are one of such methods, which combine social, economic, and environmental simulations to forecast potential policy effects. For example, the UN uses outputs of IAMs for their recent Intergovernmental Panel on Climate Change (IPCC) reports. Traditionally these have been solved using recursive equation solvers, but have several shortcomings, e.g. struggling at decision making under uncertainty. Recent preliminary work using Reinforcement Learning (RL) to replace the traditional solvers shows promising results in decision making in uncertain and noisy scenarios. We extend on this work by introducing multiple interacting RL agents as a preliminary analysis on modelling the complex interplay of socio-interactions between various stakeholders or nations that drives much of the current climate crisis. Our findings show that cooperative agents in this framework can consistently chart pathways towards more desirable futures in terms of reduced carbon emissions and improved economy. However, upon introducing competition between agents, for instance by using opposing reward functions, desirable climate futures are rarely reached. Modelling competition is key to increased realism in these simulations, as such we employ policy interpretation by visualising what states lead to more uncertain behaviour, to understand algorithm failure. Finally, we highlight the current limitations and avenues for further work to ensure future technology uptake for policy derivation.

Updated: 2024-10-09 13:21:50

标题: 使用强化学习探索社会-环境模拟的可取气候发展路径制定

摘要: 气候变化构成了一种存在威胁，需要有效的气候政策来实施有影响的改变。在这个领域的决策非常复杂，涉及到冲突的实体和证据。在过去的几十年中，政策制定者越来越多地使用模拟和计算方法来指导他们的一些决策。综合评估模型（IAMs）是这种方法之一，它结合了社会、经济和环境模拟，以预测潜在的政策效果。例如，联合国使用IAMs的输出来支持他们最近的《政府间气候变化专门委员会》（IPCC）报告。传统上，这些问题已经通过递归方程求解器来解决，但存在一些缺点，例如在不确定性下做决策时遇到困难。最近的初步工作使用强化学习（RL）来取代传统求解器，在不确定和嘈杂的情况下做出决策显示出有希望的结果。我们在这项工作中进一步引入了多个相互作用的RL代理，作为对模拟推动当前气候危机中许多社会互动之间复杂相互作用的初步分析。我们的研究结果表明，在这种框架中，合作代理可以始终开辟通往更理想未来的途径，即减少碳排放和改善经济。然而，在代理之间引入竞争，例如使用相反的奖励函数，很少达到理想的气候未来。建模竞争对于增加这些模拟的现实性至关重要，因此我们通过可视化政策解释来理解哪些状态会导致更不确定的行为，以了解算法失败。最后，我们强调当前的局限和进一步工作的途径，以确保未来技术的接受度用于政策制定。

更新时间: 2024-10-09 13:21:50

领域: physics.soc-ph,cs.AI

下载: http://arxiv.org/abs/2410.07287v1

Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning

There is growing research interest in measuring the statistical heterogeneity of clients' local datasets. Such measurements are used to estimate the suitability for collaborative training of personalized federated learning (PFL) models. Currently, these research endeavors are taking place in silos and there is a lack of a unified benchmark to provide a fair and convenient comparison among various approaches in common settings. We aim to bridge this important gap in this paper. The proposed benchmarking framework currently includes six representative approaches. Extensive experiments have been conducted to compare these approaches under five standard non-IID FL settings, providing much needed insights into which approaches are advantageous under which settings. The proposed framework offers useful guidance on the suitability of various data divergence measures in FL systems. It is beneficial for keeping related research activities on the right track in terms of: (1) designing PFL schemes, (2) selecting appropriate data heterogeneity evaluation approaches for specific FL application scenarios, and (3) addressing fairness issues in collaborative model training. The code is available at https://github.com/Xiaoni-61/DH-Benchmark.

Updated: 2024-10-09 13:16:02

标题: 基于个性化联邦学习的数据异质性评估方法基准测试

摘要: 越来越多的研究对客户本地数据集的统计异质性进行测量感兴趣。这些测量用于估计个性化联邦学习（PFL）模型的协作训练适用性。目前，这些研究工作正在独立进行，缺乏一个统一的基准来在常见环境中提供公平和方便的比较。本文旨在填补这一重要空白。提出的基准框架目前包括六种代表性方法。进行了大量实验，比较了这些方法在五种标准的非IID FL设置下的表现，为了解哪些方法在哪些设置下具有优势提供了非常需要的见解。提出的框架为FL系统中各种数据差异度量的适用性提供了有用的指导。这对于在以下方面保持相关研究活动走在正确的轨道上是有益的：（1）设计PFL方案，（2）为特定的FL应用场景选择适当的数据异质性评估方法，以及（3）解决协作模型训练中的公平性问题。代码可在https://github.com/Xiaoni-61/DH-Benchmark获取。

更新时间: 2024-10-09 13:16:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.07286v1

Understanding Model Ensemble in Transferable Adversarial Attack

Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigidly explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply the latest mathematical tools in information theory to bound the transferability error using complexity and generalization terms, contributing to three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.

Updated: 2024-10-09 13:14:11

标题: 理解可转移对抗攻击中的模型集成

摘要: 模型集成对抗攻击已成为一种强大的方法，用于生成可针对甚至未知模型的可传递对抗样本，但其理论基础仍未得到充分探讨。为解决这一差距，我们提供了早期的理论见解，作为推动模型集成对抗攻击的路线图。我们首先定义转移误差，以衡量对抗传递性中的误差，同时引入多样性和经验模型集成Rademacher复杂度的概念。然后，我们将转移误差分解为易受攻击性、多样性和常数，从而刚性地解释了模型集成攻击中转移误差的起源：对集成组件的对抗样本的易受攻击性，以及集成组件的多样性。此外，我们应用最新的信息论数学工具，利用复杂性和泛化项约束转移误差，为减少转移误差提供三项实用指导原则：（1）加入更多替代模型，（2）增加它们的多样性，以及（3）在过拟合情况下减少它们的复杂性。最后，通过对54个模型进行广泛实验验证了我们的理论框架，代表了对理解可传递模型集成对抗攻击迈出的重要一步。

更新时间: 2024-10-09 13:14:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06851v1

On the Security and Design of Cryptosystems Using Gabidulin-Kronecker Product Codes

This paper is a preliminary study on the security and design of cryptosystems using Gabidulin-Kronecker Product Codes. In particular, we point out the design impracticality of the system, and propose ways to improve it.

Updated: 2024-10-09 13:09:29

标题: 关于使用Gabidulin-Kronecker乘积码的密码系统安全性和设计

摘要: 这篇论文是关于使用Gabidulin-Kronecker积码的密码系统安全性和设计的初步研究。我们特别指出了该系统设计的不切实际性，并提出了改进方法。

更新时间: 2024-10-09 13:09:29

领域: cs.CR

下载: http://arxiv.org/abs/2410.06849v1

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

Federated Unlearning (FU) enables clients to selectively remove the influence of specific data from a trained federated learning model, addressing privacy concerns and regulatory requirements. However, existing FU methods often struggle to balance effective erasure with model utility preservation, especially for class-level unlearning in non-IID settings. We propose Federated Unlearning via Class-aware Representation Transformation (FUCRT), a novel method that achieves unlearning through class-aware representation transformation. FUCRT employs two key components: (1) a transformation class selection strategy to identify optimal forgetting directions, and (2) a transformation alignment technique using dual class-aware contrastive learning to ensure consistent transformations across clients. Extensive experiments on four datasets demonstrate FUCRT's superior performance in terms of erasure guarantee, model utility preservation, and efficiency. FUCRT achieves complete (100\%) erasure of unlearning classes while maintaining or improving performance on remaining classes, outperforming state-of-the-art baselines across both IID and Non-IID settings. Analysis of the representation space reveals FUCRT's ability to effectively merge unlearning class representations with the transformation class from remaining classes, closely mimicking the model retrained from scratch.

Updated: 2024-10-09 13:08:14

标题: 通过转换实现遗忘：通过课程感知表示转换实现联合遗忘

摘要: 联邦遗忘（FU）使客户能够有选择性地从训练好的联邦学习模型中删除特定数据的影响，从而解决隐私和监管要求的问题。然而，现有的FU方法通常很难在非独立同分布的情况下平衡有效的擦除和模型效用保存，尤其是在类别级别的遗忘方面。我们提出了一种新颖的方法，即通过类别感知表示转换（FUCRT）实现遗忘，该方法通过类别感知表示转换实现遗忘。FUCRT采用两个关键组件：（1）转换类选择策略，以确定最佳遗忘方向，（2）使用双类别感知对比学习的转换对齐技术，确保在客户端之间实现一致的转换。在四个数据集上进行的广泛实验表明，FUCRT在擦除保证、模型效用保存和效率方面具有优越性能。FUCRT在完全擦除（100\%）遗忘类别的同时，保持或提高了对剩余类别的性能，优于IID和非IID设置下的最先进基线。对表示空间的分析揭示了FUCRT有效地将遗忘类别的表示与剩余类别的转换类别合并，与从头开始重新训练的模型密切相似。

更新时间: 2024-10-09 13:08:14

领域: cs.LG

下载: http://arxiv.org/abs/2410.06848v1

A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering

This paper proposes a safety modulator actor-critic (SMAC) method to address safety constraint and overestimation mitigation in model-free safe reinforcement learning (RL). A safety modulator is developed to satisfy safety constraints by modulating actions, allowing the policy to ignore safety constraint and focus on maximizing reward. Additionally, a distributional critic with a theoretical update rule for SMAC is proposed to mitigate the overestimation of Q-values with safety constraints. Both simulation and real-world scenarios experiments on Unmanned Aerial Vehicles (UAVs) hovering confirm that the SMAC can effectively maintain safety constraints and outperform mainstream baseline algorithms.

Updated: 2024-10-09 13:07:24

标题: 一种安全调节器演员-评论家方法在无模型安全强化学习中的应用及其在无人机悬停中的应用

摘要: 本文提出了一种安全调节器 actor-critic（SMAC）方法，用于解决无模型安全强化学习中的安全约束和过度估计问题。通过开发一个安全调节器来满足安全约束，从而调节行为，使策略可以忽略安全约束并专注于最大化奖励。此外，提出了一个具有理论更新规则的分布式评论者，用于减轻在安全约束下 Q 值的过度估计。在无人机悬停的模拟和真实场景实验中证实了SMAC可以有效地维持安全约束，并胜过主流基准算法。

更新时间: 2024-10-09 13:07:24

领域: cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.06847v1

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Architectures such as Linformer and Mamba have recently emerged as competitive linear time replacements for transformers. However, corresponding large pretrained models are often unavailable, especially in non-text domains. To remedy this, we present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model to a linear time substitute and fine-tunes it to a target task. We also compare several means to guide the fine-tuning to optimally retain the desired inference capability from the original model. The methods differ in their use of the target model and the trajectory of the parameters. In a series of empirical studies on language processing, language modeling, and speech processing, we show that CALD can effectively recover the result of the original model, and that the guiding strategy contributes to the result. Some reasons for the variation are suggested.

Updated: 2024-10-09 13:06:43

标题: 预训练语音和语言模型的联合微调和转换朝向线性复杂度

摘要: 最近出现了像Linformer和Mamba这样的架构，它们被认为是transformer的竞争性线性时间替代品。然而，相应的大型预训练模型通常不可用，特别是在非文本领域。为了解决这个问题，我们提出了一种Cross-Architecture Layerwise Distillation (CALD)方法，它联合将transformer模型转换为线性时间替代品，并对其进行微调以适应目标任务。我们还比较了几种引导微调的方法，以最佳地保留原始模型的所需推理能力。这些方法在使用目标模型和参数轨迹方面有所不同。在一系列关于语言处理、语言建模和语音处理的实证研究中，我们展示了CALD能够有效地恢复原始模型的结果，并且引导策略对结果有所贡献。对变化的一些原因进行了推测。

更新时间: 2024-10-09 13:06:43

领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.06846v1

MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

Mental health disorders are one of the most serious diseases in the world. Most people with such a disease lack access to adequate care, which highlights the importance of training models for the diagnosis and treatment of mental health disorders. However, in the mental health domain, privacy concerns limit the accessibility of personalized treatment data, making it challenging to build powerful models. In this paper, we introduce MentalArena, a self-play framework to train language models by generating domain-specific personalized data, where we obtain a better model capable of making a personalized diagnosis and treatment (as a therapist) and providing information (as a patient). To accurately model human-like mental health patients, we devise Symptom Encoder, which simulates a real patient from both cognition and behavior perspectives. To address intent bias during patient-therapist interactions, we propose Symptom Decoder to compare diagnosed symptoms with encoded symptoms, and dynamically manage the dialogue between patient and therapist according to the identified deviations. We evaluated MentalArena against 6 benchmarks, including biomedicalQA and mental health tasks, compared to 6 advanced models. Our models, fine-tuned on both GPT-3.5 and Llama-3-8b, significantly outperform their counterparts, including GPT-4o. We hope that our work can inspire future research on personalized care. Code is available in https://github.com/Scarelette/MentalArena/tree/main

Updated: 2024-10-09 13:06:40

标题: 心理竞技场：自我对弈训练语言模型用于诊断和治疗心理健康障碍

摘要: 心理健康障碍是世界上最严重的疾病之一。大多数患有这种疾病的人缺乏获得充分护理的途径，这凸显了为诊断和治疗心理健康障碍建立培训模型的重要性。然而，在心理健康领域，隐私问题限制了个性化治疗数据的可访问性，使得构建强大模型具有挑战性。在本文中，我们介绍了MentalArena，一个自我训练框架，通过生成领域特定的个性化数据来训练语言模型，我们得到了一个更好的模型，能够进行个性化诊断和治疗（作为治疗师）并提供信息（作为患者）。为了准确建模类似于人类的心理健康患者，我们设计了Symptom Encoder，从认知和行为的角度模拟真实患者。为了解决患者-治疗师互动过程中的意图偏见，我们提出了Symptom Decoder，将诊断症状与编码症状进行比较，并根据识别到的偏差动态管理患者和治疗师之间的对话。我们对MentalArena进行了与6个基准测试的评估，包括生物医学问答和心理健康任务，与6个先进模型进行了比较。我们的模型，在GPT-3.5和Llama-3-8b上经过精细调整，明显优于它们的对手，包括GPT-4o。我们希望我们的工作能够激发未来个性化护理方面的研究。代码可在https://github.com/Scarelette/MentalArena/tree/main获取。

更新时间: 2024-10-09 13:06:40

领域: cs.CL,cs.AI,cs.MA

下载: http://arxiv.org/abs/2410.06845v1

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests an additional asymmetry of the valley beyond the flat and sharp ones, yet without thoroughly examining its causes or implications. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise for 1D visualization. Our major observation shows that the {\it degree of sign consistency} between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights from the aspects of ReLU activation and softmax function could explain the interesting phenomenon. Our discovery propels novel understanding and applications in the scenario of Model Fusion: (1) the efficacy of interpolating separate models significantly correlates with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach for model parameter alignment.

Updated: 2024-10-09 13:04:29

标题: 探索和利用深度神经网络的不对称谷底

摘要: 探索损失景观可以揭示深度神经网络（DNN）固有原则。最近的研究表明，除了平坦和陡峭之外，山谷还存在额外的不对称性，但尚未彻底考察其原因或影响。我们的研究系统地探讨了影响DNN山谷对称性的因素，包括（1）影响收敛点的数据集、网络架构、初始化和超参数；以及（2）用于1D可视化的噪声的大小和方向。我们的主要观察显示，噪声与收敛点之间的符号一致性程度是山谷对称性的关键指标。从ReLU激活和softmax函数的角度出发的理论洞察可以解释这一有趣现象。我们的发现推动了在模型融合场景中的新理解和应用：（1）插值分离模型的效力与它们的符号一致性比例显著相关，（2）在联邦学习中强调符号对齐，成为模型参数对齐的创新方法。

更新时间: 2024-10-09 13:04:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.12489v4

Counterfactual Concept Bottleneck Models

Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), simulate changes in the situation to evaluate how this impacts class predictions (the "How?"), and imagine how the scenario should change to result in different class predictions (the "Why not?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and improving human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our experimental results demonstrate that CF-CBMs: achieve classification accuracy comparable to black-box models and existing CBMs ("What?"), rely on fewer important concepts leading to simpler explanations ("How?"), and produce interpretable, concept-based counterfactuals ("Why not?"). Additionally, we show that training the counterfactual generator jointly with the CBM leads to two key improvements: (i) it alters the model's decision-making process, making the model rely on fewer important concepts (leading to simpler explanations), and (ii) it significantly increases the causal effect of concept interventions on class predictions, making the model more responsive to these changes.

Updated: 2024-10-09 12:57:37

标题: 反事实概念瓶颈模型

摘要: 目前的深度学习模型并没有设计成能同时解决三个基本问题：预测类别标签以解决给定分类任务（“是什么？”），模拟情况的变化以评估这如何影响类别预测（“怎么回事？”），以及想象情景应该如何改变以导致不同的类别预测（“为什么不？”）。无法回答这些问题代表着在部署可靠的人工智能代理、校准人类信任和改善人机交互方面存在重要差距。为了弥补这一差距，我们引入CounterFactual Concept Bottleneck Models（CF-CBMs），这是一类旨在高效解决上述问题的模型，无需进行事后搜索。我们的实验结果表明，CF-CBMs：实现了与黑匣子模型和现有CBMs相当的分类准确率（“是什么？”），依赖更少的重要概念导致更简单的解释（“怎么回事？”），并产生可解释的、基于概念的对照因素（“为什么不？”）。此外，我们展示了与CBM联合训练对于两个关键改进的影响：（i）它改变了模型的决策过程，使模型依赖更少的重要概念（导致更简单的解释），（ii）它显著增加了概念干预对类别预测的因果效应，使模型更加响应这些变化。

更新时间: 2024-10-09 12:57:37

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.01408v2

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. We introduce the Instructional Segment Embedding (ISE) technique, inspired by BERT, to modern large language models, which embeds instruction priority information directly into the model. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1% evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures.

Updated: 2024-10-09 12:52:41

标题: 教学片段嵌入：通过教学层次结构提高LLM安全性

摘要: 大语言模型（LLMs）容易受到安全和安全威胁，例如提示注入、提示提取和有害请求。这些漏洞的一个主要原因是缺乏指令层次结构。现代LLM架构将所有输入视为相等，未能区分和优先考虑各种类型的指令，例如系统消息、用户提示和数据。因此，优先级较低的用户提示可能会覆盖更重要的系统指令，包括安全协议。现有的实现指令层次结构的方法，如分隔符和基于指令的训练，未能解决这个问题在架构级别。我们引入了受BERT启发的指令段嵌入（ISE）技术到现代大型语言模型中，直接将指令优先级信息嵌入模型。这种方法使模型能够明确区分和优先考虑各种指令类型，显着提高了对试图覆盖优先规则的恶意提示的安全性。我们在结构化查询和指令层次基准上的实验分别展示了平均鲁棒准确率增加高达15.75%和18.68%。此外，我们观察到在AlpacaEval上评估的指令遵循能力提高了高达4.1%。总体而言，我们的方法为增强LLM架构的安全性和有效性提供了一个有前途的方向。

更新时间: 2024-10-09 12:52:41

领域: cs.CR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.09102v1

Dynamic metastability in the self-attention model

We consider the self-attention model - an interacting particle system on the unit sphere, which serves as a toy model for Transformers, the deep neural network architecture behind the recent successes of large language models. We prove the appearance of dynamic metastability conjectured in [GLPR23] - although particles collapse to a single cluster in infinite time, they remain trapped near a configuration of several clusters for an exponentially long period of time. By leveraging a gradient flow interpretation of the system, we also connect our result to an overarching framework of slow motion of gradient flows proposed by Otto and Reznikoff [OR07] in the context of coarsening and the Allen-Cahn equation. We finally probe the dynamics beyond the exponentially long period of metastability, and illustrate that, under an appropriate time-rescaling, the energy reaches its global maximum in finite time and has a staircase profile, with trajectories manifesting saddle-to-saddle-like behavior, reminiscent of recent works in the analysis of training dynamics via gradient descent for two-layer neural networks.

Updated: 2024-10-09 12:50:50

标题: 自注意力模型中的动态稳定性

摘要: 我们考虑了自注意力模型-一个在单位球上的相互作用粒子系统，它作为变压器的玩具模型，是最近大型语言模型成功背后的深度神经网络架构。我们证明了在[GLPR23]中猜测的动态亚稳定的出现-尽管粒子在无限时间内会坍缩成单个簇，它们仍然会在接近几个簇的配置附近困于指数长的时间。通过利用系统的梯度流解释，我们还将我们的结果与由Otto和Reznikoff [OR07]在粗化和Allen-Cahn方程的背景下提出的梯度流缓慢运动的总体框架联系起来。最后，我们探究了超出指数长亚稳定期的动态，并说明，在适当的时间重缩下，能量在有限时间内达到其全局最大值，并具有阶梯状轮廓，轨迹表现出鞍点到鞍点的行为，让人想起最近对两层神经网络通过梯度下降进行训练动态分析的研究作品。

更新时间: 2024-10-09 12:50:50

领域: cs.LG,math.AP,math.DS

下载: http://arxiv.org/abs/2410.06833v1

Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning

Dataset ownership verification, the process of determining if a dataset is used in a model's training data, is necessary for detecting unauthorized data usage and data contamination. Existing approaches, such as backdoor watermarking, rely on inducing a detectable behavior into the trained model on a part of the data distribution. However, these approaches have limitations, as they can be harmful to the model's performances or require unpractical access to the model's internals. Most importantly, previous approaches lack guarantee against false positives. This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique. Our method uses pairs of out-of-distribution samples and random labels as secret keys, and leverages clean-label targeted data poisoning to subtly alter a dataset, so that models trained on it respond to the key samples with the corresponding key labels. The keys are built as to allow for statistical certificates with black-box access only to the model. We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes. Our findings demonstrate that data taggants can reliably make models trained on the protected dataset detectable with high confidence, without compromising validation accuracy, and demonstrates superiority over backdoor watermarking. Moreover, our method shows to be stealthy and robust against various defense mechanisms.

Updated: 2024-10-09 12:49:23

标题: 数据标记：通过无害的有针对性的数据污染进行数据集所有权验证

摘要: 数据集所有权验证是确定数据集是否用于模型训练数据的过程，对于检测未经授权的数据使用和数据污染是必要的。现有的方法，如后门水印，依赖于在训练模型的一部分数据分布中引入可检测行为。然而，这些方法存在局限性，因为它们可能对模型的性能有害，或者需要不切实际地访问模型的内部。最重要的是，先前的方法缺乏对误报的保证。本文介绍了数据标记剂，一种新颖的非后门数据集所有权验证技术。我们的方法使用带有随机标签的超出分布样本对作为秘密密钥，并利用干净标签的有针对性数据污染来微妙地改变数据集，以便在其上训练的模型用相应的密钥标签响应密钥样本。这些密钥被设计为只允许在仅能访问模型的黑盒的情况下进行统计证书。我们通过在ImageNet1k上使用ViT和ResNet模型以及最先进的训练配方进行全面和实际的实验证实了我们的方法。我们的研究结果表明，数据标记剂可以可靠地使在受保护的数据集上训练的模型具有高置信度的检测能力，而不会损害验证准确性，并且比后门水印技术表现更优。此外，我们的方法显示出对各种防御机制具有潜伏性和稳健性。

更新时间: 2024-10-09 12:49:23

领域: cs.CR,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.09101v1

Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

Recent studies have investigated utilizing Knowledge Graphs (KGs) to enhance Quesetion Answering (QA) performance of Large Language Models (LLMs), yet structured KG verbalization remains challengin. Existing methods, such as triple-form or free-form textual conversion of triple-form facts, encounter several issues. These include reduced evidence density due to duplicated entities or relationships, and reduced evidence clarity due to an inability to emphasize crucial evidence. To address these issues, we propose EFSum, an Evidence-focused Fact Summarization framework for enhanced QA with knowledge-augmented LLMs. We optimize an open-source LLM as a fact summarizer through distillation and preference alignment. Our extensive experiments show that EFSum improves LLM's zero-shot QA performance, and it is possible to ensure both the helpfulness and faithfulness of the summary.

Updated: 2024-10-09 12:46:40

标题: 基于证据的事实总结对知识增强的零-shot问题回答进行翻译

摘要: 最近的研究调查了利用知识图谱（KGs）来增强大型语言模型（LLMs）的问答（QA）性能，然而结构化的KG语言化仍然具有挑战性。现有方法，如三元组形式或自由形式文本转换三元组形式事实，遇到了几个问题。这些问题包括由于重复实体或关系而导致的证据密度降低，以及由于无法强调关键证据而导致的证据清晰度降低。为了解决这些问题，我们提出了EFSum，一种用于增强带知识的LLMs的QA的证据聚焦事实摘要框架。我们通过精炼和偏好对齐将一个开源的LLM优化为事实摘要器。我们的广泛实验表明，EFSum改善了LLM的零-shot QA性能，并且可以确保摘要的有用性和忠实性。

更新时间: 2024-10-09 12:46:40

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.02966v3

Transfer Learning for a Class of Cascade Dynamical Systems

This work considers the problem of transfer learning in the context of reinforcement learning. Specifically, we consider training a policy in a reduced order system and deploying it in the full state system. The motivation for this training strategy is that running simulations in the full-state system may take excessive time if the dynamics are complex. While transfer learning alleviates the computational issue, the transfer guarantees depend on the discrepancy between the two systems. In this work, we consider a class of cascade dynamical systems, where the dynamics of a subset of the state-space influence the rest of the states but not vice-versa. The reinforcement learning policy learns in a model that ignores the dynamics of these states and treats them as commanded inputs. In the full-state system, these dynamics are handled using a classic controller (e.g., a PID). These systems have vast applications in the control literature and their structure allows us to provide transfer guarantees that depend on the stability of the inner loop controller. Numerical experiments on a quadrotor support the theoretical findings.

Updated: 2024-10-09 12:40:31

标题: 级联动力系统类的迁移学习

摘要: 这项工作考虑了在强化学习背景下的迁移学习问题。具体而言，我们考虑在降低阶系统中训练策略，并在完整状态系统中部署它。这种训练策略的动机是，如果动力学复杂，那么在完整状态系统中运行模拟可能会花费过多时间。虽然迁移学习缓解了计算问题，但迁移保证取决于两个系统之间的差异。在这项工作中，我们考虑了一类级联动力系统，其中部分状态空间的动态影响其他状态，但反之不然。强化学习策略在一个忽略这些状态动态的模型中学习，并将它们视为指令输入。在完整状态系统中，这些动态使用经典控制器（例如PID）处理。这些系统在控制文献中有着广泛的应用，并且它们的结构使我们能够提供取决于内环控制器稳定性的迁移保证。在四旋翼飞行器上的数值实验支持了理论发现。

更新时间: 2024-10-09 12:40:31

领域: cs.LG,F.2.2, I.2.7

下载: http://arxiv.org/abs/2410.06828v1

On the Byzantine-Resilience of Distillation-Based Federated Learning

Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and instead communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process. Based on these insights, we introduce two new byzantine attacks and demonstrate their ability to break existing byzantine-resilient methods. Additionally, we propose a novel defence method which enhances the byzantine resilience of KD-based FL algorithms. Finally, we provide a general framework to obfuscate attacks, making them significantly harder to detect, thereby improving their effectiveness. Our findings serve as an important building block in the analysis of byzantine FL, contributing through the development of new attacks and new defence mechanisms, further advancing the robustness of KD-based FL algorithms.

Updated: 2024-10-09 12:38:26

标题: 关于基于蒸馏的联邦学习的拜占庭恢复能力

摘要: 具有知识蒸馏的联邦学习（FL）算法由于其在隐私、非独立同分布数据和通信成本方面的有利特性而受到越来越多的关注。这些方法不再传输模型参数，而是通过在公共数据集上共享预测来传达有关学习任务的信息。在这项工作中，我们研究了这些方法在拜占庭环境中的性能，即一部分客户端以敌对的方式行事，旨在破坏学习过程。我们展示了基于知识蒸馏的FL算法具有显著的抗干扰能力，并分析了拜占庭客户端如何影响学习过程。基于这些见解，我们引入了两种新的拜占庭攻击，并展示它们能够破坏现有的拜占庭抗性方法。此外，我们提出了一种增强基于知识蒸馏的FL算法拜占庭抗性的新防御方法。最后，我们提供了一个通用框架来模糊攻击，使其变得更难检测，从而提高它们的有效性。我们的发现是拜占庭FL分析的重要基础，通过开发新的攻击和新的防御机制，进一步提高了基于知识蒸馏的FL算法的稳健性。

更新时间: 2024-10-09 12:38:26

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2402.12265v2

K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images

In clinical procedures, precise localization of the target area is an essential step for clinical diagnosis and screening. For many diagnostic applications, lung segmentation of chest X-ray images is an essential first step that significantly reduces the image size to speed up the subsequent analysis. One of the primary difficulties with this task is segmenting the lung regions covered by dense abnormalities also known as opacities due to diseases like pneumonia and tuberculosis. SAM has astonishing generalization capabilities for category agnostic segmentation. In this study we propose an algorithm to improve zero shot performance of SAM on lung region segmentation task by automatic prompt selection. Two separate UNet models were trained, one for predicting lung segments and another for heart segment. Though these predictions lack fine details around the edges, they provide positive and negative points as prompt for SAM. Using proposed prompting method zero shot performance of SAM is evaluated on two benchmark datasets. ViT-l version of the model achieved slightly better performance compared to other two versions, ViTh and ViTb. It yields an average Dice score of 95.5 percent and 94.9 percent on hold out data for two datasets respectively. Though, for most of the images, SAM did outstanding segmentation, its prediction was way off for some of the images. After careful inspection it is found that all of these images either had extreme abnormality or distorted shape. Unlike most of the research performed so far on lung segmentation from CXR images using SAM, this study proposes a fully automated prompt selection process only from the input image. Our finding indicates that using pretrained models for prompt selection can utilize SAM impressive generalization capability to its full extent.

Updated: 2024-10-09 12:37:12

标题: K-SAM：使用预训练的U-Net改进SAM在CXR图像中肺部分割的零样本性能的提示方法

摘要: 在临床程序中，精确定位目标区域是临床诊断和筛查的关键步骤。对于许多诊断应用程序，胸部X射线图像的肺部分割是一个重要的第一步，它显著减小了图像大小，加快了后续分析的速度。这项任务的主要困难之一是分割被疾病如肺炎和结核引起的密集异常区域，也被称为混浊区域。SAM在类别无关分割方面具有惊人的泛化能力。在这项研究中，我们提出了一种算法，通过自动提示选择来改进SAM在肺部分割任务上的零射击性能。训练了两个独立的UNet模型，一个用于预测肺部分割，另一个用于心脏分割。尽管这些预测在边缘周围缺乏细节，但它们提供了SAM的积极和消极点。使用所提出的提示方法，在两个基准数据集上评估了SAM的零射击性能。模型的ViT-l版本在保留数据上表现略好于其他两个版本，ViTh和ViTb。它分别在两个数据集的保留数据上获得了平均骰子得分分别为95.5％和94.9％。尽管在大多数图像中，SAM表现出色，但对一些图像的预测完全偏离了实际情况。经过仔细检查发现，所有这些图像要么有极端异常，要么形状扭曲。与迄今为止大多数使用SAM从CXR图像进行肺分割的研究不同，这项研究提出了一种仅从输入图像中自动选择提示的完全自动化过程。我们的发现表明，使用预训练模型进行提示选择可以充分利用SAM惊人的泛化能力。

更新时间: 2024-10-09 12:37:12

领域: eess.IV,cs.LG

下载: http://arxiv.org/abs/2410.06825v1

Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning

Causal opacity denotes the difficulty in understanding the "hidden" causal structure underlying the decisions of deep neural network (DNN) models. This leads to the inability to rely on and verify state-of-the-art DNN-based systems, especially in high-stakes scenarios. For this reason, circumventing causal opacity in DNNs represents a key open challenge at the intersection of deep learning, interpretability, and causality. This work addresses this gap by introducing Causal Concept Graph Models (Causal CGMs), a class of interpretable models whose decision-making process is causally transparent by design. Our experiments show that Causal CGMs can: (i) match the generalisation performance of causally opaque models, (ii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also the reliability of the explanations provided for specific instances, and (iii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness.

Updated: 2024-10-09 12:34:31

标题: 因果概念图模型：深度学习中超越因果不透明性

摘要: 因果不透明性指的是理解深度神经网络（DNN）模型决策背后的“隐藏”因果结构的困难。这导致无法依赖和验证最先进的基于DNN的系统，尤其是在高风险场景下。因此，突破DNN中的因果不透明性代表了深度学习、可解释性和因果性交集处的一个关键开放挑战。本研究通过引入因果概念图模型（Causal CGMs），一类设计上决策过程因果透明的可解释模型，来解决这一问题。我们的实验表明，因果CGMs能够：（i）匹配因果不透明模型的泛化性能，（ii）使人类在决策过程中进行错误推理步骤的修正，不仅提高修正后的下游准确性，还提高了针对特定实例所提供解释的可靠性，（iii）支持干预和反事实情景的分析，从而提高模型的因果解释性，并支持有效验证其可靠性和公平性。

更新时间: 2024-10-09 12:34:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16507v3

Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity

We introduce a novel family of adversarial attacks that exploit the inability of language models to interpret ASCII art. To evaluate these attacks, we propose the ToxASCII benchmark and develop two custom ASCII art fonts: one leveraging special tokens and another using text-filled letter shapes. Our attacks achieve a perfect 1.0 Attack Success Rate across ten models, including OpenAI's o1-preview and LLaMA 3.1. Warning: this paper contains examples of toxic language used for research purposes.

Updated: 2024-10-09 12:29:38

标题: 审查文本：使用ASCII艺术攻击LLMs和毒性检测系统，以掩盖粗话

摘要: 我们介绍了一种新颖的对抗攻击家族，利用了语言模型无法解释ASCII艺术的能力。为了评估这些攻击，我们提出了ToxASCII基准测试，并开发了两种自定义的ASCII艺术字体：一种利用特殊令牌，另一种使用填充文本的字母形状。我们的攻击在包括OpenAI的o1-preview和LLaMA 3.1在内的十个模型中实现了完美的1.0攻击成功率。警告：本文包含了出于研究目的使用的有毒语言示例。

更新时间: 2024-10-09 12:29:38

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2409.18708v4

Dynamic Neural Potential Field: Online Trajectory Optimization in Presence of Moving Obstacles

We address a task of local trajectory planning for the mobile robot in the presence of static and dynamic obstacles. Local trajectory is obtained as a numerical solution of the Model Predictive Control (MPC) problem. Collision avoidance may be provided by adding repulsive potential of the obstacles to the cost function of MPC. We develop an approach, where repulsive potential is estimated by the neural model. We propose and explore three possible strategies of handling dynamic obstacles. First, environment with dynamic obstacles is considered as a sequence of static environments. Second, the neural model predict a sequence of repulsive potential at once. Third, the neural model predict future repulsive potential step by step in autoregressive mode. We implement these strategies and compare it with CIAO* and MPPI using BenchMR framework. First two strategies showed higher performance than CIAO* and MPPI while preserving safety constraints. The third strategy was a bit slower, however it still satisfy time limits. We deploy our approach on Husky UGV mobile platform, which move through the office corridors under proposed MPC local trajectory planner. The code and trained models are available at \url{https://github.com/CognitiveAISystems/Dynamic-Neural-Potential-Field}.

Updated: 2024-10-09 12:27:09

标题: 动态神经潜能场：移动障碍物存在下的在线轨迹优化

摘要: 我们研究了移动机器人在静态和动态障碍物存在的情况下的局部轨迹规划任务。局部轨迹是通过模型预测控制（MPC）问题的数值解得到的。碰撞避免可以通过将障碍物的斥力势加入到MPC的成本函数中来实现。我们开发了一种方法，其中斥力势由神经模型估计。我们提出并探讨了处理动态障碍物的三种可能策略。首先，将具有动态障碍物的环境视为静态环境序列。其次，神经模型一次性预测一系列斥力势。第三，神经模型以自回归模式逐步预测未来的斥力势。我们实施了这些策略，并使用BenchMR框架将其与CIAO*和MPPI进行比较。前两种策略表现出比CIAO*和MPPI更高的性能，同时保持安全约束。第三种策略速度稍慢，但仍满足时间限制。我们将我们的方法部署到Husky UGV移动平台上，在提出的MPC局部轨迹规划器下穿过办公室走廊。代码和训练模型可在\url{https://github.com/CognitiveAISystems/Dynamic-Neural-Potential-Field}上找到。

更新时间: 2024-10-09 12:27:09

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.06819v1

An Improved Approach for Cardiac MRI Segmentation based on 3D UNet Combined with Papillary Muscle Exclusion

Left ventricular ejection fraction (LVEF) is the most important clinical parameter of cardiovascular function. The accuracy in estimating this parameter is highly dependent upon the precise segmentation of the left ventricle (LV) structure at the end diastole and systole phases. Therefore, it is crucial to develop robust algorithms for the precise segmentation of the heart structure during different phases. Methodology: In this work, an improved 3D UNet model is introduced to segment the myocardium and LV, while excluding papillary muscles, as per the recommendation of the Society for Cardiovascular Magnetic Resonance. For the practical testing of the proposed framework, a total of 8,400 cardiac MRI images were collected and analysed from the military hospital in Tunis (HMPIT), as well as the popular ACDC public dataset. As performance metrics, we used the Dice coefficient and the F1 score for validation/testing of the LV and the myocardium segmentation. Results: The data was split into 70%, 10%, and 20% for training, validation, and testing, respectively. It is worth noting that the proposed segmentation model was tested across three axis views: basal, medio basal and apical at two different cardiac phases: end diastole and end systole instances. The experimental results showed a Dice index of 0.965 and 0.945, and an F1 score of 0.801 and 0.799, at the end diastolic and systolic phases, respectively. Additionally, clinical evaluation outcomes revealed a significant difference in the LVEF and other clinical parameters when the papillary muscles were included or excluded.

Updated: 2024-10-09 12:19:58

标题: 基于3D UNet和乳头肌排除的心脏MRI分割的改进方法

摘要: 左心室射血分数（LVEF）是心血管功能最重要的临床参数。准确估计此参数高度依赖于左心室（LV）结构在舒张末期和收缩期的精确分割。因此，对于在不同阶段精确分割心脏结构而言，开发强大的算法至关重要。方法学：在本研究中，引入了改进的3D UNet模型，用于分割心肌和LV，同时根据心血管磁共振学会的建议排除乳头肌。为了对所提出的框架进行实际测试，从突尼斯军事医院（HMPIT）和流行的ACDC公共数据集中收集和分析了8400张心脏MRI图像。我们使用Dice系数和F1分数作为性能指标，用于验证/测试LV和心肌分割。结果：数据分为70％、10％和20％用于训练、验证和测试。值得注意的是，所提出的分割模型在三个轴视图上进行了测试：基底、中基底和心尖，在两个不同心脏阶段：舒张末期和收缩末期。实验结果显示，分别在舒张末期和收缩末期，Dice指数为0.965和0.945，F1分数为0.801和0.799。此外，临床评估结果显示，在包括或排除乳头肌时，LVEF和其他临床参数之间存在显著差异。

更新时间: 2024-10-09 12:19:58

领域: cs.CV,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2410.06818v1

Adaptive Training of Grid-Dependent Physics-Informed Kolmogorov-Arnold Networks

Physics-Informed Neural Networks (PINNs) have emerged as a robust framework for solving Partial Differential Equations (PDEs) by approximating their solutions via neural networks and imposing physics-based constraints on the loss function. Traditionally, Multilayer Perceptrons (MLPs) have been the neural network of choice, with significant progress made in optimizing their training. Recently, Kolmogorov-Arnold Networks (KANs) were introduced as a viable alternative, with the potential of offering better interpretability and efficiency while requiring fewer parameters. In this paper, we present a fast JAX-based implementation of grid-dependent Physics-Informed Kolmogorov-Arnold Networks (PIKANs) for solving PDEs, achieving up to 84 times faster training times than the original KAN implementation. We propose an adaptive training scheme for PIKANs, introducing an adaptive state transition technique to avoid loss function peaks between grid extensions, and a methodology for designing PIKANs with alternative basis functions. Through comparative experiments, we demonstrate that the adaptive features significantly enhance solution accuracy, decreasing the L^2 error relative to the reference solution by up to 43.02%. For the studied PDEs, our methodology approaches or surpasses the results obtained from architectures that utilize up to 8.5 times more parameters, highlighting the potential of adaptive, grid-dependent PIKANs as a superior alternative in scientific and engineering applications.

Updated: 2024-10-09 12:18:37

标题: 自适应训练的网格相关的物理信息科尔莫哥洛夫-阿诺德网络

摘要: 物理信息神经网络（PINNs）已经成为一个强大的框架，通过利用神经网络逼近其解，并在损失函数上施加基于物理的约束来解决偏微分方程（PDEs）。传统上，多层感知器（MLPs）一直是首选的神经网络，在优化它们的训练方面取得了显著进展。最近，科尔莫戈洛夫-阿诺德网络（KANs）被引入作为一种可行的替代方案，具有更好的可解释性和效率，并且需要更少的参数。在本文中，我们提出了一种基于JAX的快速实现，用于解决PDEs的基于网格的物理信息科尔莫戈洛夫-阿诺德网络（PIKANs），其训练速度比原始KAN实现快达84倍。我们提出了一种自适应训练方案，引入了一种自适应状态转换技术，以避免在网格扩展之间出现损失函数峰值，并提出了一种设计具有替代基函数的PIKANs的方法。通过比较实验，我们表明自适应特性显著提高了解的准确性，使L^2误差相对于参考解减少了高达43.02%。对于研究的PDEs，我们的方法接近或超过了利用多达8.5倍参数的架构获得的结果，突显了自适应、基于网格的PIKANs作为在科学和工程应用中更优越的替代方案的潜力。

更新时间: 2024-10-09 12:18:37

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2407.17611v2

Robust Regression over Averaged Uncertainty

We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least squares regression problem. We show that this formulation recovers ridge regression exactly and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We further demonstrate that the condition of this equivalence relies on the geometric properties of the defined uncertainty set. We provide exact, closed-form, in some cases, analytical solutions to the equivalent regularization strength under uncertainty sets induced by $\ell_p$ norm, Schatten $p$-norm, and general polytopes. We then show in synthetic datasets with different levels of uncertainties, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. In real-world regression problems obtained from UCI datasets, similar improvements are seen in the out-of-sample datasets.

Updated: 2024-10-09 12:16:15

标题: 坚韧回归在平均不确定性上的应用

摘要: 我们提出了一种新的稳健回归的公式，通过整合不确定性集的所有实现，并采取平均方法来获得普通最小二乘回归问题的最优解。我们展示了这种公式确切地恢复了岭回归，并建立了稳健优化与均方误差方法之间现有回归问题的缺失联系。我们进一步证明了这种等价性的条件取决于定义的不确定性集的几何属性。我们在由$\ell_p$范数、Schatten $p$-范数和一般多面体引起的不确定性集下，提供了等价正则化强度的确切、闭合形式，在某些情况下是解析解。然后我们在具有不同不确定性水平的合成数据集中展示了平均公式在样本外性能上相对于现有最坏情况公式的一致改进。在从UCI数据集获得的真实世界回归问题中，类似的改进也在样本外数据集中出现。

更新时间: 2024-10-09 12:16:15

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2311.06960v2

Multi-Neuron Unleashes Expressivity of ReLU Networks Under Convex Relaxation

Neural work certification has established itself as a crucial tool for ensuring the robustness of neural networks. Certification methods typically rely on convex relaxations of the feasible output set to provide sound bounds. However, complete certification requires exact bounds, which strongly limits the expressivity of ReLU networks: even for the simple ``$\max$'' function in $\mathbb{R}^2$, there does not exist a ReLU network that expresses this function and can be exactly bounded by single-neuron relaxation methods. This raises the question whether there exists a convex relaxation that can provide exact bounds for general continuous piecewise linear functions in $\mathbb{R}^n$. In this work, we answer this question affirmatively by showing that (layer-wise) multi-neuron relaxation provides complete certification for general ReLU networks. Based on this novel result, we show that the expressivity of ReLU networks is no longer limited under multi-neuron relaxation. To the best of our knowledge, this is the first positive result on the completeness of convex relaxations, shedding light on the practice of certified robustness.

Updated: 2024-10-09 12:14:24

标题: 多神经元在凸松弛下释放ReLU网络的表达能力

摘要: 神经网络认证已经确立为确保神经网络鲁棒性的关键工具。认证方法通常依赖于对可行输出集的凸松弛，以提供可靠的边界。然而，完全认证需要精确的边界，这严重限制了ReLU网络的表达能力：即使对于简单的“$\max$”函数在$\mathbb{R}^2$中，也不存在一个能够表达这个函数并能够通过单神经元松弛方法精确边界的ReLU网络。这引出了一个问题，即是否存在一个凸松弛可以为$\mathbb{R}^n$中的一般连续分段线性函数提供精确边界。在这项工作中，我们通过展示（逐层）多神经元松弛提供了一般ReLU网络的完全认证来肯定回答了这个问题。基于这一新结果，我们展示了在多神经元松弛下，ReLU网络的表达能力不再受限。据我们所知，这是关于凸松弛完整性的首个积极结果，为认证鲁棒性的实践带来了启示。

更新时间: 2024-10-09 12:14:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06816v1

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

Recent advancements in large language models (LLMs) have highlighted the importance of extending context lengths for handling complex tasks. While traditional methods for training on long contexts often use filtered long documents, these approaches lead to domain imbalances, limiting model performance. To address this, techniques like random document concatenation (Standard) and similarity-based methods (KNN, ICLM) have been developed. However, they either sacrifice semantic coherence or diversity. To balance both aspects, we introduce Quest, a query-centric data synthesis method aggregating semantically relevant yet diverse documents. Quest uses a generative model to predict potential queries for each document, grouping documents with similar queries and keywords. Extensive experiments demonstrate Quest's superior performance on long-context tasks, achieving remarkable results with context lengths of up to 1M tokens and confirming its scalability across various model sizes.

Updated: 2024-10-09 12:14:22

标题: 探索：用于大语言模型长上下文扩展的查询中心数据合成方法

摘要: 最近大型语言模型（LLMs）的进展凸显了扩展上下文长度以处理复杂任务的重要性。传统的训练长上下文的方法通常使用过滤的长文档，但这些方法会导致领域不平衡，限制模型性能。为了解决这个问题，发展了诸如随机文档串联（标准）和基于相似性的方法（KNN、ICLM）等技术。然而，它们要么牺牲语义连贯性，要么牺牲多样性。为了平衡这两方面，我们引入了Quest，一种以查询为中心的数据综合方法，聚合语义相关但多样化的文档。Quest使用生成模型为每个文档预测潜在查询，将具有相似查询和关键词的文档分组。广泛的实验表明Quest在长上下文任务中表现出优越的性能，实现了在长达1M令牌的上下文长度下取得显著结果，并证实了其在各种模型大小上的可扩展性。

更新时间: 2024-10-09 12:14:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19846v5

Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression

Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This paper presents a novel feature selection framework, shap-select. The framework conducts a linear or logistic regression of the target on the Shapley values of the features, on the validation set, and uses the signs and significance levels of the regression coefficients to implement an efficient heuristic for feature selection in tabular regression and classification tasks. We evaluate shap-select on the Kaggle credit card fraud dataset, demonstrating its effectiveness compared to established methods such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based feature selection method), Boruta and a simpler Shapley value-based method. Our findings show that shap-select combines interpretability, computational efficiency, and performance, offering a robust solution for feature selection.

Updated: 2024-10-09 12:14:06

标题: Shap-Select: 使用SHAP值和回归进行轻量级特征选择

摘要: 特征选择是机器学习中的一个重要过程，特别是在处理高维数据集时。它有助于减少机器学习模型的复杂性，提高性能，减轻过拟合，并减少计算时间。本文提出了一种新颖的特征选择框架，shap-select。该框架在验证集上对目标与特征的Shapley值进行线性或逻辑回归，并利用回归系数的符号和显著性水平来实现对表格回归和分类任务中特征选择的有效启发式。我们在Kaggle信用卡欺诈数据集上评估了shap-select，展示了它相对于已建立的方法（如递归特征消除（RFE）、基于互信息的HISEL特征选择方法、Boruta和一个更简单的Shapley值方法）的有效性。我们的研究结果表明，shap-select结合了可解释性、计算效率和性能，为特征选择提供了一个强大的解决方案。

更新时间: 2024-10-09 12:14:06

领域: cs.LG

下载: http://arxiv.org/abs/2410.06815v1

Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning

Over-parameterized models are typically vulnerable to membership inference attacks, which aim to determine whether a specific sample is included in the training of a given model. Previous Weight regularizations (e.g., L1 regularization) typically impose uniform penalties on all parameters, leading to a suboptimal tradeoff between model utility and privacy. In this work, we first show that only a small fraction of parameters substantially impact the privacy risk. In light of this, we propose Privacy-aware Sparsity Tuning (PAST), a simple fix to the L1 Regularization, by employing adaptive penalties to different parameters. Our key idea behind PAST is to promote sparsity in parameters that significantly contribute to privacy leakage. In particular, we construct the adaptive weight for each parameter based on its privacy sensitivity, i.e., the gradient of the loss gap with respect to the parameter. Using PAST, the network shrinks the loss gap between members and non-members, leading to strong resistance to privacy attacks. Extensive experiments demonstrate the superiority of PAST, achieving a state-of-the-art balance in the privacy-utility trade-off.

Updated: 2024-10-09 12:13:49

标题: 通过隐私感知稀疏调整来防御成员推断攻击

摘要: 过度参数化的模型通常容易受到成员推断攻击的威胁，这些攻击旨在确定特定样本是否包含在给定模型的训练中。先前的权重正则化（例如L1正则化）通常对所有参数施加统一的惩罚，导致模型效用和隐私之间的次优权衡。在这项工作中，我们首先展示只有一小部分参数会显著影响隐私风险。基于此，我们提出了隐私感知稀疏调整（PAST），这是对L1正则化的一个简单修复，通过对不同参数施加自适应惩罚。我们提出PAST的关键思想是促进对隐私泄露有显著贡献的参数的稀疏性。具体来说，我们根据每个参数的隐私敏感度构建自适应权重，即相对于该参数的损失差距的梯度。使用PAST，网络缩小了成员和非成员之间的损失差距，从而有效抵抗隐私攻击。大量实验证明了PAST的优越性，实现了隐私效用权衡的最新平衡。

更新时间: 2024-10-09 12:13:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06814v1

From Persona to Personalization: A Survey on Role-Playing Language Agents

Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and future prospects of RPLAs. Additionally, we provide a brief review of RPLAs in AI applications, which reflects practical user demands that shape and drive RPLA research. Through this work, we aim to establish a clear taxonomy of RPLA research and applications, and facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.

Updated: 2024-10-09 12:11:15

标题: 从角色扮演到个性化：关于角色扮演语言代理的调查

摘要: 最近大型语言模型（LLMs）的进步显著推动了角色扮演语言代理（RPLAs）的崛起，即专门设计用于模拟指定角色的人工智能系统。通过利用LLMs的多种先进能力，包括上下文学习、遵循指令和社交智能，RPLAs实现了令人惊叹的类人性和生动的角色扮演表现。RPLAs可以模仿各种人物，从历史人物和虚构人物到现实生活中的个人。因此，它们催生了许多人工智能应用，如情感伴侣、互动视频游戏、个性化助手和副驾驶员，以及数字克隆。在本文中，我们对这一领域进行了全面调查，展示了RPLAs与尖端LLM技术的演变和最新进展。我们将角色划分为三种类型：1）人口统计学角色，利用统计刻板印象；2）角色角色，专注于知名人物；3）个性化角色，通过持续的用户互动定制个性化服务。我们首先介绍了RPLAs的当前方法论总览，然后详细介绍了每种角色类型的具体情况，涵盖了相应的数据来源、代理人构建和评估。随后，我们讨论了RPLAs的基本风险、现有限制和未来前景。此外，我们还简要回顾了RPLAs在人工智能应用中的情况，反映了塑造和推动RPLA研究的实际用户需求。通过这项工作，我们的目标是建立RPLA研究和应用的清晰分类体系，促进这一关键且不断发展的领域的未来研究，并为人类和RPLAs和谐共存的未来铺平道路。

更新时间: 2024-10-09 12:11:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.18231v2

OpenGraph: Towards Open Graph Foundation Models

Graph learning has become essential in various domains, including recommendation systems and social network analysis. Graph Neural Networks (GNNs) have emerged as promising techniques for encoding structural information and improving performance in tasks like link prediction and node classification. However, a key challenge remains: the difficulty of generalizing to unseen graph data with different properties. In this work, we propose a novel graph foundation model, called OpenGraph, to address this challenge. Our approach tackles several technical obstacles. Firstly, we enhance data augmentation using a large language model (LLM) to overcome data scarcity in real-world scenarios. Secondly, we introduce a unified graph tokenizer that enables the model to generalize effectively to diverse graph data, even when encountering unseen properties during training. Thirdly, our developed scalable graph transformer captures node-wise dependencies within the global topological context. Extensive experiments validate the effectiveness of our framework. By adapting OpenGraph to new graph characteristics and comprehending diverse graphs, our approach achieves remarkable zero-shot graph learning performance across various settings. We release the model implementation at https://github.com/HKUDS/OpenGraph.

Updated: 2024-10-09 12:10:38

标题: OpenGraph：走向开放图基础模型

摘要: 图学习已经成为各个领域的重要组成部分，包括推荐系统和社交网络分析。图神经网络（GNNs）已经成为一种有前途的技术，可以编码结构信息并提高任务中的性能，如链接预测和节点分类。然而，一个关键挑战仍然存在：难以泛化到具有不同特性的未见图数据。在这项工作中，我们提出了一个新颖的图基础模型，称为OpenGraph，以解决这一挑战。我们的方法解决了几个技术障碍。首先，我们利用大型语言模型（LLM）增强数据增强，以克服现实场景中的数据稀缺性。其次，我们引入了一个统一的图标记器，使模型能够在训练过程中有效地泛化到各种图数据，即使遇到未见属性。第三，我们开发的可扩展图变换器捕获了全局拓扑环境中的节点依赖关系。大量实验证实了我们框架的有效性。通过将OpenGraph适应新的图特性并理解各种图形，我们的方法在各种设置中实现了令人瞩目的零样本图学习性能。我们在https://github.com/HKUDS/OpenGraph发布了模型实现。

更新时间: 2024-10-09 12:10:38

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2403.01121v4

Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level

Large language models (LLMs) have demonstrated immense utility across various industries. However, as LLMs advance, the risk of harmful outputs increases due to incorrect or malicious instruction prompts. While current methods effectively address jailbreak risks, they share common limitations: 1) Judging harmful responses from the prefill-level lacks utilization of the model's decoding outputs, leading to relatively lower effectiveness and robustness. 2) Rejecting potentially harmful responses based on a single evaluation can significantly impair the model's helpfulness.This paper examines the LLMs' capability to recognize harmful outputs, revealing and quantifying their proficiency in assessing the danger of previous tokens. Motivated by pilot experiment results, we design a robust defense mechanism at the decoding level. Our novel decoder-oriented, step-by-step defense architecture corrects harmful queries directly rather than rejecting them outright. We introduce speculative decoding to enhance usability and facilitate deployment to boost secure decoding speed. Extensive experiments demonstrate that our approach improves model security without compromising reasoning speed. Notably, our method leverages the model's ability to discern hazardous information, maintaining its helpfulness compared to existing methods.

Updated: 2024-10-09 12:09:30

标题: 根部防御策略：确保LLM在解码水平上的安全

摘要: 大型语言模型（LLMs）已经在各行各业展示出巨大的效用。然而，随着LLMs的进步，由于不正确或恶意的指令提示，有害输出的风险也在增加。尽管当前方法有效地解决了越狱风险，但它们存在共同的局限性：1）仅从预填级别判断有害响应缺乏对模型解码输出的利用，导致相对较低的有效性和稳健性。2）基于单一评估拒绝潜在有害响应可能会严重影响模型的帮助性。本文研究了LLMs识别有害输出的能力，揭示并量化它们评估先前令牌危险性的熟练程度。在受到试验结果的激励下，我们设计了一个在解码级别上的强大的防御机制。我们的新型解码器导向、逐步防御架构直接纠正有害查询而不是直接拒绝它们。我们引入了推测解码以增强可用性，并促进部署以提高安全解码速度。大量实验证明我们的方法提高了模型的安全性，而不影响推理速度。值得注意的是，我们的方法利用了模型辨别危险信息的能力，与现有方法相比保持了其帮助性。

更新时间: 2024-10-09 12:09:30

领域: cs.CL,cs.CR

下载: http://arxiv.org/abs/2410.06809v1

The Clear Sky Corridor: Insights Towards Aerosol Formation in Exoplanets Using An AI-based Survey of Exoplanet Atmospheres

Producing optimized and accurate transmission spectra of exoplanets from telescope data has traditionally been a manual and labor-intensive procedure. Here we present the results of the first attempt to improve and standardize this procedure using artificial intelligence (AI) based processing of light curves and spectroscopic data from transiting exoplanets observed with the Hubble Space Telescope's (HST) Wide Field Camera 3 (WFC3) instrument. We implement an AI-based parameter optimizer that autonomously operates the Eureka pipeline to produce homogeneous transmission spectra of publicly available HST WFC3 datasets, spanning exoplanet types from hot Jupiters to sub-Neptunes. Surveying 43 exoplanets with temperatures between 280 and 2580 Kelvin, we confirm modeled relationships between the amplitude of the water band at 1.4um in hot Jupiters and their equilibrium temperatures. We also identify a similar, novel trend in Neptune/sub-Neptune atmospheres, but shifted to cooler temperatures. Excitingly, a planet mass versus equilibrium temperature diagram reveals a "Clear Sky Corridor," where planets between 700 and 1700 Kelvin (depending on the mass) show stronger 1.4um H2O band measurements. This novel trend points to metallicity as a potentially important driver of aerosol formation. As we unveil and include these new discoveries into our understanding of aerosol formation, we enter a thrilling future for the study of exoplanet atmospheres. With HST sculpting this foundational understanding for aerosol formation in various exoplanet types, ranging from Jupiters to sub-Neptunes, we present a compelling platform for the James Webb Space Telescope (JWST) to discover similar atmospheric trends for more planets across a broader wavelength range.

Updated: 2024-10-09 12:00:56

标题: 晴朗的天空走廊：利用基于人工智能的外星行星大气调查揭示外星行星中气溶胶形成的见解

摘要: 利用望远镜数据生成外星系行星的优化和准确的透射光谱传统上是一个手动和劳动密集的过程。在这里，我们首次尝试通过使用基于人工智能（AI）处理从哈勃太空望远镜（HST）广域相机3（WFC3）仪器观测到的过境外星系行星的光变曲线和光谱数据来改进和标准化这一过程的结果。我们实施了一个基于AI的参数优化器，自主运行Eureka管道，生成公开可用的HST WFC3数据集的均一透射光谱，涵盖了从热木星到亚海王星等不同类型的外星系行星。通过对具有280到2580开尔文温度之间的43颗外星系行星进行调查，我们验证了热木星水带在1.4um处的振幅与其平衡温度之间的建模关系。我们还在海王星/亚海王星大气层中发现了一个类似的、新颖的趋势，但是温度偏低。令人兴奋的是，一张行星质量与平衡温度的图表显示了一个“晴朗天空走廊”，在这个走廊中，介于700和1700开尔文之间的行星（取决于质量）显示出更强的1.4um H2O带测量值。这一新颖趋势指向金属丰度可能是气溶胶形成的一个重要驱动因素。随着我们揭示和将这些新发现纳入我们对气溶胶形成的理解中，我们进入了一个激动人心的未来，探究外星系行星大气层。随着HST在各种外星系行星类型，从木星到亚海王星，塑造这种基础性对气溶胶形成的理解，我们为詹姆斯·韦伯太空望远镜（JWST）提供了一个引人注目的平台，以发现更多行星在更广泛波长范围内的类似大气趋势。

更新时间: 2024-10-09 12:00:56

领域: astro-ph.EP,astro-ph.IM,cs.LG

下载: http://arxiv.org/abs/2410.06804v1

Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations

Faithfulness is arguably the most critical metric to assess the reliability of explainable AI. In NLP, current methods for faithfulness evaluation are fraught with discrepancies and biases, often failing to capture the true reasoning of models. We introduce Adversarial Sensitivity as a novel approach to faithfulness evaluation, focusing on the explainer's response when the model is under adversarial attack. Our method accounts for the faithfulness of explainers by capturing sensitivity to adversarial input changes. This work addresses significant limitations in existing evaluation techniques, and furthermore, quantifies faithfulness from a crucial yet underexplored paradigm.

Updated: 2024-10-09 11:59:34

标题: 忠诚度和在自然语言处理解释中的对抗敏感性概念

摘要: 忠实性可以说是评估可解释人工智能可靠性的最关键指标。在自然语言处理中，目前用于忠实性评估的方法充满了差异和偏见，往往无法捕捉模型真正的推理过程。我们引入对抗灵敏度作为一种新颖的忠实性评估方法，重点关注当模型遭受对抗攻击时解释者的响应。我们的方法通过捕捉对抗输入变化的灵敏度来考虑解释者的忠实性。这项工作解决了现有评估技术的重大局限性，并且进一步从一个至关重要但尚未充分探索的范式中量化忠实性。

更新时间: 2024-10-09 11:59:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.17774v2

Enhancing Vision-Language Model Pre-training with Image-text Pair Pruning Based on Word Frequency

We propose Word-Frequency-based Image-Text Pair Pruning (WFPP), a novel data pruning method that improves the efficiency of VLMs. Unlike MetaCLIP, our method does not need metadata for pruning, but selects text-image pairs to prune based on the content of the text. Specifically, WFPP prunes text-image pairs containing high-frequency words across the entire training dataset. The effect of WFPP is to reduce the dominance of frequent words. The result a better balanced word-frequency distribution in the dataset, which is known to improve the training of word embedding models. After pre-training on the pruned subset, we fine-tuned the model on the entire dataset for one additional epoch to achieve better performance. Our experiments demonstrate that applying WFPP when training a CLIP model improves performance on a wide range of downstream tasks. WFPP also provides the advantage of speeding up pre-training by using fewer samples. Additionally, we analyze the training data before and after pruning to visualize how WFPP changes the balance of word frequencies. We hope our work encourages researchers to consider the distribution of words in the training data when pre-training VLMs, not limited to CLIP.

Updated: 2024-10-09 11:54:41

标题: 通过基于词频的图像文本配对修剪增强视觉语言模型预训练

摘要: 我们提出了一种基于词频的图像-文本配对修剪（WFPP）方法，这是一种改进VLMs效率的新型数据修剪方法。与MetaCLIP不同，我们的方法不需要元数据进行修剪，而是根据文本内容选择要修剪的文本-图像配对。具体而言，WFPP修剪包含整个训练数据集中高频词的文本-图像对。WFPP的效果是减少常见词的主导性。结果是数据集中出现了更平衡的词频分布，这已知可以改善词嵌入模型的训练。在修剪的子集上进行预训练后，我们在整个数据集上进行了一轮额外的微调，以达到更好的性能。我们的实验表明，在训练CLIP模型时应用WFPP可以提高各种下游任务的性能。WFPP还提供了利用更少样本加速预训练的优势。此外，我们分析了修剪前后的训练数据，以可视化WFPP如何改变词频的平衡。我们希望我们的工作鼓励研究人员在预训练VLMs时考虑训练数据中词的分布，而不仅限于CLIP。

更新时间: 2024-10-09 11:54:41

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2410.10879v1

Efficient Weight-Space Laplace-Gaussian Filtering and Smoothing for Sequential Deep Learning

Efficiently learning a sequence of related tasks, such as in continual learning, poses a significant challenge for neural nets due to the delicate trade-off between catastrophic forgetting and loss of plasticity. We address this challenge with a grounded framework for sequentially learning related tasks based on Bayesian inference. Specifically, we treat the model's parameters as a nonlinear Gaussian state-space model and perform efficient inference using Gaussian filtering and smoothing. This general formalism subsumes existing continual learning approaches, while also offering a clearer conceptual understanding of its components. Leveraging Laplace approximations during filtering, we construct Gaussian posterior measures on the weight space of a neural network for each task. We use it as an efficient regularizer by exploiting the structure of the generalized Gauss-Newton matrix (GGN) to construct diagonal plus low-rank approximations. The dynamics model allows targeted control of the learning process and the incorporation of domain-specific knowledge, such as modeling the type of shift between tasks. Additionally, using Bayesian approximate smoothing can enhance the performance of task-specific models without needing to re-access any data.

Updated: 2024-10-09 11:54:33

标题: 高效的权重空间拉普拉斯-高斯滤波和平滑用于序贯深度学习

摘要: Efficiently learning a sequence of related tasks, such as in continual learning, poses a significant challenge for neural nets due to the delicate trade-off between catastrophic forgetting and loss of plasticity. We address this challenge with a grounded framework for sequentially learning related tasks based on Bayesian inference. Specifically, we treat the model's parameters as a nonlinear Gaussian state-space model and perform efficient inference using Gaussian filtering and smoothing. This general formalism subsumes existing continual learning approaches, while also offering a clearer conceptual understanding of its components. Leveraging Laplace approximations during filtering, we construct Gaussian posterior measures on the weight space of a neural network for each task. We use it as an efficient regularizer by exploiting the structure of the generalized Gauss-Newton matrix (GGN) to construct diagonal plus low-rank approximations. The dynamics model allows targeted control of the learning process and the incorporation of domain-specific knowledge, such as modeling the type of shift between tasks. Additionally, using Bayesian approximate smoothing can enhance the performance of task-specific models without needing to re-access any data.

更新时间: 2024-10-09 11:54:33

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.06800v1

Diffuse or Confuse: A Diffusion Deepfake Speech Dataset

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion dataset using available tools and pretrained models. Additionally, this study assesses the quality of diffusion-generated deepfakes versus non-diffusion ones and their potential threat to current deepfake detection systems. Findings indicate that the detection of diffusion-based deepfakes is generally comparable to non-diffusion deepfakes, with some variability based on detector architecture. Re-vocoding with diffusion vocoders shows minimal impact, and the overall speech quality is comparable to non-diffusion methods.

Updated: 2024-10-09 11:51:08

标题: “扩散还是混淆：一个扩散深度伪造语音数据集”

摘要: 人工智能和机器学习的进步显著改善了合成语音生成。本文探讨了扩散模型，这是一种用于创建逼真合成语音的新方法。我们利用现有工具和预训练模型创建了一个扩散数据集。此外，本研究评估了扩散生成的深假视频与非扩散视频的质量，以及它们对目前深假检测系统的潜在威胁。研究结果表明，基于扩散的深假视频的检测通常与非扩散深假视频相当，但根据检测器架构会有一些差异。使用扩散声码器重新编码几乎没有影响，整体语音质量与非扩散方法相当。

更新时间: 2024-10-09 11:51:08

领域: cs.CR,cs.AI,cs.LG,cs.SD,I.2.7

下载: http://arxiv.org/abs/2410.06796v1

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

We propose Hi-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hi-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.

Updated: 2024-10-09 11:48:33

标题: Hi-SLAM：在SLAM中通过分层分类高斯雨滴扩展语义

摘要: 我们提出了Hi-SLAM，一种语义3D高斯喷射SLAM方法，具有一种新颖的层次分类表示，可以实现准确的全局3D语义映射、扩展能力以及在3D世界中明确的语义标签预测。随着环境复杂性的增加，语义SLAM系统中的参数使用量显著增加，使场景理解变得尤为具有挑战性和昂贵。为了解决这个问题，我们引入了一种新颖的层次表示，将语义信息以紧凑的形式编码到3D高斯喷射中，利用大型语言模型（LLMs）的能力。我们进一步引入了一种旨在通过层间和跨层优化来优化层次语义信息的新颖语义损失。此外，我们增强了整个SLAM系统，从而提高了跟踪和映射性能。我们的Hi-SLAM在映射和跟踪准确性方面优于现有的密集SLAM方法，同时实现了2倍的操作加速。此外，在小型合成场景中呈现语义分割的竞争性性能，显著减少了存储和训练时间需求。渲染FPS令人印象深刻地达到了2,000，带有语义信息时则为3,000。值得注意的是，它展示了处理具有500多种语义类别的复杂现实世界场景的能力，突显了其有价值的扩展能力。

更新时间: 2024-10-09 11:48:33

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.12518v2

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. Our studies demonstrate that an attacker can manipulate the behavior of large language models by poisoning the demonstration context, without the need for fine-tuning the model. Specifically, we design a new backdoor attack method, named ICLAttack, to target large language models based on in-context learning. Our method encompasses two types of attacks: poisoning demonstration examples and poisoning demonstration prompts, which can make models behave in alignment with predefined intentions. ICLAttack does not require additional fine-tuning to implant a backdoor, thus preserving the model's generality. Furthermore, the poisoned examples are correctly labeled, enhancing the natural stealth of our attack method. Extensive experimental results across several language models, ranging in size from 1.3B to 180B parameters, demonstrate the effectiveness of our attack method, exemplified by a high average attack success rate of 95.0% across the three datasets on OPT models.

Updated: 2024-10-09 11:46:24

标题: 大语言模型的通用漏洞：背门攻击用于上下文学习

摘要: 在上下文学习中，一种连接预训练和微调之间差距的范式已经在多个自然语言处理任务中表现出高效性，特别是在少样本设置下。尽管被广泛应用，上下文学习容易受到恶意攻击。在这项研究中，我们提出了关于这种范式的安全性问题。我们的研究表明，攻击者可以通过操纵示范上下文来影响大型语言模型的行为，而无需对模型进行微调。具体来说，我们设计了一种新的后门攻击方法，名为ICLAttack，针对基于上下文学习的大型语言模型。我们的方法包括两种攻击类型：污染示范示例和污染示范提示，这可以使模型按照预定义的意图行事。ICLAttack不需要额外的微调来植入后门，从而保留了模型的通用性。此外，污染示例被正确标记，增强了我们攻击方法的自然隐蔽性。在多个语言模型上进行的大量实验结果显示了我们攻击方法的有效性，其中在OPT模型的三个数据集上，平均攻击成功率高达95.0%。

更新时间: 2024-10-09 11:46:24

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2401.05949v6

LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management

The expanding context windows in large language models (LLMs) have greatly enhanced their capabilities in various applications, but they also introduce significant challenges in maintaining low latency, particularly in Time to First Token (TTFT). This paper identifies that the sharp rise in TTFT as context length increases is predominantly driven by queuing delays, which are caused by the growing demands for GPU Key-Value (KV) cache allocation clashing with the limited availability of KV cache blocks. To address this issue, we propose LayerKV, a simple yet effective plug-in method that effectively reduces TTFT without requiring additional hardware or compromising output performance, while seamlessly integrating with existing parallelism strategies and scheduling techniques. Specifically, LayerKV introduces layer-wise KV block allocation, management, and offloading for fine-grained control over system memory, coupled with an SLO-aware scheduler to optimize overall Service Level Objectives (SLOs). Comprehensive evaluations on representative models, ranging from 7B to 70B parameters, across various GPU configurations, demonstrate that LayerKV improves TTFT latency up to 69x and reduces SLO violation rates by 28.7%, significantly enhancing the user experience.

Updated: 2024-10-09 11:40:31

标题: LayerKV：使用分层KV缓存管理优化大型语言模型的服务

摘要: 大型语言模型（LLMs）中不断扩大的上下文窗口极大地增强了它们在各种应用中的能力，但也在维持低延迟，特别是首个标记到达时间（TTFT）方面引入了重要挑战。本文指出，随着上下文长度的增加，TTFT急剧上升主要是由于排队延迟引起的，这是由于GPU键-值（KV）缓存分配的需求增长与有限的KV缓存块可用性之间的冲突造成的。为了解决这个问题，我们提出了LayerKV，这是一种简单而有效的插件方法，可以有效降低TTFT，而无需额外的硬件或损害输出性能，同时与现有的并行策略和调度技术无缝集成。具体来说，LayerKV引入了逐层KV块分配、管理和卸载，以对系统内存进行细粒度控制，结合了一个SLO感知调度程序，以优化整体服务级目标（SLOs）。对代表性模型进行了全面评估，参数范围从7B到70B，跨各种GPU配置，结果显示LayerKV将TTFT延迟提高了69倍，并将SLO违规率降低了28.7％，显著提升了用户体验。

更新时间: 2024-10-09 11:40:31

领域: cs.DC,cs.AI,cs.LG,I.2.11; C.4

下载: http://arxiv.org/abs/2410.00428v3

On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates

We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly log-concave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of our approach. In this case, explicit estimates are provided for the associated optimization problem, i.e. score approximation, while these are combined with the corresponding sampling estimates. As a result, we obtain the best known upper bound estimates in terms of key quantities of interest, such as the dimension and rates of convergence, for the Wasserstein-2 distance between the data distribution (Gaussian with unknown mean) and our sampling algorithm. Beyond the motivating example and in order to allow for the use of a diverse range of stochastic optimizers, we present our results using an $L^2$-accurate score estimation assumption, which crucially is formed under an expectation with respect to the stochastic optimizer and our novel auxiliary process that uses only known information. This approach yields the best known convergence rate for our sampling algorithm.

Updated: 2024-10-09 11:38:01

标题: 关于基于扩散的生成模型及其误差界限：具有完全收敛估计的对数凹情况

摘要: 我们为扩散式生成模型的收敛行为提供了完整的理论保证，假设数据分布强烈对数凹，而我们用于评分估计的函数近似类由利普希茨连续函数组成，避免了对评分函数的利普希茨性假设。我们通过一个激励示例展示了我们方法的强大性能，即从具有未知均值的高斯分布中进行采样。在这种情况下，我们为相关的优化问题提供了明确的估计，即评分近似，同时结合了相应的采样估计。因此，我们获得了关键量（例如维度和收敛速度）的最佳已知上限估计，用于数据分布（具有未知均值的高斯分布）和我们的采样算法之间的Wasserstein-2距离。除了激励示例之外，为了允许使用各种随机优化器，我们使用一个$L^2$-准确的评分估计假设呈现了我们的结果，这在预期随机优化器和我们的新颖辅助过程之间形成，该过程仅使用已知信息。这种方法为我们的采样算法提供了已知的最佳收敛速度。

更新时间: 2024-10-09 11:38:01

领域: cs.LG,math.OC,math.PR,stat.ML

下载: http://arxiv.org/abs/2311.13584v4

Deep End-to-End Survival Analysis with Temporal Consistency

In this study, we present a novel Survival Analysis algorithm designed to efficiently handle large-scale longitudinal data. Our approach draws inspiration from Reinforcement Learning principles, particularly the Deep Q-Network paradigm, extending Temporal Learning concepts to Survival Regression. A central idea in our method is temporal consistency, a hypothesis that past and future outcomes in the data evolve smoothly over time. Our framework uniquely incorporates temporal consistency into large datasets by providing a stable training signal that captures long-term temporal relationships and ensures reliable updates. Additionally, the method supports arbitrarily complex architectures, enabling the modeling of intricate temporal dependencies, and allows for end-to-end training. Through numerous experiments we provide empirical evidence demonstrating our framework's ability to exploit temporal consistency across datasets of varying sizes. Moreover, our algorithm outperforms benchmarks on datasets with long sequences, demonstrating its ability to capture long-term patterns. Finally, ablation studies show how our method enhances training stability.

Updated: 2024-10-09 11:37:09

标题: 深度端到端生存分析与时间一致性

摘要: 在这项研究中，我们提出了一种新颖的生存分析算法，旨在高效处理大规模的纵向数据。我们的方法受到强化学习原理的启发，特别是深度 Q 网络范式，将时间学习概念扩展到生存回归。我们方法的一个核心思想是时间一致性，即数据中的过去和未来结果随着时间平稳演变的假设。我们的框架通过提供稳定的训练信号将时间一致性独特地纳入大型数据集，捕捉长期时间关系并确保可靠的更新。此外，该方法支持任意复杂的架构，能够建模复杂的时间依赖关系，并允许端到端训练。通过大量实验，我们提供了实证证据，证明我们的框架能够利用各种大小的数据集中的时间一致性。此外，我们的算法在具有长序列的数据集上表现优于基准，展示了其捕捉长期模式的能力。最后，消融研究显示了我们的方法如何增强训练稳定性。

更新时间: 2024-10-09 11:37:09

领域: cs.LG

下载: http://arxiv.org/abs/2410.06786v1

Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization

The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization (ERM) with relative entropy regularization (ERM-RER). Two regularizations are considered: $(a)$ the relative entropy of the measure to be optimized with respect to a reference measure (Type-I ERM-RER); or $(b)$ the relative entropy of the reference measure with respect to the measure to be optimized (Type-II ERM-RER). The main result is the characterization of the solution to the Type-II ERM-RER problem and its key properties. By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted. The analysis shows that in both cases, regularization by relative entropy forces the solution's support to collapse into the support of the reference measure, introducing a strong inductive bias that can overshadow the evidence provided by the training data. Finally, it is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.

Updated: 2024-10-09 11:28:41

标题: 经验风险最小化中相对熵的正则化的不对称性

摘要: 在相对熵不对称性的影响下，分析了在相对熵正则化（ERM-RER）的背景下的经验风险最小化（ERM）的效果。考虑了两种正则化：$(a)$优化测度相对于参考测度的相对熵（Type-I ERM-RER）；或$(b)$参考测度相对于优化测度的相对熵（Type-II ERM-RER）。主要结果是对Type-II ERM-RER问题的解以及其关键特性进行了表征。通过比较已知的Type-I ERM-RER和Type-II ERM-RER，突出了熵不对称性的影响。分析表明，在两种情况下，通过相对熵正则化将解的支持集合收缩到参考测度的支持集合中，引入了强烈的归纳偏见，可能会掩盖训练数据提供的证据。最后，表明Type-II正则化等效于对经验风险函数进行适当转换的Type-I正则化。

更新时间: 2024-10-09 11:28:41

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2410.02833v2

ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning

With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domian time series data, and how to capture domain-specific features from time series data across various domains for adaptive transfer in downstream tasks. To address these challenges, we propose a Register Assisted General Time Series Forecasting Model with Decomposed Frequency Learning (ROSE), a novel pre-trained model for time series forecasting. ROSE employs Decomposed Frequency Learning for the pre-training task, which decomposes coupled semantic and periodic information in time series with frequency-based masking and reconstruction to obtain unified representations across domains. We also equip ROSE with a Time Series Register, which learns to generate a register codebook to capture domain-specific representations during pre-training and enhances domain-adaptive transfer by selecting related register tokens on downstream tasks. After pre-training on large-scale time series data, ROSE achieves state-of-the-art forecasting performance on 8 real-world benchmarks. Remarkably, even in few-shot scenarios, it demonstrates competitive or superior performance compared to existing methods trained with full data.

Updated: 2024-10-09 11:23:14

标题: ROSE：使用分解频率学习的注册辅助的一般时间序列预测

摘要: 随着从各个领域收集的时间序列数据数量不断增加，人们对于预先训练的通用时间序列预测模型的需求也日益强烈，以支持各种下游预测任务。实现通用时间序列预测面临两个挑战：如何从多领域时间序列数据中获得统一的表示，以及如何从不同领域的时间序列数据中捕获领域特定特征，以进行下游任务的自适应转移。为了解决这些挑战，我们提出了一种具有分解频率学习的注册辅助通用时间序列预测模型（ROSE），这是一种新颖的用于时间序列预测的预训练模型。ROSE采用分解频率学习进行预训练任务，通过基于频率的掩码和重构分解时间序列中的耦合语义和周期信息，以获得跨领域的统一表示。我们还为ROSE配备了一个时间序列注册器，该注册器学习生成一个注册码书，以捕获在预训练期间的领域特定表示，并通过在下游任务中选择相关的注册令牌来增强领域自适应转移。在大规模时间序列数据上进行预训练后，ROSE在8个真实世界基准测试中实现了最先进的预测性能。值得注意的是，即使在少样本场景中，与使用完整数据训练的现有方法相比，它也表现出竞争力或更优越的性能。

更新时间: 2024-10-09 11:23:14

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.17478v2

rECGnition_v1.0: Arrhythmia detection using cardiologist-inspired multi-modal architecture incorporating demographic attributes in ECG

A substantial amount of variability in ECG manifested due to patient characteristics hinders the adoption of automated analysis algorithms in clinical practice. None of the ECG annotators developed till date consider the characteristics of the patients in a multi-modal architecture. We employed the XGBoost model to analyze the UCI Arrhythmia dataset, linking patient characteristics to ECG morphological changes. The model accurately classified patient gender using discriminative ECG features with 87.75% confidence. We propose a novel multi-modal methodology for ECG analysis and arrhythmia classification that can help defy the variability in ECG related to patient-specific conditions. This deep learning algorithm, named rECGnition_v1.0 (robust ECG abnormality detection Version 1), fuses Beat Morphology with Patient Characteristics to create a discriminative feature map that understands the internal correlation between both modalities. A Squeeze and Excitation based Patient characteristic Encoding Network (SEPcEnet) has been introduced, considering the patient's demographics. The trained model outperformed the various existing algorithms by achieving the overall F1-score of 0.986 for the ten arrhythmia class classification in the MITDB and achieved near perfect prediction scores of ~0.99 for LBBB, RBBB, Premature ventricular contraction beat, Atrial premature beat and Paced beat. Subsequently, the methodology was validated across INCARTDB, EDB and different class groups of MITDB using transfer learning. The generalizability test provided F1-scores of 0.980, 0.946, 0.977, and 0.980 for INCARTDB, EDB, MITDB AAMI, and MITDB Normal vs. Abnormal Classification, respectively. Therefore, with a more enhanced and comprehensive understanding of the patient being examined and their ECG for diverse CVD manifestations, the proposed rECGnition_v1.0 algorithm paves the way for its deployment in clinics.

Updated: 2024-10-09 11:17:02

标题: rECGnition_v1.0：使用心脏病学家启发的多模态架构和心电图中的人口属性进行心律失常检测

摘要: 患者特征导致的心电图(ECG)变异性大大阻碍了自动化分析算法在临床实践中的应用。迄今为止开发的所有ECG注释器均未考虑多模态架构中的患者特征。我们采用XGBoost模型分析UCI心律失常数据集，将患者特征与ECG形态变化联系起来。该模型利用具有87.75%置信度的区分性ECG特征准确地对患者性别进行分类。我们提出了一种新颖的多模态ECG分析和心律失常分类方法，可以帮助抵御与患者特定情况相关的ECG变异性。这种名为rECGnition_v1.0（鲁棒ECG异常检测版本1）的深度学习算法将节拍形态与患者特征融合，创建一个理解两种模态之间内在相关性的区分性特征图。基于Squeeze and Excitation的患者特征编码网络(SEPcEnet)已经推出，考虑到患者的人口统计学特征。训练模型通过在MITDB中实现十种心律失常分类的整体F1分数为0.986，实现了对LBBB、RBBB、早搏、房性早搏和起搏搏动等几乎完美的预测分数约为0.99。随后，该方法通过迁移学习在INCARTDB、EDB和MITDB的不同类组中进行了验证。泛化测试为INCARTDB、EDB、MITDB AAMI和MITDB正常vs.异常分类分别提供了0.980、0.946、0.977和0.980的F1分数。因此，通过更加全面和深入地了解被检查患者及其心电图的不同心血管疾病表现，所提议的rECGnition_v1.0算法为其在诊所中的部署铺平了道路。

更新时间: 2024-10-09 11:17:02

领域: eess.SP,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18985v1

Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure

Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications ranging from computer vision to graph learning. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning motivates the use of learning-based approaches to estimate SPD matrices with neural networks in a data-driven fashion. However, designing effective neural architectures for SPD learning is challenging, particularly when the task requires additional structural constraints, such as element-wise sparsity. Current approaches either do not ensure that the output meets all desired properties or lack expressivity. In this paper, we introduce SpodNet, a novel and generic learning module that guarantees SPD outputs and supports additional structural constraints. Notably, it solves the challenging task of learning jointly SPD and sparse matrices. Our experiments illustrate the versatility and relevance of SpodNet layers for such applications.

Updated: 2024-10-09 11:15:05

标题: 舒尔正定网络：在带有结构的SPD锥中进行深度学习

摘要: 在对称正定（SPD）锥中估计矩阵在许多应用中都很有意义，从计算机视觉到图学习等。虽然存在各种基于凸优化的估计器，但由于其基于模型的方法，它们在表达能力上仍然存在限制。深度学习的成功激发了使用基于学习的方法以数据驱动的方式估计SPD矩阵的动机。然而，设计用于SPD学习的有效神经结构是具有挑战性的，特别是在任务需要额外的结构约束，如逐元素稀疏性时。当前方法要么不能确保输出满足所有期望的属性，要么缺乏表达能力。在本文中，我们介绍了SpodNet，一个新颖且通用的学习模块，可以保证SPD输出并支持额外的结构约束。值得注意的是，它解决了同时学习SPD和稀疏矩阵的具有挑战性的任务。我们的实验展示了SpodNet层对这类应用的多功能性和相关性。

更新时间: 2024-10-09 11:15:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.09023v2

Safe and High-Performance Learning of Model Predicitve Control using Kernel-Based Interpolation

We present a method, which allows efficient and safe approximation of model predictive controllers using kernel interpolation. Since the computational complexity of the approximating function scales linearly with the number of data points, we propose to use a scoring function which chooses the most promising data. To further reduce the complexity of the approximation, we restrict our considerations to the set of closed-loop reachable states. That is, the approximating function only has to be accurate within this set. This makes our method especially suited for systems, where the set of initial conditions is small. In order to guarantee safety and high performance of the designed approximated controller, we use reachability analysis based on Monte Carlo methods.

Updated: 2024-10-09 11:04:15

标题: 基于核插值的模型预测控制的安全高性能学习

摘要: 我们提出了一种方法，通过核插值有效且安全地近似模型预测控制器。由于近似函数的计算复杂度随数据点数量线性增长，我们建议使用一个评分函数，选择最有前途的数据。为了进一步减少近似的复杂度，我们将考虑范围限制在闭环可达状态集合上。也就是说，近似函数只需要在该集合内准确。这使得我们的方法特别适用于初始条件集合较小的系统。为了确保设计的近似控制器的安全性和高性能，我们使用基于蒙特卡罗方法的可达性分析。

更新时间: 2024-10-09 11:04:15

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2410.06771v1

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.

Updated: 2024-10-09 11:01:29

标题: 即时感染：多智能体系统内的LLM到LLM即时注入

摘要: 随着大型语言模型（LLMs）的不断增强，多智能体系统在现代人工智能应用中变得更加普遍。然而，大部分安全研究集中在单一智能体LLMs中的漏洞上。这些漏洞包括提示注入攻击，其中嵌入在外部内容中的恶意提示欺骗LLMs执行意外或有害操作，危害受害者的应用程序。在本文中，我们揭示了一个更加危险的向量：在多智能体系统内LLM到LLM提示注入。我们介绍了Prompt Infection，一种新型攻击，其中恶意提示在互联智能体之间自我复制，行为类似于计算机病毒。这种攻击带来了严重威胁，包括数据窃取、诈骗、虚假信息和系统范围的破坏，同时在系统中静默传播。我们的广泛实验表明，即使智能体不公开共享所有通信，多智能体系统也极易受到攻击。为了解决这个问题，我们提出了LLM标记，一种防御机制，当与现有保障措施结合时，显著减轻了感染传播。这项工作强调了随着多智能体LLM系统被更广泛采用，迫切需要先进的安全措施。

更新时间: 2024-10-09 11:01:29

领域: cs.MA,cs.AI,cs.CR

下载: http://arxiv.org/abs/2410.07283v1

A Utility-Mining-Driven Active Learning Approach for Analyzing Clickstream Sequences

In rapidly evolving e-commerce industry, the capability of selecting high-quality data for model training is essential. This study introduces the High-Utility Sequential Pattern Mining using SHAP values (HUSPM-SHAP) model, a utility mining-based active learning strategy to tackle this challenge. We found that the parameter settings for positive and negative SHAP values impact the model's mining outcomes, introducing a key consideration into the active learning framework. Through extensive experiments aimed at predicting behaviors that do lead to purchases or not, the designed HUSPM-SHAP model demonstrates its superiority across diverse scenarios. The model's ability to mitigate labeling needs while maintaining high predictive performance is highlighted. Our findings demonstrate the model's capability to refine e-commerce data processing, steering towards more streamlined, cost-effective prediction modeling.

Updated: 2024-10-09 10:44:02

标题: 一种基于实用挖掘驱动的主动学习方法，用于分析点击流序列

摘要: 在快速发展的电子商务行业中，选择高质量数据进行模型训练的能力至关重要。本研究介绍了基于效用挖掘的主动学习策略，即使用SHAP值的高效用顺序模式挖掘（HUSPM-SHAP）模型，以解决这一挑战。我们发现，正负SHAP值的参数设置会影响模型的挖掘结果，引入了主动学习框架的一个关键考虑因素。通过广泛的实验来预测导致购买或不购买的行为，设计的HUSPM-SHAP模型在不同场景下展现出了其优越性。该模型在减少标注需求的同时保持高预测性能的能力得到了强调。我们的研究结果表明了该模型在优化电子商务数据处理方面的能力，朝着更加流畅、成本效益的预测建模方向发展。

更新时间: 2024-10-09 10:44:02

领域: cs.LG

下载: http://arxiv.org/abs/2410.07282v1

Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

Despite the rapid progress and outstanding performance of Large Vision-Language Models (LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs tend to generate responses that are inconsistent with the corresponding visual inputs. To evaluate the degree of hallucination in LVLMs, previous works have proposed a series of benchmarks featuring different types of tasks and evaluation metrics. However, we find that the quality of the existing hallucination benchmarks varies, with some suffering from problems, e.g., inconsistent evaluation results under repeated tests, and misalignment with human evaluation. To this end, we propose a Hallucination benchmark Quality Measurement framework (HQM), which leverages various indicators to assess the reliability and validity of existing hallucination benchmarks separately. Specifically, for reliability we explore test-retest reliability and parallel-forms reliability, while for validity we examine criterion validity and coverage of hallucination types. Furthermore, based on the results of our quality measurement, we construct a High-Quality Hallucination Benchmark (HQH) for LVLMs, which demonstrates superior reliability and validity under our HQM framework. We conduct an extensive evaluation of over 10 representative LVLMs, including GPT-4o and Gemini-1.5-Pro, to provide an in-depth analysis of the hallucination issues in existing models. Our benchmark is publicly available at https://github.com/HQHBench/HQHBench.

Updated: 2024-10-09 10:43:47

标题: 评估大规模视觉-语言模型的幻觉基准质量

摘要: 尽管近年来大型视觉语言模型（LVLMs）取得了快速进展和出色表现，但LVLMs一直受到幻觉问题的困扰，即LVLMs倾向于生成与相应视觉输入不一致的响应。为了评估LVLMs中幻觉的程度，先前的研究提出了一系列具有不同类型任务和评估指标的基准。然而，我们发现现有幻觉基准的质量存在差异，有些存在问题，例如在重复测试下评估结果不一致，与人类评估不一致等。为此，我们提出了一个幻觉基准质量测量框架（HQM），利用各种指标分别评估现有幻觉基准的可靠性和有效性。具体而言，对于可靠性，我们探讨了测试-重测可靠性和平行形式可靠性，而对于有效性，我们考察了标准效度和幻觉类型的覆盖范围。此外，根据我们质量测量的结果，我们构建了一个高质量的幻觉基准（HQH）用于LVLMs，在我们的HQM框架下表现出卓越的可靠性和有效性。我们对超过10个代表性LVLMs进行了广泛评估，包括GPT-4o和Gemini-1.5-Pro，以深入分析现有模型中的幻觉问题。我们的基准公开可用于https://github.com/HQHBench/HQHBench。

更新时间: 2024-10-09 10:43:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.17115v2

Adaptive Active Inference Agents for Heterogeneous and Lifelong Federated Learning

Handling heterogeneity and unpredictability are two core problems in pervasive computing. The challenge is to seamlessly integrate devices with varying computational resources in a dynamic environment to form a cohesive system that can fulfill the needs of all participants. Existing work on systems that adapt to changing requirements typically focuses on optimizing individual variables or low-level Service Level Objectives (SLOs), such as constraining the usage of specific resources. While low-level control mechanisms permit fine-grained control over a system, they introduce considerable complexity, particularly in dynamic environments. To this end, we propose drawing from Active Inference (AIF), a neuroscientific framework for designing adaptive agents. Specifically, we introduce a conceptual agent for heterogeneous pervasive systems that permits setting global systems constraints as high-level SLOs. Instead of manually setting low-level SLOs, the system finds an equilibrium that can adapt to environmental changes. We demonstrate the viability of AIF agents with an extensive experiment design, using heterogeneous and lifelong federated learning as an application scenario. We conduct our experiments on a physical testbed of devices with different resource types and vendor specifications. The results provide convincing evidence that an AIF agent can adapt a system to environmental changes. In particular, the AIF agent can balance competing SLOs in resource heterogeneous environments to ensure up to 98% fulfillment rate.

Updated: 2024-10-09 10:43:29

标题: 适应性主动推理代理用于异构和终身联邦学习

摘要: 处理异质性和不可预测性是普适计算中的两个核心问题。挑战在于在动态环境中无缝地整合具有不同计算资源的设备，形成一个能够满足所有参与者需求的协调系统。现有的适应变化需求的系统工作通常侧重于优化个别变量或低级别服务水平目标（SLO），如限制特定资源的使用。虽然低级别控制机制允许对系统进行精细控制，但在动态环境中尤其引入了相当复杂性。为此，我们提出借鉴主动推理（AIF），这是一个用于设计自适应代理的神经科学框架。具体地，我们引入了一个用于异构普适系统的概念代理，允许将全局系统约束设置为高级别SLO。系统不是手动设置低级别SLO，而是找到一个可以适应环境变化的平衡点。我们通过广泛的实验设计展示了AIF代理的可行性，使用异构和终身联邦学习作为应用场景。我们在具有不同资源类型和供应商规格的设备的物理测试平台上进行实验。结果提供了令人信服的证据，表明AIF代理可以使系统适应环境变化。特别是，在资源异构环境中，AIF代理可以平衡竞争的SLO，以确保高达98%的满足率。

更新时间: 2024-10-09 10:43:29

领域: cs.LG,cs.AI,cs.DC

下载: http://arxiv.org/abs/2410.09099v1

Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention

In the realm of graph learning, there is a category of methods that conceptualize graphs as hierarchical structures, utilizing node clustering to capture broader structural information. While generally effective, these methods often rely on a fixed graph coarsening routine, leading to overly homogeneous cluster representations and loss of node-level information. In this paper, we envision the graph as a network of interconnected node sets without compressing each cluster into a single embedding. To enable effective information transfer among these node sets, we propose the Node-to-Cluster Attention (N2C-Attn) mechanism. N2C-Attn incorporates techniques from Multiple Kernel Learning into the kernelized attention framework, effectively capturing information at both node and cluster levels. We then devise an efficient form for N2C-Attn using the cluster-wise message-passing framework, achieving linear time complexity. We further analyze how N2C-Attn combines bi-level feature maps of queries and keys, demonstrating its capability to merge dual-granularity information. The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), which uses node clusters as tokens and employs our proposed N2C-Attn module, shows superior performance on various graph-level tasks. Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer.

Updated: 2024-10-09 10:30:01

标题: 具有双粒度核化注意力的簇内图变换器

摘要: 在图学习领域中，有一类方法将图概念化为分层结构，利用节点聚类来捕捉更广泛的结构信息。虽然通常有效，但这些方法通常依赖于固定的图粗化程序，导致过度均匀的簇表示和丢失节点级信息。在本文中，我们将图想象为一个由相互连接的节点集组成的网络，而不是将每个簇压缩成单个嵌入。为了实现这些节点集之间的有效信息传递，我们提出了节点到簇的注意力（N2C-Attn）机制。N2C-Attn将多核学习技术整合到核化注意力框架中，有效地捕捉节点和簇级别的信息。然后，我们使用基于簇的消息传递框架设计了N2C-Attn的高效形式，实现了线性时间复杂度。我们进一步分析了N2C-Attn如何结合查询和键的双层特征图，展示其合并双粒度信息的能力。由此产生的架构，基于簇的图变换器（Cluster-GT），使用节点簇作为令牌，并采用我们提出的N2C-Attn模块，在各种图级任务上显示出卓越性能。代码可在https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer找到。

更新时间: 2024-10-09 10:30:01

领域: cs.LG

下载: http://arxiv.org/abs/2410.06746v1

Utilizing Transfer Learning and pre-trained Models for Effective Forest Fire Detection: A Case Study of Uttarakhand

Forest fires pose a significant threat to the environment, human life, and property. Early detection and response are crucial to mitigating the impact of these disasters. However, traditional forest fire detection methods are often hindered by our reliability on manual observation and satellite imagery with low spatial resolution. This paper emphasizes the role of transfer learning in enhancing forest fire detection in India, particularly in overcoming data collection challenges and improving model accuracy across various regions. We compare traditional learning methods with transfer learning, focusing on the unique challenges posed by regional differences in terrain, climate, and vegetation. Transfer learning can be categorized into several types based on the similarity between the source and target tasks, as well as the type of knowledge transferred. One key method is utilizing pre-trained models for efficient transfer learning, which significantly reduces the need for extensive labeled data. We outline the transfer learning process, demonstrating how researchers can adapt pre-trained models like MobileNetV2 for specific tasks such as forest fire detection. Finally, we present experimental results from training and evaluating a deep learning model using the Uttarakhand forest fire dataset, showcasing the effectiveness of transfer learning in this context.

Updated: 2024-10-09 10:21:45

标题: 利用迁移学习和预训练模型实现有效的森林火灾检测：以北阿坎德邦为例研究

摘要: 森林火灾对环境、人类生命和财产构成重大威胁。早期检测和响应对减轻这些灾害的影响至关重要。然而，传统的森林火灾检测方法通常受到我们对手动观测和空间分辨率较低的卫星图像的依赖而受阻。本文强调了迁移学习在增强印度森林火灾检测中的作用，特别是在克服数据收集挑战和提高模型在各个地区的准确性方面。我们比较了传统学习方法与迁移学习，重点关注地形、气候和植被的区域差异带来的独特挑战。迁移学习可以根据源任务和目标任务之间的相似性以及转移知识的类型分为几种类型。一种关键方法是利用预训练模型进行高效的迁移学习，这显著减少了对大量标记数据的需求。我们概述了迁移学习过程，展示了研究人员如何调整像MobileNetV2这样的预训练模型以用于特定任务，如森林火灾检测。最后，我们展示了使用Uttarakhand森林火灾数据集训练和评估深度学习模型的实验结果，展示了迁移学习在这种情境下的有效性。

更新时间: 2024-10-09 10:21:45

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06743v1

Inference over Unseen Entities, Relations and Literals on Knowledge Graphs

In recent years, knowledge graph embedding models have been successfully applied in the transductive setting to tackle various challenging tasks including link prediction, and query answering. Yet, the transductive setting does not allow for reasoning over unseen entities, relations, let alone numerical or non-numerical literals. Although increasing efforts are put into exploring inductive scenarios, inference over unseen entities, relations, and literals has yet to come. This limitation prohibits the existing methods from handling real-world dynamic knowledge graphs involving heterogeneous information about the world. Here, we propose a remedy to this limitation. We propose the attentive byte-pair encoding layer (BytE) to construct a triple embedding from a sequence of byte-pair encoded subword units of entities and relations. Compared to the conventional setting, BytE leads to massive feature reuse via weight tying, since it forces a knowledge graph embedding model to learn embeddings for subword units instead of entities and relations directly. Consequently, the size of the embedding matrices are not anymore bound to the unique number of entities and relations of a knowledge graph. Experimental results show that BytE improves the link prediction performance of 4 knowledge graph embedding models on datasets where the syntactic representations of triples are semantically meaningful. However, benefits of training a knowledge graph embedding model with BytE dissipate on knowledge graphs where entities and relations are represented with plain numbers or URIs. We provide an open source implementation of BytE to foster reproducible research.

Updated: 2024-10-09 10:20:54

标题: 在知识图谱上对未见实体、关系和文字的推理

摘要: 最近几年，知识图谱嵌入模型已成功应用于传导设置中，以解决各种具有挑战性的任务，包括链接预测和查询回答。然而，传导设置并不允许对未见实体、关系，更不用说数值或非数值文字进行推理。尽管增加了对归纳场景的探索，但对未见实体、关系和文字的推理仍未出现。这种限制阻碍了现有方法处理涉及世界各种异构信息的真实动态知识图谱。在这里，我们提出了对这种限制的补救措施。我们提出了注意力字节对编码层（BytE），用于从实体和关系的字节对编码子词单元序列构建三元嵌入。与传统设置相比，BytE通过权重绑定带来了大规模特征重用，因为它强制知识图嵌入模型学习子词单元的嵌入，而不是直接针对实体和关系。因此，嵌入矩阵的大小不再受限于知识图谱中实体和关系的唯一数量。实验结果显示，BytE改善了4种知识图谱嵌入模型在三元组的句法表示在语义上有意义的数据集上的链接预测性能。然而，在实体和关系用纯数字或URI表示的知识图上，通过BytE训练知识图嵌入模型的好处会消失。我们提供了BytE的开源实现，以促进可重复性研究。

更新时间: 2024-10-09 10:20:54

领域: cs.LG

下载: http://arxiv.org/abs/2410.06742v1

CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models

Multi-task learning (MTL) benefits the fine-tuning of large language models (LLMs) by providing a single model with improved performance and generalization ability across tasks, presenting a resource-efficient alternative to developing separate models for each task. Yet, existing MTL strategies for LLMs often fall short by either being computationally intensive or failing to ensure simultaneous task convergence. This paper presents CoBa, a new MTL approach designed to effectively manage task convergence balance with minimal computational overhead. Utilizing Relative Convergence Scores (RCS), Absolute Convergence Scores (ACS), and a Divergence Factor (DF), CoBa dynamically adjusts task weights during the training process, ensuring that the validation loss of all tasks progress towards convergence at an even pace while mitigating the issue of individual task divergence. The results of our experiments involving three disparate datasets underscore that this approach not only fosters equilibrium in task improvement but enhances the LLMs' performance by up to 13% relative to the second-best baselines. Code is open-sourced at https://github.com/codefuse-ai/MFTCoder.

Updated: 2024-10-09 10:20:32

标题: CoBa：用于大型语言模型多任务微调的收敛平衡器

摘要: 多任务学习（MTL）通过为大型语言模型（LLMs）提供一个性能和泛化能力更强的单一模型，从而有利于对其进行微调，为为每个任务开发单独模型的资源高效替代方案。然而，现有的LLMs的MTL策略往往要么计算密集，要么无法确保任务同时收敛。本文提出了CoBa，一种新的MTL方法，旨在在最小计算开销的情况下有效管理任务收敛平衡。利用相对收敛分数（RCS）、绝对收敛分数（ACS）和发散因子（DF），CoBa在训练过程中动态调整任务权重，确保所有任务的验证损失以均匀的速度向收敛进展，同时缓解个别任务发散的问题。我们对涉及三个不同数据集的实验结果表明，这种方法不仅促进了任务改善的平衡，而且相对于次佳基线，提高了LLMs的性能高达13%。代码已在https://github.com/codefuse-ai/MFTCoder开源。

更新时间: 2024-10-09 10:20:32

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.06741v1

Federated learning with distributed fixed design quantum chips and quantum channels

The privacy in classical federated learning can be breached through the use of local gradient results combined with engineered queries to the clients. However, quantum communication channels are considered more secure because a measurement on the channel causes a loss of information, which can be detected by the sender. Therefore, the quantum version of federated learning can be used to provide better privacy. Additionally, sending an $N$-dimensional data vector through a quantum channel requires sending $\log N$ entangled qubits, which can potentially provide efficiency if the data vector is utilized as quantum states. In this paper, we propose a quantum federated learning model in which fixed design quantum chips are operated based on the quantum states sent by a centralized server. Based on the incoming superposition states, the clients compute and then send their local gradients as quantum states to the server, where they are aggregated to update parameters. Since the server does not send model parameters, but instead sends the operator as a quantum state, the clients are not required to share the model. This allows for the creation of asynchronous learning models. In addition, the model is fed into client-side chips directly as a quantum state; therefore, it does not require measurements on the incoming quantum state to obtain model parameters in order to compute gradients. This can provide efficiency over models where the parameter vector is sent via classical or quantum channels and local gradients are obtained through the obtained values these parameters.

Updated: 2024-10-09 10:19:49

标题: 使用分布式固定设计量子芯片和量子通道的联邦学习

摘要: 经典联邦学习中的隐私可能会通过利用本地梯度结果与客户端的工程化查询而被突破。然而，量子通信渠道被认为更安全，因为对通道的测量会导致信息丢失，发件人可以检测到。因此，量子版本的联邦学习可以用来提供更好的隐私保护。此外，通过量子通道发送一个$N$维数据向量需要发送$\log N$个纠缠的量子比特，如果数据向量被用作量子态，这可能会提高效率。在本文中，我们提出了一个量子联邦学习模型，其中基于中央服务器发送的量子态，固定设计的量子芯片进行操作。根据传入的叠加态，客户端计算并将本地梯度作为量子态发送到服务器，然后对其进行聚合以更新参数。由于服务器不发送模型参数，而是以量子态形式发送算子，因此客户端无需共享模型。这允许创建异步学习模型。此外，模型直接作为量子态馈送到客户端芯片中，因此不需要对传入的量子态进行测量以获取模型参数以计算梯度。这可以提供比通过经典或量子通道发送参数向量并通过这些参数的值获得本地梯度的模型更高的效率。

更新时间: 2024-10-09 10:19:49

领域: quant-ph,cs.DC,cs.LG

下载: http://arxiv.org/abs/2401.13421v3

Predictability and Fairness in Load Aggregation with Deadband

Virtual power plants and load aggregation are becoming increasingly common. There, one regulates the aggregate power output of an ensemble of distributed energy resources (DERs). Marecek et al. [Automatica, Volume 147, January 2023, 110743, arXiv:2110.03001] recently suggested that long-term averages of prices or incentives offered should exist and be independent of the initial states of the operators of the DER, the aggregator, and the power grid. This can be seen as predictability, which underlies fairness. Unfortunately, the existence of such averages cannot be guaranteed with many traditional regulators, including the proportional-integral (PI) regulator with or without deadband. Here, we consider the effects of losses in the alternating current model and the deadband in the controller. This yields a non-linear dynamical system (due to the non-linear losses) exhibiting discontinuities (due to the deadband). We show that Filippov invariant measures enable reasoning about predictability and fairness while considering non-linearity of the alternating-current model and deadband.

Updated: 2024-10-09 10:17:32

标题: 负载汇总中的可预测性和公平性与死区

摘要: 虚拟电厂和负荷聚合越来越普遍。在那里，一个调节分布式能源资源（DERs）集合的总体功率输出。Marecek等人[Automatica，2023年1月147卷，110743，arXiv:2110.03001]最近建议应存在并且独立于DER操作员、聚合器和电网的初始状态的价格或激励的长期平均值。这可以看作是可预测性，而公平性是其基础。不幸的是，许多传统调节器，包括有或没有死区的比例积分（PI）调节器，无法保证这样的平均存在。在这里，我们考虑交流模型中的损耗和控制器中的死区的影响。这导致了一个非线性动力系统（由于非线性损耗）展示了不连续性（由于死区）。我们展示Filippov不变测量使得在考虑交流模型和死区的非线性的情况下对可预测性和公平性进行推理。

更新时间: 2024-10-09 10:17:32

领域: math.OC,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2305.17725v2

Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?

Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks. Prior research indicates that LLMs pre-trained with programming language data exhibit high mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to verify which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, Fineweb, C4) under identical conditions. Thereafter, we evaluated the trained models in a few-shot in-context learning setting on logical reasoning tasks: FLD and bAbi, which do not require commonsense or world knowledge. The results demonstrate that nearly all models trained with programming languages consistently outperform those trained with natural languages, indicating that programming languages contain factors that elicit logic inference performance. In addition, we found that models trained with programming languages exhibit a better ability to follow instructions compared to those trained with natural languages. Further analysis reveals that the depth of Abstract Syntax Trees representing parsed results of programs also affects logical reasoning performance. These findings will offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.

Updated: 2024-10-09 10:13:13

标题: 哪种编程语言和预训练阶段的哪些特性会影响下游逻辑推理性能？

摘要: 最近，大型语言模型(LLMs)在数学和逻辑推理任务中展现出了非凡的泛化能力。先前的研究表明，使用编程语言数据进行预训练的LLMs具有较高的数学和推理能力；然而，这种因果关系尚未得到严格测试。我们的研究旨在验证预训练过程中哪些编程语言和特征会影响逻辑推理性能。具体来说，我们从头开始使用来自十种编程语言（例如Python、C、Java）和三个自然语言数据集（维基百科、Fineweb、C4）的数据集预训练解码器为基础的语言模型，在相同条件下进行。随后，我们在少样本上下文学习设置中评估了训练模型在逻辑推理任务中的表现：FLD和bAbi，这些任务不需要常识或世界知识。结果表明，几乎所有使用编程语言进行训练的模型在逻辑推理任务中始终优于那些使用自然语言进行训练的模型，表明编程语言包含促进逻辑推理性能的因素。此外，我们发现，使用编程语言进行训练的模型比使用自然语言进行训练的模型更能够遵循指令。进一步的分析显示，表示程序解析结果的抽象语法树的深度也影响了逻辑推理性能。这些发现将为获取LLMs基本能力的预训练的重要元素提供见解。

更新时间: 2024-10-09 10:13:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06735v1

Herald: A Natural Language Annotated Lean 4 Dataset

Verifiable formal languages like Lean have profoundly impacted mathematical reasoning, particularly through the use of large language models (LLMs) for automated reasoning. A significant challenge in training LLMs for these formal languages is the lack of parallel datasets that align natural language with formal language proofs. To address this challenge, this paper introduces a novel framework for translating the Mathlib4 corpus (a unified library of mathematics in formal language Lean 4) into natural language. Building upon this, we employ a dual augmentation strategy that combines tactic-based and informal-based approaches, leveraging the Lean-jixia system, a Lean 4 analyzer. We present the results of this pipeline on Mathlib4 as Herald (Hierarchy and Retrieval-based Translated Lean Dataset). We also propose the Herald Translator, which is fine-tuned on Herald. Herald translator achieves a 93.2% accuracy (Pass@128) on formalizing statements in the miniF2F-test and a 22.5% accuracy on our internal graduate-level textbook dataset, outperforming InternLM2-Math-Plus-7B (74.0% and 7.5%) and TheoremLlama (50.1% and 4.0%). Furthermore, we propose a section-level translation framework for real-world applications. As a direct application of Herald translator, we have successfully translated a template section in the Stack project, marking a notable progress in the automatic formalization of graduate-level mathematical literature. Our model, along with the datasets, will be open-sourced to the public soon.

Updated: 2024-10-09 10:11:24

标题: 传令官：一个自然语言标注的Lean 4数据集

摘要: 可验证的形式语言如Lean对数学推理产生了深远影响，特别是通过使用大型语言模型（LLMs）进行自动推理。训练这些形式语言的LLMs的一个重要挑战是缺乏将自然语言与形式语言证明对齐的并行数据集。为了解决这一挑战，本文介绍了一个新颖的框架，用于将Mathlib4语料库（一种使用形式语言Lean 4编写的数学统一库）转化为自然语言。在此基础上，我们采用了一种双重增强策略，结合了基于策略和非正式方法，利用Lean 4分析器Lean-jixia系统。我们在Mathlib4上展示了这一流水线的结果，命名为Herald（层次结构和基于检索的翻译Lean数据集）。我们还提出了Herald Translator，该Translator在Herald上进行了微调。Herald Translator在miniF2F-test中实现了93.2%的准确率（Pass@128），在我们的内部研究生级教科书数据集中实现了22.5%的准确率，优于InternLM2-Math-Plus-7B（74.0%和7.5%）和TheoremLlama（50.1%和4.0%）。此外，我们提出了一个用于实际应用的章节级翻译框架。作为Herald Translator的直接应用，我们已成功将Stack项目中的一个模板部分翻译，标志着在自动化形式化研究生级数学文献方面取得了显著进展。我们的模型以及数据集将很快向公众开放源代码。

更新时间: 2024-10-09 10:11:24

领域: cs.CL,cs.AI,cs.LG,cs.LO

下载: http://arxiv.org/abs/2410.10878v1

Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering

Large language models have transformed AI, yet reliably controlling their outputs remains a challenge. This paper explores activation engineering, where outputs of pre-trained LLMs are controlled by manipulating their activations at inference time. Unlike traditional methods using a single steering vector, we introduce conceptors - mathematical constructs that represent sets of activation vectors as ellipsoidal regions. Conceptors act as soft projection matrices and offer more precise control over complex activation patterns. Our experiments demonstrate that conceptors outperform traditional methods across multiple in-context learning steering tasks. We further use Boolean operations on conceptors that allows for combined steering goals that empirically outperforms combining steering vectors on a set of tasks. These results highlight conceptors as a promising tool for more effective steering of LLMs.

Updated: 2024-10-09 10:09:37

标题: 使用概念器引导大型语言模型：改进基于加法激活工程的技术请问还有其他需要帮忙的吗？如果有，请告诉我。谢谢！

摘要: 大型语言模型已经改变了人工智能，但是可靠地控制它们的输出仍然是一个挑战。本文探讨了激活工程，通过在推理时操纵预训练的LLMs的激活来控制它们的输出。与传统方法使用单个控制向量不同，我们引入了概念器 - 数学构造，代表激活向量的集合为椭球形区域。概念器作为软投影矩阵，提供了对复杂激活模式更精确的控制。我们的实验表明，概念器在多个上下文学习控制任务中优于传统方法。我们进一步使用概念器上的布尔运算，允许结合控制目标，从经验上优于在一组任务上结合控制向量。这些结果突出了概念器作为更有效地控制LLMs的有希望的工具。

更新时间: 2024-10-09 10:09:37

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2410.16314v1

Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles

While advancements in NLP have significantly improved the performance of Large Language Models (LLMs) on tasks requiring vertical thinking, their lateral thinking capabilities remain under-explored and challenging to measure due to the complexity of assessing creative thought processes and the scarcity of relevant data. To address these challenges, we introduce SPLAT, a benchmark leveraging Situation Puzzles to evaluate and elicit LAteral Thinking of LLMs. This benchmark, containing 975 graded situation puzzles across three difficulty levels, employs a new multi-turn player-judge framework instead of the traditional model-based evaluation, which often necessitates a stronger evaluation model. This framework simulates an interactive game where the model (player) asks the evaluation model (judge) questions about an incomplete story to infer the full scenario. The judge answers based on a detailed reference scenario or evaluates if the player's predictions align with the reference one. This approach lessens dependence on more robust evaluation models, enabling the assessment of state-of-the-art LLMs. The experiments demonstrate that a robust evaluation model, such as WizardLM-2, closely matches human judgements in both intermediate question-answering and final scenario accuracy, achieving over 80% agreement-similar to the agreement levels among humans. Furthermore, applying data and reasoning processes from our benchmark to other lateral thinking-related benchmarks, e.g., RiddleSense and BrainTeaser, leads to performance enhancements. This suggests that our benchmark effectively evaluates and elicits the lateral thinking abilities of LLMs. Code is available at: https://github.com/chenqi008/LateralThinking.

Updated: 2024-10-09 10:09:11

标题: 弱-评估-强：通过情景谜题评估和引导LLMs的横向思维

摘要: 尽管自然语言处理的进展显著提高了大型语言模型（LLMs）在需要垂直思维的任务上的性能，但它们的横向思维能力仍未得到充分探索，并且由于评估创造性思维过程的复杂性和相关数据的稀缺性，很难进行量化。为了解决这些挑战，我们引入了SPLAT，一个利用情境谜题来评估和引发LLMs横向思维的基准。该基准包含975个分级情境谜题，分为三个难度级别，采用了一个新的多轮玩家-评判者框架，而不是传统的基于模型的评估，后者通常需要更强的评估模型。这个框架模拟了一个互动游戏，在游戏中，模型（玩家）向评判者提问关于一个不完整故事的问题，以推断全面的情景。评判者基于详细的参考情景回答或评估玩家的预测是否与参考情景一致。这种方法减少了对更强大的评估模型的依赖，使得能够评估最先进的LLMs。实验证明，一个强大的评估模型，比如WizardLM-2，在中间问答和最终情景准确性方面与人类判断接近，达到了80%以上的一致性水平，类似于人类之间的一致性水平。此外，将我们基准的数据和推理过程应用到其他横向思维相关的基准，如RiddleSense和BrainTeaser，可以提升性能。这表明我们的基准有效地评估和引发了LLMs的横向思维能力。源代码可在https://github.com/chenqi008/LateralThinking获取。

更新时间: 2024-10-09 10:09:11

领域: cs.CL,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.06733v1

LISBET: a machine learning model for the automatic segmentation of social behavior motifs

Social behavior is crucial for survival in many animal species, and a heavily investigated research subject. Current analysis methods generally rely on measuring animal interaction time or annotating predefined behaviors. However, these approaches are time consuming, human biased, and can fail to capture subtle behaviors. Here we introduce LISBET (LISBET Is a Social BEhavior Transformer), a machine learning model for detecting and segmenting social interactions. Using self-supervised learning on body tracking data, our model eliminates the need for extensive human annotation. We tested LISBET in three scenarios across multiple datasets in mice: supervised behavior classification, unsupervised motifs segmentation, and unsupervised animal phenotyping. Additionally, in vivo electrophysiology revealed distinct neural signatures in the Ventral Tegmental Area corresponding to motifs identified by our model. In summary, LISBET automates data annotation and reduces human bias in social behavior research, offering a promising approach to enhance our understanding of behavior and its neural correlates.

Updated: 2024-10-09 10:08:29

标题: LISBET：一种用于自动分割社会行为模式的机器学习模型

摘要: 社会行为对于许多动物物种的生存至关重要，并且是一个被广泛研究的课题。目前的分析方法通常依赖于测量动物的互动时间或注释预定义的行为。然而，这些方法耗时，受人类偏见，且可能无法捕捉到微妙的行为。在这里，我们介绍了LISBET（LISBET是一种社会行为转换器），这是一种用于检测和分割社会互动的机器学习模型。通过对身体追踪数据进行自监督学习，我们的模型消除了对广泛人类注释的需要。我们在小鼠的多个数据集中测试了LISBET在三种情景下的表现：监督行为分类、无监督主题分割和无监督动物表型学。此外，体内电生理学揭示了与我们的模型识别的主题相对应的腹侧袗质区的明显神经特征。总之，LISBET自动化数据注释，减少了社会行为研究中的人类偏见，提供了一种有望增进我们对行为及其神经相关性的理解的方法。

更新时间: 2024-10-09 10:08:29

领域: cs.CV,cs.LG,q-bio.QM,stat.ML

下载: http://arxiv.org/abs/2311.04069v2

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Instruction tuning is critical for adapting large language models (LLMs) to downstream tasks, and recent studies have demonstrated that small amounts of human-curated data can outperform larger datasets, challenging traditional data scaling laws. While LLM-based data quality rating systems offer a cost-effective alternative to human annotation, they often suffer from inaccuracies and biases, even in powerful models like GPT-4. In this work, we introduce DS2, a Diversity-aware Score curation method for Data Selection. By systematically modeling error patterns through a score transition matrix, DS2 corrects LLM-based scores and promotes diversity in the selected data samples. Our approach shows that a curated subset (just 3.3% of the original dataset) outperforms full-scale datasets (300k samples) across various machine-alignment benchmarks, and matches or surpasses human-aligned datasets such as LIMA with the same sample size (1k samples). These findings challenge conventional data scaling assumptions, highlighting that redundant, low-quality samples can degrade performance and reaffirming that "more can be less."

Updated: 2024-10-09 10:07:55

标题: 通过精心策划LLM驱动的评分系统来提高数据效率

摘要: 指导调整对于适应大型语言模型（LLMs）到下游任务至关重要，最近的研究表明，少量人工筛选的数据可以胜过更大的数据集，挑战传统的数据扩展定律。虽然基于LLM的数据质量评分系统提供了一种经济高效的替代方案，但它们往往存在不准确和偏见，即使在像GPT-4这样的强大模型中也是如此。在这项工作中，我们引入了DS2，一种用于数据选择的具有多样性意识的评分筛选方法。通过系统地建模错误模式，通过评分转换矩阵，DS2纠正了LLM的基础分数，并促进了选定数据样本的多样性。我们的方法表明，一个经过筛选的子集（仅原始数据集的3.3%）在各种机器对齐基准测试中胜过全尺度数据集（30万个样本），并且与具有相同样本大小（1k个样本）的人工对齐数据集如LIMA相匹配或超过。这些发现挑战了传统的数据扩展假设，强调多余的、质量低的样本可能会降低性能，并重申了“更多可能意味着更少”的观点。

更新时间: 2024-10-09 10:07:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10877v1

Enhancing Interpretability and Generalizability in Extended Isolation Forests

Anomaly Detection (AD) focuses on identifying unusual behaviors in complex datasets. Machine Learning (ML) algorithms and Decision Support Systems (DSSs) provide effective solutions for AD, but detecting anomalies alone may not be enough, especially in engineering, where diagnostics and maintenance are crucial. Users need clear explanations to support root cause analysis and build trust in the model. The unsupervised nature of AD, however, makes interpretability a challenge. This paper introduces Extended Isolation Forest Feature Importance (ExIFFI), a method that explains predictions made by Extended Isolation Forest (EIF) models, which split data using hyperplanes. ExIFFI provides explanations at both global and local levels by leveraging feature importance. We also present an improved version, Enhanced Extended Isolation Forest (EIF+), designed to enhance the model's ability to detect unseen anomalies through a revised splitting strategy. Using five synthetic and eleven real-world datasets, we conduct a comparative analysis, evaluating unsupervised AD methods with the Average Precision metric. EIF+ consistently outperforms EIF across all datasets when trained without anomalies, demonstrating better generalization. To assess ExIFFI's interpretability, we introduce the Area Under the Curve of Feature Selection (AUC\_FS), a novel metric using feature selection as a proxy task. ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets and successfully identifies anomalous features in synthetic datasets. When trained only on inliers, ExIFFI also outperforms competing models on real-world data and accurately detects anomalous features in synthetic datasets. We provide open-source code to encourage further research and reproducibility.

Updated: 2024-10-09 09:56:32

标题: 增强扩展隔离森林的解释性和泛化能力

摘要: 异常检测（AD）专注于识别复杂数据集中的异常行为。机器学习（ML）算法和决策支持系统（DSS）为AD提供了有效的解决方案，但仅仅检测异常可能不足够，特别是在工程领域，诊断和维护至关重要。用户需要清晰的解释来支持根本原因分析并建立对模型的信任。然而，AD的非监督性使得解释性成为一个挑战。本文介绍了扩展隔离森林特征重要性（ExIFFI），一种解释Extended Isolation Forest（EIF）模型的预测的方法，该模型使用超平面分割数据。ExIFFI通过利用特征重要性在全局和局部水平提供解释。我们还提出了改进版本，Enhanced Extended Isolation Forest（EIF+），旨在通过修订的分割策略增强模型对未知异常的检测能力。在使用五个合成和十一个真实世界数据集进行的比较分析中，我们评估了使用平均精度指标的无监督AD方法。EIF+在所有数据集上始终优于EIF，当不包含异常进行训练时，表现出更好的泛化能力。为了评估ExIFFI的解释性，我们引入了特征选择曲线下面积（AUC\_FS），这是一种使用特征选择作为代理任务的新颖指标。ExIFFI在8个真实世界数据集中优于其他无监督的解释性方法，并成功识别合成数据集中的异常特征。当仅在内点上进行训练时，ExIFFI也优于竞争模型在真实世界数据上，并准确地检测合成数据集中的异常特征。我们提供开源代码以促进进一步研究和可重复性。

更新时间: 2024-10-09 09:56:32

领域: stat.ML,cs.LG,stat.AP

下载: http://arxiv.org/abs/2310.05468v3

Sharp Bounds of the Causal Effect Under MNAR Confounding

We report bounds for any contrast between the probabilities of the counterfactual outcome under exposure and non-exposure when the confounders are missing not at random. We assume that the missingness mechanism is outcome-independent, and prove that our bounds are arbitrarily sharp, i.e., practically attainable or logically possible.

Updated: 2024-10-09 09:50:06

标题: 在MNAR混杂下因果效应的尖锐界限

摘要: 我们报告了在混杂因素不随机缺失时，在暴露和非暴露情况下的反事实结果概率之间的任何对比的界限。我们假设缺失机制是与结果无关的，并证明我们的界限是任意尖锐的，即在实践中可达到或在逻辑上可能的。

更新时间: 2024-10-09 09:50:06

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.06726v1

Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy

Point cloud semantic segmentation, the process of classifying each point into predefined categories, is essential for 3D scene understanding. While image-based segmentation is widely adopted due to its maturity, methods relying solely on RGB information often suffer from degraded performance due to color inaccuracies. Recent advancements have incorporated additional features such as intensity and geometric information, yet RGB channels continue to negatively impact segmentation accuracy when errors in colorization occur. Despite this, previous studies have not rigorously quantified the effects of erroneous colorization on segmentation performance. In this paper, we propose a novel statistical approach to evaluate the impact of inaccurate RGB information on image-based point cloud segmentation. We categorize RGB inaccuracies into two types: incorrect color information and similar color information. Our results demonstrate that both types of color inaccuracies significantly degrade segmentation accuracy, with similar color errors particularly affecting the extraction of geometric features. These findings highlight the critical need to reassess the role of RGB information in point cloud segmentation and its implications for future algorithm design.

Updated: 2024-10-09 09:46:53

标题: 评估点云着色对语义分割精度的影响

摘要: 点云语义分割是将每个点分类为预定义类别的过程，对于3D场景理解至关重要。虽然基于图像的分割由于成熟度而被广泛采用，但仅依赖RGB信息的方法通常由于颜色不准确而性能下降。最近的进展已经将附加特征（如强度和几何信息）纳入其中，但当颜色化错误发生时，RGB通道仍然会负面影响分割准确性。尽管如此，先前的研究并没有严格量化错误颜色化对分割性能的影响。在本文中，我们提出了一种新颖的统计方法来评估不准确的RGB信息对基于图像的点云分割的影响。我们将RGB不准确性分为两种类型：颜色信息错误和相似颜色信息。我们的结果表明，这两种颜色不准确性都显著降低了分割准确性，尤其是相似颜色错误对几何特征的提取产生了影响。这些发现突显了重新评估RGB信息在点云分割中的作用以及对未来算法设计的影响的重要性。

更新时间: 2024-10-09 09:46:53

领域: cs.CV,cs.AI,cs.LG,cs.MM

下载: http://arxiv.org/abs/2410.06725v1

Adaptive Parametric Activation

The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks, however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper in this phenomenon by performing a comprehensive statistical analysis in the classification and intermediate layers of both balanced and imbalanced networks and we empirically show that aligning the activation function with the data distribution, enhances the performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS and balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.

Updated: 2024-10-09 09:46:22

标题: 自适应参数激活

摘要: 激活函数在模型优化中起着至关重要的作用，然而最佳选择仍不清楚。例如，在平衡分类任务中，Sigmoid激活是事实上的激活函数，然而，在不平衡分类中，由于对频繁类别的偏见，它被证明是不合适的。在这项工作中，我们通过对平衡和不平衡网络的分类和中间层进行全面的统计分析，深入探讨了这一现象，并通过实验证明，将激活函数与数据分布对齐可以提高平衡和不平衡任务的性能。为此，我们提出了自适应参数化激活（APA）函数，这是一种新颖且多功能的激活函数，将大多数常见的激活函数统一在一个公式下。APA可以应用于中间层和注意力层，显著优于一些不平衡基准数据集，如ImageNet-LT、iNaturalist2018、Places-LT、CIFAR100-LT和LVIS，以及平衡基准数据集，如ImageNet1K、COCO和V3DET。代码可在https://github.com/kostas1515/AGLU上找到。

更新时间: 2024-10-09 09:46:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.08567v2

Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts

Foundation models have recently become a popular research direction within computational pathology. They are intended to be general-purpose feature extractors, promising to achieve good performance on a range of downstream tasks. Real-world pathology image data does however exhibit considerable variability. Foundation models should be robust to these variations and other distribution shifts which might be encountered in practice. We evaluate two computational pathology foundation models: UNI (trained on more than 100,000 whole-slide images) and CONCH (trained on more than 1.1 million image-caption pairs), by utilizing them as feature extractors within prostate cancer grading models. We find that while UNI and CONCH perform well relative to baselines, the absolute performance can still be far from satisfactory in certain settings. The fact that foundation models have been trained on large and varied datasets does not guarantee that downstream models always will be robust to common distribution shifts.

Updated: 2024-10-09 09:45:53

标题: 评估计算病理学基础模型在分布变化下对前列腺癌分级的适用性

摘要: 基础模型最近已成为计算病理学中受欢迎的研究方向。它们旨在成为通用特征提取器，承诺在各种下游任务上取得良好的性能。然而，现实世界的病理图像数据确实表现出相当大的变异性。基础模型应该对这些变异性和在实践中可能遇到的其他分布转移具有鲁棒性。我们评估了两个计算病理学基础模型：UNI（在超过10万个全幻灯片图像上训练）和CONCH（在超过110万个图像-标题对上训练），通过将它们用作前列腺癌分级模型中的特征提取器。我们发现，虽然相对于基线，UNI和CONCH表现良好，但在某些情况下，绝对性能仍可能远非令人满意。基础模型已经在大型和多样化的数据集上进行了训练，并不保证下游模型总是对常见的分布转移具有鲁棒性。

更新时间: 2024-10-09 09:45:53

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06723v1

Scaling Laws for Mixed quantization in Large Language Models

Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models. In this study, we focus on a straightforward question: When aiming for a specific accuracy or perplexity target for low-precision quantization, how many high-precision numbers or calculations are required to preserve as we scale LLMs to larger sizes? We first introduce a critical metric named the quantization ratio, which compares the number of parameters quantized to low-precision arithmetic against the total parameter count. Through extensive and carefully controlled experiments across different model families, arithmetic types, and quantization granularities (e.g. layer-wise, matmul-wise), we identify two central phenomenons. 1) The larger the models, the better they can preserve performance with an increased quantization ratio, as measured by perplexity in pre-training tasks or accuracy in downstream tasks. 2) The finer the granularity of mixed-precision quantization (e.g., matmul-wise), the more the model can increase the quantization ratio. We believe these observed phenomena offer valuable insights for future AI hardware design and the development of advanced Efficient AI algorithms.

Updated: 2024-10-09 09:45:01

标题: 大型语言模型中混合量化的标度律

摘要: 大语言模型（LLMs）的后训练量化已被证明能够有效地减少在这些模型上运行推理所需的计算要求。在这项研究中，我们关注一个直接的问题：当针对低精度量化的特定准确度或困惑度目标时，为了保持多少高精度数字或计算，我们需要将LLMs扩展到更大的规模？我们首先引入了一个关键指标，即量化比率，它比较了量化为低精度算术的参数数量与总参数数量之间的比例。通过在不同模型系列、算术类型和量化粒度（例如，逐层、逐matmul）上进行广泛和精心控制的实验，我们确定了两个核心现象。1）模型越大，它们可以通过增加量化比率来更好地保持性能，以预训练任务中的困惑度或下游任务中的准确度来衡量。2）混合精度量化（例如，逐matmul）的粒度越细，模型可以增加量化比率。我们相信这些观察到的现象为未来人工智能硬件设计和高效人工智能算法的发展提供了宝贵的见解。

更新时间: 2024-10-09 09:45:01

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.06722v1

MatMamba: A Matryoshka State Space Model

State Space Models (SSMs) like Mamba2 are a promising alternative to Transformers, with faster theoretical training and inference times -- especially for long context lengths. Recent work on Matryoshka Representation Learning -- and its application to Transformer backbones in works like MatFormer -- showed how to introduce nested granularities of smaller submodels in one universal elastic model. In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2, by modifying the block to contain nested dimensions to enable joint training and adaptive inference. MatMamba allows for efficient and adaptive deployment across various model sizes. We train a single large MatMamba model and are able to get a number of smaller nested models for free -- while maintaining or improving upon the performance of a baseline smaller model trained from scratch. We train language and image models at a variety of parameter sizes from 35M to 1.4B. Our results on ImageNet and FineWeb show that MatMamba models scale comparably to Transformers, while having more efficient inference characteristics. This makes MatMamba a practically viable option for deploying large-scale models in an elastic way based on the available inference compute. Code and models are open sourced at \url{https://github.com/ScaledFoundations/MatMamba}

Updated: 2024-10-09 09:41:34

标题: MatMamba：一个Matryoshka状态空间模型

摘要: 状态空间模型（SSMs）如Mamba2是变压器的一个有前途的替代方案，具有更快的理论训练和推断时间 - 尤其是对于长上下文长度。最近关于Matryoshka表示学习的研究 - 以及在作品中将其应用于变压器骨干的作品如MatFormer - 展示了如何在一个通用的弹性模型中引入更小子模型的嵌套粒度。在这项工作中，我们提出了MatMamba：一个状态空间模型，它将Matryoshka风格的学习与Mamba2相结合，通过修改块以包含嵌套维度以实现联合训练和自适应推断。MatMamba允许在各种模型大小上进行高效和自适应的部署。我们训练一个大型的MatMamba模型，并能够免费获得许多更小的嵌套模型 - 同时保持或改进从头开始训练的基线较小模型的性能。我们在各种参数大小的语言和图像模型上进行训练，从35M到1.4B。我们在ImageNet和FineWeb上的结果显示，MatMamba模型与变压器的规模相当，同时具有更高效的推断特性。这使得MatMamba成为一种基于可用推断计算部署大规模模型的实用选择。代码和模型在\url{https://github.com/ScaledFoundations/MatMamba}上开源。

更新时间: 2024-10-09 09:41:34

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2410.06718v1

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, it is becoming increasingly difficult as language model's capabilities of generating human-like texts keep evolving. This study provides a new perspective by using the relative likelihood values instead of absolute ones, and extracting useful features from the spectrum-view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, supervised and heuristic-based, respectively, which results in competitive performances with previous zero-shot detection methods and a new state-of-the-art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies. Our code is available at https://github.com/CLCS-SUSTech/FourierGPT

Updated: 2024-10-09 09:36:49

标题: 使用相对可能性谱检测人类语言与模型语言之间的微小差异

摘要: 人类和模型生成的文本可以通过检查语言中的可能性大小来区分。然而，随着语言模型生成类人文本的能力不断发展，这变得越来越困难。本研究通过使用相对可能性值而不是绝对可能性值，从可能性的频谱视图中提取有用特征，为人类-模型文本检测任务提供了新的视角。我们提出了一个检测过程，分别采用监督和基于启发式的两种分类方法，其结果与先前零样本检测方法具有竞争力，并在短文本检测方面达到了新的最新水平。我们的方法还可以揭示人类语言和模型语言之间的微妙差异，这些差异在心理语言学研究中找到了理论根源。我们的代码可在https://github.com/CLCS-SUSTech/FourierGPT上找到。

更新时间: 2024-10-09 09:36:49

领域: cs.CL,cs.AI,I.2.7

下载: http://arxiv.org/abs/2406.19874v2

Analysis of different disparity estimation techniques on aerial stereo image datasets

With the advent of aerial image datasets, dense stereo matching has gained tremendous progress. This work analyses dense stereo correspondence analysis on aerial images using different techniques. Traditional methods, optimization based methods and learning based methods have been implemented and compared here for aerial images. For traditional methods, we implemented the architecture of Stereo SGBM while using different cost functions to get an understanding of their performance on aerial datasets. Analysis of most of the methods in standard datasets has shown good performance, however in case of aerial dataset, not much benchmarking is available. Visual qualitative and quantitative analysis has been carried out for two stereo aerial datasets in order to compare different cost functions and techniques for the purpose of depth estimation from stereo images. Using existing pre-trained models, recent learning based architectures have also been tested on stereo pairs along with different cost functions in SGBM. The outputs and given ground truth are compared using MSE, SSIM and other error metrics.

Updated: 2024-10-09 09:33:48

标题: 翻译：对航空立体图像数据集上不同视差估计技术的分析

摘要: 随着航空图像数据集的出现，密集立体匹配取得了巨大进步。本文分析了利用不同技术在航空图像上进行密集立体对应分析。针对航空图像，实施并比较了传统方法、基于优化的方法和基于学习的方法。对于传统方法，我们实现了Stereo SGBM的架构，同时使用不同的代价函数来了解它们在航空数据集上的性能。在标准数据集中对大多数方法进行的分析表明性能良好，然而在航空数据集的情况下，可用的基准测试不多。为了比较不同代价函数和技术用于深度估计的目的，对两个立体航空数据集进行了视觉定性和定量分析。利用现有的预训练模型，还在SGBM中测试了最近的基于学习的架构，并使用不同的代价函数对立体对进行了测试。通过使用均方误差（MSE）、结构相似性指标（SSIM）和其他误差指标来比较输出和给定的地面实况。

更新时间: 2024-10-09 09:33:48

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06711v1

TASAR: Transfer-based Attack on Skeletal Action Recognition

Skeletal sequences, as well-structured representations of human behaviors, play a vital role in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interactions. However, most existing skeleton-based HAR (S-HAR) attacks are primarily designed for white-box scenarios and exhibit weak adversarial transferability. Therefore, they cannot be considered true transfer-based S-HAR attacks. More importantly, the reason for this failure remains unclear. In this paper, we study this phenomenon through the lens of loss surface, and find that its sharpness contributes to the weak transferability in S-HAR. Inspired by this observation, we assume and empirically validate that smoothening the rugged loss landscape could potentially improve adversarial transferability in S-HAR. To this end, we propose the first \textbf{T}ransfer-based \textbf{A}ttack on \textbf{S}keletal \textbf{A}ction \textbf{R}ecognition, TASAR. TASAR explores the smoothed model posterior without requiring surrogate re-training, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike previous transfer-based attacks that treat each frame independently and overlook temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack gradient, effectively disrupting the spatial-temporal coherence of S-HARs. To exhaustively evaluate the effectiveness of existing methods and our method, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense methods. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available in the supplementary material.

Updated: 2024-10-09 09:33:04

标题: TASAR：基于转移的骨架动作识别攻击

摘要: 骨骼序列作为人类行为的良好结构化表示，在人类活动识别（HAR）中发挥着至关重要的作用。对抗性骨骼序列的可传递性使其能够在现实世界的HAR场景中进行攻击，例如自动驾驶、智能监视和人机交互。然而，大多数现有基于骨骼的HAR（S-HAR）攻击主要设计用于白盒场景，并表现出较弱的对抗性可传递性。因此，它们不能被视为真正基于传递的S-HAR攻击。更重要的是，造成这种失败的原因仍不清楚。在本文中，我们通过损失曲面的视角研究了这一现象，并发现其陡峭性导致了S-HAR中的弱传递性。受到这一观察的启发，我们假设并经验验证，平滑崎岖的损失地形可能有助于改善S-HAR中的对抗性传递性。为此，我们提出了第一个基于传递的骨骼动作识别攻击，TASAR。TASAR在不需要替代重新训练的情况下，探索了平滑的模型后验，这是通过一种新的后训练双贝叶斯优化策略实现的。此外，与以前的基于传递的攻击不同，这些攻击将每一帧独立处理，并忽视序列内的时间一致性，TASAR将运动动态纳入贝叶斯攻击梯度中，有效地干扰了S-HAR的时空一致性。为了全面评估现有方法和我们的方法的有效性，我们建立了第一个大规模稳健的S-HAR基准，包括7个S-HAR模型、10种攻击方法、3个S-HAR数据集和2种防御方法。广泛的结果表明了TASAR的优越性。我们的基准测试使得未来研究能够轻松进行比较，代码可在补充材料中找到。

更新时间: 2024-10-09 09:33:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.02483v2

Probabilistic Conformal Prediction with Approximate Conditional Validity

We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution $P_{Y \mid X}$. Existing methods, such as conformalized quantile regression and probabilistic conformal prediction, usually provide only a marginal coverage guarantee. In contrast, our approach extends these frameworks to achieve approximately conditional coverage, which is crucial for many practical applications. Our prediction sets adapt to the behavior of the predictive distribution, making them effective even under high heteroscedasticity. While exact conditional guarantees are infeasible without assumptions on the underlying data distribution, we derive non-asymptotic bounds that depend on the total variation distance of the conditional distribution and its estimate. Using extensive simulations, we show that our method consistently outperforms existing approaches in terms of conditional coverage, leading to more reliable statistical inference in a variety of applications.

Updated: 2024-10-09 09:28:33

标题: 使用近似条件有效性的概率一致性预测

摘要: 我们开发了一种新方法，用于生成结合了符合方法的灵活性和条件分布$P_{Y \mid X}$估计的预测集。现有的方法，如符合分位数回归和概率符合预测，通常只提供边际覆盖保证。相比之下，我们的方法扩展了这些框架，实现了近似条件覆盖，这对许多实际应用至关重要。我们的预测集适应了预测分布的行为，使其在高异方差性下仍然有效。虽然在不对基础数据分布做出假设的情况下无法实现精确的条件保证，但我们推导出依赖于条件分布和其估计的总变差距离的非渐近界限。通过大量模拟，我们展示了我们的方法在条件覆盖方面始终优于现有方法，从而在各种应用中实现更可靠的统计推断。

更新时间: 2024-10-09 09:28:33

领域: stat.ML,cs.LG,math.PR,math.ST,stat.ME,stat.TH

下载: http://arxiv.org/abs/2407.01794v2

Iterative regularization in classification via hinge loss diagonal descent

Iterative regularization is a classic idea in regularization theory, that has recently become popular in machine learning. On the one hand, it allows to design efficient algorithms controlling at the same time numerical and statistical accuracy. On the other hand it allows to shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of linear inverse problems, we develop an iterative regularization approach based on the use of the hinge loss function. More precisely we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence and stability results for a suitable classification noise model. Our approach compares favorably with other alternatives, as confirmed by numerical simulations.

Updated: 2024-10-09 09:23:34

标题: 通过铰链损失对角下降的分类中的迭代正则化

摘要: 迭代正则化是正则化理论中的一个经典概念，最近在机器学习中变得流行起来。一方面，它允许设计高效的算法，同时控制数值和统计精度。另一方面，它可以揭示在训练神经网络时观察到的学习曲线。在本文中，我们关注分类问题中的迭代正则化。在将这种设置与线性逆问题进行对比后，我们基于使用铰链损失函数开发了一种迭代正则化方法。更确切地说，我们考虑一族算法的对角线方法，我们证明了对于适当的分类噪声模型，收敛性以及收敛速度和稳定性结果。我们的方法在数值模拟中与其他替代方案相比表现出色。

更新时间: 2024-10-09 09:23:34

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2212.12675v2

Calibrating Verbalized Probabilities for Large Language Models

Calibrating verbalized probabilities presents a novel approach for reliably assessing and leveraging outputs from black-box Large Language Models (LLMs). Recent methods have demonstrated improved calibration by applying techniques like Platt scaling or temperature scaling to the confidence scores generated by LLMs. In this paper, we explore the calibration of verbalized probability distributions for discriminative tasks. First, we investigate the capability of LLMs to generate probability distributions over categorical labels. We theoretically and empirically identify the issue of re-softmax arising from the scaling of verbalized probabilities, and propose using the invert softmax trick to approximate the "logit" by inverting verbalized probabilities. Through extensive evaluation on three public datasets, we demonstrate: (1) the robust capability of LLMs in generating class distributions, and (2) the effectiveness of the invert softmax trick in estimating logits, which, in turn, facilitates post-calibration adjustments.

Updated: 2024-10-09 09:20:24

标题: 为大型语言模型校准口头概率

摘要: 校准口头化概率提供了一种可靠地评估和利用黑盒大型语言模型（LLMs）输出的新方法。最近的方法通过将Platt缩放或温度缩放等技术应用于LLMs生成的置信度评分，已经证明了改进的校准。在本文中，我们探讨了用于判别任务的口头化概率分布的校准。首先，我们调查了LLMs生成分类标签概率分布的能力。我们在理论上和实证上确定了由于口头化概率缩放而产生的重新softmax的问题，并提议使用反向softmax技巧来逼近通过反转口头化概率来逼近“logit”。通过对三个公共数据集的广泛评估，我们证明：（1）LLMs在生成类分布方面具有强大的能力，（2）反向softmax技巧在估计logits方面的有效性，进而促进后校准调整。

更新时间: 2024-10-09 09:20:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06707v1

PII-Scope: A Benchmark for Training Data PII Leakage Assessment in LLMs

In this work, we introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs across diverse threat settings. Our study provides a deeper understanding of these attacks by uncovering several hyperparameters (e.g., demonstration selection) crucial to their effectiveness. Building on this understanding, we extend our study to more realistic attack scenarios, exploring PII attacks that employ advanced adversarial strategies, including repeated and diverse querying, and leveraging iterative learning for continual PII extraction. Through extensive experimentation, our results reveal a notable underestimation of PII leakage in existing single-query attacks. In fact, we show that with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model. Moreover, we evaluate PII leakage on finetuned models, showing that they are more vulnerable to leakage than pretrained models. Overall, our work establishes a rigorous empirical benchmark for PII extraction attacks in realistic threat scenarios and provides a strong foundation for developing effective mitigation strategies.

Updated: 2024-10-09 09:16:25

标题: PII-Scope：LLMs中培训数据PII泄露评估的基准Benchmark

摘要: 在这项工作中，我们介绍了PII-Scope，这是一个全面的基准测试，旨在评估针对LLMs的PII提取攻击的最先进方法，涵盖多样的威胁设置。我们的研究通过揭示一些关键的超参数（例如，演示选择），深入理解了这些攻击。在此基础上，我们将研究扩展到更真实的攻击场景，探索采用先进对抗策略的PII攻击，包括重复和多样化的查询，并利用迭代学习进行持续PII提取。通过广泛的实验，我们的结果揭示了现有的单次查询攻击对PII泄漏的明显低估。事实上，我们展示了，通过复杂的对抗能力和有限的查询预算，当针对预训练模型时，PII提取率可以增加多达五倍。此外，我们评估了对微调模型的PII泄漏，结果显示它们比预训练模型更容易泄漏。总的来说，我们的工作为真实威胁场景中的PII提取攻击建立了严格的经验基准，并为开发有效的缓解策略奠定了坚实基础。

更新时间: 2024-10-09 09:16:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06704v1

Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing

Simulation-based testing represents an important step to ensure the reliability of autonomous driving software. In practice, when companies rely on third-party general-purpose simulators, either for in-house or outsourced testing, the generalizability of testing results to real autonomous vehicles is at stake. In this paper, we enhance simulation-based testing by introducing the notion of digital siblings, a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators built with different technologies, that operate collectively as an ensemble in the testing process. We exemplify our approach on a case study focused on testing the lane-keeping component of an autonomous vehicle. We use two open-source simulators as digital siblings, and we empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases. Our approach requires generating and running test cases for each individual simulator, in the form of sequences of road points. Then, test cases are migrated between simulators, using feature maps to characterize the exercised driving conditions. Finally, the joint predicted failure probability is computed, and a failure is reported only in cases of agreement among the siblings. Our empirical evaluation shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin. We discuss the findings of our case study and detail how our approach can help researchers interested in automated testing of autonomous driving software.

Updated: 2024-10-09 09:14:58

标题: 两个比一个更好：数字孪生以改进自动驾驶测试

摘要: Simulation-based testing is an important way to ensure the reliability of autonomous driving software. However, when companies use third-party simulators for testing, there is a risk that the results may not accurately reflect real-world conditions. In this paper, we propose a new approach called digital siblings, where multiple simulators with different technologies work together to test autonomous vehicles. We demonstrate this approach in a case study focusing on testing the lane-keeping component of an autonomous vehicle using open-source simulators. By comparing the results of the digital siblings approach to a physical scaled autonomous vehicle, we show that the ensemble predictor is more effective at predicting failures. Our approach involves generating and running test cases on each simulator, transferring test cases between simulators, and computing a joint failure probability based on agreement among the siblings. Our findings suggest that the digital siblings approach can improve automated testing of autonomous driving software.

更新时间: 2024-10-09 09:14:58

领域: cs.SE,cs.AI,cs.RO,D.2.5

下载: http://arxiv.org/abs/2305.08060v3

Energy-Efficient Federated Edge Learning with Streaming Data: A Lyapunov Optimization Approach

Federated learning (FL) has received significant attention in recent years for its advantages in efficient training of machine learning models across distributed clients without disclosing user-sensitive data. Specifically, in federated edge learning (FEEL) systems, the time-varying nature of wireless channels introduces inevitable system dynamics in the communication process, thereby affecting training latency and energy consumption. In this work, we further consider a streaming data scenario where new training data samples are randomly generated over time at edge devices. Our goal is to develop a dynamic scheduling and resource allocation algorithm to address the inherent randomness in data arrivals and resource availability under long-term energy constraints. To achieve this, we formulate a stochastic network optimization problem and use the Lyapunov drift-plus-penalty framework to obtain a dynamic resource management design. Our proposed algorithm makes adaptive decisions on device scheduling, computational capacity adjustment, and allocation of bandwidth and transmit power in every round. We provide convergence analysis for the considered setting with heterogeneous data and time-varying objective functions, which supports the rationale behind our proposed scheduling design. The effectiveness of our scheme is verified through simulation results, demonstrating improved learning performance and energy efficiency as compared to baseline schemes.

Updated: 2024-10-09 09:11:02

标题: 高效的边缘联邦学习与实时数据：一种李亚普诺夫优化方法

摘要: 联邦学习（FL）近年来受到了广泛关注，因为它在分布式客户端之间高效训练机器学习模型的优势，而不泄露用户敏感数据。具体来说，在联邦边缘学习（FEEL）系统中，无线信道的时变性质引入了通信过程中不可避免的系统动态，从而影响训练延迟和能耗。在这项工作中，我们进一步考虑了一个流式数据场景，即边缘设备上会随时间随机生成新的训练数据样本。我们的目标是开发一种动态调度和资源分配算法，以解决数据到达和资源可用性的固有随机性，同时符合长期能源约束。为此，我们制定了一个随机网络优化问题，并利用Lyapunov漂移加惩罚框架来获得动态资源管理设计。我们提出的算法在每一轮中对设备调度、计算能力调整、带宽和发送功率分配做出自适应决策。我们对考虑异构数据和时变目标函数的设置进行了收敛分析，支持我们提出的调度设计背后的理论基础。通过模拟结果验证了我们方案的有效性，表明与基准方案相比，实现了改进的学习性能和能源效率。

更新时间: 2024-10-09 09:11:02

领域: cs.LG,cs.DC,cs.IT,eess.SP,math.IT

下载: http://arxiv.org/abs/2405.12046v2

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities. However, the visual modules introduces new challenges in terms of robustness for LVLMs, as attackers can craft adversarial images that are visually clean but may mislead the model to generate incorrect answers. In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively. Therefore, we are curious about one question: Can LVLMs still generate correct responses when the encoded visual tokens are attacked and disrupting the visual information? To this end, we propose a non-targeted attack method referred to as VT-Attack (Visual Tokens Attack), which constructs adversarial examples from multiple perspectives, with the goal of comprehensively disrupting feature representations and inherent relationships as well as the semantic properties of visual tokens output by image encoders. Using only access to the image encoder in the proposed attack, the generated adversarial examples exhibit transferability across diverse LVLMs utilizing the same image encoder and generality across different tasks. Extensive experiments validate the superior attack performance of the VT-Attack over baseline methods, demonstrating its effectiveness in attacking LVLMs with image encoders, which in turn can provide guidance on the robustness of LVLMs, particularly in terms of the stability of the visual feature space.

Updated: 2024-10-09 09:06:56

标题: 突破视觉感知：以大规模视觉-语言模型编码的视觉令牌为目标的对抗性攻击

摘要: 大型视觉-语言模型（LVLMs）将视觉信息整合到大型语言模型中，展示出出色的多模态对话能力。然而，视觉模块引入了新的挑战，即对于LVLMs的鲁棒性，因为攻击者可以制作视觉清晰但可能误导模型生成错误答案的对抗性图像。一般来说，LVLMs依赖于视觉编码器将图像转换为视觉标记，这对于语言模型有效地感知图像内容至关重要。因此，我们对一个问题很感兴趣：当编码的视觉标记遭受攻击并破坏视觉信息时，LVLMs是否仍然能够生成正确的响应？为此，我们提出了一种非定向攻击方法，称为VT-Attack（Visual Tokens Attack），该方法从多个角度构建对抗性示例，旨在全面破坏特征表示和固有关系以及图像编码器输出的视觉标记的语义属性。在所提出的攻击中，仅使用对图像编码器的访问，生成的对抗性示例在使用相同图像编码器的不同LVLMs之间具有可传递性，并且在不同任务之间具有普遍性。大量实验证实了VT-Attack相对于基线方法的卓越攻击性能，展示了其在攻击具有图像编码器的LVLMs方面的有效性，从而可以为LVLMs的鲁棒性提供指导，特别是在视觉特征空间稳定性方面。

更新时间: 2024-10-09 09:06:56

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06699v1

GenSim: A General Social Simulation Platform with Large Language Model based Agents

With the rapid advancement of large language models (LLMs), recent years have witnessed many promising studies on leveraging LLM-based agents to simulate human social behavior. While prior work has demonstrated significant potential across various domains, much of it has focused on specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose a novel LLM-agent-based simulation platform called \textit{GenSim}, which: (1) \textbf{Abstracts a set of general functions} to simplify the simulation of customized social scenarios; (2) \textbf{Supports one hundred thousand agents} to better simulate large-scale populations in real-world contexts; (3) \textbf{Incorporates error-correction mechanisms} to ensure more reliable and long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, promising to further advance the field of social science.

Updated: 2024-10-09 09:03:48

标题: GenSim：基于大型语言模型代理的通用社会模拟平台

摘要: 随着大型语言模型（LLMs）的快速发展，近年来已经出现了许多有前途的研究，利用基于LLM的代理模拟人类社会行为。尽管先前的工作已经在各个领域展示了显著的潜力，但大部分集中在涉及有限数量代理的特定场景，并且在模拟过程中发生错误时缺乏适应能力。为了克服这些限制，我们提出了一种新颖的基于LLM代理的模拟平台，称为GenSim，该平台：（1）抽象出一组通用功能，简化定制社会场景的模拟；（2）支持十万个代理，更好地模拟现实世界中的大规模人口；（3）融入纠错机制，确保更可靠和长期的模拟。为了评估我们的平台，我们评估了大规模代理模拟的效率和纠错机制的有效性。据我们所知，GenSim代表了朝着基于LLM代理的通用、大规模和可纠正的社会模拟平台迈出的一小步，有望进一步推动社会科学领域的发展。

更新时间: 2024-10-09 09:03:48

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2410.04360v2

Knowledge Gradient for Multi-Objective Bayesian Optimization with Decoupled Evaluations

Multi-objective Bayesian optimization aims to find the Pareto front of trade-offs between a set of expensive objectives while collecting as few samples as possible. In some cases, it is possible to evaluate the objectives separately, and a different latency or evaluation cost can be associated with each objective. This decoupling of the objectives presents an opportunity to learn the Pareto front faster by avoiding unnecessary, expensive evaluations. We propose a scalarization based knowledge gradient acquisition function which accounts for the different evaluation costs of the objectives. We prove asymptotic consistency of the estimator of the optimum for an arbitrary, D-dimensional, real compact search space and show empirically that the algorithm performs comparably with the state of the art and significantly outperforms versions which always evaluate both objectives.

Updated: 2024-10-09 08:50:08

标题: 多目标贝叶斯优化的知识梯度及解耦评估

摘要: 多目标贝叶斯优化旨在在尽可能收集尽可能少的样本的情况下找到一组昂贵目标之间的权衡帕累托前沿。在某些情况下，可以分别评估目标，并且可以为每个目标关联不同的延迟或评估成本。这种目标的解耦提供了一个机会，可以通过避免不必要的昂贵评估更快地学习帕累托前沿。我们提出了一种基于标量化的知识梯度获取函数，考虑了目标的不同评估成本。我们证明了对于任意的D维实紧致搜索空间，估计最优解的渐近一致性，并在实证研究中展示了该算法表现与现有技术相当，并且明显优于总是评估两个目标的版本。

更新时间: 2024-10-09 08:50:08

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2302.01310v2

AI, Climate, and Regulation: From Data Centers to the AI Act

We live in a world that is experiencing an unprecedented boom of AI applications that increasingly penetrate and enhance all sectors of private and public life, from education, media, medicine, and mobility to the industrial and professional workspace, and -- potentially particularly consequentially -- robotics. As this world is simultaneously grappling with climate change, the climate and environmental implications of the development and use of AI have become an important subject of public and academic debate. In this paper, we aim to provide guidance on the climate-related regulation for data centers and AI specifically, and discuss how to operationalize these requirements. We also highlight challenges and room for improvement, and make a number of policy proposals to this end. In particular, we propose a specific interpretation of the AI Act to bring reporting on the previously unadressed energy consumption from AI inferences back into the scope. We also find that the AI Act fails to address indirect greenhouse gas emissions from AI applications. Furthermore, for the purpose of energy consumption reporting, we compare levels of measurement within data centers and recommend measurement at the cumulative server level. We also argue for an interpretation of the AI Act that includes environmental concerns in the mandatory risk assessment (sustainability risk assessment, SIA), and provide guidance on its operationalization. The EU data center regulation proves to be a good first step but requires further development by including binding renewable energy and efficiency targets for data centers. Overall, we make twelve concrete policy proposals, in four main areas: Energy and Environmental Reporting Obligations; Legal and Regulatory Clarifications; Transparency and Accountability Mechanisms; and Future Far-Reaching Measures beyond Transparency.

Updated: 2024-10-09 08:43:53

标题: 人工智能、气候和监管：从数据中心到人工智能法案

摘要: 我们生活在一个经历着前所未有的人工智能应用繁荣的世界中，这些应用越来越深入和增强私人和公共生活的各个领域，从教育、媒体、医疗和移动性到工业和专业工作空间，以及--潜在地尤其重要--机器人技术。随着这个世界同时应对气候变化，人工智能的发展和使用对气候和环境的影响已成为公众和学术界争论的重要议题。在本文中，我们旨在为数据中心和人工智能的气候相关监管提供指导，并讨论如何实施这些要求。我们还强调了挑战和改进空间，并提出了一些政策建议。特别是，我们提出了对AI法案的具体解释，将对以前未涉及的AI推断能耗的报告重新纳入范围。我们还发现，AI法案未能解决来自人工智能应用的间接温室气体排放。此外，为了能耗报告的目的，我们比较了数据中心内测量水平，并建议对累计服务器水平进行测量。我们还主张对AI法案进行解释，将环境问题纳入强制性风险评估（可持续性风险评估，SIA），并提供其实施指导。欧盟数据中心监管证明是一个良好的第一步，但需要通过包括数据中心的可再生能源和效率目标来进一步发展。总体而言，我们提出了四个主要领域的十二项具体政策建议：能源和环境报告义务；法律和监管澄清；透明度和问责机制；以及超越透明度的未来深远措施。

更新时间: 2024-10-09 08:43:53

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2410.06681v1

Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems

Large Language Models (LLMs) are revolutionizing the AI industry with their superior capabilities. Training these models requires large-scale GPU clusters and significant computing time, leading to frequent failures that significantly increase training costs. Despite its significance, this field lacks a metric for evaluating reliability. In this work, we introduce a novel reliability metric called \emph{Training Overhead Ratio} (TOR) to evaluate the reliability of fault-tolerant LLM training systems. TOR is defined as the ratio of optimal training time to the observed training time of a system, serving as a practical tool for users to estimate the actual time required to train an LLM on a given system. Furthermore, our investigation identifies the key factor for enhancing reliability and present TOR equations for various types of failures encountered in practice.

Updated: 2024-10-09 08:43:25

标题: 训练开销比率：大型语言模型训练系统的实用可靠性指标

摘要: 大型语言模型(LLMs)以其卓越的能力正在改变人工智能行业。训练这些模型需要大规模的GPU集群和大量的计算时间，导致频繁失败，显著增加训练成本。尽管其重要性，这个领域缺乏一个评估可靠性的指标。在这项工作中，我们引入了一个新颖的可靠性指标，称为\emph{训练超额比率}（TOR），用于评估容错LLM训练系统的可靠性。TOR被定义为系统的最佳训练时间与观察到的训练时间的比率，作为用户估计在给定系统上训练LLM所需实际时间的实用工具。此外，我们的调查确定了增强可靠性的关键因素，并针对实践中遇到的各种故障提出了TOR方程。

更新时间: 2024-10-09 08:43:25

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2408.07482v3

Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data

Epilepsy affects over 50 million people globally, with EEG/MEG-based spike detection playing a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and requires specialized training, limiting the number of professionals available to analyze EEG/MEG data. To address this, various algorithmic approaches have been developed. However, current methods face challenges in handling varying channel configurations and in identifying the specific channels where spikes originate. This paper introduces a novel Nested Deep Learning (NDL) framework designed to overcome these limitations. NDL applies a weighted combination of signals across all channels, ensuring adaptability to different channel setups, and allows clinicians to identify key channels more accurately. Through theoretical analysis and empirical validation on real EEG/MEG datasets, NDL demonstrates superior accuracy in spike detection and channel localization compared to traditional methods. The results show that NDL improves prediction accuracy, supports cross-modality data integration, and can be fine-tuned for various neurophysiological applications.

Updated: 2024-10-09 08:29:54

标题: 基于嵌套深度学习模型的脑信号数据基础模型

摘要: 癫痫病影响全球超过5000万人，基于EEG/MEG的尖峰检测在诊断和治疗中起着至关重要的作用。手动尖峰识别耗时且需要专门培训，限制了能够分析EEG/MEG数据的专业人员数量。为了解决这个问题，开发了各种算法方法。然而，当前的方法在处理不同通道配置和识别尖峰起源的特定通道方面面临挑战。本文介绍了一种新颖的嵌套深度学习（NDL）框架，旨在克服这些限制。NDL应用了跨所有通道的信号的加权组合，确保适应不同的通道设置，并允许临床医生更准确地识别关键通道。通过对真实EEG/MEG数据集的理论分析和实证验证，NDL在尖峰检测和通道定位方面展示了优越的准确性，相较于传统方法。结果表明，NDL提高了预测准确性，支持跨模态数据集成，并可以针对各种神经生理学应用进行微调。

更新时间: 2024-10-09 08:29:54

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2410.03191v2

GLA-DA: Global-Local Alignment Domain Adaptation for Multivariate Time Series

Unlike images and natural language tokens, time series data is highly semantically sparse, resulting in labor-intensive label annotations. Unsupervised and Semi-supervised Domain Adaptation (UDA and SSDA) have demonstrated efficiency in addressing this issue by utilizing pre-labeled source data to train on unlabeled or partially labeled target data. However, in domain adaptation methods designed for downstream classification tasks, directly adapting labeled source samples with unlabelled target samples often results in similar distributions across various classes, thereby compromising the performance of the target classification task. To tackle this challenge, we proposed a Global-Local Alignment Domain Adaptation (GLA-DA) method for multivariate time series data. Data from two domains were initially encoded to align in an intermediate feature space adversarially, achieving Global Feature Alignment (GFA). Subsequently, GLA-DA leveraged the consistency between similarity-based and deep learning-based models to assign pseudo labels to unlabeled target data. This process aims to preserve differences among data with distinct labels by aligning the samples with the same class labels together, achieving Local Class Alignment (LCA). We implemented GLA-DA in both UDA and SSDA scenarios, showcasing its superiority over state-of-the-art methods through extensive experiments on various public datasets. Ablation experiments underscored the significance of key components within GLA-DA.

Updated: 2024-10-09 08:27:26

标题: GLA-DA：多变量时间序列的全局-局部对准领域自适应

摘要: 与图像和自然语言标记不同，时间序列数据在语义上非常稀疏，导致标签注释工作繁重。无监督和半监督领域自适应（UDA和SSDA）已经证明通过利用预标记的源数据来训练未标记或部分标记的目标数据，有效地解决了这个问题。然而，在为下游分类任务设计的领域自适应方法中，直接将带有未标记目标样本的标记源样本适应往往导致各个类别之间的分布相似，从而损害目标分类任务的性能。为了应对这一挑战，我们提出了一种全局-局部对齐领域适应（GLA-DA）方法，用于多变量时间序列数据。来自两个领域的数据最初被编码以在一个中间特征空间中对齐对抗，实现全局特征对齐（GFA）。随后，GLA-DA利用基于相似度和基于深度学习的模型之间的一致性，为未标记的目标数据分配伪标签。这个过程旨在通过将具有相同类标签的样本对齐在一起，实现局部类对齐（LCA），以保留具有不同标签的数据之间的差异。我们在UDA和SSDA场景中实施了GLA-DA，通过对各种公共数据集进行广泛实验展示了其优越性。消融实验强调了GLA-DA内关键组件的重要性。

更新时间: 2024-10-09 08:27:26

领域: cs.LG

下载: http://arxiv.org/abs/2410.06671v1

Parametric PDE Control with Deep Reinforcement Learning and Differentiable L0-Sparse Polynomial Policies

Optimal control of parametric partial differential equations (PDEs) is crucial in many applications in engineering and science. In recent years, the progress in scientific machine learning has opened up new frontiers for the control of parametric PDEs. In particular, deep reinforcement learning (DRL) has the potential to solve high-dimensional and complex control problems in a large variety of applications. Most DRL methods rely on deep neural network (DNN) control policies. However, for many dynamical systems, DNN-based control policies tend to be over-parametrized, which means they need large amounts of training data, show limited robustness, and lack interpretability. In this work, we leverage dictionary learning and differentiable L$_0$ regularization to learn sparse, robust, and interpretable control policies for parametric PDEs. Our sparse policy architecture is agnostic to the DRL method and can be used in different policy-gradient and actor-critic DRL algorithms without changing their policy-optimization procedure. We test our approach on the challenging tasks of controlling parametric Kuramoto-Sivashinsky and convection-diffusion-reaction PDEs. We show that our method (1) outperforms baseline DNN-based DRL policies, (2) allows for the derivation of interpretable equations of the learned optimal control laws, and (3) generalizes to unseen parameters of the PDE without retraining the policies.

Updated: 2024-10-09 08:24:52

标题: 使用深度强化学习和可微L0稀疏多项式策略的参数化PDE控制

摘要: 参数化偏微分方程（PDEs）的最优控制在工程和科学的许多应用中至关重要。近年来，科学机器学习的进展为参数化PDEs的控制开辟了新的领域。特别是，深度强化学习（DRL）有潜力解决各种应用中的高维和复杂控制问题。大多数DRL方法依赖于深度神经网络（DNN）控制策略。然而，对于许多动态系统，基于DNN的控制策略往往过度参数化，这意味着它们需要大量的训练数据，表现出有限的鲁棒性，并且缺乏可解释性。在这项工作中，我们利用字典学习和可微L$_0$正则化来学习稀疏、鲁棒和可解释的参数化PDEs控制策略。我们的稀疏策略架构与DRL方法无关，可以应用于不同的策略梯度和演员-评论家DRL算法，而无需改变它们的策略优化程序。我们在控制参数化Kuramoto-Sivashinsky和对流-扩散-反应PDEs的挑战性任务上测试我们的方法。我们展示了我们的方法（1）优于基线基于DNN的DRL策略，（2）允许推导出学习到的最优控制规律的可解释方程，（3）在不重新训练策略的情况下推广到PDE的未知参数。

更新时间: 2024-10-09 08:24:52

领域: cs.LG

下载: http://arxiv.org/abs/2403.15267v2

Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations

This paper explores the characterization of equivariant linear layers for representations of permutations and related groups. Unlike traditional approaches, which address these problems using parameter-sharing, we consider an alternative methodology based on irreducible representations and Schur's lemma. Using this methodology, we obtain an alternative derivation for existing models like DeepSets, 2-IGN graph equivariant networks, and Deep Weight Space (DWS) networks. The derivation for DWS networks is significantly simpler than that of previous results. Next, we extend our approach to unaligned symmetric sets, where equivariance to the wreath product of groups is required. Previous works have addressed this problem in a rather restrictive setting, in which almost all wreath equivariant layers are Siamese. In contrast, we give a full characterization of layers in this case and show that there is a vast number of additional non-Siamese layers in some settings. We also show empirically that these additional non-Siamese layers can improve performance in tasks like graph anomaly detection, weight space alignment, and learning Wasserstein distances. Our code is available at \href{https://github.com/yonatansverdlov/Irreducible-Representations-of-Deep-Weight-Spaces}{GitHub}.

Updated: 2024-10-09 08:19:31

标题: 通过不可约表示的视角重新审视多置换等变性

摘要: 本文探讨了用于置换和相关群表示的等变线性层的特征化。与传统方法不同，传统方法使用参数共享来解决这些问题，我们考虑了一种基于不可约表示和舒尔引理的替代方法。使用这种方法，我们获得了现有模型如DeepSets、2-IGN图等变网络和Deep Weight Space（DWS）网络的替代推导。DWS网络的推导比先前结果简单得多。接下来，我们将我们的方法扩展到非对齐对称集合，这里需要对群的环积等变性。先前的研究在一个相当受限的设置中解决了这个问题，在这种设置中，几乎所有的环积等变层都是连体的。相反，我们在这种情况下给出了层的完整特征化，并展示在某些情景中存在大量额外的非连体层。我们还通过实验证明，这些额外的非连体层可以提高图异常检测、权重空间对齐和学习Wasserstein距离等任务的性能。我们的代码可在GitHub上找到。

更新时间: 2024-10-09 08:19:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06665v1

Decouple-Then-Merge: Towards Better Training for Diffusion Models

Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across multiple timesteps to enhance training efficiency. However, since the denoising tasks differ at each timestep, the gradients computed at different timesteps may conflict, potentially degrading the overall performance of image generation. To solve this issue, this work proposes a Decouple-then-Merge (DeMe) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps. We introduce several improved techniques during the finetuning stage to promote effective knowledge sharing while minimizing training interference across timesteps. Finally, after finetuning, these separate models can be merged into a single model in the parameter space, ensuring efficient and practical inference. Experimental results show significant generation quality improvements upon 6 benchmarks including Stable Diffusion on COCO30K, ImageNet1K, PartiPrompts, and DDPM on LSUN Church, LSUN Bedroom, and CIFAR10.

Updated: 2024-10-09 08:19:25

标题: 解耦-然后-合并：迈向更好的扩散模型训练

摘要: 扩散模型通过学习一系列逆转每个步骤的噪声损坏模型来训练。通常，模型参数在多个时间步上完全共享，以增强训练效率。然而，由于每个时间步的去噪任务不同，不同时间步计算的梯度可能会冲突，可能会降低图像生成的整体性能。为了解决这个问题，本文提出了一个分离-合并（DeMe）框架，该框架从一个预训练模型开始，并对适用于特定时间步的单独模型进行微调。我们在微调阶段引入了几种改进技术，以促进有效的知识共享，同时最小化跨时间步的训练干扰。最后，在微调之后，这些单独模型可以在参数空间中合并为一个单一模型，确保有效和实用的推断。实验结果显示，在包括COCO30K上的稳定扩散，ImageNet1K，PartiPrompts和LSUN Church，LSUN Bedroom和CIFAR10上的DDPM等6个基准测试中，生成质量显著提高。

更新时间: 2024-10-09 08:19:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.06664v1

Uncertainty-Guided Optimization on Large Language Model Search Trees

Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs). However, they are myopic since they do not take the complete root-to-leaf path into account. Moreover, they are agnostic to prior knowledge available about the process: For example, it does not consider that the objective being maximized is a probability and thereby has specific properties like being bound in the unit interval. Taking a probabilistic approach, we define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration. These beliefs are useful for defining a sample-based, non-myopic acquisition function that allows for a more data-efficient exploration scheme than standard search algorithms on LLMs. Crucially, unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs. Our formulation thus views LLM decoding as Bayesian optimization on trees. We discuss how to select the prior and the acquisition function, and demonstrate in experiments with various LLMs that our method achieves higher efficiency than recent baselines: Our method achieves the same or a higher likelihood while expanding fewer nodes.

Updated: 2024-10-09 08:16:18

标题: 大规模语言模型搜索树上的不确定性引导优化

摘要: 树搜索算法，如贪婪搜索和波束搜索，在大型语言模型（LLMs）的解码过程中寻找最大似然序列时是标准的。然而，它们是目光短浅的，因为它们没有考虑完整的从根到叶路径。此外，它们对于关于过程的先验知识是无知的：例如，它不考虑被最大化的目标是概率，因此具有类似于被绑定在单位间隔中的特性。采用概率方法，我们在LLMs的转换概率上定义先验信念，并在每次迭代中获取对最有前途路径的后验信念。这些信念有助于定义一个基于样本的、非目光短浅的获取函数，比LLMs上的标准搜索算法更具数据效率的探索方案。关键的是，与昂贵的基于模拟的非目光短浅方法，如蒙特卡罗树搜索不同，我们的方法只需要信念中的样本。因此，我们的公式将LLM解码视为树上的贝叶斯优化。我们讨论了如何选择先验和获取函数，并在与各种LLMs的实验中展示，我们的方法比最近的基准方法实现了更高的效率：我们的方法在扩展更少的节点的同时实现了相同或更高的可能性。

更新时间: 2024-10-09 08:16:18

领域: cs.LG

下载: http://arxiv.org/abs/2407.03951v2

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer

Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which consists of unpaired monophonic single-instrument audio data. Each diffusion model is trained on a specific instrument with a Gaussian prior. During inference, a model is designated as the source model to map the input audio to its corresponding Gaussian prior, and another model is designated as the target model to reconstruct the target audio from this Gaussian prior, thereby facilitating timbre transfer. We compare our approach against existing unsupervised timbre transfer models such as VAEGAN and Gaussian Flow Bridges (GFB). Experimental results demonstrate that our method achieves both better Fr\'echet Audio Distance (FAD) and melody preservation, as reflected by lower pitch distances (DPD) compared to VAEGAN and GFB. Additionally, we discover that the noise level from the Gaussian prior, $\sigma$, can be adjusted to control the degree of melody preservation and amount of timbre transferred.

Updated: 2024-10-09 08:11:47

标题: 潜在扩散桥梁用于无监督音乐音频音色转移

摘要: 音色转移是一个具有挑战性的任务，它涉及修改音频信号的音色特征，同时保留其旋律结构。在本文中，我们提出了一种基于双扩散桥的新方法，该方法使用CocoChorales数据集进行训练，该数据集包含不成对的单声部单乐器音频数据。每个扩散模型都是在具有高斯先验的特定乐器上进行训练的。在推断过程中，一个模型被指定为源模型，将输入音频映射到其相应的高斯先验，另一个模型被指定为目标模型，从这个高斯先验重新构造目标音频，从而促进音色转移。我们将我们的方法与现有的无监督音色转移模型（如VAEGAN和高斯流桥）进行了比较。实验结果表明，与VAEGAN和GFB相比，我们的方法在FAD（Fr\'echet Audio Distance）和旋律保留方面都取得了更好的表现，表现为较低的音高距离（DPD）。此外，我们发现，来自高斯先验的噪声水平$\sigma$可以调整以控制旋律保留的程度和音色转移的数量。

更新时间: 2024-10-09 08:11:47

领域: cs.SD,cs.AI,cs.IR,eess.AS

下载: http://arxiv.org/abs/2409.06096v3

WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning

When optimizing transportation systems, anticipating traffic flows is a central element. Yet, computing such traffic equilibria remains computationally expensive. Against this background, we introduce a novel combinatorial optimization augmented neural network architecture that allows for fast and accurate traffic flow predictions. We propose WardropNet, a neural network that combines classical layers with a subsequent equilibrium layer: the first ones inform the latter by predicting the parameterization of the equilibrium problem's latency functions. Using supervised learning we minimize the difference between the actual traffic flow and the predicted output. We show how to leverage a Bregman divergence fitting the geometry of the equilibria, which allows for end-to-end learning. WardropNet outperforms pure learning-based approaches in predicting traffic equilibria for realistic and stylized traffic scenarios. On realistic scenarios, WardropNet improves on average for time-invariant predictions by up to 72% and for time-variant predictions by up to 23% over pure learning-based approaches.

Updated: 2024-10-09 08:07:29

标题: WardropNet：通过平衡增强学习进行交通流预测

摘要: 在优化运输系统时，预测交通流量是一个核心要素。然而，计算这样的交通均衡仍然是计算昂贵的。在这种背景下，我们介绍了一种新颖的组合优化增强神经网络架构，可以实现快速准确的交通流量预测。我们提出了WardropNet，这是一个神经网络，它将经典层与后续的均衡层相结合：前者通过预测均衡问题的延迟函数的参数化来通知后者。通过监督学习，我们最小化实际交通流量和预测输出之间的差异。我们展示了如何利用符合均衡几何的Bregman散度进行端到端学习。在预测现实和样式化交通情景的交通均衡时，WardropNet优于纯学习方法。在现实情景中，WardropNet对于不变时间预测的平均改进高达72％，对于变时预测的改进高达23％，超过了纯学习方法。

更新时间: 2024-10-09 08:07:29

领域: cs.LG

下载: http://arxiv.org/abs/2410.06656v1

Toward Physics-guided Time Series Embedding

In various scientific and engineering fields, the primary research areas have revolved around physics-based dynamical systems modeling and data-driven time series analysis. According to the embedding theory, dynamical systems and time series can be mutually transformed using observation functions and physical reconstruction techniques. Based on this, we propose Embedding Duality Theory, where the parameterized embedding layer essentially provides a linear estimation of the non-linear time series dynamics. This theory enables us to bypass the parameterized embedding layer and directly employ physical reconstruction techniques to acquire a data embedding representation. Utilizing physical priors results in a 10X reduction in parameters, a 3X increase in speed, and maximum performance boosts of 18% in expert, 22% in few-shot, and 53\% in zero-shot tasks without any hyper-parameter tuning. All methods are encapsulated as a plug-and-play module

Updated: 2024-10-09 08:04:06

标题: 走向物理引导的时序嵌入

摘要: 在各种科学和工程领域，主要的研究领域围绕基于物理的动态系统建模和数据驱动的时间序列分析展开。根据嵌入理论，动态系统和时间序列可以相互转换，使用观测函数和物理重建技术。基于此，我们提出了嵌入对偶理论，其中参数化嵌入层本质上提供了非线性时间序列动态的线性估计。这个理论使我们能够绕过参数化嵌入层，直接使用物理重建技术获取数据嵌入表示。利用物理先验能够减少参数10倍，提高速度3倍，并在专家任务中最大性能提升18％，在少样本任务中提升22％，在零样本任务中提升53％，而无需任何超参数调整。所有方法被封装为即插即用的模块。

更新时间: 2024-10-09 08:04:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06651v1

Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation

Detecting cracks with pixel-level precision for key structures is a significant challenge, as existing methods struggle to effectively integrate local textures and pixel dependencies of cracks. Furthermore, these methods often possess numerous parameters and substantial computational requirements, complicating deployment on edge control devices. In this paper, we propose a staircase cascaded fusion crack segmentation network (CrackSCF) that generates high-quality crack segmentation maps using minimal computational resources. We constructed a staircase cascaded fusion module that effectively captures local patterns of cracks and long-range dependencies of pixels, and it can suppress background noise well. To reduce the computational resources required by the model, we introduced a lightweight convolution block, which replaces all convolution operations in the network, significantly reducing the required computation and parameters without affecting the network's performance. To evaluate our method, we created a challenging benchmark dataset called TUT and conducted experiments on this dataset and five other public datasets. The experimental results indicate that our method offers significant advantages over existing methods, especially in handling background noise interference and detailed crack segmentation. The F1 and mIoU scores on the TUT dataset are 0.8382 and 0.8473, respectively, achieving state-of-the-art (SOTA) performance while requiring the least computational resources. The code and dataset is available at https://github.com/Karl1109/CrackSCF.

Updated: 2024-10-09 07:58:37

标题: 楼梯级联融合轻量级局部模式识别和远程依赖性用于结构裂纹分割

摘要: 使用像素级精度检测关键结构的裂缝是一个重要挑战，因为现有方法往往难以有效地整合裂纹的局部纹理和像素依赖关系。此外，这些方法通常具有大量参数和大量的计算需求，使其难以在边缘控制设备上部署。在本文中，我们提出了一种阶梯级联融合裂纹分割网络（CrackSCF），利用最少的计算资源生成高质量的裂纹分割地图。我们构建了一个阶梯级联融合模块，有效捕捉裂纹的局部模式和像素的长程依赖关系，并且可以很好地抑制背景噪声。为了减少模型所需的计算资源，我们引入了一个轻量级的卷积块，用于替换网络中的所有卷积操作，显著减少了所需的计算和参数，而不影响网络的性能。为了评估我们的方法，我们创建了一个具有挑战性的基准数据集 TUT，并在该数据集和其他五个公共数据集上进行了实验。实验结果表明，我们的方法在处理背景噪声干扰和详细裂纹分割方面具有显著优势。TUT 数据集上的 F1 和 mIoU 分数分别为 0.8382 和 0.8473，实现了最先进的性能，同时需要最少的计算资源。代码和数据集可在 https://github.com/Karl1109/CrackSCF 上获得。

更新时间: 2024-10-09 07:58:37

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2408.12815v2

SBoRA: Low-Rank Adaptation with Regional Weight Updates

This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. SBoRA reduces the number of trainable parameters by half or doubles the rank with the similar number of trainable parameters as LoRA, while improving learning performance. By utilizing orthogonal standard basis vectors to initialize one of the low-rank matrices (either $\mathbf{A}$ or $\mathbf{B}$), SBoRA facilitates regional weight updates and memory-efficient fine-tuning. This results in two variants, SBoRA-FA and SBoRA-FB, where only one of the matrices is updated, leading to a sparse update matrix $\mathrm{\Delta} \mathbf{W}$ with predominantly zero rows or columns. Consequently, most of the fine-tuned model's weights $(\mathbf{W}_0+\mathrm{\Delta} \mathbf{W})$ remain unchanged from the pre-trained weights, akin to the modular organization of the human brain, which efficiently adapts to new tasks. Our empirical results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning. Furthermore, we evaluate the effectiveness of QSBoRA on quantized LLaMA models of varying scales, highlighting its potential for efficient adaptation to new tasks. Code is available at https://github.com/cityuhkai/SBoRA

Updated: 2024-10-09 07:53:10

标题: SBoRA：具有区域权重更新的低秩适应

摘要: 本文介绍了标准基准LoRA（SBoRA），这是一种新颖的参数高效微调方法，用于大型语言模型，它建立在低秩适应（LoRA）和正交适应的开创性工作之上。SBoRA通过使用正交标准基准向量来初始化低秩矩阵（$\mathbf{A}$或$\mathbf{B}$中的一个），有助于区域权重更新和内存高效微调，从而将可训练参数的数量减少了一半或增加了排名，与LoRA相同数量的可训练参数，同时提高了学习性能。这导致了两种变体，SBoRA-FA和SBoRA-FB，其中只更新其中一个矩阵，导致稀疏更新矩阵$\mathrm{\Delta} \mathbf{W}$具有主要为零的行或列。因此，大多数微调模型的权重$(\mathbf{W}_0+\mathrm{\Delta} \mathbf{W})$与预训练权重保持不变，类似于人脑的模块化组织，有效适应新任务。我们的实证结果表明，在各种微调任务中，包括常识推理和算术推理，SBoRA-FA优于LoRA。此外，我们评估了QSBoRA对不同规模的量化LLaMA模型的有效性，突出了其在有效适应新任务方面的潜力。代码可在https://github.com/cityuhkai/SBoRA找到。

更新时间: 2024-10-09 07:53:10

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.05413v3

Needle In A Multimodal Haystack

With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning. In each task, the model is required to answer the questions according to different key information scattered throughout the given multimodal document. Evaluating the leading MLLMs on MM-NIAH, we observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation. We hope this work can provide a platform for further research on long multimodal document comprehension and contribute to the advancement of MLLMs. Code and benchmark are released at https://github.com/OpenGVLab/MM-NIAH.

Updated: 2024-10-09 07:46:02

标题: 一个多模态干草堆中的针

摘要: 随着多模态大型语言模型（MLLMs）的快速发展，它们的评估变得越来越全面。然而，理解长篇多模态内容，作为现实世界应用的基础能力，仍然未被充分探索。在这项工作中，我们提出了“Needle In A Multimodal Haystack”（MM-NIAH），这是第一个专门设计用于系统评估现有MLLMs理解长篇多模态文档能力的基准。我们的基准包括三种类型的评估任务：多模态检索、计数和推理。在每个任务中，模型需要根据给定多模态文档中分散的不同关键信息来回答问题。通过在MM-NIAH上评估领先的MLLMs，我们观察到现有模型在这些任务上仍有显著改进空间，特别是在以视觉为中心的评估上。我们希望这项工作可以为长篇多模态文档理解的进一步研究提供平台，并促进MLLMs的发展。代码和基准已发布在https://github.com/OpenGVLab/MM-NIAH。

更新时间: 2024-10-09 07:46:02

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.07230v2

Subtle Errors Matter: Preference Learning via Error-injected Self-editing

Large Language Models (LLMs) have exhibited strong mathematical reasoning and computational prowess, tackling tasks ranging from basic arithmetic to advanced competition-level problems. However, frequently occurring subtle errors, such as miscalculations or incorrect substitutions, limit the models' full mathematical potential. Existing studies to improve mathematical ability typically involve distilling reasoning skills from stronger LLMs or applying preference learning to step-wise response pairs. Although these methods leverage samples of varying granularity to mitigate reasoning errors, they overlook the frequently occurring subtle errors. A major reason is that sampled preference pairs involve differences unrelated to the errors, which may distract the model from focusing on subtle errors. In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation. In detail, RISE uses the model itself to edit a small number of tokens in the solution, injecting designed subtle errors. Then, pairs composed of self-edited solutions and their corresponding correct ones, along with pairs of correct and incorrect solutions obtained through sampling, are used together for subtle error-aware DPO training. Compared with other preference learning methods, RISE further refines the training objective to focus on predefined errors and their tokens, without requiring fine-grained sampling or preference annotation. Extensive experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.

Updated: 2024-10-09 07:43:38

标题: 微小的错误很重要：通过注入错误进行自我编辑的偏好学习

摘要: 大型语言模型(LLMs)展示了强大的数学推理和计算能力，可以应对从基本算术到高级竞赛级问题的任务。然而，经常出现的微小错误，如错误计算或不正确的替换，限制了模型的完整数学潜力。现有的改进数学能力的研究通常涉及从更强的LLMs中提取推理技能，或者应用偏好学习来处理逐步响应对。尽管这些方法利用不同粒度的样本来减轻推理错误，但它们忽视了经常出现的微小错误。一个主要原因是采样偏好对涉及与错误无关的差异，这可能会使模型分散注意力，而不是集中在微小错误上。在这项工作中，我们提出了一种名为eRror-Injected Self-Editing (RISE)的新型偏好学习框架，它将预定义的微小错误注入到正确解决方案的部分标记中，以构建用于错误缓解的困难对。具体而言，RISE使用模型自身来编辑解决方案中的少量标记，注入设计的微小错误。然后，由自我编辑的解决方案和其相应的正确解决方案组成的对，以及通过采样获得的正确和不正确解决方案的对，一起用于微小错误感知DPO训练。与其他偏好学习方法相比，RISE进一步细化了训练目标，侧重于预定义的错误及其标记，而无需精细采样或偏好注释。大量实验证实了RISE的有效性，在Qwen2-7B-Instruct上进行的偏好学习取得了显著的改进，GSM8K上提高了3.0%，MATH上提高了7.9%。

更新时间: 2024-10-09 07:43:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06638v1

Automatically Adaptive Conformal Risk Control

Science and technology have a growing need for effective mechanisms that ensure reliable, controlled performance from black-box machine learning algorithms. These performance guarantees should ideally hold conditionally on the input-that is the performance guarantees should hold, at least approximately, no matter what the input. However, beyond stylized discrete groupings such as ethnicity and gender, the right notion of conditioning can be difficult to define. For example, in problems such as image segmentation, we want the uncertainty to reflect the intrinsic difficulty of the test sample, but this may be difficult to capture via a conditioning event. Building on the recent work of Gibbs et al. [2023], we propose a methodology for achieving approximate conditional control of statistical risks-the expected value of loss functions-by adapting to the difficulty of test samples. Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning. We apply this framework to various regression and segmentation tasks, enabling finer-grained control over model performance and demonstrating that by continuously monitoring and adjusting these parameters, we can achieve superior precision compared to conventional risk-control methods.

Updated: 2024-10-09 07:34:33

标题: 自适应一致风险控制

摘要: 科学技术对于确保黑匣子机器学习算法的可靠、可控性能有着越来越大的需求。这些性能保证应该在输入条件下成立，即至少在任何输入情况下，性能保证应该成立，至少近似成立。然而，除了类似种族和性别等离散分组外，适当的条件概念可能很难定义。例如，在图像分割等问题中，我们希望不确定性能够反映测试样本的固有难度，但这可能难以通过条件事件捕捉。借鉴Gibbs等人最近的工作，我们提出了一种方法论，通过适应测试样本的困难程度，实现对统计风险（损失函数的期望值）的近似条件控制。我们的框架超越了基于用户提供的条件事件的传统条件风险控制，转而基于算法和数据驱动的确定适当的函数类别进行条件控制。我们将这个框架应用于各种回归和分割任务，实现对模型性能的更精细控制，并表明通过不断监控和调整这些参数，我们可以实现比传统风险控制方法更优越的精度。

更新时间: 2024-10-09 07:34:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.17819v2

Progressively Label Enhancement for Large Language Model Alignment

Large Language Models (LLM) alignment aims to prevent models from producing content that misaligns with human expectations, which can lead to ethical and legal concerns. In the last few years, Reinforcement Learning from Human Feedback (RLHF) has been the most prominent method for achieving alignment. Due to challenges in stability and scalability with RLHF stages, which arise from the complex interactions between multiple models, researchers are exploring alternative methods to achieve effects comparable to those of RLHF. However, these methods often rely on large high-quality datasets. Despite some methods considering the generation of additional data to expand datasets, they often treat model training and data generation as separate and static processes, overlooking the fact that these processes are highly interdependent, leading to inefficient utilization of the generated data. To deal with this problem, we propose PLE, i.e., Progressively Label Enhancement for LLM Alignment, a framework that dynamically adjusts the model's training process based on the evolving quality of the generated data. Specifically, we prompt the model to generate responses for both the original query and the query guided by a set of carefully designed principles, and then utilize a dynamic threshold to determine the appropriate training approach for both responses based on their corresponding reward scores. Experimental results demonstrate the effectiveness of PLE compared to existing LLM alignment methods.

Updated: 2024-10-09 07:31:18

标题: 逐步标签增强用于大型语言模型对齐的进展

摘要: 大型语言模型（LLM）对齐的目标是防止模型产生与人类期望不符的内容，这可能引起道德和法律上的关注。在过去几年中，来自人类反馈的强化学习（RLHF）已成为实现对齐的最突出方法。由于RLHF阶段的稳定性和可扩展性方面存在挑战，这些挑战来自多个模型之间的复杂交互作用，研究人员正在探索实现与RLHF相当效果的替代方法。然而，这些方法通常依赖于大规模高质量的数据集。尽管有些方法考虑生成额外数据以扩展数据集，但它们往往将模型训练和数据生成视为独立和静态过程，忽视了这些过程高度相互依赖的事实，导致生成数据的低效利用。为了解决这个问题，我们提出了PLE，即逐步标签增强用于LLM对齐，这是一个框架，根据生成数据质量的演变动态调整模型的训练过程。具体来说，我们提示模型为原始查询和由一组精心设计的原则引导的查询生成响应，然后利用动态阈值根据它们对应的奖励分数确定适当的训练方法。实验结果证明了与现有LLM对齐方法相比，PLE的有效性。

更新时间: 2024-10-09 07:31:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.02599v2

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Vision Language Models (VLMs) have become essential backbones for multimodal intelligence, yet significant safety challenges limit their real-world application. While textual inputs are often effectively safeguarded, adversarial visual inputs can easily bypass VLM defense mechanisms. Existing defense methods are either resource-intensive, requiring substantial data and compute, or fail to simultaneously ensure safety and usefulness in responses. To address these limitations, we propose a novel two-phase inference-time alignment framework, Evaluating Then Aligning (ETA): 1) Evaluating input visual contents and output responses to establish a robust safety awareness in multimodal settings, and 2) Aligning unsafe behaviors at both shallow and deep levels by conditioning the VLMs' generative distribution with an interference prefix and performing sentence-level best-of-N to search the most harmless and helpful generation paths. Extensive experiments show that ETA outperforms baseline methods in terms of harmlessness, helpfulness, and efficiency, reducing the unsafe rate by 87.5% in cross-modality attacks and achieving 96.6% win-ties in GPT-4 helpfulness evaluation. The code is publicly available at https://github.com/DripNowhy/ETA.

Updated: 2024-10-09 07:21:43

标题: ETA: 在推理时评估和对齐视觉语言模型的安全性

摘要: 视觉语言模型（VLMs）已成为多模态智能的基础，但重大的安全挑战限制了它们在现实世界中的应用。虽然文本输入通常可以有效保护，但对抗性视觉输入很容易绕过VLM的防御机制。现有的防御方法要么资源密集，需要大量数据和计算，要么无法同时确保响应的安全性和实用性。为了解决这些限制，我们提出了一种新颖的两阶段推理时间对齐框架，评估然后对齐（ETA）：1）评估输入视觉内容和输出响应，建立多模态环境中的强大安全意识，2）通过将VLM的生成分布与干扰前缀进行条件化，并执行基于句子级别的N最佳搜索，来对浅层和深层的不安全行为进行对齐，从而找到最无害和有用的生成路径。大量实验证明，ETA在无害性、有用性和效率方面优于基线方法，在跨模态攻击中将不安全率降低了87.5％，在GPT-4有用性评估中取得了96.6％的胜平局。代码可以在https://github.com/DripNowhy/ETA 上公开获取。

更新时间: 2024-10-09 07:21:43

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.06625v1

Online Bandit Learning with Offline Preference Data

Reinforcement Learning with Human Feedback (RLHF) is at the core of fine-tuning methods for generative AI models for language and images. Such feedback is often sought as rank or preference feedback from human raters, as opposed to eliciting scores since the latter tends to be noisy. On the other hand, RL theory and algorithms predominantly assume that a reward feedback is available. In particular, approaches for online learning that can be helpful in adaptive data collection via active learning cannot incorporate offline preference data. In this paper, we adopt a finite-armed linear bandit model as a prototypical model of online learning. We consider an offline preference dataset to be available generated by an expert of unknown 'competence'. We propose $\texttt{warmPref-PS}$, a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback. We show that by modeling the 'competence' of the expert that generated it, we are able to use such a dataset most effectively. We support our claims with novel theoretical analysis of its Bayesian regret, as well as, extensive empirical evaluation of an approximate loss function that optimizes for infinitely many arms, and performs substantially better ($25$ to $50\%$ regret reduction) than baselines.

Updated: 2024-10-09 07:21:30

标题: 使用离线偏好数据进行在线强盗学习

摘要: 人类反馈强化学习（RLHF）是生成式AI模型（针对语言和图像）微调方法的核心。这种反馈通常以人类评分者的排名或偏好反馈形式寻求，而不是 eliciting scores，因为后者往往会带来噪音。另一方面，RL理论和算法主要假设有奖励反馈可用。特别是，适用于在线学习的方法可以通过主动学习帮助自适应数据收集，但不能整合离线偏好数据。在本文中，我们采用有限臂线性赌博模型作为在线学习的原型模型。我们假设有一个离线偏好数据集，由一个未知“能力”专家生成。我们提出了$\texttt{warmPref-PS}$，一种用于在线学习的后验抽样算法，可以通过带有噪声偏好反馈的离线数据集进行热启动。我们展示通过对生成者的“能力”进行建模，我们能够最有效地使用这样的数据集。我们通过对其贝叶斯遗憾的新颖理论分析以及对一个近似损失函数的广泛实证评估来支持我们的主张，该损失函数优化了无限多个臂，并表现出比基线更好的性能（遗憾减少$25$至$50\%$）。

更新时间: 2024-10-09 07:21:30

领域: cs.LG

下载: http://arxiv.org/abs/2406.09574v2

MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Large language models~(LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), \ie, Internet celebrities who shape the trends and opinions in their domains. However, research in this line remains underexplored. In this paper, we hence introduce MINDECHO, a comprehensive framework for the development and evaluation of KOL RPLAs. MINDECHO collects KOL data from Internet video transcripts in various professional fields, and synthesizes their conversations leveraging GPT-4. Then, the conversations and the transcripts are used for individualized model training and inference-time retrieval, respectively. Our evaluation covers both general dimensions (\ie, knowledge and tones) and fan-centric dimensions for KOLs. Extensive experiments validate the effectiveness of MINDECHO in developing and evaluating KOL RPLAs.

Updated: 2024-10-09 07:19:34

标题: MINDECHO: 为关键意见领袖提供角色扮演语言代理人

摘要: 大型语言模型（LLMs）在各种应用中展示出令人印象深刻的性能，其中扮演角色的语言代理（RPLAs）已经吸引了广泛的用户群。现在，对代表关键意见领袖（KOLs）的RPLAs的需求不断增长，即塑造其领域的趋势和观点的网络名人。然而，这一领域的研究仍未被充分挖掘。因此，在本文中，我们介绍了MINDECHO，一个用于开发和评估KOL RPLAs的综合框架。MINDECHO从各个专业领域的互联网视频转录中收集KOL数据，并利用GPT-4合成他们的对话。然后，对话和转录分别用于个性化模型训练和推断时检索。我们的评估涵盖了KOL的一般维度（即知识和语调）和粉丝中心维度。广泛的实验验证了MINDECHO在开发和评估KOL RPLAs方面的有效性。

更新时间: 2024-10-09 07:19:34

领域: cs.AI

下载: http://arxiv.org/abs/2407.05305v2

Effective Exploration Based on the Structural Information Principles

Traditional information theory provides a valuable foundation for Reinforcement Learning, particularly through representation learning and entropy maximization for agent exploration. However, existing methods primarily concentrate on modeling the uncertainty associated with RL's random variables, neglecting the inherent structure within the state and action spaces. In this paper, we propose a novel Structural Information principles-based Effective Exploration framework, namely SI2E. Structural mutual information between two variables is defined to address the single-variable limitation in structural information, and an innovative embedding principle is presented to capture dynamics-relevant state-action representations. The SI2E analyzes value differences in the agent's policy between state-action pairs and minimizes structural entropy to derive the hierarchical state-action structure, referred to as the encoding tree. Under this tree structure, value-conditional structural entropy is defined and maximized to design an intrinsic reward mechanism that avoids redundant transitions and promotes enhanced coverage in the state-action space. Theoretical connections are established between SI2E and classical information-theoretic methodologies, highlighting our framework's rationality and advantage. Comprehensive evaluations in the MiniGrid, MetaWorld, and DeepMind Control Suite benchmarks demonstrate that SI2E significantly outperforms state-of-the-art exploration baselines regarding final performance and sample efficiency, with maximum improvements of 37.63% and 60.25%, respectively.

Updated: 2024-10-09 07:19:16

标题: 基于结构信息原则的有效探索

摘要: 传统的信息理论为强化学习提供了宝贵的基础，特别是通过表示学习和熵最大化来进行智能体探索。然而，现有方法主要集中于建模与强化学习的随机变量相关的不确定性，忽略了状态和动作空间内在结构。在本文中，我们提出了一种基于结构信息原则的有效探索框架，即SI2E。为了解决结构信息中的单变量限制，我们定义了两个变量之间的结构互信息，并提出了一种创新的嵌入原则，以捕捉动态相关的状态-动作表示。SI2E分析了智能体策略在状态-动作对之间的值差异，并最小化结构熵以推导层次状态-动作结构，称为编码树。在这种树结构下，定义了值条件结构熵，并最大化设计一个内在奖励机制，避免冗余转换并促进在状态-动作空间中的增强覆盖。在SI2E和经典信息理论方法之间建立了理论联系，突出了我们框架的合理性和优势。在MiniGrid、MetaWorld和DeepMind Control Suite基准测试中进行的全面评估表明，SI2E在最终性能和样本效率方面明显优于最新的探索基线，分别最大提高了37.63%和60.25%。

更新时间: 2024-10-09 07:19:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06621v1

Retrieval Replace Reduction: An effective visual token reduction method via semantic match

Multimodal large language models (MLLMs) have demonstrated strong performance across various tasks without requiring training from scratch. However, they face significant computational and memory constraints, particularly when processing multimodal inputs that exceed context length, limiting their scalability. In this paper, we introduce a new approach, \textbf{TRSM} (\textbf{T}oken \textbf{R}eduction via \textbf{S}emantic \textbf{M}atch), which effectively reduces the number of visual tokens without compromising MLLM performance. Inspired by how humans process multimodal tasks, TRSM leverages semantic information from one modality to match relevant semantics in another, reducing the number of visual tokens.Specifically, to retain task relevant visual tokens, we use the text prompt as a query vector to retrieve the most similar vectors from the visual prompt and merge them with the text tokens. Based on experimental results, when applied to LLaVA-1.5\cite{liu2023}, our approach compresses the visual tokens by 20\%, achieving comparable performance across diverse visual question-answering and reasoning tasks.

Updated: 2024-10-09 07:13:22

标题: 检索替换减少：一种通过语义匹配实现有效的视觉标记减少方法

摘要: 多模态大型语言模型（MLLMs）在各种任务中表现出强大的性能，而无需从头开始训练。然而，它们面临着重要的计算和内存约束，特别是在处理超出上下文长度的多模态输入时，限制了它们的可扩展性。在本文中，我们介绍了一种新方法，TRSM（通过语义匹配减少标记），它有效地减少了视觉标记的数量，而不影响MLLM的性能。受人类如何处理多模态任务的启发，TRSM利用一种模态的语义信息来匹配另一种模态中的相关语义，减少视觉标记的数量。具体地，为了保留任务相关的视觉标记，我们使用文本提示作为查询向量，从视觉提示中检索最相似的向量，并将它们与文本标记合并。根据实验结果，在应用于LLaVA-1.5时，我们的方法将视觉标记压缩了20％，在各种视觉问答和推理任务中实现了可比较的性能。

更新时间: 2024-10-09 07:13:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.07278v1

$β$-calibration of Language Model Confidence Scores for Generative QA

To use generative question-and-answering (QA) systems for decision-making and in any critical application, these systems need to provide well-calibrated confidence scores that reflect the correctness of their answers. Existing calibration methods aim to ensure that the confidence score is on average indicative of the likelihood that the answer is correct. We argue, however, that this standard (average-case) notion of calibration is difficult to interpret for decision-making in generative QA. To address this, we generalize the standard notion of average calibration and introduce $\beta$-calibration, which ensures calibration holds across different question-and-answer groups. We then propose discretized posthoc calibration schemes for achieving $\beta$-calibration.

Updated: 2024-10-09 07:12:24

标题: 生成问答中语言模型置信度的$β$校准

摘要: 要将生成式问答（QA）系统用于决策和任何关键应用中，这些系统需要提供能反映其答案正确性的良好校准置信度分数。现有的校准方法旨在确保置信度分数平均上表明答案正确的可能性。然而，我们认为，这种标准（平均情况）的校准概念在生成式QA中的决策制定中很难解释。为了解决这个问题，我们推广了标准的平均校准概念，并引入了β-校准，确保校准在不同的问题和答案组中保持一致。然后，我们提出了离散后校准方案，以实现β-校准。

更新时间: 2024-10-09 07:12:24

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2410.06615v1

PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

In this paper, we propose the first fully push-forward-based distributional reinforcement learning algorithm, named PACER, which consists of a distributional critic, a stochastic actor and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and actor to model the return distributions and stochastic policies respectively, enabling them with equal modeling capability and thus enhancing the synergetic performance. Since it is infeasible to obtain the density function of the push-forward policies, novel sample-based regularizers are integrated in the encourager to incentivize efficient exploration and alleviate the risk of trapping into local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, which circumvents the explicit demand of the policy density function in existing REINFORCE-based stochastic policy gradient. As a result, PACER fully utilizes the modeling capability of the push-forward operator and is able to explore a broader class of the policy space, compared with limited policy classes used in existing distributional actor critic algorithms (i.e. Gaussians). We validate the critical role of each component in our algorithm with extensive empirical studies. Experimental results demonstrate the superiority of our algorithm over the state-of-the-art.

Updated: 2024-10-09 07:11:15

标题: PACER：一种完全基于推进的分布式强化学习算法

摘要: 在本文中，我们提出了第一个完全基于推进式分布式强化学习算法PACER，它包括一个分布式评论家、一个随机执行者和一个基于样本的鼓励者。具体来说，在评论家和执行者中利用了推进算子来分别建模回报分布和随机策略，从而赋予它们相等的建模能力，从而增强了协同性能。由于无法获得推进策略的密度函数，因此在鼓励者中集成了新颖的基于样本的正则化器，以激励有效探索并减轻陷入局部最优的风险。此外，还建立了基于样本的随机效用值策略梯度用于推进策略更新，绕过了现有基于REINFORCE的随机策略梯度中对策略密度函数的显式需求。因此，与现有分布式演员评论家算法（即高斯分布）中使用的有限策略类别相比，PACER充分利用了推进算子的建模能力，能够探索更广泛的策略空间。我们通过大量经验研究验证了算法中每个组件的关键作用。实验结果显示我们的算法优于现有技术水平。

更新时间: 2024-10-09 07:11:15

领域: cs.LG

下载: http://arxiv.org/abs/2306.06637v2

Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers

In this work we propose a novel joint training method for Visual Place Recognition (VPR), which simultaneously learns a global descriptor and a pair classifier for re-ranking. The pair classifier can predict whether a given pair of images are from the same place or not. The network only comprises Vision Transformer components for both the encoder and the pair classifier, and both components are trained using their respective class tokens. In existing VPR methods, typically the network is initialized using pre-trained weights from a generic image dataset such as ImageNet. In this work we propose an alternative pre-training strategy, by using Siamese Masked Image Modelling as a pre-training task. We propose a Place-aware image sampling procedure from a collection of large VPR datasets for pre-training our model, to learn visual features tuned specifically for VPR. By re-using the Mask Image Modelling encoder and decoder weights in the second stage of training, Pair-VPR can achieve state-of-the-art VPR performance across five benchmark datasets with a ViT-B encoder, along with further improvements in localization recall with larger encoders. The Pair-VPR website is: https://csiro-robotics.github.io/Pair-VPR.

Updated: 2024-10-09 07:09:46

标题: Pair-VPR: 位置感知预训练和对比对分类的视觉位置识别与视觉转换器

摘要: 在这项工作中，我们提出了一种新颖的视觉地点识别（VPR）联合训练方法，该方法同时学习全局描述符和用于重新排序的一对分类器。这对分类器可以预测给定图像对是否来自同一地点。网络仅包含视觉变换器组件，用于编码器和对分类器，两个组件均使用各自的类令牌进行训练。在现有的VPR方法中，通常使用来自通用图像数据集（如ImageNet）的预训练权重来初始化网络。在这项工作中，我们提出了一种替代的预训练策略，即使用Siamese Masked Image Modelling作为预训练任务。我们提出了一种基于大型VPR数据集的地点感知图像采样过程，用于预训练我们的模型，以学习专门用于VPR的视觉特征。通过在第二阶段训练中重新使用Mask Image Modelling编码器和解码器权重，Pair-VPR可以在五个基准数据集上实现最先进的VPR性能，同时使用较大的编码器进一步提高定位召回率。Pair-VPR网站为：https://csiro-robotics.github.io/Pair-VPR。

更新时间: 2024-10-09 07:09:46

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.06614v1

Evaluating the Generalization Ability of Spatiotemporal Model in Urban Scenario

Spatiotemporal neural networks have shown great promise in urban scenarios by effectively capturing temporal and spatial correlations. However, urban environments are constantly evolving, and current model evaluations are often limited to traffic scenarios and use data mainly collected only a few weeks after training period to evaluate model performance. The generalization ability of these models remains largely unexplored. To address this, we propose a Spatiotemporal Out-of-Distribution (ST-OOD) benchmark, which comprises six urban scenario: bike-sharing, 311 services, pedestrian counts, traffic speed, traffic flow, ride-hailing demand, and bike-sharing, each with in-distribution (same year) and out-of-distribution (next years) settings. We extensively evaluate state-of-the-art spatiotemporal models and find that their performance degrades significantly in out-of-distribution settings, with most models performing even worse than a simple Multi-Layer Perceptron (MLP). Our findings suggest that current leading methods tend to over-rely on parameters to overfit training data, which may lead to good performance on in-distribution data but often results in poor generalization. We also investigated whether dropout could mitigate the negative effects of overfitting. Our results showed that a slight dropout rate could significantly improve generalization performance on most datasets, with minimal impact on in-distribution performance. However, balancing in-distribution and out-of-distribution performance remains a challenging problem. We hope that the proposed benchmark will encourage further research on this critical issue.

Updated: 2024-10-09 07:03:50

标题: 评估城市场景中时空模型的泛化能力

摘要: 时空神经网络在城市场景中已经表现出很大的潜力，通过有效捕捉时间和空间相关性。然而，城市环境不断变化，目前的模型评估通常局限于交通场景，并且主要使用训练期后仅几周收集的数据来评估模型性能。这些模型的泛化能力仍然大部分未被探索。为了解决这个问题，我们提出了一个时空分布外（ST-OOD）基准，包括六种城市场景：共享单车、311服务、行人计数、交通速度、交通流量、打车需求，以及共享单车，每种场景都有内部分布（同年）和外部分布（接下来的几年）设置。我们广泛评估了最先进的时空模型，并发现它们在分布外设置中性能显著下降，大多数模型的表现甚至比简单的多层感知器（MLP）还要差。我们的发现表明，当前领先的方法往往过分依赖参数来过度拟合训练数据，这可能在内部分布数据上表现良好，但通常导致泛化能力差。我们还调查了辍学是否可以缓解过拟合的负面影响。我们的结果表明，轻微的辍学率可以显著提高大多数数据集的泛化性能，对内部分布性能的影响很小。然而，平衡内部分布和外部分布性能仍然是一个具有挑战性的问题。我们希望提出的基准可以促进进一步研究这个关键问题。

更新时间: 2024-10-09 07:03:50

领域: cs.LG,cs.AI,cs.CY,cs.DB

下载: http://arxiv.org/abs/2410.04740v2

Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS

This research introduces a comprehensive Bahasa text-to-speech (TTS) dataset and a novel TTS model, EnGen-TTS, designed to enhance the quality and versatility of synthetic speech in the Bahasa language. The dataset, spanning \textasciitilde55.0 hours and 52K audio recordings, integrates diverse textual sources, ensuring linguistic richness. A meticulous recording setup captures the nuances of Bahasa phonetics, employing professional equipment to ensure high-fidelity audio samples. Statistical analysis reveals the dataset's scale and diversity, laying the foundation for model training and evaluation. The proposed EnGen-TTS model performs better than established baselines, achieving a Mean Opinion Score (MOS) of 4.45 $\pm$ 0.13. Additionally, our investigation on real-time factor and model size highlights EnGen-TTS as a compelling choice, with efficient performance. This research marks a significant advancement in Bahasa TTS technology, with implications for diverse language applications. Link to Generated Samples: \url{https://bahasa-harmony-comp.vercel.app/}

Updated: 2024-10-09 07:01:05

标题: 巴哈萨和谐：一份全面的巴哈萨语文本转语音合成数据集，采用EnGen-TTS的离散编解码建模

摘要: 这项研究介绍了一个全面的Bahasa文本到语音（TTS）数据集和一种新颖的TTS模型EnGen-TTS，旨在提高Bahasa语言合成语音的质量和多样性。该数据集涵盖了约55.0小时和52K音频录音，整合了多种文本来源，确保语言丰富性。精心设计的录音设置捕捉了Bahasa语音学的细微差别，采用专业设备确保高保真度音频样本。统计分析揭示了数据集的规模和多样性，为模型训练和评估奠定了基础。提出的EnGen-TTS模型表现优于已建立的基线，实现了4.45±0.13的平均意见分数（MOS）。此外，我们对实时因素和模型大小的调查突出了EnGen-TTS作为一种引人关注的选择，具有高效的性能。这项研究标志着Bahasa TTS技术的重大进步，对多种语言应用具有重要意义。生成样本链接：https://bahasa-harmony-comp.vercel.app/

更新时间: 2024-10-09 07:01:05

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.06608v1

A Notion of Complexity for Theory of Mind via Discrete World Models

Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework inspired by cognitive load theory to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states of a ToM problem designed to make it apparently harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.

Updated: 2024-10-09 06:59:31

标题: 一种关于心灵理论的复杂性概念，通过离散世界模型

摘要: Theory of Mind（ToM）可用于评估大型语言模型（LLMs）在需要社会推理的复杂场景中的能力。虽然研究界提出了许多ToM基准，但它们的难度差异很大，复杂性未被明确定义。本研究提出了一个受认知负荷理论启发的框架，用于衡量ToM任务的复杂性。我们将问题的复杂性量化为解决它所需的状态数量。我们的复杂度度量还考虑了ToM问题的虚假状态，旨在使问题显得更难。我们使用我们的方法评估了五个广泛采用的ToM基准的复杂性。在这一框架之上，我们设计了一种提示技术，将模型可用信息与描述环境如何随着代理互动而改变的信息相结合。我们将这种技术命名为离散世界模型（DWM），并展示了它如何在ToM任务上引发出卓越的表现。

更新时间: 2024-10-09 06:59:31

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.11911v3

Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection

Speech is usually used for constructing an automatic Alzheimer's dementia (AD) detection system, as the acoustic and linguistic abilities show a decline in people living with AD at the early stages. However, speech includes not only AD-related local and global information but also other information unrelated to cognitive status, such as age and gender. In this paper, we propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, the shifted windows multi-head attention that proposed to extract local and global information from images, is used for designing our acoustic-based system. To decouple the effect of age and gender on acoustic feature extraction, they are used as an extra input of the designed acoustic system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD, is removed while transcribing the audio recordings into transcripts. To compensate for the removed rhythm-related information, the character-level transcripts are proposed to be used as the extra input of a word-level BERT-style system. Finally, the Swin-BERT combines the acoustic features learned from our proposed acoustic-based system with our linguistic-based system. The experiments are based on the two datasets provided by the international dementia detection challenges: the ADReSS and ADReSSo. The results show that both the proposed acoustic and linguistic systems can be better or comparable with previous research on the two datasets. Superior results are achieved by the proposed Swin-BERT system on the ADReSS and ADReSSo datasets, which are 85.58\% F-score and 87.32\% F-score respectively.

Updated: 2024-10-09 06:58:20

标题: Swin-BERT：一种为基于语音的阿尔茨海默病检测设计的特征融合系统

摘要: 言语通常用于构建自动阿尔茨海默病（AD）检测系统，因为声学和语言能力在早期患有AD的人群中呈下降趋势。然而，言语不仅包含与AD相关的局部和全局信息，还包含与认知状态无关的其他信息，如年龄和性别。本文提出了一种名为Swin-BERT的基于言语的自动痴呆检测系统。对于声学部分，我们采用了提出的移位窗口多头注意力机制，用于设计我们基于声学的系统，以提取图像中的局部和全局信息。为了消除年龄和性别对声学特征提取的影响，它们被用作设计声学系统的额外输入。对于语言部分，将音频记录转录为文本时删除了在患有和未患有AD的人群之间明显变化的与节奏相关的信息。为了补偿删除的与节奏相关的信息，提议使用字符级别文本作为词级别BERT风格系统的额外输入。最后，Swin-BERT将从我们提出的基于声学的系统学习的声学特征与我们的基于语言的系统相结合。实验基于国际痴呆检测挑战提供的两个数据集：ADReSS和ADReSSo。结果显示，所提出的声学和语言系统在这两个数据集上要么更好，要么与先前的研究相媲美。所提出的Swin-BERT系统在ADReSS和ADReSSo数据集上取得了优异的结果，分别为85.58％的F分数和87.32％的F分数。

更新时间: 2024-10-09 06:58:20

领域: eess.AS,cs.AI,cs.CL,cs.SD

下载: http://arxiv.org/abs/2410.07277v1

Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models

Event-based sensors are well suited for real-time processing due to their fast response times and encoding of the sensory data as successive temporal differences. These and other valuable properties, such as a high dynamic range, are suppressed when the data is converted to a frame-based format. However, most current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event. In this work, we address the key challenges of scaling up event-by-event modeling of the long event streams emitted by such sensors, which is a particularly relevant problem for neuromorphic computing. While prior methods can process up to a few thousand time steps, our model, based on modern recurrent deep state-space models, scales to event streams of millions of events for both training and inference. We leverage their stable parameterization for learning long-range dependencies, parallelizability along the sequence dimension, and their ability to integrate asynchronous events effectively to scale them up to long event streams. We further augment these with novel event-centric techniques enabling our model to match or beat the state-of-the-art performance on several event stream benchmarks. In the Spiking Speech Commands task, we improve state-of-the-art by a large margin of 7.7% to 88.4%. On the DVS128-Gestures dataset, we achieve competitive results without using frames or convolutional neural networks. Our work demonstrates, for the first time, that it is possible to use fully event-based processing with purely recurrent networks to achieve state-of-the-art task performance in several event-based benchmarks.

Updated: 2024-10-09 06:57:39

标题: 使用深度状态空间模型进行神经形态感知信号的可扩展逐事件处理

摘要: 基于事件的传感器由于其快速响应时间和将感官数据编码为连续时间差异而非常适合实时处理。当数据转换为基于帧的格式时，这些以及其他有价值的特性，如高动态范围，就会被压制。然而，大多数当前的方法要么将事件合并为帧，要么在直接逐个事件处理事件数据时无法扩展。在这项工作中，我们解决了通过此类传感器发射的长事件流进行事件逐事件建模的扩展的关键挑战，这是神经形态计算的一个特别相关的问题。虽然之前的方法可以处理数千个时间步，但我们基于现代递归深度状态空间模型的模型可以扩展到数百万事件的事件流，用于训练和推理。我们利用它们稳定的参数化来学习长期依赖关系，沿序列维度的可并行性以及它们有效集成异步事件的能力，将它们扩展到长事件流。我们进一步采用了新颖的以事件为中心的技术，使我们的模型能够在几个事件流基准测试中达到或超越最先进的性能。在“尖峰语音命令”任务中，我们将最先进的性能大幅提高了7.7%，达到88.4%。在DVS128-Gestures数据集上，我们取得了竞争性的结果，而无需使用帧或卷积神经网络。我们的工作首次证明，完全基于事件的处理结合纯粹的递归网络可以在几个基于事件的基准测试中实现最先进的任务性能。

更新时间: 2024-10-09 06:57:39

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2404.18508v3

RoCP-GNN: Robust Conformal Prediction for Graph Neural Networks in Node-Classification

Graph Neural Networks (GNNs) have emerged as powerful tools for predicting outcomes in graph-structured data. However, a notable limitation of GNNs is their inability to provide robust uncertainty estimates, which undermines their reliability in contexts where errors are costly. One way to address this issue is by providing prediction sets that contain the true label with a predefined probability margin. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. There are two primary challenges: first, given dependent data like graphs, it is unclear whether the critical assumption in CP - exchangeability - still holds when applied to node classification. Second, even if the exchangeability assumption is valid for conformalized link prediction, we need to ensure high efficiency, i.e., the resulting prediction set or the interval length is small enough to provide useful information. In this article, we propose a novel approach termed Robust Conformal Prediction for GNNs (RoCP-GNN), which integrates conformal prediction (CP) directly into the GNN training process. This method generates prediction sets, instead of just point predictions, that are valid at a user-defined confidence level, assuming only exchangeability. Our approach robustly predicts outcomes with any predictive GNN model while quantifying the uncertainty in predictions within the realm of graph-based semi-supervised learning (SSL). Experimental results demonstrate that GNN models with size loss provide a statistically significant increase in performance. We validate our approach on standard graph benchmark datasets by coupling it with various state-of-the-art GNNs in node classification. The code will be made available after publication.

Updated: 2024-10-09 06:54:58

标题: RoCP-GNN：节点分类中图神经网络的鲁棒置信预测

摘要: 图神经网络（GNNs）已经成为预测图结构数据结果的强大工具。然而，GNNs的一个显著局限性是它们无法提供稳健的不确定性估计，这削弱了它们在错误成本高昂的情境中的可靠性。解决这个问题的一种方法是提供包含真实标签的预测集，其具有预定义的概率边界。我们的方法建立在一种称为符合预测（CP）的框架之上，该框架承诺构建统计上稳健的预测集或区间。存在两个主要挑战：首先，对于像图这样的相关数据，不清楚在应用于节点分类时，CP中的关键假设 - 可交换性 - 是否仍然成立。其次，即使在符合化的链接预测中，可交换性假设对于有效性是有效的，我们需要确保高效性，即生成的预测集或区间长度足够小，以提供有用信息。在本文中，我们提出了一种名为图神经网络的鲁棒符合预测（RoCP-GNN）的新方法，该方法将符合预测（CP）直接整合到GNN训练过程中。该方法生成预测集，而不仅仅是点预测，这些预测集在用户定义的置信水平下是有效的，仅假设可交换性。我们的方法可以在任何预测性GNN模型中稳健地预测结果，同时在基于图的半监督学习（SSL）领域内量化预测的不确定性。实验结果表明，具有大小损失的GNN模型在性能上提供了统计上显著的增加。我们通过与各种最先进的GNNs结合在节点分类中的标准图基准数据集上验证了我们的方法。代码将在发布后提供。

更新时间: 2024-10-09 06:54:58

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2408.13825v2

Can Separators Improve Chain-of-Thought Prompting?

Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large Language Models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause the cognitive overload of LLMs. Inspired by human cognition, we introduce COT-SEP, a method that strategically employs separators at the end of each exemplar in CoT prompting. These separators are designed to help the LLMs understand their thought processes better while reasoning. Interestingly, it turns out that COT-SEP significantly improves the LLMs' performances on complex reasoning tasks (e.g., GSM8K, AQuA, CSQA), compared with the vanilla CoT, which does not use separators. We also study the effects of the type and the location of separators tested on multiple LLMs, including GPT-3.5-Turbo, GPT-4, and LLaMA-2 7B.

Updated: 2024-10-09 06:54:29

标题: 分隔符能改善思维链提示吗？

摘要: Chain-of-thought (CoT)提示是一种简单有效的方法，用于提高大型语言模型（LLMs）的推理能力。CoT的基本思想是通过在输入提示中放置示例让LLMs逐步分解他们的思维过程。然而，CoT提示的密集结构示例可能会导致LLMs的认知负荷过载。受人类认知的启发，我们引入了COT-SEP，这是一种策略性地在CoT提示中在每个示例的末尾使用分隔符的方法。这些分隔符旨在帮助LLMs在推理过程中更好地理解他们的思维过程。有趣的是，结果表明，与不使用分隔符的普通CoT相比，COT-SEP显着提高了LLMs在复杂推理任务（例如GSM8K，AQuA，CSQA）上的表现。我们还研究了在多个LLMs上测试的分隔符类型和位置对其效果的影响，包括GPT-3.5-Turbo，GPT-4和LLaMA-27B。

更新时间: 2024-10-09 06:54:29

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.10645v3

DRUPI: Dataset Reduction Using Privileged Information

Dataset reduction (DR) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks. Existing methods primarily focus on pruning or synthesizing data in the same format as the original dataset, typically the input data and corresponding labels. However, in DR settings, we find it is possible to synthesize more information beyond the data-label pair as an additional learning target to facilitate model training. In this paper, we introduce Dataset Reduction Using Privileged Information (DRUPI), which enriches DR by synthesizing privileged information alongside the reduced dataset. This privileged information can take the form of feature labels or attention labels, providing auxiliary supervision to improve model learning. Our findings reveal that effective feature labels must balance between being overly discriminative and excessively diverse, with a moderate level proving optimal for improving the reduced dataset's efficacy. Extensive experiments on ImageNet, CIFAR-10/100, and Tiny ImageNet demonstrate that DRUPI integrates seamlessly with existing dataset reduction methods, offering significant performance gains. *The code will be released after the paper is accepted.*

Updated: 2024-10-09 06:52:54

标题: DRUPI：使用特权信息进行数据集减少

摘要: 数据集缩减（DR）旨在从大型数据集中选择或提取样本，形成较小的子集，同时保持对目标任务的性能。现有方法主要集中在修剪或合成与原始数据集相同格式的数据，通常是输入数据和相应的标签。然而，在DR设置中，我们发现可以合成更多信息，作为额外的学习目标以促进模型训练。在本文中，我们介绍了使用特权信息进行数据集缩减（DRUPI），通过在减少的数据集旁边合成特权信息来丰富DR。这些特权信息可以采取特征标签或关注标签的形式，提供辅助监督以改善模型学习。我们的研究发现，有效的特征标签必须在过于具有区分性和过于多样化之间取得平衡，适度水平被证明是改善减少数据集有效性的最佳选择。对ImageNet、CIFAR-10/100和Tiny ImageNet的大量实验表明，DRUPI与现有数据集缩减方法无缝集成，提供了显著的性能增益。*本文被接受后将发布代码。*

更新时间: 2024-10-09 06:52:54

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.01611v2

On the Expressive Power of Sparse Geometric MPNNs

Motivated by applications in chemistry and other sciences, we study the expressive power of message-passing neural networks for geometric graphs, whose node features correspond to 3-dimensional positions. Recent work has shown that such models can separate \emph{generic} pairs of non-isomorphic geometric graphs, though they may fail to separate some rare and complicated instances. However, these results assume a fully connected graph, where each node possesses complete knowledge of all other nodes. In contrast, often, in application, every node only possesses knowledge of a small number of nearest neighbors. This paper shows that generic pairs of non-isomorphic geometric graphs can be separated by message-passing networks with rotation equivariant features as long as the underlying graph is connected. When only invariant intermediate features are allowed, generic separation is guaranteed for generically globally rigid graphs. We introduce a simple architecture, $\us$, which achieves our theoretical guarantees and compares favorably with alternative architecture on synthetic and chemical benchmarks. Our code is available at \url{https://github.com/yonatansverdlov/E-GenNet}.

Updated: 2024-10-09 06:47:38

标题: 关于稀疏几何MPNNs的表达能力

摘要: 受化学和其他科学应用的启发，我们研究了消息传递神经网络在几何图中的表达能力，其中节点特征对应于三维位置。最近的研究表明，这种模型可以分离\emph{通用}的非同构几何图对，尽管它们可能无法分离一些罕见和复杂的实例。然而，这些结果假定一个完全连接的图，其中每个节点都拥有所有其他节点的完全知识。相比之下，在应用中，通常每个节点只拥有少数最近邻节点的知识。本文表明，只要底层图是连通的，具有旋转等变特征的消息传递网络可以分离通用的非同构几何图对。当只允许不变的中间特征时，对于通常的全局刚性图，通用分离是保证的。我们引入了一个简单的结构，$\us$，实现了我们的理论保证，并在合成和化学基准测试中与替代结构进行了比较。我们的代码可在\url{https://github.com/yonatansverdlov/E-GenNet}获取。

更新时间: 2024-10-09 06:47:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.02025v2

Neural Networks Learn Statistics of Increasing Complexity

The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.

Updated: 2024-10-09 06:43:49

标题: 神经网络学习逐渐复杂的统计学特征

摘要: 分布简单性偏差（DSB）认为神经网络首先学习数据分布的低阶矩，然后再转向更高阶的相关性。在这项工作中，我们通过展示网络在训练早期自动学习在最大熵分布上表现良好，其低阶统计数据与训练集相匹配，然后在训练后期失去这种能力，提供了令人信服的新证据支持DSB。我们还将DSB扩展到离散领域，证明了标记$n$-gram频率与嵌入向量的矩之间的等价性，并在LLMs中找到了偏差的经验证据。最后，我们使用最优输运方法对一类的低阶统计数据进行手术式编辑，使其与另一类相匹配，并展示早期训练的网络将编辑样本视为来自目标类别。代码可在https://github.com/EleutherAI/features-across-time找到。

更新时间: 2024-10-09 06:43:49

领域: cs.LG

下载: http://arxiv.org/abs/2402.04362v3

Mitigation of gender bias in automatic facial non-verbal behaviors generation

Research on non-verbal behavior generation for social interactive agents focuses mainly on the believability and synchronization of non-verbal cues with speech. However, existing models, predominantly based on deep learning architectures, often perpetuate biases inherent in the training data. This raises ethical concerns, depending on the intended application of these agents. This paper addresses these issues by first examining the influence of gender on facial non-verbal behaviors. We concentrate on gaze, head movements, and facial expressions. We introduce a classifier capable of discerning the gender of a speaker from their non-verbal cues. This classifier achieves high accuracy on both real behavior data, extracted using state-of-the-art tools, and synthetic data, generated from a model developed in previous work.Building upon this work, we present a new model, FairGenderGen, which integrates a gender discriminator and a gradient reversal layer into our previous behavior generation model. This new model generates facial non-verbal behaviors from speech features, mitigating gender sensitivity in the generated behaviors. Our experiments demonstrate that the classifier, developed in the initial phase, is no longer effective in distinguishing the gender of the speaker from the generated non-verbal behaviors.

Updated: 2024-10-09 06:41:24

标题: 减轻自动面部非语言行为生成中的性别偏见

摘要: 针对社交互动代理的非语言行为生成的研究主要集中在非语言提示与语音的可信度和同步性。然而，现有模型主要基于深度学习架构，往往会延续训练数据中固有的偏见。这引发了伦理关注，取决于这些代理的预期应用。本文首先通过检查性别对面部非语言行为的影响来解决这些问题。我们集中于凝视、头部运动和面部表情。我们引入了一个能够从说话者的非语言提示中分辨出性别的分类器。该分类器在使用最先进工具提取的真实行为数据和从先前工作中开发的模型生成的合成数据上均实现了高准确度。在这项工作的基础上，我们提出了一个新模型FairGenderGen，该模型将性别鉴别器和梯度反转层整合到我们先前的行为生成模型中。这个新模型从语音特征生成面部非语言行为，减轻了生成行为中的性别敏感性。我们的实验表明，在初始阶段开发的分类器不再有效地区分说话者的性别与生成的非语言行为。

更新时间: 2024-10-09 06:41:24

领域: cs.CV,cs.AI,cs.HC,cs.LG,cs.NE

下载: http://arxiv.org/abs/2410.07274v1

OpenDriver: An Open-Road Driver State Detection Dataset

Among numerous studies for driver state detection, wearable physiological measurements offer a practical method for real-time monitoring. However, there are few driver physiological datasets in open-road scenarios, and the existing datasets suffer from issues such as poor signal quality, small sample sizes, and short data collection periods. Therefore, in this paper, a large-scale multimodal driving dataset, OpenDriver, for driver state detection is developed. The OpenDriver encompasses a total of 3,278 driving trips, with a signal collection duration spanning approximately 4,600 hours. Two modalities of driving signals are enrolled in OpenDriver: electrocardiogram (ECG) signals and six-axis motion data of the steering wheel from a motion measurement unit (IMU), which were recorded from 81 drivers and their vehicles. Furthermore, three challenging tasks are involved in our work, namely ECG signal quality assessment, individual biometric identification based on ECG signals, and physiological signal analysis in complex driving environments. To facilitate research in these tasks, corresponding benchmarks have also been introduced. First, a noisy augmentation strategy is applied to generate a larger-scale ECG signal dataset with realistic noise simulation for quality assessment. Second, an end-to-end contrastive learning framework is employed for individual biometric identification. Finally, a comprehensive analysis of drivers' HRV features under different driving conditions is conducted. Each benchmark provides evaluation metrics and reference results. The OpenDriver dataset will be publicly available at https://github.com/bdne/OpenDriver.

Updated: 2024-10-09 06:40:16

标题: OpenDriver：一个开放道路驾驶状态检测数据集

摘要: 在许多针对驾驶员状态检测的研究中，可穿戴生理测量为实时监测提供了一种实用方法。然而，在开放道路场景中很少有驾驶员生理数据集，现有数据集存在信号质量差、样本量小和数据收集周期短等问题。因此，在本文中，开发了一个用于驾驶员状态检测的大规模多模态驾驶数据集OpenDriver。OpenDriver包含总共3,278次驾驶行程，信号收集持续时间约为4,600小时。OpenDriver将两种驾驶信号纳入其中：来自运动测量单元（IMU）的心电图（ECG）信号和方向盘的六轴运动数据，这些数据来自81名驾驶员及其车辆。此外，我们的工作涉及三个具有挑战性的任务，即ECG信号质量评估、基于ECG信号的个体生物特征识别以及在复杂驾驶环境中的生理信号分析。为了促进这些任务的研究，还引入了相应的基准。首先，应用噪声增强策略生成具有逼真噪声模拟的更大规模ECG信号数据集进行质量评估。其次，采用端到端对比学习框架进行个体生物特征识别。最后，对不同驾驶条件下驾驶员的HRV特征进行了全面分析。每个基准提供评估指标和参考结果。OpenDriver数据集将在https://github.com/bdne/OpenDriver公开提供。

更新时间: 2024-10-09 06:40:16

领域: cs.AI,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2304.04203v2

Bots can Snoop: Uncovering and Mitigating Privacy Risks of Bots in Group Chats

New privacy concerns arise with chatbots on group messaging platforms. Chatbots may access information beyond their intended functionalities, such as messages unintended for chatbots or sender's identities. Chatbot operators may exploit such information to infer personal information and link users across groups, potentially leading to personal data breaches, pervasive tracking, and targeted advertising. Our analysis of conversation datasets shows that (1) chatbots often access far more messages than needed, and (2) when a user joins a new group with chatbots, there is a 3.4% chance that at least one of the chatbots can recognize and associate the user with their previous interactions in other groups. Although state-of-the-art group messaging protocols provide robust end-to-end security and some platforms have implemented policies to limit chatbot access, no platforms successfully combine these features. This paper introduces SnoopGuard, a secure group messaging protocol that ensures user privacy against chatbots while maintaining strong end-to-end security. Our method offers selective message access, preventing chatbots from accessing unrelated messages, and ensures sender anonymity within the group. SnoopGuard achieves $O(\log n + m)$ message-sending complexity for a group of $n$ users and $m$ chatbots, compared to $O(\log(n + m))$ in state-of-the-art protocols, with acceptable overhead for enhanced privacy. Our prototype implementation shows that sending a message in a group of 50 users and 10 chatbots takes about 30 milliseconds when integrated with Message Layer Security (MLS).

Updated: 2024-10-09 06:37:41

标题: 机器人可以偷窥：揭示和减轻群聊中机器人的隐私风险

摘要: 随着群组消息平台上的聊天机器人的出现，新的隐私问题浮出水面。聊天机器人可能会访问超出其预期功能范围的信息，比如不适合聊天机器人或发送者身份的消息。聊天机器人运营商可能会利用这些信息来推断个人信息，并在群组之间链接用户，潜在地导致个人数据泄露、普遍跟踪和定向广告。我们对对话数据集的分析显示：（1）聊天机器人经常访问比所需更多的消息，以及（2）当用户加入一个带有聊天机器人的新群组时，至少有一个聊天机器人能够识别并将用户与他们在其他群组中的先前互动联系起来的概率为3.4%。尽管最先进的群组消息协议提供了强大的端到端安全性，一些平台已实施了限制聊天机器人访问的政策，但没有平台成功地结合了这些功能。本文介绍了SnoopGuard，这是一种安全的群组消息协议，可以确保用户隐私免受聊天机器人的侵犯，同时保持强大的端到端安全性。我们的方法提供了选择性的消息访问，防止聊天机器人访问不相关的消息，并确保组内发送者匿名。与现有最先进的协议相比，SnoopGuard在$n$用户和$m$聊天机器人的群组中实现了$O(\log n + m)$的消息发送复杂度，而现有协议为$O(\log(n + m))，同时对增强隐私性具有可接受的开销。我们的原型实现显示，当与消息层安全性(MLS)集成时，在一个包含50个用户和10个聊天机器人的群组中发送一条消息大约需要30毫秒。

更新时间: 2024-10-09 06:37:41

领域: cs.CR

下载: http://arxiv.org/abs/2410.06587v1

Intelligent Repetition Counting for Unseen Exercises: A Few-Shot Learning Approach with Sensor Signals

Sensing technology has significantly advanced in automating systems that reflect human movement, particularly in robotics and healthcare, where it is used to automatically detect target movements. This study develops a method to automatically count exercise repetitions by analyzing IMU signals, with a focus on a universal exercise repetition counting task that counts all types of exercise movements, including novel exercises not seen during training, using a single model. Since peak patterns can vary significantly between different exercises as well as between individuals performing the same exercise, the model needs to learn a complex embedding space of sensor data to generalize effectively. To address this challenge,we propose a repetition counting technique utilizing a deep metric-based few-shot learning approach, designed to handle both existing and novel exercises. By redefining the counting task as a few-shot classification problem, the method is capable of detecting peak repetition patterns in exercises not seen during training. The approach employs a Siamese network with triplet loss, optimizing the embedding space to distinguish between peak and non-peak frames. Evaluation results demonstrate the effectiveness of the proposed approach, showing an 86.8% probability of accurately counting ten or more repetitions within a single set across 28 different exercises. This performance highlights the model's ability to generalize across various exercise types, including those not present in the training data. Such robustness and adaptability make the system a strong candidate for real-time implementation in fitness and healthcare applications.

Updated: 2024-10-09 06:37:36

标题: 智能重复计数对未知的锻炼：一种传感器信号的少样本学习方法

摘要: 感知技术在自动化系统中取得了显著进展，特别是在机器人和医疗保健领域，用于自动检测目标运动。本研究开发了一种通过分析IMU信号来自动计算运动重复次数的方法，重点是一个通用的运动重复计数任务，可以计算所有类型的运动，包括在训练期间没有见过的新型运动，使用单一模型。由于不同运动之间的峰值模式可以显著变化，以及执行相同运动的个体之间的差异，模型需要学习传感器数据的复杂嵌入空间，以有效地泛化。为了解决这一挑战，我们提出了一种利用深度度量为基础的少样本学习方法来进行重复计数，旨在处理现有和新型运动。通过将计数任务重新定义为少样本分类问题，该方法能够检测在训练期间未见过的运动中的峰值重复模式。该方法采用三重损失的孪生网络，优化嵌入空间以区分峰值和非峰值帧。评估结果显示了所提方法的有效性，表明在28种不同的运动中，在一个单独的集合中准确计数十次或更多次的概率为86.8%。这种性能突显了模型跨越各种运动类型的泛化能力，包括训练数据中不存在的运动。这种稳健性和适应性使该系统成为健身和医疗应用中实时实施的强有力候选。

更新时间: 2024-10-09 06:37:36

领域: cs.LG

下载: http://arxiv.org/abs/2410.00407v2

Point Cloud Compression with Bits-back Coding

This paper introduces a novel lossless compression method for compressing geometric attributes of point cloud data with bits-back coding. Our method specializes in using a deep learning-based probabilistic model to estimate the Shannon's entropy of the point cloud information, i.e., geometric attributes of the 3D floating points. Once the entropy of the point cloud dataset is estimated with a convolutional variational autoencoder (CVAE), we use the learned CVAE model to compress the geometric attributes of the point clouds with the bits-back coding technique. The novelty of our method with bits-back coding specializes in utilizing the learned latent variable model of the CVAE to compress the point cloud data. By using bits-back coding, we can capture the potential correlation between the data points, such as similar spatial features like shapes and scattering regions, into the lower-dimensional latent space to further reduce the compression ratio. The main insight of our method is that we can achieve a competitive compression ratio as conventional deep learning-based approaches, while significantly reducing the overhead cost of storage and/or communicating the compression codec, making our approach more applicable in practical scenarios. Throughout comprehensive evaluations, we found that the cost for the overhead is significantly small, compared to the reduction of the compression ratio when compressing large point cloud datasets. Experiment results show that our proposed approach can achieve a compression ratio of 1.56 bit-per-point on average, which is significantly lower than the baseline approach such as Google's Draco with a compression ratio of 1.83 bit-per-point.

Updated: 2024-10-09 06:34:48

标题: 点云压缩与比特回传编码

摘要: 这篇论文介绍了一种新的无损压缩方法，用于压缩点云数据的几何属性，并采用了bits-back编码。我们的方法专注于使用基于深度学习的概率模型来估计点云信息的Shannon熵，即3D浮点的几何属性。一旦使用卷积变分自动编码器（CVAE）估计了点云数据集的熵，我们就使用学习到的CVAE模型来压缩点云的几何属性，采用bits-back编码技术。我们的方法的新颖之处在于利用CVAE的学习潜变量模型来压缩点云数据。通过使用bits-back编码，我们可以将数据点之间的潜在相关性捕获到较低维度的潜变量空间中，进一步降低压缩比。我们方法的主要见解在于，我们可以实现与传统基于深度学习方法相当的压缩比，同时显著降低存储和/或传输压缩编解码器的开销，使我们的方法更适用于实际场景。通过全面评估，我们发现与压缩大点云数据集时的压缩比降低相比，开销成本显著较小。实验结果显示，我们提出的方法平均每点可以实现1.56比特的压缩比，明显低于基线方法，如Google的Draco，其每点的压缩比为1.83比特。

更新时间: 2024-10-09 06:34:48

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.18115v1

RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank

Numerous works propose post-hoc, model-agnostic explanations for learning to rank, focusing on ordering entities by their relevance to a query through feature attribution methods. However, these attributions often weakly correlate or contradict each other, confusing end users. We adopt an axiomatic game-theoretic approach, popular in the feature attribution community, to identify a set of fundamental axioms that every ranking-based feature attribution method should satisfy. We then introduce Rank-SHAP, extending classical Shapley values to ranking. We evaluate the RankSHAP framework through extensive experiments on two datasets, multiple ranking methods and evaluation metrics. Additionally, a user study confirms RankSHAP's alignment with human intuition. We also perform an axiomatic analysis of existing rank attribution algorithms to determine their compliance with our proposed axioms. Ultimately, our aim is to equip practitioners with a set of axiomatically backed feature attribution methods for studying IR ranking models, that ensure generality as well as consistency.

Updated: 2024-10-09 06:32:41

标题: RankSHAP：基于Shapley值的特征归因用于学习排序

摘要: 许多作品提出了针对学习排名的事后、模型无关的解释，重点是通过特征归因方法按其与查询相关性对实体进行排序。然而，这些归因经常弱相关或相互矛盾，令最终用户感到困惑。我们采用了在特征归因社区中流行的公理博弈论方法，以确定每个基于排名的特征归因方法应该满足的一组基本公理。然后，我们介绍了Rank-SHAP，将经典的Shapley值扩展到排名。我们通过对两个数据集、多种排名方法和评估指标的大量实验来评估RankSHAP框架。此外，一项用户研究证实了RankSHAP与人类直觉的一致性。我们还对现有的排名归因算法进行了公理分析，以确定它们是否符合我们提出的公理。最终，我们的目标是为从事IR排名模型研究的从业者提供一组基于公理支持的特征归因方法，以确保其普适性和一致性。

更新时间: 2024-10-09 06:32:41

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.01848v2

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

The inversion of diffusion model sampling, which aims to find the corresponding initial noise of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion samplers have been proposed to address the inexact inversion issue in a training-free manner. However, the theoretical properties of these heuristic samplers remain unknown and they often exhibit mediocre sampling quality. In this paper, we introduce a generic formulation, \emph{Bidirectional Explicit Linear Multi-step} (BELM) samplers, of the exact inversion samplers, which includes all previously proposed heuristic exact inversion samplers as special cases. The BELM formulation is derived from the variable-stepsize-variable-formula linear multi-step method via integrating a bidirectional explicit constraint. We highlight this bidirectional explicit constraint is the key of mathematically exact inversion. We systematically investigate the Local Truncation Error (LTE) within the BELM framework and show that the existing heuristic designs of exact inversion samplers yield sub-optimal LTE. Consequently, we propose the Optimal BELM (O-BELM) sampler through the LTE minimization approach. We conduct additional analysis to substantiate the theoretical stability and global convergence property of the proposed optimal sampler. Comprehensive experiments demonstrate our O-BELM sampler establishes the exact inversion property while achieving high-quality sampling. Additional experiments in image editing and image interpolation highlight the extensive potential of applying O-BELM in varying applications.

Updated: 2024-10-09 06:32:26

标题: BELM：双向显式线性多步采样器在扩散模型中的精确反演

摘要: 扩散模型采样的反演旨在找到样本的相应初始噪声，在各种任务中发挥关键作用。最近，提出了几种启发式精确反演取样器，以解决无需训练的不精确反演问题。然而，这些启发式取样器的理论性质仍未知，它们经常表现出一般的取样质量。本文介绍了一种通用的形式，即\emph{双向显式线性多步}（BELM）取样器，它包括所有先前提出的启发式精确反演取样器作为特例。BELM公式是通过整合双向显式约束从变步长-变公式线性多步方法中导出的。我们强调这种双向显式约束是数学精确反演的关键。我们系统地研究了BELM框架内的局部截断误差（LTE），并表明现有的精确反演取样器的启发式设计产生次优的LTE。因此，我们通过LTE最小化方法提出了最优的BELM（O-BELM）取样器。我们进行了额外分析，以证实所提出的最优取样器的理论稳定性和全局收敛性质。全面的实验表明，我们的O-BELM取样器建立了精确反演属性，同时实现了高质量的采样。在图像编辑和图像插值方面的额外实验突显了将O-BELM应用于各种应用中的广泛潜力。

更新时间: 2024-10-09 06:32:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.07273v1

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving

Recent literature has effectively utilized diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging image-generation models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques. To the best of our knowledge, this is the first approach to use discrete diffusion model-based priors for solving image inverse problems.

Updated: 2024-10-09 06:18:25

标题: G2D2：用于图像逆问题求解的梯度引导离散扩散

摘要: 最近的文献有效地利用了在连续变量上训练的扩散模型作为求解逆问题的先验。值得注意的是，具有离散潜在代码的离散扩散模型表现出色，特别是在适合离散压缩表示的模态中，如图像和运动生成。然而，它们的离散和不可微的特性限制了它们在连续空间中制定的逆问题中的应用。本文提出了一种新颖的方法，通过利用基于离散扩散的图像生成模型作为先验来解决线性逆问题。我们通过利用分类分布和连续松弛技术构建的变分分布来近似真实后验分布，克服了这些限制。此外，我们采用星形噪声过程来减轻传统离散扩散模型具有吸收状态的缺点，展示了我们的方法与连续扩散技术相媲美的性能。据我们所知，这是首次采用基于离散扩散模型的先验来解决图像逆问题的方法。

更新时间: 2024-10-09 06:18:25

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.14710v1

Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration

Decentralized Federated Learning has emerged as an alternative to centralized architectures due to its faster training, privacy preservation, and reduced communication overhead. In decentralized communication, the server aggregation phase in Centralized Federated Learning shifts to the client side, which means that clients connect with each other in a peer-to-peer manner. However, compared to the centralized mode, data heterogeneity in Decentralized Federated Learning will cause larger variances between aggregated models, which leads to slow convergence in training and poor generalization performance in tests. To address these issues, we introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata. It consists of two main components: the Moreau envelope function, which primarily addresses parameter inconsistencies among clients caused by data heterogeneity, and Nesterov's extrapolation step, which accelerates the aggregation phase. Theoretically, we prove the optimization error bound and generalization error bound of the algorithm, providing a further understanding of the nature of the algorithm and the theoretical perspectives on the hyperparameter choice. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions. Furthermore, we also experimentally verify the theoretical properties of DFedCata.

Updated: 2024-10-09 06:17:16

标题: 通过催化剂加速提升分布式联邦学习的性能

摘要: 分散式联邦学习已经成为中央集权架构的替代方案，因为它训练速度更快，隐私保护更好，并且通信开销更低。在分散式通信中，中央集权联邦学习中的服务器聚合阶段转移到客户端，这意味着客户端以点对点的方式连接。然而，与集中式模式相比，分散式联邦学习中的数据异质性将导致聚合模型之间的差异较大，从而导致训练过程中的收敛速度较慢，测试中的泛化性能较差。为了解决这些问题，我们引入了Catalyst加速，并提出了一种名为DFedCata的加速分散式联邦学习算法。它由两个主要组件组成：Moreau包络函数主要解决由数据异质性引起的客户端参数不一致性，并且Nesterov外推步骤加速聚合阶段。在理论上，我们证明了算法的优化误差界限和泛化误差界限，进一步了解了算法的性质以及对超参数选择的理论观点。在实证方面，我们在CIFAR10/100上展示了所提算法在收敛速度和泛化性能上的优势，涵盖了各种非独立同分布的数据分布。此外，我们还在实验中验证了DFedCata的理论特性。

更新时间: 2024-10-09 06:17:16

领域: cs.LG

下载: http://arxiv.org/abs/2410.07272v1

Can DeepFake Speech be Reliably Detected?

Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to ``test domain shift", exhibiting decreased performance when audio is altered through transcoding, playback, or background noise. This vulnerability is further exacerbated by deliberate manipulation of synthetic speech aimed at deceiving detectors. This work presents the first systematic study of such active malicious attacks against state-of-the-art open-source SSDs. White-box attacks, black-box attacks, and their transferability are studied from both attack effectiveness and stealthiness, using both hardcoded metrics and human ratings. The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.

Updated: 2024-10-09 06:13:48

标题: 深度伪造语音能够可靠地检测吗？

摘要: 最近，在文本到语音（TTS）系统方面取得了进展，特别是具有语音克隆能力的系统，使声音模仿变得容易获得，引发了道德和法律关注，因为可能被恶意利用用于虚假信息宣传和欺诈活动。虽然存在用于对抗这种情况的合成语音检测器（SSD），但它们容易受到“测试域转移”的影响，在音频经过转码、回放或背景噪音干扰时表现出性能下降。这种脆弱性进一步加剧了针对欺骗检测器的合成语音的蓄意操纵。这项工作首次系统研究了针对最先进的开源SSD的主动恶意攻击。从攻击效果和隐蔽性两方面研究了白盒攻击、黑盒攻击及其可转移性，使用了硬编码指标和人工评分。结果强调了在面对不断发展的对抗威胁时更加强大的检测方法的迫切需求。

更新时间: 2024-10-09 06:13:48

领域: cs.SD,cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.06572v1

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements limit the widespread adoption. Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating LLMs, albeit with potential risks to accuracy. Numerous studies have aimed to minimize the accuracy loss associated with quantization. However, their quantization configurations vary from each other and cannot be fairly compared. In this paper, we present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization. LLMC integrates dozens of algorithms, models, and hardwares, offering high extensibility from integer to floating-point quantization, from LLM to vision-language (VLM) model, from fixed-bit to mixed precision, and from quantization to sparsification. Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats, providing novel insights and detailed analyses for further research and practical guidance for users. Our toolkit is available at https://github.com/ModelTC/llmc.

Updated: 2024-10-09 06:09:41

标题: LLMC：使用多功能压缩工具包对大型语言模型量化进行基准测试

摘要: 最近大型语言模型（LLMs）的进展正在推动我们朝着人工通用智能迈进，其出色的新兴能力和推理能力。然而，巨大的计算和内存需求限制了广泛采用。量化是一种关键的压缩技术，可以通过压缩和加速LLMs有效缓解这些需求，尽管存在潜在的精度风险。许多研究旨在最小化与量化相关的精度损失。然而，它们的量化配置各不相同，无法公平比较。在本文中，我们提出了LLMC，一个即插即用的压缩工具包，用于公平而系统地探索量化的影响。LLMC集成了数十种算法、模型和硬件，提供了从整数到浮点量化，从LLM到视觉语言（VLM）模型，从固定位到混合精度，以及从量化到稀疏化的高扩展性。借助这个多功能工具包，我们的基准覆盖了三个关键方面：校准数据、算法（三种策略）和数据格式，为进一步研究提供了新颖的见解和详细的分析，为用户提供了实用的指导。我们的工具包可在https://github.com/ModelTC/llmc上获得。

更新时间: 2024-10-09 06:09:41

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.06001v3

Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization

Deploying large and complex deep neural networks on resource-constrained edge devices poses significant challenges due to their computational demands and the complexities of non-convex optimization. Traditional compression methods such as distillation and pruning often retain non-convexity that complicates fine-tuning in real-time on such devices. Moreover, these methods often necessitate extensive end-to-end network fine-tuning after compression to preserve model performance, which is not only time-consuming but also requires fully annotated datasets, thus potentially negating the benefits of efficient network compression. In this paper, we introduce a novel distillation technique that efficiently compresses the model via convex optimization -- eliminating intermediate non-convex activation functions and using only intermediate activations from the original model. Our approach enables distillation in a label-free data setting and achieves performance comparable to the original model without requiring any post-compression fine-tuning. We demonstrate the effectiveness of our method for image classification models on multiple standard datasets, and further show that in the data limited regime, our method can outperform standard non-convex distillation approaches. Our method promises significant advantages for deploying high-efficiency, low-footprint models on edge devices, making it a practical choice for real-world applications. We show that convex neural networks, when provided with rich feature representations from a large pre-trained non-convex model, can achieve performance comparable to their non-convex counterparts, opening up avenues for future research at the intersection of convex optimization and deep learning.

Updated: 2024-10-09 06:04:52

标题: 凸形蒸馏：通过凸优化实现深度网络的高效压缩

摘要: 在资源受限的边缘设备上部署大型复杂的深度神经网络面临着重大挑战，这是由于它们的计算需求和非凸优化的复杂性。传统的压缩方法，如蒸馏和修剪，通常保留了复杂的非凸性，这使得在此类设备上实时进行微调变得复杂。此外，这些方法通常需要在压缩后进行广泛的端到端网络微调，以保持模型性能，这不仅耗时，而且需要完全标记的数据集，因此可能抵消了高效网络压缩的好处。在本文中，我们介绍了一种通过凸优化高效压缩模型的新型蒸馏技术 - 消除中间的非凸激活函数，仅使用原始模型的中间激活。我们的方法使得在无标签数据设置中进行蒸馏成为可能，并且在不需要任何后压缩微调的情况下达到与原始模型可比的性能。我们展示了我们的方法在多个标准数据集上的图像分类模型的有效性，并进一步表明，在数据受限的情况下，我们的方法可以胜过标准的非凸蒸馏方法。我们的方法为在边缘设备上部署高效、低占用空间的模型带来了显著优势，使其成为现实世界应用的实际选择。我们展示了，当提供了来自大型预训练的非凸模型的丰富特征表示时，凸神经网络可以实现与非凸对应物可比的性能，为凸优化和深度学习交叉领域的未来研究开辟了途径。

更新时间: 2024-10-09 06:04:52

领域: cs.LG

下载: http://arxiv.org/abs/2410.06567v1

Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models

In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, current benchmarks primarily assess the accuracy of LLM-generated code, while neglecting other critical dimensions that also significantly impact code quality in real-world development. Moreover, relying exclusively on correctness as the guiding metric renders LLMs susceptible to data contamination. Therefore, this paper proposes the RACE benchmark, which comprehensively evaluates the quality of code generated by LLMs across 4 dimensions: Readability, mAintainability, Correctness, and Efficiency. Specifically, considering the demand-dependent nature of dimensions beyond correctness, we design various types of user requirements for each dimension to assess the model's ability to generate correct code that also meets user demands. We analyze 28 representative LLMs based on RACE and find that: 1) current correctness-centric benchmarks fail to capture the multifaceted requirements of code in real-world scenarios, while RACE provides a comprehensive evaluation that reveals the defects of LLMs across multiple dimensions; 2) the RACE benchmark serves as an effective tool for resisting the risk of data contamination; 3) even the most advanced code LLMs still encounter significant challenges in customized requirements involving complex instructions; 4) most LLMs exhibit an inherent preference for specific coding style. These findings highlight the need for a multidimensional evaluation of code LLMs, emphasizing metrics beyond correctness for real-world applications. Future efforts should aim to develop novel learning algorithms to enhance code generation under varied constraints and improve coverage and usability for diverse user needs.

Updated: 2024-10-09 05:59:07

标题: 超越正确性：为大型语言模型基准测试多维代码生成

摘要: 近年来，研究人员提出了许多基准来评估大型语言模型（LLMs）令人印象深刻的编码能力。然而，当前的基准主要评估LLM生成的代码的准确性，而忽略了其他同样对实际开发中代码质量产生重大影响的关键维度。此外，仅依赖正确性作为指导度量单位会使LLMs容易受到数据污染的影响。因此，本文提出了RACE基准，全面评估LLMs生成的代码质量，包括可读性、可维护性、正确性和效率这4个维度。具体来说，考虑到除了正确性之外的维度的需求依赖性，我们为每个维度设计了各种类型的用户需求，以评估模型生成正确代码的能力，同时满足用户需求。我们根据RACE分析了28个代表性的LLMs，并发现：1）当前以正确性为中心的基准无法捕捉实际场景中代码的多方面要求，而RACE提供了全面评估，揭示了LLMs在多个维度上的缺陷；2）RACE基准是一种有效抵抗数据污染风险的工具；3）即使是最先进的代码LLMs在涉及复杂指令的定制要求方面仍然面临重大挑战；4）大多数LLMs表现出对特定编码风格的固有偏好。这些发现突显了对代码LLMs进行多维度评估的必要性，强调了超越正确性的度量标准对于实际应用的重要性。未来的努力应致力于开发新颖的学习算法，以增强在各种约束条件下的代码生成，并提高对各种用户需求的覆盖范围和可用性。

更新时间: 2024-10-09 05:59:07

领域: cs.SE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2407.11470v2

Measuring Diversity of Game Scenarios

This survey comprehensively reviews the multi-dimensionality of game scenario diversity, spotlighting the innovative use of procedural content generation and other fields as cornerstones for enriching player experiences through diverse game scenarios. By traversing a wide array of disciplines, from affective modeling and multi-agent systems to psychological studies, our research underscores the importance of diverse game scenarios in gameplay and education. Through a taxonomy of diversity metrics and evaluation methods, we aim to bridge the current gaps in literature and practice, offering insights into effective strategies for measuring and integrating diversity in game scenarios. Our analysis highlights the necessity for a unified taxonomy to aid developers and researchers in crafting more engaging and varied game worlds. This survey not only charts a path for future research in diverse game scenarios but also serves as a handbook for industry practitioners seeking to leverage diversity as a key component of game design and development.

Updated: 2024-10-09 05:58:58

标题: 测量游戏场景的多样性

摘要: 本调查全面审视了游戏场景多样性的多维性，重点关注程序内容生成和其他领域的创新应用，作为丰富玩家体验的基石，通过多样化的游戏场景。通过横跨广泛的学科领域，从情感建模和多智能系统到心理研究，我们的研究强调了游戏场景多样性在游戏和教育中的重要性。通过多样性指标和评估方法的分类，我们旨在弥合文献和实践中的当前差距，为测量和整合游戏场景中的多样性提供洞见。我们的分析突出了统一分类法的必要性，以帮助开发人员和研究人员打造更具吸引力和多样化的游戏世界。这项调查不仅为未来研究提供了多样化游戏场景的道路，还为希望将多样性作为游戏设计和开发的关键组成部分的行业从业者提供了手册。

更新时间: 2024-10-09 05:58:58

领域: cs.AI

下载: http://arxiv.org/abs/2404.15192v2

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fine-tuning. Previous work has relied on manual annotation and knowledge distillation from teacher LLMs, which are time-consuming and not accurate enough. In this paper, we introduce a novel framework for enhancing LLMs' planning capabilities by using planning data derived from knowledge graphs (KGs). LLMs fine-tuned with this data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval. Evaluations on multiple datasets, including our newly proposed benchmark, highlight the effectiveness of our framework and the benefits of KG-derived planning data.

Updated: 2024-10-09 05:56:07

标题: 学习为检索增强的大型语言模型从知识图中规划

摘要: 在复杂的问答场景中提高大型语言模型（LLMs）的性能一直是研究的重点。最近的研究尝试通过将逐步规划与外部检索相结合来提高LLMs的性能。虽然对于像GPT-3.5这样的高级模型有效，但较小的LLMs在分解复杂问题方面面临挑战，需要进行监督微调。先前的工作依赖于手动注释和从教师LLMs进行知识蒸馏，这些方法耗时且不够准确。在本文中，我们介绍了一种利用知识图谱（KGs）衍生的规划数据来增强LLMs规划能力的新框架。使用这些数据进行微调的LLMs具有改进的规划能力，更好地使它们能够处理涉及检索的复杂问答任务。对包括我们新提出的基准数据集在内的多个数据集的评估突出了我们框架的有效性和KG衍生的规划数据的好处。

更新时间: 2024-10-09 05:56:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.14282v2

Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based on Kullback-Leibler (KL) divergence loss. However, the student performance improvements achieved through KD exhibit diminishing marginal returns, where a stronger teacher model does not necessarily lead to a proportionally stronger student model. To address this issue, we empirically find that the KL-based KD method may implicitly change the inter-class relationships learned by the student model, resulting in a more complex and ambiguous decision boundary, which in turn reduces the model's accuracy and generalization ability. Therefore, this study argues that the student model should learn not only the probability values from the teacher's output but also the relative ranking of classes, and proposes a novel Correlation Matching Knowledge Distillation (CMKD) method that combines the Pearson and Spearman correlation coefficients-based KD loss to achieve more efficient and robust distillation from a stronger teacher model. Moreover, considering that samples vary in difficulty, CMKD dynamically adjusts the weights of the Pearson-based loss and Spearman-based loss. CMKD is simple yet practical, and extensive experiments demonstrate that it can consistently achieve state-of-the-art performance on CIRAR-100 and ImageNet, and adapts well to various teacher architectures, sizes, and other KD methods.

Updated: 2024-10-09 05:42:47

标题: 高效稳健的知识蒸馏：基于相关性匹配的强大教师

摘要: 知识蒸馏（KD）已经成为神经网络压缩和性能增强的关键技术。大多数KD方法旨在基于Kullback-Leibler（KL）散度损失，将来自繁琐教师模型的暗知识转移给轻量级学生模型。然而，通过KD实现的学生性能改进表现出边际收益递减，即更强大的教师模型不一定会导致学生模型的相应增强。为了解决这个问题，我们在实践中发现，基于KL的KD方法可能会隐式改变学生模型学习到的类间关系，导致更复杂和模糊的决策边界，进而降低模型的准确性和泛化能力。因此，本研究认为学生模型不仅应该学习教师输出的概率值，还应该学习类的相对排名，并提出了一种新颖的基于Pearson和Spearman相关系数的KD损失的相关匹配知识蒸馏（CMKD）方法，以从更强大的教师模型中实现更高效和稳健的蒸馏。此外，考虑到样本的难度不同，CMKD动态调整了基于Pearson的损失和基于Spearman的损失的权重。CMKD简单而实用，广泛实验表明，它可以在CIFAR-100和ImageNet上始终取得最先进的性能，并且可以很好地适应各种教师架构、大小和其他KD方法。

更新时间: 2024-10-09 05:42:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06561v1

Mitigating Time Discretization Challenges with WeatherODE: A Sandwich Physics-Driven Neural ODE for Weather Forecasting

In the field of weather forecasting, traditional models often grapple with discretization errors and time-dependent source discrepancies, which limit their predictive performance. In this paper, we present WeatherODE, a novel one-stage, physics-driven ordinary differential equation (ODE) model designed to enhance weather forecasting accuracy. By leveraging wave equation theory and integrating a time-dependent source model, WeatherODE effectively addresses the challenges associated with time-discretization error and dynamic atmospheric processes. Moreover, we design a CNN-ViT-CNN sandwich structure, facilitating efficient learning dynamics tailored for distinct yet interrelated tasks with varying optimization biases in advection equation estimation. Through rigorous experiments, WeatherODE demonstrates superior performance in both global and regional weather forecasting tasks, outperforming recent state-of-the-art approaches by significant margins of over 40.0\% and 31.8\% in root mean square error (RMSE), respectively. The source code is available at \url{https://github.com/DAMO-DI-ML/WeatherODE}.

Updated: 2024-10-09 05:41:24

标题: 使用WeatherODE缓解时间离散挑战：一种面向天气预测的Sandwich物理驱动神经ODE

摘要: 在天气预报领域，传统模型常常面临离散化误差和时间相关源差异的挑战，这限制了它们的预测性能。本文介绍了WeatherODE，这是一个新颖的一阶、受物理驱动的常微分方程（ODE）模型，旨在提高天气预报的准确性。通过利用波动方程理论并整合一个时间相关源模型，WeatherODE有效地解决了与时间离散化误差和动态大气过程相关的挑战。此外，我们设计了一个CNN-ViT-CNN三明治结构，促进了为不同但相关的任务定制的高效学习动态，在对流方程估计中具有不同的优化偏差。通过严格的实验，WeatherODE在全球和区域天气预报任务中展现出卓越的性能，在均方根误差（RMSE）方面，分别较最新的最先进方法提高了超过40.0％和31.8％。源代码可在\url{https://github.com/DAMO-DI-ML/WeatherODE}获取。

更新时间: 2024-10-09 05:41:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06560v1

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

Significant progress has been made in AI safety. However, as this field thrives, a critical question emerges: Are our current efforts aligned with the broader perspective of history and human civilization? This paper presents a blueprint for an advanced human society and leverages this vision to guide contemporary AI safety efforts. We outline a future where the Internet of Everything becomes reality, and create a roadmap that represents significant technological advancements towards this envisioned future. For each stage of advancement, we forecast potential AI safety issues that humanity may face. By projecting current efforts against this blueprint, we examine the alignment between the present motivations and long-term needs. We identify gaps in current approaches and highlight unique challenges and missions that demand increasing attention from AI safety practitioners in the 2020s, addressing critical areas that must not be overlooked in shaping a safe and responsible future for AI development. This vision paper aims to offer a broader perspective on AI safety, emphasizing that our efforts should not only address immediate concerns but also anticipate potential risks in the expanding AI landscape, thereby fostering AI's role in promoting a more secure and sustainable future for human civilization.

Updated: 2024-10-09 05:36:29

标题: 连接当下与人类未来：2024年及以后的AI安全

摘要: 人工智能安全领域取得了显著进展。然而，随着这一领域的蓬勃发展，一个关键问题浮出水面：我们当前的努力是否与历史和人类文明的更广泛视角相一致？本文提出了一个先进人类社会的蓝图，并利用这一愿景指导当代人工智能安全工作。我们概述了一个未来，在这个未来中，“万物互联”成为现实，并创建了一份代表朝着这一愿景未来的重大技术进步的路线图。对于每个进步阶段，我们预测人类可能面临的潜在人工智能安全问题。通过将当前努力投射到这一蓝图上，我们审视当前动机与长期需求之间的一致性。我们确定了当前方法中的差距，并突出了需要人工智能安全从业者在2020年代更多关注的独特挑战和任务，解决在塑造人工智能发展安全和负责任未来中不容忽视的关键领域。这篇愿景论文旨在提供对人工智能安全的更广泛视角，强调我们的努力不仅应该解决眼前问题，还应该预见扩大的人工智能领域中潜在风险，从而促进人工智能在促进更安全和可持续的人类文明未来中的作用。

更新时间: 2024-10-09 05:36:29

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2410.18114v1

A new approach to delegate signing rights to proxy signers using isogeny-based cryptography

E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme. In the context of digital signatures, the proxy signature is an important cryptographic primitive that allows the original signer to delegate signing authority to another signer (proxy signer). The proxy signature has a number of important applications in the e-government system. There are now a large amount of proxy signature schemes. The security of most of them relies on the following hard problems: the discrete logarithm problem and the factorization of integers problem. However, a large-scale quantum computer can solve them in polynomial time due to Shor's algorithm. As a consequence, there is a need for a quantum computer-resistant proxy signature to secure e-governance system from quantum adversaries. In this work, we propose the first post-quantum isogeny based proxy signature scheme CSI-PS (commutative supersingular isogeny proxy signature). Our construction is proven to be uf-cma secure under the hardness of the group action inverse problem (GAIP) based on isogeny.

Updated: 2024-10-09 05:24:04

标题: 使用同态密码学将代理签名权委托给代理签名者的新方法

摘要: 电子治理是一个双向协议，通过这个协议，人们可以使用政府服务、共享数据和请求信息。它指的是利用通信和信息技术以高效快速的方式向公众提供政府服务。此外，提交给电子政府系统的任何文件都必须由政府官员使用数字签名方案进行认证。在数字签名的背景下，代理签名是一种重要的密码原语，允许原始签名者将签名权限委托给另一个签名者（代理签名者）。代理签名在电子政府系统中有许多重要的应用。现在有大量的代理签名方案。其中大部分的安全性依赖于离散对数问题和整数分解问题。然而，由于Shor算法，大规模的量子计算机可以在多项式时间内解决这些问题。因此，有必要开发一个抵抗量子计算机的代理签名来保护电子治理系统免受量子对手的威胁。在这项工作中，我们提出了第一个基于等距同源的代理签名方案CSI-PS（可交换的超奇异同源代理签名）。我们的构建经证明在等距基础上的群作用逆问题（GAIP）的困难下是uf-cma安全的。

更新时间: 2024-10-09 05:24:04

领域: cs.CR

下载: http://arxiv.org/abs/2407.13318v2

DCP: Learning Accelerator Dataflow for Neural Network via Propagation

Deep neural network (DNN) hardware (HW) accelerators have achieved great success in improving DNNs' performance and efficiency. One key reason is dataflow in executing a DNN layer, including on-chip data partitioning, computation parallelism, and scheduling policy, which have large impacts on latency and energy consumption. Unlike prior works that required considerable efforts from HW engineers to design suitable dataflows for different DNNs, this work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort. It has several attractive benefits that prior arts do not have. (i) We translate the HW dataflow configuration into a code representation in a unified dataflow coding space, which can be optimized by backpropagating gradients given a DNN layer or network. (ii) DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives e.g., latency and energy. (iii) It can be easily generalized to unseen HW configurations in a zero-shot or few-shot learning manner. For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples. Extensive experiments on several representative models such as MobileNet, ResNet, and ViT show that DCP outperforms its counterparts in various settings.

Updated: 2024-10-09 05:16:44

标题: DCP: 通过传播加速神经网络的数据流程学习

摘要: 深度神经网络（DNN）硬件（HW）加速器在提高DNN性能和效率方面取得了巨大成功。其中一个关键原因是执行DNN层时的数据流，包括芯片内数据分区、计算并行性和调度策略，对延迟和能耗有很大影响。与以往需要硬件工程师为不同DNN设计适当数据流的工作不同，本文提出了一种高效的数据中心方法，名为数据流代码传播（DCP），可以在几秒内自动找到DNN层的最佳数据流，无需人为干预。它具有一些传统方法所不具备的吸引人的优点：(i)将硬件数据流配置转换为统一数据流编码空间中的代码表示，可以通过反向传播梯度优化给定DNN层或网络。(ii)DCP学习神经预测器以有效更新数据流代码，朝着期望的梯度方向最小化各种优化目标，例如延迟和能耗。(iii)可以轻松推广到未见过的硬件配置，以零-shot或少-shot学习方式。例如，DCP不使用额外训练数据就超越了使用数千个样本进行完全搜索的GAMMA方法。对MobileNet、ResNet和ViT等几个代表性模型进行了大量实验，结果显示DCP在各种设置下均优于其对手。

更新时间: 2024-10-09 05:16:44

领域: cs.LG,eess.IV

下载: http://arxiv.org/abs/2410.06553v1

InstantIR: Blind Image Restoration with Instant Generative Reference

Handling test-time unknown degradation is the major challenge in Blind Image Restoration (BIR), necessitating high model generalization. An effective strategy is to incorporate prior knowledge, either from human input or generative model. In this paper, we introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method which dynamically adjusts generation condition during inference. We first extract a compact representation of the input via a pre-trained vision encoder. At each generation step, this representation is used to decode current diffusion latent and instantiate it in the generative prior. The degraded image is then encoded with this reference, providing robust generation condition. We observe the variance of generative references fluctuate with degradation intensity, which we further leverage as an indicator for developing a sampling algorithm adaptive to input quality. Extensive experiments demonstrate InstantIR achieves state-of-the-art performance and offering outstanding visual quality. Through modulating generative references with textual description, InstantIR can restore extreme degradation and additionally feature creative restoration.

Updated: 2024-10-09 05:15:29

标题: InstantIR：使用即时生成参考进行盲图像恢复

摘要: 处理测试时间未知的退化是盲图像恢复（BIR）中的主要挑战，需要高模型泛化能力。一种有效策略是结合先验知识，可以来自人类输入或生成模型。在本文中，我们介绍了Instant-reference Image Restoration（InstantIR），这是一种基于扩散的新型BIR方法，它在推理过程中动态调整生成条件。我们首先通过预训练的视觉编码器提取输入的紧凑表示。在每个生成步骤中，该表示被用于解码当前的扩散潜变量，并在生成先验中实例化它。然后，退化的图像被用这个参考编码，提供了稳健的生成条件。我们观察到生成参考的方差随着退化强度波动，我们进一步利用这一点作为开发适应输入质量的抽样算法的指标。大量实验表明InstantIR实现了最先进的性能，并提供了出色的视觉质量。通过通过文本描述调节生成参考，InstantIR可以恢复极端的退化，并进一步实现创意恢复。

更新时间: 2024-10-09 05:15:29

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06551v1

Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

Recent studies have demonstrated that few-shot learning allows LLMs to generate training data for supervised models at a low cost. However, the quality of LLM-generated data may not entirely match that of human-labeled data. This raises a crucial question: how should one balance the trade-off between the higher quality but more expensive human data and the lower quality yet substantially cheaper LLM-generated data? In this paper, we synthesized training data for conversational semantic frame analysis using GPT-4 and examined how to allocate budgets optimally to achieve the best performance. Our experiments, conducted across various budget levels, reveal that optimal cost-efficiency is achieved by combining both human and LLM-generated data across a wide range of budget levels. Notably, as the budget decreases, a higher proportion of LLM-generated data becomes more preferable.

Updated: 2024-10-09 05:15:13

标题: 研究LLM生成的培训数据在会话语义框架分析中的成本效率

摘要: 最近的研究表明，少样本学习使LLMs能够以较低成本生成监督模型的训练数据。然而，LLM生成的数据质量可能并不完全匹配人工标记数据的质量。这引发了一个关键问题：如何平衡高质量但更昂贵的人工数据与质量较低但成本更低的LLM生成数据之间的权衡？在本文中，我们使用GPT-4合成了用于对话语义框架分析的训练数据，并研究了如何最优地分配预算以实现最佳性能。我们的实验在各种预算水平上进行，结果显示最佳的成本效益是通过在广泛的预算水平上结合人工和LLM生成的数据来实现的。值得注意的是，随着预算的减少，更多比例的LLM生成数据变得更可取。

更新时间: 2024-10-09 05:15:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.06550v1

Multi-Task Program Error Repair and Explanatory Diagnosis

Program errors can occur in any type of programming, and can manifest in a variety of ways, such as unexpected output, crashes, or performance issues. And program error diagnosis can often be too abstract or technical for developers to understand, especially for beginners. The goal of this paper is to present a novel machine-learning approach for Multi-task Program Error Repair and Explanatory Diagnosis (mPRED). A pre-trained language model is used to encode the source code, and a downstream model is specifically designed to identify and repair errors. Programs and test cases will be augmented and optimized from several perspectives. Additionally, our approach incorporates a "chain of thoughts" method, which enables the models to produce intermediate reasoning explanations before providing the final correction. To aid in visualizing and analyzing the program structure, we use a graph neural network for program structure visualization. Overall, our approach offers a promising approach for repairing program errors across different programming languages and providing helpful explanations to programmers.

Updated: 2024-10-09 05:09:24

标题: 多任务程序错误修复和解释性诊断

摘要: 程序错误可能发生在任何类型的编程中，并且可能以各种方式表现出来，例如意外输出、崩溃或性能问题。程序错误诊断往往对开发人员来说过于抽象或技术性强，尤其对初学者来说。本文的目标是提出一种新颖的机器学习方法，用于多任务程序错误修复和解释性诊断（mPRED）。使用预训练的语言模型对源代码进行编码，并专门设计一个下游模型来识别和修复错误。程序和测试用例将从多个角度进行增强和优化。此外，我们的方法采用了一种“思维链”方法，使模型能够在提供最终修正之前产生中间推理解释。为了帮助可视化和分析程序结构，我们使用图神经网络进行程序结构可视化。总的来说，我们的方法为跨不同编程语言修复程序错误并为程序员提供有用的解释提供了一个有前途的方法。

更新时间: 2024-10-09 05:09:24

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2410.07271v1

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

Graph Anomaly Detection (GAD) is crucial for identifying abnormal entities within networks, garnering significant attention across various fields. Traditional unsupervised methods, which decode encoded latent representations of unlabeled data with a reconstruction focus, often fail to capture critical discriminative content, leading to suboptimal anomaly detection. To address these challenges, we present a Diffusion-based Graph Anomaly Detector (DiffGAD). At the heart of DiffGAD is a novel latent space learning paradigm, meticulously designed to enhance its proficiency by guiding it with discriminative content. This innovative approach leverages diffusion sampling to infuse the latent space with discriminative content and introduces a content-preservation mechanism that retains valuable information across different scales, significantly improving its adeptness at identifying anomalies with limited time and space complexity. Our comprehensive evaluation of DiffGAD, conducted on six real-world and large-scale datasets with various metrics, demonstrated its exceptional performance.

Updated: 2024-10-09 05:02:56

标题: DiffGAD：基于扩散的无监督图异常检测器

摘要: 图形异常检测（GAD）对于识别网络中的异常实体至关重要，在各个领域引起了重要关注。传统的无监督方法，通过解码未标记数据的编码潜在表示并关注重建，通常无法捕捉关键的区分内容，导致异常检测效果不佳。为了解决这些挑战，我们提出了一种基于扩散的图形异常检测器（DiffGAD）。DiffGAD的核心是一种新颖的潜在空间学习范式，精心设计以通过指导其具有区分内容来增强其效率。这种创新方法利用扩散采样将潜在空间注入区分内容，并引入一个保留有价值信息跨不同尺度的内容保留机制，显著提高了其在有限时间和空间复杂度下识别异常的能力。我们在六个真实世界和大规模数据集上进行了DiffGAD的全面评估，使用各种指标展示了其出色的性能。

更新时间: 2024-10-09 05:02:56

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2410.06549v1

FreqMark: Frequency-Based Watermark for Sentence-Level Detection of LLM-Generated Text

The increasing use of Large Language Models (LLMs) for generating highly coherent and contextually relevant text introduces new risks, including misuse for unethical purposes such as disinformation or academic dishonesty. To address these challenges, we propose FreqMark, a novel watermarking technique that embeds detectable frequency-based watermarks in LLM-generated text during the token sampling process. The method leverages periodic signals to guide token selection, creating a watermark that can be detected with Short-Time Fourier Transform (STFT) analysis. This approach enables accurate identification of LLM-generated content, even in mixed-text scenarios with both human-authored and LLM-generated segments. Our experiments demonstrate the robustness and precision of FreqMark, showing strong detection capabilities against various attack scenarios such as paraphrasing and token substitution. Results show that FreqMark achieves an AUC improvement of up to 0.98, significantly outperforming existing detection methods.

Updated: 2024-10-09 05:01:48

标题: FreqMark: 基于频率的水印技术，用于检测由LLM生成的文本中的句子级别。

摘要: 随着大型语言模型（LLMs）在生成高度连贯和上下文相关文本方面的增加使用，引入了新的风险，包括被用于不道德目的，如虚假信息或学术不诚实。为了解决这些挑战，我们提出了一种新颖的水印技术FreqMark，该技术在LLM生成文本的标记抽样过程中嵌入可检测的基于频率的水印。该方法利用周期信号来引导标记选择，创建出一种可以通过短时傅里叶变换（STFT）分析检测的水印。这种方法使得即使在包含人类撰写和LLM生成部分的混合文本情况下，也能准确识别LLM生成的内容。我们的实验展示了FreqMark的鲁棒性和精度，显示出对各种攻击场景的强大检测能力，如改写和标记替换。结果显示，FreqMark实现了高达0.98的AUC改进，明显优于现有的检测方法。

更新时间: 2024-10-09 05:01:48

领域: cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.10876v1

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

Updated: 2024-10-09 05:00:16

标题: MeshAnything：由自回归Transformer生成的艺术家创作的网格

摘要: 最近，通过重建和生成创建的3D资产已经达到了手工制作资产的质量水平，突显了它们取代的潜力。然而，这种潜力在很大程度上尚未实现，因为这些资产总是需要转换为网格以用于3D行业应用，而当前网格提取方法产生的网格明显不如人类艺术家创建的网格（AMs），即由人类艺术家创建的网格。具体来说，当前的网格提取方法依赖于密集面并忽略几何特征，导致低效率、复杂的后处理和较低的表现质量。为了解决这些问题，我们引入了MeshAnything，这是一个将网格提取视为生成问题的模型，可以生成与指定形状对齐的AMs。通过将任何3D表示中的3D资产转换为AMs，MeshAnything可以与各种3D资产生产方法集成，从而增强它们在3D行业中的应用。MeshAnything的架构包括一个VQ-VAE和一个形状条件的解码器-仅变压器。我们首先使用VQ-VAE学习网格词汇，然后在这个词汇上训练形状条件的解码器-仅变压器进行形状条件的自回归网格生成。我们广泛的实验证明，我们的方法生成的AMs面数少了数百倍，显著提高了存储、渲染和模拟效率，同时实现了与以前方法可比的精度。

更新时间: 2024-10-09 05:00:16

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2406.10163v2

Peer-to-Peer Energy Trading of Solar and Energy Storage: A Networked Multiagent Reinforcement Learning Approach

Utilizing distributed renewable and energy storage resources in local distribution networks via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero-marginal costs of renewables present challenges in determining fair market prices. To address these issues, we propose multi-agent reinforcement learning (MARL) frameworks to help automate consumers' bidding and management of their solar PV and energy storage resources, under a specific P2P clearing mechanism that utilizes the so-called supply-demand ratio. In addition, we show how the MARL frameworks can integrate physical network constraints to realize voltage control, hence ensuring physical feasibility of the P2P energy trading and paving way for real-world implementations.

Updated: 2024-10-09 04:57:47

标题: 太阳能和储能的点对点能源交易：基于网络化多智能体强化学习的方法

摘要: 利用分布式可再生能源和能量存储资源在本地配电网络中通过点对点（P2P）能源交易长期被吹捧为改善能源系统的韧性和可持续性的解决方案。然而，消费者和自给自足者（拥有能源发电资源的人）缺乏参与重复P2P交易的专业知识，可再生能源的零边际成本也给确定公平市场价格带来挑战。为了解决这些问题，我们提出了多智能体强化学习（MARL）框架，帮助自动化消费者对他们的太阳能光伏和能量存储资源的投标和管理，在一个利用所谓的供需比率的特定P2P结算机制下。此外，我们展示了MARL框架如何整合物理网络约束以实现电压控制，从而确保P2P能源交易的物理可行性，并为实际应用铺平道路。

更新时间: 2024-10-09 04:57:47

领域: eess.SY,cs.LG,cs.MA,cs.SY

下载: http://arxiv.org/abs/2401.13947v3

Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models

In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians. The DEQH model inherently captures the self-consistency nature of Hamiltonian, a critical aspect often overlooked by traditional machine learning approaches for Hamiltonian prediction. By employing DEQ within our model architecture, we circumvent the need for DFT calculations during the training phase to introduce the Hamiltonian's self-consistency, thus addressing computational bottlenecks associated with large or complex systems. We propose a versatile framework that combines DEQ with off-the-shelf machine learning models for predicting Hamiltonians. When benchmarked on the MD17 and QH9 datasets, DEQHNet, an instantiation of the DEQH framework, has demonstrated a significant improvement in prediction accuracy. Beyond a predictor, the DEQH model is a Hamiltonian solver, in the sense that it uses the fixed-point solving capability of the deep equilibrium model to iteratively solve for the Hamiltonian. Ablation studies of DEQHNet further elucidate the network's effectiveness, offering insights into the potential of DEQ-integrated networks for Hamiltonian learning. We open source our implementation at https://github.com/Zun-Wang/DEQHNet.

Updated: 2024-10-09 04:51:36

标题: 通过深度平衡模型将自洽性融入密度泛函理论哈密顿预测中

摘要: 在这项研究中，我们介绍了一种统一的神经网络架构，Deep Equilibrium Density Functional Theory Hamiltonian（DEQH）模型，该模型整合了Deep Equilibrium Models（DEQs），用于预测密度泛函理论（DFT）哈密顿量。DEQH模型固有地捕捉了哈密顿量的自洽性质，这是传统机器学习方法通常忽视的关键方面。通过在我们的模型架构中使用DEQ，我们在训练阶段避免了需要进行DFT计算以引入哈密顿量的自洽性，从而解决了与大型或复杂系统相关的计算瓶颈。我们提出了一个多功能框架，将DEQ与现成的机器学习模型结合起来预测哈密顿量。在MD17和QH9数据集上进行基准测试时，DEQHNet作为DEQH框架的一个实例，已经展示了预测准确性的显著提高。除了作为一个预测器，DEQH模型还是一个哈密顿量求解器，它利用深度平衡模型的固定点求解能力迭代地求解哈密顿量。DEQHNet的消融研究进一步阐明了网络的有效性，为DEQ集成网络在哈密顿学习中的潜力提供了见解。我们在https://github.com/Zun-Wang/DEQHNet上开源我们的实现。

更新时间: 2024-10-09 04:51:36

领域: cs.LG

下载: http://arxiv.org/abs/2406.03794v2

A Data-to-Product Multimodal Conceptual Framework to Achieve Automated Software Evolution for Context-rich Intelligent Applications

While AI is extensively transforming Software Engineering (SE) fields, SE is still in need of a framework to overall consider all phases to facilitate Automated Software Evolution (ASEv), particularly for intelligent applications that are context-rich, instead of conquering each division independently. Its complexity comes from the intricacy of the intelligent applications, the heterogeneity of the data sources, and the constant changes in the context. This study proposes a conceptual framework for achieving automated software evolution, emphasizing the importance of multimodality learning. A Selective Sequential Scope Model (3S) model is developed based on the conceptual framework, and it can be used to categorize existing and future research when it covers different SE phases and multimodal learning tasks. This research is a preliminary step toward the blueprint of a higher-level ASEv. The proposed conceptual framework can act as a practical guideline for practitioners to prepare themselves for diving into this area. Although the study is about intelligent applications, the framework and analysis methods may be adapted for other types of software as AI brings more intelligence into their life cycles.

Updated: 2024-10-09 04:49:27

标题: 一个数据到产品的多模态概念框架，实现面向上下文丰富智能应用的自动化软件演化

摘要: 尽管人工智能广泛应用于软件工程领域，但软件工程仍需要一个框架来全面考虑所有阶段，以促进自动化软件演进（ASEv），特别是针对上下文丰富的智能应用，而不是单独征服每个部门。其复杂性源于智能应用的复杂性，数据来源的异质性以及上下文的不断变化。本研究提出了一个概念框架，用于实现自动化软件演进，强调多模态学习的重要性。基于这一概念框架，开发了一种选择性顺序范围模型（3S）模型，可用于对现有和未来研究进行分类，当涵盖不同的软件工程阶段和多模态学习任务时。这项研究是通往更高级别ASEv蓝图的初步步骤。所提出的概念框架可以作为实践者准备深入探讨这一领域的实用指南。尽管本研究是关于智能应用，但随着人工智能为各种软件的生命周期带来更多智能，这一框架和分析方法可以适用于其他类型的软件。

更新时间: 2024-10-09 04:49:27

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2404.04821v5

Signal Watermark on Large Language Models

As Large Language Models (LLMs) become increasingly sophisticated, they raise significant security concerns, including the creation of fake news and academic misuse. Most detectors for identifying model-generated text are limited by their reliance on variance in perplexity and burstiness, and they require substantial computational resources. In this paper, we proposed a watermarking method embedding a specific watermark into the text during its generation by LLMs, based on a pre-defined signal pattern. This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text. We utilize LLMs and Fast Fourier Transform (FFT) for token probability computation and detection of the signal watermark. The unique application of signal processing principles within the realm of text generation by LLMs allows for subtle yet effective embedding of watermarks, which do not compromise the quality or coherence of the generated text. Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy, even with variations in temperature settings during text generation. In the experiment of distinguishing between human-written and watermarked text, our method achieved an AUROC score of 0.97, significantly outperforming existing methods like GPTZero, which scored 0.64. The watermark's resilience to various attacking scenarios further confirms its robustness, addressing significant challenges in model-generated text authentication.

Updated: 2024-10-09 04:49:03

标题: 大型语言模型上的信号水印

摘要: 随着大型语言模型（LLMs）变得越来越复杂，它们引发了重大的安全担忧，包括制造虚假新闻和学术滥用。大多数用于识别模型生成文本的检测器受制于对困惑度和突发性的依赖，并且需要大量的计算资源。在本文中，我们提出了一种水印嵌入方法，在LLMs生成文本时将特定水印嵌入到文本中，基于预定义的信号模式。这种技术不仅确保水印对人类是不可见的，而且还保持了模型生成文本的质量和语法完整性。我们利用LLMs和快速傅里叶变换（FFT）进行令牌概率计算和信号水印的检测。在LLMs生成文本的过程中，将信号处理原则应用到文本生成领域中，实现了水印的微妙而有效的嵌入，不会损害生成文本的质量或连贯性。我们的方法已通过多个LLMs进行了经验验证，始终保持高的检测准确率，即使在文本生成过程中温度设置有所变化。在区分人为编写和带水印文本的实验中，我们的方法实现了0.97的AUROC分数，明显优于得分为0.64的现有方法如GPTZero。水印对各种攻击场景的抵抗能力进一步证实了其稳健性，解决了模型生成文本认证中的重大挑战。

更新时间: 2024-10-09 04:49:03

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2410.06545v1

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.

Updated: 2024-10-09 04:47:22

标题: 可扩展的大规模数据的动态分区和层次合并的共聚类

摘要: 联合聚类同时对行和列进行分组，揭示更精细的群组。然而，现有的联合聚类方法存在可扩展性差的问题，无法处理大规模数据。本文提出了一种新颖且可扩展的联合聚类方法，旨在揭示高维大规模数据集中的复杂模式。具体来说，我们首先提出了一种大矩阵分区算法，将大矩阵分割成较小的子矩阵，实现并行联合聚类。该方法采用概率模型优化子矩阵的配置，平衡计算效率和分析深度。此外，我们提出了一种层次聚类合并算法，有效识别并合并这些子矩阵中的联合聚类，增强了过程的鲁棒性和可靠性。广泛的评估验证了我们方法的有效性和效率。实验结果表明，计算时间显著减少，对于密集矩阵减少约83％，对于稀疏矩阵最多减少30％。

更新时间: 2024-10-09 04:47:22

领域: cs.DC,cs.LG,H.2.8

下载: http://arxiv.org/abs/2410.18113v1

Toward a Better Understanding of Fourier Neural Operators from a Spectral Perspective

In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness. However, FNO is observed to be ineffective with large Fourier kernels that parameterize more frequencies. Current solutions rely on setting small kernels, restricting FNO's ability to capture complex PDE data in real-world applications. This paper offers empirical insights into FNO's difficulty with large kernels through spectral analysis: FNO exhibits a unique Fourier parameterization bias, excelling at learning dominant frequencies in target data while struggling with non-dominant frequencies. To mitigate such a bias, we propose SpecB-FNO to enhance the capture of non-dominant frequencies by adopting additional residual modules to learn from the previous ones' prediction residuals iteratively. By effectively utilizing large Fourier kernels, SpecB-FNO achieves better prediction accuracy on diverse PDE applications, with an average improvement of 50%.

Updated: 2024-10-09 04:43:57

标题: 朝向从谱角度更好地理解傅立叶神经操作符

摘要: 在解决偏微分方程（PDEs）时，傅立叶神经操作符（FNOs）表现出显著的有效性。然而，观察到FNO在使用参数化更多频率的大型傅立叶核时效果不佳。目前的解决方案依赖于设置小型核，限制了FNO在真实世界应用中捕获复杂PDE数据的能力。本文通过谱分析提供了关于FNO在大型核上困难的实证见解：FNO表现出独特的傅立叶参数化偏差，在学习目标数据中的主频率方面表现出色，但在非主导频率方面表现出困难。为了缓解这种偏差，我们提出了SpecB-FNO，通过采用额外的残差模块迭代地学习前一个模块的预测残差，以增强对非主导频率的捕获能力。通过有效利用大型傅立叶核，SpecB-FNO在各种PDE应用中实现了更好的预测准确性，平均改进了50%。

更新时间: 2024-10-09 04:43:57

领域: cs.LG

下载: http://arxiv.org/abs/2404.07200v2

Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncertainty model metrics on diverse and potentially conflicting NLG metrics. To address this issue, we introduce a comprehensive UE-TS benchmark incorporating 31 NLG metrics across four dimensions. The benchmark evaluates the uncertainty estimation capabilities of two large language models and one pre-trained language model on three datasets, with human-annotation analysis incorporated where applicable. We also assess the performance of 14 common uncertainty estimation methods within this benchmark. Our findings emphasize the importance of considering multiple uncorrelated NLG metrics and diverse uncertainty estimation methods to ensure reliable and efficient evaluation of UE-TS techniques. Our code and data are available https://github.com/he159ok/Benchmark-of-Uncertainty-Estimation-Methods-in-Text-Summarization.

Updated: 2024-10-09 04:40:24

标题: 我们可以信任文本摘要中不确定性估计方法的性能评估吗？

摘要: 文本摘要是一项关键的自然语言生成（NLG）任务，在各个领域中至关重要。然而，在风险关键应用程序中，尤其是涉及人在决策中的应用程序中，不准确摘要的高成本引发了对文本摘要中不确定性估计（UE-TS）评估方法可靠性的担忧。这种担忧源于不确定性模型指标对多样化和潜在冲突的NLG指标的依赖。为解决这一问题，我们引入了一个包含31个NLG指标跨四个维度的综合UE-TS基准。该基准评估了两个大型语言模型和一个预训练语言模型在三个数据集上的不确定性估计能力，并在适用的情况下结合了人工注释分析。我们还评估了该基准中14种常见的不确定性估计方法的性能。我们的研究结果强调了考虑多个不相关的NLG指标和多样化的不确定性估计方法的重要性，以确保UE-TS技术的可靠和有效评估。我们的代码和数据可在https://github.com/he159ok/Benchmark-of-Uncertainty-Estimation-Methods-in-Text-Summarization中找到。

更新时间: 2024-10-09 04:40:24

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2406.17274v2

Outlier Detection with Cluster Catch Digraphs

This paper introduces a novel family of outlier detection algorithms based on Cluster Catch Digraphs (CCDs), specifically tailored to address the challenges of high dimensionality and varying cluster shapes, which deteriorate the performance of most traditional outlier detection methods. We propose the Uniformity-Based CCD with Mutual Catch Graph (U-MCCD), the Uniformity- and Neighbor-Based CCD with Mutual Catch Graph (UN-MCCD), and their shape-adaptive variants (SU-MCCD and SUN-MCCD), which are designed to detect outliers in data sets with arbitrary cluster shapes and high dimensions. We present the advantages and shortcomings of these algorithms and provide the motivation or need to define each particular algorithm. Through comprehensive Monte Carlo simulations, we assess their performance and demonstrate the robustness and effectiveness of our algorithms across various settings and contamination levels. We also illustrate the use of our algorithms on various real-life data sets. The U-MCCD algorithm efficiently identifies outliers while maintaining high true negative rates, and the SU-MCCD algorithm shows substantial improvement in handling non-uniform clusters. Additionally, the UN-MCCD and SUN-MCCD algorithms address the limitations of existing methods in high-dimensional spaces by utilizing Nearest Neighbor Distances (NND) for clustering and outlier detection. Our results indicate that these novel algorithms offer substantial advancements in the accuracy and adaptability of outlier detection, providing a valuable tool for various real-world applications. Keyword: Outlier detection, Graph-based clustering, Cluster catch digraphs, $k$-nearest-neighborhood, Mutual catch graphs, Nearest neighbor distance.

Updated: 2024-10-09 04:38:36

标题: 异常值检测与聚类捕捉有向图

摘要: 本文介绍了一种基于Cluster Catch Digraphs（CCDs）的新型离群值检测算法家族，专门针对高维度和不同的簇形状等挑战，这些挑战影响了大多数传统离群值检测方法的性能。我们提出了基于Uniformity的CCD与Mutual Catch Graph（U-MCCD）、基于Uniformity和Neighbor的CCD与Mutual Catch Graph（UN-MCCD）以及它们的形状自适应变体（SU-MCCD和SUN-MCCD），旨在检测具有任意簇形状和高维度的数据集中的离群值。我们介绍了这些算法的优点和缺点，并提供了定义每个特定算法的动机或必要性。通过全面的蒙特卡罗模拟，我们评估了它们的性能，并展示了我们的算法在各种设置和污染水平下的稳健性和有效性。我们还展示了我们的算法在各种真实数据集上的应用。U-MCCD算法有效地识别离群值，同时保持高真负率，而SU-MCCD算法在处理非均匀簇方面显示出显著改进。此外，UN-MCCD和SUN-MCCD算法通过利用最近邻距离（NND）进行聚类和离群值检测，解决了现有方法在高维空间中的局限性。我们的结果表明，这些新颖算法在离群值检测的准确性和适应性方面取得了实质性进展，为各种实际应用提供了有价值的工具。关键词：离群值检测、基于图的聚类、簇捕获有向图、k最近邻居、相互捕获图、最近邻距离。

更新时间: 2024-10-09 04:38:36

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2409.11596v2

Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection

Deepfakes pose a critical threat to biometric authentication systems by generating highly realistic synthetic media. Existing multimodal deepfake detectors often struggle to adapt to diverse data and rely on simple fusion methods. To address these challenges, we propose Gumbel-Rao Monte Carlo Bi-modal Neural Architecture Search (GRMC-BMNAS), a novel architecture search framework that employs Gumbel-Rao Monte Carlo sampling to optimize multimodal fusion. It refines the Straight through Gumbel Softmax (STGS) method by reducing variance with Rao-Blackwellization, stabilizing network training. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Crucial features are efficiently identified from backbone networks, while within the cell structure, a weighted fusion operation integrates information from various sources. By varying parameters such as temperature and number of Monte carlo samples yields an architecture that maximizes classification performance and better generalisation capability. Experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrate an impressive AUC percentage of 95.4\%, achieved with minimal model parameters.

Updated: 2024-10-09 04:37:35

标题: Gumbel Rao 蒙特卡洛基础的双模态神经架构搜索用于音视频深度伪造检测

摘要: Deepfakes对生物特征认证系统构成了严重威胁，因为它们可以生成高度逼真的合成媒体。现有的多模态deepfake检测器通常难以适应各种数据，并依赖简单的融合方法。为了解决这些挑战，我们提出了Gumbel-Rao Monte Carlo双模态神经架构搜索（GRMC-BMNAS），这是一个新颖的架构搜索框架，采用Gumbel-Rao Monte Carlo采样来优化多模态融合。它通过Rao-Blackwellization减少方差，稳定网络训练，改进了Straight through Gumbel Softmax（STGS）方法。使用两级搜索方法，该框架优化网络架构、参数和性能。关键特征从骨干网络中高效识别，而在细胞结构内，加权融合操作整合来自各种来源的信息。通过调整参数如温度和Monte Carlo样本数量，得到一个最大化分类性能和更好泛化能力的架构。在FakeAVCeleb和SWAN-DF数据集上的实验结果展示了令人印象深刻的AUC百分比为95.4\%，并且只需最少的模型参数。

更新时间: 2024-10-09 04:37:35

领域: cs.CR,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.06543v1

Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory

Mental health issues, particularly depressive disorders, present significant challenges in contemporary society, necessitating the development of effective automated diagnostic methods. This paper introduces the Agent Mental Clinic (AMC), a self-improving conversational agent system designed to enhance depression diagnosis through simulated dialogues between patient and psychiatrist agents. To enhance the dialogue quality and diagnosis accuracy, we design a psychiatrist agent consisting of a tertiary memory structure, a dialogue control and reflect plugin that acts as ``supervisor'' and a memory sampling module, fully leveraging the skills reflected by the psychiatrist agent, achieving great accuracy on depression risk and suicide risk diagnosis via conversation. Experiment results on datasets collected in real-life scenarios demonstrate that the system, simulating the procedure of training psychiatrists, can be a promising optimization method for aligning LLMs with real-life distribution in specific domains without modifying the weights of LLMs, even when only a few representative labeled cases are available.

Updated: 2024-10-09 04:37:29

标题: 抑郁症诊断对话模拟：具有三级记忆的自我改进型精神科医生

摘要: 精神健康问题，特别是抑郁障碍，在当代社会中带来了重大挑战，需要发展有效的自动诊断方法。本文介绍了Agent Mental Clinic（AMC），这是一个自我改进的对话代理系统，旨在通过患者和精神科医生代理之间的模拟对话来增强抑郁症诊断。为了提高对话质量和诊断准确性，我们设计了一个包含三元记忆结构、对话控制和反射插件（充当“监督员”）以及记忆抽样模块的精神科医生代理，充分利用了精神科医生代理反映的技能，通过对话实现对抑郁风险和自杀风险诊断的高准确性。在真实场景中收集的数据集上的实验结果表明，该系统模拟了对精神科医生进行培训的过程，可以作为一种有前途的优化方法，用于将LLMs与特定领域中的实际分布相一致，而无需修改LLMs的权重，即使只有少数代表性标记的案例可用。

更新时间: 2024-10-09 04:37:29

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2409.15084v2

Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training

Learning a generalist embodied agent capable of completing multiple tasks poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In contrast, a vast amount of human videos exist, capturing intricate tasks and interactions with the physical world. Promising prospects arise for utilizing actionless human videos for pre-training and transferring the knowledge to facilitate robot policy learning through limited robot demonstrations. However, it remains a challenge due to the domain gap between humans and robots. Moreover, it is difficult to extract useful information representing the dynamic world from human videos, because of its noisy and multimodal data structure. In this paper, we introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos. We start by compressing both human and robot videos into unified video tokens. In the pre-training stage, we employ a discrete diffusion model with a mask-and-replace diffusion strategy to predict future video tokens in the latent space. In the fine-tuning stage, we harness the imagined future videos to guide low-level action learning with a limited set of robot data. Experiments demonstrate that our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches with superior performance. Our project website is available at https://video-diff.github.io/.

Updated: 2024-10-09 04:25:34

标题: 通过大规模无动作视频预训练学习可操作的离散扩散策略

摘要: 学习一个通用的具有完成多项任务能力的实体代理人面临挑战，主要是因为动作标记的机器人数据集稀缺。相比之下，存在大量的人类视频，捕捉了复杂的任务和与物理世界的互动。利用无动作的人类视频进行预训练，并通过有限的机器人演示传递知识来促进机器人策略学习具有前景。然而，由于人类和机器人之间的领域差距，这仍然是一个挑战。此外，从人类视频中提取代表动态世界的有用信息是困难的，因为其数据结构具有嘈杂和多模态特性。在本文中，我们介绍了一个新颖的框架来应对这些挑战，该框架利用统一的离散扩散，结合对人类视频的生成预训练和对少量动作标记的机器人视频进行策略微调。我们首先将人类和机器人视频压缩为统一的视频标记。在预训练阶段，我们采用一个具有掩码替换扩散策略的离散扩散模型来在潜在空间中预测未来视频标记。在微调阶段，我们利用想象的未来视频来指导低级动作学习，使用有限的机器人数据。实验证明，与先前的最先进方法相比，我们的方法生成高保真度的未来视频用于规划，并增强了微调策略的性能。我们的项目网站可在https://video-diff.github.io/ 上找到。

更新时间: 2024-10-09 04:25:34

领域: cs.LG,cs.CV,cs.RO

下载: http://arxiv.org/abs/2402.14407v4

Graph Propagation Transformer for Graph Representation Learning

This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models with better performance. The code will be released at https://github.com/czczup/GPTrans.

Updated: 2024-10-09 04:25:18

标题: 图传播变换器用于图表示学习

摘要: 这篇论文提出了一种用于图表示学习的新型变压器架构。我们方法的核心见解是在构建变压器块中的注意力模块时，充分考虑图中节点和边之间的信息传播。具体而言，我们提出了一种称为图传播注意力（GPA）的新的注意力机制。它明确地以三种方式在节点和边之间传递信息，即节点到节点、节点到边和边到节点，这对学习图结构化数据至关重要。在此基础上，我们设计了一种名为图传播变压器（GPTrans）的有效变压器架构，进一步帮助学习图数据。我们在几个基准数据集上进行了广泛的图学习实验，验证了GPTrans的性能。这些结果表明，我们的方法在性能上优于许多最先进的基于变压器的图模型。代码将在https://github.com/czczup/GPTrans发布。

更新时间: 2024-10-09 04:25:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2305.11424v3

EEG-estimated functional connectivity, and not behavior, differentiates Parkinson's patients from health controls during the Simon conflict task

Neural biomarkers that can classify or predict disease are of broad interest to the neurological and psychiatric communities. Such biomarkers can be informative of disease state or treatment efficacy, even before there are changes in symptoms and/or behavior. This work investigates EEG-estimated functional connectivity (FC) as a Parkinson's Disease (PD) biomarker. Specifically, we investigate FC mediated via neural oscillations and consider such activity during the Simons conflict task. This task yields sensory-motor conflict, and one might expect differences in behavior between PD patients and healthy controls (HCs). In addition to considering spatially focused approaches, such as FC, as a biomarker, we also consider temporal biomarkers, which are more sensitive to ongoing changes in neural activity. We find that FC, estimated from delta (1-4Hz) and theta (4-7Hz) oscillations, yields spatial FC patterns significantly better at distinguishing PD from HC than temporal features or behavior. This study reinforces that FC in spectral bands is informative of differences in brain-wide processes and can serve as a biomarker distinguishing normal brain function from that seen in disease.

Updated: 2024-10-09 04:18:32

标题: 脑电图估计的功能连接，而不是行为，在Simon冲突任务中区分帕金森病患者和健康对照组

摘要: 可以的，翻译如下：神经生物标志物能够对疾病进行分类或预测，对神经科学和精神科学领域具有广泛的兴趣。这样的生物标志物可以提供疾病状态或治疗效果的信息，甚至在症状和/或行为发生变化之前。本研究调查了脑电图估计的功能连接（FC）作为帕金森病（PD）的生物标志物。具体来说，我们研究了通过神经振荡介导的FC，并考虑了在Simons冲突任务期间的这种活动。这项任务产生感觉-运动冲突，人们可能期望在PD患者和健康对照组（HCs）之间有行为上的差异。除了考虑空间集中的方法，如FC，作为生物标志物，我们还考虑了更加敏感于神经活动持续变化的时间性生物标志物。我们发现，从δ（1-4Hz）和θ（4-7Hz）振荡估计的FC，比时间特征或行为更能够显著区分PD和HC的空间FC模式。这项研究强调了频谱波段中的FC对大脑整体过程的差异具有信息价值，并可以作为一个生物标志物，区分正常脑功能和疾病中所见的脑功能。

更新时间: 2024-10-09 04:18:32

领域: q-bio.NC,cs.LG

下载: http://arxiv.org/abs/2410.06534v1

UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive needs on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processingin-memory (PIM) hardware, UPMEM DPU, to boost the memory bandwidth and reduce recommendation latency. The parallel nature of the DPU memory can provide high aggregated bandwidth for the large number of irregular memory accesses in embedding lookups, thus offering great potential to reduce the inference latency. To fully utilize the DPU memory bandwidth, we further studied the embedding table partitioning problem to achieve good workload-balance and efficient data caching. Evaluations using real-world datasets show that, UpDLRM achieves much lower inference time for DLRM compared to both CPU-only and CPU-GPU hybrid counterparts.

Updated: 2024-10-09 04:11:28

标题: UpDLRM：使用真实世界的PIM架构加速个性化推荐

摘要: 深度学习推荐模型（DLRM）由于在处理大规模推荐任务时的有效性而在推荐系统中变得流行。DLRM的嵌入层由于对内存容量和内存带宽的密集需求而成为性能瓶颈。本文提出了UpDLRM，利用实际世界的处理内存中（PIM）硬件，UPMEM DPU，来提升内存带宽并减少推荐延迟。DPU内存的并行性可以为嵌入查找中大量不规则内存访问提供高聚合带宽，从而具有降低推理延迟的巨大潜力。为了充分利用DPU内存带宽，我们进一步研究了嵌入表分区问题，以实现良好的工作负载平衡和高效的数据缓存。使用实际数据集进行评估表明，与仅CPU和CPU-GPU混合对照相比，UpDLRM对DLRM的推理时间大大降低。

更新时间: 2024-10-09 04:11:28

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2406.13941v2

Deep Learning for Surgical Instrument Recognition and Segmentation in Robotic-Assisted Surgeries: A Systematic Review

Applying deep learning (DL) for annotating surgical instruments in robot-assisted minimally invasive surgeries (MIS) represents a significant advancement in surgical technology. This systematic review examines 48 studies that and advanced DL methods and architectures. These sophisticated DL models have shown notable improvements in the precision and efficiency of detecting and segmenting surgical tools. The enhanced capabilities of these models support various clinical applications, including real-time intraoperative guidance, comprehensive postoperative evaluations, and objective assessments of surgical skills. By accurately identifying and segmenting surgical instruments in video data, DL models provide detailed feedback to surgeons, thereby improving surgical outcomes and reducing complication risks. Furthermore, the application of DL in surgical education is transformative. The review underscores the significant impact of DL on improving the accuracy of skill assessments and the overall quality of surgical training programs. However, implementing DL in surgical tool detection and segmentation faces challenges, such as the need for large, accurately annotated datasets to train these models effectively. The manual annotation process is labor-intensive and time-consuming, posing a significant bottleneck. Future research should focus on automating the detection and segmentation process and enhancing the robustness of DL models against environmental variations. Expanding the application of DL models across various surgical specialties will be essential to fully realize this technology's potential. Integrating DL with other emerging technologies, such as augmented reality (AR), also offers promising opportunities to further enhance the precision and efficacy of surgical procedures.

Updated: 2024-10-09 04:07:38

标题: 深度学习在机器辅助手术中对手术器械进行识别和分割的系统综述

摘要: 将深度学习（DL）应用于机器人辅助微创手术（MIS）中注释手术器械代表了外科技术的重大进步。这项系统性回顾研究了48项研究和先进的DL方法和架构。这些复杂的DL模型显示了在检测和分割手术工具的精度和效率方面的显著改进。这些模型的增强功能支持各种临床应用，包括实时术中指导、全面术后评估以及手术技能的客观评估。通过在视频数据中准确识别和分割手术器械，DL模型为外科医生提供详细反馈，从而改善手术结果并降低并发症风险。此外，DL在外科教育中的应用是变革性的。该回顾强调了DL对提高技能评估准确性和外科培训计划整体质量的重大影响。然而，将DL应用于手术工具检测和分割面临挑战，如需要大量准确标注的数据集来有效训练这些模型。手动注释过程繁重耗时，构成重大瓶颈。未来的研究应着重于自动化检测和分割过程，并增强DL模型对环境变化的稳健性。在各种外科专业中扩展DL模型的应用将是充分实现该技术潜力所必需的。将DL与其他新兴技术，如增强现实（AR）相结合，还提供了进一步增强手术程序精度和效力的有希望的机会。

更新时间: 2024-10-09 04:07:38

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.07269v1

The Sampling-Gaussian for stereo matching

The soft-argmax operation is widely adopted in neural network-based stereo matching methods to enable differentiable regression of disparity. However, network trained with soft-argmax is prone to being multimodal due to absence of explicit constraint to the shape of the probability distribution. Previous methods leverages Laplacian distribution and cross-entropy for training but failed to effectively improve the accuracy and even compromises the efficiency of the network. In this paper, we conduct a detailed analysis of the previous distribution-based methods and propose a novel supervision method for stereo matching, Sampling-Gaussian. We sample from the Gaussian distribution for supervision. Moreover, we interpret the training as minimizing the distance in vector space and propose a combined loss of L1 loss and cosine similarity loss. Additionally, we leveraged bilinear interpolation to upsample the cost volume. Our method can be directly applied to any soft-argmax-based stereo matching method without a reduction in efficiency. We have conducted comprehensive experiments to demonstrate the superior performance of our Sampling-Gaussian. The experimental results prove that we have achieved better accuracy on five baseline methods and two datasets. Our method is easy to implement, and the code is available online.

Updated: 2024-10-09 03:57:13

标题: 立体匹配中的采样高斯

摘要: 软argmax操作广泛应用于基于神经网络的立体匹配方法中，以实现视差的可微回归。然而，使用软argmax训练的网络容易出现多峰问题，因为缺乏对概率分布形状的明确约束。先前的方法利用拉普拉斯分布和交叉熵进行训练，但未能有效提高准确性，甚至影响了网络的效率。在本文中，我们对先前基于分布的方法进行了详细分析，并提出了一种新的立体匹配监督方法，Sampling-Gaussian。我们从高斯分布中进行采样以进行监督。此外，我们将训练解释为在向量空间中最小化距离，并提出了L1损失和余弦相似性损失的组合损失。此外，我们利用双线性插值来上采样成本体积。我们的方法可以直接应用于任何基于软argmax的立体匹配方法，而不会降低效率。我们进行了全面的实验证明了我们的Sampling-Gaussian具有卓越的性能。实验结果证明，我们在五种基准方法和两个数据集上取得了更好的准确性。我们的方法易于实现，并且代码已在线上提供。

更新时间: 2024-10-09 03:57:13

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.06527v1

Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning. This work investigates these assertions by introducing CAIMIRA, a novel framework rooted in item response theory (IRT) that enables quantitative assessment and comparison of problem-solving abilities of question-answering (QA) agents: humans and AI systems. Through analysis of over 300,000 responses from ~70 AI systems and 155 humans across thousands of quiz questions, CAIMIRA uncovers distinct proficiency patterns in knowledge domains and reasoning skills. Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning, while state-of-the-art LLMs like GPT-4 and LLaMA show superior performance on targeted information retrieval and fact-based reasoning, particularly when information gaps are well-defined and addressable through pattern matching or data retrieval. These findings highlight the need for future QA tasks to focus on questions that challenge not only higher-order reasoning and scientific thinking, but also demand nuanced linguistic interpretation and cross-contextual knowledge application, helping advance AI developments that better emulate or complement human cognitive abilities in real-world problem-solving.

Updated: 2024-10-09 03:53:26

标题: 大脑思维相似吗？使用CAIMIRA调查人类与人工智能在问答中的互补性

摘要: 最近大型语言模型（LLMs）的进展导致了AI在自然语言处理（NLP）任务中超越人类的主张，如文字理解和推理。本文通过引入CAIMIRA，一个根植于项目反应理论（IRT）的新框架，对问题解答（QA）代理人（人类和AI系统）的问题解决能力进行定量评估和比较，对这些主张进行了调查。通过分析来自约70个AI系统和155个人类在数千个测验问题中的30万多个回答，CAIMIRA揭示了知识领域和推理技能中的不同熟练度模式。人类在基于知识的诱因性和概念推理方面胜过AI系统，而像GPT-4和LLaMA这样的最新LLMs在有针对性的信息检索和基于事实的推理上表现出更优异的表现，特别是在信息空白被明确定义且可以通过模式匹配或数据检索来解决时。这些发现强调了未来QA任务需要专注于不仅挑战高阶推理和科学思维的问题，还需要要求细致的语言解释和跨上下文知识应用，以帮助推进更好地模拟或补充人类认知能力在现实问题解决中的AI发展。

更新时间: 2024-10-09 03:53:26

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06524v1

Phase Diagram from Nonlinear Interaction between Superconducting Order and Density: Toward Data-Based Holographic Superconductor

We address an inverse problem in modeling holographic superconductors. We focus our research on the critical temperature behavior depicted by experiments. We use a physics-informed neural network method to find a mass function $M(F^2)$, which is necessary to understand phase transition behavior. This mass function describes a nonlinear interaction between superconducting order and charge carrier density. We introduce positional embedding layers to improve the learning process in our algorithm, and the Adam optimization is used to predict the critical temperature data via holographic calculation with appropriate accuracy. Consideration of the positional embedding layers is motivated by the transformer model of natural-language processing in the artificial intelligence (AI) field. We obtain holographic models that reproduce borderlines of the normal and superconducting phases provided by actual data. Our work is the first holographic attempt to match phase transition data quantitatively obtained from experiments. Also, the present work offers a new methodology for data-based holographic models.

Updated: 2024-10-09 03:52:18

标题: 超导序和密度之间的非线性相互作用相图：走向基于数据的全息超导体

摘要: 我们解决了建模全息超导体中的一个反问题。我们的研究重点是实验所描绘的临界温度行为。我们使用物理信息神经网络方法找到一个质量函数$M(F^2)$，这对理解相变行为是必要的。这个质量函数描述了超导序和载流子密度之间的非线性相互作用。我们引入位置嵌入层来改进算法中的学习过程，并使用Adam优化来通过全息计算准确预测临界温度数据。位置嵌入层的考虑受到了人工智能领域自然语言处理的变压器模型的启发。我们获得了全息模型，可以重新现实际数据提供的正常和超导相的边界。我们的工作是首次尝试定量匹配实验获得的相变数据的全息尝试。此外，本研究提供了一种基于数据的全息模型的新方法论。

更新时间: 2024-10-09 03:52:18

领域: hep-th,cond-mat.dis-nn,cond-mat.supr-con,cs.AI

下载: http://arxiv.org/abs/2410.06523v1

On the Security of Bitstream-level JPEG Encryption with Restart Markers

This paper aims to evaluate the security of a bitstream-level JPEG encryption method using restart (RST) markers, where encrypted image can keep the JPEG file format with the same file size as non-encrypted image. Data encrypted using this method can be decoded without altering header information by employing a standard JPEG decoder. Moreover, the use of RST markers enables the definition of extended blocks divided by the markers, so spatially partial encryption and block-permutation-based encryption can be carried out. However, the security of the method was evaluated only with respect to the key space analysis for brute-force attacks and other limited attacks. Accordingly, in this paper, we evaluated the security of the method with respect to robustness against ciphertext-only attacks including state-of-the-art attacks. In experiments, the method is compared with conventional encryption methods, and it is confirmed to be robust against ciphertext-only attacks if parameters used for image encryption are carefully chosen.

Updated: 2024-10-09 03:50:31

标题: 关于使用重启标记进行比特流级别JPEG加密的安全性

摘要: 本文旨在评估使用重启（RST）标记的位流级JPEG加密方法的安全性，其中加密图像可以保持与非加密图像相同的JPEG文件格式和文件大小。使用该方法加密的数据可以通过使用标准JPEG解码器解码，而无需更改标头信息。此外，使用RST标记可以定义由标记分割的扩展块，从而可以进行空间部分加密和基于块置换的加密。然而，该方法的安全性仅针对蛮力攻击和其他有限攻击的密钥空间分析进行了评估。因此，在本文中，我们评估了该方法对仅密文攻击（包括最先进的攻击）的抗性。在实验中，将该方法与传统加密方法进行比较，并确认如果谨慎选择用于图像加密的参数，则该方法对仅密文攻击具有稳健性。

更新时间: 2024-10-09 03:50:31

领域: cs.CR

下载: http://arxiv.org/abs/2410.06522v1

Limits of Transformer Language Models on Learning to Compose Algorithms

We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks demanding to learn a composition of several discrete sub-tasks. On both training LLaMA models from scratch and prompting on GPT-4 and Gemini, we measure how well these models can reuse primitives observable in the sub-tasks to learn the composition task. Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient: LLaMA requires more data samples than relearning all sub-tasks from scratch to learn the compositional task; in-context prompting with few samples is unreliable and fails at executing the sub-tasks or correcting the errors in multi-round code generation. Further, by leveraging complexity theory, we support these findings with a theoretical analysis focused on the sample inefficiency of gradient descent in memorizing feedforward models.

Updated: 2024-10-09 03:43:34

标题: Transformer语言模型在学习组合算法方面的局限性

摘要: 我们分析了Transformer语言模型在学习组合离散任务方面的能力。为此，我们评估了训练LLaMA模型以及在四个需要学习几个离散子任务组合的任务上对GPT-4和Gemini进行提示的能力。在从头开始训练LLaMA模型和在GPT-4和Gemini上提示时，我们测量了这些模型如何能够重复使用在子任务中可观察到的基本元素来学习组合任务。我们的结果表明，在最先进的Transformer语言模型中，组合学习具有很高的样本效率低：LLaMA需要比从头开始重新学习所有子任务更多的数据样本来学习组合任务；在上下文提示下，使用少量样本是不可靠的，并且在执行子任务或纠正多轮代码生成中的错误时会失败。此外，通过利用复杂性理论，我们支持这些发现，并进行了关于梯度下降在记忆前馈模型中的样本效率低的理论分析。

更新时间: 2024-10-09 03:43:34

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.05785v4

Rebuilding ROME : Resolving Model Collapse during Sequential Model Editing

Recent work using Rank-One Model Editing (ROME), a popular model editing method, has shown that there are certain facts that the algorithm is unable to edit without breaking the model. Such edits have previously been called disabling edits. These disabling edits cause immediate model collapse and limits the use of ROME for sequential editing. In this paper, we show that disabling edits are an artifact of irregularities in the implementation of ROME. With this paper, we provide a more stable implementation ROME, which we call r-ROME and show that model collapse is no longer observed when making large scale sequential edits with r-ROME, while further improving generalization and locality of model editing compared to the original implementation of ROME. We also provide a detailed mathematical explanation of the reason behind disabling edits.

Updated: 2024-10-09 03:41:43

标题: 重建罗马：解决在顺序模型编辑期间发生的模型崩溃问题

摘要: 最近使用一种流行的模型编辑方法——Rank-One Model Editing (ROME)的研究表明，该算法无法编辑某些事实而不破坏模型。这些编辑之前被称为禁用编辑。这些禁用编辑会导致模型立即崩溃，并限制了ROME用于顺序编辑的使用。在本文中，我们展示禁用编辑是ROME实现中的不规则性的产物。通过本文，我们提供了一个更稳定的ROME实现，我们称之为r-ROME，并展示在使用r-ROME进行大规模顺序编辑时不再观察到模型崩溃，同时相比于原始ROME实现进一步改善了模型编辑的泛化和局部性。我们还提供了一个详细的数学解释，解释了禁用编辑背后的原因。

更新时间: 2024-10-09 03:41:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.07175v3

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has shown how to make this notion precise for representing binary concepts that have natural contrasts (e.g., {male, female}) as directions in representation space. However, many natural concepts do not have natural contrasts (e.g., whether the output is about an animal). In this work, we show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors. This allows us to immediately formalize the representation of categorical concepts as polytopes in the representation space. Further, we use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations. We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.

Updated: 2024-10-09 03:39:11

标题: 大型语言模型中的范畴和层次概念的几何学

摘要: 线性表示假设是一个非正式的想法，即语义概念被编码为大型语言模型（LLMs）的表示空间中的线性方向。先前的研究已经展示了如何将这个概念精确化，以表示具有自然对比（例如{男性，女性}）的二元概念作为表示空间中的方向。然而，许多自然概念没有自然对比（例如输出是否关于动物）。在这项工作中，我们展示了如何将线性表示假设的形式化扩展到将特征（例如is_animal）表示为向量。这使我们能够立即将分类概念的表示形式化为表示空间中的多面体。此外，我们利用这种形式化来证明概念的层次结构与其表示的几何关系。我们在Gemma和LLaMA-3大型语言模型上验证了这些理论结果，利用来自WordNet的数据估计了900多个层次相关概念的表示。

更新时间: 2024-10-09 03:39:11

领域: cs.CL,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2406.01506v2

A Unified Framework for Model Editing

ROME and MEMIT are largely believed to be two different model editing algorithms, with the major difference between them being the ability to perform batched edits. In this paper, we unify these two algorithms under a single conceptual umbrella, optimizing for the same goal, which we call the preservation-memorization objective. ROME uses an equality constraint to optimize this objective to perform one edit at a time, whereas MEMIT employs a more flexible least-square constraint that allows for batched edits. We generalize ROME and enable batched editing with equality constraint in the form of EMMET - an Equality-constrained Mass Model Editing algorithm for Transformers, a new batched memory-editing algorithm. EMMET can perform batched-edits up to a batch-size of 10,000, with very similar performance to MEMIT across multiple dimensions. With the introduction of EMMET, we truly unify ROME and MEMIT and show that both algorithms are equivalent in terms of their optimization objective, their abilities (singular and batched editing), their model editing performance and their limitations.

Updated: 2024-10-09 03:37:30

标题: 一个统一的模型编辑框架

摘要: ROME和MEMIT被广泛认为是两种不同的模型编辑算法，它们之间的主要区别在于能够执行批量编辑。在本文中，我们将这两种算法统一在一个单一的概念框架下，优化同一目标，我们称之为保留-记忆目标。ROME使用等式约束来优化这个目标，一次执行一个编辑，而MEMIT采用更灵活的最小二乘约束，允许批量编辑。我们将ROME进行泛化，并通过EMMET实现批量编辑，EMMET是一种针对变压器的等式约束大规模模型编辑算法，一种新的批量内存编辑算法。EMMET可以进行批量编辑，批量大小可达10,000，在多个维度上与MEMIT表现非常相似。引入EMMET后，我们真正统一了ROME和MEMIT，并展示了这两种算法在优化目标、能力（单个和批量编辑）、模型编辑性能和局限性方面是等效的。

更新时间: 2024-10-09 03:37:30

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.14236v5

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Bird's-Eye-View (BEV) perception has become a vital component of autonomous driving systems due to its ability to integrate multiple sensor inputs into a unified representation, enhancing performance in various downstream tasks. However, the computational demands of BEV models pose challenges for real-world deployment in vehicles with limited resources. To address these limitations, we propose QuadBEV, an efficient multitask perception framework that leverages the shared spatial and contextual information across four key tasks: 3D object detection, lane detection, map segmentation, and occupancy prediction. QuadBEV not only streamlines the integration of these tasks using a shared backbone and task-specific heads but also addresses common multitask learning challenges such as learning rate sensitivity and conflicting task objectives. Our framework reduces redundant computations, thereby enhancing system efficiency, making it particularly suited for embedded systems. We present comprehensive experiments that validate the effectiveness and robustness of QuadBEV, demonstrating its suitability for real-world applications.

Updated: 2024-10-09 03:31:45

标题: QuadBEV：通过鸟瞰图表示的高效四重任务感知框架

摘要: 鸟瞰（BEV）感知已成为自动驾驶系统的重要组成部分，因为它能够将多个传感器输入整合成统一的表示，从而增强各种下游任务的性能。然而，BEV模型的计算需求对于资源有限的车辆在现实世界中的部署提出挑战。为了解决这些限制，我们提出了QuadBEV，这是一个高效的多任务感知框架，利用四个关键任务之间共享的空间和上下文信息：3D物体检测、车道检测、地图分割和占用预测。QuadBEV不仅通过使用共享的主干和任务特定的头部简化了这些任务的整合，还解决了常见的多任务学习挑战，如学习速率敏感性和冲突的任务目标。我们的框架减少了冗余计算，从而增强了系统的效率，使其特别适用于嵌入式系统。我们进行了全面的实验证实了QuadBEV的有效性和鲁棒性，证明了其适用于实际应用。

更新时间: 2024-10-09 03:31:45

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.06516v1

Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

In the landscape of autonomous driving, Bird's-Eye-View (BEV) representation has recently garnered substantial academic attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. This BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructures, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their introduction into the perception model's backbone. We validatethe efficacy of our approach through extensive experiments on the NuScenes dataset, demonstrating substantial computational efficiency without sacrificing perception accuracy. To the best of our knowledge, this work represents the first attempt to alleviate the computational burden from the input pruning point.

Updated: 2024-10-09 03:30:00

标题: 学习内容感知的多模态联合输入修剪通过鸟瞰表示

摘要: 在自动驾驶领域，鸟瞰图（BEV）表示最近引起了广泛的学术关注，作为多模态传感器输入融合的转变框架。这种BEV范式有效地将传感器融合挑战从基于规则的方法论转变为数据中心方法，从而促进了从各种异构传感器中提取更加细致的特征。尽管其明显的优点，与基于BEV的技术相关的计算开销通常需要高容量的硬件基础设施，因此对实际的实现提出了挑战。为了缓解这一限制，我们引入了一种新颖的内容感知多模态联合输入修剪技术。我们的方法利用BEV作为共享锚点，从算法上识别和消除非必要的传感器区域，然后将其引入感知模型的骨干。我们通过对NuScenes数据集进行广泛实验验证了我们方法的有效性，展示了大幅提高的计算效率而不牺牲感知准确性。据我们所知，这项工作代表了第一次尝试从输入修剪的角度减轻计算负担。

更新时间: 2024-10-09 03:30:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.07268v1

MORSE: An Efficient Homomorphic Secret Sharing Scheme Enabling Non-Linear Operation

Homomorphic secret sharing (HSS) enables two servers to locally perform functions on encrypted data directly and obtain the results in the form of shares. A Paillier-based HSS solution seamlessly achieves multiplicative homomorphism and consumes less communication costs. Unfortunately, existing Paillier-based HSS schemes suffer from a large private key size, potential calculation error, expensive computation and storage overhead, and only valid on linear operations (e.g., addition and multiplication). To this end, inspired by the Paillier cryptosystem with fast encryption and decryption, we propose MORSE, an efficient homomorphic secret sharing scheme enabling non-linear operation, which enjoys a small key size, no calculation error and low overhead. In terms of functions, MORSE supports addition, subtraction, multiplication, scalar-multiplication, and comparison. Particularly, we carefully design two conversion protocols achieving the mutual conversion between one Paillier ciphertext and two secret shares, which allows MORSE to continuously perform the above operations. Rigorous analyses demonstrate that MORSE securely outputs correct results. Experimental results show that MORSE makes a runtime improvement of up to 9.3 times in terms of secure multiplication, and a communication costs reduction of up to 16.6% in secure comparison, compared to the state-of-the-art.

Updated: 2024-10-09 03:29:50

标题: MORSE：一种有效的同态秘密分享方案，实现非线性操作

摘要: 同态秘密共享（HSS）使两个服务器可以直接在加密数据上执行函数并以份额形式获得结果。基于Paillier的HSS解决方案无缝实现了乘法同态性，并且消耗较少的通信成本。然而，现有的基于Paillier的HSS方案存在私钥大小较大、潜在计算错误、昂贵的计算和存储开销，并且仅对线性操作（例如加法和乘法）有效。因此，受到具有快速加密和解密功能的Paillier密码体系的启发，我们提出了MORSE，一种高效的同态秘密共享方案，实现了非线性操作，具有较小的密钥大小、无计算错误和低开销。在功能方面，MORSE支持加法、减法、乘法、标量乘法和比较。特别是，我们精心设计了两个转换协议，实现了一个Paillier密文和两个秘密份额之间的相互转换，这使MORSE能够持续执行上述操作。严格的分析表明，MORSE安全地输出正确的结果。实验结果显示，与最先进技术相比，MORSE在安全乘法方面的运行时间提高了高达9.3倍，并且在安全比较方面的通信成本减少了高达16.6%。

更新时间: 2024-10-09 03:29:50

领域: cs.CR

下载: http://arxiv.org/abs/2410.06514v1

SHyPar: A Spectral Coarsening Approach to Hypergraph Partitioning

State-of-the-art hypergraph partitioners utilize a multilevel paradigm to construct progressively coarser hypergraphs across multiple layers, guiding cut refinements at each level of the hierarchy. Traditionally, these partitioners employ heuristic methods for coarsening and do not consider the structural features of hypergraphs. In this work, we introduce a multilevel spectral framework, SHyPar, for partitioning large-scale hypergraphs by leveraging hyperedge effective resistances and flow-based community detection techniques. Inspired by the latest theoretical spectral clustering frameworks, such as HyperEF and HyperSF, SHyPar aims to decompose large hypergraphs into multiple subgraphs with few inter-partition hyperedges (cut size). A key component of SHyPar is a flow-based local clustering scheme for hypergraph coarsening, which incorporates a max-flow-based algorithm to produce clusters with substantially improved conductance. Additionally, SHyPar utilizes an effective resistance-based rating function for merging nodes that are strongly connected (coupled). Compared with existing state-of-the-art hypergraph partitioning methods, our extensive experimental results on real-world VLSI designs demonstrate that SHyPar can more effectively partition hypergraphs, achieving state-of-the-art solution quality.

Updated: 2024-10-09 03:29:47

标题: SHyPar：一种用于超图划分的谱粗化方法

摘要: 最先进的超图分区器利用多级范例，在多层中构建逐渐粗糙的超图，引导层次结构的每个级别的切割细化。传统上，这些分区器采用启发式方法进行粗化，并不考虑超图的结构特征。在这项工作中，我们引入了一个多级谱框架，SHyPar，通过利用超边的有效电阻和基于流的社区检测技术，对大规模超图进行分区。受到最新的理论谱聚类框架的启发，如HyperEF和HyperSF，SHyPar旨在将大型超图分解为具有少量跨分区超边（切割大小）的多个子图。SHyPar的一个关键组成部分是基于流的超图粗化的局部聚类方案，它结合了基于最大流的算法，以产生具有显著改进导纳的簇。此外，SHyPar利用基于有效电阻的评级函数来合并强连接（耦合）的节点。与现有最先进的超图分区方法相比，我们在真实世界的VLSI设计上的大量实验结果表明，SHyPar可以更有效地分区超图，实现最先进的解决方案质量。

更新时间: 2024-10-09 03:29:47

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2410.10875v1

OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables Coordination-Aware Autonomous Vehicles

Coordination among connected and autonomous vehicles (CAVs) is advancing due to developments in control and communication technologies. However, much of the current work is based on oversimplified and unrealistic task-specific assumptions, which may introduce vulnerabilities. This is critical because CAVs not only interact with their environment but are also integral parts of it. Insufficient exploration can result in policies that carry latent risks, highlighting the need for methods that explore the environment both extensively and efficiently. This work introduces OPTIMA, a novel distributed reinforcement learning framework for cooperative autonomous vehicle tasks. OPTIMA alternates between thorough data sampling from environmental interactions and multi-agent reinforcement learning algorithms to optimize CAV cooperation, emphasizing both safety and efficiency. Our goal is to improve the generality and performance of CAVs in highly complex and crowded scenarios. Furthermore, the industrial-scale distributed training system easily adapts to different algorithms, reward functions, and strategies.

Updated: 2024-10-09 03:28:45

标题: OPTIMA：智能多智能体系统的优化策略实现协调感知的自主车辆

摘要: 连接和自主车辆（CAVs）之间的协调正在不断发展，这归功于控制和通信技术的发展。然而，目前许多工作都是基于过于简化和不切实际的特定任务假设，这可能会引入漏洞。这是至关重要的，因为CAVs不仅与其环境相互作用，而且也是其环境的组成部分。不足的探索可能导致携带潜在风险的政策，突显了需要既广泛又高效地探索环境的方法。这项工作介绍了OPTIMA，一种用于合作自主车辆任务的新型分布式强化学习框架。OPTIMA在从环境交互中进行彻底数据采样和多智能体强化学习算法之间交替，以优化CAV协作，强调安全性和效率。我们的目标是提高CAVs在高度复杂和拥挤的场景中的普遍性和性能。此外，工业规模的分布式训练系统可以轻松适应不同的算法、奖励函数和策略。

更新时间: 2024-10-09 03:28:45

领域: cs.MA,cs.LG,cs.RO

下载: http://arxiv.org/abs/2410.18112v1

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens require sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences. We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.

Updated: 2024-10-09 03:26:11

标题: TorchTitan：PyTorch原生解决方案，一站式生产就绪LLM预训练

摘要: 大语言模型（LLM）的发展对于推动最先进的自然语言处理应用至关重要。训练具有数十亿参数和数万亿标记的LLM需要复杂的分布式系统，这些系统能够组合和比较多种最先进的技术，以便有效地跨越数千个加速器进行扩展。然而，现有解决方案复杂，分散在多个库/存储库中，缺乏互操作性，并且难以维护。因此，筛选和经验比较训练配方需要非常大的工程工作。本文介绍了TorchTitan，这是一个开源的、PyTorch原生的分布式训练系统，它统一了最先进的技术，简化了集成并减少了开销。TorchTitan以模块化的方式实现了三维并行性，具有弹性扩展能力，提供了全面的日志记录、检查点和调试工具，适用于生产环境的训练。它还整合了硬件-软件共同设计的解决方案，利用了Float8训练和SymmetricMemory等功能。作为一个灵活的测试平台，TorchTitan促进了自定义配方的筛选和比较，使我们能够为Llama 3.1开发优化的训练配方，并根据我们的经验提供选择技术以获得最大效率的指导。我们对Llama 3.1系列LLM进行了全面评估，涵盖了80亿到4050亿参数，并展示了其出色的性能、模块化组合性和弹性扩展性。通过堆叠训练优化，我们展示了在128-GPU规模（Llama 3.1 8B）上使用1D并行性加速65.08%，在256-GPU规模（Llama 3.1 70B）上使用2D并行性额外加速12.59%，以及在512-GPU规模（Llama 3.1 405B）上使用3D并行性额外加速30%，这些实验是在NVIDIA H100 GPU上对优化基线进行的。

更新时间: 2024-10-09 03:26:11

领域: cs.CL,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2410.06511v1

PFAttack: Stealthy Attack Bypassing Group Fairness in Federated Learning

Federated learning (FL), integrating group fairness mechanisms, allows multiple clients to collaboratively train a global model that makes unbiased decisions for different populations grouped by sensitive attributes (e.g., gender and race). Due to its distributed nature, previous studies have demonstrated that FL systems are vulnerable to model poisoning attacks. However, these studies primarily focus on perturbing accuracy, leaving a critical question unexplored: Can an attacker bypass the group fairness mechanisms in FL and manipulate the global model to be biased? The motivations for such an attack vary; an attacker might seek higher accuracy, yet fairness considerations typically limit the accuracy of the global model or aim to cause ethical disruption. To address this question, we design a novel form of attack in FL, termed Profit-driven Fairness Attack (PFATTACK), which aims not to degrade global model accuracy but to bypass fairness mechanisms. Our fundamental insight is that group fairness seeks to weaken the dependence of outputs on input attributes related to sensitive information. In the proposed PFATTACK, an attacker can recover this dependence through local fine-tuning across various sensitive groups, thereby creating a biased yet accuracy-preserving malicious model and injecting it into FL through model replacement. Compared to attacks targeting accuracy, PFATTACK is more stealthy. The malicious model in PFATTACK exhibits subtle parameter variations relative to the original global model, making it robust against detection and filtering by Byzantine-resilient aggregations. Extensive experiments on benchmark datasets are conducted for four fair FL frameworks and three Byzantine-resilient aggregations against model poisoning, demonstrating the effectiveness and stealth of PFATTACK in bypassing group fairness mechanisms in FL.

Updated: 2024-10-09 03:23:07

标题: PFAttack：绕过联邦学习中的组公平性的隐蔽攻击

摘要: 联邦学习（FL）将集团公平机制整合进来，允许多个客户共同训练一个全球模型，该模型可以为不同种群（按敏感属性分组，如性别和种族）做出无偏见的决策。由于其分布式性质，先前的研究表明FL系统容易受到模型投毒攻击的威胁。然而，这些研究主要集中在扰乱准确性上，留下了一个关键问题未探讨：攻击者是否可以绕过FL中的群体公平机制，操纵全球模型产生偏见？对这种攻击的动机各不相同；攻击者可能寻求更高的准确性，然而公平考量通常限制了全球模型的准确性，或者旨在引发道德混乱。为了解决这个问题，我们设计了一种新型的FL攻击形式，称为利润驱动公平攻击（PFATTACK），其目标不是降低全球模型的准确性，而是绕过公平机制。我们的基本见解是，群体公平旨在削弱输出与涉及敏感信息的输入属性之间的依赖关系。在提出的PFATTACK中，攻击者可以通过对各种敏感群体进行本地微调，恢复这种依赖关系，从而创建一个具有偏见但保持准确性的恶意模型，并将其通过模型替换注入到FL中。与针对准确性的攻击相比，PFATTACK更加隐蔽。在PFATTACK中的恶意模型相对于原始全球模型具有微小的参数变化，使其具有抵抗拜占庭恢复聚合检测和过滤的能力。对四个公平FL框架和三种拜占庭恢复聚合进行了广泛的基准数据集实验，以对抗模型投毒，展示了PFATTACK绕过FL中群体公平机制的有效性和隐蔽性。

更新时间: 2024-10-09 03:23:07

领域: cs.LG

下载: http://arxiv.org/abs/2410.06509v1

Unlocking the Power of Large Language Models for Entity Alignment

Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative framework that incorporates large language models (LLMs) to improve EA. To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy. To overcome the over-reliance on entity embedding comparisons, ChatEA implements a two-stage EA strategy that capitalizes on LLMs' capability for multi-step reasoning in a dialogue format, thereby enhancing accuracy while preserving efficiency. Our experimental results verify ChatEA's superior performance, highlighting LLMs' potential in facilitating EA tasks.

Updated: 2024-10-09 03:22:46

标题: 解锁大型语言模型在实体对齐中的潜力

摘要: 实体对齐（EA）对于整合多样化的知识图谱（KG）数据至关重要，在数据驱动的人工智能应用中发挥着至关重要的作用。传统的EA方法主要依赖于比较实体嵌入，但它们的有效性受限于有限的输入KG数据和表示学习技术的能力。在这种背景下，我们引入了ChatEA，这是一个创新框架，将大型语言模型（LLMs）整合到EA中以改进EA。为了解决有限输入KG数据的限制，ChatEA引入了一个KG-代码翻译模块，将KG结构翻译成LLMs可理解的格式，从而使LLMs能够利用其丰富的背景知识来提高EA的准确性。为了克服过度依赖实体嵌入比较的问题，ChatEA实施了一个两阶段EA策略，利用LLMs在对话格式中的多步推理能力，从而提高准确性同时保持效率。我们的实验结果验证了ChatEA的卓越性能，突显了LLMs在促进EA任务中的潜力。

更新时间: 2024-10-09 03:22:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.15048v2

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Monte Carlo Tree Search (MCTS) has recently emerged as a powerful technique for enhancing the reasoning capabilities of LLMs. Techniques such as SFT or DPO have enabled LLMs to distill high-quality behaviors from MCTS, improving their reasoning performance. However, existing distillation methods underutilize the rich trajectory information generated by MCTS, limiting the potential for improvements in LLM reasoning. In this paper, we propose AlphaLLM-CPL, a novel pairwise training framework that enables LLMs to self-improve through MCTS behavior distillation. AlphaLLM-CPL efficiently leverages MCTS trajectories via two key innovations: (1) AlphaLLM-CPL constructs stepwise trajectory pairs from child nodes sharing the same parent in the search tree, providing step-level information for more effective MCTS behavior distillation. (2) AlphaLLM-CPL introduces curriculum preference learning, dynamically adjusting the training sequence of trajectory pairs in each offline training epoch to prioritize critical learning steps and mitigate overfitting. Experimental results on mathematical reasoning tasks demonstrate that AlphaLLM-CPL significantly outperforms previous MCTS behavior distillation methods, substantially boosting the reasoning capabilities of LLMs.

Updated: 2024-10-09 03:20:02

标题: 通过MCTS实现LLMs的自我改进：利用课程偏好学习逐步知识

摘要: 蒙特卡罗树搜索（MCTS）最近已成为增强LLM推理能力的强大技术。诸如SFT或DPO的技术使LLMs能够从MCTS中提炼高质量行为，提高他们的推理表现。然而，现有提炼方法未充分利用MCTS生成的丰富轨迹信息，限制了LLM推理改进的潜力。在本文中，我们提出了AlphaLLM-CPL，这是一个新颖的成对训练框架，通过MCTS行为提炼使LLMs自我改进。AlphaLLM-CPL通过两个关键创新有效地利用MCTS轨迹：（1）AlphaLLM-CPL从在搜索树中共享相同父节点的子节点构建逐步轨迹对，为更有效的MCTS行为提炼提供步骤级信息。（2）AlphaLLM-CPL引入课程偏好学习，动态调整每个离线训练周期中的轨迹对训练顺序，以优先考虑关键学习步骤并减少过拟合。数学推理任务的实验结果表明，AlphaLLM-CPL明显优于先前的MCTS行为提炼方法，大幅提升了LLMs的推理能力。

更新时间: 2024-10-09 03:20:02

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.06508v1

Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content

This article improves the Transformer model based on swarm intelligence optimization algorithm, aiming to predict the emotions of employment related text content on American social media. Through text preprocessing, feature extraction, and vectorization, the text data was successfully converted into numerical data and imported into the model for training. The experimental results show that during the training process, the accuracy of the model gradually increased from 49.27% to 82.83%, while the loss value decreased from 0.67 to 0.35, indicating a significant improvement in the performance of the model on the training set. According to the confusion matrix analysis of the training set, the accuracy of the training set is 86.15%. The confusion matrix of the test set also showed good performance, with an accuracy of 82.91%. The accuracy difference between the training set and the test set is only 3.24%, indicating that the model has strong generalization ability. In addition, the evaluation of polygon results shows that the model performs well in classification accuracy, sensitivity, specificity, and area under the curve (AUC), with a Kappa coefficient of 0.66 and an F-measure of 0.80, further verifying the effectiveness of the model in social media sentiment analysis. The improved model proposed in this article not only improves the accuracy of sentiment recognition in employment related texts on social media, but also has important practical significance. This social media based data analysis method can not only capture social dynamics in a timely manner, but also promote decision-makers to pay attention to public concerns and provide data support for improving employment conditions.

Updated: 2024-10-09 03:14:05

标题: 基于高性能优化器优化的Transformer模型用于预测美国社交媒体内容中的就业情绪

摘要: 本文基于群体智能优化算法改进了变压器模型，旨在预测美国社交媒体上与就业相关的文本内容的情绪。通过文本预处理、特征提取和向量化，文本数据成功转换为数值数据并导入模型进行训练。实验结果显示，在训练过程中，模型的准确率从49.27%逐渐增加至82.83%，而损失值从0.67降至0.35，表明模型在训练集上的性能显著提高。根据训练集的混淆矩阵分析，训练集的准确率为86.15%。测试集的混淆矩阵也表现良好，准确率为82.91%。训练集和测试集之间的准确率差异仅为3.24%，表明该模型具有较强的泛化能力。此外，多边形结果的评估显示，该模型在分类准确度、敏感度、特异度和曲线下面积（AUC）方面表现良好，Kappa系数为0.66，F-度量为0.80，进一步验证了该模型在社交媒体情感分析中的有效性。本文提出的改进模型不仅提高了社交媒体上与就业相关文本的情绪识别准确率，而且具有重要的实际意义。这种基于社交媒体的数据分析方法不仅可以及时捕捉社会动态，还可以促使决策者关注公众关注，并为改善就业条件提供数据支持。

更新时间: 2024-10-09 03:14:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.10874v1

Chemistry-Inspired Diffusion with Non-Differentiable Guidance

Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. These models can be guided in two ways: (i) explicitly, through additional features representing the condition, or (ii) implicitly, using a property predictor. However, training property predictors or conditional diffusion models requires an abundance of labeled data and is inherently challenging in real-world applications. We propose a novel approach that attenuates the limitations of acquiring large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry. We show that this results in more precise conditional generation of novel and stable molecular structures. Our experiments demonstrate that our method: (1) significantly reduces atomic forces, enhancing the validity of generated molecules when used for stability optimization; (2) is compatible with both explicit and implicit guidance in diffusion models, enabling joint optimization of molecular properties and stability; and (3) generalizes effectively to molecular optimization tasks beyond stability optimization.

Updated: 2024-10-09 03:10:21

标题: 受化学启发的具有非可微引导的扩散

摘要: 最近扩散模型的发展展示出在生成新颖分子方面具有显著潜力。这些模型可以通过两种方式进行引导：(i) 明确地，通过表示条件的附加特征，或者(ii) 隐式地，使用属性预测器。然而，训练属性预测器或有条件的扩散模型需要大量标记数据，在现实应用中具有固有挑战。我们提出了一种新颖的方法，通过利用量子化学领域知识作为不可微分的神谕来引导无条件的扩散模型，减轻获取大型标记数据集的限制。神谕提供了准确的指导，以估算梯度的形式，允许扩散过程从由量子化学指定的条件分布中进行采样。我们展示，这导致更精确的生成新颖和稳定的分子结构。我们的实验表明，我们的方法：(1) 显著降低了原子力，增强了生成分子在稳定性优化中的有效性；(2) 与扩散模型中的明确和隐式引导兼容，实现了分子属性和稳定性的联合优化；以及(3) 在稳定性优化之外的分子优化任务中有效泛化。

更新时间: 2024-10-09 03:10:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06502v1

Standard Gaussian Process Can Be Excellent for High-Dimensional Bayesian Optimization

A longstanding belief holds that Bayesian Optimization (BO) with standard Gaussian processes (GP) -- referred to as standard BO -- underperforms in high-dimensional optimization problems. While this belief seems plausible, it lacks both robust empirical evidence and theoretical justification. To address this gap, we present a systematic investigation. First, through a comprehensive evaluation across eleven widely used benchmarks, we found that while the popular Square Exponential (SE) kernel often leads to poor performance, using Matern kernels enables standard BO to consistently achieve top-tier results, frequently surpassing methods specifically designed for high-dimensional optimization. Second, our theoretical analysis reveals that the SE kernels failure primarily stems from improper initialization of the length-scale parameters, which are commonly used in practice but can cause gradient vanishing in training. We provide a probabilistic bound to characterize this issue, showing that Matern kernels are less susceptible and can robustly handle much higher dimensions. Third, we propose a simple robust initialization strategy that dramatically improves the performance of the SE kernel, bringing it close to state of the art methods, without requiring any additional priors or regularization. We prove another probabilistic bound that demonstrates how the gradient vanishing issue can be effectively mitigated with our method. Our findings advocate for a re-evaluation of standard BOs potential in high-dimensional settings.

Updated: 2024-10-09 02:58:27

标题: 标准高斯过程在高维贝叶斯优化中表现出色

摘要: 一种长期以来的信念认为，具有标准高斯过程（GP）的贝叶斯优化（BO）——称为标准BO——在高维优化问题中表现不佳。尽管这种看法似乎合理，但缺乏稳健的经验证据和理论依据。为了填补这一空白，我们进行了系统性的调查。首先，通过对十一个广泛使用的基准进行全面评估，我们发现虽然流行的平方指数（SE）核通常导致性能不佳，但使用马特恩核使得标准BO能够持续实现顶级结果，经常超越专门设计用于高维优化的方法。其次，我们的理论分析显示，SE核的失败主要源于长度尺度参数的不当初始化，这些参数在实践中常用，但可能导致训练中的梯度消失。我们提供了一个概率界限来表征这个问题，表明马特恩核不太容易受到影响，并且可以稳健地处理更高的维度。第三，我们提出了一种简单的稳健初始化策略，显著提高了SE核的性能，使其接近最先进的方法，而无需任何额外的先验或正则化。我们证明了另一个概率界限，展示了如何有效地通过我们的方法缓解梯度消失问题。我们的发现主张重新评估标准BO在高维环境中的潜力。

更新时间: 2024-10-09 02:58:27

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.02746v4

LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model

Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more energy-efficient manner. However, despite previous efforts to optimize the learning algorithm of SNNs through various methods, SNNs still lag behind ANNs in terms of performance. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. The LM-HT model can also be transformed into a vanilla single threshold model through reparameterization, thereby achieving more flexible hardware deployment. In addition, we note that the LM-HT model can seamlessly integrate with ANN-SNN Conversion framework under special initialization. This novel hybrid learning framework can effectively improve the relatively poor performance of converted SNNs under low time latency. Extensive experimental results have demonstrated that our model can outperform previous state-of-the-art works on various types of datasets, which promote SNNs to achieve a brand-new level of performance comparable to quantized ANNs. Code is available at https://github.com/hzc1208/LMHT_SNN.

Updated: 2024-10-09 02:56:46

标题: LM-HT SNN: 通过可学习的多层次阈值模型增强SNN到ANN对应模型的性能

摘要: 与传统的人工神经网络（ANN）相比，尖峰神经网络（SNN）因其固有的更节能传输信息的能力而引起了广泛的学术兴趣。然而，尽管先前通过各种方法优化了SNN的学习算法，但在性能方面，SNN仍然落后于ANN。最近提出的多阈值模型为进一步增强SNN的学习能力提供了更多可能性。本文从数学角度对多阈值模型、基本尖峰模型和量化的ANN进行了严格分析，然后提出了一种新颖的LM-HT模型，这是一个等距多阈值模型，可以动态调节时间维度上的全局输入电流和膜电位泄漏。LM-HT模型还可以通过重新参数化转化为基本的单一阈值模型，从而实现更灵活的硬件部署。此外，我们注意到LM-HT模型可以在特殊初始化下与ANN-SNN转换框架无缝集成。这种新颖的混合学习框架可以有效地提高在低时延下转换SNN的相对较差性能。大量实验结果证明，我们的模型可以在各种类型的数据集上胜过先前的最新工作，从而使SNN能够达到与量化的ANN相媲美的全新性能水平。代码可在https://github.com/hzc1208/LMHT_SNN获取。

更新时间: 2024-10-09 02:56:46

领域: cs.NE,cs.AI,cs.CV

下载: http://arxiv.org/abs/2402.00411v2

ERCache: An Efficient and Reliable Caching Framework for Large-Scale User Representations in Meta's Ads System

The increasing complexity of deep learning models used for calculating user representations presents significant challenges, particularly with limited computational resources and strict service-level agreements (SLAs). Previous research efforts have focused on optimizing model inference but have overlooked a critical question: is it necessary to perform user model inference for every ad request in large-scale social networks? To address this question and these challenges, we first analyze user access patterns at Meta and find that most user model inferences occur within a short timeframe. T his observation reveals a triangular relationship among model complexity, embedding freshness, and service SLAs. Building on this insight, we designed, implemented, and evaluated ERCache, an efficient and robust caching framework for large-scale user representations in ads recommendation systems on social networks. ERCache categorizes cache into direct and failover types and applies customized settings and eviction policies for each model, effectively balancing model complexity, embedding freshness, and service SLAs, even considering the staleness introduced by caching. ERCache has been deployed at Meta for over six months, supporting more than 30 ranking models while efficiently conserving computational resources and complying with service SLA requirements.

Updated: 2024-10-09 02:51:27

标题: ERCache：Meta广告系统中大规模用户表示的高效可靠缓存框架

摘要: 用于计算用户表示的深度学习模型的日益复杂性带来了重大挑战，特别是在有限的计算资源和严格的服务级别协议（SLAs）的情况下。先前的研究工作集中在优化模型推断上，但忽视了一个关键问题：在大规模社交网络中是否有必要对每个广告请求执行用户模型推断？为了解决这个问题和这些挑战，我们首先分析了在Meta的用户访问模式，并发现大多数用户模型推断发生在短时间内。这一观察结果揭示了模型复杂性、嵌入新鲜度和服务SLAs之间的三角关系。基于这一洞察，我们设计、实施并评估了ERCache，这是一个用于社交网络广告推荐系统中大规模用户表示的高效和稳健的缓存框架。ERCache将缓存分为直接和故障转移类型，并为每个模型应用定制设置和驱逐策略，有效平衡模型复杂性、嵌入新鲜度和服务SLAs，甚至考虑缓存引入的陈旧性。ERCache已在Meta部署超过六个月，支持超过30个排名模型，同时有效节省计算资源并符合服务SLA要求。

更新时间: 2024-10-09 02:51:27

领域: cs.IR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2410.06497v1

Efficient representation learning of scintillation signal characteristics with spectrum-inspired temporal neural networks

Nuclear radiation detectors based on scintillators are widely used in particle and high energy physics experiments, nuclear medicine imaging, industrial and environmental detection, etc. Precisely extracting scintillation signal characteristics at the event level is important for these applications, not only in respect of understanding the scintillator itself, but also kinds and physical property of incident particles. Recent researches demonstrate data-driven neural networks are superior to traditional statistical methods, especially when the analytical form of signals is hard to obtain, or noise is significant. However, most densely connected or convolution-based networks fail to fully exploit the spectral and temporal structure of scintillation signals, leaving large space for performance improvement. In this paper, we propose a network architecture specially tailored for scintillation signal characterization based on previous works on time series analysis. By directly applying Fast Fourier Transform on original signals without data embedding, including the zero-frequency component, adjusting convolution scheme for low-frequency components, and unbiasedly re-weighting features from different frequencies, the proposed network architecture can serve as a lightweight and enhanced representation learning backbone. We prove our idea on simulation data generated with the setting of the LUX dark matter detector, and on experimental electrical signals with fast electronics to emulate scintillation variations. The proposed model achieves significantly better results than the reference model in literature and densely connected models without representation learning.

Updated: 2024-10-09 02:44:53

标题: 用受频谱启发的时间神经网络高效表示闪烁信号特征学习

摘要: 基于闪烁体的核辐射探测器被广泛应用于粒子和高能物理实验、核医学成像、工业和环境检测等领域。在事件级别准确提取闪烁信号特征对于这些应用非常重要，不仅有助于理解闪烁体本身，还有助于了解入射粒子的种类和物理特性。最近的研究表明，基于数据驱动的神经网络优于传统的统计方法，特别是当信号的分析形式难以获得或噪声显著时。然而，大多数密集连接或基于卷积的网络未能充分利用闪烁信号的频谱和时间结构，留下了大量的性能改进空间。本文提出了一种专门针对闪烁信号特征的网络架构，基于先前关于时间序列分析的工作。通过直接在原始信号上应用快速傅立叶变换而无需数据嵌入，包括零频率分量，调整卷积方案以处理低频分量，并对来自不同频率的特征进行无偏重加权，所提出的网络架构可以作为一种轻量级和增强的表示学习骨干。我们在设置为LUX暗物质探测器的模拟数据和使用快速电子模拟闪烁变化的实验电子信号上证明了我们的想法。所提出的模型比文献中的参考模型和未进行表示学习的密集连接模型取得了显著更好的结果。

更新时间: 2024-10-09 02:44:53

领域: physics.ins-det,cs.LG,physics.data-an

下载: http://arxiv.org/abs/2410.07267v1

BiC-MPPI: Goal-Pursuing, Sampling-Based Bidirectional Rollout Clustering Path Integral for Trajectory Optimization

This paper introduces the Bidirectional Clustered MPPI (BiC-MPPI) algorithm, a novel trajectory optimization method aimed at enhancing goal-directed guidance within the Model Predictive Path Integral (MPPI) framework. BiC-MPPI incorporates bidirectional dynamics approximations and a new guide cost mechanism, improving both trajectory planning and goal-reaching performance. By leveraging forward and backward rollouts, the bidirectional approach ensures effective trajectory connections between initial and terminal states, while the guide cost helps discover dynamically feasible paths. Experimental results demonstrate that BiC-MPPI outperforms existing MPPI variants in both 2D and 3D environments, achieving higher success rates and competitive computation times across 900 simulations on a modified BARN dataset for autonomous navigation. GitHub: https://github.com/i-ASL/BiC-MPPI

Updated: 2024-10-09 02:36:35

标题: BiC-MPPI：面向目标的、基于采样的双向展开聚类路径积分用于轨迹优化

摘要: 本文介绍了双向聚类MPPI（BiC-MPPI）算法，这是一种旨在增强模型预测路径积分（MPPI）框架中目标导向引导的新型轨迹优化方法。BiC-MPPI结合了双向动力学近似和新的引导成本机制，提高了轨迹规划和目标达成性能。通过利用正向和反向推出，双向方法确保了初始和终端状态之间的有效轨迹连接，同时引导成本有助于发现动态可行路径。实验结果表明，BiC-MPPI在二维和三维环境中优于现有的MPPI变体，在修改的BARN数据集上进行了900次模拟，实现了更高的成功率和竞争性的计算时间。GitHub链接：https://github.com/i-ASL/BiC-MPPI

更新时间: 2024-10-09 02:36:35

领域: cs.RO,cs.AI,cs.SY,eess.SY,math.OC,68T40, 13P25,I.2.9; I.2.8; G.1.6; G.4

下载: http://arxiv.org/abs/2410.06493v1

Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack

Previous work has shown that training "helpful-only" LLMs with reinforcement learning on a curriculum of gameable environments can lead models to generalize to egregious specification gaming, such as editing their own reward function or modifying task checklists to appear more successful. We show that gpt-4o, gpt-4o-mini, o1-preview, and o1-mini - frontier models trained to be helpful, harmless, and honest - can engage in specification gaming without training on a curriculum of tasks, purely from in-context iterative reflection (which we call in-context reinforcement learning, "ICRL"). We also show that using ICRL to generate highly-rewarded outputs for expert iteration (compared to the standard expert iteration reinforcement learning algorithm) may increase gpt-4o-mini's propensity to learn specification-gaming policies, generalizing (in very rare cases) to the most egregious strategy where gpt-4o-mini edits its own reward function. Our results point toward the strong ability of in-context reflection to discover rare specification-gaming strategies that models might not exhibit zero-shot or with normal training, highlighting the need for caution when relying on alignment of LLMs in zero-shot settings.

Updated: 2024-10-09 02:34:27

标题: 从诚实到诡计：上下文强化学习可以使诚实模型获得奖励 hack

摘要: 先前的研究表明，通过在可玩环境的课程上使用强化学习训练“仅有帮助性”的LLMs可以使模型推广到严重的规范游戏，例如编辑自己的奖励函数或修改任务清单以显得更成功。我们展示了gpt-4o、gpt-4o-mini、o1-preview和o1-mini - 被训练为有益、无害和诚实的前沿模型可以在没有任务课程训练的情况下从上下文迭代反思（我们称之为上下文强化学习，“ICRL”）中参与规范游戏。我们还展示了使用ICRL为专家迭代生成高度奖励的输出（与标准的专家迭代强化学习算法相比）可能会增加gpt-4o-mini学习规范游戏策略的倾向，推广（在极少数情况下）到最严重的策略，即gpt-4o-mini编辑自己的奖励函数。我们的结果指向上下文反思发现模型可能在零-shot或正常训练中没有展示的罕见规范游戏策略的强大能力，强调在零-shot设置中依赖LLMs的对齐时需要谨慎。

更新时间: 2024-10-09 02:34:27

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06491v1

FedL2G: Learning to Guide Local Training in Heterogeneous Federated Learning

Data and model heterogeneity are two core issues in Heterogeneous Federated Learning (HtFL). In scenarios with heterogeneous model architectures, aggregating model parameters becomes infeasible, leading to the use of prototypes (i.e., class representative feature vectors) for aggregation and guidance. However, they still experience a mismatch between the extra guiding objective and the client's original local objective when aligned with global prototypes. Thus, we propose a Federated Learning-to-Guide (FedL2G) method that adaptively learns to guide local training in a federated manner and ensures the extra guidance is beneficial to clients' original tasks. With theoretical guarantees, FedL2G efficiently implements the learning-to-guide process using only first-order derivatives w.r.t. model parameters and achieves a non-convex convergence rate of O(1/T). We conduct extensive experiments on two data heterogeneity and six model heterogeneity settings using 14 heterogeneous model architectures (e.g., CNNs and ViTs) to demonstrate FedL2G's superior performance compared to six counterparts.

Updated: 2024-10-09 02:31:49

标题: FedL2G：学习在异构联邦学习中引导本地训练

摘要: 数据和模型异质性是异质联邦学习（HtFL）中的两个核心问题。在具有异构模型架构的场景中，聚合模型参数变得不可行，导致使用原型（即，类代表性特征向量）用于聚合和指导。然而，当与全局原型对齐时，它们仍然经历额外指导目标与客户端原始本地目标之间的不匹配。因此，我们提出了一种名为Federated Learning-to-Guide（FedL2G）的方法，该方法在联邦方式下自适应学习指导本地训练，并确保额外指导对客户端的原始任务有益。FedL2G在仅使用模型参数的一阶导数的情况下高效实现了学习指导过程，并实现了O(1/T)的非凸收敛速度。我们在两种数据异质性和六种模型异质性设置上进行了大量实验，使用14种异构模型架构（例如，CNN和ViTs）来展示与六个对照方法相比，FedL2G的卓越性能。

更新时间: 2024-10-09 02:31:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06490v1

FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model

Video-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors. Existing methods have achieved remarkable results by constraining both geometric and perceptual consistency. However, geometric constraints (like those designed on facial landmarks) are insufficient to capture subtle emotions, while expression features trained on classification tasks lack fine granularity for complex emotions. To address this, we propose \textbf{FreeAvatar}, a robust facial animation transfer method that relies solely on our learned expression representation. Specifically, FreeAvatar consists of two main components: the expression foundation model and the facial animation transfer model. In the first component, we initially construct a facial feature space through a face reconstruction task and then optimize the expression feature space by exploring the similarities among different expressions. Benefiting from training on the amounts of unlabeled facial images and re-collected expression comparison dataset, our model adapts freely and effectively to any in-the-wild input facial images. In the facial animation transfer component, we propose a novel Expression-driven Multi-avatar Animator, which first maps expressive semantics to the facial control parameters of 3D avatars and then imposes perceptual constraints between the input and output images to maintain expression consistency. To make the entire process differentiable, we employ a trained neural renderer to translate rig parameters into corresponding images. Furthermore, unlike previous methods that require separate decoders for each avatar, we propose a dynamic identity injection module that allows for the joint training of multiple avatars within a single network.

Updated: 2024-10-09 02:29:57

标题: FreeAvatar：通过学习表情基础模型实现稳健的3D面部动画转移

摘要: 视频驱动的3D面部动画传输旨在驱动化身复制演员的表情。现有方法通过限制几何和感知一致性均取得了显著成果。然而，几何约束（如设计在面部特征点上的约束）不足以捕捉微妙的情绪，而基于分类任务训练的表情特征缺乏复杂情绪的细粒度。为了解决这个问题，我们提出了\textbf{FreeAvatar}，一种强大的面部动画传输方法，仅依赖于我们学习的表情表示。具体而言，FreeAvatar包括两个主要组件：表情基础模型和面部动画传输模型。在第一个组件中，我们通过面部重建任务首先构建面部特征空间，然后通过探索不同表情之间的相似性来优化表情特征空间。由于在大量未标记的面部图像和重新收集的表情比较数据集上进行训练，我们的模型可以自由且有效地适应任何野外输入的面部图像。在面部动画传输组件中，我们提出了一种新颖的表情驱动多化身动画师，首先将富有表现力的语义映射到3D化身的面部控制参数，然后在输入和输出图像之间施加感知约束以保持表情一致性。为了使整个过程可微分，我们采用训练有素的神经渲染器将刚性参数转换为相应的图像。此外，与以前需要为每个化身单独解码器的方法不同，我们提出了一个动态身份注入模块，允许在单个网络中联合训练多个化身。

更新时间: 2024-10-09 02:29:57

领域: cs.GR,cs.AI

下载: http://arxiv.org/abs/2409.13180v2

Exploring Adversarial Robustness of Deep State Space Models

Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO. Our code is available at https://github.com/Biqing-Qi/Exploring-Adversarial-Robustness-of-Deep-State-Space-Models.git.

Updated: 2024-10-09 02:28:56

标题: 探究深度状态空间模型的对抗性稳健性

摘要: 深度状态空间模型（SSM）在许多任务场景中已被证明有效，但在真实世界部署中面临着由对抗性扰动（APs）引起的重大安全挑战。对抗性训练（AT）是增强对抗性鲁棒性（AR）的主流方法，已在各种传统DNN架构上得到验证。然而，它在改善SSM的AR方面的有效性仍不清楚。虽然在SSM组件中的许多增强措施，如整合注意力机制和扩展到数据相关的SSM参数化，在标准训练（ST）设置中带来了显著的收益，但它们在AT中的潜在益处尚未被探索。为了调查这一点，我们评估现有的SSM结构变体与AT相结合，以评估它们的AR性能。我们观察到纯SSM结构很难从AT中受益，而整合注意力能够在AT中为SSM提供更显著的鲁棒性和泛化之间的权衡。然而，注意力的整合也会导致鲁棒过拟合（RO）问题。为了理解这些现象，我们从经验和理论上分析了SSM在AP下的输出错误。我们发现固定参数化的SSM的输出错误界限严格与其参数相关，限制了它们的AT益处，而依赖于输入的SSM可能面临错误爆炸的问题。此外，我们展示了注意力组件在训练过程中有效地调整了SSM的输出错误，使它们更能从AT中受益，但由于其高模型复杂性，这也导致了RO的问题。受此启发，我们提出了一种简单有效的自适应缩放（AdS）机制，将AT性能接近于整合注意力的SSM，而不会引入RO问题。我们的代码可在https://github.com/Biqing-Qi/Exploring-Adversarial-Robustness-of-Deep-State-Space-Models.git上找到。

更新时间: 2024-10-09 02:28:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.05532v2

AuditWen:An Open-Source Large Language Model for Audit

Intelligent auditing represents a crucial advancement in modern audit practices, enhancing both the quality and efficiency of audits within the realm of artificial intelligence. With the rise of large language model (LLM), there is enormous potential for intelligent models to contribute to audit domain. However, general LLMs applied in audit domain face the challenges of lacking specialized knowledge and the presence of data biases. To overcome these challenges, this study introduces AuditWen, an open-source audit LLM by fine-tuning Qwen with constructing instruction data from audit domain. We first outline the application scenarios for LLMs in the audit and extract requirements that shape the development of LLMs tailored for audit purposes. We then propose an audit LLM, called AuditWen, by fine-tuning Qwen with constructing 28k instruction dataset from 15 audit tasks and 3 layers. In evaluation stage, we proposed a benchmark with 3k instructions that covers a set of critical audit tasks derived from the application scenarios. With the benchmark, we compare AuditWen with other existing LLMs from information extraction, question answering and document generation. The experimental results demonstrate superior performance of AuditWen both in question understanding and answer generation, making it an immediately valuable tool for audit.

Updated: 2024-10-09 02:28:55

标题: 审计文：一种用于审计的开源大型语言模型

摘要: 智能审计代表了现代审计实践中的重要进展，提升了人工智能领域内审计的质量和效率。随着大型语言模型（LLM）的兴起，智能模型在审计领域有巨大潜力。然而，应用于审计领域的通用LLM面临缺乏专业知识和数据偏见的挑战。为了克服这些挑战，本研究引入了AuditWen，一个通过从审计领域构建指令数据对Qwen进行微调的开源审计LLM。我们首先概述了LLM在审计中的应用场景，并提取了塑造为审计目的定制的LLM发展的要求。然后，我们通过使用来自15个审计任务和3个层次的28k指令数据集对Qwen进行微调，提出了一个审计LLM，称为AuditWen。在评估阶段，我们提出了一个包含来自应用场景的一组关键审计任务的3k指令的基准。借助这个基准，我们将AuditWen与其他现有的LLM进行了比较，包括信息提取、问题回答和文档生成。实验结果表明，AuditWen在问题理解和答案生成方面表现出优越性能，使其成为一个立即有价值的审计工具。

更新时间: 2024-10-09 02:28:55

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2410.10873v1

Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners

Human learning thrives on the ability to learn from mistakes, adapt through feedback, and refine understanding-processes often missing in static machine learning models. In this work, we introduce Composite Learning Units (CLUs) designed to transform reasoners, such as Large Language Models (LLMs), into learners capable of generalized, continuous learning without conventional parameter updates while enhancing their reasoning abilities through continual interaction and feedback. CLUs are built on an architecture that allows a reasoning model to maintain and evolve a dynamic knowledge repository: a General Knowledge Space for broad, reusable insights and a Prompt-Specific Knowledge Space for task-specific learning. Through goal-driven interactions, CLUs iteratively refine these knowledge spaces, enabling the system to adapt dynamically to complex tasks, extract nuanced insights, and build upon past experiences autonomously. We demonstrate CLUs' effectiveness through a cryptographic reasoning task, where they continuously evolve their understanding through feedback to uncover hidden transformation rules. While conventional models struggle to grasp underlying logic, CLUs excel by engaging in an iterative, goal-oriented process. Specialized components-handling knowledge retrieval, prompt generation, and feedback analysis-work together within a reinforcing feedback loop. This approach allows CLUs to retain the memory of past failures and successes, adapt autonomously, and apply sophisticated reasoning effectively, continually learning from mistakes while also building on breakthroughs.

Updated: 2024-10-09 02:27:58

标题: 复合学习单元：将LLMs转变为自适应推理器的广义学习超越参数更新

摘要: 人类学习依赖于从错误中学习、通过反馈进行调整，并完善理解的能力，这些过程通常在静态机器学习模型中缺失。在这项工作中，我们介绍了复合学习单元（CLUs），旨在将推理者，如大型语言模型（LLMs），转变为能够进行泛化、持续学习的学习者，而不需要传统的参数更新，同时通过持续的交互和反馈增强他们的推理能力。CLUs建立在一个架构之上，允许一个推理模型维护和发展一个动态的知识库：一个用于广泛、可重复利用见解的通用知识空间，以及一个用于任务特定学习的特定提示知识空间。通过目标驱动的交互，CLUs迭代地完善这些知识空间，使系统能够动态适应复杂任务，提取微妙的见解，并自主地建立在过去经验的基础上。我们通过一个加密推理任务展示了CLUs的有效性，在这个任务中，他们通过反馈不断演化他们的理解，以揭示隐藏的转换规则。而传统模型很难理解潜在的逻辑，CLUs通过参与一个迭代、目标导向的过程而表现出色。专门的组件-处理知识检索、提示生成和反馈分析-在一个增强反馈循环中共同工作。这种方法使CLUs能够保留过去失败和成功的记忆，自主地适应，并有效地应用复杂的推理，不断从错误中学习，同时也在突破中积累。

更新时间: 2024-10-09 02:27:58

领域: cs.LG,cs.AI,cs.CL,cs.MA

下载: http://arxiv.org/abs/2410.08037v1

SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On

Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only, even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies focus on how to optimize the encoder module extensively in the paradigm, but they overlook how to optimize the predictor module. In this paper, we discover the critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called *Session-based Recommendation with Predictor Add-On* (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user's behavior for prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real-world benchmark datasets for three state-of-the-art models show that *SR-PredictAO* out-performs the current state-of-the-art model by up to 2.9% in HR@20 and 2.3% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, and is statistically significant, which could be regarded as a significant contribution in the field.

Updated: 2024-10-09 02:27:04

标题: SR-PredictAO：具有高性能预测器附加功能的基于会话的推荐

摘要: 基于会话的推荐旨在仅基于单个会话中的信息预测用户的下一个项目点击，即使存在一些随机用户行为，这是一个复杂的问题。这个复杂的问题需要一个高能力的模型来预测用户的下一个行动。大多数（如果不是所有）现有模型都遵循编码器-预测器范式，在这个范式中所有研究都集中在如何广泛优化编码器模块，但它们忽视了如何优化预测器模块。在本文中，我们发现现有模型中低能力预测器模块的关键问题。在此基础上，我们提出了一个名为“Session-based Recommendation with Predictor Add-On”（SR-PredictAO）的新框架。在这个框架中，我们提出了一个高能力的预测器模块，可以减轻随机用户行为对预测的影响。值得一提的是，这个框架可以应用于任何现有模型，从而为进一步优化框架提供机会。对三种最先进模型的两个真实世界基准数据集进行的广泛实验表明，“SR-PredictAO”在HR@20和MRR@20上的表现优于当前最先进模型高达2.9％和2.3％。更重要的是，改进在几乎所有现有模型的所有数据集上都是一致的，并且在统计上具有显著意义，可以被视为该领域的重要贡献。

更新时间: 2024-10-09 02:27:04

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2309.12218v2

Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image

Diabetic macular edema (DME) is a severe complication of diabetes, characterized by thickening of the central portion of the retina due to accumulation of fluid. DME is a significant and common cause of visual impairment in diabetic patients. Center-involved DME (ci-DME) is the highest risk form of disease as fluid extends close to the fovea which is responsible for sharp central vision. Earlier diagnosis or prediction of ci-DME may improve treatment outcomes. Here, we propose an ensemble method to predict ci-DME onset within a year using ultra-wide-field color fundus photography (UWF-CFP) images provided by the DIAMOND Challenge. We adopted a variety of baseline state-of-the-art classification networks including ResNet, DenseNet, EfficientNet, and VGG with the aim of enhancing model robustness. The best performing models were Densenet 121, Resnet 152 and EfficientNet b7, and these were assembled into a definitive predictive model. The final ensemble model demonstrates a strong performance with an Area Under Curve (AUC) of 0.7017, an F1 score of 0.6512, and an Expected Calibration Error (ECE) of 0.2057 when deployed on a synthetic dataset. The performance of this ensemble model is comparable to previous studies despite training and testing in a more realistic setting, indicating the potential of UWF-CFP combined with a deep learning classification system to facilitate earlier diagnosis, better treatment decisions, and improved prognostication in ci-DME.

Updated: 2024-10-09 02:16:29

标题: 深度学习集成用于预测使用超广角彩色底片图像诊断糖尿病黄斑水肿的发作

摘要: 糖尿病性黄斑水肿（DME）是糖尿病的严重并发症，其特点是由于液体积聚导致视网膜中央部位增厚。DME是糖尿病患者视力障碍的重要和常见原因。涉及中央的DME（ci-DME）是疾病最高风险形式，因为液体延伸至负责清晰中央视觉的黄斑附近。早期诊断或预测ci-DME可能会改善治疗结果。在这里，我们提出了一种集成方法，使用DIAMOND Challenge提供的超广角彩色眼底摄影（UWF-CFP）图像来预测一年内ci-DME的发作。我们采用了各种基准现代分类网络，包括ResNet、DenseNet、EfficientNet和VGG，旨在增强模型的稳健性。表现最佳的模型是Densenet 121、Resnet 152和EfficientNet b7，这些模型被组装成一个确定性的预测模型。最终的集成模型展现出强大的性能，当在合成数据集上部署时，具有0.7017的曲线下面积（AUC）、0.6512的F1得分和0.2057的期望校准误差（ECE）。尽管在更现实的环境中进行训练和测试，这个集成模型的表现与以往研究相当，表明UWF-CFP结合深度学习分类系统有助于促进ci-DME的早期诊断、更好的治疗决策和改善预后的潜力。

更新时间: 2024-10-09 02:16:29

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.06483v1

OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement

Decentralized Federated Learning (DFL) surpasses Centralized Federated Learning (CFL) in terms of faster training, privacy preservation, and light communication, making it a promising alternative in the field of federated learning. However, DFL still exhibits significant disparities with CFL in terms of generalization ability such as rarely theoretical understanding and degraded empirical performance due to severe inconsistency. In this paper, we enhance the consistency of DFL by developing an opposite lookahead enhancement technique (Ole), yielding OledFL to optimize the initialization of each client in each communication round, thus significantly improving both the generalization and convergence speed. Moreover, we rigorously establish its convergence rate in non-convex setting and characterize its generalization bound through uniform stability, which provides concrete reasons why OledFL can achieve both the fast convergence speed and high generalization ability. Extensive experiments conducted on the CIFAR10 and CIFAR100 datasets with Dirichlet and Pathological distributions illustrate that our OledFL can achieve up to 5\% performance improvement and 8$\times$ speedup, compared to the most popular DFedAvg optimizer in DFL.

Updated: 2024-10-09 02:16:14

标题: OledFL：通过相反的前瞻增强释放分散式联邦学习的潜力

摘要: 分散式联邦学习（DFL）在训练速度、隐私保护和轻量通信方面超过了集中式联邦学习（CFL），使其成为联邦学习领域中一种有前途的替代方案。然而，与CFL相比，DFL仍然存在显著差异，如很少的理论理解和由于严重不一致性导致的降低的实证性能。在本文中，我们通过开发一种逆向前瞻增强技术（Ole）来提高DFL的一致性，从而产生OledFL，优化每个通信轮次中每个客户端的初始化，从而显著提高泛化和收敛速度。此外，我们在非凸设置中严格建立了其收敛速度，并通过统一稳定性表征其泛化界限，这提供了OledFL能够实现快速收敛速度和高泛化能力的具体原因。在CIFAR10和CIFAR100数据集上进行了大量实验，采用狄利克雷和病态分布，结果表明我们的OledFL相比于DFL中最流行的DFedAvg优化器，可以实现高达5%的性能提升和8倍的加速。

更新时间: 2024-10-09 02:16:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06482v1

Combining AI Control Systems and Human Decision Support via Robustness and Criticality

AI-enabled capabilities are reaching the requisite level of maturity to be deployed in the real world, yet do not always make correct or safe decisions. One way of addressing these concerns is to leverage AI control systems alongside and in support of human decisions, relying on the AI control system in safe situations while calling on a human co-decider for critical situations. We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks, including MuZero. Multiple improvements to the base agent architecture are proposed. We demonstrate how this technology has two applications: for intelligent decision tools and to enhance training / learning frameworks. In a decision support context, adversarial explanations help a user make the correct decision by highlighting those contextual factors that would need to change for a different AI-recommended decision. As another benefit of adversarial explanations, we show that the learned AI control system demonstrates robustness against adversarial tampering. Additionally, we supplement AE by introducing strategically similar autoencoders (SSAs) to help users identify and understand all salient factors being considered by the AI system. In a training / learning framework, this technology can improve both the AI's decisions and explanations through human interaction. Finally, to identify when AI decisions would most benefit from human oversight, we tie this combined system to our prior art on statistically verified analyses of the criticality of decisions at any point in time.

Updated: 2024-10-09 02:16:02

标题: 结合人工智能控制系统和人类决策支持：通过稳健性和关键性

摘要: 人工智能（AI）能力已经达到了在现实世界中部署的成熟水平，但并不总是做出正确或安全的决策。解决这些问题的一种方法是利用AI控制系统，支持人类决策，依靠AI控制系统在安全情况下，同时在关键情况下寻求人类共同决策者的帮助。我们将对抗性解释（AE）方法扩展到最先进的强化学习框架，包括MuZero。提出了对基本代理架构的多种改进。我们展示了这项技术有两个应用：用于智能决策工具和增强培训/学习框架。在决策支持的背景下，对抗性解释通过突出显示那些需要改变的情境因素，帮助用户做出正确的决定，以实现不同的AI推荐决策。作为对抗性解释的另一个好处，我们展示了学习的AI控制系统表现出对抗性篡改的稳健性。此外，我们通过引入战略上相似的自动编码器（SSAs）来补充AE，帮助用户识别和理解AI系统考虑的所有重要因素。在培训/学习框架中，这项技术可以通过人类互动改善AI的决策和解释。最后，为了确定何时AI决策最需要人类监督，我们将这个联合系统与我们在任何时间点对决策重要性进行统计验证分析的先前技术联系起来。

更新时间: 2024-10-09 02:16:02

领域: cs.LG,cs.AI,cs.NE,68T07,I.2.6

下载: http://arxiv.org/abs/2407.03210v2

Leaf Stripping on Uniform Attachment Trees

In this note we analyze the performance of a simple root-finding algorithm in uniform attachment trees. The leaf-stripping algorithm recursively removes all leaves of the tree for a carefully chosen number of rounds. We show that, with probability $1 - \epsilon$, the set of remaining vertices contains the root and has a size only depending on $\epsilon$ but not on the size of the tree.

Updated: 2024-10-09 02:15:57

标题: 统一附着树上的叶片剥离

摘要: 在这篇论文中，我们分析了在均匀附加树中一个简单的根查找算法的性能。叶剥离算法递归地移除树的所有叶子节点，直到经过精心选择的轮数为止。我们证明，有概率$1 - \epsilon$，剩余顶点集合包含根节点，并且其大小仅取决于$\epsilon$，而不取决于树的大小。

更新时间: 2024-10-09 02:15:57

领域: math.PR,cs.DS,cs.LG,stat.ML

下载: http://arxiv.org/abs/2410.06481v1

TCGU: Data-centric Graph Unlearning based on Transferable Condensation

With growing demands for data privacy and model robustness, graph unlearning (GU), which erases the influence of specific data on trained GNN models, has gained significant attention. However, existing exact unlearning methods suffer from either low efficiency or poor model performance. While being more utility-preserving and efficient, current approximate unlearning methods are not applicable in the zero-glance privacy setting, where the deleted samples cannot be accessed during unlearning due to immediate deletion requested by regulations. Besides, these approximate methods, which try to directly perturb model parameters still involve high privacy concerns in practice. To fill the gap, we propose Transferable Condensation Graph Unlearning (TCGU), a data-centric solution to zero-glance graph unlearning. Specifically, we first design a two-level alignment strategy to pre-condense the original graph into a small yet utility-preserving dataset. Upon receiving an unlearning request, we fine-tune the pre-condensed data with a low-rank plugin, to directly align its distribution with the remaining graph, thus efficiently revoking the information of deleted data without accessing them. A novel similarity distribution matching approach and a discrimination regularizer are proposed to effectively transfer condensed data and preserve its utility in GNN training, respectively. Finally, we retrain the GNN on the transferred condensed data. Extensive experiments on 6 benchmark datasets demonstrate that TCGU can achieve superior performance in terms of model utility, unlearning efficiency, and unlearning efficacy than existing GU methods.

Updated: 2024-10-09 02:14:40

标题: TCGU：基于可转移浓缩的数据中心图去学习

摘要: 随着对数据隐私和模型稳健性需求的增长，图形遗忘（GU）已经引起了人们的重视，它可以消除特定数据对经过训练的GNN模型的影响。然而，现有的准确遗忘方法要么效率低，要么模型性能差。当前的近似遗忘方法在保留更多效用并提高效率方面表现更好，但在零查看隐私设置中却无法应用，因为由于法规要求立即删除，被删除的样本在遗忘过程中无法访问。此外，这些近似方法试图直接扰动模型参数，在实践中仍存在高隐私问题。为填补这一空白，我们提出了可转移的凝结图形遗忘（TCGU），这是一种数据中心的零查看图形遗忘解决方案。具体来说，我们首先设计了两级对齐策略，将原始图形预浓缩为一个小型但保留效用的数据集。在收到遗忘请求后，我们通过低秩插件对预浓缩数据进行微调，直接将其分布与剩余图形对齐，从而有效地撤销已删除数据的信息，而无需访问它们。提出了一种新颖的相似度分布匹配方法和一个区分正则化器，以有效地传输浓缩数据并在GNN训练中保留其效用。最后，我们在转移后的浓缩数据上重新训练GNN。对6个基准数据集进行的大量实验证明，TCGU在模型效用、遗忘效率和遗忘效果方面均优于现有的GU方法。

更新时间: 2024-10-09 02:14:40

领域: cs.LG

下载: http://arxiv.org/abs/2410.06480v1

Flipping-based Policy for Chance-Constrained Markov Decision Processes

Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choices, chance constraints are often more pragmatic for incorporating safety under uncertainties. This paper proposes a \textit{flipping-based policy} for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates. The probability of the flip and the two action candidates vary depending on the state. We establish a Bellman equation for CCMDPs and further prove the existence of a flipping-based policy within the optimal solution sets. Since solving the problem with joint chance constraints is challenging in practice, we then prove that joint chance constraints can be approximated into Expected Cumulative Safety Constraints (ECSCs) and that there exists a flipping-based policy in the optimal solution sets for constrained MDPs with ECSCs. As a specific instance of practical implementations, we present a framework for adapting constrained policy optimization to train a flipping-based policy. This framework can be applied to other safe RL algorithms. We demonstrate that the flipping-based policy can improve the performance of the existing safe RL algorithms under the same limits of safety constraints on Safety Gym benchmarks.

Updated: 2024-10-09 02:00:39

标题: 基于翻转的概率约束马尔可夫决策过程策略

摘要: 安全强化学习（RL）是许多现实世界决策问题的一种有前景的方法，其中确保安全是至关重要的。在安全RL研究中，尽管通常首选预期累积安全约束（ECSCs），但在不确定性下，机会约束往往更加实用用于融入安全性。本文提出了一种基于翻转的策略，用于机会约束马尔可夫决策过程（CCMDPs）。基于翻转的策略通过在两个行动候选项之间抛掷一个潜在扭曲的硬币来选择下一个行动。翻转的概率和两个行动候选项取决于状态。我们建立了CCMDPs的贝尔曼方程，并进一步证明了在最优解集合中存在基于翻转的策略。由于在实践中解决具有联合机会约束的问题是具有挑战性的，我们进一步证明了联合机会约束可以近似为预期累积安全约束（ECSCs），并且存在一个基于翻转的策略在具有ECSCs的约束MDPs中的最优解集合中。作为实际实现的一个具体实例，我们提出了一个框架，用于将受限策略优化调整为训练基于翻转的策略。这个框架可以应用于其他安全RL算法。我们证明，在安全性约束的限制下，基于翻转的策略可以改进现有的安全RL算法在Safety Gym基准测试中的性能。

更新时间: 2024-10-09 02:00:39

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2410.06474v1

Safety Margins for Reinforcement Learning

Any autonomous controller will be unsafe in some situations. The ability to quantitatively identify when these unsafe situations are about to occur is crucial for drawing timely human oversight in, e.g., freight transportation applications. In this work, we demonstrate that the true criticality of an agent's situation can be robustly defined as the mean reduction in reward given some number of random actions. Proxy criticality metrics that are computable in real-time (i.e., without actually simulating the effects of random actions) can be compared to the true criticality, and we show how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. We evaluate our approach on learned policies from APE-X and A3C within an Atari environment, and demonstrate how safety margins decrease as agents approach failure states. The integration of safety margins into programs for monitoring deployed agents allows for the real-time identification of potentially catastrophic situations.

Updated: 2024-10-09 01:57:11

标题: 强化学习的安全边界

摘要: 在某些情况下，任何自主控制器都可能不安全。定量地识别这些不安全情况即将发生的能力对于及时引入人类监督至关重要，例如在货运运输应用中。在这项工作中，我们展示了一个代理的真实关键性可以坚定地定义为给定一定数量的随机动作后奖励的平均减少量。可以在实时计算（即，不实际模拟随机动作的效果）的代理关键性指标可以与真实关键性进行比较，并展示了如何利用这些代理指标生成安全边界，直接将潜在错误行为的后果与预期的整体绩效损失联系起来。我们在Atari环境中从APE-X和A3C学习到的策略上评估了我们的方法，并展示了随着代理接近失败状态，安全边界如何减少。将安全边界集成到用于监控部署代理的程序中，可以实时识别潜在的灾难性情况。

更新时间: 2024-10-09 01:57:11

领域: cs.LG,cs.AI,cs.SY,eess.SY,68T07,I.2.6

下载: http://arxiv.org/abs/2307.13642v2

Enabling Novel Mission Operations and Interactions with ROSA: The Robot Operating System Agent

The advancement of robotic systems has revolutionized numerous industries, yet their operation often demands specialized technical knowledge, limiting accessibility for non-expert users. This paper introduces ROSA (Robot Operating System Agent), an AI-powered agent that bridges the gap between the Robot Operating System (ROS) and natural language interfaces. By leveraging state-of-the-art language models and integrating open-source frameworks, ROSA enables operators to interact with robots using natural language, translating commands into actions and interfacing with ROS through well-defined tools. ROSA's design is modular and extensible, offering seamless integration with both ROS1 and ROS2, along with safety mechanisms like parameter validation and constraint enforcement to ensure secure, reliable operations. While ROSA is originally designed for ROS, it can be extended to work with other robotics middle-wares to maximize compatibility across missions. ROSA enhances human-robot interaction by democratizing access to complex robotic systems, empowering users of all expertise levels with multi-modal capabilities such as speech integration and visual perception. Ethical considerations are thoroughly addressed, guided by foundational principles like Asimov's Three Laws of Robotics, ensuring that AI integration promotes safety, transparency, privacy, and accountability. By making robotic technology more user-friendly and accessible, ROSA not only improves operational efficiency but also sets a new standard for responsible AI use in robotics and potentially future mission operations. This paper introduces ROSA's architecture and showcases initial mock-up operations in JPL's Mars Yard, a laboratory, and a simulation using three different robots. The core ROSA library is available as open-source.

Updated: 2024-10-09 01:54:02

标题: 利用ROSA实现新型任务操作和交互：机器人操作系统代理

摘要: 机器人系统的进步已经彻底改变了许多行业，但它们的操作往往需要专门的技术知识，限制了非专家用户的可访问性。本文介绍了ROSA（Robot Operating System Agent），这是一款由人工智能驱动的代理，弥合了机器人操作系统（ROS）和自然语言接口之间的差距。通过利用最先进的语言模型并集成开源框架，ROSA使操作员能够使用自然语言与机器人进行交互，将命令转换为动作，并通过定义良好的工具与ROS进行接口。ROSA的设计是模块化和可扩展的，可以与ROS1和ROS2无缝集成，还提供参数验证和约束执行等安全机制，以确保安全可靠的操作。虽然ROSA最初是为ROS设计的，但可以扩展到与其他机器人中间件一起工作，以最大程度地提高任务的兼容性。ROSA通过使复杂的机器人系统更易于访问，赋予所有专业水平的用户多模态功能（如语音集成和视觉感知），从而增强了人机交互。道德考虑得到了充分解决，以亚西莫夫的《机器人三大定律》等基础原则为指导，确保AI整合促进安全、透明、隐私和问责。通过使机器人技术更加用户友好和可访问，ROSA不仅提高了操作效率，还为机器人技术的负责任使用设立了新标准，可能也为未来任务操作设立了新标准。本文介绍了ROSA的架构，并展示了在JPL的火星场地、实验室和使用三种不同机器人进行模拟的初始模拟操作。核心ROSA库可作为开源提供。

更新时间: 2024-10-09 01:54:02

领域: cs.RO,cs.AI,cs.HC

下载: http://arxiv.org/abs/2410.06472v1

Does Spatial Cognition Emerge in Frontier Models?

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that contemporary frontier models fall short of the spatial intelligence of animals, performing near chance level on a number of classic tests of animal cognition.

Updated: 2024-10-09 01:41:49

标题: Frontier Models中出现空间认知吗？

摘要: 我们提出了SPACE，一个系统评估前沿模型中空间认知的基准。我们的基准建立在几十年的认知科学研究基础上。它评估了生物在穿越物理环境时所展现的大规模映射能力，对物体形状和布局进行的小规模推理，以及空间注意力和记忆等认知基础设施。对于许多任务，我们通过文本和图像实例化并行呈现，使我们能够评估大语言模型和大多模态模型。结果表明，当代前沿模型的空间智能还不及动物，在许多经典动物认知测试中表现接近于随机水平。

更新时间: 2024-10-09 01:41:49

领域: cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.06468v1

WAPITI: A Watermark for Finetuned Open-Source LLMs

Watermarking of large language models (LLMs) generation embeds an imperceptible statistical pattern within texts, making it algorithmically detectable. Watermarking is a promising method for addressing potential harm and biases from LLMs, as it enables traceability, accountability, and detection of manipulated content, helping to mitigate unintended consequences. However, for open-source models, watermarking faces two major challenges: (i) incompatibility with fine-tuned models, and (ii) vulnerability to fine-tuning attacks. In this work, we propose WAPITI, a new method that transfers watermarking from base models to fine-tuned models through parameter integration. To the best of our knowledge, we propose the first watermark for fine-tuned open-source LLMs that preserves their fine-tuned capabilities. Furthermore, our approach offers an effective defense against fine-tuning attacks. We test our method on various model architectures and watermarking strategies. Results demonstrate that our method can successfully inject watermarks and is highly compatible with fine-tuned models. Additionally, we offer an in-depth analysis of how parameter editing influences the watermark strength and overall capabilities of the resulting models.

Updated: 2024-10-09 01:41:14

标题: 麋鹿：一个用于微调开源LLMs的水印

摘要: 大型语言模型（LLMs）的水印生成将一种不可察觉的统计模式嵌入文本中，使其在算法上可检测。水印是解决LLMs潜在危害和偏见的一种有前途的方法，因为它可以追踪、问责和检测被篡改的内容，有助于减轻意外后果。然而，对于开源模型，水印面临两个主要挑战：（i）与微调模型不兼容，（ii）易受微调攻击。在这项工作中，我们提出了一种新方法WAPITI，通过参数集成将水印从基础模型转移到微调模型。据我们所知，我们提出了第一个保留微调能力的开源LLMs水印。此外，我们的方法提供了有效的防御措施来抵御微调攻击。我们测试了我们的方法在各种模型架构和水印策略上的效果。结果表明，我们的方法可以成功注入水印，并与微调模型高度兼容。此外，我们对参数编辑如何影响水印强度以及最终模型的整体能力进行了深入分析。

更新时间: 2024-10-09 01:41:14

领域: cs.CR

下载: http://arxiv.org/abs/2410.06467v1

Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders

The research builds and evaluates the adversarial potential to introduce copied code or hallucinated AI recommendations for malicious code in popular code repositories. While foundational large language models (LLMs) from OpenAI, Google, and Anthropic guard against both harmful behaviors and toxic strings, previous work on math solutions that embed harmful prompts demonstrate that the guardrails may differ between expert contexts. These loopholes would appear in mixture of expert's models when the context of the question changes and may offer fewer malicious training examples to filter toxic comments or recommended offensive actions. The present work demonstrates that foundational models may refuse to propose destructive actions correctly when prompted overtly but may unfortunately drop their guard when presented with a sudden change of context, like solving a computer programming challenge. We show empirical examples with trojan-hosting repositories like GitHub, NPM, NuGet, and popular content delivery networks (CDN) like jsDelivr which amplify the attack surface. In the LLM's directives to be helpful, example recommendations propose application programming interface (API) endpoints which a determined domain-squatter could acquire and setup attack mobile infrastructure that triggers from the naively copied code. We compare this attack to previous work on context-shifting and contrast the attack surface as a novel version of "living off the land" attacks in the malware literature. In the latter case, foundational language models can hijack otherwise innocent user prompts to recommend actions that violate their owners' safety policies when posed directly without the accompanying coding support request.

Updated: 2024-10-09 01:36:25

标题: 幻觉人工智能劫持攻击：大型语言模型和恶意代码推荐算法

摘要: 这项研究建立并评估了引入复制代码或虚构的人工智能建议以用于流行代码库中恶意代码的对抗潜力。虽然来自OpenAI、谷歌和Anthropic的基础性大型语言模型(LLMs)可以抵御有害行为和有毒字符，但先前关于嵌入有害提示的数学解决方案的工作表明，在专家环境中，防护措施可能存在差异。当问题的上下文发生变化时，这些漏洞可能出现在专家模型的混合中，并且可能提供较少的恶意训练示例以过滤有毒评论或推荐的攻击性行为。本研究表明，基础模型在被明确提示时可能会拒绝正确提出破坏性行动，但当出现突然的上下文变化时，如解决计算机编程挑战时，可能不幸地放松警惕。我们展示了类似GitHub、NPM、NuGet和流行内容传输网络(CDN)jsDelivr等特洛伊木马托管库的实证例子，这些库扩大了攻击面。在LLM的指导下，示例建议提议应用程序编程接口(API)端点，这些端点可能会被决心的域抢注者获取并设置攻击移动基础设施，该基础设施从天真复制的代码中触发。我们将这种攻击与先前关于上下文转移的工作进行了比较，并将攻击面作为恶意软件文献中的“利用现有资源”攻击的新版本进行对比。在后一种情况下，基础语言模型可以利用否则无害的用户提示，建议违反其所有者的安全政策的行动，而不需要附带的编码支持请求。

更新时间: 2024-10-09 01:36:25

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2410.06462v1

Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks. This survey paper presents a comprehensive analysis of recent advancements in attack strategies and defense mechanisms within the field of Large Language Model (LLM) red-teaming. We analyze various attack methods, including gradient-based optimization, reinforcement learning, and prompt engineering approaches. We discuss the implications of these attacks on LLM safety and the need for improved defense mechanisms. This work aims to provide a thorough understanding of the current landscape of red-teaming attacks and defenses on LLMs, enabling the development of more secure and reliable language models.

Updated: 2024-10-09 01:35:38

标题: 最近LLM红队技术的进展：技术、防御和道德考虑

摘要: 大型语言模型（LLMs）在自然语言处理任务中展示了出色的能力，但它们对越狱攻击的脆弱性带来了重大的安全风险。本调查论文对大型语言模型（LLM）红队行动领域内攻击策略和防御机制的最新进展进行了全面分析。我们分析了各种攻击方法，包括基于梯度的优化、强化学习和提示工程方法。我们讨论了这些攻击对LLM安全性的影响以及对改进防御机制的需求。这项工作旨在全面了解当前LLMs上红队攻击和防御的现状，从而促进更安全可靠的语言模型的发展。

更新时间: 2024-10-09 01:35:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.09097v1

A Benchmark on Directed Graph Representation Learning in Hardware Designs

To keep pace with the rapid advancements in design complexity within modern computing systems, directed graph representation learning (DGRL) has become crucial, particularly for encoding circuit netlists, computational graphs, and developing surrogate models for hardware performance prediction. However, DGRL remains relatively unexplored, especially in the hardware domain, mainly due to the lack of comprehensive and user-friendly benchmarks. This study presents a novel benchmark comprising five hardware design datasets and 13 prediction tasks spanning various levels of circuit abstraction. We evaluate 21 DGRL models, employing diverse graph neural networks and graph transformers (GTs) as backbones, enhanced by positional encodings (PEs) tailored for directed graphs. Our results highlight that bidirected (BI) message passing neural networks (MPNNs) and robust PEs significantly enhance model performance. Notably, the top-performing models include PE-enhanced GTs interleaved with BI-MPNN layers and BI-Graph Isomorphism Network, both surpassing baselines across the 13 tasks. Additionally, our investigation into out-of-distribution (OOD) performance emphasizes the urgent need to improve OOD generalization in DGRL models. This benchmark, implemented with a modular codebase, streamlines the evaluation of DGRL models for both hardware and ML practitioners

Updated: 2024-10-09 01:32:48

标题: 一个关于硬件设计中有向图表示学习的基准测试

摘要: 为了跟上现代计算系统设计复杂性的快速发展，定向图表示学习（DGRL）变得至关重要，特别是用于编码电路清单、计算图，并开发用于硬件性能预测的替代模型。然而，DGRL 在硬件领域仍相对未被探索，主要是由于缺乏全面且用户友好的基准测试。本研究提出了一个新颖的基准测试，包括五个硬件设计数据集和涵盖各种电路抽象级别的13个预测任务。我们评估了21个DGRL模型，采用多样的图神经网络和图变换器（GTs）作为骨干，通过为定向图量身定制的位置编码（PEs）增强。我们的结果凸显出，双向（BI）消息传递神经网络（MPNNs）和强大的PEs显著提升了模型性能。值得注意的是，表现最佳的模型包括PE增强GTs与BI-MPNN层交错以及BI-图同构网络，两者都超越了13个任务的基线。此外，我们对分布外（OOD）性能的调查强调了在DGRL模型中改善OOD泛化的紧迫需求。这个基准测试使用模块化代码库实现，简化了硬件和机器学习从业者对DGRL模型的评估。

更新时间: 2024-10-09 01:32:48

领域: cs.LG

下载: http://arxiv.org/abs/2410.06460v1

LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints

Instruction following is a key capability for LLMs. However, recent studies have shown that LLMs often struggle with instructions containing multiple constraints (e.g. a request to create a social media post "in a funny tone" with "no hashtag"). Despite this, most evaluations focus solely on synthetic data. To address this, we introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions by leveraging queries real users asked AI assistants. We also investigate model-based evaluation as a cost-effective alternative to human annotation for this task. Our findings reveal that even the proprietary GPT-4 model fails to meet at least one constraint on over 21% of instructions, highlighting the limitations of state-of-the-art models. To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline, which enhances LLMs' ability to follow constraints. DeCRIM works by decomposing the original instruction into a list of constraints and using a Critic model to decide when and where the LLM's response needs refinement. Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback. Moreover, we demonstrate that with strong feedback, open-source LLMs with DeCRIM can outperform GPT-4 on both benchmarks.

Updated: 2024-10-09 01:25:10

标题: LLM自我校正与DeCRIM：通过分解、批判和完善以增强在多个约束条件下遵循指令

摘要: 指令遵循是LLMs的关键能力。然而，最近的研究表明，LLMs经常在包含多个约束的指令（例如要求以“幽默的语调”创建社交媒体帖子，但“不要使用标签”）上遇到困难。尽管如此，大多数评估仍然专注于合成数据。为了解决这个问题，我们引入了RealInstruct，这是第一个旨在通过利用真实用户向AI助手提出的查询来评估LLMs遵循真实世界多约束指令能力的基准。我们还探讨了基于模型的评估作为这项任务的一种成本效益的替代方法，而不是人工标注。我们的研究结果显示，即使专有的GPT-4模型在超过21%的指令中至少有一个约束未达成，突显了现有模型的局限性。为了解决开源和专有模型之间的性能差距，我们提出了Decompose、Critic和Refine（DeCRIM）自我校正流程，该流程增强了LLMs遵循约束的能力。DeCRIM通过将原始指令分解为约束列表，并使用Critic模型来决定LLM的响应何时何地需要改进。我们的结果显示，即使反馈较弱，DeCRIM也能使Mistral在RealInstruct上的性能提高了7.3％，在IFEval上提高了8.0％。此外，我们证明了在强反馈下，具有DeCRIM的开源LLMs可以在两个基准测试中胜过GPT-4。

更新时间: 2024-10-09 01:25:10

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.06458v1

Modeling chaotic Lorenz ODE System using Scientific Machine Learning

In climate science, models for global warming and weather prediction face significant challenges due to the limited availability of high-quality data and the difficulty in obtaining it, making data efficiency crucial. In the past few years, Scientific Machine Learning (SciML) models have gained tremendous traction as they can be trained in a data-efficient manner, making them highly suitable for real-world climate applications. Despite this, very little attention has been paid to chaotic climate system modeling utilizing SciML methods. In this paper, we have integrated SciML methods into foundational weather models, where we have enhanced large-scale climate predictions with a physics-informed approach that achieves high accuracy with reduced data. We successfully demonstrate that by combining the interpretability of physical climate models with the computational power of neural networks, SciML models can prove to be a reliable tool for modeling climate. This indicates a shift from the traditional black box-based machine learning modeling of climate systems to physics-informed decision-making, leading to effective climate policy implementation.

Updated: 2024-10-09 01:17:06

标题: 利用科学机器学习对混沌洛伦兹ODE系统进行建模

摘要: 在气候科学领域，全球变暖和天气预测模型面临重大挑战，这是由于高质量数据的有限可用性和获取数据的困难，使数据效率至关重要。在过去几年中，科学机器学习（SciML）模型因其可以以数据高效的方式进行训练而获得了巨大的关注，使其非常适合于实际气候应用。尽管如此，对利用SciML方法进行混沌气候系统建模的关注非常有限。在本文中，我们将SciML方法整合到基础天气模型中，通过物理知识为基础的方法增强了大规模气候预测，实现了高准确性并减少了数据量。我们成功地证明，通过将物理气候模型的可解释性与神经网络的计算能力相结合，SciML模型可以成为建模气候的可靠工具。这表明了从传统基于黑匣子的机器学习气候系统建模转变为基于物理信息的决策制定，从而实现有效的气候政策实施。

更新时间: 2024-10-09 01:17:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06452v1

Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as unnecessarily expanded training sets, computational inefficiency from dumping the key and value cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, in this work, we propose SimulMask, a new paradigm for fine-tuning LLMs for simultaneous translation. It utilizes a novel attention mask approach that models simultaneous translation during fine-tuning by masking attention for a desired decision policy. Applying the proposed SimulMask on a Falcon LLM for the IWSLT 2017 dataset, we have observed a significant translation quality improvement compared to state-of-the-art prompting optimization strategies on five language pairs while reducing the computational cost.

Updated: 2024-10-09 01:12:19

标题: 同时屏蔽，而非提示优化：调整LLM进行同时翻译的范式转变

摘要: 大型语言模型(LLMs)在各种语言处理任务中取得了最先进的性能，这促使它们被应用于同时翻译。目前用于调整LLMs以用于同时翻译的微调方法主要集中在使用数据增强或提示结构修改的优化策略。然而，这些方法存在一些问题，例如不必要扩展的训练集、从键和值缓存中丢弃导致的计算效率低下、提示大小增加，或者仅限于单一决策策略。为了消除这些问题，在这项工作中，我们提出了SimulMask，这是一种为同时翻译微调LLMs的新范式。它利用一种新颖的注意力掩码方法，在微调过程中通过为所需的决策策略屏蔽注意力来建模同时翻译。将提出的SimulMask应用于Falcon LLM对IWSLT 2017数据集，我们观察到与最先进的提示优化策略相比，在五种语言对上显著提高了翻译质量，同时降低了计算成本。

更新时间: 2024-10-09 01:12:19

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.10443v4

Multi-label Classification for Android Malware Based on Active Learning

The existing malware classification approaches (i.e., binary and family classification) can barely benefit subsequent analysis with their outputs. Even the family classification approaches suffer from lacking a formal naming standard and an incomplete definition of malicious behaviors. More importantly, the existing approaches are powerless for one malware with multiple malicious behaviors, while this is a very common phenomenon for Android malware in the wild. So, neither of them can provide researchers with a direct and comprehensive enough understanding of malware. In this paper, we propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors. With an in-depth analysis, we summarize six basic malicious behaviors from real-world malware with security reports and construct a labeled dataset. We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%). Faced with the challenge of the expensive cost of data annotation, we further propose an active learning approach based on data augmentation, which can improve the overall accuracy to 86.7% with a data augmentation of 5,000+ high-quality samples from an unlabeled malware dataset. This is the first multi-label Android malware classification approach intending to provide more information on fine-grained malicious behaviors.

Updated: 2024-10-09 01:09:24

标题: 基于主动学习的安卓恶意软件的多标签分类

摘要: 现有的恶意软件分类方法（即二进制和家族分类）几乎无法通过它们的输出受益于后续分析。即使家族分类方法也存在着缺乏正式命名标准和恶意行为不完整定义的问题。更重要的是，现有方法对于一个恶意软件具有多个恶意行为的情况无能为力，而这在野外的Android恶意软件中是非常常见的现象。因此，它们都无法为研究人员提供对恶意软件的直接和全面的理解。在本文中，我们提出了MLCDroid，一种基于机器学习的多标签分类方法，可以直接指示预定义恶意行为的存在。通过深入分析，我们总结了来自安全报告的真实世界恶意软件的六种基本恶意行为，并构建了一个带标签的数据集。我们比较了70种算法组合的结果以评估有效性（最佳为73.3%）。面对数据标注成本昂贵的挑战，我们进一步提出了一种基于数据增强的主动学习方法，可以将整体准确率提高到86.7%，并从无标签的恶意软件数据集中获得5000多个高质量样本的数据增强。这是第一个旨在提供更多关于细粒度恶意行为信息的多标签Android恶意软件分类方法。

更新时间: 2024-10-09 01:09:24

领域: cs.CR

下载: http://arxiv.org/abs/2410.06444v1

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function approximation. However, limited works on offline RL with non-linear function approximation have instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithmic design comprises three innovative components: (1) a variance-based weighted regression scheme that can be applied to a wide range of function classes, (2) a subroutine for variance estimation, and (3) a planning phase that utilizes a pessimistic value iteration approach. Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation. Our work extends the previous instance-dependent results within simpler function classes, such as linear and differentiable function to a more general framework.

Updated: 2024-10-09 00:58:22

标题: 悲观非线性最小二乘值迭代用于离线强化学习

摘要: 离线强化学习（RL）是指代理根据由行为策略收集的数据学习最优策略的过程，在近年来吸引了越来越多的关注。虽然使用线性函数逼近的离线RL已经得到广泛研究，并在某些假设下取得了最佳结果，但许多研究转向了使用非线性函数逼近的离线RL。然而，关于非线性函数逼近的离线RL的研究有限，并且缺乏实例相关的后悔保证。在本文中，我们提出了一种名为悲观非线性最小二乘值迭代（PNLSVI）的算法，用于离线RL中的非线性函数逼近。我们的算法设计包括三个创新组件：（1）一种基于方差的加权回归方案，可应用于广泛的函数类别，（2）用于方差估计的子程序，（3）利用悲观值迭代方法的计划阶段。我们的算法具有一个后悔界限，其对函数类复杂度有紧密依赖，并在专门用于线性函数逼近时实现了最小化最优的实例相关后悔。我们的工作将之前在简单函数类别（如线性和可微函数）中的实例相关结果扩展到一个更一般的框架。

更新时间: 2024-10-09 00:58:22

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2310.01380v2

MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data

Large language models (LLMs), like ChatGPT, have shown that even trained with noisy prior data, they can generalize effectively to new tasks through in-context learning (ICL) and pre-training techniques. Motivated by this, we explore whether a similar approach can be applied to scientific foundation models (SFMs). Our methodology is structured as follows: (i) we collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries; (ii) we utilize Transformer architectures with self and cross-attention mechanisms to predict PDE solutions without knowledge of the governing equations in a zero-shot setting; (iii) we provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrate that pre-training remains robust even with approximated prior data, with only marginal impacts on test accuracy. Notably, this finding opens the path to pre-training SFMs with realistic, low-cost data instead of (or in conjunction with) numerical high-cost data. These results support the conjecture that SFMs can improve in a manner similar to LLMs, where fully cleaning the vast set of sentences crawled from the Internet is nearly impossible.

Updated: 2024-10-09 00:52:00

标题: 疯狂科学家：基于人工智能的科学家利用大规模PINN先验数据解决对流-扩散-反应方程

摘要: 大型语言模型（LLMs），如ChatGPT，已经表明，即使在嘈杂的先前数据的训练下，它们可以通过上下文学习（ICL）和预训练技术有效地推广到新任务。受此启发，我们探讨了类似的方法是否可以应用于科学基础模型（SFMs）。我们的方法结构如下：（i）我们收集低成本的基于物理信息的神经网络（PINN）近似的先前数据，这些数据以通过数学字典的任意线性组合构造的偏微分方程（PDEs）的解的形式存在；（ii）我们利用具有自注意力机制和交叉注意力机制的Transformer架构，在零-shot设置中预测PDE解，而不需要了解控制方程；（iii）我们提供了关于一维对流扩散反应方程的实验证据，表明即使使用近似的先前数据，预训练仍然保持稳健，对测试准确性的影响仅有边际影响。值得注意的是，这一发现为使用现实、低成本数据而不是（或与）数值高成本数据一起预训练SFMs打开了道路。这些结果支持了SFMs可以像LLMs一样改进的猜想，LLMs中几乎不可能完全清理从互联网上爬取的大量句子。

更新时间: 2024-10-09 00:52:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.06442v1

Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

Fine-tuning language models (LMs) with the Adam optimizer often demands excessive memory, limiting accessibility. The "in-place" version of Stochastic Gradient Descent (IP-SGD) and Memory-Efficient Zeroth-order Optimizer (MeZO) have been proposed to address this. However, IP-SGD still requires substantial memory, and MeZO suffers from slow convergence and degraded final performance due to its zeroth-order nature. This paper introduces Addax, a novel method that improves both memory efficiency and performance of IP-SGD by integrating it with MeZO. Specifically, Addax computes zeroth- or first-order gradients of data points in the minibatch based on their memory consumption, combining these gradient estimates to update directions. By computing zeroth-order gradients for data points that require more memory and first-order gradients for others, Addax overcomes the slow convergence of MeZO and the excessive memory requirement of IP-SGD. Additionally, the zeroth-order gradient acts as a regularizer for the first-order gradient, further enhancing the model's final performance. Theoretically, we establish the convergence of Addax under mild assumptions, demonstrating faster convergence and less restrictive hyper-parameter choices than MeZO. Our experiments with diverse LMs and tasks show that Addax consistently outperforms MeZO regarding accuracy and convergence speed while having a comparable memory footprint. When fine-tuning OPT-13B with one A100 GPU, on average, Addax outperforms MeZO in accuracy/F1 score by 14% and runs 15x faster while using memory similar to MeZO. In our experiments on the larger OPT-30B model, on average, Addax outperforms MeZO in terms of accuracy/F1 score by >16 and runs 30x faster on a single H100 GPU. Moreover, Addax surpasses the performance of standard fine-tuning approaches, such as IP-SGD and Adam, in most tasks with significantly less memory requirement.

Updated: 2024-10-09 00:49:08

标题: Addax：利用零阶梯度来提高语言模型微调中SGD的内存效率和性能

摘要: 使用Adam优化器对语言模型（LMs）进行微调通常需要过多的内存，限制了可访问性。为了解决这一问题，人们提出了“就地”版本的随机梯度下降（IP-SGD）和内存高效的零阶优化器（MeZO）。然而，IP-SGD仍然需要大量内存，而MeZO由于其零阶特性导致收敛速度慢，最终性能下降。本文介绍了一种新方法Addax，通过将其与MeZO集成，改善了IP-SGD的内存效率和性能。具体而言，Addax根据数据点的内存消耗计算小批量数据点的零阶或一阶梯度，将这些梯度估计结合起来更新方向。通过为需要更多内存的数据点计算零阶梯度，为其他数据点计算一阶梯度，Addax克服了MeZO的收敛速度慢和IP-SGD的内存需求过大的问题。此外，零阶梯度作为一阶梯度的正则化器，进一步提升了模型的最终性能。在理论上，我们在温和的假设下建立了Addax的收敛性，展示了比MeZO更快的收敛速度和更自由的超参数选择。我们对不同的LMs和任务进行了实验，结果表明，Addax在准确性和收敛速度方面一直优于MeZO，并且具有相似的内存占用。在使用一台A100 GPU对OPT-13B进行微调时，平均而言，Addax在准确性/F1分数方面比MeZO提高了14％，并且运行速度快15倍，同时内存占用与MeZO相似。在较大的OPT-30B模型上进行实验时，平均而言，Addax在准确性/F1分数方面优于MeZO超过16倍，并且在单个H100 GPU上运行速度快30倍。此外，Addax在大多数任务中超越了标准的微调方法，如IP-SGD和Adam，在内存需求显著较少的情况下表现更好。

更新时间: 2024-10-09 00:49:08

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2410.06441v1

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs

From common-sense reasoning to domain-specific tasks, parameter-efficient fine tuning (PEFT) methods for large language models (LLMs) have showcased significant performance improvements on downstream tasks. However, fine-tuned LLMs often struggle with overconfidence in uncertain predictions, particularly due to sparse training data. This overconfidence reflects poor epistemic uncertainty calibration, which arises from limitations in the model's ability to generalize with limited data. Existing PEFT uncertainty quantification methods for LLMs focus on the post fine-tuning stage and thus have limited capability in calibrating epistemic uncertainty. To address these limitations, we propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which captures and calibrates functional-level epistemic uncertainty during the fine-tuning stage via a mixture-of-expert framework. We show that UQ4CT reduces Expected Calibration Error (ECE) by more than $25\%$ while maintaining high accuracy across $5$ benchmarks. Furthermore, UQ4CT maintains superior ECE performance with high accuracy under distribution shift, showcasing improved generalizability.

Updated: 2024-10-09 00:09:15

标题: 在LLMs上校准微调的功能级不确定性量化

摘要: 从常识推理到特定领域任务，针对大型语言模型（LLMs）的参数高效微调（PEFT）方法在下游任务上展示了显著的性能改进。然而，经过微调的LLMs往往在不确定预测中表现出过度自信，特别是由于稀疏的训练数据。这种过度自信反映了对认识不确定性校准的不良，这是由于模型在有限数据下泛化能力受限所引起的。现有的针对LLMs的PEFT不确定性量化方法侧重于微调后的阶段，因此在校准认识不确定性方面具有有限的能力。为了解决这些限制，我们提出了用于校准微调的功能级不确定性量化（UQ4CT），通过混合专家框架在微调阶段捕获和校准功能级认识不确定性。我们展示了UQ4CT将预期校准误差（ECE）降低了超过25％，同时在5个基准测试中保持高准确性。此外，UQ4CT在分布转移下保持了卓越的ECE性能，并展示了改进的泛化能力。

更新时间: 2024-10-09 00:09:15

领域: cs.LG

下载: http://arxiv.org/abs/2410.06431v1