    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 19

Last Updated: 2024-06-26 23:55:27 (+00:00)

Bayesian identification of nonseparable Hamiltonians with multiplicative noise using deep learning and reduced-order modeling

This paper presents a structure-preserving Bayesian approach for learning nonseparable Hamiltonian systems using stochastic dynamic models allowing for statistically-dependent, vector-valued additive and multiplicative measurement noise. The approach comprises three main facets. First, we derive a Gaussian filter for a statistically-dependent, vector-valued, additive and multiplicative noise model that is needed to evaluate the likelihood within the Bayesian posterior. Second, we develop a novel algorithm for cost-effective application of Bayesian system identification to high-dimensional systems. Third, we demonstrate how structure-preserving methods can be incorporated into the proposed framework, using nonseparable Hamiltonians as an illustrative system class. We assess the method's performance based on the forecasting accuracy of a model estimated from single-trajectory data. We compare the Bayesian method to a state-of-the-art machine learning method on a canonical nonseparable Hamiltonian model and a chaotic double pendulum model with small, noisy training datasets. The results show that using the Bayesian posterior as a training objective can yield upwards of 724 times improvement in Hamiltonian mean squared error using training data with up to 10% multiplicative noise compared to a standard training objective. Lastly, we demonstrate the utility of the novel algorithm for parameter estimation of a 64-dimensional model of the spatially-discretized nonlinear Schrödinger equation with data corrupted by up to 20% multiplicative noise.
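As a sketch of the first facet, consider a scalar state observed with both additive and multiplicative measurement noise, y = x(1 + m) + a. Under Gaussian assumptions the measurement mean, variance, and cross-covariance needed for a filter update have closed forms. The scalar model and numbers below are illustrative only, not the paper's vector-valued, statistically-dependent formulation.

```python
import numpy as np

def gaussian_update(x_hat, P, y, var_m, var_a):
    """One measurement update for y = x*(1 + m) + a with
    m ~ N(0, var_m), a ~ N(0, var_a), independent of x ~ N(x_hat, P).
    Moments: E[y] = x_hat, Var[y] = P + (x_hat**2 + P)*var_m + var_a,
    Cov[x, y] = P, giving a Kalman-style gain."""
    y_hat = x_hat
    S = P + (x_hat**2 + P) * var_m + var_a   # innovation variance
    K = P / S                                 # gain
    x_new = x_hat + K * (y - y_hat)
    P_new = P - K * P
    return x_new, P_new

# Example: 10% multiplicative noise (std 0.1), small additive noise.
x_new, P_new = gaussian_update(x_hat=2.0, P=0.5, y=2.3,
                               var_m=0.01, var_a=0.05)
```

Note how the multiplicative term inflates the innovation variance through the current state estimate (x_hat**2 + P), which is what distinguishes this update from a standard Kalman filter with purely additive noise.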

Updated: 2024-06-26 23:55:27

Categories: stat.ML,cs.LG,math.DS,physics.data-an,stat.CO

Download: http://arxiv.org/abs/2401.12476v2

A Survey on Human-AI Teaming with Large Pre-Trained Models

In the rapidly evolving landscape of artificial intelligence (AI), the collaboration between human intelligence and AI systems, known as Human-AI (HAI) Teaming, has emerged as a cornerstone for advancing problem-solving and decision-making processes. The advent of Large Pre-trained Models (LPtM) has significantly transformed this landscape, offering unprecedented capabilities by leveraging vast amounts of data to understand and predict complex patterns. This paper surveys the pivotal integration of LPtMs with HAI, emphasizing how these models enhance collaborative intelligence beyond traditional approaches. It examines the potential of LPtMs in augmenting human capabilities, discussing this collaboration for AI model improvements, effective teaming, ethical considerations, and their broad applied implications in various sectors. Through this exploration, the study sheds light on the transformative impact of LPtM-enhanced HAI Teaming, providing insights for future research, policy development, and strategic implementations aimed at harnessing the full potential of this collaboration for research and societal benefit.

Updated: 2024-06-26 23:44:48

Categories: cs.AI,cs.CL,cs.HC

Download: http://arxiv.org/abs/2403.04931v2

To smooth a cloud or to pin it down: Guarantees and Insights on Score Matching in Denoising Diffusion Models

Denoising diffusion models are a class of generative models which have recently achieved state-of-the-art results across many domains. Gradual noise is added to the data using a diffusion process, which transforms the data distribution into a Gaussian. Samples from the generative model are then obtained by simulating an approximation of the time reversal of this diffusion initialized by Gaussian samples. Recent research has explored adapting diffusion models for sampling and inference tasks. In this paper, we leverage known connections to stochastic control akin to the Föllmer drift to extend established neural network approximation results for the Föllmer drift to denoising diffusion models and samplers.
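The forward (noising) process the abstract describes has a standard closed form: given a variance schedule, x_t can be sampled directly from x_0. The linear beta schedule below is a common illustrative choice, not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, t, betas):
    """Closed-form sample of the forward noising process at step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t = prod_{s<=t} (1 - beta_s). As t grows, the data
    distribution is transformed toward a standard Gaussian."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

betas = np.linspace(1e-4, 0.02, 1000)  # illustrative linear schedule
x0 = rng.standard_normal(2)
xT = forward_noise(x0, 999, betas)  # essentially pure Gaussian noise
```

At t = 999 the cumulative product alpha_bar is vanishingly small, so xT retains almost no signal from x0; the reverse-time simulation mentioned in the abstract starts from exactly such Gaussian samples.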

Updated: 2024-06-26 23:41:36

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2305.09605v3

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

Large language models (LLMs) are increasingly integrated into many online services, yet they remain cost-prohibitive to deploy due to the requirement of expensive GPU instances. Prior work has addressed the high cost of LLM serving by improving the inference engine, but less attention has been given to selecting the most cost-efficient GPU type(s) for a specific LLM service. There is a large and growing landscape of GPU types and, within these options, higher cost does not always lead to increased performance. Instead, through a comprehensive investigation, we find that three key LLM service characteristics (request size, request rate, SLO) strongly influence GPU cost efficiency, and differing GPU types are most cost efficient for differing LLM service settings. As a result, the most cost-efficient allocation for a given service is typically a mix of heterogeneous GPU types. Based on this analysis, we introduce Mélange, a GPU allocation framework that navigates these diverse LLM service characteristics and heterogeneous GPU option space to automatically and efficiently derive the minimal-cost GPU allocation for a given LLM service. We formulate the GPU allocation task as a cost-aware bin packing problem where GPUs are bins and items are slices of the service workload. Our formulation's constraints account for a service's unique characteristics, allowing Mélange to be flexible to support diverse service settings and heterogeneity-aware to adapt the GPU allocation to a specific service. Compared to using only a single GPU type, Mélange reduces deployment costs by up to 77% in conversational settings, 33% in document-based settings, and 51% in a mixed setting.
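The cost-aware objective behind the bin-packing formulation can be sketched in miniature: minimize total GPU cost subject to meeting the service's demand. The GPU names, prices, and SLO-feasible throughputs below are invented for illustration, and this toy brute-forces whole-GPU counts rather than solving the paper's ILP over workload slices.

```python
from itertools import product

# Hypothetical GPU profiles: name -> (cost per hour, requests/s that
# meet the SLO). Numbers are illustrative, not measurements from the paper.
GPUS = {"A10G": (1.0, 40.0), "A100": (4.0, 220.0), "H100": (8.0, 500.0)}

def cheapest_allocation(demand_rps, max_each=4):
    """Exhaustive search over mixed allocations: minimize total cost
    subject to aggregate throughput >= demand (the bin-packing objective,
    with GPUs as bins)."""
    best = None
    for counts in product(range(max_each + 1), repeat=len(GPUS)):
        cost = sum(c * GPUS[g][0] for c, g in zip(counts, GPUS))
        tput = sum(c * GPUS[g][1] for c, g in zip(counts, GPUS))
        if tput >= demand_rps and (best is None or cost < best[0]):
            best = (cost, dict(zip(GPUS, counts)))
    return best

cost, counts = cheapest_allocation(100.0)
```

Even in this toy setting the optimum is a type-specific choice rather than "buy the biggest GPU": for 100 requests/s, three cheap A10Gs beat a single A100 on cost, mirroring the paper's observation that higher cost does not always mean better cost efficiency.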

Updated: 2024-06-26 23:39:26

Categories: cs.DC,cs.LG

Download: http://arxiv.org/abs/2404.14527v2

PGODE: Towards High-quality System Dynamics Modeling

This paper studies the problem of modeling multi-agent dynamical systems, where agents could interact mutually to influence their behaviors. Recent research predominantly uses geometric graphs to depict these mutual interactions, which are then captured by powerful graph neural networks (GNNs). However, predicting interacting dynamics in challenging scenarios such as out-of-distribution shift and complicated underlying rules remains unsolved. In this paper, we propose a new approach named Prototypical Graph ODE (PGODE) to address the problem. The core of PGODE is to incorporate prototype decomposition from contextual knowledge into a continuous graph ODE framework. Specifically, PGODE employs representation disentanglement and system parameters to extract both object-level and system-level contexts from historical trajectories, which allows us to explicitly model their independent influence and thus enhances the generalization capability under system changes. Then, we integrate these disentangled latent representations into a graph ODE model, which determines a combination of various interacting prototypes for enhanced model expressivity. The entire model is optimized using an end-to-end variational inference framework to maximize the likelihood. Extensive experiments in both in-distribution and out-of-distribution settings validate the superiority of PGODE compared to various baselines.

Updated: 2024-06-26 23:37:46

Categories: cs.LG

Download: http://arxiv.org/abs/2311.06554v2

Operator Learning of Lipschitz Operators: An Information-Theoretic Perspective

Operator learning based on neural operators has emerged as a promising paradigm for the data-driven approximation of operators, mapping between infinite-dimensional Banach spaces. Despite significant empirical progress, our theoretical understanding regarding the efficiency of these approximations remains incomplete. This work addresses the parametric complexity of neural operator approximations for the general class of Lipschitz continuous operators. Motivated by recent findings on the limitations of specific architectures, termed curse of parametric complexity, we here adopt an information-theoretic perspective. Our main contribution establishes lower bounds on the metric entropy of Lipschitz operators in two approximation settings: uniform approximation over a compact set of input functions, and approximation in expectation, with input functions drawn from a probability measure. It is shown that these entropy bounds imply that, regardless of the activation function used, neural operator architectures attaining an approximation accuracy $\epsilon$ must have a size that is exponentially large in $\epsilon^{-1}$. The size of architectures is here measured by counting the number of encoded bits necessary to store the given model in computational memory. The results of this work elucidate fundamental trade-offs and limitations in approximating operators.

Updated: 2024-06-26 23:36:46

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2406.18794v1

MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data

We train a model to generate images from multimodal prompts of interleaved text and images such as "a <picture of a man> man and his <picture of a dog> dog in an <picture of a cartoon> animated style." We bootstrap a multimodal dataset by extracting semantically meaningful image crops corresponding to words in the image captions of synthetically generated and publicly available text-image data. Our model, MUMU, is composed of a vision-language model encoder with a diffusion decoder and is trained on a single 8xH100 GPU node. Despite being only trained on crops from the same image, MUMU learns to compose inputs from different images into a coherent output. For example, an input of a realistic person and a cartoon will output the same person in the cartoon style, and an input of a standing subject and a scooter will output the subject riding the scooter. As a result, our model generalizes to tasks such as style transfer and character consistency. Our results show the promise of using multimodal models as general purpose controllers for image generation.

Updated: 2024-06-26 23:21:42

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18790v1

Unified Uncertainties: Combining Input, Data and Model Uncertainty into a Single Formulation

Modelling uncertainty in Machine Learning models is essential for achieving safe and reliable predictions. Most research on uncertainty focuses on output uncertainty (predictions), but minimal attention is paid to uncertainty at inputs. We propose a method for propagating uncertainty in the inputs through a Neural Network that is simultaneously able to estimate input, data, and model uncertainty. Our results show that this propagation of input uncertainty results in a more stable decision boundary even under large amounts of input noise than comparatively simple Monte Carlo sampling. Additionally, we discuss and demonstrate that input uncertainty, when propagated through the model, results in model uncertainty at the outputs. The explicit incorporation of input uncertainty may be beneficial in situations where the amount of input uncertainty is known, though good datasets for this are still needed.
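The Monte Carlo baseline the abstract compares against can be sketched directly: sample inputs from their uncertainty distribution, push each sample through the network, and summarize the outputs. The tiny network and weights below are stand-ins for a trained model (the paper's method instead propagates uncertainty analytically through the layers).

```python
import numpy as np

rng = np.random.default_rng(0)

def net(x):
    # Toy two-layer network standing in for a trained model; weights arbitrary.
    W1 = np.array([[1.0, -0.5], [0.3, 0.8]])
    W2 = np.array([[0.6, -1.2]])
    return (W2 @ np.tanh(W1 @ x)).item()

def predict_with_input_uncertainty(x_mean, x_cov, n_samples=10_000):
    """Monte Carlo propagation of input uncertainty: draw inputs from
    N(x_mean, x_cov), evaluate the network on each draw, and report the
    mean and standard deviation of the resulting outputs."""
    xs = rng.multivariate_normal(x_mean, x_cov, size=n_samples)
    ys = np.array([net(x) for x in xs])
    return ys.mean(), ys.std()

# 10% input noise on each coordinate (variance 0.01).
m, s = predict_with_input_uncertainty(np.array([0.5, -0.2]), 0.01 * np.eye(2))
```

The spread s is the output uncertainty induced purely by input uncertainty; the paper's claim is that an analytic propagation yields a more stable decision boundary than this sampling baseline under large input noise.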

Updated: 2024-06-26 23:13:45

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18787v1

Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features

The increasing sophistication of cyber threats necessitates innovative approaches to cybersecurity. In this paper, we explore the potential of psychological profiling techniques, particularly focusing on the utilization of Large Language Models (LLMs) and psycholinguistic features. We investigate the intersection of psychology and cybersecurity, discussing how LLMs can be employed to analyze textual data for identifying psychological traits of threat actors. We explore the incorporation of psycholinguistic features, such as linguistic patterns and emotional cues, into cybersecurity frameworks. Our research underscores the importance of integrating psychological perspectives into cybersecurity practices to bolster defense mechanisms against evolving threats.

Updated: 2024-06-26 23:04:52

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18783v1

Learning to Remove Cuts in Integer Linear Programming

Cutting plane methods are a fundamental approach for solving integer linear programs (ILPs). In each iteration of such methods, additional linear constraints (cuts) are introduced to the constraint set with the aim of excluding the previous fractional optimal solution while not affecting the optimal integer solution. In this work, we explore a novel approach within cutting plane methods: instead of only adding new cuts, we also consider the removal of previous cuts introduced at any of the preceding iterations of the method under a learnable parametric criteria. We demonstrate that in fundamental combinatorial optimization settings such cut removal policies can lead to significant improvements over both human-based and machine learning-guided cut addition policies even when implemented with simple models.

Updated: 2024-06-26 22:50:43

Categories: math.OC,cs.DM,cs.LG,68R01

Download: http://arxiv.org/abs/2406.18781v1

Empirical Analysis of Fictitious Play for Nash Equilibrium Computation in Multiplayer Games

While fictitious play is guaranteed to converge to Nash equilibrium in certain game classes, such as two-player zero-sum games, it is not guaranteed to converge in non-zero-sum and multiplayer games. We show that fictitious play in fact leads to improved Nash equilibrium approximation over a variety of game classes and sizes than (counterfactual) regret minimization, which has recently produced superhuman play for multiplayer poker. We also show that when fictitious play is run several times using random initializations it is able to solve several known challenge problems in which the standard version is known to not converge, including Shapley's classic counterexample. These provide some of the first positive results for fictitious play in these settings, despite the fact that worst-case theoretical results are negative.
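The underlying algorithm is simple to state: at each iteration, every player best-responds to the opponent's empirical mixed strategy so far. The sketch below runs it on rock-paper-scissors, a zero-sum game where convergence of the empirical frequencies is guaranteed; the random initialization mirrors, in miniature, the paper's multi-restart variant.

```python
import numpy as np

# Rock-paper-scissors payoff matrix (row player's payoff; zero-sum).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])

def fictitious_play(A, iters=20_000, seed=0):
    """Two-player fictitious play: each player best-responds to the
    opponent's empirical strategy frequencies, updated simultaneously."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    counts_row = np.zeros(n)
    counts_col = np.zeros(m)
    # Random initial pure strategies.
    counts_row[rng.integers(n)] += 1
    counts_col[rng.integers(m)] += 1
    for _ in range(iters):
        row_mix = counts_row / counts_row.sum()
        col_mix = counts_col / counts_col.sum()
        counts_row[np.argmax(A @ col_mix)] += 1       # row best response
        counts_col[np.argmax(-(row_mix @ A))] += 1    # col best response
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

row, col = fictitious_play(A)
# Empirical frequencies approach the uniform Nash equilibrium (1/3, 1/3, 1/3).
```

The per-iteration play cycles through pure strategies, but the time-averaged frequencies settle near uniform; the paper's point is that the same averaging also yields good equilibrium approximations in non-zero-sum and multiplayer games where no such guarantee exists.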

Updated: 2024-06-26 22:27:36

Categories: cs.GT,cs.AI,cs.MA,econ.TH

Download: http://arxiv.org/abs/2001.11165v9

Aligning Model Properties via Conformal Risk Control

AI model alignment is crucial due to inadvertent biases in training data and the underspecified pipeline in modern machine learning, where numerous models with excellent test set metrics can be produced, yet they may not meet end-user requirements. Recent advances demonstrate that post-training model alignment via human feedback can address some of these challenges. However, these methods are often confined to settings (such as generative AI) where humans can interpret model outputs and provide feedback. In traditional non-generative settings, where model outputs are numerical values or classes, detecting misalignment through single-sample outputs is highly challenging. In this paper we consider an alternative strategy. We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions that exhibit specific desired behaviors. We focus on post-processing a pre-trained model $f$ to better align with $\mathcal{P}$ using conformal risk control. Specifically, we develop a general procedure for converting queries for a given property $\mathcal{P}$ to a collection of loss functions suitable for use in a conformal risk control algorithm. We prove a probabilistic guarantee that the resulting conformal interval around $f$ contains a function approximately satisfying $\mathcal{P}$. Given the capabilities of modern AI models with extensive parameters and training data, one might assume alignment issues will resolve naturally. However, increasing training data or parameters in a random feature model doesn't eliminate the need for alignment techniques when pre-training data is biased. We demonstrate our alignment methodology on supervised learning datasets for properties like monotonicity and concavity. Our flexible procedure can be applied to various desired properties.
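The calibration step the abstract relies on can be sketched with the standard conformal risk control selection rule: scan candidate parameters and pick the first whose adjusted empirical risk clears the target level. The monotone loss matrix below is synthetic, and the procedure for turning property queries into these losses is the paper's contribution, not shown here.

```python
import numpy as np

def conformal_risk_control(losses, alpha):
    """Conformal risk control calibration. losses[i, j] is the loss
    (bounded in [0, 1], assumed nonincreasing in j) of the j-th candidate
    post-processing parameter on calibration point i. Returns the first
    index whose adjusted risk bound (n*R_hat + B)/(n + 1), with B = 1,
    is <= alpha, or None if no candidate qualifies."""
    n = losses.shape[0]
    risk = losses.mean(axis=0)
    bound = (n * risk + 1.0) / (n + 1.0)
    valid = np.where(bound <= alpha)[0]
    return int(valid[0]) if valid.size else None

# Synthetic example: 100 calibration points, 11 candidate parameters whose
# risk decreases linearly from 1.0 to 0.0.
losses = np.tile(1.0 - np.arange(11) / 10.0, (100, 1))
lam_idx = conformal_risk_control(losses, alpha=0.5)
```

The +1/(n+1) adjustment is what converts an empirical risk into a probabilistic guarantee on the selected parameter, which is the mechanism behind the paper's guarantee that the resulting interval contains a function approximately satisfying the property.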

Updated: 2024-06-26 22:24:46

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18777v1

Latent diffusion models for parameterization and data assimilation of facies-based geomodels

Geological parameterization entails the representation of a geomodel using a small set of latent variables and a mapping from these variables to grid-block properties such as porosity and permeability. Parameterization is useful for data assimilation (history matching), as it maintains geological realism while reducing the number of variables to be determined. Diffusion models are a new class of generative deep-learning procedures that have been shown to outperform previous methods, such as generative adversarial networks, for image generation tasks. Diffusion models are trained to "denoise", which enables them to generate new geological realizations from input fields characterized by random noise. Latent diffusion models, which are the specific variant considered in this study, provide dimension reduction through use of a low-dimensional latent variable. The model developed in this work includes a variational autoencoder for dimension reduction and a U-net for the denoising process. Our application involves conditional 2D three-facies (channel-levee-mud) systems. The latent diffusion model is shown to provide realizations that are visually consistent with samples from geomodeling software. Quantitative metrics involving spatial and flow-response statistics are evaluated, and general agreement between the diffusion-generated models and reference realizations is observed. Stability tests are performed to assess the smoothness of the parameterization method. The latent diffusion model is then used for ensemble-based data assimilation. Two synthetic "true" models are considered. Significant uncertainty reduction, posterior P$_{10}$-P$_{90}$ forecasts that generally bracket observed data, and consistent posterior geomodels, are achieved in both cases.

Updated: 2024-06-26 22:23:23

Categories: cs.CV,cs.AI,cs.CE,cs.LG,physics.geo-ph

Download: http://arxiv.org/abs/2406.14815v2

Machine Learning-Enabled Software and System Architecture Frameworks

Various architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified several stakeholders and defined modeling perspectives, architecture viewpoints, and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Only this way can we envision a holistic system architecture description of an ML-enabled system. Note that the ML component behavior and functionalities are special and should be distinguished from traditional software system behavior and functionalities. The main reason is that the actual functionality should be inferred from data instead of being specified at design time. Additionally, the structural models of ML components, such as ML model architectures, are typically specified using different notations and formalisms from what the Software Engineering (SE) community uses for software structural models. Yet, these two aspects, namely ML and non-ML, are becoming so intertwined that it necessitates an extension of software architecture frameworks and modeling practices toward supporting ML-enabled system architectures. In this paper, we address this gap through an empirical study using an online survey instrument. We surveyed 61 subject matter experts from over 25 organizations in 10 countries.

Updated: 2024-06-26 22:09:04

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2308.05239v2

Nonparametric Strategy Test

We present a nonparametric statistical test for determining whether an agent is following a given mixed strategy in a repeated strategic-form game given samples of the agent's play. This involves two components: determining whether the agent's frequencies of pure strategies are sufficiently close to the target frequencies, and determining whether the pure strategies selected are independent between different game iterations. Our integrated test involves applying a chi-squared goodness of fit test for the first component and a generalized Wald-Wolfowitz runs test for the second component. The results from both tests are combined using Bonferroni correction to produce a complete test for a given significance level $\alpha$. We applied the test to publicly available data of human rock-paper-scissors play. The data consists of 50 iterations of play for 500 human players. We test with a null hypothesis that the players are following a uniform random strategy independently at each game iteration. Using a significance level of $\alpha = 0.05$, we conclude that 305 (61%) of the subjects are following the target strategy.
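The two-component test can be sketched directly. One caveat: the paper uses a generalized Wald-Wolfowitz runs test in closed form, whereas the sketch below substitutes a permutation version of the runs test for the independence component; the chi-squared component and the Bonferroni combination follow the abstract.

```python
import numpy as np
from scipy import stats

def count_runs(seq):
    """Number of maximal runs of identical symbols in a sequence."""
    return 1 + int(np.count_nonzero(seq[1:] != seq[:-1]))

def strategy_test(plays, target, alpha=0.05, n_perm=2000, seed=0):
    """Test H0: plays are i.i.d. draws from the mixed strategy `target`.
    Component 1: chi-squared goodness of fit on pure-strategy counts.
    Component 2: independence across iterations via a permutation runs
    test (standing in for the generalized Wald-Wolfowitz test).
    Components are combined by Bonferroni: each tested at alpha/2."""
    plays = np.asarray(plays)
    n, k = len(plays), len(target)
    observed = np.bincount(plays, minlength=k)
    p_freq = stats.chisquare(observed, f_exp=n * np.asarray(target)).pvalue
    rng = np.random.default_rng(seed)
    obs_runs = count_runs(plays)
    sims = np.array([count_runs(rng.permutation(plays))
                     for _ in range(n_perm)])
    p_runs = min(1.0, 2 * min((sims <= obs_runs).mean(),
                              (sims >= obs_runs).mean()))
    reject = (p_freq < alpha / 2) or (p_runs < alpha / 2)
    return reject, p_freq, p_runs

# A player who always throws rock is flagged by the frequency component.
reject, p_freq, p_runs = strategy_test([0] * 60, (1/3, 1/3, 1/3))
```

Bonferroni at alpha/2 per component keeps the overall false-rejection rate at the stated significance level alpha.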

Updated: 2024-06-26 22:06:49

Categories: stat.ME,cs.AI,cs.GT,cs.MA,econ.TH

Download: http://arxiv.org/abs/2312.10695v3

Redactable Blockchain Solutions for IoT: A Review of Mechanisms and Applications

The integration of blockchain technology with the Internet of Things (IoT) presents a promising solution to enhance data security, integrity, and trust within IoT ecosystems. However, the immutable nature of blockchain technology conflicts with data redaction requirements mandated by data protection laws. This paper provides a comprehensive review of the current state of redactable blockchains and redaction mechanisms, particularly focusing on their application within IoT contexts. Through an extensive review of existing literature, this paper identifies key challenges and opportunities in implementing redactable blockchains for IoT data management. Various redaction mechanisms are explored, and the paper examines IoT implementations and use cases where redactable blockchains are employed to address data protection concerns.

Updated: 2024-06-26 22:03:57

Categories: cs.CR

Download: http://arxiv.org/abs/2407.05948v1

Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method

Neural collapse describes the geometry of activation in the final layer of a deep neural network when it is trained beyond performance plateaus. Open questions include whether neural collapse leads to better generalization and, if so, why and how training beyond the plateau helps. We model neural collapse as an information bottleneck (IB) problem in order to investigate whether such a compact representation exists and discover its connection to generalization. We demonstrate that neural collapse leads to good generalization specifically when it approaches an optimal IB solution of the classification problem. Recent research has shown that two deep neural networks independently trained with the same contrastive loss objective are linearly identifiable, meaning that the resulting representations are equivalent up to a matrix transformation. We leverage linear identifiability to approximate an analytical solution of the IB problem. This approximation demonstrates that when class means exhibit $K$-simplex Equiangular Tight Frame (ETF) behavior (e.g., $K$=10 for CIFAR10 and $K$=100 for CIFAR100), they coincide with the critical phase transitions of the corresponding IB problem. The performance plateau occurs once the optimal solution for the IB problem includes all of these phase transitions. We also show that the resulting $K$-simplex ETF can be packed into a $K$-dimensional Gaussian distribution using supervised contrastive learning with a ResNet50 backbone. This geometry suggests that the $K$-simplex ETF learned by supervised contrastive learning approximates the optimal features for source coding. Hence, there is a direct correspondence between optimal IB solutions and generalization in contrastive learning.
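The $K$-simplex ETF geometry at the center of the abstract has a closed form, and its defining equiangularity is easy to verify numerically. This is the standard construction, shown here as a self-contained illustration of the target geometry rather than anything specific to the paper's training setup.

```python
import numpy as np

def simplex_etf(K):
    """Columns form a K-simplex equiangular tight frame in R^K:
    M = sqrt(K/(K-1)) * (I - (1/K) * 11^T). Under neural collapse, the
    (centered, normalized) class means approach this configuration."""
    I = np.eye(K)
    ones = np.ones((K, K))
    return np.sqrt(K / (K - 1)) * (I - ones / K)

M = simplex_etf(10)   # K = 10, e.g. the CIFAR10 case from the abstract
G = M.T @ M
# Diagonal entries are 1 (unit-norm vectors); every off-diagonal entry
# equals -1/(K-1), the maximal pairwise separation achievable by K
# equiangular unit vectors.
```

This maximal-angle property is why the ETF is the natural candidate for an optimal compact representation in the paper's information-bottleneck analysis.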

Updated: 2024-06-26 21:52:52

Categories: cs.LG,cs.IT,math.IT

Download: http://arxiv.org/abs/2305.11957v2

ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models

Analog circuit design requires substantial human expertise and involvement, which is a significant roadblock to design productivity. Bayesian Optimization (BO), a popular machine learning based optimization strategy, has been leveraged to automate analog design given its applicability across various circuit topologies and technologies. Traditional BO methods employ black box Gaussian Process surrogate models and optimized labeled data queries to find optimization solutions by trading off between exploration and exploitation. However, the search for the optimal design solution in BO can be expensive from both a computational and data usage point of view, particularly for high dimensional optimization problems. This paper presents ADO-LLM, the first work integrating large language models (LLMs) with Bayesian Optimization for analog design optimization. ADO-LLM leverages the LLM's ability to infuse domain knowledge to rapidly generate viable design points to remedy BO's inefficiency in finding high value design areas specifically under the limited design space coverage of the BO's probabilistic surrogate model. In the meantime, sampling of design points evaluated in the iterative BO process provides quality demonstrations for the LLM to generate high quality design points while leveraging infused broad design knowledge. Furthermore, the diversity brought by BO's exploration enriches the contextual understanding of the LLM and allows it to more broadly search in the design space and prevent repetitive and redundant suggestions. We evaluate the proposed framework on two different types of analog circuits and demonstrate notable improvements in design efficiency and effectiveness.

Updated: 2024-06-26 21:42:50

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18770v1

WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images

The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time-of-day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements for estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel-classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.

Updated: 2024-06-26 21:30:41

Categories: cs.LG,cs.AI,cs.CV,J.2; I.4.10

Download: http://arxiv.org/abs/2406.18765v1

Conformalized Link Prediction on Graph Neural Networks

Graph Neural Networks (GNNs) excel in diverse tasks, yet their applications in high-stakes domains are often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack \textit{rigorous} uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty quantification approach to construct a predictive interval with a statistical guarantee for GNN-based link prediction. We term this \textit{conformalized link prediction}. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. We first theoretically and empirically establish a permutation invariance condition for the application of CP in link prediction tasks, along with exact test-time coverage. Leveraging the important structural information in graphs, we then identify a novel and crucial connection between a graph's adherence to the power law distribution and the efficiency of CP. This insight leads to the development of a simple yet effective sampling-based method to align the graph structure with a power law distribution prior to the standard CP procedure. Extensive experiments demonstrate that for conformalized link prediction, our approach achieves the desired marginal coverage while significantly improving the efficiency of CP compared to baseline methods.
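The conformal prediction machinery the paper builds on can be illustrated with the standard split-conformal recipe. This is a generic regression sketch, not the paper's graph-specific method; the synthetic data, the absolute-residual score, and the 0.9 coverage target are illustrative assumptions.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction with absolute-residual scores.

    Returns (lower, upper) arrays giving a prediction interval per test
    point with marginal coverage >= 1 - alpha under exchangeability.
    """
    n = len(cal_true)
    scores = np.abs(cal_true - cal_pred)               # nonconformity scores
    # Finite-sample corrected quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_pred - qhat, test_pred + qhat

rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
noisy_pred = truth + rng.normal(scale=0.5, size=2000)  # imperfect "model"
lo, hi = split_conformal_interval(noisy_pred[:1000], truth[:1000],
                                  noisy_pred[1000:], alpha=0.1)
coverage = np.mean((truth[1000:] >= lo) & (truth[1000:] <= hi))
print(f"empirical coverage: {coverage:.2f}")           # near the 0.90 target
```

The paper's contribution sits on top of this recipe: establishing when the exchangeability assumption holds for links in a graph, and reshaping the graph toward a power-law structure so the resulting intervals are efficient (small).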

Updated: 2024-06-26 21:17:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18763v1

Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation

This paper introduces a novel approach to leverage the generalizability of Diffusion Models for Source-Free Domain Adaptation (DM-SFDA). Our proposed DM-SFDA method involves fine-tuning a pre-trained text-to-image diffusion model to generate source domain images, using features from the target images to guide the diffusion process. Specifically, the pre-trained diffusion model is fine-tuned to generate source samples that minimize entropy and maximize confidence for the pre-trained source model. We then use a diffusion model-based image mixup strategy to bridge the domain gap between the source and target domains. We validate our approach through comprehensive experiments across a range of datasets, including Office-31, Office-Home, and VisDA. The results demonstrate significant improvements in SFDA performance, highlighting the potential of diffusion models in generating contextually relevant, domain-specific images.

Updated: 2024-06-26 20:57:15

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.04929v3

The Impact of Feature Representation on the Accuracy of Photonic Neural Networks

Photonic Neural Networks (PNNs) are gaining significant interest in the research community due to their potential for high parallelization, low latency, and energy efficiency. PNNs compute using light, which leads to several differences in implementation when compared to electronics, such as the need to represent input features in the photonic domain before feeding them into the network. In this encoding process, it is common to combine multiple features into a single input to reduce the number of inputs and associated devices, leading to smaller and more energy-efficient PNNs. Although this alters the network's handling of input data, its impact on PNNs remains understudied. This paper addresses this open question, investigating the effect of commonly used encoding strategies that combine features on the performance and learning capabilities of PNNs. Here, using the concept of feature importance, we develop a mathematical framework for analyzing feature combination. Through this framework, we demonstrate that encoding multiple features together in a single input determines their relative importance, thus limiting the network's ability to learn from the data. Given some prior knowledge of the data, however, this can also be leveraged for higher accuracy. By selecting an optimal encoding method, we achieve up to a 12.3\% improvement in accuracy of PNNs trained on the Iris dataset compared to other encoding techniques, surpassing the performance of networks where features are not combined. These findings highlight the importance of the encoding choice for the accuracy and decision-making strategies of PNNs, particularly in size- or power-constrained applications.
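The core observation, that combining features at encoding time fixes their relative importance, can be illustrated with a minimal numpy sketch. This uses a hypothetical weighted-sum encoding and a linear readout, not the paper's photonic model: once two features are merged into one input, downstream weights can only rescale the combination, so a mismatched encoding leaves an irreducible error.

```python
import numpy as np

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 500))
y = 3.0 * f1 + 1.0 * f2                      # ground-truth relationship

def best_fit_error(w1, w2):
    """Least-squares error when the model only sees w1*f1 + w2*f2.

    Any downstream weight can only rescale the combined input, so the
    relative importance of f1 vs f2 is pinned to w1:w2 at encoding time.
    """
    x = (w1 * f1 + w2 * f2).reshape(-1, 1)
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.mean((x @ coef - y) ** 2))

print(best_fit_error(3.0, 1.0))   # matched encoding: ~0 error
print(best_fit_error(1.0, 1.0))   # mismatched encoding: irreducible error
```

This is the double-edged sword the abstract describes: with prior knowledge of the data, the encoding weights can be chosen to match the task; without it, they silently constrain what the network can learn.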

Updated: 2024-06-26 20:55:26

Categories: cs.ET,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18757v1

Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

The manifestation of symptoms associated with lung diseases can vary in different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformers have shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D datasets, and they easily encounter overfitting issues on small medical image datasets. To address this limitation, we propose a Diffusion-based 3D Vision Transformer (Diff3Dformer), which utilizes the latent space of the Diffusion model to form the slice sequence for 3D analysis and incorporates clustering attention into ViT to aggregate repetitive information within 3D CT scans, thereby harnessing the power of the advanced transformer in 3D classification tasks on small datasets. Our method exhibits improved performance on two different scales of small datasets of 3D lung CT scans, surpassing state-of-the-art 3D methods and other transformer-based approaches that emerged during the COVID-19 pandemic, demonstrating its robust and superior performance across different scales of data. Experimental results underscore the superiority of our proposed method, indicating its potential for enhancing medical image classification tasks in real-world scenarios.

Updated: 2024-06-26 20:54:45

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.17173v2

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of {training} gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.org/abs/2404.03828}{arXiv}.
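The ${\rm Softmax}_1$ mechanism mentioned above can be sketched directly, assuming the common definition with an extra 1 in the denominator, $\mathrm{Softmax}_1(x)_i = e^{x_i}/(1+\sum_j e^{x_j})$. The implicit zero logit acts as a "null" slot, so attention weights can shrink toward 0 for uninformative queries instead of being forced to sum to 1 (the test logits below are made up):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_1(x):
    """Softmax_1: exp(x_i) / (1 + sum_j exp(x_j)).

    The +1 is an implicit zero logit for a "null" slot; when every real
    logit is strongly negative, the output mass falls well below 1.
    """
    m = np.maximum(np.max(x), 0.0)      # stabilize while keeping the +1 exact
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

logits = np.array([-4.0, -5.0, -6.0])   # no token is relevant
print(softmax(logits).sum())            # always 1.0
print(softmax_1(logits).sum())          # well below 1: mass on the null slot
```

Not attending at all (rather than smearing a forced unit of attention over irrelevant tokens) is what suppresses the activation outliers that otherwise hurt post-training quantization.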

Updated: 2024-06-26 20:50:18

Categories: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2404.03828v2

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges, explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.

Updated: 2024-06-26 20:49:32

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.05750v3

Competitive Algorithms for Online Knapsack with Succinct Predictions

In the online knapsack problem, the goal is to pack items arriving online with different values and weights into a capacity-limited knapsack to maximize the total value of the accepted items. We study \textit{learning-augmented} algorithms for this problem, which aim to use machine-learned predictions to move beyond pessimistic worst-case guarantees. Existing learning-augmented algorithms for online knapsack consider relatively complicated prediction models that give an algorithm substantial information about the input, such as the total weight of items at each value. In practice, such predictions can be error-sensitive and difficult to learn. Motivated by this limitation, we introduce a family of learning-augmented algorithms for online knapsack that use \emph{succinct predictions}. In particular, the machine-learned prediction given to the algorithm is just a single value or interval that estimates the minimum value of any item accepted by an offline optimal solution. By leveraging a relaxation to online \emph{fractional} knapsack, we design algorithms that can leverage such succinct predictions in both the trusted setting (i.e., with perfect prediction) and the untrusted setting, where we prove that a simple meta-algorithm achieves a nearly optimal consistency-robustness trade-off. Empirically, we show that our algorithms significantly outperform baselines that do not use predictions and often outperform algorithms based on more complex prediction models.
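A minimal sketch of how such a succinct prediction can steer an online algorithm, assuming the prediction is a single value-density threshold and is trusted outright (the "trusted" setting; the paper's meta-algorithm additionally hedges against prediction error, and its prediction is the minimum value of any item accepted offline):

```python
def knapsack_with_prediction(items, capacity, vhat):
    """Greedy online knapsack guided by a succinct prediction.

    `vhat` is a (possibly imperfect) prediction of the smallest
    value-to-weight ratio accepted by an offline optimum; the algorithm
    simply accepts any arriving item at least that dense while it fits.
    """
    total_value, used = 0.0, 0.0
    for value, weight in items:               # items arrive online
        if value / weight >= vhat and used + weight <= capacity:
            total_value += value
            used += weight
    return total_value

stream = [(1, 1), (10, 2), (3, 3), (8, 1), (2, 2)]
print(knapsack_with_prediction(stream, capacity=4, vhat=3.0))
# accepts (10, 2) and (8, 1): value 18.0
```

With `vhat=0.0` the same stream is packed first-come-first-served for a value of only 19.0 minus the low-density items it wastes capacity on; the single-number prediction is what lets the algorithm hold capacity back for dense items arriving later.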

Updated: 2024-06-26 20:38:00

Categories: cs.LG,cs.GT,68Q25, 68T05,F.2.2; I.2.6

Download: http://arxiv.org/abs/2406.18752v1

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.

Updated: 2024-06-26 20:25:53

Categories: cs.SD,cs.AI,cs.IR,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.18747v1

AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation

Segment Anything Model (SAM) is one of the pioneering prompt-based foundation models for image segmentation and has been rapidly adopted for various medical imaging applications. However, in clinical settings, creating effective prompts is notably challenging and time-consuming, requiring the expertise of domain specialists such as physicians. This requirement significantly diminishes SAM's primary advantage in medical applications, its interactive capability with end users. Moreover, recent studies have indicated that SAM, originally designed for 2D natural images, performs suboptimally on 3D medical image segmentation tasks. This subpar performance is attributed to the domain gaps between natural and medical images and the disparities in spatial arrangements between 2D and 3D images, particularly in multi-organ segmentation applications. To overcome these challenges, we present a novel technique termed AutoProSAM. This method automates 3D multi-organ CT-based segmentation by leveraging SAM's foundational model capabilities without relying on domain experts for prompts. The approach utilizes parameter-efficient adaptation techniques to adapt SAM for 3D medical imagery and incorporates an effective automatic prompt learning paradigm specific to this domain. By eliminating the need for manual prompts, it enhances SAM's capabilities for 3D medical image segmentation and achieves state-of-the-art (SOTA) performance in CT-based multi-organ segmentation tasks.

Updated: 2024-06-26 20:25:03

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2308.14936v3

GRAM: An Interpretable Approach for Graph Anomaly Detection using Gradient Attention Maps

Detecting unusual patterns in graph data is a crucial task in data mining. However, existing methods face challenges in consistently achieving satisfactory performance and often lack interpretability, which hinders our understanding of anomaly detection decisions. In this paper, we propose a novel approach to graph anomaly detection that leverages the power of interpretability to enhance performance. Specifically, our method extracts an attention map derived from gradients of graph neural networks, which serves as a basis for scoring anomalies. Notably, our approach is flexible and can be used in various anomaly detection settings. In addition, we conduct theoretical analysis using synthetic data to validate our method and gain insights into its decision-making process. To demonstrate the effectiveness of our method, we extensively evaluate our approach against state-of-the-art graph anomaly detection techniques on real-world graph classification and wireless network datasets. The results consistently demonstrate the superior performance of our method compared to the baselines.

Updated: 2024-06-26 20:24:08

Categories: cs.LG

Download: http://arxiv.org/abs/2311.06153v2

Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as unnecessarily expanded training sets, computational inefficiency from dumping the key and value cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, in this work, we propose SimulMask, a new paradigm for fine-tuning LLMs for simultaneous translation. It utilizes a novel attention mask approach that models simultaneous translation during fine-tuning by masking attention for a desired decision policy. Applying the proposed SimulMask on a Falcon LLM for the IWSLT 2017 dataset, we have observed a significant translation quality improvement compared to state-of-the-art prompting optimization strategies on five language pairs while reducing the computational cost.

Updated: 2024-06-26 20:22:31

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.10443v2

QBI: Quantile-based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the model's parameters. In this paper, we propose QBI, a novel bias initialization method that significantly enhances reconstruction capabilities. This is accomplished by directly solving for bias values yielding sparse activation patterns. Further, we propose PAIRS, an algorithm that builds on QBI. PAIRS can be deployed when a separate dataset from the target domain is available to further increase the percentage of data that can be fully recovered. Measured by the percentage of samples that can be perfectly reconstructed from batches of various sizes, our approach achieves significant improvements over previous methods with gains of up to 50% on ImageNet and up to 60% on the IMDB sentiment analysis text dataset. Furthermore, we establish theoretical limits for attacks leveraging stochastic gradient sparsity, providing a foundation for understanding the fundamental constraints of these attacks. We empirically assess these limits using synthetic datasets. Finally, we propose and evaluate AGGP, a defensive framework designed to prevent gradient sparsity attacks, contributing to the development of more secure and private federated learning systems.
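The idea of solving directly for bias values that yield sparse activation patterns can be sketched with a quantile rule. This is an illustrative reconstruction under assumed ReLU activations and Gaussian stand-in data, not the paper's attack code: setting each bias to the negative of a high quantile of the pre-activations makes each neuron fire on only a small, controllable fraction of inputs, which is what makes per-sample information separable in the aggregated gradient.

```python
import numpy as np

def quantile_bias(weights, data, active_frac=0.01):
    """Choose each neuron's bias so a ReLU fires on ~active_frac of inputs.

    Setting b = -q, where q is the (1 - active_frac) quantile of the
    pre-activations w.x over a reference dataset, yields sparse
    activation patterns.
    """
    pre = data @ weights.T                      # (n_samples, n_neurons)
    return -np.quantile(pre, 1.0 - active_frac, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 32))               # stand-in for client data
w = rng.normal(size=(8, 32))
b = quantile_bias(w, x, active_frac=0.01)
rate = np.mean(np.maximum(x @ w.T + b, 0.0) > 0)
print(f"activation rate: {rate:.3f}")           # close to the 0.01 target
```

When a neuron activates for exactly one sample in a batch, that sample's input can be read off its weight gradient, which is why sparse activations are the lever for gradient-inversion attacks and why defenses like the paper's AGGP target gradient sparsity.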

Updated: 2024-06-26 20:19:32

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18745v1

Theory of Mind for Multi-Agent Collaboration via Large Language Models

While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaboration remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that they enhance task performance and the accuracy of ToM inferences for LLM-based agents.

Updated: 2024-06-26 20:15:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2310.10701v3

Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks

Autonomous Vehicles (AVs), furnished with sensors capable of capturing essential vehicle dynamics such as speed, acceleration, and precise location, possess the capacity to execute intelligent maneuvers, including lane changes, in anticipation of approaching roadblocks. Nevertheless, the sheer volume of sensory data and the processing necessary to derive informed decisions can often overwhelm the vehicles, rendering them unable to handle the task independently. Consequently, a common approach in traffic scenarios involves transmitting the data to servers for processing, a practice that introduces challenges, particularly in situations demanding real-time processing. In response to this challenge, we present a novel DL-based semantic traffic control system that entrusts semantic encoding responsibilities to the vehicles themselves. This system processes driving decisions obtained from a Reinforcement Learning (RL) agent, streamlining the decision-making process. Specifically, our framework envisions scenarios where abrupt roadblocks materialize due to factors such as road maintenance, accidents, or vehicle repairs, necessitating vehicles to make determinations concerning lane-keeping or lane-changing actions to navigate past these obstacles. To formulate this scenario mathematically, we employ a Markov Decision Process (MDP) and harness the Deep Q Learning (DQN) algorithm to unearth viable solutions.
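The lane-keep/lane-change MDP can be sketched with tabular Q-learning standing in for the DQN. This is a toy two-lane corridor with hypothetical states and rewards (-10 for hitting a roadblock, -1 per lane change); the paper's state space, rewards, and function approximator are richer.

```python
import random

# Toy MDP: drive 5 cells toward a roadblock in lane 0 at cell 4.
# State: (cell, lane); actions: 0 = keep lane, 1 = change lane.
BLOCK, GOAL, ACTIONS = (4, 0), 5, (0, 1)

def step(state, action):
    cell, lane = state
    lane = 1 - lane if action == 1 else lane
    cell += 1
    reward = -1.0 * (action == 1)              # small lane-change cost
    if (cell, lane) == BLOCK:
        reward -= 10.0                         # collision penalty
    return (cell, lane), reward

q = {(c, l): [0.0, 0.0] for c in range(GOAL) for l in (0, 1)}
random.seed(0)
for _ in range(2000):                          # epsilon-greedy Q-learning
    state = (0, 0)
    while state[0] < GOAL:
        a = random.choice(ACTIONS) if random.random() < 0.2 \
            else max(ACTIONS, key=lambda act: q[state][act])
        nxt, r = step(state, a)
        target = r + 0.9 * (max(q[nxt]) if nxt[0] < GOAL else 0.0)
        q[state][a] += 0.1 * (target - q[state][a])
        state = nxt

# The learned policy changes out of lane 0 just before the block.
print(max(ACTIONS, key=lambda act: q[(3, 0)][act]))
```

A DQN replaces the table `q` with a neural network over sensor-derived state features; the semantic-encoding step in the paper then transmits only the decision-relevant summary rather than raw sensor data.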

Updated: 2024-06-26 20:12:48

Categories: cs.LG,cs.AI,cs.NI

Download: http://arxiv.org/abs/2406.18741v1

RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets

Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequently, the existing models are not encouraged to explore the space of possible reactions sufficiently. In this paper, we propose a novel single-step retrosynthesis model, RetroGFN, that can explore outside the limited dataset and return a diverse set of feasible reactions by leveraging a feasibility proxy model during the training. We show that RetroGFN achieves competitive results on standard top-k accuracy while outperforming existing methods on round-trip accuracy. Moreover, we provide empirical arguments in favor of using round-trip accuracy which expands the notion of feasibility with respect to the standard top-k accuracy metric.

Updated: 2024-06-26 20:10:03

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18739v1

WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model

Speech is known to carry health-related attributes, which has emerged as a novel venue for remote and long-term health monitoring. However, existing models are usually tailored for a specific type of disease, and have been shown to lack generalizability across datasets. Furthermore, concerns have been raised recently towards the leakage of speaker identity from health embeddings. To mitigate these limitations, we propose WavRx, a speech health diagnostics model that captures the respiration and articulation related dynamics from a universal speech representation. Our in-domain and cross-domain experiments on six pathological speech datasets demonstrate WavRx as a new state-of-the-art health diagnostic model. Furthermore, we show that the amount of speaker identity entailed in the WavRx health embeddings is significantly reduced without extra guidance during training. An in-depth analysis of the model was performed, thus providing physiological interpretation of its improved generalizability and privacy-preserving ability.

Updated: 2024-06-26 19:59:21

Categories: eess.AS,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18731v1

Data-driven identification of port-Hamiltonian DAE systems by Gaussian processes

Port-Hamiltonian systems (pHS) allow for a structure-preserving modeling of dynamical systems. Coupling pHS via linear relations between inputs and outputs defines an overall pHS, which is structure-preserving. However, in multiphysics applications, some subsystems do not allow for a physical pHS description, as (a) this is not available or (b) it is too expensive. Here, data-driven approaches can be used to deliver a pHS for such subsystems, which can then be coupled to the other subsystems in a structure-preserving way. In this work, we derive a data-driven identification approach for port-Hamiltonian differential algebraic equation (DAE) systems. The approach uses input and state space data to estimate nonlinear effort functions of pH-DAEs. As the underlying technique, we use (multi-task) Gaussian processes. This work thereby extends the current state of the art, in which only port-Hamiltonian ordinary differential equation systems could be identified via Gaussian processes. We apply this approach successfully to two applications from network design and constrained multibody system dynamics, based on pH-DAE systems of index one and three, respectively.
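
As a rough illustration of the abstract's core step, the sketch below fits a plain single-task Gaussian-process regressor with an RBF kernel to samples of a scalar function standing in for a pH-DAE effort function; the kernel, lengthscale, and data are assumptions, and the paper's multi-task GP and DAE structure are not reproduced:

```python
import math

# Toy GP regression: fit noisy-free samples of an unknown scalar "effort"
# function, then predict it at a new state.

def rbf(a, b, ls=0.5):
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, y):
    # Naive Gaussian elimination with partial pivoting; fine for this tiny system.
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]    # samples of the unknown function
jitter = 1e-6                     # small diagonal term for numerical stability

K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
     for i, a in enumerate(xs)]
alpha = solve(K, ys)              # alpha = K^{-1} y

def predict(x_star):
    # GP posterior mean: k_*^T K^{-1} y
    return sum(rbf(x_star, xi) * ai for xi, ai in zip(xs, alpha))

print(predict(0.75))  # close to sin(0.75) between the training points
```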

Updated: 2024-06-26 19:51:53

Categories: eess.SY,cs.LG,cs.NA,cs.SY,math.NA

Download: http://arxiv.org/abs/2406.18726v1

Jailbreaking LLMs with Arabic Transliteration and Arabizi

This study identifies potential vulnerabilities of Large Language Models (LLMs) to 'jailbreak' attacks, specifically focusing on the Arabic language and its various forms. While most research has concentrated on English-based prompt manipulation, our investigation broadens the scope to the Arabic language. We initially tested the AdvBench benchmark in Standardized Arabic, finding that even prompt manipulation techniques like prefix injection were insufficient to provoke LLMs into generating unsafe content. However, when using Arabic transliteration and chatspeak (or arabizi), we found that unsafe content could be produced on platforms like OpenAI GPT-4 and Anthropic Claude 3 Sonnet. Our findings suggest that using Arabic and its various forms could expose information that might otherwise remain hidden, potentially increasing the risk of jailbreak attacks. We hypothesize that this exposure could be due to the model's learned connection to specific words, highlighting the need for more comprehensive safety training across all language forms.

Updated: 2024-06-26 19:48:48

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.18725v1

Disentangled Representation Learning

Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in observable data in representation form. The process of separating the underlying factors of variation into variables with semantic meaning benefits the learning of explainable representations of data, imitating the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively investigate DRL from various aspects, including motivations, definitions, methodologies, evaluations, applications, and model designs. We first present two well-recognized definitions for disentangled representation learning, i.e., the Intuitive Definition and the Group Theory Definition. We further categorize the methodologies for DRL into four groups from the following perspectives: model type, representation structure, supervision signal, and independence assumption. We also analyze principles for designing different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.

Updated: 2024-06-26 19:29:39

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2211.11695v4

Towards Neural Scaling Laws for Foundation Models on Temporal Graphs

The field of temporal graph learning aims to learn from evolving network data to forecast future interactions. Given a collection of observed temporal graphs, is it possible to predict the evolution of an unseen network from the same domain? To answer this question, we first present the Temporal Graph Scaling (TGS) dataset, a large collection of temporal graphs consisting of eighty-four ERC20 token transaction networks collected from 2017 to 2023. Next, we evaluate the transferability of Temporal Graph Neural Networks (TGNNs) for the temporal graph property prediction task by pre-training on a collection of up to sixty-four token transaction networks and then evaluating the downstream performance on twenty unseen token networks. We find that the neural scaling law observed in NLP and Computer Vision also applies in temporal graph learning, where pre-training on a greater number of networks leads to improved downstream performance. To the best of our knowledge, this is the first empirical demonstration of the transferability of temporal graph learning. On the downstream token networks, the largest pre-trained model outperforms single-model TGNNs on thirteen unseen test networks. Therefore, we believe that this is a promising first step towards building foundation models for temporal graphs.

Updated: 2024-06-26 19:26:58

Categories: cs.LG

Download: http://arxiv.org/abs/2406.10426v2

Learn it or Leave it: Module Composition and Pruning for Continual Learning

In real-world environments, continual learning is essential for machine learning models, as they need to acquire new knowledge incrementally without forgetting what they have already learned. While pretrained language models have shown impressive capabilities on various static tasks, applying them to continual learning poses significant challenges, including avoiding catastrophic forgetting, facilitating knowledge transfer, and maintaining parameter efficiency. In this paper, we introduce MoCL-P, a novel lightweight continual learning method that addresses these challenges simultaneously. Unlike traditional approaches that continuously expand parameters for newly arriving tasks, MoCL-P integrates task representation-guided module composition with adaptive pruning, effectively balancing knowledge integration and computational overhead. Our evaluation across three continual learning benchmarks with up to 176 tasks shows that MoCL-P achieves state-of-the-art performance and improves parameter efficiency by up to three times, demonstrating its potential for practical applications where resource requirements are constrained.

Updated: 2024-06-26 19:18:28

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2406.18708v1

Fast Optimizer Benchmark

In this paper, we present the Fast Optimizer Benchmark (FOB), a tool designed for evaluating deep learning optimizers during their development. The benchmark supports tasks from multiple domains such as computer vision, natural language processing, and graph learning. The focus is on convenient usage, featuring human-readable YAML configurations, SLURM integration, and plotting utilities. FOB can be used together with existing hyperparameter optimization (HPO) tools as it handles training and resuming of runs. The modular design enables integration into custom pipelines, using it simply as a collection of tasks. We showcase an optimizer comparison as a usage example of our tool. FOB can be found on GitHub: https://github.com/automl/FOB.

Updated: 2024-06-26 19:10:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18701v1

Learning to Correct for QA Reasoning with Black-box LLMs

An open challenge in recent machine learning is how to improve the reasoning capability of large language models (LLMs) in a black-box setting, i.e., without access to detailed information such as output token probabilities. Existing approaches either rely on such accessibility (which is often unrealistic) or involve significantly increased train- and inference-time costs. This paper addresses these limitations by proposing a novel approach, namely CoBB (Correct for improving QA reasoning of Black-Box LLMs). It uses a trained adaptation model to perform a seq2seq mapping from the often-imperfect reasonings of the original black-box LLM to correct or improved reasonings. Specifically, the adaptation model is initialized with a relatively small open-source LLM and adapted over a collection of sub-sampled training pairs. To select representative pairs of correct and incorrect reasonings, we formulate the dataset construction as an optimization problem that minimizes the statistical divergence between the sampled subset and the entire collection, and solve it via a genetic algorithm. We then train the adaptation model over the sampled pairs by contrasting the likelihoods of correct and incorrect reasonings. Our experimental results demonstrate that CoBB significantly improves reasoning accuracy across various QA benchmarks, compared to the best-performing adaptation baselines.
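
The subset-selection step described above can be sketched with a toy genetic algorithm: here the "statistical divergence" is simplified to the absolute difference of means, and all population sizes and rates are illustrative guesses rather than the paper's settings:

```python
import random

# Toy GA: choose k items whose mean matches the full collection's mean,
# standing in for minimising a statistical divergence between a sampled
# subset and the entire collection.

random.seed(1)
collection = [random.random() for _ in range(200)]
target_mean = sum(collection) / len(collection)
k = 10

def divergence(subset):
    return abs(sum(subset) / len(subset) - target_mean)

def mutate(subset):
    # Replace one element with a random draw (duplicates allowed in this toy).
    child = list(subset)
    child[random.randrange(k)] = random.choice(collection)
    return child

population = [random.sample(collection, k) for _ in range(20)]
for _ in range(100):
    population.sort(key=divergence)       # elitism: keep the best half
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = min(population, key=divergence)
print(divergence(best))  # very small after 100 generations
```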

Updated: 2024-06-26 18:57:32

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18695v1

Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states in the finetuning stage to optimize the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when the number of steps invested in its alignment state is too small, leading to downgraded alignment performance. By statistical analysis, we show that the \textit{excess drift} towards consensus could be a probable reason for the instability. To remedy this issue, we propose \textbf{L}azy(\textbf{i}) \textbf{s}afety \textbf{a}lignment (\textbf{Lisa}), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by our convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream finetuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at \url{https://github.com/git-disl/Lisa}.
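
The proximal term can be illustrated with toy numbers: penalizing the user-task update by rho/2 * ||theta - theta_align||^2 pulls each step back toward the alignment-state parameters and so limits drift. The sketch below is a hand-rolled two-parameter example, not the paper's implementation:

```python
# One gradient step on the user loss plus a proximal penalty anchored at the
# alignment-state parameters.

def lisa_step(theta, theta_align, grad_user, rho, lr=0.1):
    return [t - lr * (g + rho * (t - ta))
            for t, ta, g in zip(theta, theta_align, grad_user)]

theta = [1.0, -2.0]          # current parameters
theta_align = [0.0, 0.0]     # parameters frozen at the alignment state
grad_user = [0.5, 0.5]       # gradient of the user-task loss

no_prox = lisa_step(theta, theta_align, grad_user, rho=0.0)
with_prox = lisa_step(theta, theta_align, grad_user, rho=10.0)

def drift(p):
    return sum((t - ta) ** 2 for t, ta in zip(p, theta_align))

# A larger proximal factor keeps the iterate closer to the alignment state.
print(drift(with_prox) < drift(no_prox))  # → True
```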

Updated: 2024-06-26 18:54:59

Categories: cs.LG

Download: http://arxiv.org/abs/2405.18641v4

Petal-X: Human-Centered Visual Explanations to Improve Cardiovascular Risk Communication

Cardiovascular diseases (CVDs), the leading cause of death worldwide, can be prevented in most cases through behavioral interventions. Therefore, effective communication of CVD risk and the projected risk reduction from risk factor modification plays a crucial role in reducing CVD risk at the individual level. However, despite interest in refining risk estimation with improved prediction models such as SCORE2, the guidelines for presenting these risk estimations in clinical practice have remained essentially unchanged in the last few years, with graphical score charts (GSCs) continuing to be one of the prevalent systems. This work describes the design and implementation of Petal-X, a novel tool to support clinician-patient shared decision-making by explaining the CVD risk contributions of different factors and facilitating what-if analysis. Petal-X relies on a novel visualization, Petal Product Plots, and a tailor-made global surrogate model of SCORE2, whose fidelity is comparable to that of the GSCs used in clinical practice. We evaluated Petal-X compared to GSCs in a controlled experiment with 88 healthcare students, all but one with experience with chronic patients. The results show that Petal-X outperforms GSCs in critical tasks, such as comparing the contribution of each modifiable risk factor to the patient's 10-year CVD risk, without a significant loss of perceived transparency, trust, or intent to use. Our study provides an innovative approach to the visualization and explanation of risk in clinical practice that, due to its model-agnostic nature, could continue to support next-generation artificial intelligence risk assessment models.

Updated: 2024-06-26 18:48:50

Categories: cs.HC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18690v1

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

The typical training of neural networks using large stepsize gradient descent (GD) under the logistic loss often involves two distinct phases, where the empirical risk oscillates in the first phase but decreases monotonically in the second phase. We investigate this phenomenon in two-layer networks that satisfy a near-homogeneity condition. We show that the second phase begins once the empirical risk falls below a certain threshold, dependent on the stepsize. Additionally, we show that the normalized margin grows nearly monotonically in the second phase, demonstrating an implicit bias of GD in training non-homogeneous predictors. If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase must stop in finite steps. Finally, we demonstrate that by choosing a suitably large stepsize, GD that undergoes this phase transition is more efficient than GD that monotonically decreases the risk. Our analysis applies to networks of any width, beyond the well-known neural tangent kernel and mean-field regimes.

Updated: 2024-06-26 18:40:57

Categories: stat.ML,cs.LG,math.OC

Download: http://arxiv.org/abs/2406.08654v2

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms. We collect the first set of human annotated red-teaming prompts in different languages distinguishing between global and local harm, which serve as a laboratory for understanding the reliability of alignment techniques when faced with preference distributions that are non-stationary across geographies and languages. While this setting is seldom covered by the literature to date, which primarily centers on English harm mitigation, it captures real-world interactions with AI systems around the world. We establish a new precedent for state-of-the-art alignment techniques across 6 languages with minimal degradation in general performance. Our work provides important insights into cross-lingual transfer and novel optimization approaches to safeguard AI systems designed to serve global populations.

Updated: 2024-06-26 18:39:08

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18682v1

Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization

End-to-end neural diarization (EEND) models offer significant improvements over traditional embedding-based Speaker Diarization (SD) approaches but fall short of generalizing to long-form audio with a large number of speakers. The EEND-vector-clustering method mitigates this by combining local EEND with global clustering of speaker embeddings from local windows, but it requires an additional speaker embedding framework alongside the EEND module. In this paper, we propose a novel framework applying EEND both locally and globally for long-form audio without separate speaker embeddings. This approach achieves significant relative DER reductions of 13% and 10% over the conventional 1-pass EEND on the Callhome American English and RT03-CTS datasets respectively, and marginal improvements over EEND-vector-clustering without the need for additional speaker embeddings. Furthermore, we discuss the computational complexity of our proposed framework and explore strategies for reducing processing times.

Updated: 2024-06-26 18:32:16

Categories: eess.AS,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18679v1

Few-shot Personalization of LLMs with Mis-aligned Responses

As the diversity of users increases, the capability of providing personalized responses by large language models (LLMs) has become increasingly important. Existing approaches have achieved only limited success in LLM personalization, due to the absence of personalized learning or the reliance on shared personal data. This paper proposes a new approach for few-shot personalization of LLMs with their mis-aligned responses (Fermi). Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs, based on the user profile (e.g., demographic information) and a few examples of previous opinions. During this iterative process of prompt improvement, we incorporate the contexts of mis-aligned responses by LLMs, which are especially crucial for the effective personalization of LLMs. In addition, we develop an effective inference method to further leverage the context of the test query and the personalized prompts. Our experimental results demonstrate that Fermi significantly improves performance across various benchmarks, compared to the best-performing baselines.
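
A highly simplified sketch of the iterative prompt-improvement loop described above, with a mock scoring rule in place of real LLM calls (the questions, the answer rule, and the revision step are all invented for illustration):

```python
# Keep a candidate personalized prompt, score it against a few of the user's
# known opinions, and revise it using the examples it currently gets wrong.

user_opinions = {"q1": "yes", "q2": "no", "q3": "yes"}

def answer(prompt, question):
    # Mock "LLM": answers "yes" only if the question appears in the prompt.
    return "yes" if question in prompt else "no"

def misaligned(prompt):
    return [q for q, want in user_opinions.items() if answer(prompt, q) != want]

prompt = ""                  # start from an empty personalized prompt
for _ in range(5):           # iterative improvement
    wrong = misaligned(prompt)
    if not wrong:
        break
    # Fold the context of one mis-aligned response back into the prompt.
    prompt += " " + wrong[0]

print(misaligned(prompt))  # → []
```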

Updated: 2024-06-26 18:29:12

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18678v1

Kirchhoff Meets Johnson: In Pursuit of Unconditionally Secure Communication

Noise: an enemy to be dealt with and a major factor limiting communication system performance. However, what if there is gold in that garbage? In conventional engineering, our focus is primarily on eliminating, suppressing, combating, or even ignoring noise and its detrimental impacts. Conversely, could we exploit it similarly to biology, which utilizes noise-alike carrier signals to convey information? In this context, the utilization of noise, or noise-alike signals in general, has been put forward as a means to realize unconditionally secure communication systems in the future. In this tutorial article, we begin by tracing the origins of thermal noise-based communication and highlighting one of its significant applications for ensuring unconditionally secure networks: the Kirchhoff-law-Johnson-noise (KLJN) secure key exchange scheme. We then delve into the inherent challenges tied to secure communication and discuss the imperative need for physics-based key distribution schemes in pursuit of unconditional security. Concurrently, we provide a concise overview of quantum key distribution (QKD) schemes and draw comparisons with their KLJN-based counterparts. Finally, extending beyond wired communication loops, we explore the transmission of noise signals over-the-air and evaluate their potential for stealth and secure wireless communication systems.
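
The logic of the KLJN scheme mentioned above can be sketched in a few lines (idealized and physics-free): each party connects a low (L) or high (H) resistor, an eavesdropper observing the loop learns only the unordered pair, and the externally indistinguishable mixed cases yield the secret bits:

```python
import itertools

# Conceptual sketch of KLJN key exchange. The measurable noise level of the
# loop depends only on the combined resistance, which is symmetric in the two
# choices, so the mixed cases LH and HL look identical from outside; yet each
# party, knowing its own resistor, can infer the other's.

def eavesdropper_view(a, b):
    # Unordered pair: all an external observer can distinguish.
    return frozenset([a, b])

secure_rounds = [(a, b) for a, b in itertools.product("LH", repeat=2)
                 if eavesdropper_view(a, b) == frozenset("LH")]

# Matched rounds (LL, HH) are discarded; mixed rounds each yield one bit.
print(secure_rounds)  # → [('L', 'H'), ('H', 'L')]
```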

Updated: 2024-06-26 18:28:07

Categories: cs.IT,cs.CR,eess.SP,math.IT

Download: http://arxiv.org/abs/2312.02042v3

Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the diverse knowledge preferences of LLMs poses an inevitable challenge in developing a reliable RAG system. To address this issue, we propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems. Specifically, we first introduce a preference knowledge construction pipeline and incorporate five novel query augmentation strategies to alleviate preference data scarcity. Based on the preference data, DPA-RAG accomplishes both external and internal preference alignment: 1) It jointly integrates pair-wise, point-wise, and contrastive preference alignment abilities into the reranker, achieving external preference alignment among RAG components. 2) It further introduces a pre-aligned stage before vanilla Supervised Fine-tuning (SFT), enabling LLMs to implicitly capture knowledge aligned with their reasoning preferences, achieving LLMs' internal alignment. Experimental results across four knowledge-intensive QA datasets demonstrate that DPA-RAG outperforms all baselines and seamlessly integrates both black-box and open-sourced LLM readers. Further qualitative analysis and discussions also provide empirical guidance for achieving reliable RAG systems. Our code is publicly available at https://github.com/dongguanting/DPA-RAG.

Updated: 2024-06-26 18:26:53

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18676v1

Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants

Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific writing. To address this gap, we propose an approach of human-AI collaborative taxonomy development to perform as a guideline for domain-specific writing assistants. This method integrates iterative feedback from domain experts and multiple interactions between these experts and LLMs to refine the taxonomy. Through larger-scale experiments, we aim to validate this methodology and thus improve LLM-powered writing assistance, tailoring it to meet the unique requirements of different stakeholder needs.

Updated: 2024-06-26 18:25:06

Categories: cs.HC,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18675v1

A simple and improved algorithm for noisy, convex, zeroth-order optimisation

In this paper, we study the problem of noisy, convex, zeroth-order optimisation of a function $f$ over a bounded convex set $\bar{\mathcal X}\subset \mathbb{R}^d$. Given a budget $n$ of noisy queries to the function $f$ that can be allocated sequentially and adaptively, our aim is to construct an algorithm that returns a point $\hat x\in \bar{\mathcal X}$ such that $f(\hat x)$ is as small as possible. We provide a conceptually simple method inspired by the textbook center of gravity method, but adapted to the noisy and zeroth-order setting. We prove that for this method, $f(\hat x) - \min_{x\in \bar{\mathcal X}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms. We slightly improve upon the existing literature, where, to the best of our knowledge, the best known rate [Lattimore, 2024] is of order $d^{2.5}/\sqrt{n}$, albeit for a more challenging problem. Our main contribution is, however, conceptual, as we believe that our algorithm and its analysis bring novel ideas and are significantly simpler than existing approaches.
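
As a loose illustration of noisy zeroth-order optimisation (not the paper's center-of-gravity method), the sketch below shrinks a 1-D interval by comparing averaged noisy queries at two interior points; the test function, noise level, and query budget are arbitrary choices:

```python
import random

# Trisection-style search: average repeated noisy queries at the two interior
# thirds of the current interval and discard the worse side, cutting a
# constant fraction of the feasible set per round (in expectation).

def noisy_f(x, sigma=0.05):
    return (x - 1.5) ** 2 + random.gauss(0, sigma)

def zeroth_order_min(lo, hi, rounds=40, reps=300):
    for _ in range(rounds):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        fa = sum(noisy_f(a) for _ in range(reps)) / reps
        fb = sum(noisy_f(b) for _ in range(reps)) / reps
        if fa < fb:
            hi = b   # for convex f, the minimiser cannot lie in (b, hi]
        else:
            lo = a
    return (lo + hi) / 2

random.seed(0)
x_hat = zeroth_order_min(0.0, 4.0)
print(x_hat)  # close to the true minimiser 1.5
```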

Updated: 2024-06-26 18:19:10

Categories: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18672v1

A Zero Auxiliary Knowledge Membership Inference Attack on Aggregate Location Data

Location data is frequently collected from populations and shared in aggregate form to guide policy and decision making. However, the prevalence of aggregated data also raises the privacy concern of membership inference attacks (MIAs). MIAs infer whether an individual's data contributed to the aggregate release. Although effective MIAs have been developed for aggregate location data, these require access to an extensive auxiliary dataset of individual traces over the same locations, which are collected from a similar population. This assumption is often impractical given common privacy practices surrounding location data. To measure the risk of an MIA performed by a realistic adversary, we develop the first Zero Auxiliary Knowledge (ZK) MIA on aggregate location data, which eliminates the need for an auxiliary dataset of real individual traces. Instead, we develop a novel synthetic approach, such that suitable synthetic traces are generated from the released aggregate. We also develop methods to correct for bias and noise, to show that our synthetic-based attack is still applicable when privacy mechanisms are applied prior to release. Using two large-scale location datasets, we demonstrate that our ZK MIA matches the state-of-the-art Knock-Knock (KK) MIA across a wide range of settings, including popular implementations of differential privacy (DP) and suppression of small counts. Furthermore, we show that ZK MIA remains highly effective even when the adversary only knows a small fraction (10%) of their target's location history. This demonstrates that effective MIAs can be performed by realistic adversaries, highlighting the need for strong DP protection.
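
A much-simplified illustration of the membership-inference idea above: with a per-location visit-count release, a necessary condition for membership is that the target's trace can be subtracted from the aggregate without producing negative counts. Real attacks, including the paper's ZK MIA, rely on statistical tests over synthetic traces rather than this toy check:

```python
# Aggregate release: per-location visit counts summed over contributors.

def could_be_member(aggregate, trace):
    """Necessary condition: a contributor's visits are bounded by the aggregate."""
    return all(a >= t for a, t in zip(aggregate, trace))

population = [
    [1, 0, 1, 0],   # contributor in the aggregate
    [0, 1, 1, 0],   # contributor in the aggregate
]
aggregate = [sum(col) for col in zip(*population)]  # [1, 1, 2, 0]

target_in  = [1, 0, 1, 0]
target_out = [0, 0, 1, 1]   # visits location 3, which no contributor did

print(could_be_member(aggregate, target_in),
      could_be_member(aggregate, target_out))  # → True False
```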

Updated: 2024-06-26 18:14:36

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.18671v1

RouteLLM: Learning to Route LLMs with Preference Data

Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely-recognized benchmarks shows that our approach significantly reduces costs, by over a factor of two in certain cases, without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
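
The routing decision itself can be pictured as thresholding a learned score, as in this hypothetical sketch (the actual routers are trained on preference data; the difficulty score here is just a stand-in for that learned model):

```python
def route(score, threshold=0.5):
    """Send a query to the strong (expensive) model only when the
    learned difficulty score predicts the weak model would fail."""
    return "strong" if score >= threshold else "weak"

def expected_cost(scores, threshold, cost_strong=10.0, cost_weak=1.0):
    # Average per-query cost for a batch of difficulty scores.
    per_query = [cost_strong if route(s, threshold) == "strong" else cost_weak
                 for s in scores]
    return sum(per_query) / len(per_query)

scores = [0.1, 0.2, 0.4, 0.6, 0.9]   # hypothetical router outputs
cost = expected_cost(scores, threshold=0.5)
```

Raising the threshold trades response quality for cost; the training framework in the paper is, in effect, learning the score so that this trade-off curve is as favorable as possible.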

Updated: 2024-06-26 18:10:22

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18665v1

Evaluating Copyright Takedown Methods for Language Models

Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.

Updated: 2024-06-26 18:09:46

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18664v1

Contraction of Private Quantum Channels and Private Quantum Hypothesis Testing

A quantum generalized divergence by definition satisfies the data-processing inequality; as such, the relative decrease in such a divergence under the action of a quantum channel is at most one. This relative decrease is formally known as the contraction coefficient of the channel and the divergence. Interestingly, there exist combinations of channels and divergences for which the contraction coefficient is strictly less than one. Furthermore, understanding the contraction coefficient is fundamental for the study of statistical tasks under privacy constraints. To this end, here we establish upper bounds on contraction coefficients for the hockey-stick divergence under privacy constraints, where privacy is quantified with respect to the quantum local differential privacy (QLDP) framework, and we fully characterize the contraction coefficient for the trace distance under privacy constraints. With the machinery developed, we also determine an upper bound on the contraction of both the Bures distance and quantum relative entropy relative to the normalized trace distance, under QLDP constraints. Next, we apply our findings to establish bounds on the sample complexity of quantum hypothesis testing under privacy constraints. Furthermore, we study various scenarios in which the sample complexity bounds are tight, while providing order-optimal quantum channels that achieve those bounds. Lastly, we show how private quantum channels provide fairness and Holevo information stability in quantum learning settings.
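
Concretely, for a generalized divergence $\mathbf{D}$ and a channel $\mathcal{N}$, the contraction coefficient discussed above can be written as

$$\eta_{\mathbf{D}}(\mathcal{N}) := \sup_{\rho \neq \sigma} \frac{\mathbf{D}(\mathcal{N}(\rho) \| \mathcal{N}(\sigma))}{\mathbf{D}(\rho \| \sigma)} \le 1,$$

where the inequality is exactly the data-processing inequality, and the interesting regimes are those in which the supremum, restricted to channels satisfying the QLDP privacy constraint, is strictly below one. (This notation is given for orientation only; the paper's precise definitions may differ.)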

Updated: 2024-06-26 18:00:03

Categories: quant-ph,cs.CR,cs.IT,cs.LG,math.IT,stat.ML

Download: http://arxiv.org/abs/2406.18651v1

Improving Hyperparameter Optimization with Checkpointed Model Weights

When training deep learning models, the performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. However, gray-box HPO methods, which incorporate more information about the setup, have emerged as a promising direction for more efficient optimization, for example, by using intermediate loss evaluations to terminate bad selections. In this work, we propose an HPO method for neural networks that uses logged checkpoints of the trained weights to guide future hyperparameter selections. Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model, using a permutation-invariant graph metanetwork to be data-efficient with the logged network weights. To facilitate reproducibility and further research, we open-source our code at https://github.com/NVlabs/forecasting-model-search.

Updated: 2024-06-26 17:59:54

Categories: cs.LG,cs.AI,stat.ML,68T05,I.2.6; G.1.6; D.2.8

Download: http://arxiv.org/abs/2406.18630v1

Situational Awareness Matters in 3D Vision Language Reasoning

Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning. We tokenize the 3D scene into sparse voxel representation and propose a language-grounded situation estimator, followed by a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in situation estimation and question answering by a large margin (e.g., an enhancement of over 30% on situation estimation accuracy). Subsequent analysis corroborates our architectural design choices, explores the distinct functions of visual and textual tokens, and highlights the importance of situational awareness in the domain of 3D question answering.

Updated: 2024-06-26 17:59:50

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.07544v2

On Convex Data-Driven Inverse Optimal Control for Nonlinear, Non-stationary and Stochastic Systems

This paper is concerned with a finite-horizon inverse control problem, which has the goal of reconstructing, from observations, the possibly non-convex and non-stationary cost driving the actions of an agent. In this context, we present a result enabling cost reconstruction by solving an optimization problem that is convex even when the agent cost is not and when the underlying dynamics is nonlinear, non-stationary and stochastic. To obtain this result, we also study a finite-horizon forward control problem that has randomized policies as decision variables. We turn our findings into algorithmic procedures, and all of our in-silico and hardware experiments confirm the effectiveness of our approach.

Updated: 2024-06-26 17:59:37

Categories: math.OC,cs.IT,cs.LG,cs.RO,math.DS,math.IT

Download: http://arxiv.org/abs/2306.13928v2

Towards Compositionality in Concept Learning

Concept-based interpretability methods offer a lens into the internals of foundation models by decomposing their embeddings into high-level concepts. These concept representations are most useful when they are compositional, meaning that the individual concepts compose to explain the full sample. We show that existing unsupervised concept extraction methods find concepts which are not compositional. To automatically discover compositional concept representations, we identify two salient properties of such representations, and propose Compositional Concept Extraction (CCE) for finding concepts which obey these properties. We evaluate CCE on five different datasets over image and text data. Our evaluation shows that CCE finds more compositional concept representations than baselines and yields better accuracy on four downstream classification tasks. Code and data are available at https://github.com/adaminsky/compositional_concepts.

Updated: 2024-06-26 17:59:30

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18534v1

Symbolic Learning Enables Self-Evolving Agents

The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language model (LLM) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agent research is that it is model-centric, or engineering-centric. That is to say, progress on the prompts, tools, and pipelines of language agents requires substantial manual engineering efforts from human experts rather than automatically learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI. In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks where learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacrums of weights, loss, and gradients. We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in "self-evolving agents".

Updated: 2024-06-26 17:59:18

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18532v1

Confident Natural Policy Gradient for Local Planning in $q_π$-realizable Constrained MDPs

The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is applied to the value functions. In this paper, we address the learning problem given linear function approximation with $q_{\pi}$-realizability, where the value functions of all policies are linearly representable with a known feature map, a setting known to be more general and challenging than other linear settings. Utilizing a local-access model, we propose a novel primal-dual algorithm that, after $\tilde{O}(\text{poly}(d) \epsilon^{-3})$ queries, outputs with high probability a policy that strictly satisfies the constraints while nearly optimizing the value with respect to a reward function. Here, $d$ is the feature dimension and $\epsilon > 0$ is a given error. The algorithm relies on a carefully crafted off-policy evaluation procedure to evaluate the policy using historical data, which informs policy updates through policy gradients and conserves samples. To our knowledge, this is the first result achieving polynomial sample complexity for CMDP in the $q_{\pi}$-realizable setting.
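
For orientation, the constrained objective in a CMDP takes the generic form

$$\max_{\pi} \; V_r^{\pi}(s_0) \quad \text{subject to} \quad V_c^{\pi}(s_0) \ge b,$$

where $V_r^{\pi}$ and $V_c^{\pi}$ are the reward and constraint value functions and $b$ is the constraint budget; the algorithm above returns, with high probability, a policy that satisfies the constraint strictly while being nearly optimal in reward. (This is the standard formulation, stated here for context; consult the paper for the exact setting.)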

Updated: 2024-06-26 17:57:13

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18529v1

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each entry in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agents. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/
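
The three hierarchical verification stages can be sketched as follows; the registry, the sample schema, and the trivial semantic rule are illustrative assumptions, not APIGen's actual implementation (which uses real API execution and an LLM-based semantic check).

```python
def verify_sample(sample, registry):
    """Hierarchical verification in the spirit of APIGen:
    (1) format check, (2) actual execution, (3) semantic check."""
    # Stage 1: format check on the generated function call.
    if not isinstance(sample, dict) or "api" not in sample or "args" not in sample:
        return "format_error"
    fn = registry.get(sample["api"])
    if fn is None:
        return "format_error"
    # Stage 2: execution check against the executable API.
    try:
        result = fn(**sample["args"])
    except Exception:
        return "execution_error"
    # Stage 3: semantic check (a trivial stand-in for an LLM judge).
    if result is None:
        return "semantic_error"
    return "verified"

registry = {"add": lambda a, b: a + b}   # hypothetical executable API
good = verify_sample({"api": "add", "args": {"a": 1, "b": 2}}, registry)
bad = verify_sample({"api": "add", "args": {"a": 1}}, registry)
```

Only samples that survive all three stages would be admitted to the released dataset.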

Updated: 2024-06-26 17:49:11

Categories: cs.CL,cs.AI,cs.LG,cs.SE

Download: http://arxiv.org/abs/2406.18518v1

Large Language Models in the Clinic: A Comprehensive Benchmark

The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and complex clinical tasks that are close to real-world practice, i.e., referral QA, treatment recommendation, hospitalization (long document) summarization, patient education, pharmacology QA and drug interaction for emerging drugs. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs.

Updated: 2024-06-26 17:48:18

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.00716v3

Leveraging Large Language Models for Software Model Completion: Results from Industrial and Public Datasets

Modeling structure and behavior of software systems plays a crucial role in the industrial practice of software engineering. As with other software engineering artifacts, software models are subject to evolution. Supporting modelers in evolving software models with recommendations for model completions is still an open problem, though. In this paper, we explore the potential of large language models for this task. In particular, we propose an approach that leverages large language models, model histories, and retrieval-augmented generation for model completion. Through experiments on three datasets, including an industrial application, one public open-source community dataset, and one controlled collection of simulated model repositories, we evaluate the potential of large language models for model completion with retrieval-augmented generation. We found that large language models are indeed a promising technology for supporting software model evolution (62.30% semantically correct completions on real-world industrial data and up to 86.19% type-correct completions). The general inference capabilities of large language models are particularly useful when dealing with concepts for which there are few, noisy, or no examples at all.

Updated: 2024-06-26 17:43:15

Categories: cs.SE,cs.AI,94-04,D.2.2

Download: http://arxiv.org/abs/2406.17651v2

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benefits for long-chain mathematical reasoning, as models employing DPO struggle to identify detailed errors in incorrect answers. This limitation stems from a lack of fine-grained process supervision. We propose a simple, effective, and data-efficient method called Step-DPO, which treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically. Additionally, we have developed a data construction pipeline for Step-DPO, enabling the creation of a high-quality dataset containing 10K step-wise preference pairs. We also observe that in DPO, self-generated data is more effective than data generated by humans or GPT-4, due to the latter's out-of-distribution nature. Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K, respectively, surpassing a series of closed-source models, including GPT-4-1106, Claude-3-Opus, and Gemini-1.5-Pro. Our code, data, and models are available at https://github.com/dvlab-research/Step-DPO.
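
Step-DPO applies the standard DPO objective at the granularity of individual reasoning steps rather than whole answers. A minimal numeric sketch of that per-step loss (the log-probabilities below are made up for illustration):

```python
import math

def dpo_step_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss on a single preferred/rejected *step* pair: the margin
    is the policy's log-probability gain over a frozen reference on
    the preferred step minus its gain on the rejected step."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))   # -log sigmoid(margin)

# A policy that up-weights the preferred step and down-weights the
# rejected one (relative to the reference) incurs a lower loss than
# a policy identical to the reference.
loss_better = dpo_step_loss(-2.0, -5.0, -3.0, -3.0)
loss_neutral = dpo_step_loss(-3.0, -3.0, -3.0, -3.0)
```

The formula is unchanged from vanilla DPO; what Step-DPO changes is which spans of tokens the winning and losing log-probabilities are computed over.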

Updated: 2024-06-26 17:43:06

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18629v1

Large Language Model Enhanced Clustering for News Event Detection

The news landscape is continuously evolving, with an ever-increasing volume of information from around the world. Automated event detection within this vast data repository is essential for monitoring, identifying, and categorizing significant news occurrences across diverse platforms. This paper presents an event detection framework that leverages Large Language Models (LLMs) combined with clustering analysis to detect news events from the Global Database of Events, Language, and Tone (GDELT). The framework enhances event clustering through both pre-event detection tasks (keyword extraction and text embedding) and post-event detection tasks (event summarization and topic labeling). We also evaluate the impact of various textual embeddings on the quality of clustering outcomes, ensuring robust news categorization. Additionally, we introduce a novel Cluster Stability Assessment Index (CSAI) to assess the validity and robustness of clustering results. CSAI utilizes latent feature vectors to provide a new way of measuring clustering quality. Our experiments indicate that combining LLM embeddings with clustering algorithms yields the best results, demonstrating greater robustness in terms of CSAI scores. Moreover, post-event detection tasks generate meaningful insights, facilitating effective interpretation of event clustering results. Overall, our experimental results indicate that the proposed framework offers valuable insights and could enhance the accuracy and depth of news reporting.
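
As a stand-in for the embedding-plus-clustering stage of such a pipeline, here is a greedy single-pass clusterer over (hypothetical) article embeddings; the real framework would use LLM embeddings and a proper clustering algorithm.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(x * x for x in w) ** 0.5
    return dot / (norm(u) * norm(v))

def greedy_cluster(embeddings, threshold):
    """Single-pass clustering: join an article to the first cluster
    whose representative (its first member) is similar enough,
    otherwise open a new cluster for it."""
    clusters = []   # list of (representative_embedding, member_indices)
    for i, emb in enumerate(embeddings):
        for rep, members in clusters:
            if cosine(emb, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((emb, [i]))
    return [members for _, members in clusters]

# Hypothetical 2-d "embeddings" of three articles: the first two cover
# the same event, the third reports a different one.
embs = [[1.0, 0.0], [0.96, 0.28], [0.0, 1.0]]
groups = greedy_cluster(embs, threshold=0.9)
```

Each resulting group would then feed the post-event tasks (summarization, topic labeling), and a stability index such as the paper's CSAI could score how robust the grouping is.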

Updated: 2024-06-26 17:42:59

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.10552v2

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor

Contact-rich tasks continue to present a variety of challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (slipping/sliding) between the end-effector and object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door-opening tasks with a variety of observation and method configurations to study the utility of our proposed improvements and multimodal visuotactile sensing. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

Updated: 2024-06-26 17:40:14

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2311.01248v3

AND: Audio Network Dissection for Interpreting Deep Acoustic Models

Neuron-level interpretations aim to explain network behaviors and properties by investigating neurons responsive to specific perceptual or structural input patterns. Although there is emerging work in the vision and language domains, none is explored for acoustic models. To bridge the gap, we introduce $\textit{AND}$, the first $\textbf{A}$udio $\textbf{N}$etwork $\textbf{D}$issection framework that automatically establishes natural language explanations of acoustic neurons based on highly-responsive audio. $\textit{AND}$ features the use of LLMs to summarize mutual acoustic features and identities among audio. Extensive experiments are conducted to verify $\textit{AND}$'s precise and informative descriptions. In addition, we demonstrate a potential use of $\textit{AND}$ for audio machine unlearning by conducting concept-specific pruning based on the generated descriptions. Finally, we highlight two acoustic model behaviors with analysis by $\textit{AND}$: (i) models discriminate audio with a combination of basic acoustic features rather than high-level abstract concepts; (ii) training strategies affect model behaviors and neuron interpretability -- supervised training guides neurons to gradually narrow their attention, while self-supervised learning encourages neurons to be polysemantic for exploring high-level features.

Updated: 2024-06-26 17:36:53

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.16990v2

BASS: Batched Attention-optimized Speculative Sampling

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the highest of that of regular decoding and around 10X of single-sequence speculative decoding.
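
The draft-then-verify loop underlying speculative decoding, in a single-sequence toy form with deterministic (greedy) models; BASS's contribution is scheduling this loop efficiently across a whole batch of sequences. The integer-token models below are made up for illustration.

```python
def speculative_generate(target, draft, prompt, n_new, k=4):
    """Greedy speculative decoding for one sequence: the draft model
    proposes k tokens, the target model keeps the longest agreeing
    prefix and then emits its own next token, so the output matches
    plain target-only decoding exactly."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_new:
        spec = []
        for _ in range(k):                      # cheap draft proposals
            spec.append(draft(seq + spec))
        for t in spec:                          # target verification
            if len(seq) - len(prompt) >= n_new:
                break
            if target(seq) == t:
                seq.append(t)                   # accept matching token
            else:
                seq.append(target(seq))         # correct and stop
                break
        else:
            if len(seq) - len(prompt) < n_new:
                seq.append(target(seq))         # bonus token
    return seq[len(prompt):]

# Toy integer-token models: the target continues n -> n + 1; the
# draft agrees except after even tokens, where it guesses n + 2.
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 1 if s[-1] % 2 else s[-1] + 2
out = speculative_generate(target, draft, [0], n_new=6)
```

When draft and target mostly agree, several tokens are committed per expensive target call; batching these verification calls across sequences is what drives the GPU-utilization gains reported above.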

Updated: 2024-06-26 17:29:46

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.15778v2

Mental Modeling of Reinforcement Learning Agents by Language Models

Can emergent language models faithfully model the intelligence of decision-making agents? Though modern language models already exhibit some reasoning ability, and theoretically can potentially express any probability distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be utilized to comprehend an agent's behaviour in the physical world. This study empirically examines, for the first time, how well large language models (LLMs) can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from agent interaction history. This research may unveil the potential of leveraging LLMs for elucidating RL agent behaviour, addressing a key challenge in eXplainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on agent mental model establishment. Our results disclose that LLMs are not yet capable of fully mental modelling agents through inference alone without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs.

Updated: 2024-06-26 17:14:45

Categories: cs.LG,cs.AI,cs.CL,cs.RO

Download: http://arxiv.org/abs/2406.18505v1

Learning Generalizable Program and Architecture Representations for Performance Modeling

Performance modeling is an essential tool in many areas, including performance characterization/optimization, design space exploration, and resource allocation problems, to name a few. However, existing performance modeling approaches have limitations, such as high computational cost for discrete-event simulators, narrow flexibility of hardware emulators, or restricted accuracy/generality of analytical/data-driven models. To address these limitations, this paper proposes PerfVec, a novel deep learning-based performance modeling framework that learns high-dimensional and independent/orthogonal program and microarchitecture representations. Once learned, a program representation can be used to predict its performance on any microarchitecture, and likewise, a microarchitecture representation can be applied in the performance prediction of any program. Additionally, PerfVec yields a foundation model that captures the performance essence of instructions, which can be directly used by developers in numerous performance modeling related tasks without incurring its training cost. The evaluation demonstrates that PerfVec is more general and efficient than previous approaches.

Updated: 2024-06-26 17:12:21

Categories: cs.LG,cs.AR

Download: http://arxiv.org/abs/2310.16792v2

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce Bench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses an average of 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of Bench, Benchi, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

Updated: 2024-06-26 17:05:14

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.15877v2

Scaling and renormalization in high-dimensional regression

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

Updated: 2024-06-26 16:56:06

Categories: stat.ML,cond-mat.dis-nn,cs.LG

Download: http://arxiv.org/abs/2405.00592v3

Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

Federated learning (FL), a novel branch of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, it is still possible to access the model updates (gradient updates of deep neural networks) transferred between clients and servers, potentially revealing sensitive local information to adversaries using model inversion attacks. Differential privacy (DP) offers a promising approach to addressing this issue by adding noise to the parameters. On the other hand, heterogeneities in data structure, storage, communication, and computational capabilities of devices can cause convergence problems and delays in developing the global model. A personalized weighted averaging of local parameters based on the resources of each device can yield a better aggregated model in each round. In this paper, to efficiently preserve privacy, we propose a personalized DP framework that injects noise based on clients' relative impact factors and aggregates parameters while considering heterogeneities and adjusting properties. To fulfill the DP requirements, we first analyze the convergence boundary of the FL algorithm when impact factors are personalized and fixed throughout the learning process. We then further study the convergence property considering time-varying (adaptive) impact factors.
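
The two mechanisms the abstract combines can be sketched compactly. This is an illustrative sketch, not the paper's algorithm: the rule that noise scale is proportional to a client's "impact factor", and the names `privatize`/`aggregate`, are assumptions for exposition.

```python
# Sketch of personalized DP noise injection plus impact-weighted aggregation.
import random

def privatize(update, impact, clip=1.0, sigma=0.1, rng=None):
    """Clip a client's update to L2 norm `clip`, then add Gaussian noise whose
    scale grows with the client's impact factor (one possible personalization
    rule; the paper's calibration may differ)."""
    rng = rng or random.Random(0)
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    noise_std = sigma * clip * impact  # assumption: noise proportional to impact
    return [x + rng.gauss(0.0, noise_std) for x in clipped]

def aggregate(updates, impacts):
    """Priority-based aggregation: impact-weighted average of client updates."""
    total = sum(impacts)
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(impacts, updates)) / total
            for i in range(dim)]
```

The server would call `privatize` on each received update (or clients would, before sending) and then `aggregate` with the per-round impact factors, which the paper further allows to vary over time.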

Updated: 2024-06-26 16:55:07

Categories: cs.LG,cs.CR,cs.DC

Download: http://arxiv.org/abs/2406.18491v1

Mixtures of Experts Unlock Parameter Scaling for Deep RL

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

Updated: 2024-06-26 16:50:01

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.08609v3

VADA: a Data-Driven Simulator for Nanopore Sequencing

Nanopore sequencing offers the ability for real-time analysis of long DNA sequences at a low cost, enabling new applications such as early detection of cancer. Due to the complex nature of nanopore measurements and the high cost of obtaining ground truth datasets, there is a need for nanopore simulators. Existing simulators rely on handcrafted rules and parameters and do not learn an internal representation that would allow for analysing underlying biological factors of interest. Instead, we propose VADA, a purely data-driven method for simulating nanopores based on an autoregressive latent variable model. We embed subsequences of DNA and introduce a conditional prior to address the challenge of a collapsing conditioning. We introduce an auxiliary regressor on the latent variable to encourage our model to learn an informative latent representation. We empirically demonstrate that our model achieves competitive simulation performance on experimental nanopore data. Moreover, we show we have learned an informative latent representation that is predictive of the DNA labels. We hypothesize that other biological factors of interest, beyond the DNA labels, can potentially be extracted from such a learned latent representation.

Updated: 2024-06-26 16:46:19

Categories: q-bio.QM,cs.LG

Download: http://arxiv.org/abs/2404.08722v2

Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla empirical risk minimization. We introduce Regularized Annotation of Domains (RAD) in order to train robust last layer classifiers without the need for explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data for several publicly available datasets.

Updated: 2024-06-26 16:35:16

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.11039v2

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models

Neural networks often operate in the overparameterized regime, in which there are far more parameters than training samples, allowing the training data to be fit perfectly. That is, training the network effectively learns an interpolating function, and properties of the interpolant affect predictions the network will make on new samples. This manuscript explores the properties of such functions learned by neural networks of depth greater than two layers. Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias associated with the architecture. Our results show that adding additional linear layers to the input side of a shallow ReLU network yields a representation cost favoring functions with low mixed variation - that is, it has limited variation in directions orthogonal to a low-dimensional subspace and can be well approximated by a single- or multi-index model. Such functions may be represented by the composition of a function with low two-layer representation cost and a low-rank linear operator. Our experiments confirm this behavior in standard network training regimes. They additionally show that linear layers can improve generalization and the learned network is well-aligned with the true latent low-dimensional linear subspace when data is generated using a multi-index model.

Updated: 2024-06-26 16:29:56

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2305.15598v3

UniRec: A Dual Enhancement of Uniformity and Frequency in Sequential Recommendations

Representation learning in sequential recommendation is critical for accurately modeling user interaction patterns and improving recommendation precision. However, existing approaches predominantly emphasize item-to-item transitions, often neglecting the time intervals between interactions, which are closely related to behavior pattern changes. Additionally, broader interaction attributes, such as item frequency, are frequently overlooked. We found that both sequences with more uniform time intervals and items with higher frequency yield better prediction performance. Conversely, non-uniform sequences exacerbate user interest drift and less-frequent items are difficult to model due to sparse sampling, presenting unique challenges inadequately addressed by current methods. In this paper, we propose UniRec, a novel bidirectional enhancement sequential recommendation method. UniRec leverages sequence uniformity and item frequency to enhance performance, particularly improving the representation of non-uniform sequences and less-frequent items. These two branches mutually reinforce each other, driving comprehensive performance optimization in complex sequential recommendation scenarios. Additionally, we present a multidimensional time module to further enhance adaptability. To the best of our knowledge, UniRec is the first method to utilize the characteristics of uniformity and frequency for feature augmentation. Comparing with eleven advanced models across four datasets, we demonstrate that UniRec outperforms SOTA models significantly. The code is available at https://github.com/Linxi000/UniRec.
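
The two signals the abstract highlights, time-interval uniformity of a user's sequence and per-item frequency, are easy to make concrete. The scoring formula below (one minus-free variant based on the coefficient of variation) is an illustrative assumption, not UniRec's actual measure.

```python
# Hedged sketch: quantifying sequence uniformity and item frequency
# from an interaction log.
from collections import Counter

def interval_uniformity(timestamps):
    """1 / (1 + coefficient of variation) of consecutive gaps: 1.0 means
    perfectly uniform intervals, lower values mean burstier behaviour."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return 1.0
    mean = sum(gaps) / len(gaps)
    if mean == 0:
        return 1.0
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    cv = var ** 0.5 / mean
    return 1.0 / (1.0 + cv)

def item_frequencies(interactions):
    """Count how often each item appears across all user sequences."""
    return Counter(item for seq in interactions for item in seq)
```

Per the abstract's finding, sequences scoring high on such a uniformity measure and items with high counts are the easy cases; UniRec's two branches target the low-scoring ones.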

Updated: 2024-06-26 16:28:24

Categories: cs.IR,cs.LG,H.3.3; I.2.6

Download: http://arxiv.org/abs/2406.18470v1

Fair, Manipulation-Robust, and Transparent Sortition

Sortition, the random selection of political representatives, is increasingly being used around the world to choose participants of deliberative processes like Citizens' Assemblies. Motivated by sortition's practical importance, there has been a recent flurry of research on sortition algorithms, whose task it is to select a panel from among a pool of volunteers. This panel must satisfy quotas enforcing representation of key population subgroups. Past work has contributed an algorithmic approach for fulfilling this task while ensuring that volunteers' chances of selection are maximally equal, as measured by any convex equality objective. The question, then, is: which equality objective is the right one? Past work has mainly studied the objectives Minimax and Leximin, which respectively minimize the maximum and maximize the minimum chance of selection given to any volunteer. Recent work showed that both of these objectives have key weaknesses: Minimax is highly robust to manipulation but is arbitrarily unfair; oppositely, Leximin is highly fair but arbitrarily manipulable. In light of this gap, we propose a new equality objective, Goldilocks, that aims to achieve these ideals simultaneously by ensuring that no volunteer receives too little or too much chance of selection. We theoretically bound the extent to which Goldilocks achieves these ideals, finding that in an important sense, Goldilocks recovers among the best available solutions in a given instance. We then extend our bounds to the case where the output of Goldilocks is transformed to achieve a third goal, Transparency. Our empirical analysis of Goldilocks in real data is even more promising: we find that this objective achieves nearly instance-optimal minimum and maximum selection probabilities simultaneously in most real instances -- an outcome not even guaranteed to be possible for any algorithm.

Updated: 2024-06-26 16:26:50

Categories: cs.AI

Download: http://arxiv.org/abs/2406.15009v2

ProFLingo: A Fingerprinting-based Intellectual Property Protection Scheme for Large Language Models

Large language models (LLMs) have attracted significant attention in recent years. Due to their "Large" nature, training LLMs from scratch consumes immense computational resources. Since several major players in the artificial intelligence (AI) field have open-sourced their original LLMs, an increasing number of individual researchers and smaller companies are able to build derivative LLMs based on these open-sourced models at much lower costs. However, this practice opens up possibilities for unauthorized use or reproduction that may not comply with licensing agreements, and fine-tuning can change the model's behavior, thus complicating the determination of model ownership. Current intellectual property (IP) protection schemes for LLMs are either designed for white-box settings or require additional modifications to the original model, which restricts their use in real-world settings. In this paper, we propose ProFLingo, a black-box fingerprinting-based IP protection scheme for LLMs. ProFLingo generates queries that elicit specific responses from an original model, thereby establishing unique fingerprints. Our scheme assesses the effectiveness of these queries on a suspect model to determine whether it has been derived from the original model. ProFLingo offers a non-invasive approach, which neither requires knowledge of the suspect model nor modifications to the base model or its training process. To the best of our knowledge, our method represents the first black-box fingerprinting technique for IP protection for LLMs. Our source code and generated queries are available at: https://github.com/hengvt/ProFLingo.
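
The black-box verification step can be sketched in a few lines. This is a conceptual toy, not ProFLingo's procedure: the function names and the fixed match-rate threshold are assumptions, and real fingerprint queries are adversarially crafted rather than arbitrary (query, response) pairs.

```python
# Conceptual sketch of black-box fingerprint checking for model provenance.

def fingerprint_match_rate(fingerprints, suspect_model):
    """fingerprints: list of (query, expected_response) pairs extracted from
    the original model; suspect_model: callable mapping query -> response."""
    hits = sum(1 for q, expected in fingerprints if suspect_model(q) == expected)
    return hits / len(fingerprints)

def is_derived(fingerprints, suspect_model, threshold=0.5):
    # A suspect matching many fingerprints is flagged as a likely derivative;
    # the threshold here is a hypothetical calibration choice.
    return fingerprint_match_rate(fingerprints, suspect_model) >= threshold
```

Because only `suspect_model`'s input/output behaviour is used, the check needs no weights, no modifications to the base model, and no knowledge of the suspect's training, which is the non-invasive property the abstract emphasizes.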

Updated: 2024-06-26 16:22:43

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2405.02466v2

Bayesian inverse Navier-Stokes problems: joint flow field reconstruction and parameter learning

We formulate and solve a Bayesian inverse Navier-Stokes (N-S) problem that assimilates velocimetry data in order to jointly reconstruct a 3D flow field and learn the unknown N-S parameters, including the boundary position. By hardwiring a generalised N-S problem, and regularising its unknown parameters using Gaussian prior distributions, we learn the most likely parameters in a collapsed search space. The most likely flow field reconstruction is then the N-S solution that corresponds to the learned parameters. We develop the method in the variational setting and use a stabilised Nitsche weak form of the N-S problem that permits the control of all N-S parameters. To regularise the inferred geometry, we use a viscous signed distance field (vSDF) as an auxiliary variable, which is given as the solution of a viscous Eikonal boundary value problem. We devise an algorithm that solves this inverse problem, and numerically implement it using an adjoint-consistent stabilised cut-cell finite element method. We then use this method to reconstruct magnetic resonance velocimetry (flow-MRI) data of a 3D steady laminar flow through a physical model of an aortic arch for two different Reynolds numbers and signal-to-noise ratio (SNR) levels (low/high). We find that the method can accurately i) reconstruct the low SNR data by filtering out the noise/artefacts and recovering flow features that are obscured by noise, and ii) reproduce the high SNR data without overfitting. Although the framework that we develop applies to 3D steady laminar flows in complex geometries, it readily extends to time-dependent laminar and Reynolds-averaged turbulent flows, as well as non-Newtonian (e.g. viscoelastic) fluids.

Updated: 2024-06-26 16:16:36

Categories: physics.flu-dyn,cs.LG,math.OC

Download: http://arxiv.org/abs/2406.18464v1

Large Knowledge Model: Perspectives and Challenges

Humankind's understanding of the world is fundamentally linked to our perception and cognition, with \emph{human languages} serving as one of the major carriers of \emph{world knowledge}. In this vein, \emph{Large Language Models} (LLMs) like ChatGPT epitomize the pre-training of extensive, sequence-based world knowledge into neural networks, facilitating the processing and manipulation of this knowledge in a parametric space. This article explores large models through the lens of "knowledge". We initially investigate the role of symbolic knowledge such as Knowledge Graphs (KGs) in enhancing LLMs, covering aspects like knowledge-augmented language model, structure-inducing pre-training, knowledgeable prompts, structured CoT, knowledge editing, semantic tools for LLM and knowledgeable AI agents. Subsequently, we examine how LLMs can boost traditional symbolic knowledge bases, encompassing aspects like using LLM as KG builder and controller, structured knowledge pretraining, and LLM-enhanced symbolic reasoning. Considering the intricate nature of human knowledge, we advocate for the creation of \emph{Large Knowledge Models} (LKM), specifically engineered to manage diversified spectrum of knowledge structures. This promising undertaking would entail several key challenges, such as disentangling knowledge base from language models, cognitive alignment with human knowledge, integration of perception and cognition, and building large commonsense models for interacting with physical world, among others. We finally propose a five-"A" principle to distinguish the concept of LKM.

Updated: 2024-06-26 16:11:55

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2312.02706v2

Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation

Recently, various methods have been proposed to create open-domain conversational agents with Large Language Models (LLMs). These models are able to answer user queries, but in a one-way Q&A format rather than a true conversation. Fine-tuning on particular datasets is the usual way to modify their style to increase conversational ability, but this is expensive and usually only available in a few languages. In this study, we explore role-play zero-shot prompting as an efficient and cost-effective solution for open-domain conversation, using capable multilingual LLMs (Beeching et al., 2023) trained to obey instructions. We design a prompting system that, when combined with an instruction-following model - here Vicuna (Chiang et al., 2023) - produces conversational agents that match and even surpass fine-tuned models in human evaluation in French in two different tasks.

Updated: 2024-06-26 16:10:53

Categories: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2406.18460v1

MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate

Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of specialized models (e.g. coding), improved confidence through multiple computations, and enhanced divergent thinking, leading to more diverse outputs. Thus, the collaborative use of language models is expected to grow significantly in the coming years. In this work, we evaluate the behavior of a network of models collaborating through debate under the influence of an adversary. We introduce pertinent metrics to assess the adversary's effectiveness, focusing on system accuracy and model agreement. Our findings highlight the importance of a model's persuasive ability in influencing others. Additionally, we explore inference-time methods to generate more compelling arguments and evaluate the potential of prompt-based mitigation as a defensive strategy.

Updated: 2024-06-26 16:05:20

Categories: cs.CL,cs.AI,cs.MA

Download: http://arxiv.org/abs/2406.14711v2

Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers

Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input-space margin is the exact score for detecting non-robust samples, but it is intractable to compute for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input-space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to use a model's logit margin as a score for identifying non-robust samples. Next, through comprehensive empirical analysis of various robustly trained models on the CIFAR10 and CIFAR100 datasets, we show that they exhibit strong margin consistency, with a strong correlation between their input-space margins and logit margins. Then, we show that we can effectively use the logit margin to confidently detect brittle decisions with such models, and accurately estimate robust accuracy on an arbitrarily large test set by estimating the input margins only on a small subset. Finally, we address cases where the model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to efficiently assess adversarial vulnerability in deployment scenarios.
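
As a rough illustration of using the logit margin as a vulnerability score, here is a minimal sketch; the numbers and the threshold are made up, and the paper's models are robustly trained classifiers rather than raw logit arrays.

```python
import numpy as np

def logit_margin(logits: np.ndarray) -> np.ndarray:
    """Per-sample logit margin: top-1 logit minus runner-up logit.

    Under margin consistency, a small logit margin flags inputs whose
    (intractable) input-space margin is also small, i.e. brittle decisions.
    """
    s = np.sort(logits, axis=1)        # ascending per row
    return s[:, -1] - s[:, -2]

# Toy batch of 3 samples, 4 classes (made-up numbers).
logits = np.array([[2.0, 0.1, 0.0, -1.0],   # confident prediction
                   [1.0, 0.9, 0.0, -1.0],   # close to the boundary
                   [0.5, 0.5, 0.4, 0.0]])   # exact tie
margins = logit_margin(logits)
brittle = margins < 0.2                     # threshold is illustrative
```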

Updated: 2024-06-26 16:00:35

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.18451v1

Preference Elicitation for Offline Reinforcement Learning

Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by considering access to an offline dataset of environment interactions labeled by the reward function. In contrast, Preference-based RL does not assume access to the reward function and learns it from preferences, but typically requires an online interaction with the environment. We bridge the gap between these frameworks by exploring efficient methods for acquiring preference feedback in a fully offline setup. We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm, which leverages a learned environment model to elicit preference feedback on simulated rollouts. Drawing on insights from both the offline RL and the preference-based RL literature, our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy. We provide theoretical guarantees regarding the sample complexity of our approach, dependent on how well the offline data covers the optimal policy. Finally, we demonstrate the empirical performance of Sim-OPRL in different environments.

Updated: 2024-06-26 15:59:13

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18450v1

ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models

Theory of Mind (ToM) refers to the ability of individuals to attribute mental states to others. While Large Language Models (LLMs) have shown some promise with ToM ability, they still struggle with complex ToM reasoning. Our approach leverages an external symbolic executor, specifically the SMCDEL model checker, and fine-tuning to improve the ToM reasoning ability of LLMs. In our approach, an LLM is first fine-tuned on pairs of natural-language and symbolic formulations of ToM problems and is then instructed to generate the symbolic formulation given a one-shot in-context example. The generated symbolic formulation is then executed by the SMCDEL model checker to perform transparent and verifiable ToM reasoning and give the final result. We demonstrate that our approach, ToM-LM, shows a significant improvement over all the constructed baselines. Our study proposes a novel view of externalizing a particular component of ToM reasoning, mainly reasoning about beliefs, and suggests generalizing it to other aspects of ToM reasoning.

Updated: 2024-06-26 15:57:22

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.15515v3

Normalizing Flows for Conformal Regression

Conformal Prediction (CP) algorithms estimate the uncertainty of a prediction model by calibrating its outputs on labeled data. The same calibration scheme usually applies to any model and data without modifications. The obtained prediction intervals are valid by construction but could be inefficient, i.e. unnecessarily big, if the prediction errors are not uniformly distributed over the input space. We present a general scheme to localize the intervals by training the calibration process. The standard prediction error is replaced by an optimized distance metric that depends explicitly on the object attributes. Learning the optimal metric is equivalent to training a Normalizing Flow that acts on the joint distribution of the errors and the inputs. Unlike the Error Reweighting CP algorithm of Papadopoulos et al. (2008), the framework allows estimating the gap between nominal and empirical conditional validity. The approach is compatible with existing locally-adaptive CP strategies based on re-weighting the calibration samples and applies to any point-prediction model without retraining.
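
For context, the uniform calibration scheme that this work localizes can be sketched as standard split conformal prediction with an absolute-error score. This is a minimal sketch under our own variable names, not the paper's code.

```python
import numpy as np

def split_conformal_halfwidth(cal_errors, alpha):
    """Half-width of a split-conformal prediction interval.

    cal_errors are |y - f(x)| on a held-out calibration set. The same
    global quantile is applied at every input, which is exactly the
    potential inefficiency that a localized metric addresses.
    """
    n = len(cal_errors)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal quantile rank
    return np.sort(cal_errors)[k - 1]

rng = np.random.default_rng(0)
cal = np.abs(rng.normal(size=999))            # synthetic calibration errors
q = split_conformal_halfwidth(cal, alpha=0.1)
# A new prediction f(x) gets the interval [f(x) - q, f(x) + q].
```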

Updated: 2024-06-26 15:55:02

Categories: cs.LG,math.PR,stat.ML

Download: http://arxiv.org/abs/2406.03346v2

Cascading Large Language Models for Salient Event Graph Generation

Generating event graphs from long documents is challenging due to the inherent complexity of multiple tasks involved such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first prompt LLMs to generate summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Fine-tuning contextualised graph generation models on the LLM-generated graphs outperforms the models trained on CAEVO-generated data. Experimental results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.

Updated: 2024-06-26 15:53:54

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18449v1

ICTSurF: Implicit Continuous-Time Survival Functions with Neural Networks

Survival analysis is a widely known method for predicting the likelihood of an event over time. The challenge of dealing with censored samples still remains. Traditional methods, such as the Cox Proportional Hazards (CPH) model, are limited by the strong assumption of proportional hazards and by predetermined relationships between covariates. The rise of models based on deep neural networks (DNNs) has demonstrated enhanced effectiveness in survival analysis. This research introduces the Implicit Continuous-Time Survival Function (ICTSurF), built on a continuous-time survival model, and constructs the survival distribution through implicit representation. As a result, our method is capable of accepting inputs in continuous-time space and producing survival probabilities in continuous-time space, independent of neural network architecture. Comparative assessments with existing methods underscore the high competitiveness of our proposed approach. Our implementation of ICTSurF is available at https://github.com/44REAM/ICTSurF.

Updated: 2024-06-26 15:51:44

Categories: cs.LG

Download: http://arxiv.org/abs/2312.05818v2

An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors

Support Vector Machine (SVM) is a state-of-the-art classification method widely used in science and engineering due to its high accuracy, its ability to deal with high-dimensional data, and its flexibility in modeling diverse sources of data. In this paper, we propose an autotuning-based optimization framework to quantify the ranges of hyperparameters in SVMs and identify their optimal choices, and apply the framework to two SVMs with a mixed kernel combining Sigmoid and Gaussian kernels: one for smart pixel datasets in high energy physics (HEP) and one for mixed-kernel heterojunction transistors (MKH). Our experimental results show that the optimal selection of hyperparameters in the SVMs and the kernels varies greatly across applications and datasets, and that choosing them well is critical for a high classification accuracy of mixed-kernel SVMs. Uninformed choices of the hyperparameters C and coef0 in the mixed-kernel SVMs result in severely low accuracy, whereas the proposed framework effectively quantifies the proper ranges for the hyperparameters and identifies their optimal choices, achieving the highest accuracy of 94.6% for the HEP application and the highest average accuracy of 97.2%, with far less tuning time, for the MKH application.
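
A mixed kernel of this kind can be sketched as a convex combination of the Sigmoid and Gaussian kernels. The hyperparameter values below (gamma, coef0, and the mixing weight w) are illustrative stand-ins for the quantities an autotuner would search over, not the paper's settings.

```python
import numpy as np

def mixed_kernel(X, Y, gamma=0.5, coef0=1.0, w=0.5):
    """Convex combination of a sigmoid kernel and a Gaussian (RBF) kernel.

    gamma, coef0, and the mixing weight w are illustrative values for the
    hyperparameters an autotuning framework would search over.
    """
    lin = X @ Y.T
    sig = np.tanh(gamma * lin + coef0)                      # sigmoid part
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * lin
    rbf = np.exp(-gamma * sq)                               # Gaussian part
    return w * sig + (1.0 - w) * rbf

X = np.array([[0.0, 1.0], [1.0, 0.0]])
K = mixed_kernel(X, X)    # Gram matrix over the training points
```

The resulting Gram matrix can be plugged into, e.g., scikit-learn's `SVC(kernel="precomputed")` for training and evaluation.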

Updated: 2024-06-26 15:50:13

Categories: cs.LG,cs.PF

Download: http://arxiv.org/abs/2406.18445v1

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.
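
The compact textual format can be sketched as follows; the `<spk:K>` tag syntax is hypothetical and serves only to illustrate how word-level ASR and diarization outputs might be serialized into an LLM prompt.

```python
def to_compact_text(words, speakers):
    """Serialize word-level ASR + diarization output into one string.

    Consecutive words by the same speaker share one tag; the '<spk:K>'
    syntax is hypothetical, chosen only for illustration.
    """
    out, cur = [], None
    for word, spk in zip(words, speakers):
        if spk != cur:
            out.append(f"<spk:{spk}>")
            cur = spk
        out.append(word)
    return " ".join(out)

words = ["hello", "how", "are", "you", "good", "thanks"]
speakers = [1, 1, 1, 1, 2, 2]
prompt_body = to_compact_text(words, speakers)
# -> "<spk:1> hello how are you <spk:2> good thanks"
```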

Updated: 2024-06-26 15:49:05

Categories: eess.AS,cs.LG,cs.SD

Download: http://arxiv.org/abs/2401.03506v6

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound (Kwon et al., 2021). We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. These results, we believe, can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.

Updated: 2024-06-26 15:42:57

Categories: cs.LG,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2406.01389v2

Benchmarking mortality risk prediction from electrocardiograms

Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These datasets now provide an excellent resource for a broader audience to explore ECG survival modeling. Here, we benchmark survival model performance on Code-15 and MIMIC-IV with two neural network architectures, compare four deep survival modeling approaches to Cox regressions trained on classifier outputs, and evaluate performance at one to ten years. Our results yield AUROC and concordance scores comparable to past work (circa 0.8) and reasonable AUPRC scores (MIMIC-IV: 0.4-0.5, Code-15: 0.05-0.13) considering the fraction of ECG samples linked to a mortality (MIMIC-IV: 27%, Code-15: 4%). When evaluating models on the opposite dataset, AUROC and concordance values drop by 0.1-0.15, which may be due to cohort differences. All code and results are made public.

Updated: 2024-06-26 15:27:16

Categories: eess.SP,cs.LG,cs.NE,stat.AP

Download: http://arxiv.org/abs/2406.17002v2

Cultural Bias and Cultural Alignment of Large Language Models

Culture fundamentally shapes people's reasoning, behavior, and communication. As people increasingly use generative artificial intelligence (AI) to expedite and automate personal and professional tasks, cultural values embedded in AI models may bias people's authentic expression and contribute to the dominance of certain cultures. We conduct a disaggregated evaluation of cultural bias for five widely used large language models (OpenAI's GPT-4o/4-turbo/4/3.5-turbo/3) by comparing the models' responses to nationally representative survey data. All models exhibit cultural values resembling English-speaking and Protestant European countries. We test cultural prompting as a control strategy to increase cultural alignment for each country/territory. For recent models (GPT-4, 4-turbo, 4o), this improves the cultural alignment of the models' output for 71-81% of countries and territories. We suggest using cultural prompting and ongoing evaluation to reduce cultural bias in the output of generative AI.

Updated: 2024-06-26 15:26:44

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.14096v2

Graph Neural Networks for Emulation of Finite-Element Ice Dynamics in Greenland and Antarctic Ice Sheets

Although numerical models provide accurate solutions for ice sheet dynamics based on physics laws, they accompany intensified computational demands to solve partial differential equations. In recent years, convolutional neural networks (CNNs) have been widely used as statistical emulators for those numerical models. However, since CNNs operate on regular grids, they cannot represent the refined meshes and computational efficiency of finite-element numerical models. Therefore, instead of CNNs, this study adopts an equivariant graph convolutional network (EGCN) as an emulator for the ice sheet dynamics modeling. EGCN reproduces ice thickness and velocity changes in the Helheim Glacier, Greenland, and Pine Island Glacier, Antarctica, with 260 times and 44 times faster computation time, respectively. Compared to the traditional CNN and graph convolutional network, EGCN shows outstanding accuracy in thickness prediction near fast ice streams by preserving the equivariance to the translation and rotation of graphs.

Updated: 2024-06-26 15:18:49

Categories: cs.LG,cs.AI,cs.NA,math.NA

Download: http://arxiv.org/abs/2406.18423v1

Mixture of Experts in a Mixture of RL settings

Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.

Updated: 2024-06-26 15:15:15

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18420v1

Differential error feedback for communication-efficient decentralized learning

Communication-constrained algorithms for decentralized learning and optimization rely on local updates coupled with the exchange of compressed signals. In this context, differential quantization is an effective technique to mitigate the negative impact of compression by leveraging correlations between successive iterates. In addition, the use of error feedback, which consists of incorporating the compression error into subsequent steps, is a powerful mechanism to compensate for the bias caused by the compression. Under error feedback, performance guarantees in the literature have so far focused on algorithms employing a fusion center or a special class of contractive compressors that cannot be implemented with a finite number of bits. In this work, we propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback. The approach is specifically tailored for decentralized learning problems where agents have individual risk functions to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as special cases, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. We show that, under some general conditions on the compression noise, and for sufficiently small step-sizes $\mu$, the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate: by reducing $\mu$, it is possible to keep the estimation errors small (on the order of $\mu$) without increasing indefinitely the bit rate as $\mu\rightarrow 0$. The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
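
The error-feedback mechanism described above can be sketched in a few lines: each agent transmits a compressed version of its signal plus the carried-over compression error, then stores the new residual for the next step. The uniform quantizer and the scales below are illustrative, not the paper's compressor class.

```python
import numpy as np

def quantize(x, step=0.25):
    """Uniform quantizer, standing in for a generic finite-bit compressor."""
    return step * np.round(x / step)

def compress_with_error_feedback(x, memory, step=0.25):
    """One error-feedback step: compress (signal + carried error) and
    store the new compression residual for the next iteration."""
    q = quantize(x + memory, step)
    return q, (x + memory) - q

rng = np.random.default_rng(1)
mem, total_x, total_q = 0.0, 0.0, 0.0
for _ in range(200):
    x = rng.normal(scale=0.1)
    q, mem = compress_with_error_feedback(x, mem)
    total_x += x
    total_q += q
# Telescoping: the transmitted sum differs from the true sum only by the
# last residual, which is bounded by half a quantization step.
```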

Updated: 2024-06-26 15:11:26

Categories: cs.MA,cs.LG,eess.SP

Download: http://arxiv.org/abs/2406.18418v1

Towards diffusion models for large-scale sea-ice modelling

We make the first steps towards diffusion models for unconditional generation of multivariate, Arctic-wide sea-ice states. While our aim in diffusing in latent space is to reduce computational costs, latent diffusion models also offer the possibility to integrate physical knowledge into the generation process. We tailor latent diffusion models to sea-ice physics with a censored Gaussian distribution in data space to generate data that follows the physical bounds of the modelled variables. Our latent diffusion models reach similar scores to a diffusion model trained in data space, but they smooth the generated fields, an effect of the latent mapping. While enforcing physical bounds cannot reduce this smoothing, it improves the representation of the marginal ice zone. Therefore, for large-scale Earth system modelling, latent diffusion models can have many advantages over diffusion in data space, provided the significant barrier of smoothing can be resolved.

Updated: 2024-06-26 15:11:15

Categories: cs.LG,physics.ao-ph

Download: http://arxiv.org/abs/2406.18417v1

BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data

Compared with real-time multi-object tracking (MOT), offline multi-object tracking (OMOT) has the advantages to perform 2D-3D detection fusion, erroneous link correction, and full track optimization but has to deal with the challenges from bounding box misalignment and track evaluation, editing, and refinement. This paper proposes "BiTrack", a 3D OMOT framework that includes modules of 2D-3D detection fusion, initial trajectory generation, and bidirectional trajectory re-optimization to achieve optimal tracking results from camera-LiDAR data. The novelty of this paper includes threefold: (1) development of a point-level object registration technique that employs a density-based similarity metric to achieve accurate fusion of 2D-3D detection results; (2) development of a set of data association and track management skills that utilizes a vertex-based similarity metric as well as false alarm rejection and track recovery mechanisms to generate reliable bidirectional object trajectories; (3) development of a trajectory re-optimization scheme that re-organizes track fragments of different fidelities in a greedy fashion, as well as refines each trajectory with completion and smoothing techniques. The experiment results on the KITTI dataset demonstrate that BiTrack achieves the state-of-the-art performance for 3D OMOT tasks in terms of accuracy and efficiency.

Updated: 2024-06-26 15:09:54

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18414v1

Active Preference Inference using Language Models and Probabilistic Reasoning

Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. However, out of the box, these models are not efficient at extracting preferences: the questions they generate are not informative, requiring a high number of user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by using more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
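
The expected-entropy objective can be illustrated with a small discrete sketch: given a prior over preference hypotheses and, for each candidate question, the answer likelihoods (which the paper obtains by prompting an LLM; here they are hard-coded toy numbers), the algorithm would prefer the question with the lowest expected posterior entropy.

```python
import math

def expected_entropy_after(answer_likelihoods, prior):
    """Expected posterior entropy (bits) over preference hypotheses after
    asking a question.

    answer_likelihoods[a][h] = P(answer a | hypothesis h); in the paper
    these conditionals come from prompting an LLM, here they are toy numbers.
    """
    expected = 0.0
    for likelihood in answer_likelihoods:           # each possible answer
        p_a = sum(l * p for l, p in zip(likelihood, prior))
        if p_a == 0.0:
            continue
        post = [l * p / p_a for l, p in zip(likelihood, prior)]
        entropy = -sum(p * math.log2(p) for p in post if p > 0)
        expected += p_a * entropy
    return expected

prior = [0.5, 0.5]
informative = [[0.9, 0.1], [0.1, 0.9]]      # answer separates hypotheses
uninformative = [[0.5, 0.5], [0.5, 0.5]]    # answer reveals nothing
h_informative = expected_entropy_after(informative, prior)
h_uninformative = expected_entropy_after(uninformative, prior)
```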

Updated: 2024-06-26 15:00:52

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2312.12009v2

Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization

This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of the matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enabling the provable identification of the "correct" rank of the stationary point within a finite number of iterations. Secondly, we introduce an adaptive updating strategy for smoothing parameters. This strategy automatically fixes parameters associated with zero singular values as constants upon detecting the "correct" rank while quickly driving the rest of the parameters to zero. This adaptive behavior transforms the algorithm into one that effectively solves smooth problems after a few iterations, setting our work apart from existing iteratively reweighted methods for low-rank optimization. We prove the global convergence of the proposed algorithm, guaranteeing that every limit point of the iterates is a critical point. Furthermore, a local convergence rate analysis is provided under the Kurdyka-{\L}ojasiewicz property. We conduct numerical experiments using both synthetic and real data to showcase our algorithm's efficiency and superiority over existing methods.
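
The core subproblem of an iteratively reweighted nuclear-norm step is weighted singular-value thresholding, which can be sketched as follows. The weighting rule and the numbers are illustrative, not the paper's exact scheme.

```python
import numpy as np

def weighted_svt(Y, weights, tau):
    """Weighted singular-value thresholding: shrink each singular value
    by tau * w_i and reassemble. In iteratively reweighted methods the
    weights come from the singular values of the previous iterate."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau * weights, 0.0)) @ Vt

# A matrix whose "correct" rank is 1: one dominant singular value and two
# spurious small ones. The reweighting w_i = 1/(s_i + eps) penalizes small
# singular values hard while leaving the dominant one nearly untouched.
Y = np.diag([5.0, 0.1, 0.05])
_, s, _ = np.linalg.svd(Y)
weights = 1.0 / (s + 0.1)
X = weighted_svt(Y, weights, tau=0.2)
rank = int(np.linalg.matrix_rank(X, tol=1e-8))   # -> 1
```

One shrinkage step already zeroes the spurious singular values, which is the flavor of the rank-identification property the paper proves for its accelerated scheme.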

Updated: 2024-06-26 15:00:46

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2406.15713v2

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by the OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We specifically focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models based on their comprehensive performance across various disciplines. Empirical results reveal that: (1) Claude-3.5-Sonnet shows highly competitive overall performance against GPT-4o, even surpassing GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community significantly lags behind these proprietary models. (4) The performance of these models on this benchmark has been less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).

Updated: 2024-06-26 15:00:04

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.16772v2

An Information Theoretic Perspective on Conformal Prediction

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate into lower inefficiency (average prediction set size) for popular CP methods.
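
For reference, the basic split-conformal recipe that the paper builds on fits in a few lines. The "model" below is just a softmax over noisy synthetic logits, and the sizes and miscoverage level are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def make_data(n, k=3):
    """Synthetic classifier outputs: the true class gets a logit boost."""
    labels = rng.integers(0, k, size=n)
    logits = 2.0 * np.eye(k)[labels] + rng.normal(size=(n, k))
    return softmax(logits), labels

def conformal_quantile(probs, labels, alpha):
    """Finite-sample-corrected quantile of the scores 1 - p(true class)."""
    n = len(labels)
    scores = 1.0 - probs[np.arange(n), labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

cal_p, cal_y = make_data(2000)
test_p, test_y = make_data(2000)
qhat = conformal_quantile(cal_p, cal_y, alpha=0.1)
sets = (1.0 - test_p) <= qhat                     # boolean membership matrix
coverage = sets[np.arange(len(test_y)), test_y].mean()
avg_size = sets.sum(axis=1).mean()                # the "inefficiency"
```

With alpha = 0.1, empirical coverage lands near 90%, and the average set size is precisely the inefficiency measure that the paper's training objectives aim to lower.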

Updated: 2024-06-26 14:58:25

Categories: cs.LG,cs.IT,math.IT,stat.ML

Download: http://arxiv.org/abs/2405.02140v2

IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on massive amounts of data. Recent studies disclose knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons), to capitalize on neurons that are crucial in processing contextual cues. Specifically, IRCAN first identifies neurons that significantly contribute to context processing, utilizing a context-aware attribution score derived from integrated gradients. Subsequently, the identified context-aware neurons are strengthened via reweighting. In doing so, we steer LLMs to generate context-sensitive outputs with respect to the new knowledge provided in the context. Extensive experiments conducted across a variety of models and tasks demonstrate that IRCAN not only achieves remarkable improvements in handling knowledge conflicts but also offers a scalable, plug-and-play solution that can be integrated seamlessly with existing models.
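
The attribution primitive underneath the context-aware score, integrated gradients, is easy to state concretely. The tiny ReLU "model", zero baseline, and step count below are illustrative stand-ins (gradients are taken numerically rather than by backpropagation):

```python
import numpy as np

def model(x, W):
    """Toy scalar model: sum of ReLU features."""
    return float(np.sum(np.maximum(W @ x, 0.0)))

def numeric_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def integrated_gradients(f, x, baseline, steps=200):
    """IG_i = (x - baseline)_i * mean of dF/dx_i along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule
    grads = np.mean([numeric_grad(f, baseline + a * (x - baseline))
                     for a in alphas], axis=0)
    return (x - baseline) * grads

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x, x0 = rng.normal(size=3), np.zeros(3)
ig = integrated_gradients(lambda v: model(v, W), x, x0)
# Completeness axiom: attributions sum to f(x) - f(baseline).
```

IRCAN aggregates scores of this kind over data to rank neurons and then scales up (reweights) the top context-aware ones; the completeness check above is the standard sanity test for any integrated-gradients implementation.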

Updated: 2024-06-26 14:57:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18406v1

A Quantization-based Technique for Privacy Preserving Distributed Learning

The massive deployment of Machine Learning (ML) models raises serious concerns about data protection. Privacy-enhancing technologies (PETs) offer a promising first step, but hard challenges persist in achieving confidentiality and differential privacy in distributed learning. In this paper, we describe a novel, regulation-compliant data protection technique for the distributed training of ML models, applicable throughout the ML life cycle regardless of the underlying ML architecture. Designed from the data owner's perspective, our method protects both training data and ML model parameters by employing a protocol based on a quantized multi-hash data representation Hash-Comb combined with randomization. The hyper-parameters of our scheme can be shared using standard Secure Multi-Party computation protocols. Our experimental results demonstrate the robustness and accuracy-preserving properties of our approach.

Updated: 2024-06-26 14:54:12

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.19418v1

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to retrieving facts. We mathematically explore this property by studying how transformers, the building blocks of LLMs, can complete such memory tasks. We study a simple latent concept association problem with a one-layer transformer and we show theoretically and empirically that the transformer gathers information using self-attention and uses the value matrix for associative memory.

Updated: 2024-06-26 14:49:54

Categories: cs.CL,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18400v1

AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation

Assertions have been the de facto collateral for simulation-based and formal verification of hardware designs for over a decade. The quality of hardware verification, i.e., the detection and diagnosis of corner-case design bugs, is critically dependent on the quality of the assertions. There has been a considerable amount of research leveraging a blend of data-driven statistical analysis and static analysis to generate high-quality assertions from hardware design source code and design execution trace data. Despite such concerted effort, all prior research struggles to scale to industrial-scale large designs, generates too many low-quality assertions, often fails to capture subtle and non-trivial design functionality, and does not produce any easy-to-comprehend explanations of the generated assertions to convey their suitability to different downstream validation tasks. Recently, with the advent of Large Language Models (LLMs), there has been a widespread effort to leverage prompt engineering to generate assertions. However, there is little effort to quantitatively establish the effectiveness and suitability of various LLMs for assertion generation. In this paper, we present AssertionBench, a novel benchmark to evaluate LLMs' effectiveness for assertion generation quantitatively. AssertionBench contains 100 curated Verilog hardware designs from OpenCores and formally verified assertions for each design generated from GoldMine and HARM. We use AssertionBench to compare state-of-the-art LLMs and assess their effectiveness in inferring functionally correct assertions for hardware designs. Our experiments demonstrate how LLMs perform relative to each other, the benefits of using more in-context exemplars for generating a higher fraction of functionally correct assertions, and the significant room for improvement for LLM-based assertion generators.

Updated: 2024-06-26 14:47:28

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2406.18627v1

Second Maximum of a Gaussian Random Field and Exact (t-)Spacing test

In this article, we introduce the novel concept of the second maximum of a Gaussian random field on a Riemannian submanifold. This second maximum serves as a powerful tool for characterizing the distribution of the maximum. By utilizing an ad hoc Kac-Rice formula, we derive the explicit form of the maximum's distribution, conditioned on the second maximum and some regressed component of the Riemannian Hessian. This approach results in an exact test, based on the evaluation of spacing between these maxima, which we refer to as the spacing test. We investigate the applicability of this test in detecting sparse alternatives within Gaussian symmetric tensors, continuous sparse deconvolution, and two-layered neural networks with smooth rectifiers. Our theoretical results are supported by numerical experiments, which illustrate the calibration and power of the proposed tests. More generally, this test can be applied to any Gaussian random field on a Riemannian manifold, and we provide a general framework for the application of the spacing test in continuous sparse kernel regression. Furthermore, when the variance-covariance function of the Gaussian random field is known up to a scaling factor, we derive an exact Studentized version of our test, coined the $t$-spacing test. This test is perfectly calibrated under the null hypothesis and has high power for detecting sparse alternatives.

Updated: 2024-06-26 14:44:24

Categories: math.ST,cs.LG,math.DG,math.PR,stat.ML,stat.TH,Primary 62E15, 62F03, 60G15, 62H10, 62H15, secondary 60E05, 60G10, 62J05, 94A08

Download: http://arxiv.org/abs/2406.18397v1

AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors

The variability and low signal-to-noise ratio in financial data, combined with the necessity for interpretability, make the alpha factor mining workflow a crucial component of quantitative investment. Transitioning from early manual extraction to genetic programming, the most advanced approach in this domain currently employs reinforcement learning to mine a set of combination factors with fixed weights. However, the performance of resultant alpha factors exhibits inconsistency, and the inflexibility of fixed factor weights proves insufficient in adapting to the dynamic nature of financial markets. To address this issue, this paper proposes a two-stage formulaic alpha generating framework AlphaForge, for alpha factor mining and factor combination. This framework employs a generative-predictive neural network to generate factors, leveraging the robust spatial exploration capabilities inherent in deep learning while concurrently preserving diversity. The combination model within the framework incorporates the temporal performance of factors for selection and dynamically adjusts the weights assigned to each component alpha factor. Experiments conducted on real-world datasets demonstrate that our proposed model outperforms contemporary benchmarks in formulaic alpha factor mining. Furthermore, our model exhibits a notable enhancement in portfolio returns within the realm of quantitative investment.

Updated: 2024-06-26 14:34:37

Categories: q-fin.CP,cs.AI

Download: http://arxiv.org/abs/2406.18394v1

WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database

Marine mammal communication is a complex field, hindered by the diversity of vocalizations and environmental factors. The Watkins Marine Mammal Sound Database (WMMD) constitutes a comprehensive labeled dataset employed in machine learning applications. Nevertheless, the methodologies for data preparation, preprocessing, and classification documented in the literature exhibit considerable variability and are typically not applied to the dataset in its entirety. This study initially undertakes a concise review of the state-of-the-art benchmarks pertaining to the dataset, with a particular focus on clarifying data preparation and preprocessing techniques. Subsequently, we explore the utilization of the Wavelet Scattering Transform (WST) and Mel spectrogram as preprocessing mechanisms for feature extraction. In this paper, we introduce \textbf{WhaleNet} (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations, leveraging both WST and Mel spectrogram for enhanced feature discrimination. By integrating the insights derived from WST and Mel representations, we achieved an improvement in classification accuracy by $8-10\%$ over existing architectures, corresponding to a classification accuracy of $97.61\%$.

Updated: 2024-06-26 14:34:13

Categories: eess.SP,cs.AI,cs.CV,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2402.17775v2

Blockchain Based Zero-Knowledge Proof of Location in IoT

With the development of precise positioning technology, a growing number of location-based services (LBSs) facilitate people's lives. Most LBSs require a proof of location (PoL) to prove that the user satisfies the service requirement, which exposes the user's privacy. In this paper, we propose a zero-knowledge proof of location (zk-PoL) protocol to better protect the user's privacy. With the zk-PoL protocol, the user can choose which information to expose to the server, so that hierarchical privacy protection can be achieved. The evaluation shows that zk-PoL has excellent security against the main attacks; moreover, its computational efficiency is independent of the input parameters, making zk-PoL well suited to delay-tolerant LBSs.

Updated: 2024-06-26 14:30:56

Categories: cs.CR

Download: http://arxiv.org/abs/2406.18389v1

DoubleTake: Geometry Guided Depth Estimation

Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning etc. Prior work typically makes use of previous frames in a multi view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input to our network. This self-generated geometric hint can encode information from areas of the scene not covered by the keyframes and it is more regularized when compared to individual predicted depth maps for previous frames. We introduce a Hint MLP which combines cost volume features with a hint of the prior geometry, rendered as a depth map from the current camera location, together with a measure of the confidence in the prior geometry. We demonstrate that our method, which can run at interactive speeds, achieves state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.

Updated: 2024-06-26 14:29:05

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18387v1

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.
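
The probing methodology generalizes beyond Othello and is easy to sketch: if a feature is encoded in a network's activations, a probe trained on those activations can decode it. Below, synthetic "activations" linearly encode a binary board feature along a random direction (all data invented for illustration), and a least-squares linear probe recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 500
direction = rng.normal(size=d)           # axis along which the feature is encoded
feature = rng.integers(0, 2, size=n)     # e.g. "this square is occupied"
signed = feature * 2.0 - 1.0             # {0, 1} -> {-1, +1}
# Synthetic activations: Gaussian noise plus the feature-aligned component.
acts = rng.normal(size=(n, d)) + np.outer(signed, direction)

# Least-squares linear probe: regress the signed feature on the activations,
# then classify by the sign of the projection.
w, *_ = np.linalg.lstsq(acts, signed, rcond=None)
accuracy = ((acts @ w > 0).astype(int) == feature).mean()
```

The paper's finding is stronger than this toy: Othello-GPT's board representation is decodable only *nonlinearly* (linear probes perform poorly, MLP probes succeed), and intervening on the decoded representation causally changes the model's predicted moves.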

Updated: 2024-06-26 14:27:49

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2210.13382v5

Adversarial Search Engine Optimization for Large Language Models

Large Language Models (LLMs) are increasingly used in applications where the model selects from competing third-party content, such as in LLM-powered search engines or chatbot plugins. In this paper, we introduce Preference Manipulation Attacks, a new class of attacks that manipulate an LLM's selections to favor the attacker. We demonstrate that carefully crafted website content or plugin documentations can trick an LLM to promote the attacker products and discredit competitors, thereby increasing user traffic and monetization. We show this leads to a prisoner's dilemma, where all parties are incentivized to launch attacks, but the collective effect degrades the LLM's outputs for everyone. We demonstrate our attacks on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, we expect Preference Manipulation Attacks to emerge as a significant threat.

Updated: 2024-06-26 14:24:51

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.18382v1

An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery

We present BioLunar, developed using the Lunar framework, as a tool for supporting biological analyses, with a particular emphasis on molecular-level evidence enrichment for biomarker discovery in oncology. The platform integrates Large Language Models (LLMs) to facilitate complex scientific reasoning across distributed evidence spaces, enhancing the capability for harmonizing and reasoning over heterogeneous data sources. Demonstrating its utility in cancer research, BioLunar leverages modular design, reusable data access and data analysis components, and a low-code user interface, enabling researchers of all programming levels to construct LLM-enabled scientific workflows. By facilitating automatic scientific discovery and inference from heterogeneous evidence, BioLunar exemplifies the potential of the integration between LLMs, specialised databases and biomedical tools to support expert-level knowledge synthesis and discovery.

Updated: 2024-06-26 14:22:46

Categories: q-bio.QM,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18626v1

KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning

In recent years, Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. Most GNNs typically consist of a sequence of neighborhood aggregation (a.k.a., message passing) layers, within each of which the representation of each node is updated by aggregating and transforming its neighbours' representations from the previous layer. The upper bound on the expressive power of message-passing GNNs was reached through the use of MLPs as the transformation, due to their universal approximation capabilities. However, MLPs suffer from well-known limitations, which recently motivated the introduction of Kolmogorov-Arnold Networks (KANs). KANs rely on the Kolmogorov-Arnold representation theorem, rendering them a promising alternative to MLPs. In this work, we compare the performance of KANs against that of MLPs on graph learning tasks. We perform extensive experiments on node classification, graph classification, and graph regression datasets. Our preliminary results indicate that while KANs are on par with MLPs in classification tasks, they seem to have a clear advantage in graph regression tasks.
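
To make the contrast with an MLP layer concrete, here is a toy Kolmogorov-Arnold layer in which every edge carries its own learnable univariate function. Actual KANs parameterize these functions with B-splines; this sketch uses piecewise-linear interpolation on a fixed knot grid, and all sizes are arbitrary:

```python
import numpy as np

class KALayer:
    """y_j = sum_i phi_{j,i}(x_i): each edge applies a learned 1-D function."""

    def __init__(self, d_in, d_out, n_knots=8, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(-2.0, 2.0, n_knots)   # shared knot locations
        # One table of knot values per edge (i -> j); these are the parameters.
        self.values = rng.normal(scale=0.3, size=(d_out, d_in, n_knots))

    def __call__(self, x):
        out = np.zeros(self.values.shape[0])
        for j in range(self.values.shape[0]):         # output node
            for i in range(x.size):                   # input node
                out[j] += np.interp(x[i], self.grid, self.values[j, i])
        return out

layer = KALayer(3, 2)
y = layer(np.array([0.1, -0.5, 1.2]))
```

An MLP edge contributes the product w_ij * x_i; here each edge contributes phi_ij(x_i), which is the form the Kolmogorov-Arnold representation theorem licenses.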

Updated: 2024-06-26 14:21:21

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18380v1

MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization

Binary malware summarization aims to automatically generate human-readable descriptions of malware behaviors from executable files, facilitating tasks like malware cracking and detection. Previous methods based on Large Language Models (LLMs) have shown great promise. However, they still face significant issues, including poor usability, inaccurate explanations, and incomplete summaries, primarily due to the obscure pseudocode structure and the lack of malware training summaries. Further, call relationships between functions, which capture the rich interactions within a binary, remain largely underexplored. To this end, we propose MALSIGHT, a novel code summarization framework that can iteratively generate descriptions of binary malware by exploring malicious source code and benign pseudocode. Specifically, we construct the first malware summary datasets, MalS and MalP, using an LLM and manually refine them with human effort. At the training stage, we tune our proposed MalT5, a novel LLM-based code model, on the MalS dataset and a benign pseudocode dataset. Then, at the test stage, we iteratively feed the pseudocode functions into MalT5 to obtain the summary. Such a procedure facilitates the understanding of pseudocode structure and captures the intricate interactions between functions, thereby benefiting the usability, accuracy, and completeness of the summaries. Additionally, we propose a novel evaluation benchmark, BLEURT-sum, to measure the quality of summaries. Experiments on three datasets show the effectiveness of the proposed MALSIGHT. Notably, our proposed MalT5, with only 0.77B parameters, delivers performance comparable to that of the much larger ChatGPT3.5.

Updated: 2024-06-26 14:21:09

Categories: cs.CR,cs.AI,cs.SE

Download: http://arxiv.org/abs/2406.18379v1

The Fundamental Limits of Least-Privilege Learning

The promise of least-privilege learning -- to find feature representations that are useful for a learning task but prevent inference of any sensitive information unrelated to this task -- is highly appealing. However, so far this concept has only been stated informally. It thus remains an open question whether and how we can achieve this goal. In this work, we provide the first formalisation of the least-privilege principle for machine learning and characterise its feasibility. We prove that there is a fundamental trade-off between a representation's utility for a given task and its leakage beyond the intended task: it is not possible to learn representations that have high utility for the intended task while at the same time preventing inference of any attribute other than the task label itself. This trade-off holds under realistic assumptions on the data distribution and regardless of the technique used to learn the feature mappings that produce these representations. We empirically validate this result for a wide range of learning techniques, model architectures, and datasets.

Updated: 2024-06-26 14:18:44

Categories: cs.LG,cs.CR

Download: http://arxiv.org/abs/2402.12235v2

Learning pure quantum states (almost) without regret

We initiate the study of quantum state tomography with minimal regret. A learner has sequential oracle access to an unknown pure quantum state, and in each round selects a pure probe state. Regret is incurred if the unknown state is measured orthogonal to this probe, and the learner's goal is to minimise the expected cumulative regret over $T$ rounds. The challenge is to find a balance between the most informative measurements and measurements incurring minimal regret. We show that the cumulative regret scales as $\Theta(\operatorname{polylog} T)$ using a new tomography algorithm based on a median of means least squares estimator. This algorithm employs measurements biased towards the unknown state and produces online estimates that are optimal (up to logarithmic terms) in the number of observed samples.

Updated: 2024-06-26 14:13:50

Categories: quant-ph,cs.AI,cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18370v1

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
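
The intervention itself is a one-liner per selected head: shift that head's activation along a fixed direction, scaled by the intervention strength. The head indices and directions below are synthetic stand-ins for the probe-derived "truthful" directions in the paper:

```python
import numpy as np

def intervene(head_acts, directions, heads, alpha):
    """Add alpha * (unit direction) to each selected head's activation.

    head_acts: (n_heads, d_head) activations at one token position.
    """
    out = head_acts.copy()
    for h in heads:
        unit = directions[h] / np.linalg.norm(directions[h])
        out[h] = out[h] + alpha * unit
    return out

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16))      # 8 heads, 16 dims per head
dirs = rng.normal(size=(8, 16))      # one candidate direction per head
shifted = intervene(acts, dirs, heads=[2, 5], alpha=3.0)
# Untouched heads are unchanged; each selected head moves exactly alpha
# along its normalized direction.
```

In a real model this shift is applied at every decoding step, and tuning alpha is what trades truthfulness against helpfulness, as the abstract describes.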

Updated: 2024-06-26 14:11:53

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2306.03341v6

Research on Information Extraction of LCSTS Dataset Based on an Improved BERTSum-LSTM Model

With the continuous advancement of artificial intelligence, natural language processing technology has become widely utilized in various fields. At the same time, there are many challenges in creating Chinese news summaries. First of all, the semantics of Chinese news is complex, and the amount of information is enormous. Extracting critical information from Chinese news presents a significant challenge. Second, the news summary should be concise and clear, focusing on the main content and avoiding redundancy. In addition, the particularity of the Chinese language, such as polysemy, word segmentation, etc., makes it challenging to generate Chinese news summaries. Based on the above, this paper studies the information extraction method of the LCSTS dataset based on an improved BERTSum-LSTM model. We improve the BERTSum-LSTM model to make it perform better in generating Chinese news summaries. The experimental results show that the proposed method has a good effect on creating news summaries, which is of great importance to the construction of news summaries.

Updated: 2024-06-26 14:04:15

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18364v1

MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

Although Large Language Models (LLMs) achieve remarkable performance across various tasks, they often struggle with complex reasoning tasks, such as answering mathematical questions. Recent efforts to address this issue have primarily focused on leveraging mathematical datasets through supervised fine-tuning or self-improvement techniques. However, these methods often depend on high-quality datasets that are difficult to prepare, or they require substantial computational resources for fine-tuning. Inspired by findings that LLMs know how to produce the right answer but struggle to select the correct reasoning path, we propose a purely inference-based searching method -- MindStar (M*). This method formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. We evaluate the M* framework on both the GSM8K and MATH datasets, comparing its performance with existing open and closed-source LLMs. Our results demonstrate that M* significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1, but with substantially reduced model size and computational costs.

Updated: 2024-06-26 14:01:15

Categories: cs.LG

Download: http://arxiv.org/abs/2405.16265v4

Kolmogorov-Arnold Graph Neural Networks

Graph neural networks (GNNs) excel in learning from network-like data but often lack interpretability, making their application challenging in domains requiring transparent decision-making. We propose the Graph Kolmogorov-Arnold Network (GKAN), a novel GNN model leveraging spline-based activation functions on edges to enhance both accuracy and interpretability. Our experiments on five benchmark datasets demonstrate that GKAN outperforms state-of-the-art GNN models in node classification, link prediction, and graph classification tasks. In addition to the improved accuracy, GKAN's design inherently provides clear insights into the model's decision-making process, eliminating the need for post-hoc explainability techniques. This paper discusses the methodology, performance, and interpretability of GKAN, highlighting its potential for applications in domains where interpretability is crucial.

Updated: 2024-06-26 13:54:59

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18354v1

Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control

Reinforcement learning (RL) has proven to perform well and to be general-purpose in inventory control (IC). However, further improvement of RL algorithms in the IC domain is impeded by two limitations of online experience. First, online experience is expensive to acquire in real-world applications. Given the low sample efficiency of RL algorithms, training an RL policy to convergence would take extensive time. Second, online experience may not reflect the true demand due to the lost-sales phenomenon typical in IC, which makes the learning process more challenging. To address these challenges, we propose a decision framework that combines reinforcement learning with feedback graphs (RLFG) and intrinsically motivated exploration (IME) to boost sample efficiency. In particular, we first take advantage of the inherent properties of lost-sales IC problems and design a feedback graph (FG) tailored to lost-sales IC that generates abundant side experiences to aid RL updates. We then conduct a rigorous theoretical analysis of how the designed FG reduces the sample complexity of RL methods. Based on these theoretical insights, we design an intrinsic reward that directs the RL agent to explore the regions of the state-action space with more side experiences, further exploiting the FG's power. Experimental results demonstrate that our method greatly improves the sample efficiency of applying RL in IC. Our code is available at https://anonymous.4open.science/r/RLIMFG4IC-811D/

Updated: 2024-06-26 13:52:47

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18351v1

AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations

This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLxF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics and contributing to AI safety. We highlight tensions and contradictions inherent in the goals of RLxF. In addition, we discuss ethically-relevant issues that tend to be neglected in discussions about alignment and RLxF, among which the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We conclude by urging researchers and practitioners alike to critically assess the sociotechnical ramifications of RLxF, advocating for a more nuanced and reflective approach to its application in AI development.

Updated: 2024-06-26 13:42:13

Categories: cs.AI

Download: http://arxiv.org/abs/2406.18346v1

EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.

Updated: 2024-06-26 13:42:11

Categories: cs.LG,eess.SP

Download: http://arxiv.org/abs/2406.18345v1

Topological data quality via 0-dimensional persistence matching

Data quality is crucial for the successful training, generalization and performance of artificial intelligence models. We propose to measure data quality for supervised learning using topological data analysis techniques. Specifically, we provide a novel topological invariant based on persistence matchings induced by inclusions and using $0$-dimensional persistent homology. We show that such an invariant is stable. We provide an algorithm and relate it to images, kernels, and cokernels of the induced morphisms. Also, we show that the invariant allows us to understand whether the subset "represents well" the clusters from the larger dataset or not, and we also use it to estimate bounds for the Hausdorff distance between the subset and the complete dataset. This approach enables us to explain why the chosen dataset will lead to poor performance.
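For background, the $0$-dimensional persistence ingredient is simple to compute on its own: every point is born at scale $0$ and a connected component dies each time two components merge, so the death times are exactly the minimum-spanning-tree edge lengths. The sketch below illustrates that ingredient (it is not the paper's persistence-matching invariant) using Kruskal's algorithm with union-find.

```python
import numpy as np
from itertools import combinations

def zero_dim_deaths(points):
    """Death times of 0-dimensional persistent homology classes of a
    point cloud (Vietoris-Rips filtration): components are all born at
    scale 0 and one dies at each merge, so the deaths are the
    minimum-spanning-tree edge lengths."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:               # two components merge: one class dies
            parent[ri] = rj
            deaths.append(w)
    return deaths

pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
deaths = zero_dim_deaths(pts)      # two merges for three points
```

The large death time (9.0) signals a well-separated cluster, the kind of structure the proposed invariant checks a subset for.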

Updated: 2024-06-26 13:37:58

Categories: math.AT,cs.AI,cs.LG

Download: http://arxiv.org/abs/2306.02411v2

Introducing 3DCNN ResNets for ASD full-body kinematic assessment: a comparison with hand-crafted features

Autism Spectrum Disorder (ASD) is characterized by challenges in social communication and restricted behavioral patterns, with motor abnormalities gaining traction for early detection. However, kinematic analysis in ASD is limited, often lacking robust validation and relying on hand-crafted features for single tasks, leading to inconsistencies across studies. End-to-end models have emerged as promising methods to overcome the need for feature engineering. Our aim is to propose a newly adapted 3DCNN ResNet and compare it to widely used hand-crafted features for motor ASD assessment. Specifically, we developed a virtual reality environment with multiple motor tasks and trained models using both approaches. We prioritized a reliable validation framework with repeated cross-validation. Results show the proposed model achieves a maximum accuracy of 85$\pm$3%, outperforming state-of-the-art end-to-end models on short 1-to-3 minute samples. Our comparative analysis shows that feature-engineered models outperformed our end-to-end model in certain tasks. However, our end-to-end model achieved a higher mean AUC of 0.80$\pm$0.03. Additionally, statistically significant differences were found in model variance: our end-to-end model provided more consistent results with less variability across all VR tasks, demonstrating domain generalization and reliability. These findings show that end-to-end models enable less variable and context-independent ASD classification without requiring domain knowledge or task specificity, while also confirming the effectiveness of hand-crafted features in specific task scenarios.

Updated: 2024-06-26 13:29:12

Categories: cs.LG

Download: http://arxiv.org/abs/2311.14533v3

Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. While contrastive approaches can ignore those, they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality and allowing the representation to better focus on relevant aspects. We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and solve tasks that are out of reach for more naive representation learning approaches and other recent baselines.

Updated: 2024-06-26 13:28:35

Categories: cs.LG

Download: http://arxiv.org/abs/2302.05342v4

Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer

Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative than manual approaches. We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression from longitudinal speech recordings of ALS patients. By taking advantage of high-quality pretrained speech features and longitudinal information in the recordings, our best model achieves 91.0\% AUC, improving upon the previous best model by 5.6\% relative on the ALS TDI dataset. Careful analysis reveals that ALST is capable of fine-grained and interpretable predictions of ALS progression, especially for distinguishing between rarer and more severe cases. Code is publicly available.

Updated: 2024-06-26 13:28:24

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.18625v1

Efficient and Accurate Explanation Estimation with Distribution Compression

Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with the ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad spectrum of algorithms for explanation estimation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm for more efficient and accurate explanation estimation. CTE uses distribution compression through kernel thinning to obtain a data sample that best approximates the marginal distribution. We show that CTE improves the estimation of removal-based local and global explanations with negligible computational overhead. It often achieves on-par explanation approximation error using 2-3x fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that currently relies on i.i.d. sampling.
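The claim is that a compressed sample approximates the marginal distribution better than an i.i.d. draw of the same size. The sketch below illustrates that effect with a deliberately simplified greedy mean-matching compressor standing in for kernel thinning; the `compress` helper is hypothetical and is not the CTE implementation.

```python
import numpy as np

def compress(data, m):
    """Greedy stand-in for distribution compression (simplified
    assumption, NOT kernel thinning): pick m points whose running mean
    best tracks the full-data mean, so the compressed sample matches
    the marginal distribution's first moment closely."""
    target = data.mean(axis=0)
    chosen, total = [], np.zeros(data.shape[1])
    remaining = list(range(len(data)))
    for k in range(1, m + 1):
        # pick the point that moves the running mean closest to target
        best = min(remaining,
                   key=lambda i: np.linalg.norm((total + data[i]) / k - target))
        remaining.remove(best)
        chosen.append(best)
        total += data[best]
    return data[chosen]

rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 3))
iid = data[rng.choice(len(data), size=50, replace=False)]   # i.i.d. baseline
cte = compress(data, 50)                                    # compressed sample
err_iid = float(np.linalg.norm(iid.mean(axis=0) - data.mean(axis=0)))
err_cte = float(np.linalg.norm(cte.mean(axis=0) - data.mean(axis=0)))
```

The compressed 50-point sample reproduces the full-data mean far more accurately than the 50-point i.i.d. draw, which is the mechanism behind needing fewer model evaluations.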

Updated: 2024-06-26 13:21:24

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18334v1

Continuous Sign Language Recognition Using Intra-inter Gloss Attention

Many continuous sign language recognition (CSLR) studies adopt transformer-based architectures for sequence modeling due to their powerful capacity for capturing global contexts. Nevertheless, vanilla self-attention, which serves as the core module of the transformer, calculates a weighted average over all time steps; therefore, the local temporal semantics of sign videos may not be fully exploited. In this study, we introduce a novel module for sign language recognition, called the intra-inter gloss attention module, to leverage the relationships among frames within glosses and the semantic and grammatical dependencies between glosses in the video. In the intra-gloss attention module, the video is divided into equally sized chunks and a self-attention mechanism is applied within each chunk. This localized self-attention significantly reduces complexity and eliminates noise introduced by attending to unrelated frames. In the inter-gloss attention module, we first aggregate the chunk-level features within each gloss chunk by average pooling along the temporal dimension. Subsequently, multi-head self-attention is applied to all chunk-level features. Since the signer-environment interaction is not significant for recognition, we utilize segmentation to remove the background of the videos, enabling the proposed model to direct its focus toward the signer. Experimental results on the PHOENIX-2014 benchmark dataset demonstrate that our method can effectively extract sign language features in an end-to-end manner without any prior knowledge, improve the accuracy of CSLR, and achieve a word error rate (WER) of 20.4 on the test set, which is competitive with state-of-the-art methods that use additional supervision.
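The intra-gloss step, self-attention restricted to equally sized chunks, can be sketched as below (a toy single-head version with Q = K = V; all names are illustrative, not the paper's code). A side effect worth noting: frames in one chunk cannot influence outputs in another, which is exactly the locality described above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def intra_chunk_attention(x, chunk_size):
    """Single-head self-attention restricted to equally sized chunks:
    each frame attends only within its own chunk, cutting the cost
    from O(T^2) to O(T * chunk_size). For simplicity Q = K = V = x."""
    T, d = x.shape
    out = np.empty_like(x)
    for start in range(0, T, chunk_size):
        chunk = x[start:start + chunk_size]        # (c, d)
        scores = chunk @ chunk.T / np.sqrt(d)      # (c, c) chunk-local scores
        out[start:start + chunk_size] = softmax(scores) @ chunk
    return out

rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 4))                  # 12 frames, 4-dim features
y = intra_chunk_attention(frames, chunk_size=4)
```

Perturbing a frame in the first chunk changes only that chunk's outputs; the other chunks are untouched, confirming the locality.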

Updated: 2024-06-26 13:21:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18333v1

Early Classification of Time Series: Taxonomy and Benchmark

In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see \url{https://github.com/ML-EDM/ml_edm}).

Updated: 2024-06-26 13:21:00

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18332v1

Molecular Diffusion Models with Virtual Receptors

Machine learning approaches to Structure-Based Drug Design (SBDD) have proven quite fertile over the last few years. In particular, diffusion-based approaches to SBDD have shown great promise. We present a technique which expands on this diffusion approach in two crucial ways. First, we address the size disparity between the drug molecule and the target/receptor, which makes learning more challenging and inference slower. We do so through the notion of a Virtual Receptor, which is a compressed version of the receptor; it is learned so as to preserve key aspects of the structural information of the original receptor, while respecting the relevant group equivariance. Second, we incorporate a protein language embedding used originally in the context of protein folding. We experimentally demonstrate the contributions of both the virtual receptors and the protein embeddings: in practice, they lead to both better performance, as well as significantly faster computations.

Updated: 2024-06-26 13:18:42

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18330v1

PDFA Distillation via String Probability Queries

Probabilistic deterministic finite automata (PDFA) are discrete event systems modeling conditional probabilities over languages: given an already seen sequence of tokens, they return the probability of tokens of interest appearing next. These types of models have gained interest in the domain of explainable machine learning, where they are used as surrogate models for neural networks trained as language models. In this work we present an algorithm to distill PDFA from neural networks. Our algorithm is a derivative of the L# algorithm and capable of learning PDFA from a new type of query, in which the algorithm infers conditional probabilities from the probability that the queried string occurs. We show its effectiveness on a recent public dataset by distilling PDFA from a set of trained neural networks.
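The query type is easy to illustrate: if an oracle returns the probability that a string occurs as a prefix, conditional next-symbol probabilities follow from the ratio P(sa)/P(s). A toy sketch under assumed conventions (the transition-table encoding and stop probabilities below are illustrative, not the paper's representation):

```python
# Toy PDFA over alphabet {a, b}: state -> {symbol: (emission prob, next state)}.
# The leftover probability mass in each state is the implicit stop probability.
PDFA = {
    0: {"a": (0.5, 1), "b": (0.3, 0)},   # stops with prob 0.2 in state 0
    1: {"a": (0.1, 1), "b": (0.6, 0)},   # stops with prob 0.3 in state 1
}

def prefix_prob(string):
    """String probability query: probability that the PDFA's output
    begins with `string`."""
    p, state = 1.0, 0
    for sym in string:
        sym_p, state = PDFA[state][sym]
        p *= sym_p
    return p

def conditional(prefix, sym):
    """Infer P(next symbol | prefix) from two string probability
    queries -- the inference step such a learner can rely on."""
    return prefix_prob(prefix + sym) / prefix_prob(prefix)

p_b_after_a = conditional("a", "b")   # = P("ab") / P("a") = 0.30 / 0.50
```

Two oracle calls per conditional probability are enough, which is what lets a learner reconstruct the automaton's transition distributions.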

Updated: 2024-06-26 13:16:40

Categories: cs.FL,cs.LG

Download: http://arxiv.org/abs/2406.18328v1

Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by networks. To take the uncertainty into account in multi-modal information fusion, this paper proposes a novel Multi-modal Evidential Fusion Network (MEFN) comprising a Cross-Modal Feature Learning (CFL) module and a Multi-modal Trusted Fusion (MTF) module. The CFL module reduces the domain gap upon modality conversion and highlights common tumor features, thereby alleviating the needs of the segmentation module to handle modality specificity. The MTF module utilizes mutual attention mechanisms and an uncertainty calibrator to fuse modality features based on modality uncertainty and then fuse the segmentation results under the guidance of Dempster-Shafer Theory. Besides, a new uncertainty perceptual loss is introduced to force the model focusing on uncertain features and hence improve its ability to extract trusted modality information. Extensive comparative experiments are conducted on two publicly available PET/CT datasets to evaluate the performance of our proposed method whose results demonstrate that our MEFN significantly outperforms state-of-the-art methods with improvements of 2.15% and 3.23% in DSC scores on the AutoPET dataset and the Hecktor dataset, respectively. More importantly, our model can provide radiologists with credible uncertainty of the segmentation results for their decision in accepting or rejecting the automatic segmentation results, which is particularly important for clinical applications. Our code will be available at https://github.com/QPaws/MEFN.

Updated: 2024-06-26 13:14:24

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18327v1

PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models

Large language models (LLMs) are known to be trained on vast amounts of data, which may unintentionally or intentionally include data from commonly used benchmarks. This inclusion can lead to deceptively high scores on model leaderboards, yet disappointing performance in real-world applications. To address this benchmark contamination problem, we first propose a set of requirements that practical contamination detection methods should follow. Following these requirements, we introduce PaCoST, Paired Confidence Significance Testing, to effectively detect benchmark contamination in LLMs. Our method constructs a counterpart for each piece of data with the same distribution, and performs statistical analysis of the corresponding confidences to test whether the model is significantly more confident under the original benchmark. We validate the effectiveness of PaCoST and apply it to popular open-source models and benchmarks. We find that almost all models and benchmarks we tested are suspected of contamination to some degree. We conclude by calling for new LLM evaluation methods.
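The statistical core can be sketched with a paired t statistic on per-item confidences: each original benchmark item paired with its same-distribution counterpart. This is a hypothetical simplification of the paper's test, with made-up confidence data.

```python
import numpy as np

def paired_t_statistic(conf_original, conf_counterpart):
    """One-sided paired t statistic on per-item confidence differences:
    a large positive value means the model is significantly more
    confident on the original benchmark items than on their
    same-distribution counterparts, hinting at contamination."""
    d = np.asarray(conf_original) - np.asarray(conf_counterpart)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

rng = np.random.default_rng(0)
base = rng.uniform(0.5, 0.9, size=100)                  # counterpart confidences
# Uncontaminated model: original confidences differ only by noise.
clean_t = paired_t_statistic(base + rng.normal(0.0, 0.05, 100), base)
# Contaminated model: systematically +0.2 more confident on originals.
contam_t = paired_t_statistic(base + 0.2 + rng.normal(0.0, 0.05, 100), base)
```

The pairing removes item-difficulty variance, so even a modest systematic confidence gap yields a large t value while pure noise stays near zero.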

Updated: 2024-06-26 13:12:40

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18326v1

The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks

The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show how the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealistic infinite-width limit, the Hessian does not vanish and hence it cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.

Updated: 2024-06-26 13:05:18

Categories: cs.LG

Download: http://arxiv.org/abs/2402.03864v2

Adam-mini: Use Fewer Learning Rates To Gain More

We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block. We further find that, for each of these parameter blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. We then provide one cost-effective way to find good learning rates and propose Adam-mini. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on $2\times$ A800-80GB GPUs, which saves 33% wall-clock time for pre-training.

Updated: 2024-06-26 13:03:16

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16793v3
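The idea of one second-moment scalar per parameter block can be illustrated with a toy flat-parameter sketch: within each block, Adam's per-coordinate $v$ is replaced by a single scalar (here the block mean of squared gradients). This is a hypothetical minimal sketch, not the released optimizer; the partition is an arbitrary index list, whereas the paper partitions by Hessian structure (e.g. per attention head).

```python
import math

def adam_mini_step(params, grads, state, blocks, lr=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One optimizer step with a SINGLE second-moment scalar per block.

    params, grads: flat lists of floats
    state: dict with per-coordinate 'm', per-block 'v', and step count 't'
    blocks: list of index lists defining the parameter partition
    """
    state["t"] += 1
    t = state["t"]
    for b, idx in enumerate(blocks):
        # one shared v per block: mean of squared gradients in the block
        g2 = sum(grads[i] ** 2 for i in idx) / len(idx)
        state["v"][b] = beta2 * state["v"][b] + (1 - beta2) * g2
        v_hat = state["v"][b] / (1 - beta2 ** t)
        for i in idx:
            state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * grads[i]
            m_hat = state["m"][i] / (1 - beta1 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

Memory-wise, the per-coordinate `v` array shrinks to one float per block, which is where the reported 45% to 50% saving over AdamW comes from.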

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset. The dataset includes diverse mathematical problems at high school and university levels, created by experts from notable institutions to rigorously test LLMs in advanced problem-solving scenarios and cover a wider range of subject areas. By providing the MathOdyssey dataset as a resource to the AI community, we aim to contribute to the understanding and improvement of AI capabilities in complex mathematical problem-solving. We conduct benchmarking on open-source models, such as Llama-3 and DBRX-Instruct, and closed-source models from the GPT series and Gemini models. Our results indicate that while LLMs perform well on routine and moderately difficult tasks, they face significant challenges with Olympiad-level problems and complex university-level questions. Our analysis shows a narrowing performance gap between open-source and closed-source models, yet substantial challenges remain, particularly with the most demanding problems. This study highlights the ongoing need for research to enhance the mathematical reasoning of LLMs. The dataset, results, and code are publicly available.

Updated: 2024-06-26 13:02:35

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18321v1

Trade-off between Gradient Measurement Efficiency and Expressivity in Deep Quantum Neural Networks

Quantum neural networks (QNNs) require an efficient training algorithm to achieve practical quantum advantages. A promising approach is the use of gradient-based optimization algorithms, where gradients are estimated through quantum measurements. However, it is generally difficult to efficiently measure gradients in QNNs because the quantum state collapses upon measurement. In this work, we prove a general trade-off between gradient measurement efficiency and expressivity in a wide class of deep QNNs, elucidating the theoretical limits and possibilities of efficient gradient estimation. This trade-off implies that a more expressive QNN requires a higher measurement cost in gradient estimation, whereas we can increase gradient measurement efficiency by reducing the QNN expressivity to suit a given task. We further propose a general QNN ansatz called the stabilizer-logical product ansatz (SLPA), which can reach the upper limit of the trade-off inequality by leveraging the symmetric structure of the quantum circuit. In learning an unknown symmetric function, the SLPA drastically reduces the quantum resources required for training while maintaining accuracy and trainability compared to a well-designed symmetric circuit based on the parameter-shift method. Our results not only reveal a theoretical understanding of efficient training in QNNs but also provide a standard and broadly applicable efficient QNN design.

Updated: 2024-06-26 12:59:37

Categories: quant-ph,cs.LG

Download: http://arxiv.org/abs/2406.18316v1

ContactNet: Geometric-Based Deep Learning Model for Predicting Protein-Protein Interactions

Deep learning approaches achieved significant progress in predicting protein structures. These methods are often applied to protein-protein interactions (PPIs) yet require Multiple Sequence Alignment (MSA), which is unavailable for various interactions, such as antibody-antigen. Computational docking methods are capable of sampling accurate complex models, but also produce thousands of invalid configurations. The design of scoring functions for identifying accurate models is a long-standing challenge. We develop a novel attention-based Graph Neural Network (GNN), ContactNet, for classifying PPI models obtained from docking algorithms into accurate and incorrect ones. When trained on docked antigen and modeled antibody structures, ContactNet doubles the accuracy of current state-of-the-art scoring functions, achieving an accurate model among its Top-10 predictions in 43% of the test cases. When applied to unbound antibodies, its Top-10 accuracy increases to 65%. This performance is achieved without MSA, and the approach is applicable to other types of interactions, such as host-pathogen or general PPIs.

Updated: 2024-06-26 12:54:41

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2406.18314v1

AI-native Memory: A Pathway from LLMs Towards AGI

Large language models (LLMs) have shown the world sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, this view may be too optimistic about the long-context capability of (existing) LLMs -- (1) recent literature has shown that their effective context length is significantly smaller than their claimed context length; and (2) our reasoning-in-a-haystack experiments further demonstrate that simultaneously finding the relevant information in a long context and conducting (simple) reasoning is nearly impossible. In this paper, we envision a pathway from LLMs to AGI through the integration of \emph{memory}. We believe that AGI should be a system where LLMs serve as core processors. In addition to raw data, the memory in this system would store a large number of important conclusions derived from reasoning processes. Compared with retrieval-augmented generation (RAG), which merely processes raw data, this approach not only connects semantically related information more closely, but also simplifies complex inferences at query time. As an intermediate stage, the memory will likely take the form of natural language descriptions, which users can also consume directly. Ultimately, every agent/person should have its own large personal model, a deep neural network model (thus \emph{AI-native}) that parameterizes and compresses all types of memory, even those that cannot be described in natural language. Finally, we discuss the significant potential of AI-native memory as the transformative infrastructure for (proactive) engagement, personalization, distribution, and social interaction in the AGI era, as well as the privacy and security challenges it incurs, along with preliminary solutions.

Updated: 2024-06-26 12:51:37

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18312v1

Robust Low-Cost Drone Detection and Classification in Low SNR Environments

The proliferation of drones, or unmanned aerial vehicles (UAVs), has raised significant safety concerns due to their potential misuse in activities such as espionage, smuggling, and infrastructure disruption. This paper addresses the critical need for effective drone detection and classification systems that operate independently of UAV cooperation. We evaluate various convolutional neural networks (CNNs) for their ability to detect and classify drones using spectrogram data derived from consecutive Fourier transforms of signal components. The focus is on model robustness in low signal-to-noise ratio (SNR) environments, which is critical for real-world applications. A comprehensive dataset is provided to support future model development. In addition, we demonstrate a low-cost drone detection system using a standard computer, software-defined radio (SDR) and antenna, validated through real-world field testing. On our development dataset, all models consistently achieved an average balanced classification accuracy of >= 85% at SNR > -12dB. In the field test, these models achieved an average balanced accuracy of > 80%, depending on transmitter distance and antenna direction. Our contributions include: a publicly available dataset for model development, a comparative analysis of CNNs for drone detection under low SNR conditions, and the deployment and field evaluation of a practical, low-cost detection system.

Updated: 2024-06-26 12:50:55

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2406.18624v1

Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors. We introduce three rules to update the task relatedness matrix: OMTLCOV, OMTLLOG, and OMTLVON, and compare them against a conventional method (CMTL) that uses a fixed relatedness value. Performance evaluations on three datasets (a spam dataset and two EEG datasets from construction workers under varying conditions) demonstrated that our OMTL methods outperform CMTL, improving accuracy by 1\% to 3\% on EEG data, and maintaining low error rates of around 12\% on the spam dataset.

Updated: 2024-06-26 12:50:13

Categories: cs.LG

Download: http://arxiv.org/abs/2406.18311v1
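One natural way to recompute a task-relatedness matrix from the current weight vectors, in the spirit of a covariance-based rule like OMTLCOV, is to take a covariance-style matrix over tasks. This is a hedged illustration of the general idea, not the paper's exact update rules; the function name and normalization are assumptions.

```python
def update_relatedness(W):
    """Recompute a task-relatedness matrix from the current task weight
    vectors (rows of W), covariance-style: tasks whose weights move
    together get positive relatedness, opposing tasks negative."""
    T = len(W)       # number of tasks
    d = len(W[0])    # weight-vector dimension
    # per-dimension mean across tasks
    means = [sum(W[t][j] for t in range(T)) / T for j in range(d)]
    A = [[0.0] * T for _ in range(T)]
    for s in range(T):
        for t in range(T):
            A[s][t] = sum((W[s][j] - means[j]) * (W[t][j] - means[j])
                          for j in range(d)) / d
    return A
```

In an online setting this update would run each round after the weight vectors are refreshed, replacing the fixed relatedness value that CMTL keeps throughout.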

Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

Pathology images are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology images in a black-box manner, which can lead to untruthful biological details and misdiagnosis. Additionally, current methods allocate the same computational resources to recover each pixel of a pathology image, leading to sub-optimal recovery due to the large variation among pathology images. In this paper, we propose the first hierarchical reinforcement learning framework, named Spatial-Temporal hierARchical Reinforcement Learning (STAR-RL), mainly for addressing the aforementioned issues in the pathology image super-resolution problem. We reformulate the SR problem as a Markov decision process of interpretable operations and adopt a hierarchical recovery mechanism at the patch level to avoid sub-optimal recovery. Specifically, the higher-level spatial manager is proposed to pick out the most corrupted patch for the lower-level patch worker. Moreover, the higher-level temporal manager evaluates the selected patch and determines whether the optimization should be stopped early, thereby avoiding over-processing. Under the guidance of the spatial-temporal managers, the lower-level patch worker processes the selected patch with pixel-wise interpretable actions at each time step. Experimental results on medical images degraded by different kernels show the effectiveness of STAR-RL. Furthermore, STAR-RL improves tumor diagnosis by a large margin and shows generalizability under various degradations. The source code is available at https://github.com/CUHK-AIM-Group/STAR-RL.

Updated: 2024-06-26 12:50:10

Categories: cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2406.18310v1

Automated Immunophenotyping Assessment for Diagnosing Childhood Acute Leukemia using Set-Transformers

Acute Leukemia is the most common hematologic malignancy in children and adolescents. A key methodology in the diagnostic evaluation of this malignancy is immunophenotyping based on Multiparameter Flow Cytometry (FCM). However, this approach is manual, and thus time-consuming and subjective. To alleviate this situation, we propose in this paper the FCM-Former, a machine learning, self-attention based FCM-diagnostic tool that automates the immunophenotyping assessment in Childhood Acute Leukemia. The FCM-Former is trained in a supervised manner, by directly using flow cytometric data. Our FCM-Former achieves an accuracy of 96.5% in assigning lineage to each sample among 960 cases of acute B-cell lymphoblastic, T-cell lymphoblastic, or acute myeloid leukemia (B-ALL, T-ALL, AML). To the best of our knowledge, the FCM-Former is the first work to automate the immunophenotyping assessment with FCM data in diagnosing pediatric Acute Leukemia.

Updated: 2024-06-26 12:50:07

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2406.18309v1

Autoencoder-based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter

The CMS detector is a general-purpose apparatus that detects high-energy collisions produced at the LHC. Online Data Quality Monitoring of the CMS electromagnetic calorimeter is a vital operational tool that allows detector experts to quickly identify, localize, and diagnose a broad range of detector issues that could affect the quality of physics data. A real-time autoencoder-based anomaly detection system using semi-supervised machine learning is presented enabling the detection of anomalies in the CMS electromagnetic calorimeter data. A novel method is introduced which maximizes the anomaly detection performance by exploiting the time-dependent evolution of anomalies as well as spatial variations in the detector response. The autoencoder-based system is able to efficiently detect anomalies, while maintaining a very low false discovery rate. The performance of the system is validated with anomalies found in 2018 and 2022 LHC collision data. Additionally, the first results from deploying the autoencoder-based system in the CMS online Data Quality Monitoring workflow during the beginning of Run 3 of the LHC are presented, showing its ability to detect issues missed by the existing system.

Updated: 2024-06-26 12:45:55

Categories: physics.ins-det,cs.LG,hep-ex,physics.data-an

Download: http://arxiv.org/abs/2309.10157v2
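The generic decision rule behind such a system is reconstruction-error thresholding: calibrate a threshold on known-good data at a high quantile (so the false discovery rate stays low) and flag anything whose autoencoder reconstruction error exceeds it. A minimal sketch of that generic scheme, with function names assumed and no CMS-specific logic:

```python
def fit_threshold(ref_errors, quantile=0.999):
    """Set the anomaly threshold from reconstruction errors of known-good
    reference data, at a high quantile to keep false discoveries rare."""
    s = sorted(ref_errors)
    k = min(len(s) - 1, int(quantile * len(s)))
    return s[k]

def flag_anomalies(errors, threshold):
    """Flag samples whose autoencoder reconstruction error is above the
    calibrated threshold."""
    return [e > threshold for e in errors]
```

The paper's contribution sits on top of this basic scheme: exploiting the time-dependent evolution of anomalies and spatial variations in detector response to sharpen the score before thresholding.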

S3: A Simple Strong Sample-effective Multimodal Dialog System

In this work, we present a conceptually simple yet powerful baseline for the multimodal dialog task, an S3 model, that achieves near state-of-the-art results on two compelling leaderboards: MMMU and AI Journey Contest 2023. The system is based on a pre-trained large language model, pre-trained modality encoders for image and audio, and a trainable modality projector. The proposed effective data mixture for training such an architecture demonstrates that a multimodal model based on a strong language model and trained on a small amount of multimodal data can perform efficiently in the task of multimodal dialog.

Updated: 2024-06-26 12:45:43

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18305v1

Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

The widespread use of large language models (LLMs) has sparked concerns about the potential misuse of AI-generated text, as these models can produce content that closely resembles human-generated text. Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations, with even minor changes in characters or words causing predictions to flip between human-created and AI-generated text. This paper investigates the robustness of existing AIGT detection methods and introduces a novel detector, the Siamese Calibrated Reconstruction Network (SCRN). The SCRN employs a reconstruction network to add and remove noise from text, extracting a semantic representation that is robust to local perturbations. We also propose a siamese calibration technique to train the model to make equally confident predictions under different noise, which improves the model's robustness against adversarial perturbations. Experiments on four publicly available datasets show that the SCRN outperforms all baseline methods, achieving 6.5\%-18.25\% absolute accuracy improvement over the best baseline method under adversarial attacks. Moreover, it exhibits superior generalizability in cross-domain, cross-genre, and mixed-source scenarios. The code is available at \url{https://github.com/CarlanLark/Robust-AIGC-Detector}.

Updated: 2024-06-26 12:43:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.01179v2

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks, for instance reducing the winning rate of a superhuman-level Go AI to around 20%. Existing studies predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY) that incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests sharing transitions among subpolicies to improve attackers' exploitative ability. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments.

Updated: 2024-06-26 12:41:59

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2402.03741v3

CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish between ambiguous and infeasible commands, leveraging LLMs with situationally aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could decrease malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting of pairs of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset and in a pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios.

Updated: 2024-06-26 12:39:36

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2306.10376v6

General Distribution Learning: A theoretical framework for Deep Learning

There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite the non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures in solving physical problems. This paper introduces General Distribution Learning (GD Learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from traditional statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in the classical statistical learning framework, is divided into fitting errors due to models and algorithms, as well as sampling errors introduced by limited sampling data. The framework explicitly incorporates prior knowledge, especially in scenarios characterized by data scarcity, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solutions in non-convex optimization can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the development of the gradient structure control algorithm. GD Learning also offers fresh insights into questions on deep learning, including overparameterization and non-convex optimization, the bias-variance trade-off, and the mechanism of flat minima.

Updated: 2024-06-26 12:32:28

Categories: cs.LG,cs.IR,stat.ML

Download: http://arxiv.org/abs/2406.05666v4

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model to achieve the state-of-the-art performance on both text-image and text-text retrieval tasks.

Updated: 2024-06-26 12:31:48

Categories: cs.CL,cs.AI,cs.CV,cs.IR,68T50,I.2.7

Download: http://arxiv.org/abs/2405.20204v2

Single-Model Attribution of Generative Models Through Final-Layer Inversion

Recent breakthroughs in generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by, first, viewing single-model attribution through the lens of anomaly detection. Arising from this change of perspective, we propose FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach and its flexibility to various domains.

Updated: 2024-06-26 12:31:04

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2306.06210v5

Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI

When the primary goal is to solve several problems jointly, each with a prescribed high target accuracy, Foundation Models should in most cases be used rather than problem-specific models. We focus on the specific computer-vision application of Foundation Models for Earth Observation (EO) and geospatial AI. These models can solve important problems we are tackling, including land cover classification, crop type mapping, flood segmentation, building density estimation, and road regression segmentation. In this paper, we show that with a limited amount of labelled data, Foundation Models achieve improved performance compared to problem-specific models. We also present our proposed evaluation benchmark for EO Foundation Models. Benchmarking the generalization performance of Foundation Models is important because it has become difficult to standardize a fair comparison across the many different models proposed recently. We present results obtained with our evaluation benchmark for EO Foundation Models and show that Foundation Models are label-efficient in downstream tasks and help us solve problems we are tackling in EO and remote sensing.

Updated: 2024-06-26 12:27:06

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18295v1

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical applications often pose complex tasks for which no prior knowledge about good hyperparameters and reward functions is available, thus necessitating their derivation from scratch. Prior work has examined automatically tuning either hyperparameters or reward functions individually. We demonstrate empirically that an RL algorithm's hyperparameter configurations and reward function are often mutually dependent, meaning neither can be fully optimised without appropriate values for the other. We then propose a methodology for the combined optimisation of hyperparameters and the reward function. Furthermore, we include a variance penalty as an optimisation objective to improve the stability of learned policies. We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic on four environments. Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others, with only a minor increase in computational costs. This suggests that combined optimisation should be best practice.

Updated: 2024-06-26 12:23:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18293v1

Deep Fusion: Efficient Network Training via Pre-trained Initializations

In recent years, deep learning has made remarkable progress in a wide range of domains, with a particularly notable impact on natural language processing tasks. One of the challenges associated with training deep neural networks in the context of LLMs is the need for large amounts of computational resources and time. To mitigate this, network growing algorithms offer potential cost savings, but their underlying mechanisms are poorly understood. We present two notable contributions in this paper. First, we present Deep Fusion, an efficient approach to network training that leverages pre-trained initializations of smaller networks. Second, we propose a theoretical framework using backward error analysis to illustrate the dynamics of mid-training network growth. Our experiments show how Deep Fusion is a practical and effective approach that not only accelerates the training process but also reduces computational requirements, maintaining or surpassing traditional training methods' performance in various NLP tasks and T5 model sizes. Finally, we validate our theoretical framework, which guides the optimal use of Deep Fusion, showing that with carefully optimized training dynamics, it significantly reduces both training time and resource consumption.

Updated: 2024-06-26 12:16:57

Categories: cs.LG

Download: http://arxiv.org/abs/2306.11903v3

Weisfeiler Leman for Euclidean Equivariant Machine Learning

The $k$-Weisfeiler-Leman ($k$-WL) graph isomorphism test hierarchy is a common method for assessing the expressive power of graph neural networks (GNNs). Recently, GNNs whose expressive power is equivalent to the $2$-WL test were proven to be universal on weighted graphs which encode $3\mathrm{D}$ point cloud data, yet this result is limited to invariant continuous functions on point clouds. In this paper, we extend this result in three ways: Firstly, we show that PPGN can simulate $2$-WL uniformly on all point clouds with low complexity. Secondly, we show that $2$-WL tests can be extended to point clouds which include both positions and velocities, a scenario often encountered in applications. Finally, we provide a general framework for proving equivariant universality and leverage it to prove that a simple modification of this invariant PPGN architecture can be used to obtain a universal equivariant architecture that can approximate all continuous equivariant functions uniformly. Building on our results, we develop our WeLNet architecture, which sets new state-of-the-art results on the N-Body dynamics task and the GEOM-QM9 molecular conformation generation task.
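For readers unfamiliar with the test hierarchy, the classical 1-WL color-refinement procedure that underlies it can be sketched as follows (a generic illustration; the paper's PPGN and WeLNet architectures are not reproduced here):

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """One-dimensional Weisfeiler-Leman (color refinement) on an adjacency list.

    adj: dict mapping node -> iterable of neighbors.
    Returns the multiset (Counter) of final node colors; two graphs whose
    counters differ are certainly non-isomorphic (the converse need not hold).
    """
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Compress signatures to small integer ids, consistently across nodes.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return Counter(colors.values())
```

For example, a triangle and a 3-node path receive different color multisets after one round, so 1-WL distinguishes them; $k$-WL generalizes this refinement to $k$-tuples of nodes.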

Updated: 2024-06-26 12:11:16

Categories: cs.LG

Download: http://arxiv.org/abs/2402.02484v3

CAS: Confidence Assessments of classification algorithms for Semantic segmentation of EO data

Confidence assessments of semantic segmentation algorithms in remote sensing are important. A desirable property of a model is knowing a priori whether it will produce an incorrect output. Evaluating the confidence assigned to the estimates of models for the task of classification in Earth Observation (EO) is crucial, as such evaluations can be used to improve semantic segmentation performance and prevent high error rates during inference and deployment. The model we develop, the Confidence Assessments of classification algorithms for Semantic segmentation (CAS) model, performs confidence evaluations at both the segment and pixel levels, and outputs both labels and confidence. The outcome of this work has important applications. The main application is the evaluation of EO Foundation Models on semantic segmentation downstream tasks, in particular land cover classification using satellite Copernicus Sentinel-2 data. The evaluation shows that the proposed model is effective and outperforms alternative baseline models.

Updated: 2024-06-26 12:05:49

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18279v1

Guarantees in Security: A Philosophical Perspective

Research in cybersecurity may seem reactive, specific, ephemeral, and indeed ineffective. Despite decades of innovation in defense, even the most critical software systems turn out to be vulnerable to attacks. Time and again. Offense and defense forever on repeat. Even provable security, meant to provide an indubitable guarantee of security, does not stop attackers from finding security flaws. As we reflect on our achievements, we are left wondering: Can security be solved once and for all? In this paper, we take a philosophical perspective and develop the first theory of cybersecurity that explains what *fundamentally* prevents us from making reliable statements about the security of a software system. We substantiate each argument by demonstrating how the corresponding challenge is routinely exploited to attack a system despite credible assurances about the absence of security flaws. To make meaningful progress in the presence of these challenges, we introduce a philosophy of cybersecurity.

Updated: 2024-06-26 11:46:50

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2402.01944v4

360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360$^\circ$ performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360$^\circ$REA.

Updated: 2024-06-26 11:42:10

Categories: cs.AI,cs.CL,cs.MA

Download: http://arxiv.org/abs/2404.05569v2

Unbiased least squares regression via averaged stochastic gradient descent

We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix H, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time-steps of order k, with O(1/k) expected excess risk. The constant behind the O notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of H. We provide both a biased and unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either H or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.
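The time-average SGD estimator at the heart of this construction can be sketched for least squares (a minimal illustration of Polyak-Ruppert-style iterate averaging only, under an arbitrary constant step size; the paper's unbiasing via randomized multilevel Monte Carlo is not shown):

```python
import numpy as np

def averaged_sgd_least_squares(X, y, steps=5000, lr=0.01, seed=0):
    """SGD on 0.5 * (X[i] @ theta - y[i])**2 with a running average of iterates.

    Returns (last_iterate, time_average); the time average typically has
    lower variance than the last iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)                      # sample one observation
        grad = (X[i] @ theta - y[i]) * X[i]      # stochastic gradient
        theta = theta - lr * grad
        theta_bar += (theta - theta_bar) / t     # running mean of iterates
    return theta, theta_bar
```

The running mean `theta_bar` is the "time-average estimator" of the abstract; the paper's contribution is a randomized modification of it whose expectation equals $\theta^*$ exactly.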

Updated: 2024-06-26 11:39:22

Categories: stat.ML,cs.LG,stat.ME,62Jxx 65K05 65C05

Download: http://arxiv.org/abs/2406.18623v1

GlucOS: Security, correctness, and simplicity for automated insulin delivery

Type 1 Diabetes (T1D) is a metabolic disorder where an individual's pancreas stops producing insulin. To compensate, they inject synthetic insulin. Computer systems, called automated insulin delivery systems, exist that inject insulin automatically. However, insulin is a dangerous hormone, where too much insulin can kill people in a matter of hours and too little insulin can kill people in a matter of days. In this paper, we take on the challenge of building a new trustworthy automated insulin delivery system, called GlucOS. In our design, we apply separation principles to keep our implementation simple, we use formal methods to prove correct the most critical parts of the system, and we design novel security mechanisms and policies to withstand malicious components and attacks on the system. We report on real world use for one individual for 6 months using GlucOS. Our data shows that for this individual, our ML-based algorithm runs safely and manages their T1D effectively. We also run our system on 21 virtual humans using simulations and show that our security and safety mechanisms enable ML to improve their core T1D measures of metabolic health by 4.3\% on average. Finally, we show that our security and safety mechanisms maintain recommended levels of control over T1D even in the face of active attacks that would have otherwise led to death. GlucOS is open source and our code is available on GitHub.

Updated: 2024-06-26 11:20:15

Categories: cs.CR

Download: http://arxiv.org/abs/2406.18262v1

Enhancing Geometric Ontology Embeddings for $\mathcal{EL}^{++}$ with Negative Sampling and Deductive Closure Filtering

Ontology embeddings map classes, relations, and individuals in ontologies into $\mathbb{R}^n$, and within $\mathbb{R}^n$ similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic $\mathcal{EL}^{++}$, several embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations; they do not distinguish between statements that are unprovable and provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for $\mathcal{EL}^{++}$ ontologies based on high-dimensional ball representation of concept descriptions, incorporating several modifications that aim to make use of the ontology deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and different types of negatives. We demonstrate that our embedding methods improve over the baseline ontology embedding in the task of knowledge base or ontology completion.

Updated: 2024-06-26 11:17:13

Categories: cs.AI

Download: http://arxiv.org/abs/2405.04868v2

Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

As LLMs rapidly advance, concerns are increasing about the actual authorship of texts we see online and in the real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of treating LLM-generated text detection as a binary classification task of differentiating human from AI. Instead, we introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source, and we show that this new category is crucial to making the detection result more explainable to lay users. This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing the need for detectors to provide clear and understandable explanations to users. Our study involves creating four new datasets comprised of texts from various LLMs and human authors. Based on these datasets, we performed binary classification tests to ascertain the most effective SOTA detection methods and identified SOTA LLMs capable of producing harder-to-detect texts. We constructed a new dataset of texts generated by two top-performing LLMs and human authors, and asked three human annotators to produce ternary labels with explanation notes. This dataset was used to investigate how three top-performing SOTA detectors behave in the new ternary classification context. Our results highlight why the "undecided" category is much needed from the viewpoint of explainability. Additionally, we analyzed the explainability of the three best-performing detectors and the explanation notes of the human annotators, revealing insights about the complexity of explainable detection of machine-generated texts. Finally, we propose guidelines for developing future detection systems with improved explanatory power.

Updated: 2024-06-26 11:11:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18259v1

Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieves image-text retrieval in the multi-lingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; particularly, the methods based on contrastive learning on large-scale data have significantly improved retrieval tasks. However, these methods directly follow the existing pre-training methods in the cross-lingual or cross-modal domain, leading to two problems of inconsistency in CCR: The methods with cross-lingual style suffer from the intra-modal error propagation, resulting in inconsistent recall performance across languages in the whole dataset. The methods with cross-modal style suffer from the inter-modal optimization direction bias, resulting in inconsistent rank across languages within each instance, which cannot be reflected by Recall@K. To solve these problems, we propose a simple but effective 1-to-K contrastive learning method, which treats each language equally and eliminates error propagation and optimization bias. In addition, we propose a new evaluation metric, Mean Rank Variance (MRV), to reflect the rank inconsistency across languages within each instance. Extensive experiments on four CCR datasets show that our method improves both recall rates and MRV with smaller-scale pre-trained data, achieving the new state-of-art.

Updated: 2024-06-26 11:04:25

Categories: cs.IR,cs.AI,cs.MM

Download: http://arxiv.org/abs/2406.18254v1

A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation

Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.

Updated: 2024-06-26 11:03:21

Categories: q-bio.BM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.08703v2

Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP

Contrastive Language-Image Pre-training (CLIP) has manifested remarkable improvements in zero-shot classification and cross-modal vision-language tasks. Yet, from a geometrical point of view, the CLIP embedding space has been found to have a pronounced modality gap. This gap renders the embedding space overly sparse and disconnected, with different modalities being densely distributed in distinct subregions of the hypersphere. In this work, we aim at answering two main questions: 1. Does sharing the parameter space between the multi-modal encoders reduce the modality gap? 2. Can the gap be mitigated by pushing apart the uni-modal embeddings via intra-modality separation? We design AlignCLIP to answer these questions and show that the answer to both is positive. Through extensive experiments, we show that AlignCLIP achieves noticeable enhancements in the cross-modal alignment of the embeddings and thereby reduces the modality gap, while maintaining performance across several downstream evaluations, such as zero-shot image classification, zero-shot multi-modal retrieval, and zero-shot semantic text similarity.
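The modality gap discussed here is often quantified as the distance between the centroids of the normalized image and text embeddings. A minimal sketch of that diagnostic (a generic measure, not the AlignCLIP training procedure; the function name is illustrative):

```python
import numpy as np

def modality_gap(img_emb, txt_emb):
    """Euclidean distance between the centroids of L2-normalized embeddings.

    A large value indicates the two modalities occupy separate regions of
    the hypersphere; 0 would mean perfectly overlapping centroids.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))
```

Interventions such as shared encoder parameters or intra-modality separation can then be judged by whether they shrink this quantity without hurting downstream accuracy.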

Updated: 2024-06-26 10:58:48

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.17639v2

Adversarial Multi-dueling Bandits

We introduce the problem of regret minimization in adversarial multi-dueling bandits. While adversarial preferences have been studied in dueling bandits, they have not been explored in multi-dueling bandits. In this setting, the learner is required to select $m \geq 2$ arms at each round and observes as feedback the identity of the most preferred arm which is based on an arbitrary preference matrix chosen obliviously. We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback that is assumed to be generated from a pairwise-subset choice model. We prove that the expected cumulative $T$-round regret of MiDEX compared to a Borda-winner from a set of $K$ arms is upper bounded by $O((K \log K)^{1/3} T^{2/3})$. Moreover, we prove a lower bound of $\Omega(K^{1/3} T^{2/3})$ for the expected regret in this setting which demonstrates that our proposed algorithm is near-optimal.
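As background, the classic EXP3 algorithm that MiDEX extends can be sketched for the standard adversarial bandit setting (single-arm selection with loss feedback; the paper's multi-dueling, preference-feedback extension is not shown, and the learning rate is an arbitrary choice):

```python
import numpy as np

def exp3(loss_fn, K, T, eta=0.1, seed=0):
    """Classic EXP3 for adversarial K-armed bandits with losses in [0, 1].

    loss_fn(t, arm) -> loss of the pulled arm at round t (only this value
    is observed). Returns the total loss incurred by the learner.
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        arm = rng.choice(K, p=probs)
        loss = loss_fn(t, arm)
        total += loss
        # Importance-weighted loss estimate: unbiased for every arm.
        est = np.zeros(K)
        est[arm] = loss / probs[arm]
        weights *= np.exp(-eta * est)
    return total
```

MiDEX replaces the single pulled arm with a subset of $m \geq 2$ arms and converts the observed winner, drawn from a pairwise-subset choice model, into loss estimates of this importance-weighted form.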

Updated: 2024-06-26 10:57:40

Categories: cs.LG

Download: http://arxiv.org/abs/2406.12475v2

Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation

The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia and ultimately cancer. Early detection through endoscopic regular surveillance is essential for better outcomes. Foundation models (FM), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FM in endoscopy and pathology imaging. We started by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FM into clinical practice for prevention/management of GC cases, thereby improving patient outcomes.

Updated: 2024-06-26 10:51:44

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2406.18249v1

Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks

Background/Aim. This study aims to predict Amyloid Positron Emission Tomography (AmyloidPET) status with multimodal retinal imaging and convolutional neural networks (CNNs) and to improve the performance through pretraining with synthetic data. Methods. Fundus autofluorescence, optical coherence tomography (OCT), and OCT angiography images from 328 eyes of 59 AmyloidPET positive subjects and 108 AmyloidPET negative subjects were used for classification. Denoising Diffusion Probabilistic Models (DDPMs) were trained to generate synthetic images and unimodal CNNs were pretrained on synthetic data and finetuned on real data or trained solely on real data. Multimodal classifiers were developed to combine predictions of the four unimodal CNNs with patient metadata. Class activation maps of the unimodal classifiers provided insight into the network's attention to inputs. Results. DDPMs generated diverse, realistic images without memorization. Pretraining unimodal CNNs with synthetic data improved AUPR at most from 0.350 to 0.579. Integration of metadata in multimodal CNNs improved AUPR from 0.486 to 0.634, which was the best overall best classifier. Class activation maps highlighted relevant retinal regions which correlated with AD. Conclusion. Our method for generating and leveraging synthetic data has the potential to improve AmyloidPET prediction from multimodal retinal imaging. A DDPM can generate realistic and unique multimodal synthetic retinal images. Our best performing unimodal and multimodal classifiers were not pretrained on synthetic data, however pretraining with synthetic data slightly improved classification performance for two out of the four modalities.

Updated: 2024-06-26 10:49:26

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18247v1

Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets

Filtering and annotating textual data are routine tasks in many areas, like social media or news analytics. Automating these tasks makes it possible to scale the analyses with respect to speed and breadth of content covered, and decreases the manual effort required. Due to technical advancements in Natural Language Processing, specifically the success of large foundation models, a new tool has become available for automating such annotation processes via a text-to-text interface: the model is given written guidelines but no training samples. In this work, we assess these advancements in-the-wild by empirically testing them in an annotation task on German Twitter data about social and political European crises. We compare the prompt-based results with our human annotation and preceding classification approaches, including Naive Bayes and a BERT-based fine-tuning/domain adaptation pipeline. Our results show that the prompt-based approach - despite being limited by local computation resources during the model selection - is comparable with the fine-tuned BERT, but without requiring any annotated training data. Our findings emphasize the ongoing paradigm shift in the NLP landscape, i.e., the unification of downstream tasks and the elimination of the need for pre-labeled training data.
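The text-to-text annotation interface described here amounts to assembling guidelines, a label set, and the input text into a single prompt. A minimal sketch of such a prompt builder follows; the guideline text, label names, and field layout are illustrative assumptions, not the paper's actual prompt format:

```python
def zero_shot_prompt(text: str, labels: list[str], guidelines: str) -> str:
    """Build a zero-shot topic-labeling prompt from written guidelines,
    a fixed label set, and one input tweet (no training examples)."""
    return (
        f"{guidelines}\n"
        f"Possible topics: {', '.join(labels)}\n"
        f"Tweet: {text}\n"
        "Topic:"
    )
```

The returned string would be fed directly to a text-to-text model, whose completion after "Topic:" is taken as the predicted label.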

Updated: 2024-06-26 10:44:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18239v1

PlaMo: Plan and Move in Rich 3D Physical Environments

Controlling humanoids in complex physically simulated worlds is a long-standing challenge with numerous applications in gaming, simulation, and visual content creation. In our setup, given a rich and complex 3D scene, the user provides a list of instructions composed of target locations and locomotion types. To solve this task we present PlaMo, a scene-aware path planner and a robust physics-based controller. The path planner produces a sequence of motion paths, considering the various limitations the scene imposes on the motion, such as location, height, and speed. Complementing the planner, our control policy generates rich and realistic physical motion adhering to the plan. We demonstrate how the combination of both modules enables traversing complex landscapes in diverse forms while responding to real-time changes in the environment. Video: https://youtu.be/wWlqSQlRZ9M .

Updated: 2024-06-26 10:41:07

Categories: cs.AI,cs.GR,cs.RO

Download: http://arxiv.org/abs/2406.18237v1

Quantifying Arbitrage in Automated Market Makers: An Empirical Study of Ethereum ZK Rollups

Arbitrage can arise from the simultaneous purchase and sale of the same asset in different markets in order to profit from a difference in its price. This work systematically reviews arbitrage opportunities between Automated Market Makers (AMMs) on Ethereum ZK rollups, and Centralised Exchanges (CEXs). First, we propose a theoretical framework to measure such arbitrage opportunities and derive a formula for the related Maximal Arbitrage Value (MAV) that accounts for both price divergences and liquidity available in the trading venues. Then, we empirically measure the historical MAV available between SyncSwap, an AMM on zkSync Era, and Binance, and investigate how quickly misalignments in price are corrected against explicit and implicit market costs. Overall, the cumulative MAV from July to September 2023 on the USDC-ETH SyncSwap pool amounts to $104.96k (0.24% of trading volume).
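The cross-venue arbitrage value described above can be illustrated with a toy calculation: buy at the cheaper venue, sell at the dearer one, and subtract proportional fees. This is only a sketch; the paper's MAV formula additionally models liquidity and price impact, and the fee rates below are assumed values:

```python
def arbitrage_value(p_amm: float, p_cex: float, size: float,
                    fee_amm: float = 0.003, fee_cex: float = 0.001) -> float:
    """Profit from buying `size` units at the cheaper venue and selling
    them at the dearer one, net of proportional fees (0 if unprofitable)."""
    buy, sell = min(p_amm, p_cex), max(p_amm, p_cex)
    gross = (sell - buy) * size                      # raw price gap
    fees = (buy * fee_amm + sell * fee_cex) * size   # per-venue trading fees
    return max(gross - fees, 0.0)
```

Small price misalignments can thus be unprofitable once explicit market costs are accounted for, which is the mechanism the paper's empirical section measures.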

Updated: 2024-06-26 10:40:08

Categories: cs.CR

Download: http://arxiv.org/abs/2403.16083v2

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model

Convolutional neural networks (CNNs) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNNs are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is present in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully exploit its properties to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available at https://github.com/ChenHongruixuan/MambaCD

Updated: 2024-06-26 10:38:29

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.03425v5

ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer

Optical high-resolution imagery and OSM data are two important data sources of change detection (CD). Previous related studies focus on utilizing the information in OSM data to aid the CD on optical high-resolution images. This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery, thereby expanding the scope of CD tasks. To this end, we propose an object-guided Transformer (ObjFormer) by naturally combining the object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. This combination can significantly reduce the computational overhead in the self-attention module without adding extra parameters or layers. ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extracts multi-level heterogeneous features from OSM data and optical images; a decoder consisting of object-guided cross-attention modules can recover land-cover changes from the extracted heterogeneous features. Beyond basic binary change detection, this paper introduces a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize negative samples, contributing to a substantial performance improvement on this task. A large-scale benchmark dataset called OpenMapCD containing 1,287 samples covering 40 regions on six continents is constructed to conduct detailed experiments. The results show the effectiveness of our methods in this new kind of CD task. Additionally, case studies in Japanese cities demonstrate the framework's generalizability and practical potential. The OpenMapCD dataset and source code are available at https://github.com/ChenHongruixuan/ObjFormer

Updated: 2024-06-26 10:31:54

Categories: cs.CV,cs.AI,cs.CY,cs.MM

Download: http://arxiv.org/abs/2310.02674v3

SoK: Web Authentication in the Age of End-to-End Encryption

The advent of end-to-end encrypted (E2EE) messaging and backup services has brought new challenges for usable authentication. Compared to regular web services, the nature of E2EE implies that the provider cannot recover data for users who have forgotten passwords or lost devices. Therefore, new forms of robustness and recoverability are required, leading to a plethora of solutions ranging from randomly-generated recovery codes to threshold-based social verification. These implications also spread to new forms of authentication and legacy web services: passwordless authentication ("passkeys") has become a promising candidate to replace passwords altogether, but passkeys are inherently device-bound. However, users expect that they can login from multiple devices and recover their passwords in case of device loss, prompting providers to sync credentials to cloud storage using E2EE, resulting in the very same authentication challenges of regular E2EE services. Hence, E2EE authentication quickly becomes relevant not only for a niche group of dedicated E2EE enthusiasts but also for the general public using the passwordless authentication techniques promoted by their device vendors. In this paper we systematize existing research literature and industry practice relating to security, privacy, usability, and recoverability of E2EE authentication. We investigate authentication and recovery schemes in all widely-used E2EE web services and survey passwordless authentication deployment in the top-200 most popular websites. Finally, we present concrete research directions based on observed gaps between industry deployment and academic literature.

Updated: 2024-06-26 10:23:58

Categories: cs.CR

Download: http://arxiv.org/abs/2406.18226v1

Visual Odometry with Neuromorphic Resonator Networks

Visual Odometry (VO) is a method to estimate self-motion of a mobile robot using visual sensors. Unlike odometry based on integrating differential measurements, such as those from inertial sensors or wheel encoders, which can accumulate errors, visual odometry is not compromised by drift. However, image-based VO is computationally demanding, limiting its application in use cases with strict latency, memory, and energy requirements. Neuromorphic hardware offers low-power solutions to many vision and AI problems, but designing such solutions is complicated and often has to be assembled from scratch. Here we propose to use Vector Symbolic Architecture (VSA) as an abstraction layer to design algorithms compatible with neuromorphic hardware. Building from a VSA model for scene analysis, described in our companion paper, we present a modular neuromorphic algorithm that achieves state-of-the-art performance on two-dimensional VO tasks. Specifically, the proposed algorithm stores and updates a working memory of the presented visual environment. Based on this working memory, a resonator network estimates the changing location and orientation of the camera. We experimentally validate the neuromorphic VSA-based approach to VO with two benchmarks: one based on an event camera dataset and the other in a dynamic scene with a robotic task.

Updated: 2024-06-26 10:17:08

Categories: cs.RO,cs.AI,cs.CV,cs.NE,I.4.9

Download: http://arxiv.org/abs/2209.02000v3

Improving Local Training in Federated Learning via Temperature Scaling

Federated learning is inherently hampered by data heterogeneity: non-i.i.d. training data over local clients. We propose a novel model training approach for federated learning, FLex&Chill, which exploits the Logit Chilling method. Through extensive evaluations, we demonstrate that, in the presence of non-i.i.d. data characteristics inherent in federated learning systems, this approach can expedite model convergence and improve inference accuracy. Quantitatively, from our experiments, we observe up to 6X improvement in the global federated learning model convergence time, and up to 3.37% improvement in inference accuracy.
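The Logit Chilling idea rests on temperature scaling of the softmax: dividing logits by a temperature below 1 sharpens the output distribution. A minimal sketch follows; how the temperature is integrated into local training, and its value, are assumptions here rather than the paper's exact method:

```python
import math

def chilled_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over logits scaled by 1/temperature; temperatures below 1
    ('chilling') sharpen the distribution, above 1 flatten it."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

During local training, replacing the standard softmax with a chilled one increases the gradient signal on the currently predicted class, which is one plausible reading of why it can speed up convergence under non-i.i.d. data.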

Updated: 2024-06-26 10:16:46

Categories: cs.LG,cs.AI,68,I.2.11

Download: http://arxiv.org/abs/2401.09986v2

Neuromorphic Visual Scene Understanding with Resonator Networks

Analyzing a visual scene by inferring the configuration of a generative model is widely considered the most flexible and generalizable approach to scene understanding. Yet, one major problem is the computational challenge of the inference procedure, involving a combinatorial search across object identities and poses. Here we propose a neuromorphic solution exploiting three key concepts: (1) a computational framework based on Vector Symbolic Architectures (VSA) with complex-valued vectors; (2) the design of Hierarchical Resonator Networks (HRN) to factorize the non-commutative transforms translation and rotation in visual scenes; (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued resonator networks on neuromorphic hardware. The VSA framework uses vector binding operations to form a generative image model in which binding acts as the equivariant operation for geometric transformations. A scene can, therefore, be described as a sum of vector products, which can then be efficiently factorized by a resonator network to infer objects and their poses. The HRN features a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition and for rotation and scaling within the other partition. The spiking neuron model allows mapping the resonator network onto efficient and low-power neuromorphic hardware. Our approach is demonstrated on synthetic scenes composed of simple 2D shapes undergoing rigid geometric transformations and color changes. A companion paper demonstrates the same approach in real-world application scenarios for machine vision and robotics.

Updated: 2024-06-26 10:16:08

Categories: cs.CV,cs.AI,cs.NE,eess.IV,I.4.8

Download: http://arxiv.org/abs/2208.12880v4

A Single Graph Convolution Is All You Need: Efficient Grayscale Image Classification

Image classifiers for domain-specific tasks like Synthetic Aperture Radar Automatic Target Recognition (SAR ATR) and chest X-ray classification often rely on convolutional neural networks (CNNs). These networks, while powerful, experience high latency due to the number of operations they perform, which can be problematic in real-time applications. Many image classification models are designed to work with both RGB and grayscale datasets, but classifiers that operate solely on grayscale images are less common. Grayscale image classification has critical applications in fields such as medical imaging and SAR ATR. In response, we present a novel grayscale image classification approach using a vectorized view of images. By leveraging the lightweight nature of Multi-Layer Perceptrons (MLPs), we treat images as vectors, simplifying the problem to grayscale image classification. Our approach incorporates a single graph convolutional layer in a batch-wise manner, enhancing accuracy and reducing performance variance. Additionally, we develop a customized accelerator on FPGA for our model, incorporating several optimizations to improve performance. Experimental results on benchmark grayscale image datasets demonstrate the effectiveness of our approach, achieving significantly lower latency (up to 16x less on MSTAR) and competitive or superior performance compared to state-of-the-art models for SAR ATR and medical image classification.
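A single graph convolution over a batch of vectorized images can be sketched as H = Â X W, where each row of X is a flattened grayscale image, Â is a row-normalized adjacency with self-loops linking images within the batch, and W is a learned projection. The adjacency construction and normalization below are assumptions for illustration, not the paper's exact layer:

```python
def graph_conv_layer(X, A, W):
    """One graph convolution H = D^-1 (A + I) X W over a batch:
    X is n x f_in (one flattened image per row), A is an n x n
    batch adjacency, W is an f_in x f_out weight matrix."""
    n, f_in, f_out = len(X), len(X[0]), len(W[0])
    # add self-loops, then row-normalize the adjacency
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    for i in range(n):
        d = sum(A_hat[i])
        A_hat[i] = [v / d for v in A_hat[i]]
    # aggregate neighbors, then project: (A_hat @ X) @ W
    AX = [[sum(A_hat[i][k] * X[k][j] for k in range(n)) for j in range(f_in)]
          for i in range(n)]
    return [[sum(AX[i][k] * W[k][j] for k in range(f_in)) for j in range(f_out)]
            for i in range(n)]
```

Because there is only one such layer, the per-image cost is a single neighborhood average plus an MLP-style projection, which is consistent with the low-latency claim.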

Updated: 2024-06-26 10:08:51

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.00564v6

Enhancing Data Privacy in Large Language Models through Private Association Editing

Large Language Models (LLMs) are powerful tools with extensive applications, but their tendency to memorize private information raises significant concerns as private data leakage can easily happen. In this paper, we introduce Private Association Editing (PAE), a novel defense approach against private data leakage. PAE is designed to effectively remove Personally Identifiable Information (PII) without retraining the model. Our approach consists of a four-step procedure: detecting memorized PII, applying PAE cards to mitigate memorization of private data, verifying resilience to targeted data extraction (TDE) attacks, and ensuring consistency in the post-edit LLMs. The versatility and efficiency of PAE, which allows for batch modifications, significantly enhance data privacy in LLMs. Experimental results demonstrate the effectiveness of PAE in mitigating private data leakage. We believe PAE will serve as a critical tool in the ongoing effort to protect data privacy in LLMs, encouraging the development of safer models for real-world applications.

Updated: 2024-06-26 10:08:47

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18221v1

Guiding Video Prediction with Explicit Procedural Knowledge

We propose a general way to integrate procedural knowledge of a domain into deep learning models. We apply it to the case of video prediction, building on top of object-centric deep models and show that this leads to a better performance than using data-driven models alone. We develop an architecture that facilitates latent space disentanglement in order to use the integrated procedural knowledge, and establish a setup that allows the model to learn the procedural interface in the latent space using the downstream task of video prediction. We contrast the performance with a state-of-the-art data-driven approach and show that problems where purely data-driven approaches struggle can be handled by using knowledge about the domain, providing an alternative to simply collecting more data.

Updated: 2024-06-26 10:08:24

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18220v1

A Closer Look into Mixture-of-Experts in Large Language Models

Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks. By sparsely activating a subset of parameters for each token, MoE architecture could increase the model size without sacrificing computational efficiency, achieving a better trade-off between performance and training costs. However, the underlying mechanism of MoE still lacks further exploration, and its modularization degree remains questionable. In this paper, we make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models and reveal some intriguing observations, including (1) Neurons act like fine-grained experts. (2) The router of MoE usually selects experts with larger output norms. (3) The expert diversity increases as the layer increases, while the last layer is an outlier. Based on the observations, we also provide suggestions for a broad spectrum of MoE practitioners, such as router design and expert allocation. We hope this work could shed light on future research on the MoE framework and other modular architectures. Code is available at https://github.com/kamanphoebe/Look-into-MoEs.

Updated: 2024-06-26 10:07:57

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18219v1

MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions

The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness. The computer vision community established benchmarks such as ImageNet-C as a fundamental prerequisite to measure progress towards those challenges. Similar datasets are largely absent in the medical imaging community which lacks a comprehensive benchmark that spans across imaging modalities and applications. To address this gap, we create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities. We simulate task and modality-specific image corruptions of varying severity to comprehensively evaluate the robustness of established algorithms against real-world artifacts and distribution shifts. We further provide quantitative evidence that our simple-to-use artificial corruptions allow for highly performant, lightweight data augmentation to enhance model robustness. Unlike traditional, generic augmentation strategies, our approach leverages domain knowledge, exhibiting significantly higher robustness when compared to widely adopted methods. By introducing MedMNIST-C and open-sourcing the corresponding library allowing for targeted data augmentations, we contribute to the development of increasingly robust methods tailored to the challenges of medical imaging. The code is available at https://github.com/francescodisalvo05/medmnistc-api .
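As a rough illustration of severity-scaled corruption used both for benchmarking and as augmentation, here is a toy Gaussian-noise corruption on a flattened grayscale image. The severity-to-noise mapping is an assumed value; the benchmark's actual corruptions are task- and modality-specific:

```python
import random

def gaussian_noise_corruption(pixels, severity=1, seed=0):
    """Add zero-mean Gaussian noise whose strength grows with severity,
    clipping the result back to the valid [0, 1] intensity range."""
    rng = random.Random(seed)          # seeded for reproducible corruption
    sigma = 0.04 * severity            # assumed severity-to-noise mapping
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]
```

Evaluating a trained classifier at each severity level, or sampling a severity per training image as augmentation, follows the same pattern as ImageNet-C-style protocols.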

Updated: 2024-06-26 09:52:47

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.17536v2

AI Cards: Towards an Applied Framework for Machine-Readable AI and Risk Documentation Inspired by the EU AI Act

With the upcoming enforcement of the EU AI Act, documentation of high-risk AI systems and their risk management information will become a legal requirement playing a pivotal role in demonstration of compliance. Despite its importance, there is a lack of standards and guidelines to assist with drawing up AI and risk documentation aligned with the AI Act. This paper aims to address this gap by providing an in-depth analysis of the AI Act's provisions regarding technical documentation, wherein we particularly focus on AI risk management. On the basis of this analysis, we propose AI Cards as a novel holistic framework for representing a given intended use of an AI system by encompassing information regarding technical specifications, context of use, and risk management, both in human- and machine-readable formats. While the human-readable representation of AI Cards provides AI stakeholders with a transparent and comprehensible overview of the AI use case, its machine-readable specification leverages state-of-the-art Semantic Web technologies to embody the interoperability needed for exchanging documentation within the AI value chain. This brings the flexibility required for reflecting changes applied to the AI system and its context, provides the scalability needed to accommodate potential amendments to legal requirements, and enables development of automated tools to assist with legal compliance and conformity assessment tasks. To solidify the benefits, we provide an exemplar AI Card for an AI-based student proctoring system and further discuss its potential applications within and beyond the context of the AI Act.

Updated: 2024-06-26 09:51:49

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2406.18211v1

Case-Based or Rule-Based: How Do Transformers Do the Math?

Despite the impressive performance in a variety of complex tasks, modern large language models (LLMs) still have trouble dealing with some math problems that are simple and intuitive for humans, such as addition. While we can easily learn basic rules of addition and apply them to new problems of any length, LLMs struggle to do the same. Instead, they may rely on similar cases seen in the training corpus for help. We define these two different reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring systematic generalization ability, we aim to explore exactly whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers are performing case-based reasoning regardless of whether a scratchpad is used, which aligns with previous observations that transformers use subgraph matching/shortcut learning to reason. To mitigate such problems, we propose a Rule-Following Fine-Tuning (RFFT) technique to teach transformers to perform rule-based reasoning. Specifically, we provide explicit rules in the input and then instruct transformers to recite and follow the rules step by step. Through RFFT, we successfully enable LLMs fine-tuned on 1-5 digit addition to generalize to up to 12-digit addition with over 95% accuracy, which is over 40% higher than with a scratchpad. The significant improvement demonstrates that teaching LLMs to use rules explicitly helps them learn rule-based reasoning and generalize better to longer problems.
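The carry rule that RFFT asks the model to recite can be written out explicitly. This digit-by-digit routine is an illustrative analogue of the rule being followed, not the paper's prompt format:

```python
def rule_based_add(a: str, b: str) -> str:
    """Add two digit strings by explicitly applying the addition rule
    one digit at a time, right to left, propagating the carry."""
    a, b = a[::-1], b[::-1]        # process least-significant digits first
    carry, digits = 0, []
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        s = da + db + carry
        digits.append(str(s % 10))  # write the current digit
        carry = s // 10             # carry the overflow to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))
```

Because the rule refers only to the current digit pair and the carry, it generalizes to any length, which is exactly the systematic behavior that case-based retrieval fails to deliver.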

Updated: 2024-06-26 09:25:07

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.17709v2

MammothModa: Multi-Modal Large Language Model

In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating visual capabilities while maintaining complex language understanding: in addition to the vision encoder, we incorporate Visual Attention Experts into the LLM to enhance its visual capabilities. (ii) Extending the context window for high-resolution, long-duration visual features: we explore the Visual Merger Module to effectively reduce the token count of high-resolution images, and we incorporate frame position IDs to avoid position interpolation. (iii) High-quality bilingual datasets: we meticulously curated and filtered a high-quality bilingual multimodal dataset to reduce visual hallucinations. With the above recipe, we build MammothModa, which consistently outperforms state-of-the-art models, e.g., the LLaVA series, across the main real-world visual language benchmarks without bells and whistles.

Updated: 2024-06-26 09:17:27

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18193v1

Learning Antenna Pointing Correction in Operations: Efficient Calibration of a Black Box

We propose an efficient offline pointing calibration method for operational antenna systems which does not require any downtime. Our approach minimizes the calibration effort and exploits technical signal information which is typically used for monitoring and control purposes in ground station operations. Using a standard antenna interface and data from an operational satellite contact, we come up with a robust strategy for training data set generation. On top of this, we learn the parameters of a suitable coordinate transform by means of linear regression. In our experiments, we show the usefulness of the method in a real-world setup.
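The core estimation step described here, learning the parameters of a coordinate transform by linear regression, can be sketched with an ordinary least-squares fit. The features and the normal-equation solver below are illustrative assumptions, not the authors' actual setup:

```python
def fit_linear_transform(X, y):
    """Least-squares fit of y ≈ X @ w + b via the normal equations.

    X: list of feature rows (e.g. commanded az/el angles), y: observed
    pointing offsets. A stand-in for the linear-regression step the paper
    describes; the feature choice here is hypothetical.
    """
    A = [row + [1.0] for row in X]  # augment with a bias column
    n = len(A[0])
    ATA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
           for i in range(n)]
    ATy = [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(n)]
    for i in range(n):  # forward elimination with partial pivoting
        p = max(range(i, n), key=lambda r: abs(ATA[r][i]))
        ATA[i], ATA[p] = ATA[p], ATA[i]
        ATy[i], ATy[p] = ATy[p], ATy[i]
        for r in range(i + 1, n):
            f = ATA[r][i] / ATA[i][i]
            ATA[r] = [a - f * b for a, b in zip(ATA[r], ATA[i])]
            ATy[r] -= f * ATy[i]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (ATy[i] - sum(ATA[i][j] * w[j] for j in range(i + 1, n))) / ATA[i][i]
    return w  # last entry is the bias term

# Recover a known transform: offset = 0.5*az - 0.2*el + 0.1
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
y = [0.5 * az - 0.2 * el + 0.1 for az, el in X]
print([round(v, 6) for v in fit_linear_transform(X, y)])
```

In practice the training pairs would be generated from the technical monitoring signals recorded during an operational satellite contact, as the abstract describes.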

Updated: 2024-06-26 09:08:00

Categories: cs.LG

Download: http://arxiv.org/abs/2405.15247v2

Selective Prompting Tuning for Personalized Conversations with LLMs

In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate those issues, we propose \textbf{S}elective \textbf{P}rompt \textbf{T}uning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage the SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90\%, along with improvements in other critical performance indicators. Those results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code (https://github.com/hqsiswiliam/SPT) is publicly available for further exploration.
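The selective part of SPT can be illustrated in isolation: a retriever scores each soft prompt against an embedding of the input context, and the best-scoring prompt is attached to the LLM input. In SPT that retriever is a trainable dense model updated from LLM feedback; the fixed cosine scorer and toy vectors below are stand-ins:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_soft_prompt(context_emb, prompt_embs):
    """Return the index of the soft prompt best matching the context.

    In SPT this scoring function is a trainable dense retriever; the
    fixed cosine similarity here is only a stand-in for that learned scorer.
    """
    scores = [cosine(context_emb, p) for p in prompt_embs]
    return max(range(len(prompt_embs)), key=scores.__getitem__)

# Toy embeddings for three soft prompts; a context near the first axis
# selects the first prompt:
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(select_soft_prompt([0.9, 0.1, 0.0], prompts))  # → 0
```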

Updated: 2024-06-26 09:03:52

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18187v1

ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction

Click-through rate (CTR) prediction has become increasingly indispensable for various Internet applications. Traditional CTR models convert the multi-field categorical data into ID features via one-hot encoding, and extract the collaborative signals among features. Such a paradigm suffers from the problem of semantic information loss. Another line of research explores the potential of pretrained language models (PLMs) for CTR prediction by converting input data into textual sentences through hard prompt templates. Although semantic signals are preserved, they generally fail to capture the collaborative information (e.g., feature interactions, pure ID features), not to mention the unacceptable inference overhead brought by the huge model size. In this paper, we aim to model both the semantic knowledge and collaborative knowledge for accurate CTR estimation, and meanwhile address the inference inefficiency issue. To benefit from both worlds and close their gaps, we propose a novel model-agnostic framework (i.e., ClickPrompt), where we incorporate CTR models to generate interaction-aware soft prompts for PLMs. We design a prompt-augmented masked language modeling (PA-MLM) pretraining task, where PLM has to recover the masked tokens based on the language context, as well as the soft prompts generated by CTR model. The collaborative and semantic knowledge from ID and textual features would be explicitly aligned and interacted via the prompt interface. Then, we can either tune the CTR model with PLM for superior performance, or solely tune the CTR model without PLM for inference efficiency. Experiments on four real-world datasets validate the effectiveness of ClickPrompt compared with existing baselines.

Updated: 2024-06-26 08:59:47

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2310.09234v5

ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

With large language models (LLMs) achieving remarkable breakthroughs in natural language processing (NLP), LLM-enhanced recommender systems have received much attention and are being actively explored. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains: LLMs fail to extract useful information from a textual context of a long user behavior sequence, even when the context length is far from reaching the context limit of the LLM. To address this issue and improve the recommendation performance of LLMs, we propose a novel framework, Retrieval-enhanced Large Language models (ReLLa), for recommendation tasks in both zero-shot and few-shot settings. For zero-shot recommendation, we perform semantic user behavior retrieval (SUBR) to improve the data quality of testing samples, which greatly reduces the difficulty for LLMs to extract essential knowledge from user behavior sequences. For few-shot recommendation, we further design retrieval-enhanced instruction tuning (ReiT) by adopting SUBR as a data augmentation technique for training samples. Specifically, we develop a mixed training dataset consisting of both the original data samples and their retrieval-enhanced counterparts. We conduct extensive experiments on three real-world public datasets to demonstrate the superiority of ReLLa over existing baseline models, as well as its capability for lifelong sequential behavior comprehension. Notably, with less than 10% of the training samples, few-shot ReLLa can outperform traditional CTR models trained on the entire training set (e.g., DCNv2, DIN, SIM). The code is available at \url{https://github.com/LaVieEnRose365/ReLLa}.

Updated: 2024-06-26 08:55:55

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2308.11131v6

ECGrecover: a Deep Learning Approach for Electrocardiogram Signal Completion

In this work, we address the challenge of reconstructing the complete 12-lead ECG signal from incomplete parts of it. We focus on two main scenarios: (i) reconstructing missing signal segments within an ECG lead and (ii) recovering missing leads from a single lead. We propose a model with a U-Net architecture trained on a novel objective function to address the reconstruction problem. This function incorporates both spatial and temporal aspects of the ECG by combining the distance in amplitude between the reconstructed and real signals with the signal trend. Through comprehensive assessments using both a real-life dataset and a publicly accessible one, we demonstrate that the proposed approach consistently outperforms state-of-the-art methods based on generative adversarial networks and a CopyPaste strategy. Our proposed model demonstrates superior performance in standard distortion metrics and preserves critical ECG characteristics, particularly the P, Q, R, S, and T wave coordinates. Two emerging clinical applications emphasize the relevance of our work. The first is the increasing need to digitize paper-stored ECGs for use in AI-based applications (automatic annotation and risk quantification), which are often limited to complete 10 s digital ECG recordings. The second is the widespread use of wearable devices that record ECGs but typically capture only a small subset of the 12 standard leads. In both cases, a non-negligible amount of information is lost or not recorded, which our approach aims to recover to overcome these limitations.
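The objective function combines an amplitude distance between reconstructed and real signals with the signal trend. One plausible pure-Python reading, comparing first differences to capture trend, is sketched below; the exact formulation and the weighting `alpha` are our assumptions, not the paper's:

```python
def ecg_loss(pred, target, alpha=0.5):
    """Amplitude distance plus a trend term over a single lead.

    Trend is compared via first differences, so a reconstruction whose
    slope disagrees with the true signal is penalized even when its
    pointwise amplitudes are close. alpha and this exact combination are
    illustrative assumptions about the paper's objective.
    """
    n = len(pred)
    amp = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    dp = [pred[i + 1] - pred[i] for i in range(n - 1)]
    dt = [target[i + 1] - target[i] for i in range(n - 1)]
    trend = sum((a - b) ** 2 for a, b in zip(dp, dt)) / (n - 1)
    return amp + alpha * trend

# A flat reconstruction of a rising signal is penalized on both terms:
print(ecg_loss([0.0, 0.0, 0.0], [0.0, 1.0, 2.0]))
```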

Updated: 2024-06-26 08:54:40

Categories: eess.SP,cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.16901v2

DeepExtremeCubes: Integrating Earth system spatio-temporal data for impact assessment of climate extremes

With climate extremes' rising frequency and intensity, robust analytical tools are crucial to predict their impacts on terrestrial ecosystems. Machine learning techniques show promise but require well-structured, high-quality, and curated analysis-ready datasets. Earth observation datasets comprehensively monitor ecosystem dynamics and responses to climatic extremes, yet the data complexity can challenge the effectiveness of machine learning models. Despite recent progress in deep learning to ecosystem monitoring, there is a need for datasets specifically designed to analyse compound heatwave and drought extreme impact. Here, we introduce the DeepExtremeCubes database, tailored to map around these extremes, focusing on persistent natural vegetation. It comprises over 40,000 spatially sampled small data cubes (i.e. minicubes) globally, with a spatial coverage of 2.5 by 2.5 km. Each minicube includes (i) Sentinel-2 L2A images, (ii) ERA5-Land variables and generated extreme event cube covering 2016 to 2022, and (iii) ancillary land cover and topography maps. The paper aims to (1) streamline data accessibility, structuring, pre-processing, and enhance scientific reproducibility, and (2) facilitate biosphere dynamics forecasting in response to compound extremes.

Updated: 2024-06-26 08:53:26

Categories: cs.LG,cs.DB

Download: http://arxiv.org/abs/2406.18179v1

Galaxy spectroscopy without spectra: Galaxy properties from photometric images with conditional diffusion models

Modern spectroscopic surveys can only target a small fraction of the vast amount of photometrically cataloged sources in wide-field surveys. Here, we report the development of a generative AI method capable of predicting optical galaxy spectra from photometric broad-band images alone. This method draws from the latest advances in diffusion models in combination with contrastive networks. We pass multi-band galaxy images into the architecture to obtain optical spectra. From these, robust values for galaxy properties can be derived with any methods in the spectroscopic toolbox, such as standard population synthesis techniques and Lick indices. When trained and tested on 64x64-pixel images from the Sloan Digital Sky Survey, the global bimodality of star-forming and quiescent galaxies in photometric space is recovered, as well as a mass-metallicity relation of star-forming galaxies. The comparison between the observed and the artificially created spectra shows good agreement in overall metallicity, age, Dn4000, stellar velocity dispersion, and E(B-V) values. Photometric redshift estimates of our generative algorithm can compete with other current, specialized deep-learning techniques. Moreover, this work is the first attempt in the literature to infer velocity dispersion from photometric images. Additionally, we can predict the presence of an active galactic nucleus up to an accuracy of 82%. With our method, scientifically interesting galaxy properties, normally requiring spectroscopic inputs, can be obtained in future data sets from large-scale photometric surveys alone. The spectra prediction via AI can further assist in creating realistic mock catalogs.

Updated: 2024-06-26 08:49:51

Categories: astro-ph.GA,astro-ph.IM,cs.AI

Download: http://arxiv.org/abs/2406.18175v1

Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that the synthetic datasets undergo an inductive bias towards the distillation model. Therefore, the evaluation model is strictly confined to having similar architectures of the distillation model. We propose a novel method of EvaLuation with distillation Feature (ELF), which utilizes features from intermediate layers of the distillation model for the cross-architecture evaluation. In this manner, the evaluation model learns from bias-free knowledge therefore its architecture becomes unfettered while retaining performance. By performing extensive experiments, we successfully prove that ELF can well enhance the cross-architecture generalization of current DD methods. Code of this project is at \url{https://github.com/Lirui-Zhao/ELF}.

Updated: 2024-06-26 08:43:43

Categories: cs.LG

Download: http://arxiv.org/abs/2312.05598v2

Towards Deep Active Learning in Avian Bioacoustics

Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which requires labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.

Updated: 2024-06-26 08:43:05

Categories: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2406.18621v1

Super Tiny Language Models

The rapid advancement of large language models (LLMs) has led to significant improvements in natural language processing but also poses challenges due to their high computational and energy demands. This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs), which aim to deliver high performance with significantly reduced parameter counts. We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies. These methods aim to significantly reduce the parameter count compared to traditional models; in future work, we aim to build on these in a way that maintains and improves upon the performance of base transformer models. This series of papers will explore various subproblems, including tokenizer-free models, self-play-based training, and alternative training objectives. We will target models with 10M, 50M, and 100M parameters. Our ultimate goal is to make high-performance language models more accessible and practical for a wide range of applications.

Updated: 2024-06-26 08:41:06

Categories: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2405.14159v2

Documentation Practices of Artificial Intelligence

Artificial Intelligence (AI) faces persistent challenges in terms of transparency and accountability, which requires rigorous documentation. Through a literature review on documentation practices, we provide an overview of prevailing trends, persistent issues, and the multifaceted interplay of factors influencing the documentation. Our examination of key characteristics such as scope, target audiences, support for multimodality, and level of automation, highlights a dynamic evolution in documentation practices, underscored by a shift towards a more holistic, engaging, and automated documentation.

Updated: 2024-06-26 08:33:52

Categories: cs.DL,cs.AI

Download: http://arxiv.org/abs/2406.18620v1

Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion

Knowledge graph (KG) completion aims to find out missing triples in a KG. Some tasks, such as link prediction and instance completion, have been proposed for KG completion. They are triple-level tasks with some elements in a missing triple given to predict the missing element of the triple. However, knowing some elements of the missing triple in advance is not always a realistic setting. In this paper, we propose a novel graph-level automatic KG completion task called Triple Set Prediction (TSP) which assumes none of the elements in the missing triples is given. TSP is to predict a set of missing triples given a set of known triples. To properly and accurately evaluate this new task, we propose 4 evaluation metrics including 3 classification metrics and 1 ranking metric, considering both the partial-open-world and the closed-world assumptions. Furthermore, to tackle the huge candidate triples for prediction, we propose a novel and efficient subgraph-based method GPHT that can predict the triple set fast. To fairly compare the TSP results, we also propose two types of methods RuleTensor-TSP and KGE-TSP applying the existing rule- and embedding-based methods for TSP as baselines. During experiments, we evaluate the proposed methods on two datasets extracted from Wikidata following the relation-similarity partial-open-world assumption proposed by us, and also create a complete family data set to evaluate TSP results following the closed-world assumption. Results prove that the methods can successfully generate a set of missing triples and achieve reasonable scores on the new task, and GPHT performs better than the baselines with significantly shorter prediction time. The datasets and code for experiments are available at https://github.com/zjukg/GPHT-for-TSP.
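Under the closed-world assumption, the classification-style evaluation of Triple Set Prediction reduces to set comparison between the predicted and ground-truth missing triples. A minimal sketch (the partial-open-world variants and the ranking metric require additional machinery not shown here, and the example triples are hypothetical):

```python
def triple_set_scores(predicted, ground_truth):
    """Closed-world precision/recall/F1 over sets of (s, p, o) triples.

    A sketch of the classification-style metrics the TSP task calls for;
    the paper's relation-similarity partial-open-world variants are not
    modeled here.
    """
    pred, gold = set(predicted), set(ground_truth)
    tp = len(pred & gold)  # predicted triples that are truly missing
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical family-KG example: one of two predictions is correct.
gold = {("alice", "motherOf", "bob"), ("bob", "siblingOf", "carol")}
pred = {("alice", "motherOf", "bob"), ("alice", "motherOf", "carol")}
print(triple_set_scores(pred, gold))  # → (0.5, 0.5, 0.5)
```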

Updated: 2024-06-26 08:26:32

Categories: cs.AI

Download: http://arxiv.org/abs/2406.18166v1

Fast Learnings of Coupled Nonnegative Tensor Decomposition Using Optimal Gradient and Low-rank Approximation

Tensor decomposition is a fundamental technique widely applied in signal processing, machine learning, and various other fields. However, traditional tensor decomposition methods encounter limitations when jointly analyzing multi-block tensors, as they often struggle to effectively explore shared information among tensors. In this study, we first introduce a novel coupled nonnegative CANDECOMP/PARAFAC decomposition algorithm optimized by the alternating proximal gradient method (CoNCPD-APG). This algorithm is specially designed to address the challenges of jointly decomposing different tensors that are partially or fully linked, while simultaneously extracting common components, individual components, and core tensors. Recognizing the computational challenges inherent in optimizing nonnegative constraints over high-dimensional tensor data, we further propose the lraCoNCPD-APG algorithm. By integrating low-rank approximation with the proposed CoNCPD-APG method, the proposed algorithm can significantly decrease the computational burden without compromising decomposition quality, particularly for multi-block large-scale tensors. Simulation experiments conducted on synthetic data, real-world face image data, and two kinds of electroencephalography (EEG) data demonstrate the practicality and superiority of the proposed algorithms for coupled nonnegative tensor decomposition problems. Our results underscore the efficacy of our methods in uncovering meaningful patterns and structures from complex multi-block tensor data, thereby offering valuable insights for future applications.

Updated: 2024-06-26 08:25:49

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2302.05119v2

NeBuLa: A discourse aware Minecraft Builder

When engaging in collaborative tasks, humans efficiently exploit the semantic structure of a conversation to optimize verbal and nonverbal interactions. But in recent "language to code" or "language to action" models, this information is lacking. We show how incorporating the prior discourse and nonlinguistic context of a conversation situated in a nonlinguistic environment can improve the "language to action" component of such interactions. We fine-tune an LLM to predict actions based on prior context; our model, NeBuLa, doubles the net-action F1 score over the baseline on the task of Jayannavar et al. (2020). We also investigate our model's ability to construct shapes and understand location descriptions using a synthetic dataset.

Updated: 2024-06-26 08:24:44

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2406.18164v1

Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

Recovering reward function from expert demonstrations is a fundamental problem in reinforcement learning. The recovered reward function captures the motivation of the expert. Agents can imitate experts by following these reward functions in their environment, which is known as apprentice learning. However, the agents may face environments different from the demonstrations, and therefore, desire transferable reward functions. Classical reward learning methods such as inverse reinforcement learning (IRL) or, equivalently, adversarial imitation learning (AIL), recover reward functions coupled with training dynamics, which are hard to be transferable. Previous dynamics-agnostic reward learning methods rely on assumptions such as that the reward function has to be state-only, restricting their applicability. In this work, we present a dynamics-agnostic discriminator-ensemble reward learning method (DARL) within the AIL framework, capable of learning both state-action and state-only reward functions. DARL achieves this by decoupling the reward function from training dynamics, employing a dynamics-agnostic discriminator on a latent space derived from the original state-action space. This latent space is optimized to minimize information on the dynamics. We moreover discover the policy-dependency issue of the AIL framework that reduces the transferability. DARL represents the reward function as an ensemble of discriminators during training to eliminate policy dependencies. Empirical studies on MuJoCo tasks with changed dynamics show that DARL better recovers the reward function and results in better imitation performance in transferred environments, handling both state-only and state-action reward scenarios.

Updated: 2024-06-26 08:24:26

Categories: cs.LG

Download: http://arxiv.org/abs/2206.00238v2

Efficient Data Learning for Open Information Extraction with Pre-trained Language Models

Open Information Extraction (OpenIE) is a fundamental yet challenging task in Natural Language Processing, which involves extracting all triples (subject, predicate, object) from a given sentence. While labeling-based methods have their merits, generation-based techniques offer unique advantages, such as the ability to generate tokens not present in the original sentence. However, these generation-based methods often require a significant amount of training data to learn the task form of OpenIE and substantial training time to overcome slow model convergence due to the order penalty. In this paper, we introduce a novel framework, OK-IE, that ingeniously transforms the task form of OpenIE into the pre-training task form of the T5 model, thereby reducing the need for extensive training data. Furthermore, we introduce an innovative concept of Anchor to control the sequence of model outputs, effectively eliminating the impact of order penalty on model convergence and significantly reducing training time. Experimental results indicate that, compared to previous SOTA methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.

Updated: 2024-06-26 08:23:10

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2310.15021v2

FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization

Federated learning (FL) is a powerful machine learning paradigm which leverages the data as well as the computational resources of clients, while protecting clients' data privacy. However, the substantial model size and frequent aggregation between the server and clients result in significant communication overhead, making it challenging to deploy FL in resource-limited wireless networks. In this work, we aim to mitigate the communication overhead by using quantization. Previous research on quantization has primarily focused on the uplink communication, employing either fixed-bit quantization or adaptive quantization methods. In this work, we introduce a holistic approach of joint uplink and downlink adaptive quantization to reduce the communication overhead. In particular, we optimize the learning convergence by determining the optimal uplink and downlink quantization bit-lengths under a communication energy constraint. Theoretical analysis shows that the optimal quantization levels depend on the range of the model gradients or weights. Based on this insight, we propose a decreasing-trend quantization for the uplink and an increasing-trend quantization for the downlink, which aligns with the change of the model parameters during the training process. Experimental results show that the proposed joint uplink and downlink adaptive quantization strategy can save up to 66.7% energy compared with the existing schemes.
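The decreasing-uplink / increasing-downlink idea can be sketched with a plain uniform quantizer plus a linear bit schedule. This is a hypothetical sketch: FedAQ's actual quantizer, schedule shape, and energy-constrained optimization are more involved, and the function names here are illustrative.

```python
import numpy as np

def uniform_quantize(v, bits):
    """Quantize v onto 2**bits levels spanning its own range, then
    dequantize; round-trip error shrinks as the bit-length grows."""
    levels = 2 ** bits - 1
    lo, hi = float(v.min()), float(v.max())
    if hi == lo:                      # constant vector: nothing to quantize
        return v.astype(float).copy()
    q = np.round((v - lo) / (hi - lo) * levels)
    return lo + q * (hi - lo) / levels

def bit_schedule(rnd, total, start_bits, end_bits):
    """Linear bit-length schedule over training rounds. With
    start_bits > end_bits this gives the decreasing uplink trend
    (gradient ranges shrink as training converges); swapping the
    endpoints gives the increasing downlink trend for model weights."""
    frac = rnd / max(total - 1, 1)
    return int(round(start_bits + frac * (end_bits - start_bits)))
```

In a round, each client would quantize its update with `bit_schedule(rnd, total, 8, 2)` bits on the uplink while the server broadcasts with `bit_schedule(rnd, total, 2, 8)` bits on the downlink.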

Updated: 2024-06-26 08:14:23

Domains: cs.LG,cs.DC,cs.NI,eess.SP

Download: http://arxiv.org/abs/2406.18156v1

Toward Availability Attacks in 3D Point Clouds

Despite the great progress of 3D vision, data privacy and security issues in 3D deep learning are not explored systematically. In the domain of 2D images, many availability attacks have been proposed to prevent data from being illicitly learned by unauthorized deep models. However, unlike images represented on a fixed dimensional grid, point clouds are characterized as unordered and unstructured sets, posing a significant challenge in designing an effective availability attack for 3D deep learning. In this paper, we theoretically show that extending 2D availability attacks directly to 3D point clouds under distance regularization is susceptible to degeneracy, rendering the generated poisons weaker or even ineffective. This is because in bi-level optimization, introducing a regularization term can result in update directions going out of control. To address this issue, we propose a novel Feature Collision Error-Minimization (FC-EM) method, which creates additional shortcuts in the feature space, inducing different update directions to prevent the degeneracy of bi-level optimization. Moreover, we provide a theoretical analysis that demonstrates the effectiveness of the FC-EM attack. Extensive experiments on typical point cloud datasets, a 3D intracranial aneurysm medical dataset, and a 3D face dataset verify the superiority and practicality of our approach. Code is available at https://github.com/hala64/fc-em.

Updated: 2024-06-26 08:13:30

Domains: cs.CR,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.11011v1

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first measures the importance of a layer based on how different its output embeddings are from the input embeddings (the higher the better); the second estimates the importance of a layer using the number of layer weights that are much larger than average (the smaller the better). We show that quantizing different layers at varying bits according to our importance scores results in minimal performance drop with a far more compressed model size. Finally, we present several practical key takeaways from our variable layer-wise quantization experiments: (a) LLM performance under variable quantization remains close to the original model until 25-50% of layers are moved to lower quantization using our proposed ordering, but only until 5-10% if moved without a specific ordering; (b) Quantizing LLMs to lower bits performs substantially better than pruning unless extreme quantization (2-bit) is used; and (c) Layer-wise quantization to lower bits works better in the case of larger LLMs with more layers compared to smaller LLMs with fewer layers. The code used to run the experiments is available at: https://github.com/RazvanDu/LayerwiseQuant.
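The two layer-importance heuristics can be sketched as simple scoring functions. Both the cosine-distance choice and the negated-outlier-fraction reading of "(the smaller the better)" are assumptions of this sketch; the paper's exact metrics may differ.

```python
import numpy as np

def importance_by_transformation(x_in, x_out):
    """First heuristic: a layer whose output embedding differs more from
    its input embedding is doing more work, hence more important and
    deserving of higher bit precision. Cosine distance is one plausible
    measure of that difference."""
    cos = np.dot(x_in, x_out) / (np.linalg.norm(x_in) * np.linalg.norm(x_out))
    return 1.0 - cos

def importance_by_outliers(w, k=3.0):
    """Second heuristic: count weights much larger than the average
    magnitude. Under one reading of the abstract, fewer such outliers
    means a more important layer, so the outlier fraction is negated to
    make 'higher score = more important' for both heuristics."""
    mags = np.abs(np.asarray(w)).ravel()
    return -float(np.mean(mags > k * mags.mean()))
```

Ranking all layers by either score and pushing the lowest-scoring ones to fewer bits reproduces the ordering experiment described in takeaway (a).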

Updated: 2024-06-26 08:00:18

Domains: cs.CL,cs.AI,cs.LG,I.2.7; I.2.0

Download: http://arxiv.org/abs/2406.17415v2

Benchmarking General-Purpose In-Context Learning

In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly, without relying on any artificially crafted optimization techniques. In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential, namely General-Purpose In-Context Learning (GPICL). To this end, we introduce two lightweight benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark encompasses a vast number of tasks characterized by significant task variance, facilitating meta-training that minimizes inductive bias. These tasks are also crafted to promote long-horizon in-context learning through continuous generation and interaction. These characteristics necessitate the models to leverage contexts and history interactions to enhance their capabilities, across domains such as language modeling, decision-making, and world modeling. Our experiments on the baseline models demonstrate that meta-training with minimal inductive bias and ICL from the ground up is feasible across all the domains we've discussed. Additionally, our findings indicate that the scale of parameters alone may not be crucial for ICL or GPICL, suggesting alternative approaches such as increasing the scale of contexts and memory states.

Updated: 2024-06-26 07:59:40

Domains: cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.17234v5

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between the multi-modal performance with the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

Updated: 2024-06-26 07:59:03

Domains: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.05935v2

Beyond Statistical Estimation: Differentially Private Individual Computation in the Shuffle Model

The shuffle model of differential privacy (DP) has recently emerged as a powerful model for decentralized computation without fully trustable parties. Since it anonymizes and permutes messages from clients through a shuffler, privacy can be amplified and utility can be improved. However, the shuffling procedure in turn restricts its applications to statistical tasks that are permutation-invariant. This work explores the feasibility of shuffle privacy amplification for prevalent non-statistical computations: spatial crowdsourcing, combinatorial optimization, location-based social systems, and federated learning with incentives, which suffer either computational intractability or intolerable utility loss under existing approaches (e.g., secure MPC and local DP). We propose a new paradigm of the shuffle model that can provide critical security functionalities like message authorization and result access control, while maintaining most of the privacy amplification effect. It incurs almost the same computation/communication costs as the non-private setting, and permits the server to run arbitrary algorithms on (noisy) client information in plaintext. Our key technique is to introduce statistically random identities into DP and enforce an identical random distribution on all clients, so as to support secure functionalities even after message shuffling while simultaneously maintaining privacy amplification. Given that existing DP randomizers fail in the new shuffle model, we also propose a new mechanism and prove its optimality therein. Experimental results on spatial crowdsourcing, a location-based social system, and federated learning with incentives show that our paradigm and mechanism are as fast as the non-private setting, while reducing error by up to 90% and relatively improving utility by 100%-300%, and can be practical under a reasonable privacy budget.

Updated: 2024-06-26 07:53:48

Domains: cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.18145v1

Innovating for Tomorrow: The Convergence of SE and Green AI

The latest advancements in machine learning, specifically in foundation models, are revolutionizing the frontiers of existing software engineering (SE) processes. This is a bi-directional phenomenon, where 1) software systems are now challenged to provide AI-enabled features to their users, and 2) AI is used to automate tasks within the software development lifecycle. In an era where sustainability is a pressing societal concern, our community needs to adopt a long-term plan enabling a conscious transformation that aligns with environmental sustainability values. In this paper, we reflect on the impact of adopting environmentally friendly practices to create AI-enabled software systems and make considerations on the environmental impact of using foundation models for software development.

Updated: 2024-06-26 07:47:04

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.18142v1

Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on the transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. DiaTrans enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data with our DiaTrans model holds considerable promise for uncovering novel peptides and more comprehensive profiling of biological samples. DiaTrans is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.

Updated: 2024-06-26 07:45:33

Domains: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2402.11363v3

CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent

Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. But their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses coordinate descent to minimize the layer-wise reconstruction loss to achieve high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. Through extensive evaluation on the PaLM2 model family, we demonstrate that CDQuant consistently outperforms GPTQ across diverse model sizes and quantization levels. In particular, for INT2 quantization of PaLM2-Otter, CDQuant achieves a 10% reduction in perplexity compared to GPTQ.
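The core idea of minimizing a layer-wise reconstruction loss by coordinate descent can be illustrated on a single weight column. This is a toy sketch under assumed notation (loss ||Xw - Xq||^2, a fixed scalar grid of quantization levels), not the paper's full CDQuant algorithm.

```python
import numpy as np

def cd_quantize_column(w, X, grid, sweeps=5):
    """Greedy coordinate descent: start from nearest-grid rounding, then
    repeatedly re-pick each coordinate's grid value to minimize the
    reconstruction loss ||X @ w - X @ q||^2 with all other coordinates
    held fixed. Each update can only decrease the loss, so the result is
    never worse than plain rounding."""
    q = grid[np.argmin(np.abs(w[:, None] - grid[None, :]), axis=1)]
    target = X @ w
    for _ in range(sweeps):
        for i in range(len(w)):
            r = target - X @ q + X[:, i] * q[i]   # residual excluding coord i
            errs = [np.sum((r - X[:, i] * g) ** 2) for g in grid]
            q[i] = grid[int(np.argmin(errs))]
    return q
```

Because the loss couples coordinates through X, this beats per-weight rounding whenever the calibration activations are correlated, which is exactly the regime GPTQ-style methods target.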

Updated: 2024-06-26 07:44:42

Domains: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.17542v2

Exclusive Style Removal for Cross Domain Novel Class Discovery

As a promising field in open-world learning, \textit{Novel Class Discovery} (NCD) is usually a task to cluster unseen novel classes in an unlabeled set based on the prior knowledge of labeled data within the same domain. However, the performance of existing NCD methods could be severely compromised when novel classes are sampled from a different distribution with the labeled ones. In this paper, we explore and establish the solvability of NCD in cross domain setting with the necessary condition that style information must be removed. Based on the theoretical analysis, we introduce an exclusive style removal module for extracting style information that is distinctive from the baseline features, thereby facilitating inference. Moreover, this module is easy to integrate with other NCD methods, acting as a plug-in to improve performance on novel classes with different distributions compared to the seen labeled set. Additionally, recognizing the non-negligible influence of different backbones and pre-training strategies on the performance of the NCD methods, we build a fair benchmark for future NCD research. Extensive experiments on three common datasets demonstrate the effectiveness of our proposed module.

Updated: 2024-06-26 07:44:27

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18140v1

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation

Since the natural language processing (NLP) community started to make large language models (LLMs) act as a critic to evaluate the quality of generated texts, most of the existing works train a critique generation model on the evaluation data labeled by GPT-4's direct prompting. We observe that these models lack the ability to generate informative critiques in both pointwise grading and pairwise comparison especially without references. As a result, their generated critiques cannot provide fine-grained distinguishability on generated texts, causing unsatisfactory evaluation performance. In this paper, we propose a simple yet effective method called Eval-Instruct, which can first acquire pointwise grading critiques with pseudo references and then revise these critiques via multi-path prompting to obtain informative evaluation data in different tasks and settings, including pointwise grading and pairwise comparison with / without references. After fine-tuning on these data, the resulting model CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines and even achieve comparable evaluation performance to GPT-4 in system-level correlations of pointwise grading. We also demonstrate that our generated critiques can act as scalable feedback to further improve the generation quality of strong LLMs like ChatGPT.

Updated: 2024-06-26 07:44:11

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2311.18702v2

FaithLM: Towards Faithful Explanations for Large Language Models

Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their extensive internal knowledge and reasoning capabilities. However, the black-box nature of these models complicates the task of explaining their decision-making processes. While recent advancements demonstrate the potential of leveraging LLMs to self-explain their predictions through natural language (NL) explanations, their explanations may not accurately reflect the LLMs' decision-making process due to a lack of fidelity optimization on the derived explanations. Measuring the fidelity of NL explanations is a challenging issue, as it is difficult to manipulate the input context to mask the semantics of these explanations. To this end, we introduce FaithLM to explain the decision of LLMs with NL explanations. Specifically, FaithLM designs a method for evaluating the fidelity of NL explanations by incorporating the contrary explanations to the query process. Moreover, FaithLM conducts an iterative process to improve the fidelity of derived explanations. Experiment results on three datasets from multiple domains demonstrate that FaithLM can significantly improve the fidelity of derived explanations, which also provides a better alignment with the ground-truth explanations.

Updated: 2024-06-26 07:43:11

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.04678v3

Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Generalization theory has been established for sparse deep neural networks under high-dimensional regime. Beyond generalization, parameter estimation is also important since it is crucial for variable selection and interpretability of deep neural networks. Current theoretical studies concerning parameter estimation mainly focus on two-layer neural networks, which is due to the fact that the convergence of parameter estimation heavily relies on the regularity of the Hessian matrix, while the Hessian matrix of deep neural networks is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose to conduct nonparametric estimation of partial derivatives with respect to inputs. We first show that model convergence of sparse deep neural networks is guaranteed in that the sample complexity only grows with the logarithm of the number of parameters or the input dimension when the $\ell_{1}$-norm of parameters is well constrained. Then by bounding the norm and the divergence of partial derivatives, we establish that the convergence rate of nonparametric estimation of partial derivatives scales as $\mathcal{O}(n^{-1/4})$, a rate which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study combines nonparametric estimation and parametric sparse deep neural networks for the first time. As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results show the promising future for the interpretability of deep neural networks.
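Nonparametric estimation of partial derivatives with respect to inputs can be sketched concretely. Here a central-difference estimator stands in for the paper's estimator, and `f` stands in for a fitted sparse network; the variable-selection reading (inputs with near-zero partials everywhere are candidates to drop) follows the abstract's motivation.

```python
import numpy as np

def input_partials(f, x, eps=1e-5):
    """Central-difference estimate of the partial derivative of f with
    respect to each input coordinate at x. Inputs whose estimated
    partials vanish across the input space do not influence f, which is
    the intuition behind nonlinear variable selection."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        grad[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad
```

For a network with an unidentifiable parameterization, these input-side derivatives remain well-defined even though the weights themselves are not, which is precisely why the paper estimates them instead of the parameters.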

Updated: 2024-06-26 07:41:41

Domains: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.18137v1

$\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on genetic programming (GP), which is known to suffer from the problems of being sensitive to the initial population, converging to local optima, and slow computation speed. Recent efforts employing deep reinforcement learning (DRL) for alpha discovery have not fully addressed key practical considerations such as alpha correlations and validity, which are crucial for their effectiveness. In this work, we propose a novel framework for alpha discovery using DRL by formulating the alpha discovery process as program construction. Our agent, $\text{Alpha}^2$, assembles an alpha program optimized for an evaluation metric. A search algorithm guided by DRL navigates through the search space based on value estimates for potential alpha outcomes. The evaluation metric encourages both the performance and the diversity of alphas for a better final trading strategy. Our formulation of searching alphas also brings the advantage of pre-calculation dimensional analysis, ensuring the logical soundness of alphas, and pruning the vast search space to a large extent. Empirical experiments on real-world stock markets demonstrate $\text{Alpha}^2$'s capability to identify a diverse set of logical and effective alphas, which significantly improves the performance of the final trading strategy. The code of our method is available at https://github.com/x35f/alpha2.
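To make "formulaic alpha as a program" concrete, the sketch below composes two typical time-series operators into a small alpha. The operators and the composed formula are illustrative stand-ins, not alphas the paper's agent actually discovers.

```python
import numpy as np

def ts_delta(x, d):
    """'Change over the last d steps', a typical formulaic-alpha operator;
    the first d entries have no lookback and stay NaN."""
    out = np.full(x.shape, np.nan)
    out[d:] = x[d:] - x[:-d]
    return out

def ts_sign(x):
    """Map a raw signal to {-1, 0, +1}; a stand-in for the ranking or
    normalization usually applied before trading on an alpha."""
    return np.sign(x)

# An illustrative composed alpha: the direction of 5-step momentum.
close = np.array([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 15.0])
signal = ts_sign(ts_delta(close, 5))
```

An $\text{Alpha}^2$-style agent would build such expressions operator by operator, with dimensional analysis rejecting ill-typed combinations (e.g. adding a price to a volume) before any backtest is run.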

Updated: 2024-06-26 07:40:12

Domains: q-fin.CP,cs.AI

Download: http://arxiv.org/abs/2406.16505v2

Implications of the AI Act for Non-Discrimination Law and Algorithmic Fairness

The topic of fairness in AI, as debated in the FATE (Fairness, Accountability, Transparency, and Ethics in AI) communities, has sparked meaningful discussions in the past years. However, from a legal perspective, particularly from the perspective of European Union law, many open questions remain. Whereas algorithmic fairness aims to mitigate structural inequalities at design-level, European non-discrimination law is tailored to individual cases of discrimination after an AI model has been deployed. The AI Act might present a tremendous step towards bridging these two approaches by shifting non-discrimination responsibilities into the design stage of AI models. Based on an integrative reading of the AI Act, we comment on legal as well as technical enforcement problems and propose practical implications on bias detection and bias correction in order to specify and comply with specific technical requirements.

Updated: 2024-06-26 07:35:30

Domains: cs.AI

Download: http://arxiv.org/abs/2403.20089v2

Sequential Disentanglement by Extracting Static Information From A Single Sequence Element

One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed to a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the dynamic vectors encode both static and dynamic information, or vice versa, leading to a non-disentangled representation. Attempts to alleviate this problem via reducing the dynamic dimension and auxiliary loss terms gain only partial success. Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. Remarkably, the resulting variational framework is simpler in terms of required loss terms, hyperparameters, and data augmentation. We evaluate our method on multiple data-modality benchmarks including general time series, video, and audio, and we show beyond state-of-the-art results on generation and prediction tasks in comparison to several strong baselines.
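The subtraction inductive bias can be shown schematically on precomputed latent codes: the static factor is taken from a single sequence element, and each element's dynamic factor is what remains after subtracting it. A toy illustration of the bias only; the paper's model is variational and learns the encoder.

```python
import numpy as np

def subtract_disentangle(codes):
    """Schematic subtraction bias: condition the static code on ONE
    sequence element, then let each element's dynamic code be its
    encoding minus that static part, so content shared across the
    sequence cancels out of the dynamic factors."""
    codes = np.asarray(codes, dtype=float)
    static = codes[0]          # static factor from a single element
    dynamic = codes - static   # per-element dynamics
    return static, dynamic
```

The appeal of the bias is that any latent channel constant over time is absorbed into `static` by construction, which is one way to prevent the dynamic vectors from leaking static information.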

Updated: 2024-06-26 07:32:47

领域: cs.LG

下载: http://arxiv.org/abs/2406.18131v1

CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Simulation data can be accurately labeled and are expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.

Updated: 2024-06-26 07:31:16

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2406.18129v1

A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures

As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, there is little attention on privacy-preserving model explanations. This article presents the first thorough survey about privacy attacks on model explanations and their countermeasures. Our contribution to this field comprises a thorough analysis of research papers with a connected taxonomy that facilitates the categorisation of privacy attacks and countermeasures based on the targeted explanations. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings. Interested readers are encouraged to access our repository at https://github.com/tamlhp/awesome-privex.

Updated: 2024-06-26 07:28:15

Fields: cs.CR,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2404.00673v2

A GPU-Accelerated Bi-linear ADMM Algorithm for Distributed Sparse Machine Learning

This paper introduces the Bi-linear consensus Alternating Direction Method of Multipliers (Bi-cADMM), aimed at solving large-scale regularized Sparse Machine Learning (SML) problems defined over a network of computational nodes. Mathematically, these are stated as minimization problems with convex local loss functions over a global decision vector, subject to an explicit $\ell_0$ norm constraint to enforce the desired sparsity. The considered SML problem generalizes different sparse regression and classification models, such as sparse linear and logistic regression, sparse softmax regression, and sparse support vector machines. Bi-cADMM leverages a bi-linear consensus reformulation of the original non-convex SML problem and a hierarchical decomposition strategy that divides the problem into smaller sub-problems amenable to parallel computing. In Bi-cADMM, this decomposition strategy is based on a two-phase approach. Initially, it performs a sample decomposition of the data and distributes local datasets across computational nodes. Subsequently, a delayed feature decomposition of the data is conducted on Graphics Processing Units (GPUs) available to each node. This methodology allows Bi-cADMM to undertake computationally intensive data-centric computations on GPUs, while CPUs handle more cost-effective computations. The proposed algorithm is implemented within an open-source Python package called Parallel Sparse Fitting Toolbox (PsFiT), which is publicly available. Finally, computational experiments demonstrate the efficiency and scalability of our algorithm through numerical benchmarks across various SML problems featuring distributed datasets.
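
The two-phase decomposition described above can be sketched as follows; the helper below is our own illustration, not part of the PsFiT package, and omits the consensus ADMM iterations themselves. Rows (samples) are split across nodes first, and each node's block is then split again along columns (features) for GPU processing.

```python
import numpy as np

def sample_then_feature_split(X, n_nodes, n_feature_blocks):
    """Toy sketch of the hierarchical two-phase decomposition
    (hypothetical helper, not from the PsFiT package): samples are
    distributed across computational nodes first, then each node's
    local dataset is split along the feature axis."""
    node_blocks = np.array_split(X, n_nodes, axis=0)         # phase 1: sample decomposition
    return [np.array_split(block, n_feature_blocks, axis=1)  # phase 2: feature decomposition
            for block in node_blocks]

# Usage: 6 samples x 4 features, split over 3 nodes with
# 2 feature blocks per node.
X = np.arange(24, dtype=float).reshape(6, 4)
blocks = sample_then_feature_split(X, n_nodes=3, n_feature_blocks=2)
```

Each node ends up with complementary sub-problems small enough for parallel computation, and stacking the blocks back together recovers the original data matrix.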

Updated: 2024-06-26 07:27:56

Fields: cs.LG

Download: http://arxiv.org/abs/2405.16267v2

ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models

The increasing reliance on online recruitment platforms coupled with the adoption of AI technologies has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the importance of dataset quality and advanced model architectures in enhancing the accuracy and robustness of resume classification systems, thus advancing the field of online recruitment practices.

Updated: 2024-06-26 07:25:18

Fields: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.18125v1

Poisoned LangChain: Jailbreak LLMs by LangChain

With the development of natural language processing (NLP), large language models (LLMs) are becoming increasingly popular. LLMs are integrating more into everyday life, raising public concerns about their security vulnerabilities. Consequently, the security of large language models is becoming critically important. Currently, the techniques for attacking and defending against LLMs are continuously evolving. One significant type of attack is the jailbreak attack, which is designed to evade model safety mechanisms and induce the generation of inappropriate content. Existing jailbreak attacks primarily rely on crafting inducement prompts for direct jailbreaks, which are less effective against large models with robust filtering and high comprehension abilities. Given the increasing demand for real-time capabilities in large language models, real-time updates and iterations of new knowledge have become essential. Retrieval-Augmented Generation (RAG), an advanced technique to compensate for the model's lack of new knowledge, is gradually becoming mainstream. As RAG enables the model to utilize external knowledge bases, it provides a new avenue for jailbreak attacks. In this paper, we are the first to propose the concept of indirect jailbreak, achieving Retrieval-Augmented Generation via LangChain. Building on this, we further design a novel method of indirect jailbreak attack, termed Poisoned-LangChain (PLC), which leverages a poisoned external knowledge base to interact with large language models, thereby causing the large models to generate malicious non-compliant dialogues. We tested this method on six different large language models across three major categories of jailbreak issues. The experiments demonstrate that PLC successfully implemented indirect jailbreak attacks under three different scenarios, achieving success rates of 88.56%, 79.04%, and 82.69% respectively.

Updated: 2024-06-26 07:21:02

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18122v1

ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is crucial for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.

Updated: 2024-06-26 07:19:51

Fields: cs.CL,cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2406.18120v1

Robust personnel rostering: how accurate should absenteeism predictions be?

Disruptions to personnel rosters caused by absenteeism often necessitate last-minute adjustments to the employees' working hours. A common strategy to mitigate the impact of such changes is to assign employees to reserve shifts: special on-call duties during which an employee can be called in to cover for an absent employee. To maximize roster robustness, we assume a predict-then-optimize approach that uses absence predictions from a machine learning model to schedule an adequate number of reserve shifts. In this paper we propose a methodology to evaluate the robustness of rosters generated by the predict-then-optimize approach, assuming the machine learning model will make predictions at a predetermined prediction performance level. Instead of training and testing machine learning models, our methodology simulates the predictions based on a characterization of model performance. We show how this methodology can be applied to identify the minimum performance level needed for the model to outperform simple non-data-driven robust rostering policies. In a computational study on a nurse rostering problem, we demonstrate how the predict-then-optimize approach outperforms non-data-driven policies under reasonable performance requirements, particularly when employees possess interchangeable skills.
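
The key idea of simulating predictions from a characterization of model performance, rather than training an actual model, can be sketched as below. The sensitivity/specificity parameterization is our assumption for illustration; the paper may characterize prediction performance differently.

```python
import random

def simulate_predictions(true_absences, sensitivity, specificity, rng=None):
    """Simulate absence predictions a model at a given performance
    level might produce, by perturbing the ground truth (illustrative
    sketch; the paper's characterization of performance may differ).
    true_absences: list of bools, True = employee will be absent."""
    rng = rng or random.Random(0)
    preds = []
    for absent in true_absences:
        if absent:
            preds.append(rng.random() < sensitivity)   # detected w.p. sensitivity
        else:
            preds.append(rng.random() >= specificity)  # false alarm w.p. 1 - specificity
    return preds

# Usage: a model simulated at perfect performance reproduces the
# ground truth, so reserve shifts can be scheduled exactly where needed.
truth = [True, False, False, True, False]
assert simulate_predictions(truth, sensitivity=1.0, specificity=1.0) == truth
```

Sweeping the performance parameters and re-solving the rostering problem for each simulated prediction set is then one way to locate the minimum performance level at which predict-then-optimize beats non-data-driven reserve policies.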

Updated: 2024-06-26 07:16:18

Fields: cs.LG

Download: http://arxiv.org/abs/2406.18119v1

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, we introduce SafeAligner, a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks. We begin by developing two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses. SafeAligner leverages the disparity in security levels between the responses from these models to differentiate between harmful and beneficial tokens, effectively guiding the safety alignment by altering the output token distribution of the target model. Extensive experiments show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones, thereby ensuring secure alignment with minimal loss to generality.

Updated: 2024-06-26 07:15:44

Fields: cs.CR,cs.CL

Download: http://arxiv.org/abs/2406.18118v1

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software engineering (SE) with natural language processing (NLP) by discussing the perspectives of both sides: SE applies language models for development automation, while NLP adopts SE tasks for language model evaluation. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE, and identify key challenges and potential future directions in this domain. We keep the survey open and updated on GitHub at https://github.com/codefuse-ai/Awesome-Code-LLM.

Updated: 2024-06-26 07:11:00

Fields: cs.CL,cs.AI,cs.SE

Download: http://arxiv.org/abs/2311.07989v7

Empathy Detection from Text, Audiovisual, Audio or Physiological Signals: Task Formulations and Machine Learning Methods

Empathy indicates an individual's ability to understand others. Over the past few years, empathy has drawn attention from various disciplines, including but not limited to Affective Computing, Cognitive Science and Psychology. Detecting empathy has potential applications in society, healthcare and education. Despite being a broad and overlapping topic, the avenue of empathy detection leveraging Machine Learning remains underexplored from a systematic literature review perspective. We collected 828 papers from 10 well-known databases, systematically screened them and analysed the final 61 papers. Our analyses reveal several prominent task formulations, including empathy on localised utterances or overall expressions, unidirectional or parallel empathy, and emotional contagion, in monadic, dyadic and group interactions. Empathy detection methods are summarised based on four input modalities (text, audiovisual, audio and physiological signals), thereby presenting modality-specific network architecture design protocols. We discuss challenges, research gaps and potential applications in the Affective Computing-based empathy domain, which can facilitate new avenues of exploration. We further enlist the public availability of datasets and codes. We believe that our work is a stepping stone to developing a robust empathy detection system that can be deployed in practice to enhance the overall well-being of human life.

Updated: 2024-06-26 07:10:49

Fields: cs.HC,cs.LG,cs.SI

Download: http://arxiv.org/abs/2311.00721v2

BADGE: BADminton report Generation and Evaluation with LLM

Badminton enjoys widespread popularity, and reports on matches generally include details such as player names, game scores, and ball types, providing audiences with a comprehensive view of the games. However, writing these reports can be a time-consuming task. This challenge led us to explore whether a Large Language Model (LLM) could automate the generation and evaluation of badminton reports. We introduce a novel framework named BADGE, designed for this purpose using LLM. Our method consists of two main phases: Report Generation and Report Evaluation. Initially, badminton-related data is processed by the LLM, which then generates a detailed report of the match. We tested different Input Data Types, In-Context Learning (ICL), and LLM, finding that GPT-4 performs best when using CSV data type and the Chain of Thought prompting. Following report generation, the LLM evaluates and scores the reports to assess their quality. Our comparisons between the scores evaluated by GPT-4 and human judges show a tendency to prefer GPT-4 generated reports. Since the application of LLM in badminton reporting remains largely unexplored, our research serves as a foundational step for future advancements in this area. Moreover, our method can be extended to other sports games, thereby enhancing sports promotion. For more details, please refer to https://github.com/AndyChiangSH/BADGE.

Updated: 2024-06-26 07:07:52

Fields: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2406.18116v1

Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instructions from humans. To address these challenges, we propose a novel framework that leverages the zero-shot detection and grounded recognition capabilities of pretraining visual-language models (VLMs) combined with dense 3D entity reconstruction to build 3D semantic maps. Additionally, we utilize large language models (LLMs) for spatial region abstraction and online planning, incorporating human instructions and spatial semantic context. We have built a 10-DoF mobile manipulation robotic platform JSR-1 and demonstrated in real-world robot experiments that our proposed framework can effectively capture spatial semantics and process natural language user instructions for zero-shot OVMM tasks under dynamic environment settings, with an overall navigation and task success rate of 80.95% and 73.33% over 105 episodes, and better SFT and SPL by 157.18% and 19.53% respectively compared to the baseline. Furthermore, the framework is capable of replanning towards the next most probable candidate location based on the spatial semantic context derived from the 3D semantic map when initial plans fail, keeping an average success rate of 76.67%.

Updated: 2024-06-26 07:06:42

Fields: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.18115v1

ES-GNN: Generalizing Graph Neural Networks Beyond Homophily with Edge Splitting

While Graph Neural Networks (GNNs) have achieved enormous success in multiple graph analytical tasks, modern variants mostly rely on the strong inductive bias of homophily. However, real-world networks typically exhibit both homophilic and heterophilic linking patterns, wherein adjacent nodes may share dissimilar attributes and distinct labels. Therefore, GNNs smoothing node proximity holistically may aggregate both task-relevant and irrelevant (even harmful) information, limiting their ability to generalize to heterophilic graphs and potentially causing non-robustness. In this work, we propose a novel Edge Splitting GNN (ES-GNN) framework to adaptively distinguish between graph edges either relevant or irrelevant to learning tasks. This essentially transfers the original graph into two subgraphs with the same node set but complementary edge sets dynamically. Given that, information propagation separately on these subgraphs and edge splitting are alternatively conducted, thus disentangling the task-relevant and irrelevant features. Theoretically, we show that our ES-GNN can be regarded as a solution to a disentangled graph denoising problem, which further illustrates our motivations and interprets the improved generalization beyond homophily. Extensive experiments over 11 benchmark and 1 synthetic datasets not only demonstrate the effective performance of ES-GNN but also highlight its robustness to adversarial graphs and mitigation of the over-smoothing problem.
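
The edge-splitting idea above can be sketched with a minimal example: each edge is assigned to exactly one of two complementary edge sets based on a task-relevance score, and message passing then runs separately on each subgraph. This is our toy illustration, not the actual ES-GNN layer (which learns the split and alternates it with propagation).

```python
import numpy as np

def split_and_propagate(X, edges, relevance):
    """Toy sketch of edge splitting (not the actual ES-GNN layer):
    each edge goes to exactly one of two complementary edge sets
    based on a task-relevance score, and mean aggregation then runs
    separately on each resulting subgraph.
    X: (N, D) node features; edges: directed (src, dst) pairs;
    relevance: dict mapping each edge to a score in [0, 1]."""
    relevant = [e for e in edges if relevance[e] >= 0.5]
    irrelevant = [e for e in edges if relevance[e] < 0.5]

    def mean_aggregate(edge_set):
        out = X.copy()
        for i in range(len(X)):
            neighbours = [dst for (src, dst) in edge_set if src == i]
            if neighbours:
                out[i] = X[neighbours].mean(axis=0)
        return out

    return mean_aggregate(relevant), mean_aggregate(irrelevant)

# Usage: node 0 has one task-relevant neighbour (1) and one
# task-irrelevant neighbour (2); the two channels stay separate.
X = np.eye(3)
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
relevance = {(0, 1): 0.9, (1, 0): 0.9, (0, 2): 0.1, (2, 0): 0.1}
h_rel, h_irr = split_and_propagate(X, edges, relevance)
```

Because the two edge sets are complementary, task-irrelevant neighbours never contaminate the task-relevant channel, which is the mechanism behind the improved generalization beyond homophily.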

Updated: 2024-06-26 06:59:04

Fields: cs.LG

Download: http://arxiv.org/abs/2205.13700v4

Token-Weighted RNN-T for Learning from Flawed Data

ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights. The new objective is used for mitigating accuracy loss from transcription errors in the training data, which naturally appear in two settings: pseudo-labeling and human annotation errors. Experimental results show that using our method for semi-supervised learning with pseudo-labels leads to a consistent accuracy improvement, up to 38% relative. We also analyze the accuracy degradation resulting from different levels of WER in the reference transcription, and show that token-weighted RNN-T is suitable for overcoming this degradation, recovering 64%-99% of the accuracy loss.
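
The general shape of a token-weighted objective can be sketched as a weighted negative log-likelihood; the weights and probabilities below are made up for illustration, and the paper applies the weights inside the RNN-T lattice computation, which this sketch omits.

```python
import math

def token_weighted_nll(token_probs, token_weights):
    """Weighted negative log-likelihood over a target token sequence:
    each token's -log p is scaled by a per-token weight, so tokens
    suspected to be transcription errors (low weight) contribute
    less to the loss. Sketch only; the actual criterion weights
    terms inside the RNN-T objective."""
    assert len(token_probs) == len(token_weights)
    return sum(-w * math.log(p) for p, w in zip(token_probs, token_weights))

# Usage: down-weighting an unreliable token reduces its penalty.
probs = [0.9, 0.2, 0.8]            # model's probability of each target token
uniform = token_weighted_nll(probs, [1.0, 1.0, 1.0])
downweighted = token_weighted_nll(probs, [1.0, 0.2, 1.0])  # 2nd token distrusted
assert downweighted < uniform
```

With uniform weights the objective reduces to the standard cross-entropy criterion, so the token weights are a strict generalization of the usual training setup.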

Updated: 2024-06-26 06:48:11

Fields: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.18108v1

Truthful Aggregation of LLMs with an Application to Online Advertising

Online platforms generate hundreds of billions of dollars in revenue per year by showing advertisements alongside their own content. Currently, these platforms are integrating Large Language Models (LLMs) into their services. This makes revenue generation from LLM-generated content the next major challenge in online advertising. We consider a scenario where advertisers aim to influence the responses of an LLM to align with their interests, while platforms seek to maximize advertiser value and ensure user satisfaction. We introduce an auction mechanism for this problem that operates without LLM fine-tuning or access to model weights and provably converges to the output of the optimally fine-tuned LLM for the platform's objective as computational resources increase. Our mechanism ensures that truthful reporting is a dominant strategy for advertisers and it aligns each advertiser's utility with their contribution to social welfare - an essential feature for long-term viability. Additionally, it can incorporate contextual information about the advertisers, significantly accelerating convergence. Via experiments with a publicly available LLM, we show that our mechanism significantly boosts advertiser value and platform revenue, with low computational overhead. While our motivating application is online advertising, our mechanism can be applied in any setting with monetary transfers, making it a general-purpose solution for truthfully aggregating the preferences of self-interested agents over LLM-generated replies.

Updated: 2024-06-26 06:29:32

Fields: cs.GT,cs.AI

Download: http://arxiv.org/abs/2405.05905v3

Xmodel-LM Technical Report

We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.

Updated: 2024-06-26 06:28:45

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.02856v4

Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

In this paper we argue that the performance of classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data, which are designed for the case-control sampling scheme, may deteriorate significantly when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. We also introduce a single-sample analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that significant differences occur between them, especially when half or more of the positive observations are labeled. The opposite case, in which an ERM minimizer designed for the case-control scheme is applied to single-sample data, is also considered, and similar conclusions are drawn. Accounting for the difference between scenarios requires a single, but crucial, change in the definition of the Empirical Risk.
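
The non-negative risk classifier referenced above is, in its standard case-control form, built on the estimator R = pi * R_p(+1) + max(0, R_u(-1) - pi * R_p(-1)). A minimal sketch of that estimator follows (notation and helper name are ours; this is the baseline the paper builds its single-sample analogue from, not the paper's new proposal):

```python
def nn_risk(loss_pos_as_pos, loss_pos_as_neg, loss_unl_as_neg, prior):
    """Non-negative PU risk in the standard case-control form:
        R = pi * R_p(+1) + max(0, R_u(-1) - pi * R_p(-1))
    where R_p(+1)/R_p(-1) average a surrogate loss over labeled
    positives treated as positive/negative, R_u(-1) averages it over
    unlabeled data treated as negative, and pi is the class prior.
    Sketch of the known baseline estimator, not the paper's method."""
    r_p_pos = sum(loss_pos_as_pos) / len(loss_pos_as_pos)
    r_p_neg = sum(loss_pos_as_neg) / len(loss_pos_as_neg)
    r_u_neg = sum(loss_unl_as_neg) / len(loss_unl_as_neg)
    return prior * r_p_pos + max(0.0, r_u_neg - prior * r_p_neg)

# Usage: the max(0, .) clamp keeps the implied negative-class risk
# from going below zero when positives dominate the unlabeled set.
risk = nn_risk([0.2, 0.4], [0.9, 1.1], [0.1, 0.3], prior=0.5)
```

The paper's point is that moving from case-control to single-sample data changes how these empirical averages must be defined, which is the "sole, but crucial" change to the Empirical Risk.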

Updated: 2024-06-26 06:22:48

Categories: cs.LG

Download: http://arxiv.org/abs/2312.02095v2

LLM-Driven Multimodal Opinion Expression Identification

Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing the CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method, STOEI, which combines the speech and text modalities to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves performance, while our method outperforms existing methods by 9.20\% and obtains SOTA results.

Updated: 2024-06-26 05:52:47

Categories: cs.CL,cs.AI,cs.SD,eess.AS

Download: http://arxiv.org/abs/2406.18088v1

EHR-Based Mobile and Web Platform for Chronic Disease Risk Prediction Using Large Language Multimodal Models

Traditional diagnosis of chronic diseases involves in-person consultations with physicians to identify the disease. However, there is little research on predicting chronic diseases and developing application systems from clinical notes and blood test values. We collected five years of Electronic Health Records (EHRs), from 2017 to 2021, from Taiwan's hospital database as an AI database. Furthermore, we developed an EHR-based chronic disease prediction platform utilizing Large Language Multimodal Models (LLMMs), successfully integrating it with frontend web and mobile applications for prediction. This prediction platform can also connect to the hospital's backend database, providing physicians with real-time risk assessment diagnostics. The demonstration link can be found at https://www.youtube.com/watch?v=oqmL9DEDFgA.

Updated: 2024-06-26 05:51:08

Categories: cs.SE,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.18087v1

Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought

To improve the ability of large language models (LLMs) to tackle complex reasoning problems, chain-of-thought (CoT) methods were proposed to guide LLMs to reason step-by-step, enabling problem solving from simple to complex. State-of-the-art methods for generating such a chain involve interactive collaboration, where the learner generates candidate intermediate thoughts, evaluated by the LLM, guiding the generation of subsequent thoughts. However, a widespread yet understudied problem is that the evaluation from the LLM is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts. In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts with the noisy feedback from the LLM. In each round, we randomly pair intermediate thoughts and directly prompt the LLM to select the more promising one from each pair, allowing us to identify the most promising thoughts through an iterative process. To further alleviate the noise in the comparison, we incorporate techniques from ensemble learning and dueling bandits, proposing two variants of the algorithm. Experiments on three real-world tasks demonstrate the effectiveness of our proposed algorithm and verify the rationale of the pairwise comparison mechanism.
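
The round-by-round selection loop described above can be sketched with a stubbed judge. Here `noisy_prefer` and its flip probability are illustrative stand-ins for the LLM's noisy feedback, not the paper's implementation:

```python
import random

def noisy_prefer(a, b, true_score, noise, rng):
    """Stand-in for the LLM judge: prefers the higher-scoring thought
    but flips its verdict with probability `noise`."""
    better, worse = (a, b) if true_score[a] >= true_score[b] else (b, a)
    return worse if rng.random() < noise else better

def pairwise_select(thoughts, true_score, noise=0.0, rng=None):
    """Randomly pair the candidate thoughts each round and keep only
    the preferred thought of each pair until a single one remains."""
    rng = rng or random.Random(0)
    pool = list(thoughts)
    while len(pool) > 1:
        rng.shuffle(pool)
        winners = [noisy_prefer(pool[i], pool[i + 1], true_score, noise, rng)
                   for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:            # an unpaired thought advances directly
            winners.append(pool[-1])
        pool = winners
    return pool[0]
```

Repeating the tournament and majority-voting the winners is one way to mimic the ensemble variant; the dueling-bandit variant instead adapts which pairs to query.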

Updated: 2024-06-26 05:47:52

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2402.06918v2

Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline

The integration of experiment technologies with large language models (LLMs) is transforming scientific research, extending AI capabilities beyond specialized problem-solving to serving as research assistants for human scientists. In power systems, simulations are essential for research. However, LLMs face significant challenges in power system simulations due to limited pre-existing knowledge and the complexity of power grids. To address this issue, this work proposes a modular framework that integrates expertise from both the power system and LLM domains. This framework enhances LLMs' ability to perform power system simulations on previously unseen tools. Validated using 34 simulation tasks in Daline, an (optimal) power flow simulation and linearization toolbox not yet exposed to LLMs, the proposed framework improved GPT-4o's simulation coding accuracy from 0% to 96.07%, also outperforming the ChatGPT-4o web interface's 33.8% accuracy (with the entire knowledge base uploaded). These results highlight the potential of LLMs as research assistants in power systems.

Updated: 2024-06-26 05:45:28

Categories: eess.SY,cs.AI,cs.SY

Download: http://arxiv.org/abs/2406.17215v2

InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

The prevalence of sarcasm in social media, conveyed through text-image combinations, presents significant challenges for sentiment analysis and intention mining. Current multi-modal sarcasm detection methods have been proven to struggle with biases from spurious cues, leading to a superficial understanding of the complex interactions between text and image. To address these issues, we propose InterCLIP-MEP, a robust framework for multi-modal sarcasm detection. InterCLIP-MEP introduces a refined variant of CLIP, Interactive CLIP (InterCLIP), as the backbone, enhancing sample representations by embedding cross-modality information in each encoder. Furthermore, a novel training strategy is designed to adapt InterCLIP for a Memory-Enhanced Predictor (MEP). MEP uses dynamic dual-channel memory to store valuable historical knowledge of test samples and then leverages this memory as a non-parametric classifier to derive the final prediction. By using InterCLIP to encode text-image interactions more effectively and incorporating MEP, InterCLIP-MEP offers a more robust recognition of multi-modal sarcasm. Experiments demonstrate that InterCLIP-MEP achieves state-of-the-art performance on the MMSD2.0 benchmark. Code and data are available at https://github.com/CoderChen01/InterCLIP-MEP.

Updated: 2024-06-26 05:40:16

Categories: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2406.16464v2

"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionally changing the model's behavior. This threat is critical as it mirrors real-world usage scenarios where RAG systems interact with publicly accessible knowledge bases, such as web scrapings and user-contributed data pools. We target a realistic setting where the adversary has no knowledge of users' queries, the knowledge base data, or the LLM parameters. We demonstrate that it is possible to exploit the model successfully through crafted content uploads with access to the retriever. Our findings emphasize an urgent need for security measures in the design and deployment of RAG systems to prevent potential manipulation and ensure the integrity of machine-generated content.

Updated: 2024-06-26 05:36:23

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2406.19417v1

Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. We highlight two critical aspects to ensure the scorer's effectiveness and reliability: the quality of the training dataset and its model architecture. To this end, we create a human-annotated comparison dataset and train a generative model on it using ranking-based objectives. Extensive experiments on public ASQP datasets reveal that using our scorer can greatly and consistently improve the effectiveness of self-training. Moreover, we explore the possibility of replacing humans with large language models for comparison dataset annotation, and experiments demonstrate its feasibility. We release our code and data at https://github.com/HITSZ-HLT/ST-w-Scorer-ABSA .

Updated: 2024-06-26 05:30:21

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18078v1

LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

Recent works show that reducing the number of layers in a convolutional neural network can enhance efficiency while maintaining the performance of the network. Existing depth compression methods remove redundant non-linear activation functions and merge the consecutive convolution layers into a single layer. However, these methods suffer from a critical drawback; the kernel size of the merged layers becomes larger, significantly undermining the latency reduction gained from reducing the depth of the network. We show that this problem can be addressed by jointly pruning convolution layers and activation functions. To this end, we propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove, to achieve a desired inference speed-up while minimizing performance loss. Since the corresponding selection problem involves an exponential search space, we formulate a novel surrogate optimization problem and efficiently solve it via dynamic programming. Empirical results demonstrate that our method consistently outperforms existing depth compression and layer pruning methods on various network architectures, both on image classification and generation tasks. We release the code at https://github.com/snu-mllab/LayerMerge.
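The kernel growth that motivates the method is easy to verify: composing two stride-1 convolutions with kernel sizes k1 and k2 (with no nonlinearity between them) is equivalent to a single convolution whose kernel size is k1 + k2 - 1. A minimal 1-D illustration of this fact, not the paper's code:

```python
def conv1d(signal, kernel):
    """'Valid' 1-D sliding-window correlation (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def merge_kernels(k1, k2):
    """Single kernel equivalent to applying k1 then k2 back-to-back;
    its length is len(k1) + len(k2) - 1, illustrating the kernel
    growth that undermines latency gains from depth reduction."""
    merged = [0.0] * (len(k1) + len(k2) - 1)
    for j, a in enumerate(k1):
        for m, b in enumerate(k2):
            merged[j + m] += a * b
    return merged
```

The same algebra carries over to 2-D convolutions, which is why naively merging layers trades depth for larger (and slower) kernels.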

Updated: 2024-06-26 05:28:12

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2406.12837v2

Are Protein Language Models Compute Optimal?

While protein language models (pLMs) have transformed biological research, the scaling laws governing their improvement remain underexplored. By adapting methodologies from NLP scaling laws, we investigated the optimal ratio between model parameters and training tokens within a fixed compute budget. Our study reveals that pLM sizes scale sublinearly with compute budget, showing diminishing returns in performance as model size increases, and we identify a performance plateau in training loss comparable to the one found in relevant works in the field. Our findings suggest that widely-used pLMs might not be compute-optimal, indicating that larger models could achieve convergence more efficiently. Training a 35M model on a reduced token set, we attained perplexity results comparable to larger models like ESM-2 (15B) and xTrimoPGLM (100B) with a single dataset pass. This work paves the way towards more compute-efficient pLMs, democratizing their training and practical application in computational biology.
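For intuition, the fixed-budget trade-off can be sketched with the commonly used approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens). The tokens-per-parameter ratio below is an assumed input for illustration, not the sublinear fit the paper reports for pLMs:

```python
import math

def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Split a fixed FLOPs budget C between model size N (parameters)
    and training tokens D under the approximation C = 6*N*D, given an
    assumed ratio r = D/N. Then N = sqrt(C / (6*r)) and D = r*N."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens
```

The paper's finding that pLM sizes scale sublinearly with compute corresponds to this ratio itself growing with the budget, shifting FLOPs from parameters toward tokens.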

Updated: 2024-06-26 05:07:15

Categories: q-bio.BM,cs.AI

Download: http://arxiv.org/abs/2406.07249v2

Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as little as a single labelled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with considerably distinct objects and backgrounds that are not highly complex, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) to construct high-fidelity prototypes that represent the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modelling the multi-modal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods.

Updated: 2024-06-26 05:06:14

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18074v1

Learning for Bandits under Action Erasures

We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action or not. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of $O(1/\sqrt{1-\epsilon})$ away from the no-erasure worst-case regret of the underlying MAB algorithm, where $\epsilon$ is the erasure probability. We also propose a modification of the successive arm elimination algorithm and prove that its worst-case regret is $\Tilde{O}(\sqrt{KT}+K/(1-\epsilon))$, which we prove is optimal by providing a matching lower bound.
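As a toy illustration of why erasures are manageable (this is only a channel simulation, not the paper's scheme), note that repeating each transmission drives the probability that an action never reaches the agent down geometrically, from epsilon for a single slot to epsilon**repeats:

```python
import random

def received(epsilon, repeats, rng):
    """Simulate the agent's side of the erasure channel: each of the
    `repeats` copies of a transmitted action is erased independently
    with probability epsilon, so the action is lost only when every
    copy is erased (probability epsilon ** repeats)."""
    return any(rng.random() >= epsilon for _ in range(repeats))
```

The hard part the paper addresses is that the learner receives no erasure feedback, so it cannot tell which action produced an observed reward; its scheme attains regret within an O(1/sqrt(1-epsilon)) factor of the no-erasure case on top of any MAB algorithm.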

Updated: 2024-06-26 05:03:00

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2406.18072v1

CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings like driving. Both of these challenges make deploying purely cloned policies in safety critical applications like autonomous vehicles challenging. In this paper we propose Combining IMitation and Reinforcement Learning (CIMRL) approach - a framework that enables training driving policies in simulation through leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed loop simulation driving benchmarks.

Updated: 2024-06-26 04:56:39

Categories: cs.LG

Download: http://arxiv.org/abs/2406.08878v3

Learning Optimal Filters Using Variational Inference

Filtering, the task of estimating the conditional distribution of the states of a dynamical system given partial, noisy observations, is important in many areas of science and engineering, including weather and climate prediction. However, the filtering distribution is generally intractable to obtain for high-dimensional, nonlinear systems. Filters used in practice, such as the ensemble Kalman filter (EnKF), are biased for nonlinear systems and have numerous tuning parameters. Here, we present a framework for learning a parameterized analysis map, the map that takes a forecast distribution and observations to the filtering distribution, using variational inference. We show that this methodology can be used to learn gain matrices for filtering linear and nonlinear dynamical systems, as well as inflation and localization parameters for an EnKF. Future work will apply this framework to learn new filtering algorithms.
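
A scalar caricature of learning the analysis map: fix the filter's functional form and fit its gain to minimize filtering error on simulated data. The grid search below stands in for the paper's variational-inference optimization, and the linear-Gaussian system is an assumption for illustration:

```python
import random

def filter_mse(gain, a, xs, ys):
    """Mean squared error of the fixed-gain filter
    xhat <- a*xhat + gain*(y - a*xhat) against the true states xs."""
    xhat, err = 0.0, 0.0
    for x, y in zip(xs, ys):
        pred = a * xhat                 # forecast step
        xhat = pred + gain * (y - pred) # analysis step
        err += (xhat - x) ** 2
    return err / len(xs)

def learn_gain(a, q, r, n=2000, seed=0):
    """Simulate x' = a*x + N(0, q), y = x + N(0, r), then pick the
    gain minimizing filtering MSE over a coarse grid."""
    rng = random.Random(seed)
    xs, ys, x = [], [], 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r ** 0.5))
    grid = [i / 20 for i in range(21)]
    return min(grid, key=lambda g: filter_mse(g, a, xs, ys))
```

In this linear-Gaussian case the learned gain approaches the steady-state Kalman gain; the paper's framework targets the nonlinear, high-dimensional settings where no such closed form exists.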

Updated: 2024-06-26 04:51:14

Categories: cs.LG,math.DS

Download: http://arxiv.org/abs/2406.18066v1

Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at training effective smoothed robust DRL agents. We propose S-DQN and S-PPO, novel approaches that demonstrate remarkable improvements in clean rewards, empirical robustness, and robustness guarantee across standard RL benchmarks. Notably, our S-DQN and S-PPO agents not only significantly outperform existing smoothed agents by an average factor of $2.16\times$ under the strongest attack, but also surpass previous robustly-trained agents by an average factor of $2.13\times$. This represents a significant leap forward in the field. Furthermore, we introduce Smoothed Attack, which is $1.89\times$ more effective in decreasing the rewards of smoothed agents than existing adversarial attacks.

Updated: 2024-06-26 04:49:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18062v1

AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning

Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. However, significant performance drops and a high risk of divergence have limited their widespread adoption. In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods. To enhance dimension-dependent ZO estimation accuracy, we introduce a fast-forward, low-parameter tensorized adapter. To tackle the frequently observed divergence issue in large-scale ZO fine-tuning tasks, we propose an adaptive query number schedule that guarantees convergence. Detailed theoretical analysis and extensive experimental results on Roberta-Large and Llama-2-7B models substantiate the efficacy of our AdaZeta framework in terms of accuracy, memory efficiency, and convergence speed.
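The memory saving comes from estimating gradients with forward passes only. A minimal sketch of the underlying MeZO-style zeroth-order estimator on a plain parameter list; the paper's contributions (tensorized adapters and the adaptive query-number schedule) are omitted:

```python
import random

def zo_gradient_step(params, loss_fn, lr=0.01, mu=1e-3, seed=0):
    """One zeroth-order step: probe the loss at params + mu*z and
    params - mu*z for a random direction z, form the projected
    gradient estimate, and descend along z. No backward pass needed;
    regenerating z from the seed avoids storing it."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = loss_fn([p + mu * zi for p, zi in zip(params, z)])
    minus = loss_fn([p - mu * zi for p, zi in zip(params, z)])
    g = (plus - minus) / (2.0 * mu)   # directional-derivative estimate
    return [p - lr * g * zi for p, zi in zip(params, z)]
```

The divergence risk the paper targets shows up when a single noisy probe per step is not enough; averaging several probes (a growing "query number") stabilizes the estimate at extra forward-pass cost.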

Updated: 2024-06-26 04:33:13

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.18060v1

Towards Large Language Model Aided Program Refinement

Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generations from informal natural language specifications. However, code generated by LLMs is often unreliable. Moreover, the opaque procedure from specification to code provided by LLM is an uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification to preconditions and postconditions, (2) automatically build prompts based on refinement calculus, (3) interact with LLM to generate code, and finally, (4) verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.

Updated: 2024-06-26 04:29:27

Categories: cs.SE,cs.AI,cs.CL,K.6.3

Download: http://arxiv.org/abs/2406.18616v1

Safely Learning with Private Data: A Federated Learning Framework for Large Language Model

Private data, being larger and of higher quality than public data, can greatly improve large language models (LLMs). However, due to privacy concerns, this data is often dispersed across multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLMs due to their high computational demands on clients. An alternative, split learning, offloads most training parameters to the server while training the embedding and output layers locally, making it more suitable for LLMs. Nonetheless, it faces significant challenges in security and efficiency. First, the gradients of embeddings are prone to attacks, leading to potential reverse engineering of private data. Furthermore, the server's limitation of handling only one client's training request at a time hinders parallel training, severely impacting training efficiency. In this paper, we propose a federated learning framework for LLMs, named FL-GLM, which prevents data leakage caused by both server-side and peer-client attacks while improving training efficiency. Specifically, we first place the input block and output block on the local client to prevent embedding-gradient attacks from the server. Second, we employ key encryption during client-server communication to prevent reverse-engineering attacks from peer clients. Lastly, we employ optimization methods such as client-batching or server-hierarchical scheduling, adopting different acceleration methods based on the actual computational capabilities of the server. Experimental results on NLU and generation tasks demonstrate that FL-GLM achieves metrics comparable to the centralized chatGLM model, validating the effectiveness of our federated learning framework.

Updated: 2024-06-26 04:28:38

Categories: cs.CR,cs.CL

Download: http://arxiv.org/abs/2406.14898v2

Fuzzing at Scale: The Untold Story of the Scheduler

How to search for bugs in 1,000 programs using a pre-existing fuzzer and a standard PC? We consider this problem and show that a well-designed strategy that determines which programs to fuzz and for how long can greatly impact the number of bugs found across the programs. In fact, the impact of employing an effective strategy is comparable to that of utilizing a state-of-the-art fuzzer. The considered problem is referred to as fuzzing at scale, and the strategy as scheduler. We show that besides a naive scheduler, that allocates equal fuzz time to all programs, we can consider dynamic schedulers that adjust time allocation based on the ongoing fuzzing progress of individual programs. Such schedulers are superior because they lead both to higher number of total found bugs and to higher number of found bugs for most programs. The performance gap between naive and dynamic schedulers can be as wide (or even wider) as the gap between two fuzzers. Our findings thus suggest that the problem of advancing schedulers is fundamental for fuzzing at scale. We develop several schedulers and leverage the most sophisticated one to fuzz simultaneously our newly compiled benchmark of around 5,000 Ubuntu programs, and detect 4908 bugs.
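A minimal greedy scheduler illustrates the idea of dynamic time allocation (a sketch under simplifying assumptions, not the paper's algorithm): fuzz every program once, then keep giving the next time slice to the program with the best observed bug yield:

```python
def dynamic_schedule(programs, fuzz_round, total_rounds):
    """Allocate `total_rounds` equal time slices across `programs`.
    `fuzz_round(p)` runs one slice of fuzzing on program p and returns
    the number of bugs found in that slice (a stand-in for a real
    fuzzer invocation). Untried programs are scheduled first; after
    that, the slice goes to the highest bugs-per-round program."""
    stats = {p: {"rounds": 0, "bugs": 0} for p in programs}
    for _ in range(total_rounds):
        untried = [p for p in programs if stats[p]["rounds"] == 0]
        if untried:
            target = untried[0]
        else:
            target = max(programs,
                         key=lambda p: stats[p]["bugs"] / stats[p]["rounds"])
        stats[target]["rounds"] += 1
        stats[target]["bugs"] += fuzz_round(target)
    return stats
```

A production scheduler would also keep exploring low-yield programs (e.g., with a UCB-style bonus), since discovery rates change over a campaign; pure greed can starve programs whose bugs surface late.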

Updated: 2024-06-26 04:28:02

领域: cs.CR

下载: http://arxiv.org/abs/2406.18058v1

Neural Methods for Amortised Inference

Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimisation libraries and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortised, in the sense that they allow rapid inference through fast feedforward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software, and include a simple illustration to showcase the wide array of tools available for amortised inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.
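
The amortisation idea can be shown end-to-end with a linear stand-in for the neural network (a toy conjugate Gaussian model chosen for checkability, not an example from the article): simulate parameter-data pairs, fit the data-to-parameter map once, then infer on new data with a single feedforward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate (parameter, data) pairs from the model: theta ~ N(0, 1), x | theta ~ N(theta, 1).
n_sims, n_obs = 5000, 20
theta = rng.normal(size=n_sims)
x = rng.normal(loc=theta[:, None], scale=1.0, size=(n_sims, n_obs))

# Amortised point estimator: fit a mapping from a summary statistic (the
# sample mean) to the parameter over the simulations; a least-squares fit
# stands in here for the deep network.
s = x.mean(axis=1)
A = np.stack([s, np.ones(n_sims)], axis=1)
coef, intercept = np.linalg.lstsq(A, theta, rcond=None)[0]

def amortised_estimate(data):
    # Inference is now a fast feedforward evaluation, with no MCMC.
    return coef * data.mean() + intercept
```

For this conjugate model the learned slope comes out close to the Bayes shrinkage factor n/(n+1), which is the map the amortised estimator should approximate.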

Updated: 2024-06-26 04:27:25

Fields: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2404.12484v3

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions, as the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.
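
The energy-only training signal at the heart of the method can be illustrated in one dimension (a sketch of the DEM-style Monte Carlo score estimate; the Gaussian target is an assumption made so the answer is analytically checkable): the score of the noise-smoothed density is estimated as a self-normalised importance-weighted average of the clean score, using only the energy function and its gradient.

```python
import numpy as np

def energy(x):
    return 0.5 * x**2          # toy target: standard normal, E(x) = x^2 / 2

def grad_neg_energy(x):
    return -x                  # gradient of the log-density up to the normaliser

def mc_noised_score(xt, sigma, k, rng):
    """Estimate the score of the sigma-smoothed target at xt using only the
    energy and its gradient (no data samples): a self-normalised
    importance-weighted average over K Gaussian proposals around xt."""
    x0 = rng.normal(loc=xt, scale=sigma, size=k)
    logw = -energy(x0)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return float(np.sum(w * grad_neg_energy(x0)))
```

For this target the smoothed density is N(0, 1 + sigma^2), so the estimate can be compared against the exact score -x/(1 + sigma^2).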

Updated: 2024-06-26 04:14:13

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.06121v2

A Large-Scale Exploration of $μ$-Transfer

Large artificial neural networks have become a mainstay of language, vision, and audio processing and synthesis, yet their initializations and learning rates are often set in an unsophisticated fashion, due to the high cost of hyperparameter sweeps at scale. The $\mu$-Parameterization ($\mu$P) offers a potential solution to this challenge, yielding scaling rules for model initialization and learning rates while reportedly enabling zero-shot hyperparameter transfer from small to large models. Despite its evident promise, the $\mu$P method is not yet widely adopted, perhaps due to higher implementation complexity, many variations, or complex theoretical background. This work investigates $\mu$P empirically, focusing on the ubiquitous transformer architecture, and aims to answer a simple question: does $\mu$-Transfer yield optimal learning rates in practice? Studying models of up to 10B parameters and training budgets of up to 190B tokens, we find $\mu$-Transfer works as intended for the majority of important cases, yet also identify a few cases where it may not.
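
The flavour of the scaling rules can be sketched for Adam (a simplified, commonly quoted subset; the full parameterization treats input, hidden, and output weights case by case and this sketch is not the paper's definition):

```python
def mup_adam_rules(base_width, width, base_lr, base_init_std):
    """Simplified muP-style width scaling for Adam (sketch): transfer
    hyperparameters tuned at base_width to a wider model."""
    m = width / base_width
    return {
        "hidden_lr": base_lr / m,                    # hidden Adam LR ~ 1/width
        "hidden_init_std": base_init_std / m**0.5,   # init std ~ 1/sqrt(fan_in)
        "output_multiplier": 1.0 / m,                # output logits scaled ~ 1/width
    }
```

The point of zero-shot transfer is that the base values are swept once at small width and the rules above extrapolate them, avoiding hyperparameter sweeps at the 10B-parameter scale.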

Updated: 2024-06-26 04:07:08

Fields: cs.LG

Download: http://arxiv.org/abs/2404.05728v5

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, the effectiveness of HRL is greatly influenced by subgoal reachability. Typical HRL methods only consider subgoal reachability from the unilateral level, where a dominant level enforces compliance to the subordinate level. However, we observe that when the dominant level becomes trapped in local exploration or generates unattainable subgoals, the subordinate level is negatively affected and cannot follow the dominant level's actions. This can potentially make both levels stuck in local optima, ultimately hindering subsequent subgoal reachability. Allowing real-time bilateral information sharing and error correction would be a natural cure for this issue, which motivates us to propose a mutual response mechanism. Based on this, we propose the Bidirectional-reachable Hierarchical Policy Optimization (BrHPO)--a simple yet effective algorithm that also enjoys computation efficiency. Experiment results on a variety of long-horizon tasks showcase that BrHPO outperforms other state-of-the-art HRL baselines, coupled with a significantly higher exploration efficiency and robustness.

Updated: 2024-06-26 04:05:04

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18053v1

Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources

Adverse event (AE) extraction following COVID-19 vaccines from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel in understanding contextual information, but exhibit unstable performance on named entity recognition tasks, possibly due to their broad but unspecific training. This study aims to evaluate the effectiveness of LLMs and traditional deep learning models in AE extraction, and to assess the impact of ensembling these models on performance. In this study, we utilized reports and posts from the VAERS (n=621), Twitter (n=9,133), and Reddit (n=131) as our corpora. Our goal was to extract three types of entities: "vaccine", "shot", and "ae". We explored and fine-tuned (except GPT-4) multiple LLMs, including GPT-2, GPT-3.5, GPT-4, and Llama-2, as well as traditional deep learning models like RNN and BioBERT. To enhance performance, we created ensembles of the three models with the best performance. For evaluation, we used strict and relaxed F1 scores to evaluate the performance for each entity type, and micro-average F1 was used to assess the overall performance. The ensemble model achieved the highest performance in "vaccine", "shot", and "ae" with strict F1-scores of 0.878, 0.930, and 0.925, respectively, along with a micro-average score of 0.903. In conclusion, this study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs, for extracting AE-related information. This study contributes to the advancement of biomedical natural language processing, providing valuable insights into improving AE extraction from text data for pharmacovigilance and public health surveillance.
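
Token-level majority voting is one simple way to realise the ensembling step (a generic sketch; the paper does not spell out its exact combination rule for the three best models):

```python
from collections import Counter

def ensemble_vote(predictions):
    """Majority-vote ensembling of token-level entity tags from several
    models. predictions: list of per-model tag sequences, all same length."""
    n_tokens = len(predictions[0])
    merged = []
    for i in range(n_tokens):
        votes = Counter(model[i] for model in predictions)
        merged.append(votes.most_common(1)[0][0])
    return merged
```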

Updated: 2024-06-26 03:56:21

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18049v1

Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models

Quantitative and numerical comprehension in language is an important task in many fields like education and finance, but still remains a challenging task for language models. While tool and calculator usage has shown to be helpful to improve mathematical reasoning in large pretrained decoder-only language models, this remains unexplored for smaller language models with encoders. In this paper, we propose Pre-Calc, a simple pre-finetuning objective of learning to use the calculator for both encoder-only and encoder-decoder architectures, formulated as a discriminative and generative task respectively. We pre-train BERT and RoBERTa for discriminative calculator use and Flan-T5 for generative calculator use on the MAWPS, SVAMP, and AsDiv-A datasets, which improves performance on downstream tasks that require numerical understanding. Our code and data are available at https://github.com/calc-cmu/pre-calc.

Updated: 2024-06-26 03:52:24

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.14355v3

PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas in which general-purpose LLMs often fall short. In this study, we introduce PharmGPT, a suite of multilingual LLMs with 13 billion and 70 billion parameters, specifically trained on a comprehensive corpus of hundreds of billions of tokens tailored to the Bio-Pharmaceutical and Chemical sectors. Our evaluation shows that PharmGPT matches or surpasses existing general models on key benchmarks, such as NAPLEX, demonstrating its exceptional capability in domain-specific tasks. This advancement establishes a new benchmark for LLMs in the Bio-Pharmaceutical and Chemical fields, addressing the existing gap in specialized language modeling. Furthermore, this suggests a promising path for enhanced research and development in these specialized areas, paving the way for more precise and effective applications of NLP in specialized domains.

Updated: 2024-06-26 03:43:09

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18045v1

Multimodal foundation world models for generalist embodied agents

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be functional, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle toward developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learns the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking, GenRL exhibits strong multi-task generalization performance in several locomotion and manipulation domains. Furthermore, by introducing a data-free RL strategy, it lays the groundwork for foundation model-based RL for generalist embodied agents.

Updated: 2024-06-26 03:41:48

Fields: cs.AI,cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2406.18043v1

STEEL: Singularity-aware Reinforcement Learning

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require absolutely continuous assumption (e.g., there do not exist non-overlapping regions) on the distribution induced by target policies with respect to the data distribution over either the state or action or both. We propose a new batch RL algorithm that allows for singularity for both state and action spaces (e.g., existence of non-overlapping regions between offline data distribution and the distribution induced by the target policies) in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some technical conditions, we derive a first finite-sample regret guarantee for our proposed algorithm under singularity. Compared with existing algorithms,by requiring only minimal data-coverage assumption, STEEL improves the applicability and robustness of batch RL. In addition, a two-step adaptive STEEL, which is nearly tuning-free, is proposed. Extensive simulation studies and one (semi)-real experiment on personalized pricing demonstrate the superior performance of our methods in dealing with possible singularity in batch RL.
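
The maximum mean discrepancy used in the error analysis can be computed directly from samples; a minimal RBF-kernel version (the bandwidth choice is an assumption, and this is the generic V-statistic estimator rather than anything specific to STEEL):

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Squared maximum mean discrepancy between sample sets x and y
    (each of shape (n, d)) under an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

Identical distributions give a value near zero, while non-overlapping (singular) supports give a clearly positive value, which is what makes the quantity useful for characterising off-policy evaluation error under singularity.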

Updated: 2024-06-26 03:39:39

Fields: stat.ML,cs.LG,econ.EM,stat.ME

Download: http://arxiv.org/abs/2301.13152v5

Multi-Agent Imitation Learning: Value is Easy, Regret is Hard

We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents. Intuitively, this is because strategic deviations can depend on a counterfactual quantity: the coordinator's recommendations outside of the state distribution their recommendations induce. In response, we initiate the study of an alternative objective for MAIL in Markov Games we term the regret gap that explicitly accounts for potential deviations by agents in the group. We first perform an in-depth exploration of the relationship between the value and regret gaps. We show that while the value gap can be efficiently minimized via a direct extension of single-agent IL algorithms, even value equivalence can lead to an arbitrarily large regret gap. This implies that achieving regret equivalence is harder than achieving value equivalence in MAIL. We then provide a pair of efficient reductions to no-regret online convex optimization that are capable of minimizing the regret gap (a) under a coverage assumption on the expert (MALICE) or (b) with access to a queryable expert (BLADES).
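
A toy two-player common-payoff matrix game makes the value-versus-regret distinction concrete (an illustrative one-shot reduction; the paper works with Markov Games): the regret gap tracks each agent's best unilateral deviation, which matching value alone does not control.

```python
import numpy as np

def value_and_regret(payoff, policy):
    """Value of a product joint policy and the worst-case unilateral-deviation
    regret in a two-player common-payoff matrix game.
    payoff: (n, m) common payoff; policy: (row dist over n, col dist over m)."""
    p, q = policy
    value = p @ payoff @ q
    regret_row = max(payoff @ q) - value      # best row-player deviation
    regret_col = max(p @ payoff) - value      # best column-player deviation
    return value, max(regret_row, regret_col)
```

A coordinated policy can have both high value and zero regret, while a miscoordinated one has low value and a large deviation incentive; in the sequential setting the hard part is that deviations probe recommendations off the induced state distribution.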

Updated: 2024-06-26 03:39:31

Fields: cs.LG

Download: http://arxiv.org/abs/2406.04219v2

DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset

The escalating quality of video generated by advanced video generation methods results in new security challenges, while there have been few relevant research efforts: 1) There is no open-source dataset for generated video detection, 2) No generated video detection method has been proposed so far. To this end, we propose an open-source dataset and a detection method for generated video for the first time. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective \textbf{de}tection model based on \textbf{f}rame \textbf{co}nsistency (\textbf{DeCoF}), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several commercially proprietary models. Our code and dataset will be released at \url{https://github.com/wuwuwuyue/DeCoF}.
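
The frame-consistency intuition can be sketched as a temporal feature (normalised raw pixels stand in for learned embeddings here, which is an assumption): embed each frame and measure consecutive-frame distances, so spatial content is suppressed and temporal artifacts dominate.

```python
import numpy as np

def frame_consistency_features(frames):
    """Temporal-consistency features: distances between consecutive frame
    embeddings. frames: array of shape (T, H, W)."""
    emb = frames.reshape(len(frames), -1).astype(float)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8
    return np.linalg.norm(np.diff(emb, axis=0), axis=1)
```

A temporally coherent clip yields small, smooth values, while frame-to-frame inconsistencies of the kind generators introduce show up as large jumps; a classifier over such features generalizes across generators better than spatial-artifact detectors.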

Updated: 2024-06-26 03:32:50

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2402.02085v4

MT2ST: Adaptive Multi-Task to Single-Task Learning

The conventional training approaches often face challenges in balancing the breadth of multi-task learning (MTL) with the depth of single-task learning (STL). To address this issue, we introduce the Multi-Task to Single-Task (MT2ST) framework, a groundbreaking approach that combines the generalizability of MTL with the precision of STL. Our work includes two strategies: 'Diminish' and 'Switch'. The 'Diminish' strategy gradually reduces the influence of auxiliary tasks, while the 'Switch' strategy shifts from multi-tasking to single-tasking at a specific point in the training process. MT2ST significantly enhances the efficiency and accuracy of word embedding training while concurrently addressing prevalent issues such as overfitting. Our empirical studies demonstrate that MT2ST can reduce training time by 67% when contrasted with single-task learning approaches, and by 13% compared to traditional multi-task learning methods. These findings underscore MT2ST's potential to be a powerful tool for accelerating word embedding training.
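
The two strategies reduce to simple weight schedules over training (the linear decay rate and the switch fraction below are assumptions for illustration; the abstract does not fix them):

```python
def mt2st_task_weights(epoch, total_epochs, mode="diminish", switch_at=0.5):
    """Auxiliary-task weight schedules for the MT2ST idea (sketch).
    Returns (main_task_weight, auxiliary_task_weight)."""
    if mode == "diminish":
        aux = max(0.0, 1.0 - epoch / total_epochs)   # fade auxiliary tasks out
    elif mode == "switch":
        aux = 1.0 if epoch < switch_at * total_epochs else 0.0  # hard handoff
    else:
        raise ValueError(mode)
    return 1.0, aux
```

The per-epoch loss would then be `main_w * main_loss + aux_w * aux_loss`, so 'Diminish' anneals multi-task training into single-task training while 'Switch' makes the transition abruptly at one timepoint.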

Updated: 2024-06-26 03:12:07

Fields: cs.LG

Download: http://arxiv.org/abs/2406.18038v1

Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense of LLR, we prove that functions expressible by narrower DNNs are guaranteed to be recoverable from fewer samples than model parameters. Specifically, we establish upper limits on the optimistic sample sizes, defined as the smallest sample size necessary to guarantee LLR, for functions in the space of a given DNN. Furthermore, we prove that these upper bounds are achieved in the case of two-layer tanh neural networks. Our research lays a solid groundwork for future investigations into the recovery capabilities of DNNs in overparameterized scenarios.

Updated: 2024-06-26 03:08:24

Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.18035v1

Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization

Antibody design, a crucial task with significant implications across various disciplines such as therapeutics and biology, presents considerable challenges due to its intricate nature. In this paper, we tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences, considering both rationality and functionality. Leveraging a pre-trained conditional diffusion model that jointly models sequences and structures of antibodies with equivariant neural networks, we propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens. Our method involves fine-tuning the pre-trained diffusion model using a residue-level decomposed energy preference. Additionally, we employ gradient surgery to address conflicts between various types of energy, such as attraction and repulsion. Experiments on RAbD benchmark show that our approach effectively optimizes the energy of generated antibodies and achieves state-of-the-art performance in designing high-quality antibodies with low total energy and high binding affinity simultaneously, demonstrating the superiority of our approach.
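
Gradient surgery for conflicting energy terms can be sketched PCGrad-style (an illustrative variant; the paper's exact projection rule may differ): when attraction and repulsion gradients point against each other, project one onto the normal plane of the other so the update stops fighting itself.

```python
import numpy as np

def surgery(g_main, g_conflict):
    """If two gradients conflict (negative dot product), remove from g_main
    the component along g_conflict; otherwise leave g_main untouched."""
    dot = g_main @ g_conflict
    if dot < 0:
        g_main = g_main - dot / (g_conflict @ g_conflict) * g_conflict
    return g_main
```

After the projection the adjusted gradient has a non-negative inner product with the other term's gradient, so a step along it no longer increases that energy to first order.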

Updated: 2024-06-26 03:06:42

Fields: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2403.16576v2

Slot State Space Models

Recent State Space Models (SSMs) such as S4, S5, and Mamba have shown remarkable computational benefits in long-range temporal dependency modeling. However, in many sequence modeling problems, the underlying process is inherently modular and it is of interest to have inductive biases that mimic this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating independent mechanisms into SSMs to preserve or encourage separation of information. Unlike conventional SSMs that maintain a monolithic state vector, SlotSSMs maintains the state as a collection of multiple vectors called slots. Crucially, the state transitions are performed independently per slot with sparse interactions across slots implemented via the bottleneck of self-attention. In experiments, we evaluate our model in object-centric video understanding, 3D visual reasoning, and video prediction tasks, which involve modeling multiple objects and their long-range temporal dependencies. We find that our proposed design offers substantial performance gains over existing sequence modeling methods.
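
The slot-wise independence of the transitions is the key structural point and is easy to sketch (the linear transition form, shapes, and shared parameters are assumptions for illustration): each slot's state evolves from its own state and input only, with cross-slot mixing deferred to a separate sparse self-attention bottleneck, omitted here.

```python
import numpy as np

def slot_ssm_step(slots, A, B, u):
    """One slot-wise SSM transition: every slot is updated independently with
    shared parameters. slots: (num_slots, d); A: (d, d); B: (d, m);
    u: (num_slots, m) per-slot inputs."""
    return slots @ A.T + u @ B.T
```

Because the update is row-wise, perturbing one slot's input cannot affect another slot's next state; in the full model, interaction happens only through the attention bottleneck between transition steps.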

Updated: 2024-06-26 03:04:04

Fields: cs.AI

Download: http://arxiv.org/abs/2406.12272v3

Boosting Soft Q-Learning by Bounding

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
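
The soft (log-sum-exp) value that such bounds revolve around already satisfies an elementary double-sided inequality, max_a Q(s,a) <= V(s) <= max_a Q(s,a) + tau * log|A| (this is the standard log-sum-exp bound, shown for orientation; it is not the paper's derived bound on the optimal value function):

```python
import numpy as np

def soft_value(q, tau):
    """Soft state value V = tau * log sum_a exp(Q(s,a)/tau), computed with the
    usual max-shift for numerical stability."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    return m + tau * np.log(np.exp((q - m) / tau).sum())
```

As tau shrinks the soft value collapses onto max_a Q(s,a), recovering the hard Bellman backup.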

Updated: 2024-06-26 03:02:22

Fields: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2406.18033v1

A Communication Satellite Services Based Decentralized Network Protocol

In this paper, we present a decentralized network protocol, Space Network Protocol, based on Communication Satellite Services. The protocol outlines a method for distributing information about the status of satellite communication services across the entire blockchain network, facilitating fairness and transparency in all communication services. Our primary objective is to standardize the services delivered by all satellite networks under the communication satellite protocol. This standard remains intact regardless of potential unreliability associated with the satellites or the terminal hardware. We proposed PoD (Proof of Distribution) to verify if the communication satellites are online and PoF (Proof of Flow) to authenticate the actual data flow provided by the communication satellites. In addition, we also proposed PoM (Proof of Mesh) to verify if the communication satellites have successfully meshed together. Utilizing zero-knowledge proof and multi-party cryptographic computations, we can evaluate the service provisioning parameters of each satellite, even in the presence of potential terminal or network node fraud. This method offers technical support for the modeling of distributed network services.
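
A hash-chain commitment is one minimal way to picture flow authentication (purely an illustrative assumption; the protocol's actual PoF relies on zero-knowledge proofs and multi-party cryptographic computation, not this scheme): the relay commits to each transmitted chunk, and a verifier can later check that the claimed data flow occurred.

```python
import hashlib

def chunk_commitments(chunks, seed=b"session"):
    """Toy flow commitment: fold each transmitted chunk into a SHA-256 hash
    chain and record the intermediate digests."""
    h = hashlib.sha256(seed).digest()
    chain = []
    for c in chunks:
        h = hashlib.sha256(h + c).digest()
        chain.append(h.hex())
    return chain

def verify_flow(chunks, chain, seed=b"session"):
    """Recompute the chain from the claimed chunks and compare."""
    return chunk_commitments(chunks, seed) == chain
```

Any tampering with a chunk breaks every subsequent digest, so the chain binds the whole claimed flow; real on-chain verification would additionally need to hide chunk contents, which is where zero-knowledge techniques come in.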

Updated: 2024-06-26 03:01:40

Fields: cs.CR,cs.DC,cs.NI

Download: http://arxiv.org/abs/2406.18032v1

GCondenser: Benchmarking Graph Condensation

Large-scale graphs are valuable for graph representation learning, yet the abundant data in these graphs hinders the efficiency of the training process. Graph condensation (GC) alleviates this issue by compressing the large graph into a significantly smaller one that still supports effective model training. Although recent research has introduced various approaches to improve the effectiveness of the condensed graph, comprehensive and practical evaluations across different GC methods are neglected. This paper proposes the first large-scale graph condensation benchmark, GCondenser, to holistically evaluate and compare mainstream GC methods. GCondenser includes a standardised GC paradigm, consisting of condensation, validation, and evaluation procedures, as well as enabling extensions to new GC methods and datasets. With GCondenser, a comprehensive performance study is conducted, presenting the effectiveness of existing methods. GCondenser is open-sourced and available at https://github.com/superallen13/GCondenser.

Updated: 2024-06-26 03:00:04

Categories: cs.LG

Download: http://arxiv.org/abs/2405.14246v2

Automated Clinical Data Extraction with Knowledge Conditioned LLMs

The extraction of lung lesion information from clinical and medical imaging reports is crucial for research on and clinical care of lung-related diseases. Large language models (LLMs) can be effective at interpreting unstructured text in reports, but they often hallucinate due to a lack of domain-specific knowledge, leading to reduced accuracy and posing challenges for use in clinical settings. To address this, we propose a novel framework that aligns generated internal knowledge with external knowledge through in-context learning (ICL). Our framework employs a retriever to identify relevant units of internal or external knowledge and a grader to evaluate the truthfulness and helpfulness of the retrieved internal-knowledge rules, to align and update the knowledge bases. Our knowledge-conditioned approach also improves the accuracy and reliability of LLM outputs by addressing the extraction task in two stages: (i) lung lesion finding detection and primary structured field parsing, followed by (ii) further parsing of lesion description text into additional structured fields. Experiments with expert-curated test datasets demonstrate that this ICL approach can increase the F1 score for key fields (lesion size, margin and solidity) by an average of 12.9% over existing ICL methods.

Updated: 2024-06-26 02:49:28

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18027v1

Reinforcement Learning from Delayed Observations via World Models

In standard reinforcement learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of learning algorithms. In this paper, we address observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 250%. Moreover, we evaluate our methods on visual delayed environments, for the first time showcasing delay-aware reinforcement learning continuous control with visual observations.
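
The reduction the abstract relies on, turning a constant-delay POMDP into a delayed MDP by pairing the stale observation with the actions taken since, can be sketched as a simple environment wrapper. This is a toy illustration of the reduction only, not the paper's world-model method; the `CounterEnv` interface and the `(observation, actions)` tuple state are assumptions for the example.

```python
from collections import deque

class DelayedObservationWrapper:
    """Reduces a constant-delay POMDP to an MDP: the agent receives the
    observation from `delay` steps ago together with the actions taken
    since, which restores the Markov property of the augmented state."""

    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.obs_buf = deque()
        self.act_buf = deque(maxlen=delay)

    def _augmented(self):
        # delayed observation + the actions taken since it was produced
        return (self.obs_buf[0], tuple(self.act_buf))

    def reset(self):
        obs = self.env.reset()
        self.obs_buf = deque([obs] * (self.delay + 1))
        self.act_buf = deque([None] * self.delay, maxlen=self.delay)
        return self._augmented()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.obs_buf.append(obs)   # newest true observation arrives...
        self.obs_buf.popleft()     # ...but the agent sees the stale one
        self.act_buf.append(action)
        return self._augmented(), reward, done

class CounterEnv:
    """Toy environment whose observation is simply the time step."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 0.0, False

env = DelayedObservationWrapper(CounterEnv(), delay=2)
state = env.reset()
for a in ("a", "b", "c"):
    state, _, _ = env.step(a)
# after three steps the agent sees the observation from two steps ago
```

A world model would additionally learn the dynamics needed to predict the current latent state from this augmented input.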

Updated: 2024-06-26 02:44:18

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2403.12309v2

AutoOPE: Automated Off-Policy Estimator Selection

The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by a different, logging policy. This problem is of utmost importance for various application domains, e.g., recommendation systems, medical treatments, and many others. To solve the OPE problem, we resort to estimators, which aim to estimate as accurately as possible the performance that the counterfactual policies would have had if they were deployed in place of the logging policy. In the literature, several estimators have been developed, all with different characteristics and theoretical guarantees. Therefore, there is no dominant estimator, and each estimator may be the best one for different OPE problems, depending on the characteristics of the dataset at hand. While the selection of the estimator is a crucial choice for an accurate OPE, this problem has been widely overlooked in the literature. We propose an automated data-driven OPE estimator selection method based on machine learning. In particular, the core idea we propose in this paper is to create several synthetic OPE tasks and use a machine learning model trained to predict the best estimator for those synthetic tasks. We empirically show how our method is able to generalize to unseen tasks and make a better estimator selection compared to a baseline method on several real-world datasets, with a computational cost significantly lower than that of the baseline.
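
The core idea, generating synthetic OPE tasks whose best estimator is known and training a meta-model on them, can be sketched with a toy two-action bandit and a nearest-neighbour stand-in for the learned model. Everything here (the task generator, the feature choice, and the IPS-vs-direct-method estimator pair) is an illustrative assumption, not the paper's actual setup.

```python
import math
import random

def run_ope_task(rng, n, shift, noise):
    """One synthetic two-action bandit OPE task. Returns (features, label),
    where the label records which estimator, importance sampling (IPS) or
    a direct method (DM), had the smaller error on this dataset. Such
    labels are the supervision the meta-model trains on."""
    p_log = [0.5 + shift / 2, 0.5 - shift / 2]   # logging policy
    p_tgt = [0.5 - shift / 2, 0.5 + shift / 2]   # counterfactual policy
    true_r = [0.2, 0.8]
    true_value = sum(p * r for p, r in zip(p_tgt, true_r))
    ips_terms, per_action = [], {0: [], 1: []}
    for _ in range(n):  # log data under the logging policy
        a = 0 if rng.random() < p_log[0] else 1
        r = true_r[a] + rng.gauss(0.0, noise)
        ips_terms.append(p_tgt[a] / p_log[a] * r)
        per_action[a].append(r)
    ips = sum(ips_terms) / n
    rhat = [sum(rs) / len(rs) if rs else 0.0 for rs in per_action.values()]
    dm = sum(p * r for p, r in zip(p_tgt, rhat))
    label = "ips" if abs(ips - true_value) <= abs(dm - true_value) else "dm"
    return (math.log(n), shift, noise), label

def select_estimator(train_tasks, features):
    """Nearest-neighbour stand-in for the trained meta-model: recommend
    the best estimator of the most similar synthetic task."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train_tasks, key=lambda t: dist(t[0], features))[1]

rng = random.Random(0)
train_tasks = [run_ope_task(rng, rng.randrange(20, 200),
                            rng.uniform(0.0, 0.8), rng.uniform(0.0, 0.5))
               for _ in range(200)]
# recommend an estimator for a new, unseen OPE task
choice = select_estimator(train_tasks, (math.log(50), 0.6, 0.1))
```

The point of the design is that the meta-model is trained entirely offline on cheap synthetic tasks, so selecting an estimator for a new dataset costs only one feature extraction and one prediction.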

Updated: 2024-06-26 02:34:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.18022v1

SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR). We design a streaming MoE layer consisting of three language experts, corresponding to Mandarin, English, and blank, respectively, and equip the encoder of SC-MoE with a language identification (LID) network, trained with a Connectionist Temporal Classification (CTC) loss, that serves as a router, yielding a real-time streaming CS ASR system. To further utilize the language information embedded in text, we also incorporate MoE layers into the decoder of SC-MoE. In addition, we introduce routers into every MoE layer of the encoder and the decoder, achieving better recognition performance. Experimental results show that SC-MoE significantly improves CS ASR performance over the baseline with comparable computational efficiency.
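
A minimal sketch of the routing mechanism described above: a router scores the experts, and the layer returns the routing-weighted sum of expert outputs. In SC-MoE the streaming encoder's router is a LID network (trained with a CTC loss) over three experts for Mandarin, English, and blank; the callable experts and fixed router below are illustrative stand-ins, not the paper's architecture.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class MoELayer:
    """Minimal mixture-of-experts layer: the router scores each expert
    for the current frame, and the output is the routing-weighted sum
    of the expert outputs."""

    def __init__(self, experts, router):
        self.experts = experts   # list of callables: frame -> vector
        self.router = router     # callable: frame -> expert scores

    def __call__(self, frame):
        weights = softmax(self.router(frame))
        outputs = [expert(frame) for expert in self.experts]
        dim = len(outputs[0])
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(dim)]
```

With a uniform router the layer averages the experts; a confident LID router concentrates the weight on the matching language expert.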

Updated: 2024-06-26 02:32:59

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2406.18021v1

Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

To generate samples with high rewards, we focus on optimizing stochastic differential equations (SDEs) parameterized by deep neural networks, advanced generative models with high expressiveness, using policy gradient, the leading algorithm in reinforcement learning. Nevertheless, when applying policy gradients to SDEs, since the policy gradient is estimated on a finite set of trajectories, it can be ill-defined, and the policy behavior in data-scarce regions may be uncontrolled. This challenge compromises the stability of policy gradients and negatively impacts sample complexity. To address these issues, we propose constraining the SDE to be consistent with its associated perturbation process. Since the perturbation process covers the entire space and is easy to sample, we can mitigate the aforementioned problems. Our framework offers a general approach allowing for a versatile selection of policy gradient methods to effectively and efficiently train SDEs. We evaluate our algorithm on the task of structure-based drug design and optimize the binding affinity of generated ligand molecules. Our method achieves the best Vina score of -9.07 on the CrossDocked2020 dataset.

Updated: 2024-06-26 02:28:07

Categories: cs.LG

Download: http://arxiv.org/abs/2403.04154v2

MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus, exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most existing methods for combining molecular multi-modalities use only molecular-level information, making it hard to encode intra-molecular alignment information between different modalities. To address this issue, we propose MolFusion, a multi-granularity fusion method. MolFusion consists of two key components: (1) MolSim, a molecular-level encoding component that achieves molecular-level alignment between different molecular representations, and (2) AtomAlign, an atomic-level encoding component that achieves atomic-level alignment between different molecular representations. Experimental results show that MolFusion effectively utilizes complementary multimodal information, leading to significant improvements in performance across various classification and regression tasks.

Updated: 2024-06-26 02:26:50

Categories: cs.LG,cs.AI,physics.chem-ph

Download: http://arxiv.org/abs/2406.18020v1

Key-Element-Informed sLLM Tuning for Document Summarization

Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs), which are easily accessible and inexpensive, have been extensively studied, yet they often suffer from missing key information and entities, i.e., low relevance, in particular when input documents are long. We hence propose a key-element-informed instruction tuning for summarization, called KEITSum, which identifies key elements in documents and instructs the sLLM to generate summaries capturing these key elements. Experimental results on dialogue and news datasets demonstrate that an sLLM with KEITSum indeed provides high-quality summarization with higher relevance and fewer hallucinations, competitive with proprietary LLMs.

Updated: 2024-06-26 02:22:11

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.04625v2

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: \url{https://chonghan-chen.com/llm-jailbreak-zoo-survey/}.

Updated: 2024-06-26 02:20:23

Categories: cs.CL,cs.CR,cs.CV,cs.LG

Download: http://arxiv.org/abs/2407.01599v1

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding, precise and limitation-aware artifact documentation, efficient model training, advance awareness of the environmental impact from training, careful model evaluation of capabilities, risks, and claims, as well as responsible model release, licensing and deployment practices. We hope this curated collection of resources helps guide more responsible development. The process of curating this list enabled us to review the AI development ecosystem, revealing what tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed so that capabilities and impact are assessed in context.

Updated: 2024-06-26 02:19:01

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2406.16746v2

MgNO: Efficient Parameterization of Linear Operators via Multigrid

In this work, we propose a concise neural operator architecture for operator learning. Drawing an analogy with a conventional fully connected neural network, we define the neural operator as follows: the output of the $i$-th neuron in a nonlinear operator layer is defined by $O_i(u) = \sigma\left( \sum_j W_{ij} u + B_{ij}\right)$. Here, $ W_{ij}$ denotes the bounded linear operator connecting $j$-th input neuron to $i$-th output neuron, and the bias $ B_{ij}$ takes the form of a function rather than a scalar. Given its new universal approximation property, the efficient parameterization of the bounded linear operators between two neurons (Banach spaces) plays a critical role. As a result, we introduce MgNO, utilizing multigrid structures to parameterize these linear operators between neurons. This approach offers both mathematical rigor and practical expressivity. Additionally, MgNO obviates the need for conventional lifting and projecting operators typically required in previous neural operators. Moreover, it seamlessly accommodates diverse boundary conditions. Our empirical observations reveal that MgNO exhibits superior ease of training compared to other CNN-based models, while also displaying a reduced susceptibility to overfitting when contrasted with spectral-type neural operators. We demonstrate the efficiency and accuracy of our method with consistently state-of-the-art performance on different types of partial differential equations (PDEs).
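
The neuron definition $O_i(u) = \sigma\left( \sum_j W_{ij} u + B_{ij}\right)$ can be made concrete on a discretized grid. In the sketch below, dense matrices stand in for the bounded linear operators that MgNO would instead parameterize with multigrid cycles, so this is an illustrative reduction of the layer structure, not the paper's parameterization.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]

def apply_linear_operator(W, u):
    """Dense-matrix stand-in for the bounded linear operator W_ij; in
    MgNO this operator would be parameterized by a multigrid cycle."""
    return [sum(W[r][c] * u[c] for c in range(len(u)))
            for r in range(len(W))]

def operator_layer(inputs, W, B, sigma=relu):
    """O_i(u) = sigma(sum_j (W_ij u_j + B_ij)): W[i][j] maps the j-th
    input channel (a function sampled on a grid) to the i-th output
    channel, and each bias B[i][j] is itself a function on the grid,
    not a scalar."""
    outputs = []
    for i in range(len(W)):
        acc = [0.0] * len(inputs[0])
        for j, u in enumerate(inputs):
            Wu = apply_linear_operator(W[i][j], u)
            acc = [a + wu + b for a, wu, b in zip(acc, Wu, B[i][j])]
        outputs.append(sigma(acc))
    return outputs
```

Because inputs, outputs, and biases all live on the grid as functions, no lifting or projection operators are needed around the layer, matching the design point made in the abstract.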

Updated: 2024-06-26 02:00:14

Categories: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2310.19809v3

Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.

Updated: 2024-06-26 01:54:51

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2406.16968v2

View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis

The inspection and monitoring of infrastructure assets typically requires identifying visual anomalies in scenes periodically photographed over time. Images collected manually or with robots such as unmanned aerial vehicles from the same scene at different instances in time are typically not perfectly aligned. Supervised segmentation methods can be applied to identify known problems, but unsupervised anomaly detection approaches are required when unknown anomalies occur. Current unsupervised pixel-level anomaly detection methods have mainly been developed for industrial settings where the camera position is known and constant. However, we find that these methods fail to generalize to the case when images are not perfectly aligned. We term the problem of unsupervised anomaly detection between two such imperfectly aligned sets of images as Scene Anomaly Detection (Scene AD). We present a novel network termed OmniAD to address the Scene AD problem posed. Specifically, we refine the anomaly detection method reverse distillation to achieve a 40% increase in pixel-level anomaly detection performance. The network's performance is further demonstrated to improve with two new data augmentation strategies proposed that leverage novel view synthesis and camera localization to improve generalization. We validate our approach with qualitative and quantitative results on a new dataset, ToyCity, the first Scene AD dataset with multiple objects, as well as on the established single object-centric dataset, MAD. https://drags99.github.io/OmniAD/

Updated: 2024-06-26 01:54:10

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2406.18012v1

Iterative Reasoning Preference Optimization

Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. We train using a modified DPO loss (Rafailov et al., 2023) with an additional negative log-likelihood term, which we find to be crucial. We show reasoning improves across repeated iterations of this scheme. While only relying on examples in the training set, our approach results in increasing accuracy on GSM8K, MATH, and ARC-Challenge for Llama-2-70B-Chat, outperforming other Llama-2-based models not relying on additionally sourced datasets. For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
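
The per-pair training objective, a DPO loss on winning vs. losing chain-of-thought candidates plus the additional negative log-likelihood term, can be sketched as follows. The length normalisation and the `alpha`/`beta` weights are common-practice assumptions for illustration, not necessarily the paper's exact choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_nll_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, len_w,
                 beta=0.1, alpha=1.0):
    """DPO loss on a (winning, losing) chain-of-thought pair, plus the
    extra negative log-likelihood term on the winning sequence that the
    paper finds crucial. The log-probabilities are token-level sums
    under the policy and the frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo_term = -math.log(sigmoid(margin))
    nll_term = -logp_w / len_w     # length-normalised NLL on the winner
    return dpo_term + alpha * nll_term
```

Each iteration of the scheme generates new CoT candidates with the current model, labels them as winners or losers by answer correctness, and minimises this loss over the resulting pairs.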

Updated: 2024-06-26 01:28:35

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.19733v3

Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks

A reliable knowledge structure is a prerequisite for building effective adaptive learning systems and intelligent tutoring systems. Pursuing an explainable and trustworthy knowledge structure, we propose a method for constructing causal knowledge networks. This approach leverages Bayesian networks as a foundation and incorporates causal relationship analysis to derive a causal network. Additionally, we introduce a dependable knowledge-learning path recommendation technique built upon this framework, improving teaching and learning quality while maintaining transparency in the decision-making process.

Updated: 2024-06-26 01:25:44

Categories: cs.AI,cs.SI

Download: http://arxiv.org/abs/2406.17518v2

Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher

How can sLLMs efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM supervisions one can use, giving birth to many decoding algorithms that utilize supervision without further training. However, it is still unclear what is an effective strategy under the limited supervision scenario, where we assume that no more than a few tokens can be generated by LLMs. To this end, we develop an algorithm to effectively aggregate the sLLM and LLM predictions on initial tokens so that the generated tokens can more accurately condition the subsequent token generation by sLLM only. Critically, we find that it is essential to adaptively overtrust or disregard the LLM prediction based on the confidence of the sLLM. Through our experiments on a wide range of models and datasets, we demonstrate that our method provides a consistent improvement over conventional decoding strategies.
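
The adaptive trust rule, overtrust or disregard the LLM depending on the sLLM's confidence, can be caricatured with a hard confidence threshold. The paper's actual aggregation is a learned/soft combination of the two distributions; the threshold value and the dictionary-of-probabilities interface below are simplifying assumptions for illustration.

```python
def aggregate_next_token(sllm_probs, llm_probs, threshold=0.5):
    """If the sLLM is confident about the next token, trust it and
    disregard the teacher; otherwise overtrust the LLM's prediction.
    The point is that trust in the teacher adapts to the student's
    confidence rather than being fixed."""
    confidence = max(sllm_probs.values())
    source = sllm_probs if confidence >= threshold else llm_probs
    return max(source, key=source.get)
```

Under the limited-supervision budget, this rule is only applied to the first few tokens; the sLLM then conditions on those tokens and decodes the rest alone.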

Updated: 2024-06-26 01:16:12

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.18002v1

Vanilla Bayesian Optimization Performs Great in High Dimensions

High-dimensional problems have long been considered the Achilles' heel of Bayesian optimization algorithms. Spurred by the curse of dimensionality, a large collection of algorithms aim to make it more performant in this setting, commonly by imposing various simplifying assumptions on the objective. In this paper, we identify the degeneracies that make vanilla Bayesian optimization poorly suited to high-dimensional tasks, and further show how existing algorithms address these degeneracies through the lens of lowering the model complexity. Moreover, we propose an enhancement to the prior assumptions that are typical to vanilla Bayesian optimization algorithms, which reduces the complexity to manageable levels without imposing structural restrictions on the objective. Our modification - a simple scaling of the Gaussian process lengthscale prior with the dimensionality - reveals that standard Bayesian optimization works drastically better than previously thought in high dimensions, clearly outperforming existing state-of-the-art algorithms on multiple commonly considered real-world high-dimensional tasks.
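
The proposed modification amounts to shifting the log-mean of a log-normal lengthscale prior by $\log(d)/2$, so the typical lengthscale grows like $\sqrt{d}$ and the model's complexity stays manageable as the dimension increases. A sketch, where the base hyperparameters `mu0` and `sigma0` are placeholders rather than the paper's exact values:

```python
import math

def lengthscale_prior_params(dim, mu0=1.0, sigma0=1.0):
    """Log-normal prior LN(mu, sigma0^2) over a GP lengthscale, with
    the log-mean shifted by 0.5*log(dim) so the typical lengthscale
    scales as sqrt(dim)."""
    mu = math.log(mu0) + 0.5 * math.log(dim)
    return mu, sigma0

def median_lengthscale(dim, mu0=1.0, sigma0=1.0):
    """Median lengthscale implied by the dimension-scaled prior."""
    mu, _ = lengthscale_prior_params(dim, mu0, sigma0)
    return math.exp(mu)
```

Nothing else about the vanilla Bayesian optimization loop changes; only the prior over the Gaussian process lengthscales is rescaled with the dimensionality.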

Updated: 2024-06-26 01:10:56

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2402.02229v4

Pointwise convergence of Fourier series and deep neural network for the indicator function of d-dimensional ball

In this paper, we clarify a crucial difference between deep neural networks and Fourier series. For the multiple Fourier series of the periodization of certain radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the behavior of the spherical partial sums and discovered a third phenomenon in addition to the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular, this third phenomenon prevents pointwise convergence. In contrast, we give a specific deep neural network and prove pointwise convergence.

Updated: 2024-06-26 01:00:53

Categories: cs.LG,cs.IT,math.AP,math.IT

Download: http://arxiv.org/abs/2304.08172v5

Learning in RKHM: a $C^*$-Algebraic Twist for Kernel Machines

Supervised learning in reproducing kernel Hilbert space (RKHS) and vector-valued RKHS (vvRKHS) has been investigated for more than 30 years. In this paper, we provide a new twist to this rich literature by generalizing supervised learning in RKHS and vvRKHS to reproducing kernel Hilbert $C^*$-module (RKHM), and show how to construct effective positive-definite kernels by considering the perspective of $C^*$-algebra. Unlike the cases of RKHS and vvRKHS, we can use $C^*$-algebras to enlarge representation spaces. This enables us to construct RKHMs whose representation power goes beyond RKHSs, vvRKHSs, and existing methods such as convolutional neural networks. Our framework is suitable, for example, for effectively analyzing image data by allowing the interaction of Fourier components.

Updated: 2024-06-26 00:29:54

Categories: stat.ML,cs.LG,math.OA

Download: http://arxiv.org/abs/2210.11855v3

Catching Chameleons: Detecting Evolving Disinformation Generated using Large Language Models

Despite recent advancements in detecting disinformation generated by large language models (LLMs), current efforts overlook the ever-evolving nature of this disinformation. In this work, we investigate a challenging yet practical research problem of detecting evolving LLM-generated disinformation. Disinformation evolves constantly through the rapid development of LLMs and their variants. As a consequence, the detection model faces significant challenges. First, it is inefficient to train separate models for each disinformation generator. Second, the performance decreases in scenarios when evolving LLM-generated disinformation is encountered in sequential order. To address this problem, we propose DELD (Detecting Evolving LLM-generated Disinformation), a parameter-efficient approach that jointly leverages the general fact-checking capabilities of pre-trained language models (PLM) and the independent disinformation generation characteristics of various LLMs. In particular, the learned characteristics are concatenated sequentially to facilitate knowledge accumulation and transformation. DELD addresses the issue of label scarcity by integrating the semantic embeddings of disinformation with trainable soft prompts to elicit model-specific knowledge. Our experiments show that DELD significantly outperforms state-of-the-art methods. Moreover, our method provides critical insights into the unique patterns of disinformation generation across different LLMs, offering valuable perspectives in this line of research.

Updated: 2024-06-26 00:21:39


Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17992v1

Unleashing the Expressive Power of Pulse-Based Quantum Neural Networks

Quantum machine learning (QML) based on Noisy Intermediate-Scale Quantum (NISQ) devices hinges on the optimal utilization of limited quantum resources. While gate-based QML models are user-friendly for software engineers, their expressivity is restricted by the permissible circuit depth within a finite coherence time. In contrast, pulse-based models enable the construction of "infinitely" deep quantum neural networks within the same time, which may unleash greater expressive power for complex learning tasks. In this paper, this potential is investigated from the perspective of quantum control theory. We first show that the nonlinearity of pulse-based models comes from the encoding process, which can be viewed as the continuous limit of data re-uploading in gate-based models. Subsequently, we prove that the pulse-based model can approximate arbitrary nonlinear functions when the underlying physical system is ensemble controllable. Under this condition, numerical simulations demonstrate the enhanced expressivity obtained by increasing either the pulse length or the number of qubits. As anticipated, we show through numerical examples that the pulse-based model can unleash more expressive power than the gate-based model. These findings lay a theoretical foundation for understanding and designing expressive QML models using NISQ devices.
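
The "continuous limit of data re-uploading" view can be sketched with a single qubit. The drift/control split below (a data term x·σz plus a trainable control pulse on σx) is an illustrative assumption, not the paper's exact model:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def step(H, dt):
    """exp(-i * dt * H) for a Hermitian H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * dt * w)) @ V.conj().T

def pulse_model(x, controls, dt=0.05):
    """Continuously encode the datum x: every small time slice evolves under
    a data-dependent drift x*sz plus a trainable control u(t)*sx. This is
    the continuous limit of interleaving fixed data-encoding gates with
    trainable gates (data re-uploading) in a gate-based circuit."""
    U = np.eye(2, dtype=complex)
    for u in controls:
        U = step(x * sz + u * sx, dt) @ U
    return U

controls = np.linspace(-1.0, 1.0, 40)   # a discretized control pulse
U = pulse_model(x=0.7, controls=controls)
# The evolution stays unitary regardless of the pulse shape.
assert np.allclose(U.conj().T @ U, np.eye(2), atol=1e-10)
```

Because the data enter the generator at every instant, deepening the "circuit" only requires a longer pulse, not more gates.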

Updated: 2024-06-26 00:20:37


Fields: quant-ph,cs.ET,cs.LG

Download: http://arxiv.org/abs/2402.02880v2

Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models

Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as sampling and diverse beam search are proven effective solutions but often yield smaller diversity. We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities, substantially increasing diversity in QA generation. Our work emphasizes the need for explicit diversity conditions when generating diverse question-answer synthetic data by showing significant improvements in downstream QA tasks over existing widely adopted implicit diversity techniques. In particular, QA pairs generated under explicit diversity conditions, when used to train the downstream QA model, yield an average 4.1% exact match (EM) and 4.5% F1 improvement over QAG from implicit sampling techniques on SQuADDU. The case for explicit diversity conditions is even stronger on low-resource datasets (SubjQA), where average downstream QA performance improvements are around 12% EM.
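
A minimal sketch of explicit diversity conditioning: instead of sampling many prompts and hoping for implicit diversity, enumerate one generation prompt per (position, question type, entity) condition. The condition values and prompt wording below are hypothetical:

```python
from itertools import product

# Hypothetical condition sets; the paper conditions on spatial aspects
# (where in the passage the answer lies), question types, and entities.
positions = ["beginning", "middle", "end"]
question_types = ["what", "why", "how"]
entity_types = ["PERSON", "DATE"]

def qag_prompts(passage):
    """One LLM generation prompt per explicit diversity condition."""
    prompts = []
    for pos, qtype, ent in product(positions, question_types, entity_types):
        prompts.append(
            f"Generate a '{qtype}' question about a {ent} entity whose "
            f"answer appears in the {pos} of the passage:\n{passage}"
        )
    return prompts

prompts = qag_prompts("Marie Curie won the Nobel Prize in 1903.")
assert len(prompts) == 3 * 3 * 2   # one prompt per condition combination
```

Coverage of the condition grid, rather than sampling temperature, is what guarantees the diversity of the resulting QA pairs.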

Updated: 2024-06-26 00:12:08


Fields: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2406.17990v1

Learning Neural Networks with Sparse Activations

A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights that can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks will lead to methods that can exploit activation sparsity in practice.
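
Dynamic activation sparsity in such an MLP block can be measured directly. The weights and negative bias below are illustrative, chosen so that most ReLU units stay inactive on any given input:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_block(x, W1, b1, W2, b2):
    """Two-layer MLP block: ReLU hidden activations, then a projection.
    Returns the output and the fraction of hidden units that did NOT fire,
    i.e. the dynamic activation sparsity for this particular input."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden activations
    sparsity = np.mean(h == 0.0)       # fraction of exact zeros
    return h @ W2 + b2, sparsity

d, hidden = 8, 512
W1 = rng.normal(size=(d, hidden))
b1 = -2.0 * np.ones(hidden)            # negative bias => most units inactive
W2 = rng.normal(size=(hidden, d))
b2 = np.zeros(d)

_, sparsity = mlp_block(rng.normal(size=(1, d)), W1, b1, W2, b2)
# Most hidden units are zero on this input, yet no unit can be deleted:
# a different input would activate a different subset.
assert sparsity > 0.5
```

The comment at the end captures why this sparsity is "dynamic": the zero set changes per input, so no static pruning recovers the savings.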

Updated: 2024-06-26 00:11:13


Fields: cs.LG,stat.ML

Download: http://arxiv.org/abs/2406.17989v1

Multi-step Knowledge Retrieval and Inference over Unstructured Data

The advent of Large Language Models (LLMs) and Generative AI has revolutionized natural language applications across various domains. However, high-stakes decision-making tasks in fields such as medical, legal and finance require a level of precision, comprehensiveness, and logical consistency that pure LLM or Retrieval-Augmented-Generation (RAG) approaches often fail to deliver. At Elemental Cognition (EC), we have developed a neuro-symbolic AI platform to tackle these problems. The platform integrates fine-tuned LLMs for knowledge extraction and alignment with a robust symbolic reasoning engine for logical inference, planning and interactive constraint solving. We describe Cora, a Collaborative Research Assistant built on this platform, that is designed to perform complex research and discovery tasks in high-stakes domains. This paper discusses the multi-step inference challenges inherent in such domains, critiques the limitations of existing LLM-based methods, and demonstrates how Cora's neuro-symbolic approach effectively addresses these issues. We provide an overview of the system architecture, key algorithms for knowledge extraction and formal reasoning, and present preliminary evaluation results that highlight Cora's superior performance compared to well-known LLM and RAG baselines.

Updated: 2024-06-26 00:00:45


Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.17987v1

By Xinhai (Sean) Zou.