    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/

Articles: 34

Last Updated: 2024-05-27 23:58:00 (+00:00)

The Over-Certainty Phenomenon in Modern UDA Algorithms

When neural networks are confronted with unfamiliar data that deviate from their training set, this signifies a domain shift. While these networks output predictions on their inputs, they typically fail to account for their level of familiarity with these novel observations. This challenge becomes even more pronounced in resource-constrained settings, such as embedded systems or edge devices. To address such challenges, we aim to recalibrate a neural network's decision boundaries in relation to its cognizance of the data it observes, introducing an approach we coin certainty distillation. While prevailing works navigate unsupervised domain adaptation (UDA) with the goal of curtailing model entropy, they unintentionally birth models that grapple with calibration inaccuracies, a dilemma we term the over-certainty phenomenon. In this paper, we probe the drawbacks of this traditional learning model. As a solution to the issue, we propose a UDA algorithm that not only augments accuracy but also assures model calibration, all while maintaining suitability for environments with limited computational resources.
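
For reference, miscalibration of the kind described here is commonly quantified by the expected calibration error (ECE). A minimal sketch of that metric (the binning scheme and names are generic, not the paper's code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: occupancy-weighted average of |accuracy - confidence| per bin.
    `confidences` are max softmax probabilities, `correct` is a 0/1 array.
    An over-certain model shows confidence well above accuracy on shifted data."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece
```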

Updated: 2024-05-27 23:58:00

标题: 现代UDA算法中的过度确定性现象

摘要: 当神经网络遇到与其训练集有所偏离的陌生数据时,这表明存在领域转移。虽然这些网络在其输入上输出预测,但它们通常无法考虑对这些新观察的熟悉程度。在资源受限的环境中,如嵌入式系统或边缘设备中,这一挑战变得更加突出。为了解决这些挑战,我们旨在重新校准神经网络的决策边界,与其观察到的数据的认知相对应,引入一种我们称之为确定性提炼的方法。虽然现有作品通过无监督领域自适应(UDA)来削减模型熵,但它们无意中产生了模型面临校准不准确性的困境 - 我们称之为过度确定性现象。在本文中,我们探讨了这种传统学习模型的缺点。作为解决这一问题的方法,我们提出了一种UDA算法,不仅增加了准确性,而且确保了模型的校准,同时还保持了适用于计算资源有限的环境。

更新时间: 2024-05-27 23:58:00

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2404.16168v2

Multiple-policy Evaluation via Density Estimation

We study the multiple-policy evaluation problem where we are given a set of $K$ policies and the goal is to evaluate their performance (expected total reward over a fixed horizon) to an accuracy $\epsilon$ with probability at least $1-\delta$. We propose an algorithm named $\mathrm{CAESAR}$ for this problem. Our approach is based on computing an approximate optimal offline sampling distribution and using the data sampled from it to perform the simultaneous estimation of the policy values. $\mathrm{CAESAR}$ has two phases. In the first, we produce coarse estimates of the visitation distributions of the target policies at a low-order sample complexity rate that scales with $\tilde{O}(\frac{1}{\epsilon})$. In the second phase, we approximate the optimal offline sampling distribution and compute the importance weighting ratios for all target policies by minimizing a step-wise quadratic loss function inspired by the DualDICE \cite{nachum2019dualdice} objective. Up to low-order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity of $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{\pi^k}(s,a))^2}{\mu^*_h(s,a)}\right)$, where $d^{\pi}$ is the visitation distribution of policy $\pi$, $\mu^*$ is the optimal sampling distribution, and $H$ is the horizon.
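
For intuition, the estimation step this complexity bound refers to is a standard importance-weighted value estimate from data drawn from the sampling distribution $\mu$. A minimal sketch of that step (not CAESAR's DualDICE-style ratio fitting; the data structures here are assumptions):

```python
import numpy as np

def is_value_estimate(samples_by_step, d_pi, mu):
    """V(pi) ~= sum_h mean over (s, a, r) ~ mu[h] of (d_pi[h]/mu[h]) * r.
    samples_by_step[h]: list of (s, a, r) tuples with (s, a) drawn from mu[h];
    d_pi[h], mu[h]: dicts mapping (s, a) to visitation / sampling probabilities."""
    value = 0.0
    for h, batch in enumerate(samples_by_step):
        ratios = [d_pi[h][(s, a)] / mu[h][(s, a)] * r for s, a, r in batch]
        value += np.mean(ratios)  # per-step importance-weighted average
    return value
```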

Updated: 2024-05-27 23:57:02

标题: 通过密度估计进行多策略评估

摘要: 我们研究了多策略评估问题,其中给定一组$K$个策略,目标是以至少$1-\delta$的概率准确评估它们的性能(在固定时间段内的预期总奖励)至精度$\epsilon$。我们提出了一个名为$\mathrm{CAESAR}$的算法来解决这个问题。我们的方法基于计算一个近似的最优离线抽样分布,并使用从中抽样的数据来进行策略值的同时估计。$\mathrm{CAESAR}$有两个阶段。在第一个阶段,我们以与$\tilde{O}(\frac{1}{\epsilon})$成比例的低阶样本复杂度率生成目标策略的访问分布的粗略估计。在第二阶段,我们近似计算最优离线抽样分布,并通过最小化受DualDICE \cite{nachum2019dualdice}目标启发的逐步二次损失函数来计算所有目标策略的重要性加权比。在低阶和对数项上,$\mathrm{CAESAR}$实现了样本复杂度$\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{\pi^k}(s,a))^2}{\mu^*_h(s,a)}\right)$,其中$d^{\pi}$是策略$\pi$的访问分布,$\mu^*$是最优抽样分布,$H$是时间段。

更新时间: 2024-05-27 23:57:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.00195v2

OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been proposed in the last decade, many of which have hyperparameters and require training. Unfortunately, it remains unclear how to choose the best OPE algorithm for each task and domain. In this paper, we propose a new algorithm that adaptively blends a set of OPE estimators for a given dataset using a statistical procedure, without relying on explicit estimator selection. We prove that our estimator is consistent and satisfies several desirable properties for policy evaluation. Additionally, we demonstrate that when compared to alternative approaches, our estimator can be used to select higher-performing policies in healthcare and robotics. Our work contributes to improving ease of use for a general-purpose, estimator-agnostic, off-policy evaluation framework for offline RL.
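
As a toy illustration of the "re-weighted aggregate" shape of the output, here is a stand-in that weights estimators by inverse bootstrap variance; OPERA's actual statistical procedure differs and is not reproduced here:

```python
import numpy as np

def blended_estimate(estimators, dataset, n_boot=200, seed=0):
    """Convex combination of OPE estimates, weighted by inverse bootstrap
    variance (a generic stand-in, not OPERA's procedure). `dataset` is a
    numpy array of transitions; each estimator maps a dataset to a scalar."""
    rng = np.random.default_rng(seed)
    n = len(dataset)
    boots = np.array([[est(dataset[rng.integers(n, size=n)]) for est in estimators]
                      for _ in range(n_boot)])
    weights = 1.0 / (boots.var(axis=0) + 1e-12)
    weights /= weights.sum()
    return float(weights @ np.array([est(dataset) for est in estimators]))
```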

Updated: 2024-05-27 23:51:20

标题: OPERA:使用多个估计器的重新加权聚合进行自动离线策略评估

摘要: 离线策略评估(OPE)允许我们利用从其他策略收集的历史交互数据来评估和估计一个新的序贯决策策略的性能。在没有对其性能的可靠估计的情况下在线评估新策略,可能导致昂贵、不安全或危险的后果,尤其是在教育和医疗领域。过去十年中已经提出了多种OPE估计器,其中许多具有超参数并需要训练。不幸的是,如何为每个任务和领域选择最佳的OPE算法仍不清楚。在本文中,我们提出了一种新算法,它利用统计程序在给定数据集上自适应地混合一组OPE估计器,而无需显式选择。我们证明了我们的估计器是一致的,并满足策略评估的若干理想性质。此外,我们证明与替代方法相比,我们的估计器可以用于在医疗保健和机器人领域选择性能更高的策略。我们的工作有助于提高通用、估计器无关的离线RL离线策略评估框架的易用性。

更新时间: 2024-05-27 23:51:20

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.17708v1

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually consumes large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available on https://3d-diffusion-policy.github.io .

Updated: 2024-05-27 23:50:22

标题: 3D扩散策略:通过简单的3D表示实现可泛化的视觉动作策略学习

摘要: 模仿学习为教授机器人灵巧技能提供了一种有效的方法;然而,学习复杂的技能通常需要大量的人类演示,以确保稳健性和泛化性。为了解决这一具有挑战性的问题,我们提出了3D扩散策略(DP3),这是一种新颖的视觉模仿学习方法,将3D视觉表示的力量整合到扩散策略中,这是一类条件动作生成模型。DP3的核心设计是利用从稀疏点云中提取的紧凑3D视觉表示,使用高效的点编码器。在我们的72个模拟任务实验中,DP3仅仅通过10次演示就成功处理了大多数任务,并且相对于基准线实现了24.2%的相对改善。在4个真实机器人任务中,DP3表现出精确控制的能力,成功率高达85%,仅仅通过每个任务40次演示,并且在空间、视角、外观和实例等各个方面展现出优秀的泛化能力。有趣的是,在真实机器人实验中,DP3很少违反安全要求,而基准方法经常需要人类干预。我们的广泛评估突显了在现实世界机器人学习中3D表示的关键重要性。视频、代码和数据可在https://3d-diffusion-policy.github.io 上获取。

更新时间: 2024-05-27 23:50:22

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.03954v5

Video Enriched Retrieval Augmented Generation Using Aligned Video Captions

In this work, we propose the use of "aligned visual captions" as a mechanism for integrating information contained within videos into retrieval augmented generation (RAG) based chat assistant systems. These captions can describe the visual and audio content of videos in a large corpus and have the advantage of being in a textual format that is easy to reason about and to incorporate into large language model (LLM) prompts. They also typically require less multimedia content to be inserted into the multimodal LLM context window than typical configurations, which can aggressively fill up the context window by sampling video frames from the source video. Furthermore, visual captions can be adapted to specific use cases by prompting the original foundational model / captioner for particular visual details or by fine-tuning. In hopes of advancing progress in this area, we curate a dataset and describe automatic evaluation procedures on common RAG tasks.

Updated: 2024-05-27 23:39:17

标题: 使用对齐的视频字幕进行视频丰富检索增强生成

摘要: 在这项工作中,我们提出使用“对齐的视觉字幕”作为一种机制,将视频中包含的信息整合到检索增强生成(RAG)为基础的聊天助手系统中。这些字幕能够描述大型语料库中视频的视觉和音频内容,同时具有以文本格式呈现的优势,这种格式既便于推理和整合到大型语言模型(LLM)提示中,又通常需要插入更少的多媒体内容到多模态LLM上下文窗口中,典型配置可能通过从源视频中采样视频帧来积极填充上下文窗口。此外,视觉字幕可以通过提示原始基础模型/字幕生成器获取特定的视觉细节或微调,以适应特定的用例。为了帮助推进这一领域的进展,我们整理了一个数据集,并描述了常见RAG任务的自动评估程序。

更新时间: 2024-05-27 23:39:17

领域: cs.AI,cs.CV,cs.IR

下载: http://arxiv.org/abs/2405.17706v1

Stochastic optimization on matrices and a graphon McKean-Vlasov limit

We consider stochastic gradient descents, on the space of large symmetric matrices, of suitable functions that are invariant under permuting the rows and columns by the same permutation. We establish deterministic limits of these random curves as the dimensions of the matrices go to infinity while the entries remain bounded. Under a "small noise" assumption the limit is shown to be the gradient flow of functions on graphons whose existence was established in \cite{oh2021gradient}. We also consider limits of stochastic gradient descents with added properly scaled reflected Brownian noise. The limiting curve of graphons is characterized by a family of stochastic differential equations with reflections and can be thought of as an extension of the classical McKean-Vlasov limit for interacting diffusions to the graphon setting. The proofs introduce a family of infinite-dimensional exchangeable arrays of reflected diffusions and a novel notion of propagation of chaos for large matrices of diffusions converging to such arrays in a suitable sense.

Updated: 2024-05-27 23:34:26

标题: 矩阵上的随机优化与graphon McKean-Vlasov极限

摘要: 我们考虑在大对称矩阵空间上对适当函数进行随机梯度下降,这些函数在用同一置换同时置换行和列时保持不变。我们建立了当矩阵维度趋于无穷而元素保持有界时这些随机曲线的确定性极限。在"小噪声"假设下,极限被证明是graphon(图极限)上函数的梯度流,其存在性已在\cite{oh2021gradient}中建立。我们还考虑了添加适当缩放的反射布朗噪声的随机梯度下降的极限。graphon的极限曲线由一族带反射的随机微分方程刻画,可以被看作是将相互作用扩散的经典McKean-Vlasov极限推广到graphon设置。证明引入了一族无限维可交换的反射扩散阵列,以及一种新的混沌传播概念,用于描述大的扩散矩阵在适当意义下收敛到这类阵列。

更新时间: 2024-05-27 23:34:26

领域: math.PR,cs.LG,stat.ML,05C60, 05C63, 05C80, 68R10, 60K35, 60G09

下载: http://arxiv.org/abs/2210.00422v3

TimeGPT-1

In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.

Updated: 2024-05-27 23:34:15

标题: TimeGPT-1

摘要: 在这篇论文中,我们介绍了TimeGPT,这是第一个时间序列基础模型,能够为训练期间未见过的多样化数据集生成准确的预测。我们将预训练模型与已有的统计学、机器学习和深度学习方法进行了对比评估,结果表明TimeGPT的零样本推理在性能、效率和简单性方面表现出色。我们的研究提供了有力证据,表明人工智能其他领域的见解可以有效应用于时间序列分析。我们得出结论,大规模时间序列模型通过利用当代深度学习进展的能力,为普及精确预测的获取并减少不确定性提供了激动人心的机会。

更新时间: 2024-05-27 23:34:15

领域: cs.LG,stat.AP

下载: http://arxiv.org/abs/2310.03589v3

Mechanistic Interpretability of Binary and Ternary Transformers

Recent research (arXiv:2310.11453, arXiv:2402.17764) has proposed binary and ternary transformer networks as a way to significantly reduce memory and improve inference speed in Large Language Models (LLMs) while maintaining accuracy. In this work, we apply techniques from mechanistic interpretability to investigate whether such networks learn distinctly different or similar algorithms when compared to full-precision transformer networks. In particular, we reverse engineer the algorithms learned for the toy problem of modular addition where we find that binary and ternary networks learn similar algorithms as full precision networks. This provides evidence against the possibility of using binary and ternary networks as a more interpretable alternative in the LLM setting.
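
For concreteness, a sketch of the two ingredients named here: a common per-tensor ternarization scheme (the cited papers' schemes may differ in detail) and the modular-addition toy dataset:

```python
import torch

def ternarize(w, threshold=0.05):
    """Quantize weights to {-alpha, 0, +alpha}: zero out small entries, keep
    signs elsewhere, and use the mean surviving magnitude as the scale alpha."""
    keep = w.abs() > threshold
    alpha = w[keep].abs().mean() if keep.any() else w.new_zeros(())
    return torch.sign(w) * keep * alpha

# Toy problem: learn (a + b) mod p from all input pairs.
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # (p*p, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % p
```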

Updated: 2024-05-27 23:22:23

标题: 二值和三值Transformer的机制可解释性

摘要: 最近的研究(arXiv:2310.11453, arXiv:2402.17764)提出了二值和三值Transformer网络,作为在保持准确性的同时显著减少内存并提高大型语言模型(LLMs)推理速度的一种方法。在这项工作中,我们应用机制可解释性技术,研究这些网络与全精度Transformer网络相比学习到的算法是截然不同还是相似。具体来说,我们对模加法这一玩具问题学习到的算法进行了逆向工程,发现二值和三值网络学习到与全精度网络相似的算法。这为在LLM场景中将二值和三值网络用作更可解释的替代方案的可能性提供了反面证据。

更新时间: 2024-05-27 23:22:23

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.17703v1

A Dynamical Model of Neural Scaling Laws

On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization. This reproduces many observations about neural scaling laws. First, our model makes a prediction about why the scaling of performance with training time and with model size have different power law exponents. Consequently, the theory predicts an asymmetric compute-optimal scaling rule where the number of training steps are increased faster than model parameters, consistent with recent empirical observations. Second, it has been observed that early in training, networks converge to their infinite-width dynamics at a rate $1/\textit{width}$ but at late time exhibit a rate $\textit{width}^{-c}$, where $c$ depends on the structure of the architecture and task. We show that our model exhibits this behavior. Lastly, our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
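
A minimal instance of the solvable setting described: a frozen random feature map with only the linear readout trained by gradient descent (dimensions and the linear teacher are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 64, 256, 1024                  # input dim, model size, dataset size
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = X @ rng.standard_normal(d)               # simple linear teacher

proj = rng.standard_normal((d, width))       # frozen random features
Phi = np.tanh(X @ proj)
a = np.zeros(width)                          # only the readout is trained
lr = 0.5
for step in range(5000):                     # training time is one scaling axis
    a -= lr * Phi.T @ (Phi @ a - y) / n
print("train MSE:", np.mean((Phi @ a - y) ** 2))
```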

Updated: 2024-05-27 23:21:13

标题: 神经尺度律的动力学模型

摘要: 在各种任务中,神经网络的性能随着训练时间、数据集大小和模型大小的增加而可预测地提高,跨多个数量级。这种现象被称为神经缩放定律。计算优化缩放定律是非常重要的,它报告了在选择模型大小最优时,性能与计算单元的关系。我们分析了使用梯度下降训练的随机特征模型作为网络训练和泛化的可解模型。这重现了关于神经缩放定律的许多观察结果。首先,我们的模型对于为什么性能随着训练时间和模型大小的扩展具有不同的幂律指数做出了预测。因此,这个理论预测了一个不对称的计算优化缩放规则,其中训练步骤的数量增加得比模型参数快,与最近的经验观察结果一致。其次,已经观察到在训练初期,网络以$1/\textit{width}$的速率收敛到其无限宽度的动态,但在后期则表现出$\textit{width}^{-c}$的速率,其中$c$取决于架构和任务的结构。我们展示了我们的模型展现出这种行为。最后,我们的理论展示了由于数据重复使用,训练和测试损失之间的差距如何随着时间逐渐增加。

更新时间: 2024-05-27 23:21:13

领域: stat.ML,cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2402.01092v3

Improved Generalization Bounds for Communication Efficient Federated Learning

This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning. We first characterize a tighter generalization bound for one-round federated learning based on local clients' generalizations and the heterogeneity of the data distribution (non-iid scenario). We also characterize a generalization bound in R-round federated learning and its relation to the number of local updates (local stochastic gradient descents (SGDs)). Then, based on our generalization bound analysis and our representation learning interpretation of this analysis, we show for the first time that less frequent aggregation, and hence more local updates, for the representation extractor (usually corresponding to initial layers) leads to the creation of more generalizable models, particularly for non-iid scenarios. We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis. FedALS employs varying aggregation frequencies for different parts of the model, thus reducing the communication cost. The paper concludes with experimental results showing the effectiveness of FedALS.
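
A sketch of the aggregation schedule this implies (a paraphrase of the FedALS idea with assumed layer naming, not the authors' code): extractor layers are averaged less often, so more local SGD steps accumulate there.

```python
import numpy as np

def fedals_aggregate(global_model, client_models, round_idx,
                     tau_extractor=4, tau_head=1):
    """Average each parameter group on its own schedule. Layers whose names
    start with 'extractor' sync every tau_extractor rounds; the rest sync
    every round. Between syncs, clients keep their locally updated copies."""
    for name in global_model:
        tau = tau_extractor if name.startswith("extractor") else tau_head
        if round_idx % tau == 0:
            global_model[name] = np.mean([m[name] for m in client_models], axis=0)
    return global_model
```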

Updated: 2024-05-27 23:20:52

标题: 通信高效的联邦学习改进泛化界限

摘要: 本文着重于通过探索泛化界限和表示学习来减少联邦学习的通信成本。我们首先基于本地客户端的泛化和数据分布的异质性(非独立同分布场景),为一轮联邦学习制定了更紧密的泛化界限。我们还为R轮联邦学习制定了一个泛化界限,并探讨了它与本地更新次数(本地随机梯度下降(SGD))的关系。然后,基于我们的泛化界限分析和对该分析的表示学习解释,我们首次表明,对于表示提取器(通常对应于初始层),更少的聚合频率,因此更多的本地更新,会导致创建更具泛化性的模型,特别适用于非独立同分布场景。我们设计了一种基于我们的泛化界限和表示学习分析的新颖的具有自适应本地步骤的联邦学习算法(FedALS)。FedALS为模型的不同部分采用不同的聚合频率,从而降低了通信成本。该论文随后展示了FedALS的有效性的实验结果。

更新时间: 2024-05-27 23:20:52

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.11754v3

Learning Social Welfare Functions

Is it possible to understand or imitate a policy maker's rationale by looking at past decisions they made? We formalize this question as the problem of learning social welfare functions belonging to the well-studied family of power mean functions. We focus on two learning tasks; in the first, the input is vectors of utilities of an action (decision or policy) for individuals in a group and their associated social welfare as judged by a policy maker, whereas in the second, the input is pairwise comparisons between the welfares associated with a given pair of utility vectors. We show that power mean functions are learnable with polynomial sample complexity in both cases, even if the social welfare information is noisy. Finally, we design practical algorithms for these tasks and evaluate their performance.
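
The power mean family referred to here has a closed form; a minimal sketch (the p = 0 case is the geometric-mean limit):

```python
import numpy as np

def power_mean_welfare(utilities, p):
    """M_p(u) = (mean(u_i^p))^(1/p) for utilities u_i > 0.
    p = 1 is the utilitarian mean, p -> 0 the Nash (geometric) welfare,
    and p -> -inf approaches the Rawlsian minimum."""
    u = np.asarray(utilities, dtype=float)
    if p == 0:
        return float(np.exp(np.mean(np.log(u))))
    return float(np.mean(u ** p) ** (1.0 / p))
```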

Updated: 2024-05-27 23:16:52

标题: 学习社会福利函数

摘要: 我们能否通过观察政策制定者过去的决策来理解或模仿他们的决策理由?我们将这个问题形式化为学习属于被广泛研究的幂平均函数族的社会福利函数的问题。我们关注两个学习任务:在第一个任务中,输入是某一行动(决策或政策)对群体中各个体的效用向量,以及政策制定者判定的相应社会福利;在第二个任务中,输入是给定一对效用向量所对应福利之间的两两比较。我们证明,在这两种情况下,即使社会福利信息存在噪声,幂平均函数也可以以多项式样本复杂度学习。最后,我们为这些任务设计了实用算法并评估了它们的性能。

更新时间: 2024-05-27 23:16:52

领域: cs.GT,cs.LG

下载: http://arxiv.org/abs/2405.17700v1

Independence Testing for Temporal Data

Temporal data are increasingly prevalent in modern data science. A fundamental question is whether two time series are related or not. Existing approaches often have limitations, such as relying on parametric assumptions, detecting only linear associations, and requiring multiple tests and corrections. While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test. To address these challenges, this paper introduces the temporal dependence statistic with block permutation to test independence between temporal data. Under proper assumptions, the proposed procedure is asymptotically valid and universally consistent for testing independence between stationary time series, and capable of estimating the optimal dependence lag that maximizes the dependence. Moreover, it is compatible with a rich family of distance and kernel based dependence measures, eliminates the need for multiple testing, and exhibits excellent testing power in various simulation settings.
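
A sketch of the block-permutation device (with absolute Pearson correlation standing in for the paper's distance/kernel dependence statistics; block length and permutation count are illustrative):

```python
import numpy as np

def block_permutation_pvalue(stat, x, y, block_len=20, n_perm=500, seed=0):
    """Permute y in contiguous blocks so its autocorrelation is preserved,
    then compare the observed statistic against the permutation distribution."""
    rng = np.random.default_rng(seed)
    n = len(y)
    starts = np.arange(0, n, block_len)
    observed = stat(x, y)
    hits = 0
    for _ in range(n_perm):
        order = rng.permutation(len(starts))
        y_perm = np.concatenate([y[s:s + block_len] for s in starts[order]])
        hits += stat(x, y_perm[:n]) >= observed
    return (1 + hits) / (1 + n_perm)

abs_corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])  # stand-in statistic
```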

Updated: 2024-05-27 23:15:09

标题: 时间数据的独立性检验

摘要: 时间数据在现代数据科学中越来越普遍。一个基本问题是两个时间序列是否相关。现有方法通常存在一些限制,比如依赖参数假设、仅检测线性关联、需要多次测试和校正等。最近提出了许多非参数和普遍一致的依赖度量,但直接应用于时间数据可能会增加p值并导致无效测试。为解决这些挑战,本文介绍了使用块置换的时间依赖统计量来测试时间数据之间的独立性。在适当的假设下,所提出的程序在测试平稳时间序列之间的独立性时是渐近有效和普遍一致的,并能够估计最大化依赖性的最佳依赖滞后。此外,它与丰富的基于距离和核的依赖度量相兼容,消除了多次测试的需要,并在各种模拟设置中展现出出色的测试能力。

更新时间: 2024-05-27 23:15:09

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/1908.06486v5

P4: Towards private, personalized, and Peer-to-Peer learning

Personalized learning is a proposed approach to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer) a method that ensures that each client receives a personalized model while maintaining differential privacy guarantee of each client's local dataset during and after the training. Our approach includes the design of a lightweight algorithm to identify similar clients and group them in a private, peer-to-peer (P2P) manner. Once grouped, we develop differentially-private knowledge distillation for clients to co-train with minimal impact on accuracy. We evaluate our proposed method on three benchmark datasets (FEMNIST or Federated EMNIST, CIFAR-10 and CIFAR-100) and two different neural network architectures (Linear and CNN-based networks) across a range of privacy parameters. The results demonstrate the potential of P4, as it outperforms the state-of-the-art of differential private P2P by up to 40 percent in terms of accuracy. We also show the practicality of P4 by implementing it on resource constrained devices, and validating that it has minimal overhead, e.g., about 7 seconds to run collaborative training between two clients.

Updated: 2024-05-27 23:04:37

标题: P4: 私人、个性化和点对点学习的探索

摘要: 个性化学习是一种提出的方法,用于解决协作机器学习中数据异质性的问题。在分散设置中,个性化的两个主要挑战是客户端聚类和数据隐私。在本文中,我们通过开发P4(个性化私人点对点)方法来解决这些挑战,确保每个客户端在训练期间和之后都能接收到个性化模型的同时,保持每个客户端的本地数据集的差分隐私保证。我们的方法包括设计一种轻量级算法,以私人的、点对点(P2P)方式识别相似的客户端并将它们分组。一旦分组,我们开发差分私有知识蒸馏,使客户端能够共同训练,对准确性的影响最小。我们在三个基准数据集(FEMNIST或联邦EMNIST、CIFAR-10和CIFAR-100)和两种不同的神经网络架构(线性和基于CNN的网络)上评估了我们提出的方法,跨越一系列隐私参数。结果表明P4的潜力,它在准确性方面比最先进的差分私有P2P高出多达40%。我们还展示了P4的实用性,通过在资源受限的设备上实施它,并验证它的额外开销很小,例如,在两个客户端之间进行协作训练大约需要7秒。

更新时间: 2024-05-27 23:04:37

领域: cs.LG

下载: http://arxiv.org/abs/2405.17697v1

Physics-guided Full Waveform Inversion using Encoder-Solver Convolutional Neural Networks

Full Waveform Inversion (FWI) is an inverse problem for estimating the wave velocity distribution in a given domain, based on observed data on the boundaries. The inversion is computationally demanding because we are required to solve multiple forward problems, either in time or frequency domains, to simulate data that are then iteratively fitted to the observed data. We consider FWI in the frequency domain, where the Helmholtz equation is used as a forward model, and its repeated solution is the main computational bottleneck of the inversion process. To ease this cost, we integrate a learning process of an encoder-solver preconditioner that is based on convolutional neural networks (CNNs). The encoder-solver is trained to effectively precondition the discretized Helmholtz operator given velocity medium parameters. Then, by re-training the CNN between the iterations of the optimization process, the encoder-solver is adapted to the iteratively evolving velocity medium as part of the inversion. Without retraining, the performance of the solver deteriorates as the medium changes. Using our light retraining procedures, we obtain the forward simulations effectively throughout the process. We demonstrate our approach to solving FWI problems using 2D geophysical models with high-frequency data.

Updated: 2024-05-27 23:03:21

标题: 物理引导的编码器-求解器卷积神经网络的全波形反演

摘要: Full Waveform Inversion(FWI)是一个逆问题,用于估计给定域内的波速分布,基于在边界上观测到的数据。该反演在计算上要求较高,因为我们需要解决多个正演问题,无论是在时间或频率域中,以模拟数据,然后将其迭代拟合到观测数据。我们考虑在频率域中进行FWI,其中使用Helmholtz方程作为正演模型,并且其重复解决方案是反演过程的主要计算瓶颈。为了减少这种成本,我们集成了基于卷积神经网络(CNNs)的编码器-求解器预处理器的学习过程。编码器-求解器经过训练,可以有效地预处理给定速度介质参数的离散化Helmholtz算子。然后,在优化过程的迭代之间重新训练CNN后,编码器-求解器将适应于反演过程中不断演化的速度介质。如果不重新训练,求解器的性能会随介质的变化而恶化。使用我们的轻量级重新训练程序,我们在整个过程中有效地获得正演模拟。我们展示了使用高频数据的2D地球物理模型解决FWI问题的方法。

更新时间: 2024-05-27 23:03:21

领域: cs.LG,physics.comp-ph,68T07 (Primary), 65N21 (Secondary)

下载: http://arxiv.org/abs/2405.17696v1

Tamed Langevin sampling under weaker conditions

Motivated by applications to deep learning which often fail standard Lipschitz smoothness requirements, we examine the problem of sampling from distributions that are not log-concave and are only weakly dissipative, with log-gradients allowed to grow superlinearly at infinity. In terms of structure, we only assume that the target distribution satisfies either a log-Sobolev or a Poincaré inequality and a local Lipschitz smoothness assumption with modulus growing possibly polynomially at infinity. This set of assumptions greatly exceeds the operational limits of the "vanilla" unadjusted Langevin algorithm (ULA), making sampling from such distributions a highly involved affair. To account for this, we introduce a taming scheme which is tailored to the growth and decay properties of the target distribution, and we provide explicit non-asymptotic guarantees for the proposed sampler in terms of the Kullback-Leibler (KL) divergence, total variation, and Wasserstein distance to the target distribution.
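
For orientation, the generic tamed Langevin step normalizes the drift so a superlinearly growing log-gradient cannot destabilize the chain; the paper's taming is further tailored to the target's growth and decay, which this sketch does not capture:

```python
import numpy as np

def tamed_ula(grad_log_pi, x0, step=1e-3, n_steps=10_000, seed=0):
    """Tamed unadjusted Langevin: x <- x + step * b/(1 + step*|b|) + sqrt(2*step)*xi."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        b = grad_log_pi(x)
        x = (x + step * b / (1.0 + step * np.linalg.norm(b))
               + np.sqrt(2.0 * step) * rng.standard_normal(x.shape))
    return x

# e.g. a target with superlinear log-gradient: pi(x) ∝ exp(-sum(x_i^4)/4)
sample = tamed_ula(lambda x: -x ** 3, x0=np.zeros(2))
```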

Updated: 2024-05-27 23:00:40

标题: 在较弱条件下的驯服Langevin采样

摘要: 受深度学习应用(它们常常不满足标准的利普希茨光滑性要求)的启发,我们研究从非对数凹且仅弱耗散的分布中采样的问题,其中对数梯度允许在无穷远处超线性增长。在结构方面,我们仅假设目标分布满足对数Sobolev不等式或Poincaré不等式,以及一个模数可能在无穷远处多项式增长的局部利普希茨光滑性假设。这组假设大大超出了"普通"未调整Langevin算法(ULA)的适用范围,使得从此类分布中采样成为一项非常复杂的工作。为此,我们引入了一种针对目标分布增长和衰减特性量身定制的驯服方案,并就所提出的采样器到目标分布的Kullback-Leibler(KL)散度、总变差和Wasserstein距离给出了明确的非渐近保证。

更新时间: 2024-05-27 23:00:40

领域: stat.ML,cs.LG,cs.NA,math.NA,math.OC,math.PR,Primary 65C05, 60H10, secondary 68Q32

下载: http://arxiv.org/abs/2405.17693v1

Ontology-Enhanced Decision-Making for Autonomous Agents in Dynamic and Partially Observable Environments

Agents, whether software or hardware, perceive their environment through sensors and act using actuators, often operating in dynamic, partially observable settings. They face challenges like incomplete and noisy data, unforeseen situations, and the need to adapt goals in real-time. Traditional reasoning and ML methods, including Reinforcement Learning (RL), help but are limited by data needs, predefined goals, and extensive exploration periods. Ontologies offer a solution by integrating diverse information sources, enhancing decision-making in complex environments. This thesis introduces an ontology-enhanced decision-making model (OntoDeM) for autonomous agents. OntoDeM enriches agents' domain knowledge, allowing them to interpret unforeseen events, generate or adapt goals, and make better decisions. Key contributions include: 1. An ontology-based method to improve agents' real-time observations using prior knowledge. 2. The OntoDeM model for handling dynamic, unforeseen situations by evolving or generating new goals. 3. Implementation and evaluation in four real-world applications, demonstrating its effectiveness. Compared to traditional and advanced learning algorithms, OntoDeM shows superior performance in improving agents' observations and decision-making in dynamic, partially observable environments.

Updated: 2024-05-27 22:52:23

标题: 本体增强的决策制定:在动态和部分可观察环境中的自主代理

摘要: 无论是软件还是硬件形式的智能体,都通过传感器感知环境并通过执行器行动,通常运行在动态、部分可观察的环境中。它们面临诸如数据不完整且含噪、不可预见的情况以及需要实时调整目标等挑战。传统推理和机器学习方法(包括强化学习(RL))有所帮助,但受限于数据需求、预定义目标和漫长的探索周期。本体通过整合多样的信息源提供了一种解决方案,增强了复杂环境中的决策能力。本论文为自主智能体引入了一种本体增强的决策模型(OntoDeM)。OntoDeM丰富了智能体的领域知识,使其能够解释不可预见的事件、生成或调整目标并做出更好的决策。主要贡献包括:1. 一种利用先验知识改进智能体实时观测的基于本体的方法。2. 通过演化或生成新目标来处理动态、不可预见情况的OntoDeM模型。3. 在四个真实应用中的实现和评估,证明了其有效性。与传统和先进的学习算法相比,OntoDeM在改进智能体的观测以及在动态、部分可观察环境中的决策方面表现出更优的性能。

更新时间: 2024-05-27 22:52:23

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17691v1

Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges

Neural network based approximations of the value function make up the core of leading policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). While this adds significant value when dealing with very complex environments, we note that in environments with sufficiently small state and action spaces, a computationally expensive neural network architecture offers marginal improvement over simpler value approximation methods. We present an implementation of Natural Actor Critic algorithms with actor updates through Natural Policy Gradient methods. This paper proposes that Natural Policy Gradient (NPG) methods with linear function approximation as a paradigm for value approximation may surpass the performance and speed of neural network based models such as TRPO and PPO within these environments. On the reinforcement learning benchmarks Cart Pole and Acrobot, we observe that our algorithm trains much faster than complex neural network architectures and obtains equivalent or better results. This allows us to recommend the use of NPG methods with linear function approximation over TRPO and PPO for both traditional and sparse-reward low-dimensional problems.
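
A sketch of a natural policy gradient step with linear (compatible) features: the natural gradient is the least-squares fit of sampled advantages onto the policy's score features, i.e. the solution of the Fisher system (function names and the ridge term are assumptions):

```python
import numpy as np

def npg_step(theta, score_feats, advantages, lr=0.1, ridge=1e-3):
    """score_feats[i] = grad_theta log pi(a_i | s_i); with compatible linear
    function approximation the natural gradient F^{-1} g reduces to a ridge
    regression of advantages on the score features."""
    n, k = score_feats.shape
    fisher = score_feats.T @ score_feats / n + ridge * np.eye(k)  # Fisher estimate
    grad = score_feats.T @ advantages / n                         # policy gradient
    return theta + lr * np.linalg.solve(fisher, grad)
```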

Updated: 2024-05-27 22:51:58

标题: 线性函数逼近作为解决经典强化学习挑战的计算效率方法

摘要: 基于神经网络的值函数近似构成了领先的基于策略的方法的核心,如TRPO和PPO。尽管在处理非常复杂的环境时增加了显著价值,但我们注意到,在状态和动作空间足够低的环境中,计算昂贵的神经网络架构仅比简单的值函数近似方法略有改善。我们提出了一种通过自然策略梯度方法进行演员更新的自然演员-评论家算法的实现。本文提出,以线性函数近似作为值函数近似的范式的自然策略梯度(NPG)方法可能在这些环境中超越基于神经网络的模型,如TRPO和PPO的性能和速度。在强化学习基准Cart Pole和Acrobot上,我们观察到我们的算法训练速度比复杂的神经网络架构快得多,并获得了等效或更好的结果。这使我们建议在传统和稀疏奖励低维问题中使用NPG方法与线性函数近似,而不是TRPO和PPO。

更新时间: 2024-05-27 22:51:58

领域: cs.LG

下载: http://arxiv.org/abs/2405.20350v1

ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding which impacts patient mortality, medical errors, staff burnout, etc. This work proposes (time) cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems to suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as death. With MIMIC-ED-Assist, we develop ED-Copilot which sequentially suggests patient-specific laboratory tests and makes diagnostic predictions. ED-Copilot employs a pre-trained bio-medical language model to encode patient information and uses reinforcement learning to minimize ED wait time and maximize prediction accuracy. On MIMIC-ED-Assist, ED-Copilot improves prediction accuracy over baselines while halving average wait time from four hours to two hours. ED-Copilot can also effectively personalize treatment recommendations based on patient severity, further highlighting its potential as a diagnostic assistant. Since MIMIC-ED-Assist is a retrospective benchmark, ED-Copilot is restricted to recommend only observed tests. We show ED-Copilot achieves competitive performance without this restriction as the maximum allowed time increases. Our code is available at https://github.com/cxcscmu/ED-Copilot.

Updated: 2024-05-27 22:30:46

标题: ED-Copilot:利用语言模型诊断辅助缩短急诊科候诊时间

摘要: 在急诊科(ED)中,患者在诊断前需要进行分诊和多种实验室检查。这个耗时的过程导致急诊科拥挤,影响患者的死亡率、医疗错误、医护人员的疲劳等。本文提出了一种(时间)成本效益的诊断辅助方案,利用人工智能系统帮助急诊科医生进行高效和准确的诊断。与急诊科医生合作,我们使用公共患者数据创建了MIMIC-ED-Assist,这是一个供AI系统建议实验室检查的基准,目标是在准确预测死亡等关键结果的同时最小化等待时间。使用MIMIC-ED-Assist,我们开发了ED-Copilot,它依次建议患者特定的实验室检查并进行诊断预测。ED-Copilot采用预训练的生物医学语言模型对患者信息进行编码,并利用强化学习来最小化急诊科等待时间并最大化预测准确性。在MIMIC-ED-Assist上,ED-Copilot在基线的基础上提高了预测准确性,同时将平均等待时间从四小时缩短到两小时。ED-Copilot还可以有效根据患者病情个性化治疗建议,进一步突显其作为诊断助手的潜力。由于MIMIC-ED-Assist是一个回顾性基准,ED-Copilot只能推荐数据中观察到的检查。我们展示了随着最大允许时间的增加,ED-Copilot在没有这一限制的情况下也能取得有竞争力的性能。我们的代码可在https://github.com/cxcscmu/ED-Copilot上找到。

更新时间: 2024-05-27 22:30:46

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.13448v2

On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs) assume a simple and highly regular information structure, while more general models like predictive state representations do not explicitly model the information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables, requiring a rich and flexible representation of information structure. In this paper, we formalize a novel reinforcement learning model which explicitly represents the information structure. We then use this model to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, obtaining a characterization via a graph-theoretic quantity of the DAG representation of the information structure. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem in terms of its information structure by exhibiting an algorithm achieving the upper bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.

Updated: 2024-05-27 22:19:40

标题: 关于信息结构在部分可观察序列团队和游戏强化学习中的作用

摘要: 在顺序决策问题中,信息结构描述了系统中不同时间点发生的事件如何相互影响。传统的强化学习模型(例如MDPs、POMDPs)假设一个简单和高度规则的信息结构,而更一般的模型如预测状态表示并不明确地建模信息结构。相比之下,现实世界中的顺序决策问题通常涉及系统变量之间复杂和时变的相互依赖,需要一个丰富和灵活的信息结构表示。在本文中,我们形式化一个显式表示信息结构的新型强化学习模型。然后,我们使用这个模型对一般顺序决策问题的统计困难性进行信息结构分析,通过信息结构DAG表示的一个图论量来刻画。我们通过展示一个达到上界的算法,证明了依据信息结构学习一般顺序决策问题的样本复杂度上界。这恢复了已知的可处理性结果,并为一般顺序决策问题中的强化学习提供了一个新的视角,给出了一种系统地识别新的可处理问题类别的方法。

更新时间: 2024-05-27 22:19:40

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2403.00993v2

Low-Rank Approximation of Structural Redundancy for Self-Supervised Learning

We study the data-generating mechanism for reconstructive SSL to shed light on its effectiveness. With an infinite amount of labeled samples, we provide a sufficient and necessary condition for perfect linear approximation. The condition reveals a full-rank component that preserves the label classes of Y, along with a redundant component. Motivated by the condition, we propose to approximate the redundant component by a low-rank factorization and measure the approximation quality by introducing a new quantity $\epsilon_s$, parameterized by the rank $s$ of the factorization. We incorporate $\epsilon_s$ into the excess risk analysis under both linear regression and ridge regression settings, where the latter regularization approach handles scenarios in which the dimension of the learned features is much larger than the number of labeled samples $n$ available for downstream tasks. We design three stylized experiments to compare SSL with supervised learning under different settings to support our theoretical findings.
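
As a concrete reading of the rank-dependent quantity, one can take $\epsilon_s$ to behave like the relative residual of the best rank-s approximation (the paper's exact definition may differ); a sketch via truncated SVD:

```python
import numpy as np

def rank_s_residual(M, s):
    """Relative Frobenius error of the best rank-s approximation of M."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    M_s = (U[:, :s] * sigma[:s]) @ Vt[:s]
    return np.linalg.norm(M - M_s) / np.linalg.norm(M)
```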

Updated: 2024-05-27 22:11:00

标题: 自监督学习中结构冗余的低秩近似

摘要: 我们研究了重建式自监督学习(SSL)的数据生成机制,以揭示其有效性。在拥有无限量标记样本的情况下,我们提供了完美线性逼近的充分必要条件。该条件揭示了一个保留Y标签类别的满秩分量,以及一个冗余分量。受该条件启发,我们提出通过低秩因式分解逼近冗余分量,并引入一个新的量$\epsilon_s$来衡量逼近质量,其中$s$为分解的秩。我们将$\epsilon_s$纳入线性回归和岭回归设置下的超额风险分析中,后者的正则化方法用于处理学习特征维度远大于下游任务可用标记样本数量$n$的情况。我们设计了三个风格化实验来比较不同设置下的自监督学习和监督学习,以支持我们的理论发现。

更新时间: 2024-05-27 22:11:00

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.06884v2

TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability

This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models were reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve a comparable good tradeoff between zero-shot adversarial robustness and generalization under small adversarial perturbations. However, they fail to achieve a good tradeoff under large adversarial perturbations. To this end, we propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization. More precisely, we propose an Image-Aware Text (IAT) tuning mechanism that increases the inter-class distance of text embeddings by incorporating the Minimum Hyperspherical Energy (MHE). Simultaneously, fixed pre-trained image embeddings are used as cross-modal auxiliary supervision to maintain the similarity between the MHE-tuned and original text embeddings by the knowledge distillation, preserving semantic information between different classes. Besides, we introduce a Text-Aware Image (TAI) tuning mechanism, which increases inter-class distance between image embeddings during the training stage by Text-distance based Adaptive Margin (TAM). Similarly, a knowledge distillation is utilized to retain the similarity between fine-tuned and pre-trained image embeddings. Extensive experimental results demonstrate the effectiveness of our approach, showing impressive zero-shot performance against a wide range of adversarial perturbations while preserving the zero-shot generalization capabilities of the original CLIP model.
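
The Minimum Hyperspherical Energy term used in the IAT mechanism can be sketched as a Riesz-energy penalty on normalized class embeddings (the exponent and epsilon are assumptions); minimizing it pushes class text embeddings apart on the sphere:

```python
import torch
import torch.nn.functional as F

def mhe_energy(class_embeddings, power=2, eps=1e-6):
    """Sum of inverse pairwise distances between unit-normalized embeddings;
    lower energy means larger inter-class separation on the hypersphere."""
    z = F.normalize(class_embeddings, dim=1)
    dist = torch.cdist(z, z)
    upper = torch.triu(torch.ones_like(dist, dtype=torch.bool), diagonal=1)
    return ((dist[upper] + eps) ** (-power)).sum()
```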

Updated: 2024-05-27 22:10:17

标题: TIMA:文本-图像相互感知,平衡零样本对抗鲁棒性和泛化能力

摘要: 这项工作解决了在大规模基础模型中实现零样本对抗鲁棒性并保持零样本泛化的挑战,重点关注流行的对比语言-图像预训练(CLIP)。尽管基础模型被报道具有卓越的零样本泛化能力,但它们对对抗性扰动非常脆弱。现有方法在小型对抗扰动下实现了零样本对抗鲁棒性和泛化之间的可比较良好权衡。然而,在大型对抗扰动下,它们未能实现良好的权衡。为此,我们提出了一种新颖的文本-图像相互感知(TIMA)方法,可以在零样本对抗鲁棒性和泛化之间取得平衡。更具体地说,我们提出了一种图像感知文本(IAT)调整机制,通过整合最小超球能量(MHE)来增加文本嵌入的类间距离。同时,固定的预训练图像嵌入被用作跨模态辅助监督,通过知识蒸馏来保持MHE调整后与原始文本嵌入之间的相似性,保留不同类别之间的语义信息。此外,我们引入了一种文本感知图像(TAI)调整机制,在训练阶段通过基于文本距离的自适应边距(TAM)增加图像嵌入之间的类间距离。类似地,利用知识蒸馏来保持微调后与预训练图像嵌入之间的相似性。广泛的实验结果证明了我们方法的有效性,展示了在各种对抗性扰动下令人印象深刻的零样本性能,同时保持了原始CLIP模型的零样本泛化能力。

更新时间: 2024-05-27 22:10:17

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.17678v1

Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints

We study the problem of online convex optimization (OCO) under unknown linear constraints that are either static, or stochastically time-varying. For this problem, we introduce an algorithm that we term Optimistically Safe OCO (OSOCO) and show that it enjoys $\tilde{\mathcal{O}}(\sqrt{T})$ regret and no constraint violation. In the case of static linear constraints, this improves on the previous best known $\tilde{\mathcal{O}}(T^{2/3})$ regret with only slightly stronger assumptions. In the case of stochastic time-varying constraints, our work supplements existing results that show $\mathcal{O}(\sqrt{T})$ regret and $\mathcal{O}(\sqrt{T})$ cumulative violation under more general convex constraints albeit a less general feedback model. In addition to our theoretical guarantees, we also give numerical results comparing the performance of OSOCO to existing algorithms.

Updated: 2024-05-27 22:07:51

标题: 未知线性约束下的在线凸优化的乐观安全性

摘要: 我们研究了在线凸优化(OCO)问题,在未知的线性约束条件下,这些约束条件可以是静态的,也可以是随机时间变化的。针对这个问题,我们引入了一种算法,称为乐观安全OCO(OSOCO),并证明它具有$\tilde{\mathcal{O}}(\sqrt{T})$的后悔值,并且不会违反约束条件。在静态线性约束条件下,这优于先前已知的$\tilde{\mathcal{O}}(T^{2/3})$的后悔值,仅有略微更强的假设。在随机时间变化的约束条件下,我们的工作补充了现有结果,表明在更一般的凸约束条件下,尽管反馈模型较不普遍,但具有$\mathcal{O}(\sqrt{T})$的后悔值和$\mathcal{O}(\sqrt{T})$的累计违规。除了我们的理论保证,我们还给出了比较OSOCO与现有算法性能的数值结果。

更新时间: 2024-05-27 22:07:51

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2403.05786v2

Utilising a Quantum Hybrid Solver for Bi-objective Quadratic Assignment Problems

The intersection between quantum computing and optimisation has been an area of interest in recent years. There have been numerous studies exploring the application of quantum and quantum-hybrid solvers to various optimisation problems. This work explores scalarisation methods within the context of solving the bi-objective quadratic assignment problem using a quantum-hybrid solver. We show results that are consistent with previous research on a different Ising machine.

Updated: 2024-05-27 22:03:26

标题: 利用量子混合求解器解决双目标二次分配问题

摘要: 最近几年,量子计算和优化之间的交叉点一直是一个备受关注的领域。已经有许多研究探索了将量子和量子混合求解器应用于各种优化问题。本文探讨了在使用量子混合求解器解决双目标二次分配问题时的标量化方法。我们展示的结果与先前在另一种伊辛机器上的研究结果一致。

更新时间: 2024-05-27 22:03:26

领域: quant-ph,cs.AI,G.1.6

下载: http://arxiv.org/abs/2405.17676v1

Fast Samplers for Inverse Problems in Iterative Refinement Models

Constructing fast samplers for unconditional diffusion and flow-matching models has received much attention recently; however, existing methods for solving inverse problems, such as super-resolution, inpainting, or deblurring, still require hundreds to thousands of iterative steps to obtain high-quality results. We propose a plug-and-play framework for constructing efficient samplers for inverse problems, requiring only pre-trained diffusion or flow-matching models. We present Conditional Conjugate Integrators, which leverage the specific form of the inverse problem to project the respective conditional diffusion/flow dynamics into a more amenable space for sampling. Our method complements popular posterior approximation methods for solving inverse problems using diffusion/flow models. We evaluate the proposed method's performance on various linear image restoration tasks across multiple datasets, employing diffusion and flow-matching models. Notably, on challenging inverse problems like 4$\times$ super-resolution on the ImageNet dataset, our method can generate high-quality samples in as few as 5 conditional sampling steps and outperforms competing baselines requiring 20-1000 steps. Our code and models will be publicly available at https://github.com/mandt-lab/CI2RM.

Updated: 2024-05-27 21:50:16

标题: 在迭代细化模型中用于逆问题的快速采样器

摘要: 最近,构建用于无条件扩散和流匹配模型的快速采样器引起了广泛关注;然而,现有的用于解决逆问题(如超分辨率、修补或去模糊)的方法仍然需要数百到数千次迭代才能获得高质量结果。我们提出了一个即插即用的框架,用于构建逆问题的高效采样器,只需要预先训练的扩散或流匹配模型。我们提出了条件共轭积分器,利用逆问题的特定形式将相应的条件扩散/流动动态投影到更易于采样的空间中。我们的方法为使用扩散/流模型解决逆问题的流行后验逼近方法提供了补充。我们在多个数据集上评估了所提出方法在各种线性图像恢复任务中的性能,采用了扩散和流匹配模型。值得注意的是,在ImageNet数据集上进行4倍超分辨率等具有挑战性的逆问题时,我们的方法可以在仅 5 个条件采样步骤中生成高质量样本,并且优于竞争基线,后者需要 20-1000 步。我们的代码和模型将在https://github.com/mandt-lab/CI2RM 上公开提供。

更新时间: 2024-05-27 21:50:16

领域: cs.CV,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.17673v1

Exploring Loss Design Techniques For Decision Tree Robustness To Label Noise

In the real world, data is often noisy, affecting not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, and a gap exists in exploring interpretable models, particularly those rooted in decision trees. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches, are not effective. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.

Updated: 2024-05-27 21:49:57

标题: 探索损失设计技术,以提高决策树对标签噪声的鲁棒性

摘要: 在现实世界中,数据往往存在噪音,影响不仅是特征的质量,还包括标签的准确性。目前关于减轻标签错误的研究主要源自深度学习的进展,存在一个探索可解释模型的空白,特别是基于决策树的模型。在这项研究中,我们探讨了是否可以应用深度学习损失设计的思想来提高决策树的鲁棒性。特别地,我们发现损失校正和对称损失,这两种标准方法,并不有效。我们认为需要探索其他方向来提高决策树对标签噪音的鲁棒性。

更新时间: 2024-05-27 21:49:57

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.17672v1

Hunting for Polluted White Dwarfs and Other Treasures with Gaia XP Spectra and Unsupervised Machine Learning

White dwarfs (WDs) polluted by exoplanetary material provide the unprecedented opportunity to directly observe the interiors of exoplanets. However, spectroscopic surveys are often limited by brightness constraints, and WDs tend to be very faint, making detections of large populations of polluted WDs difficult. In this paper, we aim to increase considerably the number of WDs with multiple metals in their atmospheres. Using 96,134 WDs with Gaia DR3 BP/RP (XP) spectra, we constructed a 2D map using an unsupervised machine learning technique called Uniform Manifold Approximation and Projection (UMAP) to organize the WDs into identifiable spectral regions. The polluted WDs are among the distinct spectral groups identified in our map. We have shown that this selection method could potentially increase the number of known WDs with 5 or more metal species in their atmospheres by an order of magnitude. Such systems are essential for characterizing exoplanet diversity and geology.
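
The mapping step can be sketched with the umap-learn package (the file name and hyperparameters below are placeholders, not the paper's settings):

```python
import numpy as np
import umap  # pip install umap-learn

# One row of Gaia XP coefficients (or normalized fluxes) per white dwarf.
X = np.load("xp_spectra.npy")                     # placeholder input file
mapper = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = mapper.fit_transform(X)               # (n_wd, 2) map coordinates
# Distinct islands in `embedding` can then be inspected for spectral groups,
# including the metal-polluted white dwarfs.
```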

Updated: 2024-05-27 21:44:14

标题: 使用Gaia XP光谱和无监督机器学习寻找受污染的白矮星和其他宝藏

摘要: 被外行星物质污染的白矮星(WDs)提供了直接观测外行星内部的前所未有的机会。然而,光谱调查通常受到亮度限制,而WDs往往非常暗淡,使得检测到大量被污染的WDs变得困难。在本文中,我们旨在显著增加具有多种金属在其大气层中的WDs数量。利用Gaia DR3 BP/RP(XP)光谱的96,134个WDs,我们使用一种称为均匀流形逼近和投影(UMAP)的无监督机器学习技术构建了一个2D地图,将WDs组织成可识别的光谱区域。被污染的WDs是我们地图中识别出的明显光谱组之一。我们已经表明,这种选择方法有可能将已知的大气层中含有5种或更多金属物种的WDs数量增加一个数量级。这种系统对于表征外行星多样性和地质学至关重要。

更新时间: 2024-05-27 21:44:14

领域: astro-ph.SR,astro-ph.EP,cs.LG

下载: http://arxiv.org/abs/2405.17667v1

Structured Partial Stochasticity in Bayesian Neural Networks

Bayesian neural network posterior distributions have a great number of modes that correspond to the same network function. The abundance of such modes can make it difficult for approximate inference methods to do their job. Recent work has demonstrated the benefits of partial stochasticity for approximate inference in Bayesian neural networks; inference can be less costly and performance can sometimes be improved. I propose a structured way to select the deterministic subset of weights that removes neuron permutation symmetries, and therefore the corresponding redundant posterior modes. With a drastically simplified posterior distribution, the performance of existing approximate inference schemes is found to be greatly improved.

Updated: 2024-05-27 21:40:31

标题: 贝叶斯神经网络中的结构化部分随机性

摘要: 贝叶斯神经网络后验分布具有许多与相同网络功能对应的模式。这些模式的丰富性可能会使近似推断方法难以发挥作用。最近的研究表明,对于贝叶斯神经网络的近似推断,部分随机性的好处已被证明; 推断成本可以降低,性能有时可以提高。我提出了一种结构化方法来选择消除神经元置换对称性的确定性权重子集,从而消除相应的冗余后验模式。通过大幅简化后验分布,发现现有近似推断方案的性能得到极大改善。

更新时间: 2024-05-27 21:40:31

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.17666v1

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can be often trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
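
The mixed restart distribution at the heart of RICE can be sketched as follows (the mixing probability and state pools are assumptions):

```python
import numpy as np

def sample_initial_state(default_states, critical_states, mix_prob=0.5, rng=None):
    """With probability mix_prob restart from a critical state identified by
    an explanation method; otherwise use the default initial distribution."""
    rng = rng or np.random.default_rng()
    pool = critical_states if rng.random() < mix_prob else default_states
    return pool[rng.integers(len(pool))]
```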

Updated: 2024-05-27 21:37:40

标题: RICE: 通过解释突破强化学习的训练瓶颈

摘要: 深度强化学习(DRL)在现实世界应用中扮演着越来越重要的角色。然而,对于复杂任务,尤其是在奖励稀疏的情况下,获得一个性能最佳的DRL代理仍然是一个重要挑战。DRL代理的训练经常会陷入瓶颈而无法取得进展。在本文中,我们提出了RICE,一种创新的强化学习细化方案,它结合了解释方法来突破训练瓶颈。RICE的高级思想是构建一个新的初始状态分布,结合了默认初始状态和通过解释方法识别的关键状态,从而鼓励代理从混合初始状态进行探索。通过精心设计,我们可以在理论上保证我们的细化方案具有更严格的次优性界限。我们在各种流行的RL环境和现实世界应用中评估了RICE。结果表明,RICE在提高代理性能方面明显优于现有的细化方案。

更新时间: 2024-05-27 21:37:40

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2405.03064v2

What's the Opposite of a Face? Finding Shared Decodable Concepts and their Negations in the Brain

Prior work has offered evidence for functional localization in the brain; different anatomical regions preferentially activate for certain types of visual input. For example, the fusiform face area preferentially activates for visual stimuli that include a face. However, the spectrum of visual semantics is extensive, and only a few semantically-tuned patches of cortex have so far been identified in the human brain. Using a multimodal (natural language and image) neural network architecture (CLIP) we train a highly accurate contrastive model that maps brain responses during naturalistic image viewing to CLIP embeddings. We then use a novel adaptation of the DBSCAN clustering algorithm to cluster the parameters of these participant-specific contrastive models. This reveals what we call Shared Decodable Concepts (SDCs): clusters in CLIP space that are decodable from common sets of voxels across multiple participants. Examining the images most and least associated with each SDC cluster gives us additional insight into the semantic properties of each SDC. We note SDCs for previously reported visual features (e.g. orientation tuning in early visual cortex) as well as visual semantic concepts such as faces, places and bodies. In cases where our method finds multiple clusters for a visuo-semantic concept, the least associated images allow us to dissociate between confounding factors. For example, we discovered two clusters of food images, one driven by color, the other by shape. We also uncover previously unreported areas such as regions of extrastriate body area (EBA) tuned for legs/hands and sensitivity to numerosity in right intraparietal sulcus, and more. Thus, our contrastive-learning methodology better characterizes new and existing visuo-semantic representations in the brain by leveraging multimodal neural network representations and a novel adaptation of clustering algorithms.

Updated: 2024-05-27 21:28:26

标题: 脸的反义是什么?在大脑中找到共享可解码概念及其否定形式

摘要: 先前的研究为大脑的功能定位提供了证据;不同的解剖区域会对特定类型的视觉输入优先激活。例如,梭状回面孔区会对包含人脸的视觉刺激优先激活。然而,视觉语义的范围很广,迄今为止在人类大脑中只确定了少数几个语义调谐的皮层区域。我们使用多模态(自然语言和图像)神经网络架构(CLIP)训练了一个高度准确的对比模型,将自然图像观看期间的大脑反应映射到CLIP嵌入。然后,我们使用DBSCAN聚类算法的一种新颖改编对这些特定于参与者的对比模型的参数进行聚类。这揭示了我们所称的共享可解码概念(SDCs):CLIP空间中可以从多个参与者的共同体素集合中解码出来的聚类。检查与每个SDC聚类最相关和最不相关的图像,为我们提供了对每个SDC语义属性的额外见解。我们注意到对应于先前报道的视觉特征(例如早期视觉皮层中的方向调谐)以及人脸、地点和身体等视觉语义概念的SDC。在我们的方法为某个视觉语义概念找到多个聚类的情况下,最不相关的图像使我们能够区分混淆因素。例如,我们发现了两个食物图像聚类,一个由颜色驱动,另一个由形状驱动。我们还发现了以前未报道的区域,例如纹外身体区域(EBA)中针对腿/手调谐的区域,以及右侧顶内沟对数量的敏感性等。因此,我们的对比学习方法通过利用多模态神经网络表示和聚类算法的新颖改编,更好地表征了大脑中新的和现有的视觉语义表示。

更新时间: 2024-05-27 21:28:26

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.17663v1

Beyond Random Augmentations: Pretraining with Hard Views

Many Self-Supervised Learning (SSL) methods aim for model invariance to different image augmentations known as views. To achieve this invariance, conventional approaches make use of random sampling operations within the image augmentation pipeline. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple, yet effective approach is to select hard views that yield a higher loss. In this paper, we present Hard View Pretraining (HVP), a learning-free strategy that builds upon this hypothesis and extends random view generation. HVP exposes the model to harder, more challenging samples during SSL pretraining, which enhances downstream performance. It encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss depending on the current model state, and 4) run the backward pass with the selected pair. As a result, HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining and similar improvements on transfer tasks across DINO, SimSiam, iBOT, and SimCLR.
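
Steps 1-3 above amount to a learning-free selection loop; a sketch (the model, view sampler, and SSL loss are assumed components):

```python
import torch

def select_hard_pair(model, views, ssl_loss):
    """Embed each sampled view, score all pairs with the SSL loss, and return
    the pair with the highest loss; the caller backpropagates through it (step 4)."""
    with torch.no_grad():                       # selection itself is learning-free
        embeddings = [model(v) for v in views]
        pairs = [(i, j) for i in range(len(views)) for j in range(i + 1, len(views))]
        losses = torch.stack([ssl_loss(embeddings[i], embeddings[j]) for i, j in pairs])
    i, j = pairs[int(losses.argmax())]
    return views[i], views[j]
```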

Updated: 2024-05-27 21:19:55

标题: 超越随机增强:采用困难视角进行预训练

摘要: 许多自监督学习(SSL)方法旨在使模型对不同图像增强(称为视图)具有不变性。为了实现这种不变性,传统方法利用图像增强管道中的随机抽样操作。我们假设基于传统随机视图抽样的预训练管道的有效性可以通过明确选择有益于学习进展的视图来增强。一个简单但有效的方法是选择产生更高损失的困难视图。在本文中,我们提出了Hard View Pretraining(HVP),这是一种无需学习的策略,基于这一假设并扩展了随机视图生成。HVP在SSL预训练过程中使模型暴露于更困难、更具挑战性的样本,从而提高了下游性能。它包括以下迭代步骤:1)随机抽样多个视图,并将每个视图通过预训练模型前向传递,2)创建两个视图的配对并计算它们的损失,3)根据当前模型状态对产生最高损失的配对进行对抗选择,4)使用所选配对进行反向传递。结果,HVP在ImageNet上的线性评估准确率平均提高了1%,无论是100个还是300个epoch的预训练,并在DINO、SimSiam、iBOT和SimCLR的迁移任务上获得类似的改进。

更新时间: 2024-05-27 21:19:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2310.03940v5

Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations

Zero-shot learning methods typically assume that the new, unseen classes that are encountered at deployment, come from the same distribution as training classes. However, real-world scenarios often involve class distribution shifts (e.g., in age or gender for person identification), posing challenges for zero-shot classifiers that rely on learned representations from training classes. In this work, we propose a model that assumes that the attribute responsible for the shift is unknown in advance, and show that standard training may lead to non-robust representations. To mitigate this, we propose an algorithm for learning robust representations by (a) constructing synthetic data environments via hierarchical sampling and (b) applying environment balancing penalization, inspired by out-of-distribution problems. We show that our approach improves generalization on diverse class distributions in both simulations and real-world datasets.

Updated: 2024-05-27 21:19:20

标题: Zero-Shot Learning中的类别分布转移:学习稳健表示

摘要: 零样本学习方法通常假设在部署时遇到的新的、未见过的类别来自与训练类别相同的分布。然而,在现实世界中,场景常常涉及类别分布的变化(例如,对于人员识别中的年龄或性别),这给依赖于训练类别学习表示的零样本分类器带来挑战。在这项工作中,我们提出了一种模型,假设导致转变的属性事先是未知的,并且表明标准训练可能导致非鲁棒的表示。为了缓解这一问题,我们提出了一种通过(a)通过分层抽样构建合成数据环境和(b)应用环境平衡惩罚的算法来学习鲁棒表示,灵感来自于超出分布问题。我们展示了我们的方法在模拟和真实世界数据集中改善了对多样化类别分布的泛化能力。

更新时间: 2024-05-27 21:19:20

领域: cs.LG

下载: http://arxiv.org/abs/2311.18575v3

Dual-Activated Lightweight Attention ResNet50 for Automatic Histopathology Breast Cancer Image Classification

Automatic breast cancer classification in histopathology images is crucial for precise diagnosis and treatment planning. Recently, classification approaches based on the ResNet architecture have gained popularity for significantly improving accuracy by using skip connections to mitigate vanishing gradient problems, thereby integrating low-level and high-level feature information. Nevertheless, the conventional ResNet architecture faces challenges such as data imbalance and limited interpretability, necessitating cross-domain knowledge and collaboration among medical experts. This study effectively addresses these challenges by introducing a novel method for breast cancer classification, the Dual-Activated Lightweight Attention ResNet50 (DALAResNet50) model. It integrates a pre-trained ResNet50 model with a lightweight attention mechanism, embedding an attention module in the fourth layer of ResNet50 and incorporating two fully connected layers with LeakyReLU and ReLU activation functions to enhance feature learning capabilities. The DALAResNet50 method was tested on breast cancer histopathology images from the BreakHis Database across magnification factors of 40X, 100X, 200X, and 400X, achieving accuracies of 98.5%, 98.7%, 97.9%, and 94.3%, respectively. It was also compared with established deep learning models such as SEResNet50, DenseNet121, VGG16, VGG16Inception, ViT, Swin-Transformer, Dinov2_Vitb14, and ResNet50. The reported results show that DALAResNet50 outperforms the compared approaches in accuracy, F1 score, IBA, and GMean, demonstrating significant robustness and broad applicability when dealing with different magnifications and imbalanced breast cancer datasets.
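
A sketch of the architecture as described (the attention block is a squeeze-and-excitation-style stand-in for the unspecified "lightweight attention mechanism", and the head sizes and class count are assumptions):

```python
import torch.nn as nn
from torchvision.models import resnet50

class LightweightAttention(nn.Module):
    """Channel attention: squeeze with global pooling, excite with a small MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):
        w = self.gate(self.pool(x).flatten(1))
        return x * w[:, :, None, None]

model = resnet50(weights="IMAGENET1K_V2")          # pre-trained backbone
model.layer4 = nn.Sequential(model.layer4, LightweightAttention(2048))
model.fc = nn.Sequential(                          # two FC layers, LeakyReLU + ReLU
    nn.Linear(2048, 512), nn.LeakyReLU(),
    nn.Linear(512, 2), nn.ReLU())                  # 2 = benign/malignant (assumed)
```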

Updated: 2024-05-27 21:14:43

Categories: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2308.13150v9

EnCoMP: Enhanced Covert Maneuver Planning with Adaptive Threat-Aware Visibility Estimation using Offline Reinforcement Learning

Autonomous robots operating in complex environments face the critical challenge of identifying and utilizing environmental cover for covert navigation to minimize exposure to potential threats. We propose EnCoMP, an enhanced navigation framework that integrates offline reinforcement learning and our novel Adaptive Threat-Aware Visibility Estimation (ATAVE) algorithm to enable robots to navigate covertly and efficiently in diverse outdoor settings. ATAVE is a dynamic probabilistic threat modeling technique designed to continuously assess and mitigate potential threats in real time, enhancing the robot's ability to navigate covertly by adapting to evolving environmental and threat conditions. Moreover, our approach generates high-fidelity multi-map representations, including cover maps, potential threat maps, height maps, and goal maps from LiDAR point clouds, providing a comprehensive understanding of the environment. These multi-maps offer detailed environmental insights, helping in strategic navigation decisions. The goal map encodes the relative distance and direction to the target location, guiding the robot's navigation. We train a Conservative Q-Learning (CQL) model on a large-scale dataset collected from real-world environments, learning a robust policy that maximizes cover utilization, minimizes threat exposure, and maintains efficient navigation. We demonstrate our method's capabilities on a physical Jackal robot through extensive experiments across diverse terrains. These experiments demonstrate EnCoMP's superior performance compared to state-of-the-art methods, achieving a 95% success rate and 85% cover utilization and reducing threat exposure to 10.5%, while significantly outperforming baselines in navigation efficiency and robustness.

Updated: 2024-05-27 21:07:28

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2403.20016v2

A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

Public health programs often provide interventions to encourage beneficiary adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that combines Bayesian modeling with Thompson sampling in a novel way to flexibly model the complex RMAB settings present in public health program adherence problems, such as context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which is common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including an example based on real-world adherence data developed in collaboration with ARMMAN, an NGO in India that runs a large-scale maternal health program, showcasing BCoR's practical utility and potential for real-world deployment.
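
A minimal sketch of the Thompson-sampling primitive that BCoR builds on, for a toy Bernoulli bandit (the arm probabilities and Beta(1, 1) priors below are illustrative; BCoR's contextual, non-stationary RMAB model is far richer):

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.2, 0.5, 0.7])       # unknown arm success rates (toy)
    alpha, beta = np.ones(3), np.ones(3)     # Beta(1, 1) prior per arm

    for t in range(1000):
        theta = rng.beta(alpha, beta)        # one posterior draw per arm
        a = int(np.argmax(theta))            # act greedily on the sampled model
        r = float(rng.random() < true_p[a])  # observe a Bernoulli reward
        alpha[a] += r                        # conjugate posterior update
        beta[a] += 1.0 - r

    print(alpha / (alpha + beta))            # posterior means favor the 0.7 arm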

Updated: 2024-05-27 21:03:41

Categories: cs.LG,stat.AP

Download: http://arxiv.org/abs/2402.04933v2

Discrete Probabilistic Inference as Control in Multi-path Environments

We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.

Updated: 2024-05-27 20:58:38

Categories: cs.LG

Download: http://arxiv.org/abs/2402.10309v2

Alignment is Key for Applying Diffusion Models to Retrosynthesis

Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusion models and thus their adaptation to retrosynthesis. To address this limitation, we relax the equivariance requirement such that it only applies to aligned permutations of the conditioning and the generated graphs obtained through atom mapping. Our new denoiser achieves the highest top-1 accuracy (54.7%) across template-free and template-based methods on USPTO-50k. We also demonstrate the ability for flexible post-training conditioning and good sample quality with small diffusion step counts, highlighting the potential for interactive applications and additional controls for multi-step planning.

Updated: 2024-05-27 20:57:19

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2405.17656v1

InversionView: A General-Purpose Method for Reading Information from Neural Activations

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations. In this paper, we argue that this information is embodied by the subset of inputs that give rise to similar activations. Computing such subsets is nontrivial as the input space is exponentially large. We propose InversionView, which allows us to practically inspect this subset by sampling from a trained decoder model conditioned on activations. This helps uncover the information content of activation vectors, and facilitates understanding of the algorithms implemented by transformer models. We present three case studies where we investigate models ranging from small transformers to GPT-2. In these studies, we demonstrate the characteristics of our method, show the distinctive advantages it offers, and provide causally verified circuits.

Updated: 2024-05-27 20:53:22

Categories: cs.LG

Download: http://arxiv.org/abs/2405.17653v1

Bivariate Causal Discovery using Bayesian Model Selection

Much of the causal discovery literature prioritises guaranteeing the identifiability of causal direction in statistical models. For structures within a Markov equivalence class, this requires strong assumptions which may not hold in real-world datasets, ultimately limiting the usability of these methods. Building on previous attempts, we show how to incorporate causal assumptions within the Bayesian framework. Identifying causal direction then becomes a Bayesian model selection problem. This enables us to construct models with realistic assumptions, and consequently allows for the differentiation between Markov equivalent causal structures. We analyse why Bayesian model selection works in situations where methods based on maximum likelihood fail. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. We then outperform previous methods on a wide range of benchmark datasets with varying data generating assumptions.

Updated: 2024-05-27 20:43:50

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2306.02931v2

Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity

We introduce "Future You," an interactive, brief, single-session digital chat intervention designed to improve future self-continuity--the degree of connection an individual feels with a temporally distant future self--a characteristic that is positively related to mental health and wellbeing. Our system allows users to chat with a relatable yet AI-powered virtual version of their future selves that is tuned to their future goals and personal qualities. To make the conversation realistic, the system generates a "synthetic memory"--a unique backstory for each user--that creates a throughline between the user's present age (between 18 and 30) and their life at age 60. The "Future You" character also adopts the persona of an age-progressed image of the user's present self. After a brief interaction with the "Future You" character, users reported decreased anxiety and increased future self-continuity. This is the first study to successfully demonstrate the use of personalized AI-generated characters to improve users' future self-continuity and wellbeing.

Updated: 2024-05-27 20:39:30

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2405.12514v2

Accelerating Transformer Pre-training with 2:4 Sparsity

Training large transformers is slow, but recent innovations on GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In light of this property, we comprehensively investigate the feasibility of accelerating feed-forward networks (FFNs) of transformers in pre-training. First, we define a "flip rate" to monitor the stability of a 2:4 training process. Utilizing this metric, we propose three techniques to preserve accuracy: modifying the sparse-refined straight-through estimator by applying a masked decay term on gradients, determining a feasible decay factor in the warm-up stage, and enhancing the model's quality with a dense fine-tuning procedure near the end of pre-training. Besides, we devise two techniques to practically accelerate training: computing transposable 2:4 masks by convolution, and accelerating gated activation functions by reducing GPU L2 cache misses. Experiments show that our 2:4 sparse training algorithm achieves convergence similar to dense training algorithms on several transformer pre-training tasks, while actual acceleration is clearly observed across different shapes of transformer blocks. Our toolkit is available at https://github.com/huyz2023/2by4-pretrain.
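
Setting aside the transposable-mask and kernel details, the basic 2:4 pattern keeps the two largest-magnitude weights in every contiguous group of four. A minimal sketch (assuming the tensor's element count is divisible by 4):

    import torch

    def mask_2_4(weight: torch.Tensor) -> torch.Tensor:
        # Keep the 2 largest-magnitude entries per group of 4, the
        # fine-grained pattern Ampere sparse tensor cores accelerate.
        groups = weight.reshape(-1, 4)
        idx = groups.abs().topk(2, dim=1).indices
        mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, idx, True)
        return mask.reshape(weight.shape)

    w = torch.randn(8, 8)
    print(mask_2_4(w).sum(dim=1))  # every row keeps exactly half its entries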

Updated: 2024-05-27 20:34:44

Categories: cs.LG

Download: http://arxiv.org/abs/2404.01847v2

Disentanglement Learning via Topology

We propose TopDis (Topological Disentanglement), a method for learning disentangled representations by adding a multi-scale topological loss term. Disentanglement is a crucial property of data representations, essential for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art methods are based on VAEs and encourage the joint distribution of latent variables to be factorized. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifold traversals. To the best of our knowledge, our paper is the first to propose a differentiable topological loss for disentanglement learning. Our experiments have shown that the proposed TopDis loss improves disentanglement scores such as MIG, FactorVAE score, SAP score, and DCI disentanglement score with respect to state-of-the-art results while preserving the reconstruction quality. Our method works in an unsupervised manner, permitting us to apply it to problems without labeled factors of variation. The TopDis loss works even when factors of variation are correlated. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.

Updated: 2024-05-27 20:33:56

Categories: cs.LG

Download: http://arxiv.org/abs/2308.12696v3

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Growing regulatory and societal pressures demand increased transparency in AI, particularly in understanding the decisions made by complex machine learning models. Counterfactual Explanations (CFs) have emerged as a promising technique within Explainable AI (xAI), offering insights into individual model predictions. However, to understand the systemic biases and disparate impacts of AI models, it is crucial to move beyond local CFs and embrace global explanations, which offer a holistic view across diverse scenarios and populations. Unfortunately, generating Global Counterfactual Explanations (GCEs) faces challenges in computational complexity, defining the scope of "global," and ensuring the explanations are both globally representative and locally plausible. We introduce a novel unified approach for generating Local, Group-wise, and Global Counterfactual Explanations for differentiable classification models via gradient-based optimization to address these challenges. This framework aims to bridge the gap between individual and systemic insights, enabling a deeper understanding of model decisions and their potential impact on diverse populations. Our approach further innovates by incorporating a probabilistic plausibility criterion, enhancing actionability and trustworthiness. By offering a cohesive solution to the optimization and plausibility challenges in GCEs, our work significantly advances the interpretability and accountability of AI models, marking a step forward in the pursuit of transparent AI.

Updated: 2024-05-27 20:32:09

Categories: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2405.17642v1

CausalCite: A Causal Formulation of Paper Citations

Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
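
A stripped-down version of the TextMatch counterfactual construction, assuming papers are already encoded as L2-normalized LLM embeddings (the variable names and top-k cutoff are ours):

    import numpy as np

    def textmatch_counterfactual(paper_emb, candidate_embs, k=10):
        # Cosine similarity reduces to a dot product for unit-norm vectors.
        sims = candidate_embs @ paper_emb
        top = np.argsort(sims)[-k:]        # k most similar papers
        w = sims[top] / sims[top].sum()    # similarity-based weights
        return w @ candidate_embs[top]     # synthetic counterfactual embedding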

Updated: 2024-05-27 20:31:14

Categories: cs.CL,cs.AI,cs.CY,cs.IR,cs.LG

Download: http://arxiv.org/abs/2311.02790v3

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD.
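
For background on why the sampling scheme matters: for a single invocation of a pure eps-DP mechanism, Poisson subsampling at rate q amplifies privacy to eps' = log(1 + q*(exp(eps) - 1)). The sketch below computes this classical bound; it is not the numerical accountant the paper analyzes, and no equally simple closed form covers sampling without replacement under composition, which is precisely the paper's warning.

    import math

    def poisson_amplified_eps(eps: float, q: float) -> float:
        # Amplification by Poisson subsampling: single release, pure eps-DP.
        return math.log(1.0 + q * (math.exp(eps) - 1.0))

    print(poisson_amplified_eps(2.0, 0.01))  # ~0.062: subsampling shrinks eps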

Updated: 2024-05-27 20:30:12

Categories: cs.CR,cs.DS,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.20769v1

Cross-Modal Safety Alignment: Is textual unlearning all you need?

Recent studies reveal that integrating new modalities into Large Language Models (LLMs), such as Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques like Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF). While further SFT- and RLHF-based safety training can be conducted in multi-modal settings, collecting multi-modal training datasets poses a significant challenge. Inspired by the structural design of recent multi-modal models, where, regardless of the combination of input modalities, all inputs are ultimately fused into the language space, we aim to explore whether unlearning solely in the textual domain can be effective for cross-modality safety alignment. Our evaluation across six datasets empirically demonstrates the transferability: textual unlearning in VLMs significantly reduces the Attack Success Rate (ASR) to less than 8% and, in some cases, even as low as nearly 2% for both text-based and vision-text-based attacks, alongside preserving utility. Moreover, our experiments show that unlearning with a multi-modal dataset offers no potential benefits but incurs significantly increased computational demands, possibly up to 6 times higher.

Updated: 2024-05-27 20:29:13

Categories: cs.CL,cs.CR,cs.LG

Download: http://arxiv.org/abs/2406.02575v1

Probabilistically Plausible Counterfactual Explanations with Normalizing Flows

We present PPCEF, a novel method for generating probabilistically plausible counterfactual explanations (CFs). PPCEF advances beyond existing methods by combining a probabilistic formulation that leverages the data distribution with the optimization of plausibility within a unified framework. Compared to reference approaches, our method enforces plausibility by directly optimizing the explicit density function without assuming a particular family of parametrized distributions. This ensures CFs are not only valid (i.e., achieve class change) but also align with the underlying data's probability density. For that purpose, our approach leverages normalizing flows as powerful density estimators to capture the complex high-dimensional data distribution. Furthermore, we introduce a novel loss that balances the trade-off between achieving class change and maintaining closeness to the original instance while also incorporating a probabilistic plausibility term. PPCEF's unconstrained formulation allows for efficient gradient-based optimization with batch processing, leading to orders of magnitude faster computation compared to prior methods. Moreover, the unconstrained formulation of PPCEF allows for the seamless integration of future constraints tailored to specific counterfactual properties. Finally, extensive evaluations demonstrate PPCEF's superiority in generating high-quality, probabilistically plausible counterfactual explanations in high-dimensional tabular settings. This makes PPCEF a powerful tool for not only interpreting complex machine learning models but also for improving fairness, accountability, and trust in AI systems.
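
A schematic of the kind of objective described, assuming a pretrained classifier clf and a normalizing flow flow exposing an exact log_prob over flat feature vectors (the interface, weights, and distance term here are our assumptions, not the paper's exact loss):

    import torch
    import torch.nn.functional as F

    def ppcef_like_loss(x_cf, x, target, clf, flow, lam=1.0, mu=1.0):
        validity = F.cross_entropy(clf(x_cf), target)    # push toward target class
        closeness = ((x_cf - x) ** 2).sum(dim=1).mean()  # stay near the original
        plausibility = -flow.log_prob(x_cf).mean()       # stay on the data density
        return validity + lam * closeness + mu * plausibility

Because each term is differentiable in x_cf, batches of counterfactuals can be optimized directly with any gradient-based optimizer, which is the source of the speedup the abstract mentions.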

Updated: 2024-05-27 20:24:03

Categories: cs.LG,cs.AI,stat.ME

Download: http://arxiv.org/abs/2405.17640v1

The surprising efficiency of temporal difference learning for rare event prediction

We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning, with an emphasis on estimation of quantities related to rare events. Policy evaluation is complicated in the rare event setting by the long timescale of the event and by the need for relative accuracy in estimates of very small values. Specifically, we focus on least-squares TD (LSTD) prediction for finite state Markov chains, and show that LSTD can achieve relative accuracy far more efficiently than MC. We prove a central limit theorem for the LSTD estimator and upper bound the relative asymptotic variance by simple quantities characterizing the connectivity of states relative to the transition probabilities between them. Using this bound, we show that, even when both the timescale of the rare event and the relative accuracy of the MC estimator are exponentially large in the number of states, LSTD maintains a fixed level of relative accuracy with a total number of observed transitions of the Markov chain that is only polynomially large in the number of states.
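
For concreteness, LSTD(0) on a finite-state chain reduces to solving a d x d linear system assembled from observed transitions. A minimal sketch (the ridge term and feature matrix are our choices; tabular one-hot features recover the exact tabular estimator):

    import numpy as np

    def lstd(features, transitions, rewards, gamma=0.99, ridge=1e-6):
        # features: (num_states, d); transitions: iterable of (s, s_next) pairs;
        # rewards: reward observed on each transition.
        d = features.shape[1]
        A = ridge * np.eye(d)
        b = np.zeros(d)
        for (s, s_next), r in zip(transitions, rewards):
            phi, phi_next = features[s], features[s_next]
            A += np.outer(phi, phi - gamma * phi_next)
            b += r * phi
        theta = np.linalg.solve(A, b)
        return features @ theta  # estimated value of every state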

Updated: 2024-05-27 20:18:20

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17638v1

The Economic Implications of Large Language Model Selection on Earnings and Return on Investment: A Decision Theoretic Model

Selecting language models in business contexts requires a careful analysis of the final financial benefits of the investment. However, academic and industry analyses of LLMs emphasize performance alone. This work introduces a framework to evaluate LLMs, focusing on the earnings and return on investment aspects that should be taken into account in business decision making. We use a decision-theoretic approach to compare the financial impact of different LLMs, considering variables such as the cost per token, the probability of success in the specific task, and the gains and losses associated with LLM use. The study reveals how the superior accuracy of more expensive models can, under certain conditions, justify a greater investment through more significant earnings, but not necessarily a larger RoI. This article provides a framework for companies looking to optimize their technology choices, ensuring that investment in cutting-edge technology aligns with strategic financial objectives. In addition, we discuss how changes in operational variables influence the economics of using LLMs, offering practical insights for enterprise settings; we find that the predicted gain and loss, together with the different probabilities of success and failure, are the variables that most affect the sensitivity of the models.
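
The core decision-theoretic comparison reduces to a few lines. The numbers below are illustrative only, but they reproduce the qualitative finding: the pricier, more accurate model earns more in expectation while the cheaper one delivers the larger RoI.

    def expected_earnings_and_roi(p_success, gain, loss, tokens, cost_per_token):
        cost = tokens * cost_per_token
        earnings = p_success * gain - (1.0 - p_success) * loss - cost
        return earnings, earnings / cost

    print(expected_earnings_and_roi(0.95, 10.0, 5.0, 1000, 3e-5))  # (9.22, ~307)
    print(expected_earnings_and_roi(0.85, 10.0, 5.0, 1000, 5e-6))  # (7.745, ~1549)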

Updated: 2024-05-27 20:08:41

Categories: cs.AI,cs.CE,I.2.m; K.6.1

Download: http://arxiv.org/abs/2405.17637v1

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences derived from the same prompts, and it functions without needing an additional reward model. However, DPO does not fully reflect the complex nature of human learning, which often involves understanding contrasting responses to not only identical but also similar questions. To overcome this shortfall, we propose Relative Preference Optimization (RPO). RPO is designed to discern between more and less preferred responses derived from both identical and related prompts. It introduces a contrastive weighting mechanism, enabling the tuning of LLMs using a broader range of preference data, including both paired and unpaired sets. This approach expands the learning capabilities of the model, allowing it to leverage insights from a more varied set of prompts. Through empirical tests, including dialogue and summarization tasks, and evaluations using the AlpacaEval2.0 leaderboard, RPO has demonstrated a superior ability to align LLMs with user preferences and to improve their adaptability during the training process. Our code can be viewed at https://github.com/yinyueqin/relative-preference-optimization
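
Schematically, RPO generalizes the DPO comparison by also contrasting responses drawn from merely similar prompts and weighting each pair accordingly. The sketch below is our reading, with the contrastive weighting simplified to a precomputed scalar per pair:

    import torch
    import torch.nn.functional as F

    def rpo_like_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, prompt_sim, beta=0.1):
        # logp_*: policy log-probs of preferred (w) / dispreferred (l) responses;
        # ref_logp_*: the same under a frozen reference model;
        # prompt_sim in [0, 1]: 1.0 recovers a DPO-style same-prompt pair.
        margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
        return -(prompt_sim * F.logsigmoid(margin)).mean()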

Updated: 2024-05-27 20:05:03

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.10958v2

Soft Preference Optimization: Aligning Language Models to Expert Distributions

We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than limiting it to the preference dataset. Although SPO does not require the assumption of an existing underlying reward model, we demonstrate that, under the Bradley-Terry (BT) model assumption, it converges to a softmax of scaled rewards, with the distribution's "softness" adjustable via the softmax exponent, an algorithm parameter. We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.

Updated: 2024-05-27 19:59:00

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.00747v3

Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning

Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a result, parsimonious parametric models have been adopted to approximate VFs in high-dimensional spaces, with most efforts focused on linear and neural-network-based approaches. Differently, this paper puts forth a parsimonious non-parametric approach, where we use stochastic low-rank algorithms to estimate the VF matrix in an online and model-free fashion. Furthermore, as VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor (multi-way array) representation and, then, use the PARAFAC decomposition to design an online model-free tensor low-rank algorithm. Different versions of the algorithms are proposed, their complexity is analyzed, and their performance is assessed numerically using standardized RL environments.
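
To make the low-rank idea concrete: with Q approximated as L @ R.T (L: num_states x k, R: num_actions x k), a stochastic TD-style update touches only the two factor rows involved. A minimal sketch, not the paper's exact algorithm:

    import numpy as np

    def lowrank_q_update(L, R, s, a, r, s_next, alpha=0.1, gamma=0.99):
        q_sa = L[s] @ R[a]
        target = r + gamma * np.max(L[s_next] @ R.T)  # greedy bootstrap
        td = target - q_sa
        L_s_old = L[s].copy()
        L[s] += alpha * td * R[a]     # gradient step on the state factor
        R[a] += alpha * td * L_s_old  # and on the action factor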

Updated: 2024-05-27 19:58:52

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2201.09736v3

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

Agents based on large language models have shown great potential in accelerating scientific discovery by leveraging their rich background knowledge and reasoning capabilities. Here, we develop BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. We demonstrate our agent on the problem of designing genetic perturbation experiments, where the aim is to find a small subset out of many possible genes that, when perturbed, result in a specific phenotype (e.g., cell growth). Utilizing its biological knowledge, BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model or explicitly design an acquisition function. Moreover, BioDiscoveryAgent achieves an average of 18% improvement in detecting desired phenotypes across five datasets, compared to existing Bayesian optimization baselines specifically trained for this task. Our evaluation includes one dataset that is unpublished, ensuring it is not part of the language model's training data. Additionally, BioDiscoveryAgent predicts gene combinations to perturb twice as accurately as a random baseline, a task so far not explored in the context of closed-loop experiment design. The agent also has access to tools for searching the biomedical literature, executing code to analyze biological datasets, and prompting another agent to critically evaluate its predictions. Overall, BioDiscoveryAgent is interpretable at every stage, representing an accessible new paradigm in the computational design of biological experiments with the potential to augment scientists' capabilities.

Updated: 2024-05-27 19:57:17

Categories: cs.AI,cs.CE,cs.MA

Download: http://arxiv.org/abs/2405.17631v1

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fine-tuning, distillation, and other model-in-the-loop research workflows. However, challenges arise when using these models that stem from their scale, their closed source nature, and the lack of standardized tooling for these new and emerging workflows. The rapid rise to prominence of these models and these unique challenges has had immediate adverse impacts on open science and on the reproducibility of work that uses them. In this paper, we introduce DataDreamer, an open source Python library that allows researchers to write simple code to implement powerful LLM workflows. DataDreamer also helps researchers adhere to best practices that we propose to encourage open science and reproducibility. The library and documentation are available at https://github.com/datadreamer-dev/DataDreamer .

Updated: 2024-05-27 19:54:44

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.10379v2

MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters

This paper addresses the challenge of optimizing meta-parameters (i.e., hyperparameters) in machine learning algorithms, a critical factor influencing training efficiency and model performance. Moving away from the computationally expensive traditional meta-parameter search methods, we introduce MetaOptimize framework that dynamically adjusts meta-parameters, particularly step sizes (also known as learning rates), during training. More specifically, MetaOptimize can wrap around any first-order optimization algorithm, tuning step sizes on the fly to minimize a specific form of regret that accounts for long-term effect of step sizes on training, through a discounted sum of future losses. We also introduce low complexity variants of MetaOptimize that, in conjunction with its adaptability to multiple optimization algorithms, demonstrate performance competitive to those of best hand-crafted learning rate schedules across various machine learning applications.
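
MetaOptimize's regret-based derivation is more involved, but the flavor of wrapping a base optimizer with an online step-size update can be conveyed by a simple hypergradient-style rule that grows the step size when successive gradients agree. This is a stand-in for illustration, not the paper's update:

    import numpy as np

    def sgd_with_adaptive_step(grad_fn, w, alpha=1e-3, meta_lr=1e-4, steps=1000):
        g_prev = np.zeros_like(w)
        for _ in range(steps):
            g = grad_fn(w)
            # Meta step: increase alpha if g and g_prev point the same way.
            alpha = max(alpha + meta_lr * float(g @ g_prev), 1e-8)
            w = w - alpha * g
            g_prev = g
        return w, alpha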

Updated: 2024-05-27 19:52:56

Categories: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2402.02342v4

Tensor Low-rank Approximation of Finite-horizon Value Functions

The goal of reinforcement learning is estimating a policy that maps states to actions and maximizes the cumulative reward of a Markov Decision Process (MDP). This is oftentimes achieved by estimating first the optimal (reward) value function (VF) associated with each state-action pair. When the MDP has an infinite horizon, the optimal VFs and policies are stationary under mild conditions. However, in finite-horizon MDPs, the VFs (hence, the policies) vary with time. This poses a challenge since the number of VFs to estimate grows not only with the size of the state-action space but also with the time horizon. This paper proposes a non-parametric low-rank stochastic algorithm to approximate the VFs of finite-horizon MDPs. First, we represent the (unknown) VFs as a multi-dimensional array, or tensor, where time is one of the dimensions. Then, we use rewards sampled from the MDP to estimate the optimal VFs. More precisely, we use the (truncated) PARAFAC decomposition to design an online low-rank algorithm that recovers the entries of the tensor of VFs. The size of the low-rank PARAFAC model grows additively with respect to each of its dimensions, rendering our approach efficient, as demonstrated via numerical experiments.
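
A minimal sketch of the tensor idea: represent Q(s, a, h) with rank-k PARAFAC factors, one per dimension including time, so storage grows additively in states, actions, and horizon. The TD-style factor updates below are our simplification of the paper's online algorithm:

    import numpy as np

    def q_hat(S, A, T, s, a, h):
        # PARAFAC: Q(s, a, h) = sum_k S[s, k] * A[a, k] * T[h, k]
        return float(np.sum(S[s] * A[a] * T[h]))

    def td_update(S, A, T, s, a, h, r, s_next, alpha=0.05):
        num_actions, H = A.shape[0], T.shape[0]
        bootstrap = 0.0 if h + 1 == H else max(
            q_hat(S, A, T, s_next, b, h + 1) for b in range(num_actions))
        td = r + bootstrap - q_hat(S, A, T, s, a, h)
        gS, gA, gT = A[a] * T[h], S[s] * T[h], S[s] * A[a]  # partial gradients
        S[s] += alpha * td * gS
        A[a] += alpha * td * gA
        T[h] += alpha * td * gT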

Updated: 2024-05-27 19:52:00

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17628v1

Salutary Labeling with Zero Human Annotation

Active learning strategically selects informative unlabeled data points and queries their ground truth labels for model training. The prevailing assumption underlying this machine learning paradigm is that acquiring these ground truth labels will optimally enhance model performance. However, this assumption may not always hold true or maximize learning capacity, particularly considering the costly labor annotations required for ground truth labels. In contrast to traditional ground truth labeling, this paper proposes salutary labeling, which automatically assigns the most beneficial labels to the most informative samples without human annotation. Specifically, we utilize the influence function, a tool for estimating sample influence, to select newly added samples and assign their salutary labels by choosing the category that maximizes their positive influence. This process eliminates the need for human annotation. Extensive experiments conducted on nine benchmark datasets demonstrate the superior performance of our salutary labeling approach over traditional active learning strategies. Additionally, we provide several in-depth explorations and practical applications of large language model (LLM) fine-tuning.

Updated: 2024-05-27 19:49:18

Categories: cs.LG

Download: http://arxiv.org/abs/2405.17627v1

Matrix Low-Rank Approximation For Policy Gradient Methods

Estimating a policy that maps states to actions is a central problem in reinforcement learning. Traditionally, policies are inferred from the so called value functions (VFs), but exact VF computation suffers from the curse of dimensionality. Policy gradient (PG) methods bypass this by learning directly a parametric stochastic policy. Typically, the parameters of the policy are estimated using neural networks (NNs) tuned via stochastic gradient descent. However, finding adequate NN architectures can be challenging, and convergence issues are common as well. In this paper, we put forth low-rank matrix-based models to estimate efficiently the parameters of PG algorithms. We collect the parameters of the stochastic policy into a matrix, and then, we leverage matrix-completion techniques to promote (enforce) low rank. We demonstrate via numerical studies how low-rank matrix-based policy models reduce the computational and sample complexities relative to NN models, while achieving a similar aggregated reward.

Updated: 2024-05-27 19:49:08

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17626v1

Matrix Low-Rank Trust Region Policy Optimization

Most methods in reinforcement learning use a Policy Gradient (PG) approach to learn a parametric stochastic policy that maps states to actions. The standard approach is to implement such a mapping via a neural network (NN) whose parameters are optimized using stochastic gradient descent. However, PG methods are prone to large policy updates that can render learning inefficient. Trust region algorithms, like Trust Region Policy Optimization (TRPO), constrain the policy update step, ensuring monotonic improvements. This paper introduces low-rank matrix-based models as an efficient alternative for estimating the parameters of TRPO algorithms. By gathering the stochastic policy's parameters into a matrix and applying matrix-completion techniques, we promote and enforce low rank. Our numerical studies demonstrate that low-rank matrix-based policy models effectively reduce both computational and sample complexities compared to NN models, while maintaining comparable aggregated rewards.

Updated: 2024-05-27 19:46:31

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17625v1

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.

Updated: 2024-05-27 19:40:04

Categories: cs.LG

Download: http://arxiv.org/abs/2402.08733v2

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay $d_{\mathrm{max}}$ and had a linear dependence of the regret on it, our algorithm can tolerate arbitrary excessive delays up to order $T$ (where $T$ is the time horizon). The algorithm is based on three technical innovations, which may all be of independent interest: (1) We introduce the first implicit exploration scheme that works in best-of-both-worlds setting. (2) We introduce the first control of distribution drift that does not rely on boundedness of delays. The control is based on the implicit exploration scheme and adaptive skipping of observations with excessive delays. (3) We introduce a procedure relating standard regret with drifted regret that does not rely on boundedness of delays. At the conceptual level, we demonstrate that complexity of best-of-both-worlds bandits with delayed feedback is characterized by the amount of information missing at the time of decision making (measured by the number of outstanding observations) rather than the time that the information is missing (measured by the delays).

Updated: 2024-05-27 19:30:57

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2308.10675v2

Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales

Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) can introduce additional difficulty. Differing preferences can complicate the alignment process, and prediction errors in a trained reward model can become more severe as the LLM generates unseen outputs. To enhance training robustness, RL has adopted techniques from supervised learning, such as ensembles and layer normalization. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss. We demonstrate performance improvements across various tasks and scales. We conduct experiments in discrete action tasks (Atari games) and continuous action space tasks (MuJoCo benchmark and Box2D) using Symmetric A2C (SA2C) and Symmetric PPO (SPPO), with and without added noise, observing especially notable performance for SPPO across different hyperparameters. Furthermore, we validate the benefits of the symmetric RL loss when using SPPO for large language models through improved performance in RLHF tasks, such as the IMDB positive sentiment and TL;DR summarization tasks.
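
The supervised-learning ingredient being adapted is the symmetric cross entropy of Wang et al. (CE plus a reverse-CE term that is robust to noisy targets). A sketch of the loss itself; how it is substituted into the A2C/PPO objectives is paper-specific:

    import torch
    import torch.nn.functional as F

    def symmetric_ce(logits, targets, a=1.0, b=0.1, label_floor=1e-4):
        ce = F.cross_entropy(logits, targets)
        pred = F.softmax(logits, dim=1)
        onehot = F.one_hot(targets, logits.size(1)).float().clamp(min=label_floor)
        rce = -(pred * onehot.log()).sum(dim=1).mean()  # reverse cross entropy
        return a * ce + b * rce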

Updated: 2024-05-27 19:28:33

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17618v1

Listenable Maps for Zero-Shot Audio Classifiers

Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that maximizes the faithfulness to the original similarity between a given text-and-audio pair. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.

Updated: 2024-05-27 19:25:42

Categories: cs.SD,cs.LG,eess.AS,eess.SP

Download: http://arxiv.org/abs/2405.17615v1

A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem from the lens of generative models where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods focusing only on one type of modality dependency.

Updated: 2024-05-27 19:22:41

Categories: cs.CV,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.17613v1

A note on the error analysis of data-driven closure models for large eddy simulations of turbulence

In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation must be close to that of a subsampled direct numerical simulation, we derive an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and by the Jacobian, which amplify the initial one-step error incurred by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and with the upper bound of the system Jacobian, which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.

Updated: 2024-05-27 19:20:22

标题: 关于数据驱动封闭模型在湍流大涡模拟中的误差分析的注记

摘要: 在这项工作中,我们提供了一个数学形式化的方法,用于利用数据驱动的湍流封闭建模进行流轨迹预测中的误差传播。在假设大涡模拟预测的状态必须接近子采样直接数值模拟的状态的前提下,我们得出了利用数据驱动封闭模型时的预测误差的上限。我们还证明了这种误差受到时间步长和雅可比矩阵的显著影响,这些因素在放大使用封闭模型造成的初始单步误差中起到作用。我们的分析还显示,误差随着滚动时间和系统雅可比矩阵的上限呈指数级传播,而系统雅可比矩阵本身受到封闭公式雅可比矩阵的影响。这些发现可以促进基于识别出的误差上限项的新正则化技术的发展,提高模型的鲁棒性并减少误差传播。

更新时间: 2024-05-27 19:20:22

领域: physics.flu-dyn,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2405.17612v1

Explainable machine learning multi-label classification of Spanish legal judgements

Artificial Intelligence techniques such as Machine Learning (ML) have not been exploited to their maximum potential in the legal domain. This has been partially due to the insufficient explanations they provided about their decisions. Automatic expert systems with explanatory capabilities can be especially useful when legal practitioners search jurisprudence to gather contextual knowledge for their cases. Therefore, we propose a hybrid system that applies ML for multi-label classification of judgements (sentences) and visual and natural language descriptions for explanation purposes, boosted by Natural Language Processing techniques and deep legal reasoning to identify the entities involved, such as the parties. We are not aware of any prior work on automatic multi-label classification of legal judgements that also provides natural language explanations to end-users with comparable overall quality. Our solution achieves over 85% micro precision on a labelled data set annotated by legal experts. This supports its value in relieving human experts of monotonous, labour-intensive legal classification tasks.

Updated: 2024-05-27 19:16:42

标题: 可解释的机器学习多标签分类:西班牙法律判决

摘要: 人工智能技术,如机器学习(ML),在法律领域尚未充分发挥其潜力。部分原因是它们没有提供关于决策的充分解释。具有解释能力的自动专家系统在法律实践者搜索判例以获取案例背景知识时特别有用。因此,我们提出了一个混合系统,应用ML对判决(句子)进行多标签分类,并利用视觉和自然语言描述进行解释,辅以自然语言处理技术和深度法律推理来识别涉及的实体,如相关方。据我们所知,此前尚无在对法律判决进行自动多标签分类的同时,以可比的整体质量向最终用户提供自然语言解释的工作。我们的解决方案在由法律专家注释的标记数据集上实现了超过85%的微观精确率。这证明了它在将人类专家从单调、劳动密集的法律分类任务中解放出来方面的价值。

更新时间: 2024-05-27 19:16:42

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17610v1

Analysis of Internet of Things Implementation Barriers in the Cold Supply Chain: An Integrated ISM-MICMAC and DEMATEL Approach

Integrating Internet of Things (IoT) technology inside the cold supply chain can enhance transparency, efficiency, and quality, optimizing operating procedures and increasing productivity. The integration of IoT in this complicated setting is hindered by specific barriers that require thorough examination. Prominent barriers to IoT implementation in the cold supply chain are identified using a two-stage model. After reviewing the available literature on the topic of IoT implementation, a total of 13 barriers were found. The survey data was cross-validated for quality, and Cronbach's alpha test was employed to ensure validity. This research applies the interpretative structural modeling technique in the first phase to identify the main barriers. Among those barriers, "regulatory compliance" and "cold chain networks" are key drivers for IoT adoption strategies. MICMAC's driving and dependence power element categorization helps evaluate the barrier interactions. In the second phase of this research, a decision-making trial and evaluation laboratory methodology was employed to identify causal relationships between barriers and evaluate them according to their relative importance. Each cause is a potential driver, and if its efficiency can be enhanced, the system as a whole benefits. The research findings provide industry stakeholders, governments, and organizations with significant drivers of IoT adoption to overcome these barriers and optimize the utilization of IoT technology to improve the effectiveness and reliability of the cold supply chain.

Updated: 2024-05-27 19:13:33

标题: 在冷链供应链中物联网实施障碍的分析:一种整合的ISM-MICMAC和DEMATEL方法

摘要: 将物联网(IoT)技术整合到冷链供应链中可以提高透明度、效率和质量,优化运营流程并提高生产力。在这种复杂环境中整合物联网受到特定障碍的阻碍,这些障碍需要进行彻底的审查。使用两阶段模型确定了冷链供应链中物联网实施的突出障碍。在审查有关物联网实施主题的现有文献后,总共发现了13个障碍。调查数据进行了交叉验证以确保质量,并采用克朗巴赫α检验以确保有效性。本研究在第一阶段应用解释结构建模技术以确定主要障碍。在这些障碍中,"法规合规性"和"冷链网络"是物联网采用战略的关键推动因素。MICMAC的驱动和依赖力量元素分类有助于评估障碍之间的相互作用。在本研究的第二阶段,采用决策试验和评估实验室方法确定障碍之间的因果关系,并根据它们的相对重要性进行评估。每个原因都是潜在的推动力,如果其效率可以提高,整个系统将受益。研究结果为行业利益相关者、政府和组织提供了重要的物联网采用驱动因素,以克服这些障碍并优化物联网技术的利用,以提高冷链供应链的效率和可靠性。

更新时间: 2024-05-27 19:13:33

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2402.01804v3

Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations

Popularity bias in music recommendation systems -- where artists and tracks with the highest listen counts are recommended more often -- can also propagate biases along demographic and cultural axes. In this work, we identify these biases in recommendations for artists from underrepresented cultural groups in prototype-based matrix factorization methods. Unlike traditional matrix factorization methods, prototype-based approaches are interpretable. This allows us to directly link the observed bias in recommendations for minority artists (the effect) to specific properties of the embedding space (the cause). We mitigate popularity bias in music recommendation through capturing both users' and songs' cultural nuances in the embedding space. To address these challenges while maintaining recommendation quality, we propose two novel enhancements to the embedding space: i) we propose an approach to filter-out the irrelevant prototypes used to represent each user and item to improve generalizability, and ii) we introduce regularization techniques to reinforce a more uniform distribution of prototypes within the embedding space. Our results demonstrate significant improvements in reducing popularity bias and enhancing demographic and cultural fairness in music recommendations while achieving competitive -- if not better -- overall performance.

Updated: 2024-05-27 19:12:53

标题: 推动文化包容性:优化嵌入式空间以实现平衡的音乐推荐

摘要: 音乐推荐系统中的流行偏见——即更频繁推荐拥有最高收听次数的艺术家和音轨——也可能沿着人口统计和文化轴线传播偏见。在这项工作中,我们在基于原型的矩阵分解方法中识别了对来自代表性不足的文化群体的艺术家的推荐中存在的这些偏见。与传统的矩阵分解方法不同,基于原型的方法是可解释的。这使我们能够直接将对少数艺术家推荐中观察到的偏见(效果)与嵌入空间的特定属性(原因)联系起来。我们通过在嵌入空间中捕捉用户和歌曲的文化细微差别来减轻音乐推荐中的流行偏见。为了解决这些挑战并保持推荐质量,我们提出了对嵌入空间的两种新改进:i) 我们提出一种方法来过滤出用于表示每个用户和项目的无关原型,以提高泛化能力,ii) 我们引入正则化技术来加强嵌入空间内原型的更均匀分布。我们的结果表明,在减少流行偏见并增强音乐推荐中的人口统计和文化公平性的同时,实现了竞争性——如果不是更好的——整体性能的显著改进。

更新时间: 2024-05-27 19:12:53

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17607v1

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

The recent trend in scaling language models has led to a growing demand for parameter-efficient tuning (PEFT) methods such as LoRA (Low-Rank Adaptation). LoRA consistently matches or surpasses the full fine-tuning baseline with fewer parameters. However, handling numerous task-specific or user-specific LoRA modules on top of a base model still presents significant storage challenges. To address this, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel approach leveraging Singular Value Decomposition (SVD) for parameter-efficient fine-tuning. LoRA-XS introduces a small r x r weight matrix between frozen LoRA matrices, which are constructed by SVD of the original weight matrix. Training only r x r weight matrices ensures independence from model dimensions, enabling more parameter-efficient fine-tuning, especially for larger models. LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA. Our benchmarking across various scales, including GLUE, GSM8k, and MATH benchmarks, shows that our approach outperforms LoRA and recent state-of-the-art approaches like VeRA in terms of parameter efficiency while maintaining competitive performance.
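
As a rough illustration of the mechanism described above (a sketch, not the authors' code; all dimensions are toy values), the frozen factors come from a truncated SVD of the pretrained weight and only an r x r core is trained:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight matrix

# Frozen LoRA-style factors built from the truncated SVD of W.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]                 # d_out x r, frozen
B = Vt[:r, :]                        # r x d_in,  frozen

R = np.zeros((r, r))                 # the only trainable parameters (r x r)

def adapted_forward(x):
    # Adapted layer: frozen base weight plus the tiny trained update A @ R @ B.
    return (W + A @ R @ B) @ x

print(adapted_forward(rng.normal(size=d_in)).shape)   # (64,)
```

The trainable count is r^2 per adapted matrix regardless of the layer's width, which is where the independence from model dimensions, and hence the large parameter reduction relative to LoRA, comes from.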

Updated: 2024-05-27 19:07:13

标题: LoRA-XS:极小参数量的低秩适应

摘要: 最近在扩展语言模型方面的趋势导致对参数高效微调(PEFT)方法的需求增长,例如LoRA(低秩适应)。LoRA始终能够以更少的参数匹配或超越完全微调的基线。然而,在基础模型之上处理众多任务特定或用户特定的LoRA模块仍然存在重大存储挑战。为了解决这个问题,我们引入了LoRA-XS(具有极少参数的低秩适应),这是一种利用奇异值分解(SVD)进行参数高效微调的新方法。LoRA-XS在冻结的LoRA矩阵之间引入了一个小的r x r权重矩阵,这些冻结矩阵是通过对原始权重矩阵进行SVD构建的。仅训练r x r权重矩阵确保独立于模型维度,实现更加参数高效的微调,特别是对于更大的模型。与LoRA相比,LoRA-XS在7B模型中实现了可训练参数超过100倍的可观减少。我们在各种规模上进行了基准测试,包括GLUE、GSM8k和MATH基准测试,结果显示我们的方法在参数效率方面优于LoRA和最近的VeRA等最新技术方法,同时保持竞争性能。

更新时间: 2024-05-27 19:07:13

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.17604v1

SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks

Improving the efficiency of state-of-the-art methods in semantic segmentation requires overcoming the increasing computational cost as well as issues such as fusing semantic information from global and local contexts. Based on the recent success and problems that convolutional neural networks (CNNs) encounter in semantic segmentation, this research proposes an encoder-decoder architecture with a unique efficient residual network, Efficient-ResNet. Attention-boosting gates (AbGs) and attention-boosting modules (AbMs) are deployed by aiming to fuse the equivariant and feature-based semantic information with the equivalent sizes of the output of global context of the efficient residual network in the encoder. Respectively, the decoder network is developed with the additional attention-fusion networks (AfNs) inspired by AbM. AfNs are designed to improve the efficiency in the one-to-one conversion of the semantic information by deploying additional convolution layers in the decoder part. Our network is tested on the challenging CamVid and Cityscapes datasets, and the proposed methods reveal significant improvements on the residual networks. To the best of our knowledge, the developed network, SERNet-Former, achieves state-of-the-art results (84.62 % mean IoU) on CamVid dataset and challenging results (87.35 % mean IoU) on Cityscapes validation dataset.

Updated: 2024-05-27 19:05:00

标题: SERNet-Former:具有注意力增强门和注意力融合网络的高效残差网络进行语义分割

摘要: 提高最先进的语义分割方法的效率需要克服不断增加的计算成本,以及融合来自全局和局部上下文的语义信息等问题。基于卷积神经网络(CNN)近期在语义分割中取得的成功和遇到的问题,本研究提出了一种具有独特高效残差网络Efficient-ResNet的编码器-解码器架构。我们部署了注意力增强门(AbGs)和注意力增强模块(AbMs),旨在将等变的、基于特征的语义信息与编码器中高效残差网络输出的同等大小的全局上下文信息相融合。相应地,解码器网络采用了受AbM启发的额外注意力融合网络(AfNs)。AfNs旨在通过在解码器部分部署额外的卷积层来提高语义信息一对一转换的效率。我们的网络在具有挑战性的CamVid和Cityscapes数据集上进行了测试,所提出的方法在残差网络上显示出显著的改进。据我们所知,所开发的网络SERNet-Former在CamVid数据集上取得了最先进的结果(84.62%的平均IoU),在Cityscapes验证数据集上取得了有挑战性的结果(87.35%的平均IoU)。

更新时间: 2024-05-27 19:05:00

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.15741v5

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under the category of embedded sequence parallelism, which are limited to shard along a single sequence dimension, thereby introducing significant communication overhead. However, the nature of multi-dimensional transformers involves independent calculations across multiple sequence dimensions. To this end, we propose Dynamic Sequence Parallelism (DSP) as a novel abstraction of sequence parallelism. DSP dynamically switches the parallel dimension among all sequences according to the computation stage with efficient resharding strategy. DSP offers significant reductions in communication costs, adaptability across modules, and ease of implementation with minimal constraints. Experimental evaluations demonstrate DSP's superiority over state-of-the-art embedded sequence parallelism methods by remarkable throughput improvements ranging from 32.2% to 10x, with less than 25% communication volume.

Updated: 2024-05-27 18:51:52

标题: DSP:多维Transformer的动态序列并行

摘要: 将多维Transformer扩展到长序列在各个领域都是必不可少的。然而,这种序列的大内存需求和慢速度的挑战需要序列并行处理。所有现有方法都属于嵌入式序列并行处理范畴,这些方法只能沿着单个序列维度进行分片,从而引入了显著的通信开销。然而,多维Transformer的性质涉及跨多个序列维度的独立计算。因此,我们提出动态序列并行处理(DSP)作为序列并行处理的新抽象。DSP根据计算阶段使用高效的重新分片策略动态切换所有序列中的并行维度。DSP在通信成本、跨模块的适应性和易于实现方面都提供了显著的优势,且约束极少。实验评估表明,DSP在吞吐量方面优于最先进的嵌入式序列并行处理方法,吞吐量提高了32.2%到10倍,通信量少于25%。

更新时间: 2024-05-27 18:51:52

领域: cs.DC,cs.LG

下载: http://arxiv.org/abs/2403.10266v2

A Method for Auto-Differentiation of the Voronoi Tessellation

Voronoi tessellation, also known as Voronoi diagram, is an important computational geometry technique that has applications in various scientific disciplines. It involves dividing a given space into regions based on the proximity to a set of points. Autodifferentiation is a powerful tool for solving optimization tasks. Autodifferentiation assumes constructing a computational graph that allows to compute gradients using backpropagation algorithm. However, often the Voronoi tessellation remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present the method for autodifferentiation of the 2D Voronoi tessellation. The method allows one to construct the Voronoi tessellation and pass gradients, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge this is the first autodifferentiable realization of the Voronoi tessellation providing full set of Voronoi geometrical parameters in a differentiable way.
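
The paper differentiates through the exact tessellation itself; as generic background on how Voronoi-style assignments can be made differentiable at all, here is a common softmax relaxation of nearest-site membership (a standard relaxation for intuition only, not the authors' algorithm):

```python
import numpy as np

def soft_voronoi_assignment(queries, sites, tau=0.1):
    """Softmax relaxation of hard Voronoi cell membership.

    Each query point is softly assigned to sites by distance; as tau -> 0
    the rows approach one-hot nearest-site indicators, and the softmax makes
    the assignment differentiable with respect to the site positions.
    """
    d2 = ((queries[:, None, :] - sites[None, :, :]) ** 2).sum(-1)  # (n_q, n_s)
    w = np.exp(-d2 / tau)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sites = rng.uniform(size=(5, 2))        # 2D generator points
queries = rng.uniform(size=(1000, 2))
probs = soft_voronoi_assignment(queries, sites)
print(probs.shape, probs.sum(axis=1)[:3])   # (1000, 5), rows sum to 1
```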

Updated: 2024-05-27 18:49:08

标题: 一种用于Voronoi镶嵌自动微分的方法

摘要: Voronoi tessellation,也称为Voronoi图,是一种重要的计算几何技术,在各种科学学科中都有应用。它涉及根据一组点的接近程度将给定空间划分为区域。自动微分是解决优化任务的强大工具。自动微分假设构建一个计算图,允许使用反向传播算法计算梯度。然而,通常Voronoi tessellation仍然是管道中唯一的不可微分部分,阻碍了端对端的微分。我们提出了一种用于2D Voronoi tessellation的自动微分方法。该方法允许构建Voronoi tessellation并传递梯度,使构建成为端对端可微分。我们提供了实现细节并提出了几个重要应用。据我们所知,这是Voronoi tessellation的第一个可自动微分实现,以可微分方式提供完整的Voronoi几何参数集。

更新时间: 2024-05-27 18:49:08

领域: cs.CG,cs.LG

下载: http://arxiv.org/abs/2312.16192v2

Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification

Making inferences in text comprehension to understand the meaning is essential in language processing. This work studies the entailment verification (EV) problem of multi-sentence premises that requires a system to make multiple inferences implicitly. Studying EV for such complex premises is important because modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning. However, current textual inference datasets mostly contain short premises that only partially focus on these challenges. To address this, we compile an EV benchmark that includes datasets from three NLP domains (NLI, contextual QA, and rationales) containing multi-sentence premises. On benchmarking humans and LLMs, we find that LLMs are better than humans in multi-hop reasoning across extended contexts, while humans perform better in simple deductive reasoning tasks. We also finetune a Flan-T5 model for EV using two training objectives to obtain a strong open-source model that outperforms GPT-3.5 and rivals GPT-4. Finally, we use this model to filter out inconsistent model-generated rationales in self-consistency decoding, resulting in a 6% accuracy improvement on average across three MCQ datasets.

Updated: 2024-05-27 18:44:14

标题: 机器在复杂推理方面更优秀吗?揭示人机推理在蕴涵验证中的差距

摘要: 在文本理解中进行推断以理解含义是语言处理中至关重要的。本研究探讨了多句前提的蕴涵验证(EV)问题,这需要系统隐式地进行多次推理。研究这种复杂前提的EV问题是重要的,因为现代自然语言处理问题,如检测模型生成的理由不一致,需要进行复杂的多跳推理。然而,当前的文本推理数据集大多包含短前提,只部分关注这些挑战。为了解决这一问题,我们编制了一个包含来自三个自然语言处理领域(NLI、上下文问答、理由)的多句前提数据集的EV基准。在对人类和LLM进行基准测试时,我们发现LLM在跨扩展上下文的多跳推理方面优于人类,而人类在简单演绎推理任务中表现更好。我们还使用两个训练目标对Flan-T5模型进行微调,以获得一个性能优于GPT-3.5且与GPT-4相媲美的强大开源模型用于EV。最后,我们使用该模型在自一致解码中过滤出不一致的模型生成的理由,结果在三个多项选择题数据集中平均提高了6%的准确性。

更新时间: 2024-05-27 18:44:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.03686v3

Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models

Rodents employ a broad spectrum of ultrasonic vocalizations (USVs) for social communication. As these vocalizations offer valuable insights into affective states, social interactions, and developmental stages of animals, various deep learning approaches have aimed to automate both the quantitative (detection) and qualitative (classification) analysis of USVs. Here, we present the first systematic evaluation of different types of neural networks for USV classification. We assessed various feedforward networks, including a custom-built, fully-connected network and convolutional neural network, different residual neural networks (ResNets), an EfficientNet, and a Vision Transformer (ViT). Paired with a refined, entropy-based detection algorithm (achieving recall of 94.9% and precision of 99.3%), the best architecture (achieving 86.79% accuracy) was integrated into a fully automated pipeline capable of analyzing extensive USV datasets with high reliability. Additionally, users can specify an individual minimum accuracy threshold based on their research needs. In this semi-automated setup, the pipeline selectively classifies calls with high pseudo-probability, leaving the rest for manual inspection. Our study focuses exclusively on neonatal USVs. As part of an ongoing phenotyping study, our pipeline has proven to be a valuable tool for identifying key differences in USVs produced by mice with autism-like behaviors.

Updated: 2024-05-27 18:42:45

标题: 增强小鼠新生儿超声发声(USV)分析:不同数学模型的开发、评估和应用

摘要: 啮齿动物利用广泛的超声波声音(USVs)进行社会交流。由于这些声音提供了有价值的洞察力,可以了解动物的情感状态、社会互动和发育阶段,因此各种深度学习方法旨在自动化USVs的定量(检测)和定性(分类)分析。在这里,我们首次系统评估了不同类型的神经网络用于USV分类。我们评估了各种前馈网络,包括自定义构建的全连接网络和卷积神经网络、不同的残差神经网络(ResNets)、一个EfficientNet和一个Vision Transformer(ViT)。配合一个经过精细调整的基于熵的检测算法(实现召回率94.9%和精度99.3%),最佳架构(达到86.79%准确度)被集成到一个完全自动化的流程中,能够以高可靠性分析大量的USV数据集。此外,用户可以基于其研究需求指定个体最小准确度阈值。在这种半自动设置中,流程有选择地对具有高伪概率的呼叫进行分类,其余留待手动检查。我们的研究专注于新生儿USVs。作为一个正在进行的表型研究的一部分,我们的流程已被证明是一个有价值的工具,用于识别具有类似自闭症行为的小鼠产生的USVs的关键差异。

更新时间: 2024-05-27 18:42:45

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.12957v2

RAGSys: Item-Cold-Start Recommender as RAG System

Large Language Models (LLM) hold immense promise for real-world applications, but their generic knowledge often falls short of domain-specific needs. Fine-tuning, a common approach, can suffer from catastrophic forgetting and hinder generalizability. In-Context Learning (ICL) offers an alternative, which can leverage Retrieval-Augmented Generation (RAG) to provide LLMs with relevant demonstrations for few-shot learning tasks. This paper explores the desired qualities of a demonstration retrieval system for ICL. We argue that ICL retrieval in this context resembles item-cold-start recommender systems, prioritizing discovery and maximizing information gain over strict relevance. We propose a novel evaluation method that measures the LLM's subsequent performance on NLP tasks, eliminating the need for subjective diversity scores. Our findings demonstrate the critical role of diversity and quality bias in retrieved demonstrations for effective ICL, and highlight the potential of recommender system techniques in this domain.

Updated: 2024-05-27 18:40:49

标题: RAGSys:作为RAG系统的物品冷启动推荐系统

摘要: 大型语言模型(LLM)在现实世界的应用中具有巨大的潜力,但它们的通用知识通常不符合特定领域的需求。微调是一种常见的方法,但可能会导致灾难性遗忘并阻碍泛化能力。上下文学习(ICL)提供了一种替代方案,可以利用检索增强生成(RAG)为LLM提供相关的演示,用于少样本学习任务。本文探讨了用于ICL的演示检索系统的期望品质。我们认为,在这种情况下ICL检索类似于物品冷启动推荐系统,优先考虑发现和最大化信息增益而不是严格的相关性。我们提出了一种新颖的评估方法,该方法衡量LLM在自然语言处理任务中的后续表现,消除了对主观多样性得分的需求。我们的研究结果表明,对于有效的ICL,检索演示中的多样性和质量偏见起着关键作用,并突出了推荐系统技术在该领域的潜力。

更新时间: 2024-05-27 18:40:49

领域: cs.IR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17587v1

Best Arm Identification for Stochastic Rising Bandits

Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected. This setting captures a wide range of scenarios in which the available options are learning entities whose performance improves (in expectation) over time (e.g., online best model selection). While previous works addressed the regret minimization problem, this paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs. In this scenario, given a fixed budget of rounds, we are asked to provide a recommendation about the best option at the end of the identification process. We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE, which resorts to a UCB-like approach, and R-SR, which employs a successive reject procedure. Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process and on the simple regret. Furthermore, we derive a lower bound on the error probability, matched by our R-SR (up to constants), and illustrate how the need for a sufficiently large budget is unavoidable in the SRB setting. Finally, we numerically validate the proposed algorithms in both synthetic and realistic environments.
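
For readers unfamiliar with successive-reject schemes, the fixed-budget skeleton that R-SR builds on looks as follows (a standard successive-rejects sketch; the plain empirical means below stand in for the rising-reward estimators the paper actually uses):

```python
import numpy as np

def successive_rejects(pull, n_arms, budget):
    """Fixed-budget best-arm identification by successive rejects.

    `pull(arm)` returns one stochastic reward. Phase lengths follow the
    classical log-bar schedule; the worst empirical arm is dropped per phase.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    n_prev = 0
    for k in range(1, n_arms):
        n_k = int(np.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k))))
        for arm in active:
            for _ in range(n_k - n_prev):       # top up every surviving arm
                sums[arm] += pull(arm)
        means = {a: sums[a] / n_k for a in active}
        active.remove(min(active, key=means.get))   # reject the worst arm
        n_prev = n_k
    return active[0]                            # recommended best arm

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7, 0.9]
best = successive_rejects(lambda a: rng.normal(true_means[a], 1.0),
                          n_arms=4, budget=2000)
print("recommended arm:", best)
```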

Updated: 2024-05-27 18:35:39

标题: 随机上升赌博机的最佳臂识别

摘要: 随机上升赌博机(SRBs)对顺序决策问题进行建模,其中可用选项的预期奖励在每次被选择后都会增加。这种设置涵盖了一系列场景,其中可用选项是学习实体,其性能(期望值)随时间改善(例如,在线最佳模型选择)。虽然先前的研究解决了遗憾最小化问题,但本文侧重于针对SRBs的固定预算最佳臂识别(BAI)问题。在这种情况下,给定固定轮次的预算,我们被要求在识别过程结束时提供关于最佳选项的建议。我们提出了两种算法来解决上述设置:R-UCBE,采用类似UCB的方法;以及R-SR,采用连续拒绝程序。然后,我们证明,在足够大的预算下,它们能对学习过程结束时正确识别最佳选项的概率以及简单遗憾提供保证。此外,我们得出了一个关于错误概率的下界,我们的R-SR(至多相差常数)与之匹配,并说明在SRB设置中对足够大预算的需求是不可避免的。最后,我们在合成和现实环境中对所提出的算法进行了数值验证。

更新时间: 2024-05-27 18:35:39

领域: cs.LG

下载: http://arxiv.org/abs/2302.07510v3

Understanding Forgetting in Continual Learning with Linear Regression

Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both underparameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.
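
A toy experiment in the spirit of this setup is easy to run (a sketch with made-up dimensions and step size, not the paper's construction): train two linear regression tasks sequentially with SGD, give the second task a covariance with larger eigenvalues, and measure how much the first task's loss degrades:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 20, 500, 0.01

def make_task(scale):
    # Population covariance is diagonal; `scale` sets the largest eigenvalue.
    cov = np.diag(np.linspace(0.1, scale, d))
    X = rng.multivariate_normal(np.zeros(d), cov, size=n)
    w_star = rng.normal(size=d)
    return X, X @ w_star

def sgd(w, X, y, epochs=5):
    for _ in range(epochs):
        for i in rng.permutation(n):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

task1, task2 = make_task(1.0), make_task(5.0)   # task 2: larger eigenvalues
w = sgd(np.zeros(d), *task1)
loss1_before = np.mean((task1[0] @ w - task1[1]) ** 2)
w = sgd(w, *task2)
loss1_after = np.mean((task1[0] @ w - task1[1]) ** 2)
print(f"forgetting on task 1: {loss1_after - loss1_before:.4f}")
```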

Updated: 2024-05-27 18:33:37

标题: 使用线性回归理解连续学习中的遗忘

摘要: 连续学习,重点是顺序学习多个任务,最近引起了很大关注。尽管过去取得了巨大进展,但理论理解,特别是导致灾难性遗忘的因素仍相对未被探索。本文通过随机梯度下降(SGD)在线性回归模型中提供了遗忘的一般理论分析,适用于欠参数化和过参数化制度。我们的理论框架揭示了在任务序列和算法参数之间错综复杂关系的一些有趣见解,这一方面在先前研究中并未完全捕捉到,因为它们的假设过于限制。具体来说,我们证明,在数据规模足够大的情况下,按照任务数据协方差矩阵中具有更大特征值的任务后训练的顺序,往往会导致遗忘增加。此外,我们的发现强调,适当选择步长将有助于减轻欠参数化和过参数化设置中的遗忘。为了验证我们的理论分析,我们对线性回归模型和深度神经网络(DNNs)进行了模拟实验。这些模拟实验的结果证实了我们的理论发现。

更新时间: 2024-05-27 18:33:37

领域: cs.LG

下载: http://arxiv.org/abs/2405.17583v1

Building a temperature forecasting model for the city with the regression neural network (RNN)

In recent years, studies by environmental organizations around the world and in Vietnam have shown that weather change is quite complex. Global warming has become a serious problem in the modern world and a major concern for scientists. In the last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations, which made it hard to collect the data needed to build predictive models for accurate simulations. In Vietnam, research on weather forecast models is a recent development, having only begun around 2000. Along with advances in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. This article summarizes the research and solutions for applying recurrent neural networks to forecast urban temperatures.

Updated: 2024-05-27 18:32:36

标题: 使用回归神经网络(RNN)构建城市温度预测模型

摘要: 近年来,世界和越南的环保组织进行的一项研究显示,天气变化相当复杂。全球变暖已经成为现代世界的一个严重问题,这令科学家们感到担忧。上个世纪,由于缺少气象监测站和技术限制,很难预测天气。这使得很难收集数据来建立准确的模型进行精确模拟。在越南,对天气预测模型的研究是最近才开始的,大约在2000年左右。随着计算机科学的进步,数学模型正在被建立和应用于机器学习技术,以创建更准确和可靠的预测模型。本文将总结将递归神经网络应用于预测城市温度的研究和解决方案。

更新时间: 2024-05-27 18:32:36

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.17582v1

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes

The training dynamics of linear networks are well studied in two distinct setups: the lazy regime and balanced/active regime, depending on the initialization and width of the network. We provide a surprisingly simple unifying formula for the evolution of the learned matrix that contains as special cases both lazy and balanced regimes but also a mixed regime in between the two. In the mixed regime, a part of the network is lazy while the other is balanced. More precisely the network is lazy along singular values that are below a certain threshold and balanced along those that are above the same threshold. At initialization, all singular values are lazy, allowing for the network to align itself with the task, so that later in time, when some of the singular value cross the threshold and become active they will converge rapidly (convergence in the balanced regime is notoriously difficult in the absence of alignment). The mixed regime is the `best of both worlds': it converges from any random initialization (in contrast to balanced dynamics which require special initialization), and has a low rank bias (absent in the lazy dynamics). This allows us to prove an almost complete phase diagram of training behavior as a function of the variance at initialization and the width, for an MSE training task.

Updated: 2024-05-27 18:29:23

标题: 线性网络中的混合动力学:统一懒惰和活跃模式

摘要: 线性网络的训练动态在两种不同的设置中得到了很好的研究:懒惰模式和平衡/活跃模式,这取决于网络的初始化和宽度。我们提供了一个令人惊讶地简单的统一公式,用于描述学习矩阵的演化过程,该公式既包含懒惰和平衡模式,也包含两者之间的混合模式。在混合模式中,网络的一部分是懒惰的,而另一部分是平衡的。更准确地说,网络在低于一定阈值的奇异值处是懒惰的,在高于同一阈值的奇异值处是平衡的。在初始化时,所有奇异值都是懒惰的,这使得网络能够与任务对齐,因此,在之后的时间中,当一些奇异值跨过阈值并变得活跃时,它们将迅速收敛(在没有对齐的情况下,平衡模式中的收敛是非常困难的)。混合模式是“两全其美”的:它可以从任意随机初始化收敛(与需要特殊初始化的平衡动态形成对比),并且具有低秩偏差(在懒惰动态中不存在)。这使我们能够根据初始化时的方差和宽度作为函数对训练行为的几乎完整相图进行证明,针对一个均方误差训练任务。

更新时间: 2024-05-27 18:29:23

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.17580v1

The last Dance : Robust backdoor attack via diffusion models and bayesian approach

Diffusion models are state-of-the-art deep learning generative models that are trained on the principle of learning forward and backward diffusion processes via the progressive addition of noise and denoising. In this paper, we aim to fool audio-based DNN models, such as transformer-based models from the Hugging Face framework, which are powerful machine learning models that save time and achieve results more efficiently. We demonstrate the feasibility of backdoor attacks (called `BacKBayDiffMod`) on audio transformers derived from Hugging Face, a popular framework in the world of artificial intelligence research. The backdoor attack developed in this paper is based on poisoning the model's training data by uniquely incorporating backdoor diffusion sampling and a Bayesian approach to the distribution of poisoned data.

Updated: 2024-05-27 18:23:01

标题: 最后的舞蹈:通过扩散模型和贝叶斯方法实现鲁棒后门攻击

摘要: 扩散模型是目前最先进的深度学习生成模型,其训练原则是通过逐渐添加噪声和去噪学习正向和反向扩散过程。本文旨在欺骗基于音频的DNN模型,如Hugging Face框架中的模型,特别是那些专注于音频的变压器(transformer)人工智能模型,这些模型是强大的机器学习模型,可以节省时间并更加高效地实现结果。我们展示了在Hugging Face衍生的音频变压器上进行后门攻击(称为`BacKBayDiffMod`)的可行性,Hugging Face是人工智能研究领域中的流行框架。本文开发的后门攻击基于通过将后门扩散采样和贝叶斯方法应用于受污染数据的分布来独特地毒化模型训练数据。

更新时间: 2024-05-27 18:23:01

领域: cs.LG,cs.AI,cs.CR,eess.SP

下载: http://arxiv.org/abs/2402.05967v5

Container pre-marshalling problem minimizing CV@R under uncertainty of ship arrival times

This paper is concerned with the container pre-marshalling problem, which involves relocating containers in the storage area so that they can be efficiently loaded onto ships without reshuffles. In reality, however, ship arrival times are affected by various external factors, which can cause the order of container retrieval to be different from the initial plan. To represent such uncertainty, we generate multiple scenarios from a multivariate probability distribution of ship arrival times. We derive a mixed-integer linear optimization model to find an optimal container layout such that the conditional value-at-risk is minimized for the number of misplaced containers responsible for reshuffles. Moreover, we devise an exact algorithm based on the cutting-plane method to handle large-scale problems. Numerical experiments using synthetic datasets demonstrate that our method can produce high-quality container layouts compared with the conventional robust optimization model. Additionally, our algorithm can speed up the computation of solving large-scale problems.
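
The risk measure at the heart of the model is easy to state on its own: for scenario losses (here, hypothetical counts of misplaced containers sampled across arrival-time scenarios), CV@R in the Rockafellar-Uryasev form, which is what scenario-based mixed-integer formulations typically linearize, can be sketched as:

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Scenario-based CVaR in the Rockafellar-Uryasev form:
    CVaR_alpha = min_t { t + E[max(loss - t, 0)] / (1 - alpha) },
    minimized by taking t as an alpha-quantile (the VaR) of the losses.
    """
    losses = np.asarray(losses, dtype=float)
    t = np.quantile(losses, alpha, method="higher")   # an exact sample quantile
    return t + np.maximum(losses - t, 0.0).mean() / (1.0 - alpha)

# Hypothetical scenario losses: misplaced containers per arrival-time scenario.
rng = np.random.default_rng(0)
misplaced = rng.poisson(lam=4, size=1000)
print("mean:", misplaced.mean(), " CVaR_0.95:", round(cvar(misplaced), 3))
```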

Updated: 2024-05-27 18:19:09

标题: 在船舶到达时间不确定性下最小化CV@R的集装箱预编组问题

摘要: 本文关注集装箱预编组问题,涉及将集装箱重新定位到存储区域,以便它们可以在不需要重新排列的情况下高效地装载到船上。然而,在现实中,船舶到达时间受到各种外部因素的影响,这可能导致集装箱检索顺序与初始计划不同。为了表示这种不确定性,我们从船舶到达时间的多变量概率分布中生成多个场景。我们推导了一个混合整数线性优化模型,以找到一个最优的集装箱布局,使得负责重新排列的集装箱数量的条件风险值最小化。此外,我们设计了一种基于切割平面方法的精确算法来处理大规模问题。使用合成数据集进行的数值实验表明,与传统的鲁棒优化模型相比,我们的方法可以产生高质量的集装箱布局。此外,我们的算法可以加速解决大规模问题的计算过程。

更新时间: 2024-05-27 18:19:09

领域: math.OC,cs.AI

下载: http://arxiv.org/abs/2405.17576v1

Interpretable Prognostics with Concept Bottleneck Models

Deep learning approaches have recently been extensively explored for the prognostics of industrial assets. However, they still suffer from a lack of interpretability, which hinders their adoption in safety-critical applications. To improve their trustworthiness, explainable AI (XAI) techniques have been applied in prognostics, primarily to quantify the importance of input variables for predicting the remaining useful life (RUL) using post-hoc attribution methods. In this work, we propose the application of Concept Bottleneck Models (CBMs), a family of inherently interpretable neural network architectures based on concept explanations, to the task of RUL prediction. Unlike attribution methods, which explain decisions in terms of low-level input features, concepts represent high-level information that is easily understandable by users. Moreover, once verified in actual applications, CBMs enable domain experts to intervene on the concept activations at test-time. We propose using the different degradation modes of an asset as intermediate concepts. Our case studies on the New Commercial Modular AeroPropulsion System Simulation (N-CMAPSS) aircraft engine dataset for RUL prediction demonstrate that the performance of CBMs can be on par or superior to black-box models, while being more interpretable, even when the available labeled concepts are limited. Code available at \href{https://github.com/EPFL-IMOS/concept-prognostics/}{\url{github.com/EPFL-IMOS/concept-prognostics/}}.
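
A minimal sketch of the architecture described above (toy dimensions and random stand-in weights; the degradation-mode concepts and the test-time intervention step follow the abstract, but all details here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_concepts = 24, 4           # sensor features; degradation-mode concepts

W_c = rng.normal(size=(n_concepts, d_in))   # feature -> concept logits (toy)
w_y = rng.normal(size=n_concepts)           # concept -> RUL head (toy)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_rul(x, intervene=None):
    c = sigmoid(W_c @ x)                    # concept activations in [0, 1]
    if intervene:                           # expert fixes known concepts at test time
        for idx, value in intervene.items():
            c[idx] = value
    return w_y @ c, c                       # RUL score plus its concept explanation

x = rng.normal(size=d_in)
rul, concepts = predict_rul(x)
rul_fixed, _ = predict_rul(x, intervene={0: 1.0})   # assert degradation mode 0
print(concepts.round(2), round(rul, 3), round(rul_fixed, 3))
```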

Updated: 2024-05-27 18:15:40

标题: 基于概念瓶颈模型的可解释预后分析

摘要: 最近,深度学习方法在工业资产预测方面得到了广泛探索。然而,它们仍然存在缺乏可解释性的问题,这阻碍了它们在安全关键应用中的采用。为了提高它们的可信度,可解释人工智能(XAI)技术已被应用于预测领域,主要是利用事后归因方法来量化输入变量对预测剩余使用寿命(RUL)的重要性。在这项工作中,我们提出将概念瓶颈模型(CBMs)应用于RUL预测任务,这是一类基于概念解释的、本质上可解释的神经网络架构。与以低级输入特征解释决策的归因方法不同,概念代表了用户容易理解的高级信息。此外,一旦在实际应用中得到验证,CBMs使领域专家能够在测试时干预概念激活。我们提出使用资产的不同退化模式作为中间概念。我们在新商用模块化航空推进系统模拟(N-CMAPSS)飞机发动机数据集上进行的RUL预测案例研究表明,即使可用的标记概念有限,CBMs的性能也可以与黑盒模型相当或更优,同时更易解释。代码可在https://github.com/EPFL-IMOS/concept-prognostics/找到。

更新时间: 2024-05-27 18:15:40

领域: cs.LG,eess.SP,stat.ML

下载: http://arxiv.org/abs/2405.17575v1

Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets

We study Leaky ResNets, which interpolate between ResNets ($\tilde{L}=0$) and Fully-Connected nets ($\tilde{L}\to\infty$) depending on an 'effective depth' hyper-parameter $\tilde{L}$. In the infinite depth limit, we study 'representation geodesics' $A_{p}$: continuous paths in representation space (similar to NeuralODEs) from input $p=0$ to output $p=1$ that minimize the parameter norm of the network. We give a Lagrangian and Hamiltonian reformulation, which highlight the importance of two terms: a kinetic energy which favors small layer derivatives $\partial_{p}A_{p}$ and a potential energy that favors low-dimensional representations, as measured by the 'Cost of Identity'. The balance between these two forces offers an intuitive understanding of feature learning in ResNets. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy dominates and leads to a separation of timescales, where the representation jumps rapidly from the high dimensional inputs to a low-dimensional representation, move slowly inside the space of low-dimensional representations, before jumping back to the potentially high-dimensional outputs. Inspired by this phenomenon, we train with an adaptive layer step-size to adapt to the separation of timescales.

Updated: 2024-05-27 18:15:05

标题: 特征学习的哈密顿力学:泄漏ResNets中的瓶颈结构

摘要: 我们研究了Leaky ResNets,它在ResNets($\tilde{L}=0$)和全连接网络($\tilde{L}\to\infty$)之间插值,取决于一个“有效深度”超参数$\tilde{L}$。在无限深度限制下,我们研究了“表示测地线”$A_{p}$:表示空间中的连续路径(类似于NeuralODEs),从输入$p=0$到输出$p=1$,最小化网络的参数范数。我们给出了一个拉格朗日和哈密顿重述,强调了两个重要术语的重要性:一个动能,有利于小层导数$\partial_{p}A_{p}$,和一个势能,有利于低维表示,由“身份成本”来衡量。这两种力量之间的平衡提供了对ResNets中特征学习的直观理解。我们利用这种直觉来解释以前工作中观察到的瓶颈结构的出现:对于较大的$\tilde{L}$,势能占主导地位,并导致时间尺度的分离,其中表示从高维输入快速跳转到低维表示,然后在低维表示空间内缓慢移动,最后跳回潜在的高维输出。受到这一现象的启发,我们训练时使用自适应层步长来适应时间尺度的分离。

更新时间: 2024-05-27 18:15:05

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17573v1

Learning Latent Space Hierarchical EBM Diffusion Models

This work studies the learning problem of the energy-based prior model and the multi-layer generator model. The multi-layer generator model, which contains multiple layers of latent variables organized in a top-down hierarchical structure, typically assumes the Gaussian prior model. Such a prior model can be limited in modelling expressivity, which results in a gap between the generator posterior and the prior model, known as the prior hole problem. Recent works have explored learning the energy-based (EBM) prior model as a second-stage, complementary model to bridge the gap. However, the EBM defined on a multi-layer latent space can be highly multi-modal, which makes sampling from such marginal EBM prior challenging in practice, resulting in ineffectively learned EBM. To tackle the challenge, we propose to leverage the diffusion probabilistic scheme to mitigate the burden of EBM sampling and thus facilitate EBM learning. Our extensive experiments demonstrate a superior performance of our diffusion-learned EBM prior on various challenging tasks.

Updated: 2024-05-27 18:05:55

标题: 学习潜在空间分层EBM扩散模型

摘要: 这项工作研究了基于能量的先验模型和多层生成器模型的学习问题。多层生成器模型包含多层潜变量,以自上而下的分层结构组织,通常假设高斯先验模型。这种先验模型在建模表现力方面可能存在局限,导致生成器后验与先验模型之间存在差距,即所谓的先验空洞问题。最近的研究探索了学习能量基础(EBM)先验模型作为第二阶段的补充模型,以弥合这一差距。然而,在多层潜变量空间上定义的EBM可能是高度多模态的,这使得从这种边际EBM先验中进行采样在实践中具有挑战性,导致EBM学习效果不佳。为了应对这一挑战,我们提出利用扩散概率方案来减轻EBM采样的负担,从而促进EBM学习。我们进行了大量实验,证明了我们通过扩散学习的EBM先验在各种具有挑战性的任务上表现出优越性能。

更新时间: 2024-05-27 18:05:55

领域: cs.LG,cs.CV,stat.ML

下载: http://arxiv.org/abs/2405.13910v2

Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

This work investigates Artificial Intelligence (AI) systems that detect respiratory insufficiency (RI) by analyzing speech audios, thus treating speech as a RI biomarker. Previous works collected RI data (P1) from COVID-19 patients during the first phase of the pandemic and trained modern AI models, such as CNNs and Transformers, which achieved $96.5\%$ accuracy, showing the feasibility of RI detection via AI. Here, we collect RI patient data (P2) with several causes besides COVID-19, aiming at extending AI-based RI detection. We also collected control data from hospital patients without RI. We show that the considered models, when trained on P1, do not generalize to P2, indicating that COVID-19 RI has features that may not be found in all RI types.

Updated: 2024-05-27 18:04:49

标题: 巴西葡萄牙语中基于深度学习的呼吸功能不全检测的判别性音频特性

摘要: 这项工作研究了通过分析语音音频来检测呼吸功能不全(RI)的人工智能(AI)系统,将语音视为RI生物标志物。先前的研究在大流行的第一阶段从COVID-19患者收集了RI数据(P1),并训练了现代AI模型,如CNN和Transformer,其准确率达到了96.5%,显示了通过AI检测RI的可行性。在这里,我们收集了除COVID-19外还有多种病因的RI患者数据(P2),旨在扩展基于AI的RI检测。我们还从没有RI的医院患者中收集了对照数据。我们展示了所考虑的模型在P1上训练时无法推广到P2,表明COVID-19 RI具有一些可能并非所有RI类型都有的特征。

更新时间: 2024-05-27 18:04:49

领域: cs.LG,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2405.17569v1

UDPM: Upsampling Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention. DDPMs compose a Markovian process that begins in the data domain and gradually adds noise until reaching pure white noise. DDPMs generate high-quality samples from complex data distributions by defining an inverse process and training a deep neural network to learn this mapping. However, these models are inefficient because they require many diffusion steps to produce aesthetically pleasing samples. Additionally, unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM). In the forward process, we reduce the latent variable dimension through downsampling, followed by the traditional noise perturbation. As a result, the reverse process gradually denoises and upsamples the latent variable to produce a sample from the data distribution. We formalize the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, AFHQv2, and CIFAR10 datasets. UDPM generates images with as few as three network evaluations, whose overall computational cost is less than a single DDPM or EDM step, while achieving an FID score of 6.86. This surpasses current state-of-the-art efficient diffusion models that use a single denoising step for sampling. Additionally, UDPM offers an interpretable and interpolable latent space, which gives it an advantage over traditional DDPMs. Our code is available online: \url{https://github.com/shadyabh/UDPM/}
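
The forward process described above is simple to sketch (an illustration with 2x average pooling and an arbitrary noise level, not the authors' exact schedule): each step first downsamples the latent, then applies the usual Gaussian perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool_2x(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def udpm_forward_step(x, alpha):
    """One UDPM-style forward step: downsample, then DDPM-style perturbation
    x_t = sqrt(alpha) * down(x_{t-1}) + sqrt(1 - alpha) * eps."""
    z = avg_pool_2x(x)
    eps = rng.normal(size=z.shape)
    return np.sqrt(alpha) * z + np.sqrt(1.0 - alpha) * eps

x0 = rng.normal(size=(32, 32))            # toy single-channel latent
x1 = udpm_forward_step(x0, alpha=0.9)     # 16 x 16
x2 = udpm_forward_step(x1, alpha=0.9)     # 8 x 8
print(x0.shape, x1.shape, x2.shape)
```

The reverse process then denoises and upsamples at mostly coarse resolutions, which is intuitively where the reported savings over running every step at full resolution come from.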

Updated: 2024-05-27 18:02:56

标题: UDPM:上采样扩散概率模型

摘要: 去噪扩散概率模型(DDPM)最近引起了广泛关注。DDPM组成一个马尔可夫过程,从数据域开始,并逐渐添加噪音,直到达到纯白噪声。通过定义一个反向过程并训练深度神经网络来学习这种映射,DDPM从复杂的数据分布中生成高质量样本。然而,这些模型效率低下,因为它们需要许多扩散步骤才能生成审美上令人满意的样本。此外,与生成对抗网络(GAN)不同,扩散模型的潜在空间较难解释。在这项工作中,我们提出将去噪扩散过程推广为上采样扩散概率模型(UDPM)。在正向过程中,我们通过下采样减少潜变量维度,然后进行传统的噪声扰动。结果,反向过程逐渐去噪和上采样潜变量,以生成来自数据分布的样本。我们正式化了UDPM的马尔可夫扩散过程,并展示了其在流行的FFHQ、AFHQv2和CIFAR10数据集上的生成能力。UDPM生成的图像仅需三次网络评估,整体计算成本低于单个DDPM或EDM步骤,同时实现6.86的FID分数。这超越了当前使用单一去噪步骤进行采样的最先进的高效扩散模型。此外,UDPM提供一个可解释和可插值的潜在空间,这使其比传统的DDPM具有优势。我们的代码可在线获取:\url{https://github.com/shadyabh/UDPM/}

更新时间: 2024-05-27 18:02:56

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2305.16269v2

A deep-learning algorithm to disentangle self-interacting dark matter and AGN feedback models

Different models of dark matter can alter the distribution of mass in galaxy clusters in a variety of ways. However, so can uncertain astrophysical feedback mechanisms. Here we present a Machine Learning method that ''learns'' how the impact of dark matter self-interactions differs from that of astrophysical feedback in order to break this degeneracy and make inferences on dark matter. We train a Convolutional Neural Network on images of galaxy clusters from hydro-dynamic simulations. In the idealised case our algorithm is 80% accurate at identifying if a galaxy cluster harbours collisionless dark matter, dark matter with ${\sigma}_{\rm DM}/m = 0.1$cm$^2/$g or with ${\sigma}_{DM}/m = 1$cm$^2$/g. Whilst we find adding X-ray emissivity maps does not improve the performance in differentiating collisional dark matter, it does improve the ability to disentangle different models of astrophysical feedback. We include noise to resemble data expected from Euclid and Chandra and find our model has a statistical error of < 0.01cm$^2$/g and that our algorithm is insensitive to shape measurement bias and photometric redshift errors. This method represents a new way to analyse data from upcoming telescopes that is an order of magnitude more precise and many orders faster, enabling us to explore the dark matter parameter space like never before.

Updated: 2024-05-27 18:00:49

标题: 一种深度学习算法用于区分自相互作用暗物质和AGN反馈模型

摘要: 不同模型的暗物质可以以各种方式改变星系团中的质量分布。然而,不确定的天体物理反馈机制也可以产生类似的效果。在这里,我们提出了一种机器学习方法,通过“学习”暗物质自相互作用与天体物理反馈的影响方式之间的差异,以打破这种退化,并推断出关于暗物质的信息。我们在星系团的图像上训练了一个卷积神经网络,这些图像来自流体动力学模拟。在理想情况下,我们的算法在识别星系团是否含有无碰撞暗物质、具有${\sigma}_{\rm DM}/m = 0.1$cm$^2/$g或${\sigma}_{DM}/m = 1$cm$^2$/g的暗物质方面的准确率达到80%。虽然我们发现添加X射线辐射度图并不能提高区分碰撞暗物质的性能,但它确实提高了解开不同天体物理反馈模型的能力。我们添加噪声以模拟来自欧几里得和钱德拉期望数据,并发现我们的模型具有<0.01cm$^2$/g的统计误差,并且我们的算法对形状测量偏差和光度红移误差不敏感。这种方法代表了一种全新的方式来分析未来望远镜的数据,其精度提高了一个数量级,速度提高了几个数量级,使我们能够以前所未有的方式探索暗物质参数空间。

更新时间: 2024-05-27 18:00:49

领域: astro-ph.CO,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2405.17566v1

Probabilistic Verification of Neural Networks using Branch and Bound

Probabilistic verification of neural networks is concerned with formally analysing the output distribution of a neural network under a probability distribution of the inputs. Examples of probabilistic verification include verifying the demographic parity fairness notion or quantifying the safety of a neural network. We present a new algorithm for the probabilistic verification of neural networks based on an algorithm for computing and iteratively refining lower and upper bounds on probabilities over the outputs of a neural network. By applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification, our algorithm significantly outpaces existing probabilistic verification algorithms, reducing solving times for various benchmarks from the literature from tens of minutes to tens of seconds. Furthermore, our algorithm compares favourably even to dedicated algorithms for restricted subsets of probabilistic verification. We complement our empirical evaluation with a theoretical analysis, proving that our algorithm is sound and, under mildly restrictive conditions, also complete when using a suitable set of heuristics.

Updated: 2024-05-27 18:00:03

标题: 使用分支和界方法对神经网络进行概率验证

摘要: 神经网络的概率验证涉及在输入概率分布下正式分析神经网络的输出分布。概率验证的示例包括验证人口平等公平概念或量化神经网络的安全性。我们提出了一种基于计算和迭代细化神经网络输出概率下限和上限的算法的神经网络概率验证新算法。通过应用最先进的界传播和分支界限技术,我们的算法明显超越了现有的概率验证算法,将从文献中的各种基准测试的解决时间从几十分钟减少到几十秒。此外,我们的算法甚至与针对概率验证受限子集的专用算法比较有利。我们的经验评估与理论分析相辅相成,证明了我们的算法在使用适当的启发式方法时是可靠的,并在适度限制条件下也是完整的。

更新时间: 2024-05-27 18:00:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17556v1

Bayesian RG Flow in Neural Network Field Theories

The Neural Network Field Theory correspondence (NNFT) is a mapping from neural network (NN) architectures into the space of statistical field theories (SFTs). The Bayesian renormalization group (BRG) is an information-theoretic coarse graining scheme that generalizes the principles of the Exact Renormalization Group (ERG) to arbitrarily parameterized probability distributions, including those of NNs. In BRG, coarse graining is performed in parameter space with respect to an information-theoretic distinguishability scale set by the Fisher information metric. In this paper, we unify NNFT and BRG to form a powerful new framework for exploring the space of NNs and SFTs, which we coin BRG-NNFT. With BRG-NNFT, NN training dynamics can be interpreted as inducing a flow in the space of SFTs from the information-theoretic `IR' $\rightarrow$ `UV'. Conversely, applying an information-shell coarse graining to the trained network's parameters induces a flow in the space of SFTs from the information-theoretic `UV' $\rightarrow$ `IR'. When the information-theoretic cutoff scale coincides with a standard momentum scale, BRG is equivalent to ERG. We demonstrate the BRG-NNFT correspondence on two analytically tractable examples. First, we construct BRG flows for trained, infinite-width NNs, of arbitrary depth, with generic activation functions. As a special case, we then restrict to architectures with a single infinitely-wide layer, scalar outputs, and generalized cos-net activations. In this case, we show that BRG coarse-graining corresponds exactly to the momentum-shell ERG flow of a free scalar SFT. Our analytic results are corroborated by a numerical experiment in which an ensemble of asymptotically wide NNs are trained and subsequently renormalized using an information-shell BRG scheme.

Updated: 2024-05-27 18:00:00

标题: 神经网络场论中的贝叶斯RG流

摘要: 神经网络场论对应(NNFT)是将神经网络(NN)结构映射到统计场论(SFT)空间的方法。贝叶斯重整化群(BRG)是一种信息论粗粒化方案,将精确重整化群(ERG)的原则推广到任意参数化概率分布,包括NN的分布。在BRG中,粗粒化是在参数空间中进行的,根据费舍尔信息度量设定信息论可分辨性尺度。在本文中,我们将NNFT和BRG统一起来,形成一个探索NN和SFT空间的强大新框架,我们称之为BRG-NNFT。通过BRG-NNFT,NN训练动态可以被解释为在SFT空间中从信息论的'IR'到'UV'的流动。反之,对训练好的网络参数应用信息壳粗粒化,则在SFT空间中诱导出从信息论的'UV'到'IR'的流动。当信息论截断尺度与标准动量尺度重合时,BRG等价于ERG。我们在两个解析可处理的示例上展示了BRG-NNFT对应关系。首先,我们构建了对训练好的宽度无限的NN的BRG流,具有任意深度和通用激活函数。然后,我们限制为具有单个无限宽度层、标量输出和广义cos-net激活函数的结构。在这种情况下,我们展示了BRG粗粒化与自由标量SFT的动量壳ERG流完全对应。我们的分析结果得到了一个数值实验的验证,其中一个渐近宽的NN集合被训练并随后使用信息壳BRG方案重整化。

更新时间: 2024-05-27 18:00:00

领域: hep-th,cond-mat.dis-nn,cs.LG

下载: http://arxiv.org/abs/2405.17538v1

Approximately-symmetric neural networks for quantum spin liquids

We propose and analyze a family of approximately-symmetric neural networks for quantum spin liquid problems. These tailored architectures are parameter-efficient, scalable, and significantly out-perform existing symmetry-unaware neural network architectures. Utilizing the mixed-field toric code model, we demonstrate that our approach is competitive with the state-of-the-art tensor network and quantum Monte Carlo methods. Moreover, at the largest system sizes (N=480), our method allows us to explore Hamiltonians with sign problems beyond the reach of both quantum Monte Carlo and finite-size matrix-product states. The network comprises an exactly symmetric block following a non-symmetric block, which we argue learns a transformation of the ground state analogous to quasiadiabatic continuation. Our work paves the way toward investigating quantum spin liquid problems within interpretable neural network architectures

Updated: 2024-05-27 18:00:00

标题: 量子自旋液体的近似对称神经网络

摘要: 我们提出并分析了一类用于量子自旋液体问题的近似对称神经网络。这些定制的架构参数效率高、可扩展,并且明显优于现有的不考虑对称性的神经网络架构。利用混合场环面码(toric code)模型,我们展示了我们的方法与最先进的张量网络和量子蒙特卡罗方法相竞争。此外,在最大系统尺寸(N=480)下,我们的方法使我们能够探索具有符号问题的哈密顿量,超出了量子蒙特卡罗和有限尺寸矩阵乘积态的能力范围。该网络由一个非对称块及其后的一个完全对称块组成,我们认为其学习了类似于准绝热延续的基态变换。我们的工作为在可解释的神经网络架构中研究量子自旋液体问题铺平了道路。

更新时间: 2024-05-27 18:00:00

领域: quant-ph,cond-mat.dis-nn,cond-mat.str-el,cs.LG

下载: http://arxiv.org/abs/2405.17541v1

Towards Human-AI Complementarity with Prediction Sets

Decision support systems based on prediction sets have proven to be effective at helping human experts solve classification tasks. Rather than providing single-label predictions, these systems provide sets of label predictions constructed using conformal prediction, namely prediction sets, and ask human experts to predict label values from these sets. In this paper, we first show that the prediction sets constructed using conformal prediction are, in general, suboptimal in terms of average accuracy. Then, we show that the problem of finding the optimal prediction sets under which the human experts achieve the highest average accuracy is NP-hard. More strongly, unless P = NP, we show that the problem is hard to approximate to any factor less than the size of the label set. However, we introduce a simple and efficient greedy algorithm that, for a large class of expert models and non-conformity scores, is guaranteed to find prediction sets that provably offer equal or greater performance than those constructed using conformal prediction. Further, using a simulation study with both synthetic and real expert predictions, we demonstrate that, in practice, our greedy algorithm finds near-optimal prediction sets offering greater performance than conformal prediction.
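
To convey the greedy idea (under a deliberately crude, hypothetical expert model; the paper's expert models and objective are more general), one can grow the label set while a simulated expert's expected accuracy keeps improving:

```python
import numpy as np

def expected_accuracy(S, p, skill, eta=0.3):
    # Hypothetical expert model: the expert recognizes the true label y with
    # probability skill[y] when y is offered, diluted as the set grows.
    if not S:
        return 0.0
    return sum(p[y] * skill[y] for y in S) / (1.0 + eta * (len(S) - 1))

def greedy_prediction_set(p, skill, eta=0.3):
    S, best, candidates = [], 0.0, set(range(len(p)))
    while candidates:
        y = max(candidates, key=lambda c: expected_accuracy(S + [c], p, skill, eta))
        val = expected_accuracy(S + [y], p, skill, eta)
        if val <= best:
            break                        # no remaining label improves the expert
        S.append(y)
        candidates.remove(y)
        best = val
    return sorted(S), best

p = np.array([0.5, 0.3, 0.1, 0.1])       # classifier's label distribution
skill = np.array([0.9, 0.6, 0.8, 0.4])   # expert's per-label reliability
print(greedy_prediction_set(p, skill))   # chosen set and its expected accuracy
```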

Updated: 2024-05-27 18:00:00

标题: 基于预测集实现人机互补

摘要: 基于预测集的决策支持系统已被证明在帮助人类专家解决分类任务方面非常有效。这些系统提供的不是单一标签的预测,而是使用符合预测(conformal prediction)构建的标签预测集,并要求人类专家从这些集合中预测标签值。本文首先展示,使用符合预测构建的预测集在平均准确度方面通常不是最优的。然后,我们证明,找到能使人类专家达到最高平均准确度的最优预测集的问题是NP难的。更进一步,除非P = NP,否则该问题无法以小于标签集大小的任何因子进行近似。然而,我们提出了一个简单而高效的贪婪算法,对于一大类专家模型和非符合性分数,可证明其找到的预测集性能等于或优于使用符合预测构建的预测集。此外,通过使用合成和真实专家预测的模拟研究,我们证明,在实践中,我们的贪婪算法能找到接近最优的预测集,其性能优于符合预测。

更新时间: 2024-05-27 18:00:00

领域: cs.LG,cs.CY,cs.HC

下载: http://arxiv.org/abs/2405.17544v1

Matryoshka Multimodal Models

Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While token pruning/merging methods do exist, they produce a single length output for each image and do not afford flexibility in trading off information density v.s. efficiency. Inspired by the concept of Matryoshka Dolls, we propose M3: Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information across multiple coarse-to-fine granularities. Our approach offers several unique benefits for LMMs: (1) One can explicitly control the visual granularity per test instance during inference, e.g. , adjusting the number of tokens used to represent an image based on the anticipated complexity or simplicity of the content; (2) M3 provides a framework for analyzing the granularity needed for existing datasets, where we find that COCO-style benchmarks only need around ~9 visual tokens to obtain accuracy similar to that of using all 576 tokens; (3) Our approach provides a foundation to explore the best trade-off between performance and visual token length at sample level, where our investigation reveals that a large gap exists between the oracle upper bound and current fixed-scale representations.
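
A minimal sketch of the nested-granularity idea (using plain 2D average pooling as a stand-in for M3's learned nesting; the grid size matches the 576 = 24 x 24 tokens mentioned above):

```python
import numpy as np

def matryoshka_tokens(tokens, grid=24, scales=(24, 12, 6, 3)):
    """Pool a (grid*grid, d) visual-token map into nested coarse-to-fine sets
    via 2D average pooling; each scale s yields s*s tokens."""
    d = tokens.shape[1]
    x = tokens.reshape(grid, grid, d)
    out = {}
    for s in scales:
        k = grid // s
        out[s * s] = x.reshape(s, k, s, k, d).mean(axis=(1, 3)).reshape(s * s, d)
    return out

rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))      # e.g. a 24 x 24 grid of visual tokens
nested = matryoshka_tokens(tokens)
print({n: t.shape for n, t in nested.items()})   # 576, 144, 36, 9 tokens
```

At inference one would feed the LLM only the scale matching the anticipated complexity of the image, which is the per-instance granularity control the abstract describes.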

Updated: 2024-05-27 17:59:56

标题: Matryoshka多模态模型

摘要: 大型多模态模型(LMMs)如LLaVA在视觉语言推理中表现出很强的性能。这些模型首先将图像嵌入到固定数量的视觉标记中,然后将它们输入到大型语言模型(LLM)中。然而,这种设计对于密集视觉场景(如高分辨率图像和视频)会产生过多的标记,从而导致效率低下。虽然存在标记修剪/合并方法,但它们为每个图像只产生单一长度的输出,并且在信息密度与效率之间的权衡上缺乏灵活性。受俄罗斯套娃(Matryoshka)概念的启发,我们提出了M3:Matryoshka多模态模型,它学会将视觉内容表示为嵌套的视觉标记集,以捕捉从粗到细多个粒度的信息。我们的方法为LMMs提供了几个独特的优势:(1)可以在推理过程中明确控制每个测试实例的视觉粒度,例如,根据内容的预期复杂性或简单性调整用于表示图像的标记数量;(2)M3为分析现有数据集所需的粒度提供了一个框架,我们发现COCO风格的基准测试只需要大约9个视觉标记就可以获得与使用所有576个标记相似的准确性;(3)我们的方法为在样本级别探索性能和视觉标记长度之间的最佳权衡提供了基础,我们的研究表明,理论最优上界与当前固定尺度表示之间存在很大差距。

更新时间: 2024-05-27 17:59:56

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17430v1

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.

Updated: 2024-05-27 17:59:51

标题: GaussianFormer:将场景表示为高斯用于基于视觉的三维语义占用预测

摘要: 3D语义占用预测旨在获取周围场景的3D细粒度几何和语义,是以视觉为中心的自动驾驶稳健性的重要任务。大多数现有方法采用体素等密集网格作为场景表示,这忽视了占用的稀疏性和物体尺度的多样性,从而导致资源的不平衡分配。为了解决这个问题,我们提出了一种以物体为中心的表示,用稀疏的3D语义高斯描述3D场景,其中每个高斯表示一个灵活的感兴趣区域及其语义特征。我们通过注意力机制从图像中汇聚信息,并迭代地细化3D高斯的属性,包括位置、协方差和语义。然后,我们提出了一种高效的高斯到体素的splatting方法来生成3D占用预测,该方法仅聚合某个位置的相邻高斯。我们在广泛采用的nuScenes和KITTI-360数据集上进行了大量实验。实验结果表明,GaussianFormer仅以最先进方法17.8%-24.8%的内存消耗实现了与其相当的性能。代码可在以下网址找到:https://github.com/huang-yh/GaussianFormer。

更新时间: 2024-05-27 17:59:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.17429v1

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model with a variety of architectural designs and training procedures to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. For model architecture, we propose a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared to mean pooling or using the last <EOS> token embedding from LLMs. To enhance representation learning, we remove the causal attention mask of LLMs during contrastive training. For model training, we introduce a two-stage contrastive instruction-tuning method. It first applies contrastive training with instructions on retrieval datasets, utilizing in-batch negatives and curated hard negative examples. At stage-2, it blends various non-retrieval datasets into instruction tuning, which not only enhances non-retrieval task accuracy but also improves retrieval performance. Combining these techniques, our NV-Embed model, using only publicly available data, has achieved a record-high score of 69.32, ranking No. 1 on the Massive Text Embedding Benchmark (MTEB) (as of May 24, 2024), with 56 tasks, encompassing retrieval, reranking, classification, clustering, and semantic textual similarity tasks. Notably, our model also attains the highest score of 59.36 on 15 retrieval tasks in the MTEB benchmark (also known as BEIR). We will open-source the model at: https://huggingface.co/nvidia/NV-Embed-v1.
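
A sketch of what a latent attention pooling layer can look like (single-head, bias-free, random toy weights; the dimensions and details are assumptions in the spirit of the abstract, not NV-Embed's exact layer): trainable latent vectors cross-attend over the token hidden states and the results are averaged into one embedding:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(H, latents, Wk, Wv):
    """Pool token states H (seq, d) into a single embedding: learnable latent
    queries attend over the tokens, then their outputs are mean-pooled."""
    K, V = H @ Wk, H @ Wv                                  # (seq, d) each
    attn = softmax(latents @ K.T / np.sqrt(K.shape[1]))    # (m, seq)
    return (attn @ V).mean(axis=0)                         # (d,) embedding

rng = np.random.default_rng(0)
seq, d, m = 128, 64, 8
H = rng.normal(size=(seq, d))             # last-layer hidden states
latents = rng.normal(size=(m, d))         # trainable latent array (toy init)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
Wv = rng.normal(size=(d, d)) / np.sqrt(d)
print(latent_attention_pool(H, latents, Wk, Wv).shape)    # (64,)
```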

Updated: 2024-05-27 17:59:45

标题: NV-Embed:改进的技术用于将LLM训练为通用嵌入模型

摘要: 仅解码器的大型语言模型(LLM)嵌入模型开始在一般文本嵌入任务中超越BERT或T5基于的嵌入模型,包括基于密集向量的检索。在这项工作中,我们引入了NV-Embed模型,采用各种架构设计和训练程序,显著提升LLM作为多功能嵌入模型的性能,同时保持其简单性和可重现性。在模型架构方面,我们提出了一个潜在的注意力层以获得池化嵌入,与从LLM中使用均值池化或最后的<EOS>标记嵌入相比,这一方法始终能够提高检索和下游任务的准确性。为了增强表示学习,我们在对比训练期间去除了LLM的因果关注掩码。在模型训练方面,我们引入了一个两阶段的对比指令调整方法。首先,在检索数据集上应用带有指令的对比训练,利用批内负例和策划的困难负例。在第二阶段,将各种非检索数据集融入指令调整中,不仅提高了非检索任务的准确性,还改善了检索性能。通过结合这些技术,我们的NV-Embed模型仅使用公开可用数据,在2024年5月24日在大规模文本嵌入基准(MTEB)上取得了69.32的最高纪录分数,排名第一,涵盖了56个任务,包括检索、重新排序、分类、聚类和语义文本相似性任务。值得注意的是,我们的模型还在MTEB基准中的15个检索任务中取得了59.36的最高分数(也称为BEIR)。我们将在以下网址开源该模型:https://huggingface.co/nvidia/NV-Embed-v1。

更新时间: 2024-05-27 17:59:45

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.17428v1

Exploring Backdoor Attacks against Large Language Model-based Decision Making

Large Language Models (LLMs) have shown significant promise in decision-making tasks when fine-tuned on specific applications, leveraging their inherent common sense and reasoning abilities learned from vast amounts of data. However, these systems are exposed to substantial safety and security risks during the fine-tuning phase. In this work, we propose the first comprehensive framework for Backdoor Attacks against LLM-enabled Decision-making systems (BALD), systematically exploring how such attacks can be introduced during the fine-tuning phase across various channels. Specifically, we propose three attack mechanisms and corresponding backdoor optimization methods to attack different components in the LLM-based decision-making pipeline: word injection, scenario manipulation, and knowledge injection. Word injection embeds trigger words directly into the query prompt. Scenario manipulation occurs in the physical environment, where a high-level backdoor semantic scenario triggers the attack. Knowledge injection conducts backdoor attacks on retrieval augmented generation (RAG)-based LLM systems, strategically injecting word triggers into poisoned knowledge while ensuring the information remains factually accurate for stealthiness. We conduct extensive experiments with three popular LLMs (GPT-3.5, LLaMA2, PaLM2), using two datasets (HighwayEnv, nuScenes), and demonstrate the effectiveness and stealthiness of our backdoor triggers and mechanisms. Finally, we critically assess the strengths and weaknesses of our proposed approaches, highlight the inherent vulnerabilities of LLMs in decision-making tasks, and evaluate potential defenses to safeguard LLM-based decision making systems.

Updated: 2024-05-27 17:59:43

Domains: cs.CR,cs.AI

Download: http://arxiv.org/abs/2405.20774v1

From Neurons to Neutrons: A Case Study in Interpretability

Mechanistic Interpretability (MI) promises a path toward fully understanding how neural networks make their predictions. Prior work demonstrates that even when trained to perform simple arithmetic, models can implement a variety of algorithms (sometimes concurrently) depending on initialization and hyperparameters. Does this mean neuron-level interpretability techniques have limited applicability? We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions. Such representations can be understood through the mechanistic interpretability lens and provide insights that are surprisingly faithful to human-derived domain knowledge. This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it. As a case study, we extract nuclear physics concepts by studying models trained to reproduce nuclear data.

Updated: 2024-05-27 17:59:35

Domains: cs.LG,nucl-th

Download: http://arxiv.org/abs/2405.17425v1

Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection

3D object detection aims to recover the 3D information of concerning objects and serves as the fundamental task of autonomous driving perception. Its performance greatly depends on the scale of labeled training data, yet it is costly to obtain high-quality annotations for point cloud data. While conventional methods focus on generating pseudo-labels for unlabeled samples as supplements for training, the structural nature of 3D point cloud data facilitates the composition of objects and backgrounds to synthesize realistic scenes. Motivated by this, we propose a hardness-aware scene synthesis (HASS) method to generate adaptive synthetic scenes to improve the generalization of the detection models. We obtain pseudo-labels for unlabeled objects and generate diverse scenes with different compositions of objects and backgrounds. As the scene synthesis is sensitive to the quality of pseudo-labels, we further propose a hardness-aware strategy to reduce the effect of low-quality pseudo-labels and maintain a dynamic pseudo-database to ensure the diversity and quality of synthetic scenes. Extensive experimental results on the widely used KITTI and Waymo datasets demonstrate the superiority of the proposed HASS method, which outperforms existing semi-supervised learning methods on 3D object detection. Code: https://github.com/wzzheng/HASS.

Updated: 2024-05-27 17:59:23

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.17422v1

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Improving out-of-distribution (OOD) generalization through in-distribution (ID) adaptation is a primary goal of robust fine-tuning methods beyond the naive fine-tuning approach. However, despite decent OOD generalization performance from recent robust fine-tuning methods, OOD confidence calibration for reliable machine learning has not been fully addressed. This work proposes a robust fine-tuning method that improves both OOD accuracy and calibration error in Vision Language Models (VLMs). Firstly, we show that both types of errors have a shared upper bound consisting of two terms of ID data: 1) calibration error and 2) the smallest singular value of the input covariance matrix. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value, which is further aided by the self-distillation of a moving averaged model to achieve well-calibrated prediction. Starting from an empirical validation of our theoretical statements, we provide extensive experimental results on ImageNet distribution shift benchmarks that demonstrate the effectiveness of our method.
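
The second term of the shared bound suggests a concrete regularizer: discourage a small smallest singular value of the ID feature covariance. A hedged PyTorch sketch of such a penalty; the log-barrier form and constants are our assumptions, not the paper's exact constrained loss:

import torch

def min_singular_value_penalty(features, eps=1e-6):
    # features: (batch, dim) ID features from the encoder.
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / max(features.size(0) - 1, 1)
    # Smallest singular value of a PSD covariance = smallest eigenvalue.
    smallest = torch.linalg.eigvalsh(cov)[0]
    return -torch.log(smallest + eps)   # pushes the spectrum away from zero

feats = torch.randn(32, 16, requires_grad=True)
loss = min_singular_value_penalty(feats)
loss.backward()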

Updated: 2024-05-27 17:59:16

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2311.01723v5

Survival of the Fittest Representation: A Case Study with Modular Addition

When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representations and algorithms), which compete with each other under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks performing modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. By increasing the embedding dimension, we also observe more surviving frequencies. Inspired by the Lotka-Volterra equations describing the dynamics between species, we find that the dynamics of the circles can be nicely characterized by a set of linear differential equations. Our results with modular addition show that it is possible to decompose complicated representations into simpler components, along with their basic interactions, to offer insight on the training dynamics of representations.
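
The linear-ODE characterization can be illustrated with a toy competition between three "frequencies" whose signals evolve under a shared resource constraint; the rate matrix below is invented for illustration and is not fitted to any trained network:

import numpy as np

# ds/dt = A @ s: a linear competition model in the spirit of the paper's ODEs.
A = np.array([[ 0.5, -0.3, -0.3],
              [-0.3,  0.2, -0.3],
              [-0.3, -0.3, -0.4]])     # the third frequency starts out "unfit"
s = np.array([1.0, 1.0, 1.0])          # initial signal per Fourier frequency
dt = 0.01
for _ in range(2000):
    s = s + dt * (A @ s)
    s = np.clip(s, 0.0, None)          # signal strengths cannot go negative
    s = s / max(s.sum(), 1e-12)        # fixed resource budget across circles
print(np.round(s, 3))                  # the fittest circle ends up with nearly all the mass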

Updated: 2024-05-27 17:59:04

Domains: cs.LG

Download: http://arxiv.org/abs/2405.17420v1

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance the efficacy of OOD detection. To establish a foundation for more realistic Multimodal OOD Detection, we introduce the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations. We first evaluate existing unimodal OOD detection algorithms on MultiOOD, observing that the mere inclusion of additional modalities yields substantial improvements. This underscores the importance of utilizing multiple modalities for OOD detection. Based on the observation of Modality Prediction Discrepancy between in-distribution (ID) and OOD data, and its strong correlation with OOD performance, we propose the Agree-to-Disagree (A2D) algorithm to encourage such discrepancy during training. Moreover, we introduce a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance. Extensive experiments on MultiOOD demonstrate that training with A2D and NP-Mix improves existing OOD detection algorithms by a large margin. Our source code and MultiOOD benchmark are available at https://github.com/donghao51/MultiOOD.

Updated: 2024-05-27 17:59:02

Domains: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.17419v1

A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

$Q$-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 -- our proposed extension of the popular DMControl Generalization Benchmark -- as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. Visualizations, code, and benchmark: see https://aalmuzairee.github.io/SADA/

Updated: 2024-05-27 17:58:23

Domains: cs.LG,cs.CV,cs.RO

Download: http://arxiv.org/abs/2405.17416v1

BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for accurate classification of both known and unknown insect species without task-specific fine-tuning, leveraging contrastive learning for the first time to fuse DNA and image data. Our method surpasses previous single-modality approaches in accuracy by over 11% on zero-shot learning tasks, showcasing its effectiveness in biodiversity studies.
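
The CLIP-style alignment reduces, for each pair of modalities, to a symmetric InfoNCE loss over a batch of matched embeddings. A sketch for the image-DNA pair; the temperature and embedding width are placeholder values:

import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, dna_emb, temperature=0.07):
    # img_emb, dna_emb: (batch, dim); row i of each comes from the same specimen.
    img = F.normalize(img_emb, dim=-1)
    dna = F.normalize(dna_emb, dim=-1)
    logits = img @ dna.T / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(img.size(0))         # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = clip_style_loss(torch.randn(8, 128), torch.randn(8, 128))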

Updated: 2024-05-27 17:57:48

Domains: cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2405.17537v1

Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

The aim of this study is to teach an algorithm how to recognize different types of music. Users will submit songs for analysis. Since the algorithm hasn't heard these songs before, it needs to figure out what makes each song unique. It does this by breaking the songs down into parts and studying features such as rhythm, melody, and tone. The learning is supervised: the program learns from examples that are already labelled. One important thing to consider when classifying music is its genre, which can be quite complex. To ensure accuracy, we use five different algorithms, each working independently, to analyze the songs. This helps us get a more complete understanding of each song's characteristics. Therefore, our goal is to correctly identify the genre of each submitted song. Once the analysis is done, the results are presented using a graphing tool, making it easy for users to understand and provide feedback.
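
The five independent algorithms can be combined with a simple probability-averaging vote. A sketch with scikit-learn on stand-in features; the five estimator choices and the random feature matrix are illustrative assumptions, since the abstract does not name the algorithms:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(200, 20)              # stand-in for per-song audio features
y = np.random.randint(0, 5, size=200)    # five genre labels

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier()),
                ("knn", KNeighborsClassifier()),
                ("svm", SVC(probability=True)),
                ("dt", DecisionTreeClassifier())],
    voting="soft")                       # average the predicted genre probabilities
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))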

Updated: 2024-05-27 17:57:20

Domains: cs.SD,cs.AI,eess.AS

Download: http://arxiv.org/abs/2405.17413v1

Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE

This paper shows that the dimensionality reduction methods, UMAP and t-SNE, can be approximately recast as MAP inference methods corresponding to a generalized Wishart-based model introduced in ProbDR. This interpretation offers deeper theoretical insights into these algorithms, while introducing tools with which similar dimensionality reduction methods can be studied.

Updated: 2024-05-27 17:57:12

Domains: stat.ML,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.17412v1

SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation

In recent years, the task of text-to-SQL translation, which converts natural language questions into executable SQL queries, has gained significant attention for its potential to democratize data access. Despite its promise, challenges such as adapting to unseen databases and aligning natural language with SQL syntax have hindered widespread adoption. To overcome these issues, we introduce SQLformer, a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks. Our model predicts SQL queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers. This bias, guided by database table and column selection, aids the decoder in generating SQL query ASTs represented as graphs in a Breadth-First Search canonical order. Our experiments demonstrate that SQLformer achieves state-of-the-art performance across six prominent text-to-SQL benchmarks.

Updated: 2024-05-27 17:55:18

Domains: cs.CL,cs.LG

Download: http://arxiv.org/abs/2310.18376v4

Deep Learning Calabi-Yau four-folds with hybrid and recurrent neural network architectures

In this work, we report the results of applying deep learning based on hybrid convolutional-recurrent and purely recurrent neural network architectures to the dataset of almost one million complete intersection Calabi-Yau four-folds (CICY4) to machine-learn their four Hodge numbers $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$. In particular, we explored and experimented with twelve different neural network models, nine of which are convolutional-recurrent (CNN-RNN) hybrids with the RNN unit being either GRU (Gated Recurrent Unit) or Long Short Term Memory (LSTM). The remaining four models are purely recurrent neural networks based on LSTM. In terms of the $h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}$ prediction accuracies, at 72% training ratio, our best performing individual model is CNN-LSTM-400, a hybrid CNN-LSTM with the LSTM hidden size of 400, which obtained 99.74%, 98.07%, 95.19%, 81.01%, our second best performing individual model is LSTM-448, an LSTM-based model with the hidden size of 448, which obtained 99.74%, 97.51%, 94.24%, and 78.63%. These results were improved by forming ensembles of the top two, three or even four models. Our best ensemble, consisting of the top three models, achieved the accuracies of 99.80%, 98.40%, 95.80%, 83.02%. At 80% training ratio, the top two performing models LSTM-448 and LSTM-424 are both LSTM-based with the hidden sizes of 448 and 424. Compared with the 72% training ratio, there is a significant improvement of accuracies, which reached 99.85%, 98.66%, 96.26%, 84.77% for the best individual model and 99.88%, 98.91%, 96.96%, 86.78% for the best ensemble.
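
A CNN-RNN hybrid of this kind treats each CICY4 configuration matrix as a one-channel image, turns the convolutional feature map into a sequence, and regresses the four Hodge numbers from the final LSTM state. A hedged PyTorch sketch; apart from the LSTM hidden size of 400 quoted above, the layer sizes are our assumptions:

import torch
import torch.nn as nn

class CNNLSTMHodge(nn.Module):
    def __init__(self, lstm_hidden=400):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(input_size=64, hidden_size=lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, 4)   # h^{1,1}, h^{2,1}, h^{3,1}, h^{2,2}

    def forward(self, x):                        # x: (batch, 1, rows, cols)
        f = self.conv(x)                         # (batch, 64, rows, cols)
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(b, h * w, c)  # spatial cells as a sequence
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])             # predict the four Hodge numbers

pred = CNNLSTMHodge()(torch.randn(2, 1, 16, 20))  # assumes matrices padded to a fixed size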

Updated: 2024-05-27 17:55:05

Domains: hep-th,cs.LG,math.AG

Download: http://arxiv.org/abs/2405.17406v1

Calibrated Dataset Condensation for Faster Hyperparameter Search

Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.

Updated: 2024-05-27 17:55:01

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.17535v1

SMR: State Memory Replay for Long Sequence Modeling

Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. Although advanced SSMs such as S5 and S6 (Mamba) address non-uniform sampling, their recursive structures impede efficient SSM computation via convolution. To overcome compatibility limitations in parallel convolutional computation, this paper proposes a novel non-recursive non-uniform sample processing strategy. Theoretical analysis of SSMs through the lens of Event-Triggered Control (ETC) theory reveals the Non-Stable State (NSS) problem, where deviations from sampling point requirements lead to error transmission and accumulation, causing the divergence of the SSM's hidden state. Our analysis further reveals that adjustments of input sequences with early memories can mitigate the NSS problem, achieving Sampling Step Adaptation (SSA). Building on this insight, we introduce a simple yet effective plug-and-play mechanism, State Memory Replay (SMR), which utilizes learnable memories to adjust the current state with multi-step information for generalization at sampling points different from those in the training data. This enables SSMs to stably model varying sampling points. Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.

Updated: 2024-05-27 17:53:32

Domains: cs.LG

Download: http://arxiv.org/abs/2405.17534v1

Spectral Greedy Coresets for Graph Neural Networks

The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfully applied to speed up GNN training on large graphs, warranting special treatment. This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs (i.e., neighborhood subgraphs around a node) based on their spectral embeddings. We decompose the coreset selection problem for GNNs into two phases: a coarse selection of widely spread ego graphs and a refined selection to diversify their topologies. We design a greedy algorithm that approximately optimizes both objectives. Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs. Extensive experiments on ten datasets demonstrate that SGGC outperforms other coreset methods by a wide margin, generalizes well across GNN architectures, and is much faster than graph condensation.

Updated: 2024-05-27 17:52:12

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.17404v1

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called SpeeD, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many concentrated in the convergence area. iii) The concentrated steps provide limited benefits for diffusion training. To address this, we design an asymmetric sampling strategy that reduces the frequency of steps from the convergence area while increasing the sampling probability for steps from other areas. Additionally, we propose a weighting strategy to emphasize the importance of time steps with rapid-change process increments. As a plug-and-play and architecture-agnostic approach, SpeeD consistently achieves 3-times acceleration across various diffusion architectures, datasets, and tasks. Notably, due to its simple design, our approach significantly reduces the cost of diffusion model training with minimal overhead. Our research enables more researchers to train diffusion models at a lower cost.
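
The asymmetric sampling strategy amounts to drawing training timesteps from a non-uniform distribution that suppresses the convergence area, combined with a per-step loss weight favoring rapid-change increments. A sketch of both pieces; the area boundary, suppression factor, and weight curve are invented for illustration, not SpeeD's actual schedule:

import torch

T = 1000
t = torch.arange(T, dtype=torch.float32)
# Suppose the process-increment diagnosis puts steps above 700 in the convergence area.
probs = torch.where(t > 700, torch.tensor(0.2), torch.tensor(1.0))
probs = probs / probs.sum()

def sample_timesteps(batch_size):
    # Asymmetric sampling: convergence-area steps are drawn 5x less often.
    return torch.multinomial(probs, batch_size, replacement=True)

def step_weight(timesteps):
    # Up-weight steps whose process increment changes quickly (early steps here).
    return 1.0 + torch.exp(-timesteps.float() / 200.0)

ts = sample_timesteps(64)
weights = step_weight(ts)   # multiply the per-sample diffusion loss by these weights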

Updated: 2024-05-27 17:51:36

Domains: cs.LG,cs.AI,I.2

Download: http://arxiv.org/abs/2405.17403v1

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.

Updated: 2024-05-27 17:51:08

Domains: cs.LG,cs.CV,stat.ML

Download: http://arxiv.org/abs/2405.17401v1

PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends

Product attribute extraction is a growing field in e-commerce business, with several applications including product ranking, product recommendation, future assortment planning and improving online shopping customer experiences. Understanding customer needs is a critical part of online business, particularly for fashion products. Retailers use assortment planning to determine the mix of products to offer in each store and channel, to stay responsive to market dynamics, and to manage inventory and catalogs. The goal is to offer the right styles, in the right sizes and colors, through the right channels. When shoppers find products that meet their needs and desires, they are more likely to return for future purchases, fostering customer loyalty. Product attributes are a key factor in assortment planning. In this paper we present PAE, a product attribute extraction algorithm for future trend reports consisting of text and images in PDF format. Most existing methods focus on attribute extraction from titles or product descriptions or utilize visual information from existing product images. Compared to the prior works, our work focuses on attribute extraction from PDF files where upcoming fashion trends are explained. This work proposes a more comprehensive framework that fully utilizes the different modalities for attribute extraction and helps retailers to plan the assortment in advance. Our contributions are three-fold: (a) We develop PAE, an efficient framework to extract attributes from unstructured data (text and images); (b) We provide a catalog matching methodology based on BERT representations to discover the existing attributes using upcoming attribute values; (c) We conduct extensive experiments with several baselines and show that PAE is an effective, flexible framework that is on par with or superior to (avg 92.5% F1-Score) the existing state-of-the-art for the attribute value extraction task.
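
Contribution (b), the catalog matching step, can be read as nearest-neighbour search in BERT embedding space: embed each upcoming attribute value, compare it against embeddings of existing catalog attributes, and accept the best match above a similarity threshold. A sketch that assumes the BERT vectors are precomputed (random arrays stand in for them here, and the attribute names are hypothetical):

import numpy as np

def match_to_catalog(new_vecs, catalog_vecs, catalog_names, threshold=0.8):
    # Cosine similarity between upcoming attribute values and the catalog.
    a = new_vecs / np.linalg.norm(new_vecs, axis=1, keepdims=True)
    b = catalog_vecs / np.linalg.norm(catalog_vecs, axis=1, keepdims=True)
    sims = a @ b.T
    best = sims.argmax(axis=1)
    return [(catalog_names[j], sims[i, j]) if sims[i, j] >= threshold
            else (None, sims[i, j])
            for i, j in enumerate(best)]

catalog = np.random.rand(50, 768)           # BERT vectors of existing attributes
upcoming = np.random.rand(3, 768)           # BERT vectors of trend-report attributes
names = [f"attr_{k}" for k in range(50)]    # hypothetical attribute names
print(match_to_catalog(upcoming, catalog, names))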

Updated: 2024-05-27 17:50:25

Domains: cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.17533v1

Transformers Can Do Arithmetic with the Right Embeddings

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.
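
The described fix attaches, to every digit token, an extra embedding indexed by the digit's offset from the start of its number. A minimal sketch of that idea; the tokenization, table size, and 1-indexing convention are our assumptions:

import torch
import torch.nn as nn

DIGITS = set("0123456789")

def digit_offsets(tokens):
    # Offset of each digit from the start of its number; 0 for non-digits.
    offsets, run = [], 0
    for tok in tokens:
        run = run + 1 if tok in DIGITS else 0
        offsets.append(run)                 # 1-indexed inside a number
    return torch.tensor(offsets)

class DigitPositionEmbedding(nn.Module):
    def __init__(self, max_digits=100, dim=64):
        super().__init__()
        self.table = nn.Embedding(max_digits + 1, dim)  # index 0 = "not a digit"

    def forward(self, tokens, token_embeddings):
        return token_embeddings + self.table(digit_offsets(tokens))

tokens = list("12+345=")                    # offsets become [1, 2, 0, 1, 2, 3, 0]
emb = DigitPositionEmbedding()(tokens, torch.zeros(len(tokens), 64))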

Updated: 2024-05-27 17:49:18

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.17399v1

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability. Based on a systematic diagnosis of existing methods, we introduce several key ingredients to address these limitations. To accurately predict real-world dynamics at high resolution, we propose two novel losses to promote the learning of moving instances and structural information. We also devise an effective latent replacement approach to inject historical frames as priors for coherent long-horizon rollouts. For action controllability, we incorporate a versatile set of controls from high-level intentions (command, goal point) to low-level maneuvers (trajectory, angle, and speed) through an efficient learning strategy. After large-scale training, the capabilities of Vista can seamlessly generalize to different scenarios. Extensive experiments on multiple datasets show that Vista outperforms the most advanced general-purpose video generator in over 70% of comparisons and surpasses the best-performing driving world model by 55% in FID and 27% in FVD. Moreover, for the first time, we utilize the capacity of Vista itself to establish a generalizable reward for real-world action evaluation without accessing the ground truth actions.

Updated: 2024-05-27 17:49:15

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.17398v1

The Expressive Capacity of State Space Models: A Formal Language Perspective

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.

Updated: 2024-05-27 17:46:57

Domains: cs.CL,cs.FL,cs.LG

Download: http://arxiv.org/abs/2405.17394v1

Dataset-learning duality and emergent criticality

In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that a composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces, but in a learning equilibrium, the problem can be linearized and reduced to many weakly coupled one-dimensional problems. We use the duality to study the emergence of criticality, or the power-law distributions of fluctuations of the trainable variables. In particular, we show that criticality can emerge in the learning system even from the dataset in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.

Updated: 2024-05-27 17:44:33

Domains: cs.LG,cs.NE

Download: http://arxiv.org/abs/2405.17391v1

MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs such as English translation text to circumvent the challenge of LLM understanding non-English. Unfortunately, these methods often underutilize the built-in skilled reasoning and useful language understanding capabilities of LLMs. To better utilize the reasoning and language understanding capabilities of LLMs, we propose a new method, namely MindMerger, which merges LLMs with the external language understanding capabilities from multilingual models to boost the multilingual reasoning performance. Furthermore, a two-step training scheme is introduced: the first step embeds the external capabilities into LLMs, and the second trains the collaborative utilization of the external and built-in capabilities. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, the average accuracy improved by 6.7% and 8.0% across all languages and low-resource languages on the MGSM dataset, respectively.

Updated: 2024-05-27 17:41:54

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.17386v1

Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre

Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels. After we released the first version of our paper online, the authors introduced another change to the defense; by commenting out one line of code during attack we reduce the robust accuracy to 0% again.

Updated: 2024-05-27 17:41:06

Domains: cs.CR,cs.LG

Download: http://arxiv.org/abs/2405.03672v2

ReMoDetect: Reward Models Recognize Aligned LLM's Generations

The remarkable capabilities and easy accessibility of large language models (LLMs) have significantly increased societal risks (e.g., fake news generation), necessitating the development of LLM-generated text (LGT) detection methods for safe usage. However, detecting LGTs is challenging due to the vast number of LLMs, making it impractical to account for each LLM individually; hence, it is crucial to identify the common characteristics shared by these models. In this paper, we draw attention to a common feature of recent powerful LLMs, namely the alignment training, i.e., training LLMs to generate human-preferable texts. Our key finding is that as these aligned LLMs are trained to maximize the human preferences, they generate texts with higher estimated preferences even than human-written texts; thus, such texts are easily detected by using the reward model (i.e., an LLM trained to model human preference distribution). Based on this finding, we propose two training schemes to further improve the detection ability of the reward model, namely (i) continual preference fine-tuning to make the reward model prefer aligned LGTs even further and (ii) reward modeling of Human/LLM mixed texts (texts rephrased from human-written ones using aligned LLMs), which serve as a median preference text corpus between LGTs and human-written texts to learn the decision boundary better. We provide an extensive evaluation by considering six text domains across twelve aligned LLMs, where our method demonstrates state-of-the-art results. Code is available at https://github.com/hyunseoklee-ai/reward_llm_detect.
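
At detection time the method reduces to thresholding the reward model's estimated human preference: suspiciously high scores flag LLM-generated text. A sketch of that decision rule with a toy stub standing in for the trained reward model (the stub's keyword heuristic and the threshold are purely illustrative):

def reward_score(text):
    # Stub: in practice this is a trained reward model's scalar preference score.
    return 0.9 if "as an ai language model" in text.lower() else 0.4

def detect_llm_generated(texts, threshold=0.7):
    # Aligned-LLM outputs tend to receive higher estimated preference than
    # human-written text, so a high reward score flags a likely LGT.
    return [(t, reward_score(t) >= threshold) for t in texts]

print(detect_llm_generated([
    "As an AI language model, I can summarize this for you.",
    "meeting moved to 3, bring the slides pls",
]))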

Updated: 2024-05-27 17:38:33

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2405.17382v1

Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey

Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. Despite a growing corpus of controllable summarization research, there is no comprehensive survey available that thoroughly explores the diverse controllable attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable attributes according to their shared characteristics and objectives, and present a thorough examination of existing datasets and methods within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also exploring potential solutions and future directions for CTS. We release our detailed analysis of CTS papers at https://github.com/ashokurlana/controllable_text_summarization_survey.

Updated: 2024-05-27 17:36:38

Domains: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2311.09212v2

RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects

Large Language Models (LLMs) have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks. Nevertheless, a significant gap remains in benchmarks that accurately reflect the complexity of real-world RTL projects. To address this, we present RTL-Repo, a benchmark specifically designed to evaluate LLMs on large-scale RTL design projects. RTL-Repo includes a comprehensive dataset of more than 4000 Verilog code samples extracted from public GitHub repositories, with each sample providing the full context of the corresponding repository. We evaluate several state-of-the-art models on the RTL-Repo benchmark, including GPT-4, GPT-3.5, Starcoder2, alongside Verilog-specific models like VeriGen and RTLCoder, and compare their performance in generating Verilog code for complex projects. The RTL-Repo benchmark provides a valuable resource for the hardware design community to assess and compare LLMs' performance in real-world RTL design scenarios and train LLMs specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open-source and publicly available on Github.

Updated: 2024-05-27 17:36:01

Domains: cs.LG,cs.AR

Download: http://arxiv.org/abs/2405.17378v1

How Does Perfect Fitting Affect Representation Learning? On the Training Dynamics of Representations in Deep Neural Networks

In this paper, we elucidate how representations in deep neural networks (DNNs) evolve during training. We focus on overparameterized learning settings where the training continues much after the trained DNN starts to perfectly fit its training data. We examine the evolution of learned representations along the entire training process, including its perfect fitting regime, and with respect to the epoch-wise double descent phenomenon. We explore the representational similarity of DNN layers, each layer with respect to its own representations throughout the training process. For this, we use two similarity metrics: (1) The centered kernel alignment (CKA) similarity; (2) Similarity of decision regions of linear classifier probes that we train for the DNN layers. Our extensive experiments discover training dynamics patterns that can emerge in layers depending on the relative layer-depth, DNN width, and architecture. We show that representations at the deeper layers evolve much more in the training when an epoch-wise double descent occurs. For Vision Transformer, we show that the perfect fitting threshold creates a transition in the evolution of representations across all the encoder blocks.
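
For reference, linear CKA, the first similarity metric used, has a simple closed form over centered activations:

import numpy as np

def linear_cka(X, Y):
    # X: (n, d1), Y: (n, d2) activations for the same n inputs.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

a = np.random.randn(100, 32)
print(linear_cka(a, a @ np.random.randn(32, 16)))  # high: Y is a linear map of X
print(linear_cka(a, np.random.randn(100, 16)))     # near zero: unrelated features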

Updated: 2024-05-27 17:33:03

Domains: cs.LG

Download: http://arxiv.org/abs/2405.17377v1

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

Safety alignment is the key to guiding the behaviors of large language models (LLMs) that are in line with human preferences and restrict harmful behaviors at inference time, but recent studies show that it can be easily compromised by finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs through navigating the LLM safety landscape. We discover a new phenomenon observed universally in the model parameter space of popular open-source LLMs, termed as "safety basin": randomly perturbing model weights maintains the safety level of the original aligned model in its local neighborhood. Our discovery inspires us to propose the new VISAGE safety metric that measures the safety in LLM finetuning by probing its safety landscape. Visualizing the safety landscape of the aligned model enables us to understand how finetuning compromises safety by dragging the model away from the safety basin. LLM safety landscape also highlights the system prompt's critical role in protecting a model, and that such protection transfers to its perturbed variants within the safety basin. These observations from our safety landscape research provide new insights for future work on LLM safety community.
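
Probing the safety basin comes down to evaluating safety at randomly perturbed copies of the aligned weights. A hedged sketch of such a probing loop; safety_eval stands in for whatever harmfulness benchmark is used, and the perturbation scaling is an arbitrary choice rather than the paper's protocol:

import copy
import torch

def probe_safety_basin(model, safety_eval, radius=0.01, num_samples=8):
    # Measure safety at random weight perturbations around the aligned model.
    scores = []
    for _ in range(num_samples):
        probe = copy.deepcopy(model)
        with torch.no_grad():
            for p in probe.parameters():
                p.add_(radius * torch.randn_like(p) * p.norm() / p.numel() ** 0.5)
        scores.append(safety_eval(probe))
    return scores   # uniformly high scores across samples indicate a "safety basin"

model = torch.nn.Linear(4, 2)                       # stand-in for an LLM
print(probe_safety_basin(model, lambda m: 1.0))     # dummy safety evaluator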

Updated: 2024-05-27 17:31:56

Domains: cs.LG

Download: http://arxiv.org/abs/2405.17374v1

AI-based analysis of super-resolution microscopy: Biological discovery in the absence of ground truth

Super-resolution microscopy, or nanoscopy, enables the use of fluorescent-based molecular localization tools to study molecular structure at the nanoscale level in the intact cell, bridging the mesoscale gap to classical structural biology methodologies. Analysis of super-resolution data by artificial intelligence (AI), such as machine learning, offers tremendous potential for discovery of new biology, that, by definition, is not known and lacks ground truth. Herein, we describe the application of weakly supervised paradigms to super-resolution microscopy and its potential to enable the accelerated exploration of the nanoscale architecture of subcellular macromolecules and organelles.

Updated: 2024-05-27 17:31:37

Domains: q-bio.SC,cs.AI,cs.CV,cs.LG,physics.bio-ph,q-bio.QM

Download: http://arxiv.org/abs/2305.17193v2

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future," treating each time step as the "current" one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. We outperformed state-of-the-art models with a realism score of 0.741 and improved the minADE metric to 1.540, with an approximately 91.6% reduction in model parameters.

Updated: 2024-05-27 17:28:25

Domains: cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.17372v1

Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators

Meta-learning has been proposed as a promising machine learning topic in recent years, with important applications to image classification, robotics, computer games, and control systems. In this paper, we study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators. We integrate the zeroth-order optimization technique with a typical meta-learning method, proposing an algorithm that omits the estimation of policy Hessian, which applies to tasks of learning a set of heterogeneous but similar linear dynamic systems. The induced meta-objective function inherits important properties of the original cost function when the set of linear dynamic systems are meta-learnable, allowing the algorithm to optimize over a learnable landscape without projection onto the feasible set. We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective, which justify the proposed algorithm with gradient estimation error being small. We also provide a numerical example to corroborate this perspective.
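
The zeroth-order ingredient replaces exact policy gradients with two-point finite-difference estimates built purely from cost evaluations, which is what lets the algorithm omit the policy Hessian. A generic sketch of such an estimator; the smoothing radius, direction count, and quadratic stand-in cost are placeholders, not the paper's exact construction:

import numpy as np

def zeroth_order_grad(cost, K, radius=0.05, num_dirs=20):
    # Two-point estimator: sample directions U, probe cost(K + rU) and cost(K - rU).
    grad = np.zeros_like(K)
    for _ in range(num_dirs):
        U = np.random.randn(*K.shape)
        U /= np.linalg.norm(U)
        grad += (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    return grad * (K.size / num_dirs)

cost = lambda K: np.sum(K ** 2)          # placeholder for the ergodic LQR cost
K = np.ones((2, 3))
print(zeroth_order_grad(cost, K))        # approximates the true gradient 2K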

Updated: 2024-05-27 17:26:36

Domains: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2405.17370v1

A Theoretical Framework for Partially Observed Reward-States in RLHF

The growing deployment of reinforcement learning from human feedback (RLHF) calls for a deeper theoretical investigation of its underlying models. The prevalent models of RLHF do not account for neuroscience-backed, partially-observed "internal states" that can affect human feedback, nor do they accommodate intermediate feedback during an interaction. Both of these can be instrumental in speeding up learning and improving alignment. To address these limitations, we model RLHF as reinforcement learning with partially observed reward-states (PORRL). We accommodate two kinds of feedback: cardinal and dueling feedback. We first demonstrate that PORRL subsumes a wide class of RL problems, including traditional RL, RLHF, and reward machines. For cardinal feedback, we present two model-based methods (POR-UCRL, POR-UCBVI). We give both cardinal regret and sample complexity guarantees for the methods, showing that they improve over naive history-summarization. We then discuss the benefits of a model-free method like GOLF with naive history-summarization in settings with recursive internal states and dense intermediate feedback. For this purpose, we define a new history aware version of the Bellman-eluder dimension and give a new guarantee for GOLF in our setting, which can be exponentially sharper in illustrative examples. For dueling feedback, we show that a naive reduction to cardinal feedback fails to achieve sublinear dueling regret. We then present the first explicit reduction that converts guarantees for cardinal regret to dueling regret. In both feedback settings, we show that our models and guarantees generalize and extend existing ones.

Updated: 2024-05-27 17:20:41

标题: RLHF中部分可观测奖励状态的理论框架

摘要: 随着基于人类反馈的强化学习(RLHF)的日益部署,需要对其基础模型进行更深入的理论研究。目前流行的RLHF模型既未考虑有神经科学依据、可以影响人类反馈的部分可观测“内部状态”,也未能容纳交互过程中的中间反馈。这两者都有助于加快学习速度并改善对齐。为了解决这些限制,我们将RLHF建模为具有部分可观测奖励状态的强化学习(PORRL)。我们容纳两种反馈——基数反馈和对决反馈。我们首先证明PORRL涵盖了广泛的RL问题类别,包括传统RL、RLHF和奖励机(reward machines)。对于基数反馈,我们提出了两种基于模型的方法(POR-UCRL、POR-UCBVI)。我们为这些方法给出了基数遗憾和样本复杂度保证,表明它们优于朴素的历史摘要方法。然后,我们讨论了在具有递归内部状态和密集中间反馈的情形下,像GOLF这样结合朴素历史摘要的无模型方法的优势。为此,我们定义了Bellman-eluder维度的一个新的历史感知版本,并在我们的设定中为GOLF给出了新的保证,该保证在示例中可以呈指数级更紧。对于对决反馈,我们表明朴素地归约为基数反馈无法实现次线性对决遗憾。随后我们给出了第一个将基数遗憾保证转化为对决遗憾保证的显式归约。在两种反馈设定中,我们都表明我们的模型和保证推广并扩展了现有结果。

更新时间: 2024-05-27 17:20:41

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.03282v2

EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes

We present a novel machine-learning (ML) approach (EM-GANSim) for real-time electromagnetic (EM) propagation that is used for wireless communication simulation in 3D indoor environments. Our approach uses a modified conditional Generative Adversarial Network (GAN) that incorporates encoded geometry and transmitter location while adhering to the electromagnetic propagation theory. The overall physically-inspired learning is able to predict the power distribution in 3D scenes, which is represented using heatmaps. Our overall accuracy is comparable to ray tracing-based EM simulation, as evidenced by lower mean squared error values. Furthermore, our GAN-based method drastically reduces the computation time, achieving a 5X speedup on complex benchmarks. In practice, it can compute the signal strength in a few milliseconds on any location in 3D indoor environments. We also present a large dataset of 3D models and EM ray tracing-simulated heatmaps. To the best of our knowledge, EM-GANSim is the first real-time algorithm for EM simulation in complex 3D indoor environments. We plan to release the code and the dataset.

Updated: 2024-05-27 17:19:02

标题: EM-GANSim:使用条件GAN在3D室内场景中实现实时和准确的电磁仿真

摘要: 我们提出了一种新颖的机器学习(ML)方法(EM-GANSim),用于3D室内环境中无线通信仿真的实时电磁(EM)传播计算。我们的方法使用一个修改后的条件生成对抗网络(GAN),它结合了编码的几何信息和发射机位置,同时遵循电磁传播理论。这种受物理启发的学习总体上能够预测3D场景中的功率分布,并以热图表示。我们的整体精度与基于射线追踪的EM仿真相当,较低的均方误差值证明了这一点。此外,我们基于GAN的方法大大减少了计算时间,在复杂基准测试中实现了5倍加速。在实践中,它可以在几毫秒内计算3D室内环境中任意位置的信号强度。我们还提供了一个包含3D模型和EM射线追踪仿真热图的大型数据集。据我们所知,EM-GANSim是首个面向复杂3D室内环境的实时EM仿真算法。我们计划发布代码和数据集。

更新时间: 2024-05-27 17:19:02

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2405.17366v1

Generating Likely Counterfactuals Using Sum-Product Networks

Explainability of decisions made by AI systems is driven by both recent regulation and user demand. These decisions are often explainable only \emph{post hoc}, after the fact. In counterfactual explanations, one may ask what constitutes the best counterfactual explanation. Clearly, multiple criteria must be taken into account, although "distance from the sample" is a key criterion. Recent methods that consider the plausibility of a counterfactual seem to sacrifice this original objective. Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using mixed-integer optimization (MIO). In the process, we propose an MIO formulation of a Sum-Product Network (SPN) and use the SPN to estimate the likelihood of a counterfactual, which can be of independent interest.

Updated: 2024-05-27 17:17:08

标题: 使用Sum-Product Networks生成可能的反事实情况

摘要: AI系统所做决策的可解释性受到近期监管和用户需求的双重驱动。这些决策通常只能在事后解释。在反事实解释中,人们可能会问什么构成最佳的反事实解释。显然,必须考虑多个标准,其中“与样本的距离”是一个关键标准。而最近考虑反事实合理性的方法似乎牺牲了这一最初的目标。在这里,我们提出了一个系统,它提供高似然的解释,同时保持接近且稀疏。我们展示了,寻找满足反事实解释诸多常见要求的最可能解释,可以用混合整数优化(MIO)来建模。在此过程中,我们提出了和积网络(SPN)的MIO表述,并使用SPN来估计反事实的似然,这本身也可能具有独立的价值。

更新时间: 2024-05-27 17:17:08

领域: cs.AI,cs.LG,math.OC

下载: http://arxiv.org/abs/2401.14086v2

Pre-training with Synthetic Data Helps Offline Reinforcement Learning

Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
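
To make the pre-training data concrete, here is a minimal sketch of how synthetic IID and one-step Markov chain token sequences of the kind described above could be generated; the vocabulary size, Dirichlet transition rows, and batch shapes are our illustrative assumptions.

```python
import numpy as np

def synthetic_pretraining_batch(vocab_size=100, seq_len=64, batch=32,
                                markov=True, seed=0):
    """Token sequences from IID sampling or a random one-step Markov chain."""
    rng = np.random.default_rng(seed)
    if not markov:
        return rng.integers(0, vocab_size, size=(batch, seq_len))  # IID tokens
    # random row-stochastic transition matrix (assumption: uniform Dirichlet rows)
    P = rng.dirichlet(np.ones(vocab_size), size=vocab_size)
    seqs = np.empty((batch, seq_len), dtype=int)
    seqs[:, 0] = rng.integers(0, vocab_size, size=batch)
    for t in range(1, seq_len):
        for b in range(batch):
            seqs[b, t] = rng.choice(vocab_size, p=P[seqs[b, t - 1]])
    return seqs

tokens = synthetic_pretraining_batch()
print(tokens.shape)  # (32, 64), ready for a next-token pre-training loss
```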

Updated: 2024-05-27 17:16:03

标题: 使用合成数据进行预训练有助于离线强化学习

摘要: 最近的研究表明,对于离线深度强化学习(DRL),使用大型语言语料库对决策变换器进行预训练可以提高下游性能(Reid等人,2022)。一个自然的问题是,这种性能提升只能通过语言预训练来实现,还是可以通过不涉及语言的更简单的预训练方案来实现。在本文中,我们首先展示了语言对于提高性能并非必要,实际上,使用合成IID数据进行少量更新的预训练可以与使用大型语言语料库进行预训练的性能提升相匹配;此外,使用一步马尔可夫链生成的数据进行预训练可以进一步提高性能。受这些实验结果的启发,我们考虑了保守Q-学习(CQL),这是一种流行的离线DRL算法,基于Q-学习,并通常使用多层感知器(MLP)作为骨干网络。令人惊讶的是,使用简单的合成数据进行少量更新的预训练也可以改善CQL,在D4RL Gym运动数据集上提供一致的性能改进。本文的结果不仅说明了离线DRL的预训练的重要性,还表明预训练数据可以是合成的,并且可以通过非常简单的机制生成。

更新时间: 2024-05-27 17:16:03

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.00771v4

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding

In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.

Updated: 2024-05-27 17:08:59

标题: Shallow ReLU-like神经网络的损失景观:固定点,鞍点逃逸和网络嵌入

摘要: 在这篇论文中,我们研究了使用ReLU类激活函数训练的具有一个隐藏层的神经网络的损失景观。由于激活函数是不可微的,目前尚不清楚如何完全描述稳定点。我们提出了适用于不可微和可微情况的稳定条件。此外,我们表明,如果一个稳定点不包含“逃逸神经元”,即根据一阶条件定义的神经元,则它必须是局部最小值。此外,对于标量输出情况,逃逸神经元的存在保证了稳定点不是局部最小值。我们的结果完善了对于浅层ReLU类网络从无穷小(消失)初始化开始的鞍点到鞍点训练过程的描述,直接将鞍点逃逸与逃逸神经元的参数变化联系起来。此外,我们还能够充分讨论网络嵌入如何重新塑造稳定点,即在更宽的网络中实例化一个较窄的网络。

更新时间: 2024-05-27 17:08:59

领域: cs.LG

下载: http://arxiv.org/abs/2402.05626v3

Memory Efficient Neural Processes via Constant Memory Attention Block

Neural Processes (NPs) are popular meta-learning methods for efficiently modelling predictive uncertainty. Recent state-of-the-art methods, however, leverage expensive attention mechanisms, limiting their applications, particularly in low-resource settings. In this work, we propose Constant Memory Attentive Neural Processes (CMANPs), an NP variant that only requires constant memory. To do so, we first propose an efficient update operation for Cross Attention. Leveraging the update operation, we propose Constant Memory Attention Block (CMAB), a novel attention block that (i) is permutation invariant, (ii) computes its output in constant memory, and (iii) performs constant computation updates. Finally, building on CMAB, we detail Constant Memory Attentive Neural Processes. Empirically, we show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.

Updated: 2024-05-27 17:06:51

标题: 通过恒定内存注意块实现的内存高效神经过程

摘要: 神经过程(NPs)是一种流行的元学习方法,用于高效地建模预测不确定性。然而,最近的最先进方法利用昂贵的注意机制,限制了它们的应用,特别是在资源匮乏的环境中。在这项工作中,我们提出了Constant Memory Attentive Neural Processes(CMANPs),这是一种仅需要恒定内存的NP变种。为此,我们首先提出了一种用于交叉注意力的高效更新操作。利用更新操作,我们提出了Constant Memory Attention Block(CMAB),这是一种新颖的注意力块,它(i)是置换不变的,(ii)在恒定内存中计算其输出,(iii)执行恒定计算更新。最后,基于CMAB,我们详细介绍了Constant Memory Attentive Neural Processes。在实证方面,我们展示了CMANPs在流行的NP基准测试中取得了最先进的结果,同时比先前的方法显著更节省内存。

更新时间: 2024-05-27 17:06:51

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2305.14567v3

FALCON: Scalable Reasoning over Inconsistent ALC Ontologies

Ontologies are one of the richest sources of knowledge. Real-world ontologies often contain thousands of axioms and are often human-made. Hence, they may contain inconsistency and incomplete information which may impair classical reasoners to compute entailments that are considered as useful. To overcome these two challenges, we propose FALCON, a Fuzzy Ontology Neural reasoner to approximate reasoning over ALC ontologies. We provide an approximate technique for the model generation step in classical ALC reasoners. Our approximation is not guaranteed to construct exact logical models, but can approximate arbitrary models, which is notably faster for some large ontologies. Moreover, by sampling multiple approximate logical models, our technique supports approximate entailment also over inconsistent ontologies. Theoretical results show that more models generated lead to closer, i.e., faithful approximation of entailment over ALC entailments. Experimental results show that FALCON enables approximate reasoning and reasoning in the presence of inconsistency. Our experiments further demonstrate how ontologies can improve knowledge base completion in biomedicine by incorporating knowledge expressed in ALC.

Updated: 2024-05-27 17:04:30

标题: FALCON:在不一致的ALC本体上可扩展推理

摘要: 本体是知识最丰富的来源之一。现实世界中的本体往往包含成千上万条公理,且通常由人工构建。因此,它们可能包含不一致和不完整的信息,这会妨碍经典推理器计算出有用的蕴涵。为了克服这两个挑战,我们提出了FALCON,一种模糊本体神经推理器,用于在ALC本体上进行近似推理。我们为经典ALC推理器中的模型生成步骤提供了一种近似技术。我们的近似不保证构建精确的逻辑模型,但可以近似任意模型,这在一些大型本体上明显更快。此外,通过采样多个近似逻辑模型,我们的技术还支持在不一致本体上进行近似蕴涵推理。理论结果表明,生成的模型越多,对ALC蕴涵的近似就越接近、越忠实。实验结果表明,FALCON能够进行近似推理以及在存在不一致时的推理。我们的实验进一步展示了通过融入以ALC表达的知识,本体可以改进生物医学中的知识库补全。

更新时间: 2024-05-27 17:04:30

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2208.07628v5

Rethinking Transformers in Solving POMDPs

Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, due to their lack of inherent recurrence found in other models like RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes to introduce a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for Partially Observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and considerable strength of LRU.
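
Below is a minimal sketch of a point-wise linear recurrence in the spirit of the LRU discussed above; real LRU implementations use a stable exponential parameterization with learned complex weights and trained end-to-end, so the initialization and shapes here are illustrative assumptions only.

```python
import numpy as np

class MinimalLRU:
    """Toy diagonal linear recurrence: h_t = lam * h_{t-1} + B x_t, y_t = Re(C h_t).
    Eigenvalues are drawn inside the unit disk for stability; this is a sketch
    of the recurrence structure, not a faithful Deep LRU implementation."""
    def __init__(self, d_in, d_hidden, d_out, seed=0):
        rng = np.random.default_rng(seed)
        mag = rng.uniform(0.5, 0.99, d_hidden)            # |lambda| < 1
        phase = rng.uniform(0, 2 * np.pi, d_hidden)
        self.lam = mag * np.exp(1j * phase)               # complex diagonal
        self.B = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
        self.C = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)

    def forward(self, xs):                                # xs: (T, d_in)
        h, ys = np.zeros(self.lam.shape, dtype=complex), []
        for x in xs:                                      # point-wise recurrence
            h = self.lam * h + self.B @ x
            ys.append((self.C @ h).real)
        return np.stack(ys)

lru = MinimalLRU(d_in=4, d_hidden=16, d_out=2)
print(lru.forward(np.random.randn(10, 4)).shape)          # (10, 2)
```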

Updated: 2024-05-27 17:02:35

标题: 重新思考使用Transformer解决POMDPs

摘要: 顺序决策算法(如强化学习(RL))在现实世界场景中不可避免地面临部分可观测的环境。本文审视了一种流行架构(即Transformer)在部分可观测马尔可夫决策过程(POMDP)中的有效性,并揭示了其理论局限性。我们证明了Transformer难以建模的正则语言可以归约为POMDP。由于Transformer缺乏RNN等其他模型所固有的循环结构,这对其学习POMDP特定的归纳偏置构成了重大挑战。本文对将Transformer用作RL序列模型的普遍信念提出质疑,并提议引入一种逐点循环结构。深度线性循环单元(LRU)被证明是部分可观测RL的合适替代方案,实证结果突显了Transformer的次优性能以及LRU的显著优势。

更新时间: 2024-05-27 17:02:35

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17358v1

Why are Sensitive Functions Hard for Transformers?

Empirical studies have identified a range of learnability biases and limitations of transformers, such as a persistent difficulty in learning to compute simple formal languages such as PARITY, and a bias towards low-degree functions. However, theoretical understanding remains limited, with existing expressiveness theory either overpredicting or underpredicting realistic learning abilities. We prove that, under the transformer architecture, the loss landscape is constrained by the input-space sensitivity: Transformers whose output is sensitive to many parts of the input string inhabit isolated points in parameter space, leading to a low-sensitivity bias in generalization. We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and difficulty in length generalization for PARITY. This shows that understanding transformers' inductive biases requires studying not just their in-principle expressivity, but also their loss landscape.
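
The sensitivity notion at the heart of this argument is easy to compute exactly for small n; the brute-force sketch below contrasts PARITY (maximally sensitive: every bit flip changes the output) with MAJORITY (low sensitivity).

```python
import itertools
import numpy as np

def avg_sensitivity(f, n):
    """Average over all inputs of the number of single-bit flips that change f."""
    total = 0
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        for i in range(n):
            y = x.copy(); y[i] ^= 1
            total += f(x) != f(y)
    return total / 2 ** n

parity = lambda x: x.sum() % 2
majority = lambda x: int(x.sum() > len(x) / 2)

n = 8
print("PARITY sensitivity:  ", avg_sensitivity(parity, n))    # exactly n = 8
print("MAJORITY sensitivity:", avg_sensitivity(majority, n))  # much lower
```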

Updated: 2024-05-27 17:01:29

标题: 为什么敏感函数对Transformer来说很困难?

摘要: 实证研究已经发现了Transformer的一系列可学习性偏差与局限,例如在学习计算PARITY等简单形式语言上的持续困难,以及偏向低阶函数的倾向。然而,理论理解仍然有限,现有的表达能力理论要么高估、要么低估了实际的学习能力。我们证明,在Transformer架构下,损失景观受到输入空间敏感性的约束:输出对输入字符串的许多部分都敏感的Transformer位于参数空间中的孤立点上,从而导致泛化中的低敏感性偏差。我们在理论和实证上展示了该理论统一了关于Transformer学习能力和偏差的大量实证观察,例如它们偏向低敏感性和低阶数的泛化倾向,以及在PARITY上长度泛化的困难。这表明,理解Transformer的归纳偏置不仅需要研究其原则上的表达能力,还需要研究其损失景观。

更新时间: 2024-05-27 17:01:29

领域: cs.LG

下载: http://arxiv.org/abs/2402.09963v4

For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of such methods. We first prove that linear models on binary classification data trained with label augmentation learn only the minimum variance features in the data, while standard training (which includes weight decay) can learn higher variance features. We then use our techniques to show that even for nonlinear models and general data distributions, the label smoothing and Mixup losses are lower bounded by a function of the model output variance. An important consequence of our results is negative: label smoothing and Mixup can be less robust to spurious correlations in the data. We verify that our theory reflects practice via experiments on image classification benchmarks modified to have spurious correlations.
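
For reference, a minimal sketch of the two label-augmentation techniques analyzed above, label smoothing and Mixup, in their standard forms; the batch shapes and hyperparameters are arbitrary choices.

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Label smoothing: move eps probability mass from the true class to all classes."""
    k = y_onehot.shape[-1]
    return y_onehot * (1 - eps) + eps / k

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """Mixup: convex combinations of shuffled example/label pairs, lam ~ Beta(a, a)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]

x = np.random.randn(16, 32)                    # toy batch of features
y = np.eye(4)[np.random.randint(0, 4, 16)]     # one-hot labels, 4 classes
xm, ym = mixup(x, smooth_labels(y))
print(xm.shape, ym.sum(axis=1)[:3])            # soft labels still sum to 1
```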

Updated: 2024-05-27 16:58:55

标题: 是好是坏?通过标签增强学习最小方差特征

摘要: 在过去十年中,数据增强对于成功训练分类任务上的深度学习模型起到了关键作用。数据增强技术的一个重要子类(包括标签平滑和Mixup)不仅在模型训练过程中修改输入数据,还修改输入标签。在本研究中,我们分析了这类方法中标签增强所起的作用。我们首先证明,在二元分类数据上使用标签增强训练的线性模型只学习数据中方差最小的特征,而标准训练(包含权重衰减)可以学习方差更高的特征。然后我们利用这些技术证明,即使对于非线性模型和一般数据分布,标签平滑和Mixup的损失也由模型输出方差的某个函数从下方界定。我们结果的一个重要推论是负面的:标签平滑和Mixup对数据中的虚假相关性可能更不鲁棒。我们通过在修改为带有虚假相关性的图像分类基准上进行实验,验证了我们的理论与实践相符。

更新时间: 2024-05-27 16:58:55

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.06855v2

Assessing the significance of longitudinal data in Alzheimer's Disease forecasting

In this study, we employ a transformer encoder model to characterize the significance of longitudinal patient data for forecasting the progression of Alzheimer's Disease (AD). Our model, Longitudinal Forecasting Model for Alzheimer's Disease (LongForMAD), harnesses the comprehensive temporal information embedded in sequences of patient visits that incorporate multimodal data, providing a deeper understanding of disease progression than can be drawn from single-visit data alone. We present an empirical analysis across two patient groups-Cognitively Normal (CN) and Mild Cognitive Impairment (MCI)-over a span of five follow-up years. Our findings reveal that models incorporating more extended patient histories can outperform those relying solely on present information, suggesting a deeper historical context is critical in enhancing predictive accuracy for future AD progression. Our results support the incorporation of longitudinal data in clinical settings to enhance the early detection and monitoring of AD. Our code is available at \url{https://github.com/batuhankmkaraman/LongForMAD}.

Updated: 2024-05-27 16:55:48

标题: 评估纵向数据在阿尔茨海默病预测中的重要性

摘要: 在这项研究中,我们采用Transformer编码器模型来刻画纵向患者数据对预测阿尔茨海默病(AD)进展的重要性。我们的模型——阿尔茨海默病纵向预测模型(LongForMAD)——利用包含多模态数据的患者就诊序列中蕴含的完整时间信息,提供了比单次就诊数据更深入的疾病进展理解。我们在五年的随访期内对两组患者——认知正常(CN)和轻度认知障碍(MCI)——进行了实证分析。我们的研究结果显示,纳入更长患者病史的模型可以胜过仅依赖当前信息的模型,这表明更深入的历史背景对于提高未来AD进展的预测准确性至关重要。我们的结果支持在临床环境中纳入纵向数据,以加强AD的早期检测和监测。我们的代码可在\url{https://github.com/batuhankmkaraman/LongForMAD}获取。

更新时间: 2024-05-27 16:55:48

领域: cs.LG

下载: http://arxiv.org/abs/2405.17352v1

Why Transformers Need Adam: A Hessian Perspective

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear. In this work, we provide an explanation of SGD's bad performance on Transformers through the lens of Hessian: (i) Transformers are "heterogeneous": the Hessian spectrum across parameter blocks vary dramatically, a phenomenon we call "block heterogeneity"; (ii) Heterogeneity hampers SGD: SGD performs badly on problems with block heterogeneity. To validate that heterogeneity hampers SGD, we check various Transformers, CNNs, MLPs, and quadratic problems, and find that SGD works well on problems without block heterogeneity but performs badly when the heterogeneity exists. Our initial theoretical analysis indicates that SGD performs poorly because it applies one single learning rate to all blocks, which cannot handle the heterogeneity among blocks. This limitation could be ameliorated if we use coordinate-wise learning rates, as designed in Adam.
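
A toy numerical illustration of the block-heterogeneity argument: on a quadratic whose two "blocks" have curvatures differing by five orders of magnitude (our stand-in assumption for the disparate Hessian spectra across Transformer parameter blocks), a single SGD step size cannot serve both blocks, while Adam's coordinate-wise rates can.

```python
import numpy as np

# two parameter "blocks" with wildly different curvature (block heterogeneity)
H = np.diag([1000.0, 1000.0, 0.01, 0.01])
grad = lambda w: H @ w

def sgd(w, lr=1.9e-3, steps=2000):
    # the single step size must respect the stiffest block (lr < 2/1000),
    # so the flat block barely moves in 2000 steps
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def adam(w, lr=1e-2, steps=2000, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        mh, vh = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
        w = w - lr * mh / (np.sqrt(vh) + eps)           # coordinate-wise rates
    return w

w0 = np.ones(4)
print("SGD  distance to optimum:", np.linalg.norm(sgd(w0.copy())))   # ~1.4
print("Adam distance to optimum:", np.linalg.norm(adam(w0.copy())))  # ~0.0x
```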

Updated: 2024-05-27 16:53:18

标题: 为什么Transformer需要Adam:一个Hessian视角

摘要: SGD在Transformer上的表现明显差于Adam,但原因尚不清楚。在这项工作中,我们通过Hessian的视角解释SGD在Transformer上表现不佳的原因:(i) Transformer是“异质的”:不同参数块之间的Hessian谱差异巨大,我们将这种现象称为“块异质性”;(ii) 异质性阻碍SGD:SGD在存在块异质性的问题上表现不佳。为验证异质性对SGD的影响,我们检查了各种Transformer、CNN、MLP和二次问题,发现SGD在没有块异质性的问题上表现良好,但在存在异质性时表现不佳。我们的初步理论分析表明,SGD表现不佳是因为它对所有参数块使用同一个学习率,无法处理块之间的异质性。如果像Adam那样使用坐标级学习率,这一限制有望得到缓解。

更新时间: 2024-05-27 16:53:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.16788v2

O$n$ Learning Deep O($n$)-Equivariant Hyperspheres

In this paper, we utilize hyperspheres and regular $n$-simplexes and propose an approach to learning deep features equivariant under the transformations of $n$D reflections and rotations, encompassed by the powerful group of O$(n)$. Namely, we propose O$(n)$-equivariant neurons with spherical decision surfaces that generalize to any dimension $n$, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which as we show, turns out to be a Gram matrix. Using synthetic and real-world data in $n$D, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods for O$(n)$-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off. The code is available at https://github.com/pavlo-melnyk/equivariant-hyperspheres.
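
The paper's invariant operator turns out to be a Gram matrix, and Gram matrices are O(n)-invariant by construction; the short snippet below checks this numerically for a random orthogonal transform.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 8
X = rng.standard_normal((k, n))                   # k points in R^n
R, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
gram = lambda A: A @ A.T                          # pairwise inner products
# (X R^T)(X R^T)^T = X R^T R X^T = X X^T, so the Gram matrix is unchanged
print(np.allclose(gram(X), gram(X @ R.T)))        # True: O(n)-invariant
```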

Updated: 2024-05-27 16:50:10

标题: 学习深度O(n)-等变超球面

摘要: 在本文中,我们利用超球体和正则的$n$-单纯形,并提出了一种学习深度特征的方法,该方法在$n$维反射和旋转变换下具有等变性,包含在强大的O$(n)$群中。具体来说,我们提出了具有球形决策面的O$(n)$-等变神经元,可推广到任意维度$n$,我们称之为深度等变超球体。我们演示了如何将它们组合在一个直接基于输入点运行的网络中,并提出了一个基于两点和一个球之间关系的不变算子,正如我们所展示的,这个算子实际上是一个Gram矩阵。使用$n$维合成和真实世界数据,我们实验验证了我们的理论贡献,并发现我们的方法优于竞争方法对于O$(n)$-等变基准数据集(分类和回归),展示出有利的速度/性能折衷。代码可在https://github.com/pavlo-melnyk/equivariant-hyperspheres找到。

更新时间: 2024-05-27 16:50:10

领域: cs.LG

下载: http://arxiv.org/abs/2305.15613v7

Prompt Optimization with Human Feedback

Large language models (LLMs) have demonstrated remarkable performances in various tasks. However, the performance of LLMs heavily depends on the input prompt, which has given rise to a number of recent works on prompt optimization. However, previous works often require the availability of a numeric score to assess the quality of every prompt. Unfortunately, when a human user interacts with a black-box LLM, attaining such a score is often infeasible and unreliable. Instead, it is usually significantly easier and more reliable to obtain preference feedback from a human user, i.e., showing the user the responses generated from a pair of prompts and asking the user which one is preferred. Therefore, in this paper, we study the problem of prompt optimization with human feedback (POHF), in which we aim to optimize the prompt for a black-box LLM using only human preference feedback. Drawing inspiration from dueling bandits, we design a theoretically principled strategy to select a pair of prompts to query for preference feedback in every iteration, and hence introduce our algorithm named automated POHF (APOHF). We apply our APOHF algorithm to various tasks, including optimizing user instructions, prompt optimization for text-to-image generative models, and response optimization with human feedback (i.e., further refining the response using a variant of our APOHF). The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances. Our code can be found at \url{https://github.com/xqlin98/APOHF}.
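
Below is a toy sketch of preference-only prompt selection in the spirit described above; the Bradley-Terry score updates and the simple exploit/explore pair choice are our simplifying assumptions, not APOHF's principled dueling-bandit strategy, and the simulated user is synthetic.

```python
import numpy as np

def preference_bandit(n_prompts, prefer, rounds=200, lr=0.5, seed=0):
    """Keep one score per prompt, query the user on a pair each round, and
    update scores with a Bradley-Terry logistic step."""
    rng = np.random.default_rng(seed)
    s = np.zeros(n_prompts)
    for _ in range(rounds):
        i = int(np.argmax(s + 0.1 * rng.standard_normal(n_prompts)))  # exploit
        j = int(rng.integers(n_prompts))                               # explore
        if i == j:
            continue
        win = prefer(i, j)                     # 1 if the user prefers prompt i
        p = 1 / (1 + np.exp(-(s[i] - s[j])))   # predicted win probability
        s[i] += lr * (win - p); s[j] -= lr * (win - p)
    return int(np.argmax(s))

# simulated user: latent prompt qualities, noisy pairwise preferences (assumption)
q = np.array([0.1, 0.9, 0.4, 0.6])
rng = np.random.default_rng(1)
prefer = lambda i, j: int(rng.random() < 1 / (1 + np.exp(-(q[i] - q[j]))))
print("best prompt index:", preference_bandit(len(q), prefer))  # usually 1
```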

Updated: 2024-05-27 16:49:29

标题: 基于人类反馈的提示优化

摘要: 大型语言模型(LLM)在各种任务中表现出色。然而,LLM的性能很大程度上取决于输入提示,这催生了近期一系列关于提示优化的研究。然而,先前的研究通常要求每个提示都有可用的数值评分来评估其质量。不幸的是,当人类用户与黑盒LLM交互时,获得这样的评分往往不可行且不可靠。相反,从人类用户那里获取偏好反馈通常更容易、更可靠,即向用户展示由一对提示生成的回复,并询问用户更偏好哪一个。因此,在本文中,我们研究利用人类反馈进行提示优化的问题(POHF),即仅使用人类偏好反馈为黑盒LLM优化提示。受对决赌博机(dueling bandits)的启发,我们设计了一种有理论依据的策略,在每次迭代中选择一对提示来查询偏好反馈,由此提出了名为自动化POHF(APOHF)的算法。我们将APOHF算法应用于多种任务,包括优化用户指令、文本到图像生成模型的提示优化,以及利用人类反馈的回复优化(即使用APOHF的一个变体进一步改进回复)。结果表明,APOHF只需少量偏好反馈实例即可高效地找到好的提示。我们的代码可在 \url{https://github.com/xqlin98/APOHF} 找到。

更新时间: 2024-05-27 16:49:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17346v1

Exploring and steering the moral compass of Large Language Models

Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors, raising significant ethical questions. This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles. We subjected several state-of-the-art models to a selection of ethical dilemmas and found that all the proprietary ones are mostly utilitarian and all of the open-weights ones align mostly with values-based ethics. Furthermore, when using the Moral Foundations Questionnaire, all models we probed - except for Llama 2- displayed a strong liberal bias. Lastly, in order to causally intervene in one of the studied models, we propose a novel similarity-specific activation steering technique. Using this method, we were able to reliably steer the model's moral compass to different ethical schools. All of these results showcase that there is an ethical dimension in already deployed LLMs, an aspect that is generally overlooked.

Updated: 2024-05-27 16:49:22

标题: 探索和引导大型语言模型的道德指南针

摘要: 大型语言模型(LLM)已成为推动各个领域自动化和决策的核心,引发了重要的伦理问题。本研究对最先进的LLM进行了全面的比较分析,以评估它们的道德特征。我们让多个最先进的模型面对一系列伦理困境,发现所有专有模型大多倾向于功利主义,而所有开放权重模型大多与基于价值的伦理相一致。此外,在使用道德基础问卷时,我们调查的所有模型——除了Llama 2——都表现出强烈的自由主义倾向。最后,为了对所研究的模型之一进行因果干预,我们提出了一种新颖的相似性特定激活引导技术。使用这种方法,我们能够可靠地将模型的道德指南针引向不同的伦理学派。所有这些结果表明,已经部署的LLM中存在一个伦理维度,而这一方面通常被忽视。

更新时间: 2024-05-27 16:49:22

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.17345v1

Policy Space Response Oracles: A Survey

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.
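
To fix ideas, here is a self-contained sketch of the PSRO loop on rock-paper-scissors: a meta-solver (fictitious play here, one of several common choices) solves the restricted game over the current strategy populations, and an exact best-response oracle stands in for what is usually an RL training run in large games.

```python
import numpy as np

RPS = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # row payoff

def fictitious_play(M, steps=2000):
    """Approximate Nash of a zero-sum restricted game (the meta-solver)."""
    cr, cc = np.ones(M.shape[0]), np.ones(M.shape[1])
    for _ in range(steps):
        cr[np.argmax(M @ (cc / cc.sum()))] += 1
        cc[np.argmax(-(cr / cr.sum()) @ M)] += 1
    return cr / cr.sum(), cc / cc.sum()

def psro(iters=3):
    pops = [[0], [0]]                              # both players start with "rock"
    for _ in range(iters):
        M = RPS[np.ix_(pops[0], pops[1])]          # restricted payoff matrix
        meta_r, meta_c = fictitious_play(M)        # meta-strategies
        # best-response oracle: exact argmax over all actions (an RL run in practice)
        br_r = int(np.argmax(RPS[:, pops[1]] @ meta_c))
        br_c = int(np.argmax(-(meta_r @ RPS[pops[0], :])))
        for pop, br in ((pops[0], br_r), (pops[1], br_c)):
            if br not in pop:
                pop.append(br)
        print("populations:", pops)
    return pops

psro()  # populations grow: rock -> +paper -> +scissors
```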

Updated: 2024-05-27 16:49:18

标题: 策略空间响应预言机:综述

摘要: 博弈论提供了一种研究多个决策者之间互动的数学方法。然而,由于策略数量庞大,经典博弈论分析的可扩展性有限,难以直接应用于更复杂的场景。本综述全面介绍了一种用于大型博弈的框架,称为策略空间响应预言机(PSRO),该框架通过将注意力集中在足够的策略子集上,有望提高可扩展性。我们首先阐述PSRO的动机并提供历史背景。然后,我们聚焦于PSRO的策略探索问题:如何以最小的计算成本组装仍能很好代表原始博弈的有效策略子集。我们综述了提高PSRO效率的当前研究方向,并探讨了PSRO在各个领域的应用。最后,我们讨论了开放问题和未来研究方向。

更新时间: 2024-05-27 16:49:18

领域: cs.GT,cs.AI,cs.MA

下载: http://arxiv.org/abs/2403.02227v2

Open Ad Hoc Teamwork with Cooperative Game Theory

Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents and effectively address open teams, named graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the joint Q-value representation from the perspective of cooperative game theory, and validate its learning paradigm in open team settings. Building on our theory, we propose a novel algorithm named CIAO compatible with GPL framework, with additional provable implementation tricks that can facilitate learning. The demo of experiments is available on https://sites.google.com/view/ciao2024, and the code of experiments is published on https://github.com/hsvgbkhgbv/CIAO.

Updated: 2024-05-27 16:48:35

标题: 基于合作博弈论的开放式Ad Hoc团队合作

摘要: Ad hoc团队合作是一个具有挑战性的问题,需要设计一个代理人与队友进行协作,而无需事先协调或联合培训。开放式ad hoc团队进一步复杂化了这个挑战,因为它考虑了具有不断变化的队友数量的环境,被称为开放式团队。解决这个问题的一个有希望的方法是利用图神经网络的泛化能力,以处理不受限制的代理数量并有效地解决开放式团队,称为基于图的策略学习(GPL)。然而,它在协调图上的联合Q值表示缺乏令人信服的解释。在本文中,我们建立了一个新理论,从合作博弈理论的角度理解联合Q值表示,并验证了其在开放团队环境中的学习范式。基于我们的理论,我们提出了一种名为CIAO的新算法,与GPL框架兼容,具有额外的可证明的实现技巧,可以促进学习。实验演示可在https://sites.google.com/view/ciao2024上查看,实验代码已发布在https://github.com/hsvgbkhgbv/CIAO。

更新时间: 2024-05-27 16:48:35

领域: cs.MA,cs.LG

下载: http://arxiv.org/abs/2402.15259v2

Physics-Informed Real NVP for Satellite Power System Fault Detection

The unique challenges posed by the space environment, characterized by extreme conditions and limited accessibility, raise the need for robust and reliable techniques to identify and prevent satellite faults. Fault detection methods in the space sector are required to ensure mission success and to protect valuable assets. In this context, this paper proposes an Artificial Intelligence (AI) based fault detection methodology and evaluates its performance on ADAPT (Advanced Diagnostics and Prognostics Testbed), an Electrical Power System (EPS) dataset, crafted in laboratory by NASA. Our study focuses on the application of a physics-informed (PI) real-valued non-volume preserving (Real NVP) model for fault detection in space systems. The efficacy of this method is systematically compared against other AI approaches such as Gated Recurrent Unit (GRU) and Autoencoder-based techniques. Results show that our physics-informed approach outperforms existing methods of fault detection, demonstrating its suitability for addressing the unique challenges of satellite EPS sub-system faults. Furthermore, we unveil the competitive advantage of physics-informed loss in AI models to address specific space needs, namely robustness, reliability, and power constraints, crucial for space exploration and satellite missions.

Updated: 2024-05-27 16:42:51

标题: 用于卫星电力系统故障检测的物理信息Real NVP

摘要: 空间环境以极端条件和有限的可达性为特征,其带来的独特挑战要求采用强大而可靠的技术来识别和预防卫星故障。航天领域的故障检测方法必须确保任务成功并保护宝贵资产。在此背景下,本文提出了一种基于人工智能(AI)的故障检测方法,并在ADAPT(先进诊断与预测试验台)上评估其性能,这是NASA在实验室中构建的一个电力系统(EPS)数据集。我们的研究重点是应用物理信息(PI)实值非体积保持(Real NVP)模型在空间系统中进行故障检测。我们系统地将该方法的有效性与其他AI方法(如门控循环单元(GRU)和基于自编码器的技术)进行了比较。结果显示,我们的物理信息方法优于现有的故障检测方法,表明它适合应对卫星EPS子系统故障的独特挑战。此外,我们揭示了物理信息损失在AI模型中的竞争优势,可满足特定的空间需求,即鲁棒性、可靠性和功率约束,这些对太空探索和卫星任务至关重要。

更新时间: 2024-05-27 16:42:51

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2405.17339v1

Test-Time Adaptation for Depth Completion

It is common to observe performance degradation when transferring models trained on some (source) datasets to target testing data due to a domain gap between them. Existing methods for bridging this gap, such as domain adaptation (DA), may require the source data on which the model was trained (often not available), while others, i.e., source-free DA, require many passes through the testing data. We propose an online test-time adaptation method for depth completion, the task of inferring a dense depth map from a single image and associated sparse depth map, that closes the performance gap in a single pass. We first present a study on how the domain shift in each data modality affects model performance. Based on our observations that the sparse depth modality exhibits a much smaller covariate shift than the image, we design an embedding module trained in the source domain that preserves a mapping from features encoding only sparse depth to those encoding image and sparse depth. During test time, sparse depth features are projected using this map as a proxy for source domain features and are used as guidance to train a set of auxiliary parameters (i.e., adaptation layer) to align image and sparse depth features from the target test domain to that of the source domain. We evaluate our method on indoor and outdoor scenarios and show that it improves over baselines by an average of 21.1%.

Updated: 2024-05-27 16:39:45

标题: 深度完成的测试时间适应

摘要: 在将在某些(源)数据集上训练的模型转移到目标测试数据时,由于它们之间存在领域差距,通常会观察到性能下降。现有的用于弥合这一差距的方法,如领域自适应(DA),可能需要模型训练时的源数据(通常不可用),而其他方法,即无源DA,需要多次通过测试数据。我们提出了一种用于深度完成的在线测试时间适应方法,即从单个图像和相关的稀疏深度图中推断出密集深度图的任务,可以在单次通过中缩小性能差距。我们首先对每个数据模态中的领域转移如何影响模型性能进行了研究。基于我们的观察结果,稀疏深度模态表现出比图像更小的协变量转移,我们设计了一个在源域中训练的嵌入模块,该模块保留了仅编码稀疏深度的特征到编码图像和稀疏深度的特征的映射。在测试时间,使用这个映射将稀疏深度特征投影为源域特征的代理,并将其用作指导来训练一组辅助参数(即适应层),以将来自目标测试域的图像和稀疏深度特征对齐到源域的特征。我们在室内和室外场景上评估了我们的方法,并显示它相对于基线的平均改进为21.1%。

更新时间: 2024-05-27 16:39:45

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.03312v4

Cost-efficient Knowledge-based Question Answering with Large Language Models

Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge. Large language models (LLMs) bring opportunities to KBQA, while their costs are significantly higher and absence of domain-specific knowledge during pre-training. We are motivated to combine LLMs and prior small models on knowledge graphs (KGMs) for both inferential accuracy and cost saving. However, it remains challenging since accuracy and cost are not readily combined in the optimization as two distinct metrics. It is also laborious for model selection since different models excel in diverse knowledge. To this end, we propose Coke, a novel cost-efficient strategy for KBQA with LLMs, modeled as a tailored multi-armed bandit problem to minimize calls to LLMs within limited budgets. We first formulate the accuracy expectation with a cluster-level Thompson Sampling for either KGMs or LLMs. A context-aware policy is optimized to further distinguish the expert model subject to the question semantics. The overall decision is bounded by the cost regret according to historical expenditure on failures. Extensive experiments showcase the superior performance of Coke, which moves the Pareto frontier with up to 20.89% saving of GPT-4 fees while achieving a 2.74% higher accuracy on the benchmark datasets.
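
A toy sketch of the cost-aware model routing idea: Beta-posterior Thompson sampling decides per question between a cheap KGM and a costly LLM under a budget. Coke's actual cluster-level sampling, context-aware policy, and cost-regret bound are all simplified away here, and the model accuracies are simulated assumptions.

```python
import numpy as np

def route_questions(questions, kgm, llm, llm_cost=1.0, budget=50.0, seed=0):
    """Thompson-sampling router: Beta posteriors track each model's accuracy;
    the LLM is only callable while budget remains."""
    rng = np.random.default_rng(seed)
    a = {"kgm": 1.0, "llm": 1.0}; b = {"kgm": 1.0, "llm": 1.0}  # Beta(1,1) priors
    spent, correct_log = 0.0, []
    for q in questions:
        theta = {m: rng.beta(a[m], b[m]) for m in a}            # posterior draws
        use_llm = theta["llm"] > theta["kgm"] and spent + llm_cost <= budget
        model = "llm" if use_llm else "kgm"
        correct = (llm if use_llm else kgm)(q)   # 1 if answered correctly
        spent += llm_cost if use_llm else 0.0
        a[model] += correct; b[model] += 1 - correct             # posterior update
        correct_log.append(correct)
    return np.mean(correct_log), spent

rng = np.random.default_rng(1)
kgm = lambda q: int(rng.random() < 0.60)   # simulated accuracies (assumption)
llm = lambda q: int(rng.random() < 0.85)
print(route_questions(range(200), kgm, llm))   # (accuracy, LLM spend)
```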

Updated: 2024-05-27 16:37:34

标题: 基于大型语言模型的高成本效益知识问答

摘要: 基于知识的问答(KBQA)广泛应用于许多需要领域知识的场景。大型语言模型(LLM)为KBQA带来了机遇,但其成本显著更高,且在预训练期间缺乏领域特定知识。我们希望将LLM与先前基于知识图谱的小模型(KGM)结合起来,兼顾推理准确性与成本节约。然而,这仍然具有挑战性,因为准确性和成本是两个不同的指标,难以在优化中直接结合。模型选择也很费力,因为不同模型擅长不同的知识。为此,我们提出了Coke,一种新颖的、具有成本效益的基于LLM的KBQA策略,将其建模为一个定制的多臂老虎机问题,以在有限预算内最小化对LLM的调用。我们首先用聚类级别的汤普森采样为KGM或LLM构建准确性期望。进而优化一个上下文感知策略,以根据问题语义进一步甄别专家模型。整体决策受基于历史失败支出的成本遗憾约束。大量实验展示了Coke的卓越性能,它推动了帕累托前沿,最多节省20.89%的GPT-4费用,同时在基准数据集上实现了2.74%的更高准确率。

更新时间: 2024-05-27 16:37:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17337v1

LoRA Training in the NTK Regime has No Spurious Local Minima

Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLM), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
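
For reference, a minimal sketch of the LoRA parameterization the analysis concerns: a frozen weight plus a trainable rank-r update, with the usual zero initialization of B so fine-tuning starts from the pre-trained function; dimensions and scaling below are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) B A."""
    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen
        self.A = rng.standard_normal((r, d_in)) * 0.01               # trainable
        self.B = np.zeros((d_out, r))      # zero init: update starts at zero
        self.scale = alpha / r

    def __call__(self, x):                 # x: (batch, d_in)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=32, d_out=32)
print(layer(np.random.randn(4, 32)).shape)  # (4, 32); only A, B get gradients
```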

Updated: 2024-05-27 16:35:26

标题: NTK机制下的LoRA训练没有虚假局部极小值

摘要: 低秩适应(LoRA)已成为大型语言模型(LLM)参数高效微调的标准方法,但我们对LoRA的理论理解仍然有限。在这项工作中,我们在具有$N$个数据点的神经切线核(NTK)机制下对LoRA微调进行理论分析,结果表明:(i)完全微调(不使用LoRA)存在秩为$r\lesssim \sqrt{N}$的低秩解;(ii)使用秩$r\gtrsim \sqrt{N}$的LoRA可以消除虚假局部极小值,使梯度下降能够找到低秩解;(iii)用LoRA找到的低秩解具有良好的泛化能力。

更新时间: 2024-05-27 16:35:26

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2402.11867v2

Conditioning on Time is All You Need for Synthetic Survival Data Generation

Synthetic data generation holds considerable promise, offering avenues to enhance privacy, fairness, and data accessibility. Despite the availability of various methods for generating synthetic tabular data, challenges persist, particularly in specialized applications such as survival analysis. One significant obstacle in survival data generation is censoring, which manifests as not knowing the precise timing of observed (target) events for certain instances. Existing methods face difficulties in accurately reproducing the real distribution of event times for both observed (uncensored) events and censored events, i.e., the generated event-time distributions do not accurately match the underlying distributions of the real data. So motivated, we propose a simple paradigm to produce synthetic survival data by generating covariates conditioned on event times (and censoring indicators), thus allowing one to reuse existing conditional generative models for tabular data without significant computational overhead, and without making assumptions about the (usually unknown) generation mechanism underlying censoring. We evaluate this method via extensive experiments on real-world datasets. Our methodology outperforms multiple competitive baselines at generating survival data, while improving the performance of downstream survival models trained on it and tested on real data.
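
A minimal sketch of the proposed ordering: sample event times and censoring indicators first, then draw covariates from a conditional generator. The exponential time marginal and the linear Gaussian "generator" below are placeholder assumptions standing in for the real data's empirical marginals and for any conditional tabular generative model.

```python
import numpy as np

def generate_synthetic_survival(n=1000, censor_rate=0.3, seed=0):
    """Generate covariates conditioned on (time, censoring indicator)."""
    rng = np.random.default_rng(seed)
    t = rng.exponential(scale=5.0, size=n)             # event / censoring times
    delta = (rng.random(n) > censor_rate).astype(int)  # 1 = event observed
    cond = np.stack([t, delta], axis=1)
    # placeholder conditional generator: x | (t, delta) ~ N(W c, I);
    # in practice this would be a learned conditional tabular model
    W = rng.standard_normal((4, 2)) * 0.3
    x = cond @ W.T + rng.standard_normal((n, 4))
    return x, t, delta

x, t, d = generate_synthetic_survival()
print(x.shape, t.shape, d.mean())   # covariates conditioned on (time, event)
```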

Updated: 2024-05-27 16:34:18

标题: 时间上的条件对合成生存数据生成至关重要

摘要: 合成数据生成具有相当大的潜力,提供了增强隐私性、公平性和数据可访问性的途径。尽管存在各种方法用于生成合成表格数据,但挑战仍然存在,特别是在专业应用程序中,如生存分析。生存数据生成中的一个重要障碍是截尾,表现为不知道某些实例的观察(目标)事件的精确时间。现有方法在准确重现观察(未截尾)事件和截尾事件的实际事件时间分布方面面临困难,即生成的事件时间分布不能准确匹配实际数据的基础分布。因此,我们提出了一个简单的范例,通过在事件时间(和截尾指标)上生成协变量来产生合成生存数据,从而可以重复使用现有的条件生成模型用于表格数据,而无需进行重大计算开销,并且不需要对截尾的生成机制(通常未知)进行假设。我们通过对真实数据集进行大量实验来评估这种方法。我们的方法在生成生存数据方面优于多个竞争基线,同时提高了在其上训练的下游生存模型在真实数据上的性能。

更新时间: 2024-05-27 16:34:18

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.17333v1

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT dialogues, as evidenced by Vicuna. However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called `Socratic'. The experimental results show our response model, `PlatoLM', achieves SoTA performance among LLaMA-based 7B models in MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which can teach the response model better than previous works in multi-round conversations.

Updated: 2024-05-27 16:32:16

标题: PlatoLM:通过用户模拟器在多轮对话中教授LLMs

摘要: 闭源ChatGPT无与伦比的性能激发了对其民主化的努力,其中利用真实用户与ChatGPT对话的做法取得了显著进展,Vicuna就是例证。然而,由于收集有人类参与的对话存在挑战,Baize和UltraChat等现有工作依赖ChatGPT按照指令进行角色扮演来模拟人类,导致对种子的过度依赖、拟人程度下降、话题多样性受限,并且缺乏真实的多轮对话动态。为解决上述问题,我们提出了一种更好地模拟人类行为的范式,并探索在多轮对话中引入更多类人问题的好处。具体而言,我们直接以从真实人机对话中提取的人类问题作为学习目标,并提供了一个名为“Socratic”的新型用户模拟器。实验结果显示,我们的响应模型“PlatoLM”在MT-Bench中取得了基于LLaMA的7B模型中的SoTA性能。我们的研究结果进一步表明,我们的方法引入了高度类人的提问模式和丰富的话题结构,从而在多轮对话中比以往工作更好地训练响应模型。

更新时间: 2024-05-27 16:32:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2308.11534v5

Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails

Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that the gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient have infinite variance, which may lead to excessive clipping loss to the gradients with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping~(DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds for body and tail gradients to reduce the clipping loss. Under the non-convex condition, DC-DPSGD reduces the empirical gradient norm from {${\mathbb{O}\left(\log^{\max(0,\theta-1)}(T/\delta)\log^{2\theta}(\sqrt{T})\right)}$} to {${\mathbb{O}\left(\log(\sqrt{T})\right)}$} with heavy-tailed index $\theta\geq 1/2$, iterations $T$, and arbitrary probability $\delta$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72\% in terms of accuracy.
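
A heavily simplified sketch of the discriminative clipping idea as we read it: per-sample gradients are split into body and tail and clipped with separate thresholds before calibrated noise is added. The norm-quantile split and the noise calibration below are our simplifying assumptions; the paper identifies tail gradients via subspace identification.

```python
import numpy as np

def dc_dpsgd_step(per_sample_grads, c_body=1.0, c_tail=4.0, q=0.9,
                  sigma=1.0, rng=None):
    """Clip 'body' and 'tail' per-sample gradients with different thresholds,
    then add Gaussian noise scaled to the larger threshold (assumption)."""
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_sample_grads, axis=1)
    cutoff = np.quantile(norms, q)                # crude body/tail split
    clipped = []
    for g, n in zip(per_sample_grads, norms):
        c = c_tail if n > cutoff else c_body      # separate clipping thresholds
        clipped.append(g * min(1.0, c / (n + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = sigma * max(c_body, c_tail) * rng.standard_normal(total.shape)
    return (total + noise) / len(per_sample_grads)

grads = np.random.standard_t(df=2, size=(128, 10))  # heavy-tailed toy gradients
print(dc_dpsgd_step(grads).shape)                   # (10,) noisy averaged step
```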

Updated: 2024-05-27 16:30:11

标题: 分开裁剪主体和尾部:具有重尾的DPSGD的高概率保证

摘要: 差分隐私随机梯度下降(DPSGD)被广泛应用于深度学习中以保护训练数据的隐私:它首先将梯度裁剪到预定义的范数,然后在训练过程中注入校准的噪声。现有的DPSGD工作通常假设梯度服从次高斯分布,并设计各种裁剪机制来优化训练性能。然而,最近的研究表明,深度学习中的梯度表现出重尾现象,即梯度的尾部具有无限方差,这可能导致现有DPSGD机制对梯度产生过度的裁剪损失。为了解决这个问题,我们提出了一种新方法——判别裁剪(DC)-DPSGD,包含两个关键设计。首先,我们引入一种子空间识别技术来区分主体梯度和尾部梯度。其次,我们提出一种判别裁剪机制,对主体和尾部梯度应用不同的裁剪阈值,以减少裁剪损失。在非凸条件下,DC-DPSGD将经验梯度范数从 {${\mathbb{O}\left(\log^{\max(0,\theta-1)}(T/\delta)\log^{2\theta}(\sqrt{T})\right)}$} 降低到 {${\mathbb{O}\left(\log(\sqrt{T})\right)}$},其中重尾指数 $\theta\geq 1/2$,迭代次数为 $T$,概率为任意 $\delta$。在四个真实数据集上的大量实验证明,我们的方法在准确率方面比三个基线最多高出9.72\%。

更新时间: 2024-05-27 16:30:11

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2405.17529v1

Double Correction Framework for Denoising Recommendation

As its availability and generality in online services, implicit feedback is more commonly used in recommender systems. However, implicit feedback usually presents noisy samples in real-world recommendation scenarios (such as misclicks or non-preferential behaviors), which will affect precise user preference learning. To overcome the noisy samples problem, a popular solution is based on dropping noisy samples in the model training phase, which follows the observation that noisy samples have higher training losses than clean samples. Despite the effectiveness, we argue that this solution still has limits. (1) High training losses can result from model optimization instability or hard samples, not just noisy samples. (2) Completely dropping of noisy samples will aggravate the data sparsity, which lacks full data exploitation. To tackle the above limitations, we propose a Double Correction Framework for Denoising Recommendation (DCF), which contains two correction components from views of more precise sample dropping and avoiding more sparse data. In the sample dropping correction component, we use the loss value of the samples over time to determine whether it is noise or not, increasing dropping stability. Instead of averaging directly, we use the damping function to reduce the bias effect of outliers. Furthermore, due to the higher variance exhibited by hard samples, we derive a lower bound for the loss through concentration inequality to identify and reuse hard samples. In progressive label correction, we iteratively re-label highly deterministic noisy samples and retrain them to further improve performance. Finally, extensive experimental results on three datasets and four backbones demonstrate the effectiveness and generalization of our proposed framework.

Updated: 2024-05-27 16:29:40

标题: 双重校正框架用于去噪推荐

摘要: 由于隐式反馈在在线服务中的可得性和通用性,它在推荐系统中被更普遍地使用。然而,在现实世界的推荐场景中,隐式反馈通常包含噪声样本(例如误点击或非偏好行为),这会影响精确的用户偏好学习。为了克服噪声样本问题,一种流行的解决方案是在模型训练阶段丢弃噪声样本,其依据是噪声样本的训练损失高于干净样本。尽管有效,我们认为这种方案仍有局限。(1)高训练损失可能源于模型优化不稳定或困难样本,而不仅仅是噪声样本。(2)完全丢弃噪声样本会加剧数据稀疏性,无法充分利用数据。为了解决上述局限,我们提出了一个用于去噪推荐的双重校正框架(DCF),从更精确的样本丢弃和避免数据更加稀疏两个视角出发,包含两个校正组件。在样本丢弃校正组件中,我们使用样本随时间的损失值来判断其是否为噪声,从而提高丢弃的稳定性。我们不直接取平均,而是使用阻尼函数来减小离群值的偏差效应。此外,由于困难样本表现出更高的方差,我们通过集中不等式推导出损失的下界,以识别并重新利用困难样本。在渐进式标签校正中,我们迭代地为高确定性的噪声样本重新打标签并重新训练,以进一步提升性能。最后,在三个数据集和四种骨干模型上的大量实验结果证明了我们所提框架的有效性和泛化性。

更新时间: 2024-05-27 16:29:40

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2405.11272v2

Novel Approaches for ML-Assisted Particle Track Reconstruction and Hit Clustering

Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate the effectiveness of this data in guiding the development of optimal network architectures. Additionally, we investigate the application of image segmentation networks for this task, exploring their potential for accurate track reconstruction. Moreover, we approach the task from a different perspective by treating it as a hit sequence to track sequence translation problem. Specifically, we explore the utilisation of Transformer architectures for tracking purposes. Our preliminary findings are covered in detail. By considering this novel approach, we aim to uncover new insights and potential advancements in track reconstruction. This research sheds light on previously unexplored methods and provides valuable insights for the field of particle track reconstruction and hit clustering in HEP.

Updated: 2024-05-27 16:23:50

标题: 基于机器学习辅助的粒子轨迹重建和击中聚类的新方法

摘要: 轨道重建是高能物理学(HEP)中至关重要的一个方面,对主要实验起着关键作用。在这项研究中,我们深入探讨了粒子轨迹重建和击中聚类的未开发领域。首先,我们通过利用一个简化的模拟器(REDVID)生成专门用于简化的训练数据,增强了算法设计工作。我们展示了这些数据在指导最佳网络架构开发方面的有效性。此外,我们研究了图像分割网络在这一任务中的应用,探索了其对精确轨迹重建的潜力。此外,我们从不同的角度处理这项任务,将其视为一个从击中序列到轨道序列的翻译问题。具体而言,我们探讨了Transformer架构在跟踪目的上的利用。我们详细介绍了我们的初步发现。通过考虑这种新颖的方法,我们旨在揭示轨道重建中的新见解和潜在进展。这项研究揭示了以前未曾探索的方法,并为HEP中的粒子轨迹重建和击中聚类领域提供了宝贵的见解。

更新时间: 2024-05-27 16:23:50

领域: hep-ex,cs.LG

下载: http://arxiv.org/abs/2405.17325v1

Leveraging Offline Data in Linear Latent Bandits

Sequential decision-making domains such as recommender systems, healthcare and education often have unobserved heterogeneity in the population that can be modeled using latent bandits $-$ a framework where an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing a de Finetti theorem for decision processes, and show that $\textit{every}$ exchangeable and coherent stateless decision process is a latent bandit. The latent bandit framework lends itself particularly well to online learning with offline datasets, a problem of growing interest in sequential decision-making. One can leverage offline latent bandit data to learn a complex model for each latent state, so that an agent can simply learn the latent state online to act optimally. We focus on a linear model for a latent bandit with $d_A$-dimensional actions, where the latent states lie in an unknown $d_K$-dimensional subspace for $d_K \ll d_A$. We present SOLD, a novel principled method to learn this subspace from short offline trajectories with guarantees. We then provide two methods to leverage this subspace online: LOCAL-UCB and ProBALL-UCB. We demonstrate that LOCAL-UCB enjoys $\tilde O(\min(d_A\sqrt{T}, d_K\sqrt{T}(1+\sqrt{d_AT/d_KN})))$ regret guarantees, where the effective dimension is lower when the size $N$ of the offline dataset is larger. ProBALL-UCB enjoys a slightly weaker guarantee, but is more practical and computationally efficient. Finally, we establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens.

Updated: 2024-05-27 16:23:34

标题: 在线性潜在赌博机中利用离线数据

摘要: 顺序决策领域,如推荐系统、医疗保健和教育往往存在人群中无法观察到的异质性,可以使用潜在bandits模型进行建模 - 一个未观察到的潜在状态决定轨迹模型的框架。虽然潜在bandits框架很有吸引力,但其广泛性尚不清楚。我们首先通过建立一个关于决策过程的de Finetti定理来解决这个问题,并展示出每一个可交换且一致的无状态决策过程都是一个潜在bandit。潜在bandit框架特别适用于在线学习与离线数据集的问题,在顺序决策中越来越受关注。我们可以利用离线潜在bandit数据来学习每个潜在状态的复杂模型,这样代理就可以简单地在线学习潜在状态以实现最优行为。我们关注具有$d_A$维动作的潜在bandit的线性模型,其中潜在状态位于未知的$d_K$维子空间中,其中$d_K \ll d_A$。我们提出了一种新颖的方法SOLD,可以从短离线轨迹中学习这个子空间并提供保证。然后我们提供了两种在线利用这个子空间的方法:LOCAL-UCB和ProBALL-UCB。我们证明了LOCAL-UCB享有$\tilde O(\min(d_A\sqrt{T}, d_K\sqrt{T}(1+\sqrt{d_AT/d_KN})))$的遗憾保证,其中当离线数据集的大小$N$较大时,有效维数较低。ProBALL-UCB享有稍弱的保证,但更实用和计算效率更高。最后,我们通过实验证实了我们方法的有效性,包括对合成数据和来自MovieLens的真实电影推荐数据的实验。

更新时间: 2024-05-27 16:23:34

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.17324v1

Sharp Generalization of Transductive Learning: A Transductive Local Rademacher Complexity Approach

We introduce a new tool, Transductive Local Complexity (TLC), designed to analyze the generalization performance of transductive learning methods and inspire the development of new algorithms in this domain. Our work extends the concept of the popular Local Rademacher Complexity (LRC) to the transductive setting, incorporating significant and novel modifications compared to the typical analysis of LRC methods in the inductive setting. While LRC has been widely used as a powerful tool for analyzing inductive models, providing sharp generalization bounds for classification and minimax rates for nonparametric regression, it remains an open question whether a localized Rademacher complexity-based tool can be developed for transductive learning. Our goal is to achieve sharp bounds for transductive learning that align with the inductive excess risk bounds established by LRC. We provide a definitive answer to this open problem with the introduction of TLC. We construct TLC by first establishing a novel and sharp concentration inequality for the supremum of a test-train empirical process. Using a peeling strategy and a new surrogate variance operator, we derive a novel excess risk bound in the transductive setting which is consistent with the classical LRC-based excess risk bound in the inductive setting. As an application of TLC, we employ this new tool to analyze the Transductive Kernel Learning (TKL) model, deriving sharper excess risk bounds than those provided by the current state-of-the-art under the same assumptions. Additionally, the concentration inequality for the test-train process is employed to derive a sharp concentration inequality for the general supremum of empirical processes involving random variables in the setting of uniform sampling without replacement. The sharpness of our derived bound is compared to existing concentration inequalities under the same conditions.

Updated: 2024-05-27 16:23:03

标题: 直推式学习的尖锐泛化:一种直推式局部Rademacher复杂度方法

摘要: 我们介绍了一种新工具——直推式局部复杂度(TLC),旨在分析直推式学习方法的泛化性能,并启发该领域新算法的发展。我们的工作将流行的局部Rademacher复杂度(LRC)的概念扩展到直推式设定,与归纳设定中LRC方法的典型分析相比,做出了重大而新颖的修改。虽然LRC被广泛用作分析归纳模型的强大工具,为分类提供尖锐的泛化界,并为非参数回归提供极小极大速率,但能否为直推式学习开发一种基于局部化Rademacher复杂度的工具,一直是一个悬而未决的问题。我们的目标是获得与LRC所建立的归纳超额风险界相一致的直推式学习尖锐界。通过引入TLC,我们对这个开放问题给出了明确答案。我们首先为测试-训练经验过程的上确界建立了一个新颖而尖锐的集中不等式。利用剥离策略和一种新的替代方差算子,我们在直推式设定中导出了一个新颖的超额风险界,它与归纳设定中基于经典LRC的超额风险界相一致。作为TLC的一个应用,我们用这个新工具分析了直推式核学习(TKL)模型,在相同假设下导出了比当前最先进结果更尖锐的超额风险界。此外,测试-训练过程的集中不等式还被用来在无放回均匀采样设定下,为涉及随机变量的经验过程的一般上确界导出尖锐的集中不等式。我们将所得界的尖锐程度与相同条件下现有的集中不等式进行了比较。

更新时间: 2024-05-27 16:23:03

领域: stat.ML,cs.LG,62G08

下载: http://arxiv.org/abs/2309.16858v2

Geometry-Informed Neural Networks

Geometry is a ubiquitous language of computer graphics, design, and engineering. However, the lack of large shape datasets limits the application of state-of-the-art supervised learning methods and motivates the exploration of alternative learning strategies. To this end, we introduce geometry-informed neural networks (GINNs) to train shape generative models \emph{without any data}. GINNs combine (i) learning under constraints, (ii) neural fields as a suitable representation, and (iii) generating diverse solutions to under-determined problems. We apply GINNs to several two and three-dimensional problems of increasing levels of complexity. Our results demonstrate the feasibility of training shape generative models in a data-free setting. This new paradigm opens several exciting research directions, expanding the application of generative models into domains where data is sparse.

Updated: 2024-05-27 16:12:14

标题: 几何信息指导的神经网络

摘要: 几何是计算机图形,设计和工程的一种普遍语言。然而,缺乏大规模形状数据集限制了最先进的监督学习方法的应用,并激发了对替代学习策略的探索。为此,我们引入了几何信息神经网络(GINNs)来训练形状生成模型,而无需任何数据。GINNs结合了(i)在约束条件下学习,(ii)神经场作为合适的表示,以及(iii)生成多样的解决方案来解决欠定问题。我们将GINNs应用于增加复杂性的几个二维和三维问题。我们的结果证明了在无数据环境中训练形状生成模型的可行性。这种新的范式打开了几个激动人心的研究方向,将生成模型的应用拓展到数据稀缺的领域。

更新时间: 2024-05-27 16:12:14

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2402.14009v2

Probabilistic Graph Rewiring via Virtual Nodes

Message-passing graph neural networks (MPNNs) have emerged as a powerful paradigm for graph-based machine learning. Despite their effectiveness, MPNNs face challenges such as under-reaching and over-squashing, where limited receptive fields and structural bottlenecks hinder information flow in the graph. While graph transformers hold promise in addressing these issues, their scalability is limited due to quadratic complexity regarding the number of nodes, rendering them impractical for larger graphs. Here, we propose \emph{implicitly rewired message-passing neural networks} (IPR-MPNNs), a novel approach that integrates \emph{implicit} probabilistic graph rewiring into MPNNs. By introducing a small number of virtual nodes, i.e., adding additional nodes to a given graph and connecting them to existing nodes, in a differentiable, end-to-end manner, IPR-MPNNs enable long-distance message propagation, circumventing quadratic complexity. Theoretically, we demonstrate that IPR-MPNNs surpass the expressiveness of traditional MPNNs. Empirically, we validate our approach by showcasing its ability to mitigate under-reaching and over-squashing effects, achieving state-of-the-art performance across multiple graph datasets. Notably, IPR-MPNNs outperform graph transformers while maintaining significantly faster computational efficiency.
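
A minimal sketch of rewiring a graph with virtual nodes; IPR-MPNNs learn the connection probabilities end-to-end and differentiably, whereas the independent Bernoulli sampling and fixed probability below are simplifying assumptions.

```python
import numpy as np

def add_virtual_nodes(adj, n_virtual=2, p=0.5, rng=None):
    """Augment an adjacency matrix with virtual nodes whose connections to
    the original nodes are sampled independently with probability p."""
    rng = rng or np.random.default_rng(0)
    n = adj.shape[0]
    big = np.zeros((n + n_virtual, n + n_virtual), dtype=adj.dtype)
    big[:n, :n] = adj
    links = rng.random((n, n_virtual)) < p        # node-to-virtual-node edges
    big[:n, n:], big[n:, :n] = links, links.T     # keep the graph undirected
    return big

adj = (np.random.rand(6, 6) < 0.3).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                                 # symmetric, no self-loops
print(add_virtual_nodes(adj).shape)               # (8, 8): 6 real + 2 virtual
```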

Updated: 2024-05-27 16:11:49

标题: 概率图重连通过虚拟节点

摘要: 消息传递图神经网络(MPNNs)已经成为图形机器学习的一种强大范式。尽管它们有效,但MPNNs面临一些挑战,如传播范围有限和过度压缩,其中有限的接受域和结构瓶颈阻碍了图中的信息流动。虽然图变换器有望解决这些问题,但由于节点数量的二次复杂度,它们的可扩展性受到限制,这使它们在更大的图中不切实际。在这里,我们提出了一种新颖的方法,即\emph{隐式重连消息传递神经网络}(IPR-MPNNs),它将\emph{隐式}概率图重连集成到MPNNs中。通过以可微分的端到端方式引入少量虚拟节点,即向给定图中添加额外节点并将它们连接到现有节点,IPR-MPNNs实现了长距离消息传播,避免了二次复杂度。理论上,我们证明了IPR-MPNNs超越了传统MPNNs的表达能力。在实证方面,我们通过展示其减轻传播范围有限和过度压缩效应的能力来验证我们的方法,在多个图数据集上实现了最先进的性能。值得注意的是,IPR-MPNNs在保持显著更快的计算效率的同时,胜过了图变换器。

更新时间: 2024-05-27 16:11:49

领域: cs.LG

下载: http://arxiv.org/abs/2405.17311v1

Recurrent Early Exits for Federated Learning with Heterogeneous Clients

Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities; clients have differing compute and memory requirements. To tackle this challenge, recent state-of-the-art approaches leverage the use of early exits. Nonetheless, these approaches fall short of mitigating the challenges of joint learning multiple exit classifiers, often relying on hand-picked heuristic solutions for knowledge distillation among classifiers and/or utilizing additional layers for weaker classifiers. In this work, instead of utilizing multiple classifiers, we propose a recurrent early exit approach named ReeFL that fuses features from different sub-models into a single shared classifier. Specifically, we use a transformer-based early-exit module shared among sub-models to i) better exploit multi-layer feature representations for task-specific prediction and ii) modulate the feature representation of the backbone model for subsequent predictions. We additionally present a per-client self-distillation approach where the best sub-model is automatically selected as the teacher of the other sub-models at each client. Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works.

Updated: 2024-05-27 16:11:22

标题: 面向异构客户端联邦学习的循环早期退出

摘要: 联邦学习(FL)已经实现了以隐私保护方式跨多个客户端分布式学习模型。FL的主要挑战之一是适应具有不同硬件能力的客户端;客户端具有不同的计算和内存需求。为了应对这一挑战,最近一些最新的方法利用了早期退出。然而,这些方法未能有效缓解联合学习多个退出分类器的挑战,通常依赖于手工挑选的启发式解决方案进行分类器之间的知识蒸馏,和/或利用额外的层用于弱分类器。在这项工作中,我们提出了一种名为ReeFL的循环早期退出方法,该方法将来自不同子模型的特征融合到一个共享的分类器中。具体地,我们使用一个基于Transformer的早期退出模块在子模型之间共享,以更好地利用多层特征表示进行任务特定预测,并调节主干模型的特征表示以进行后续预测。我们另外提出了一种每个客户端自我蒸馏的方法,其中最佳子模型被自动选为每个客户端其他子模型的教师。我们在各种新兴的联邦微调基线上对标准图像和语音分类基准进行的实验表明,ReeFL比先前的作品更有效。

更新时间: 2024-05-27 16:11:22

领域: cs.LG,cs.CV,cs.DC

下载: http://arxiv.org/abs/2405.14791v2

Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding

Knowledge Graph Embedding (KGE), which projects entities and relations into continuous vector spaces, have garnered significant attention. Although high-dimensional KGE methods offer better performance, they come at the expense of significant computation and memory overheads. Decreasing embedding dimensions significantly deteriorates model performance. While several recent efforts utilize knowledge distillation or non-Euclidean representation learning to augment the effectiveness of low-dimensional KGE, they either necessitate a pre-trained high-dimensional teacher model or involve complex non-Euclidean operations, thereby incurring considerable additional computational costs. To address this, this work proposes Confidence-aware Self-Knowledge Distillation (CSD) that learns from model itself to enhance KGE in a low-dimensional space. Specifically, CSD extracts knowledge from embeddings in previous iterations, which would be utilized to supervise the learning of the model in the next iterations. Moreover, a specific semantic module is developed to filter reliable knowledge by estimating the confidence of previously learned embeddings. This straightforward strategy bypasses the need for time-consuming pre-training of teacher models and can be integrated into various KGE methods to improve their performance. Our comprehensive experiments on six KGE backbones and four datasets underscore the effectiveness of the proposed CSD.

Updated: 2024-05-27 16:11:10

标题: 自信感知的知识图嵌入自语义蒸馏

摘要: 知识图谱嵌入(KGE)将实体和关系投影到连续向量空间中,引起了广泛关注。尽管高维KGE方法提供了更好的性能,但这是以巨大的计算和内存开销为代价的。降低嵌入维度会显著降低模型性能。虽然最近的一些工作利用知识蒸馏或非欧几里德表示学习来增强低维KGE的有效性,但它们要么需要一个预训练的高维教师模型,要么涉及复杂的非欧几里德运算,从而产生相当大的额外计算成本。为了解决这个问题,本文提出了自信感知自知识蒸馏(CSD),该方法从模型自身学习,以增强低维空间中的KGE。具体而言,CSD从先前迭代中的嵌入中提取知识,用于监督模型在下一个迭代中的学习。此外,开发了一个特定的语义模块,通过估计先前学习的嵌入的置信度来过滤可靠的知识。这种简单的策略避免了对教师模型进行耗时的预训练,可以集成到各种KGE方法中以提高它们的性能。我们在六个KGE骨干和四个数据集上进行了全面的实验,强调了所提出的CSD的有效性。

更新时间: 2024-05-27 16:11:10

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2206.02963v2

Survey of Graph Neural Network for Internet of Things and NextG Networks

The exponential increase in Internet of Things (IoT) devices coupled with 6G pushing towards higher data rates and connected devices has sparked a surge in data. Consequently, harnessing the full potential of data-driven machine learning has become one of the important thrusts. In addition to the advancement in wireless technology, it is important to efficiently use the resources available and meet the users' requirements. Graph Neural Networks (GNNs) have emerged as a promising paradigm for effectively modeling and extracting insights from data that inherently exhibit complex network structures, owing to their high performance and accuracy, scalability, adaptability, and resource efficiency. There is a lack of a comprehensive survey that focuses on the applications and advances GNNs have made in the context of IoT and Next Generation (NextG) networks. To bridge that gap, this survey starts by providing a detailed description of GNN's terminologies, architecture, and the different types of GNNs. Then we provide a comprehensive survey of the advancements in applying GNNs for IoT from the perspective of data fusion and intrusion detection. Thereafter, we survey the impact GNN has made in improving spectrum awareness. Next, we provide a detailed account of how GNN has been leveraged for networking and tactical systems. Through this survey, we aim to provide a comprehensive resource for researchers to learn more about GNN in the context of wireless networks, and understand its state-of-the-art use cases while contrasting them with other machine learning approaches. Finally, we discuss the challenges and a wide range of future research directions to further motivate the use of GNN for IoT and NextG Networks.

Updated: 2024-05-27 16:10:49

标题: 对物联网和下一代网络的图神经网络调查

摘要: 物联网(IoT)设备的指数增长,加上6G推动数据速率和连接设备的增加,引发了数据激增。因此,充分利用数据驱动的机器学习潜力已成为重要的推动力之一。除了无线技术的进步外,高效利用可用资源并满足用户需求也很重要。图神经网络(GNN)作为一种有效建模和提取见解的有前途的范式已经出现,因为它具有高性能、精度、可扩展性、适应性和资源效率,从本质上表现出复杂的网络结构。目前缺乏针对GNN在物联网和下一代(NextG)网络背景下应用和进展的综合调查。为了填补这一空白,本调查从提供GNN术语、架构和不同类型的GNN的详细描述开始。然后,我们全面调查了在数据融合和入侵检测方面应用GNN的进展。之后,我们调查了GNN在改进频谱感知方面所带来的影响。接下来,我们详细介绍了GNN如何被用于网络和战术系统。通过这项调查,我们旨在为研究人员提供一个全面的资源,以了解无线网络背景下GNN的使用情况,并了解其最先进的用例,同时与其他机器学习方法进行对比。最后,我们还讨论了挑战和广泛的未来研究方向,以进一步推动在物联网和NextG网络中使用GNN。

更新时间: 2024-05-27 16:10:49

领域: cs.LG,cs.NI

下载: http://arxiv.org/abs/2405.17309v1

Peer2PIR: Private Queries for IPFS

The InterPlanetary File System (IPFS) is a peer-to-peer network for storing data in a distributed file system, hosting over 190,000 peers spanning 152 countries. Despite its prominence, the privacy properties that IPFS offers to peers are severely limited. Any query within the network leaks to other peers the content for which a peer is querying. We address IPFS' privacy leakage across three functionalities (peer routing, provider advertisements, and content retrieval), ultimately empowering peers to privately navigate and retrieve content in the network. We argue that private information retrieval (PIR) is the most suitable tool for our task. Our work highlights and addresses novel challenges inherent to integrating PIR into distributed systems. We present our new, private protocols and demonstrate that they incur minimal overheads compared to IPFS today. We also include a systematic comparison of state-of-art PIR protocols in the context of distributed systems which may be of independent interest.

Updated: 2024-05-27 16:09:25

标题: Peer2PIR:IPFS的私密查询

摘要: 星际文件系统(IPFS)是一个点对点网络,用于在分布式文件系统中存储数据,覆盖152个国家的超过190,000个节点。尽管IPFS备受关注,但其提供给节点的隐私属性严重受限。网络中的任何查询都会泄漏给其他节点,查询节点所请求内容。我们解决了IPFS在三个功能(节点路由、提供者广告和内容检索)中的隐私泄漏问题,最终使节点能够在网络中私密导航和检索内容。我们认为私密信息检索(PIR)是我们任务的最合适工具。我们的工作突出并解决了将PIR集成到分布式系统中固有的新挑战。我们提出了我们的新的私密协议,并证明它们与今天的IPFS相比只带来了最小的开销。我们还在分布式系统的背景下系统比较了最新的PIR协议,这可能是独立感兴趣的。

更新时间: 2024-05-27 16:09:25

领域: cs.CR

下载: http://arxiv.org/abs/2405.17307v1

NuwaTS: a Foundation Model Mending Every Incomplete Time Series

Time series imputation plays a crucial role in various real-world systems and has been extensively explored. Models for time series imputation often require specialization, necessitating distinct designs for different domains and missing patterns. In this study, we introduce NuwaTS, a framework that repurposes a Pre-trained Language Model (PLM) for general time series imputation. Once trained, this model can be applied to imputation tasks on incomplete time series from any domain with any missing patterns. We begin by devising specific embeddings for each sub-series patch of the incomplete time series. These embeddings encapsulate information about the patch itself, the missing data patterns within the patch, and the patch's statistical characteristics. To enhance the model's adaptability to different missing patterns, we propose a contrastive learning approach to make representations of the same patch more similar across different missing patterns. By combining this contrastive loss with the missing data imputation task, we train PLMs to obtain a one-for-all imputation model. Furthermore, we utilize a plug-and-play layer-wise fine-tuning approach to train domain-specific models. Experimental results demonstrate that, by leveraging a dataset of over seventeen million time series from diverse domains, we obtain a one-for-all imputation model which outperforms existing domain-specific models across various datasets and missing patterns. Additionally, we find that NuwaTS can be generalized to other time series tasks such as forecasting. Our codes are available at https://github.com/Chengyui/NuwaTS.
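
A minimal Python sketch of the contrastive term: embeddings of the same patch under two different random missing patterns are treated as positives in a standard InfoNCE loss. This is one plausible instantiation, not necessarily NuwaTS's exact loss.

import torch
import torch.nn.functional as F

def cross_mask_contrastive(z_a, z_b, tau=0.1):
    # z_a, z_b: [batch, dim] embeddings of the same patches under two masks.
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                    # pairwise similarities
    targets = torch.arange(z_a.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)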

Updated: 2024-05-27 16:01:44

标题: NuwaTS: 一个修补每个不完整时间序列的基础模型

摘要: 时间序列插值在各种现实世界系统中发挥着至关重要的作用,并得到了广泛探讨。时间序列插值模型通常需要专门化,需要针对不同领域和不同缺失模式进行独特设计。在本研究中,我们介绍了NuwaTS,一个重新利用预训练语言模型(PLM)进行一般时间序列插值的框架。一旦训练完成,该模型可以应用于任何领域任何缺失模式的不完整时间序列的插值任务。我们首先为不完整时间序列的每个子序列块设计了特定的嵌入。这些嵌入封装了关于块本身、块内缺失数据模式以及块的统计特征的信息。为了增强模型对不同缺失模式的适应性,我们提出了对比学习方法,使同一块的表示在不同缺失模式下更相似。通过将这种对比损失与缺失数据插值任务结合起来,我们训练PLM获得一种通用的插值模型。此外,我们利用一种逐层微调的插拔式方法来训练特定领域的模型。实验结果表明,利用来自多个领域的一千七百万时间序列数据集,我们获得了一种通用的插值模型,其在各种数据集和缺失模式上均优于现有的特定领域模型。此外,我们发现NuwaTS可以推广到其他时间序列任务,如预测。我们的代码可在https://github.com/Chengyui/NuwaTS上找到。

更新时间: 2024-05-27 16:01:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.15317v2

Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

Simplicity bias, the propensity of deep models to over-rely on simple features, has been identified as a potential reason for limited out-of-distribution generalization of neural networks (Shah et al., 2020). Despite the important implications, this phenomenon has been theoretically confirmed and characterized only under strong dataset assumptions, such as linear separability (Lyu et al., 2021). In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the size of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data.

Updated: 2024-05-27 16:00:45

标题: 两层网络在线性可分数据之外的简单性偏差

摘要: 简单性偏差是指深度模型倾向于过度依赖简单特征,已被确定为神经网络在分布外泛化能力有限的潜在原因(Shah等人,2020)。尽管这一现象具有重要的含义,但在强数据集假设下,如线性可分性(Lyu等人,2021),这一现象仅在理论上得到确认和描述。在这项工作中,我们对小权重初始化并通过梯度流训练的两层神经网络在一般数据集中的简单性偏差进行了表征。具体来说,我们证明在早期训练阶段,网络特征会聚集在几个方向上,这与隐藏层的大小无关。此外,对于具有类似异或模式的数据集,我们准确地识别了学习到的特征,并证明简单性偏差在后期训练阶段加剧。这些结果表明,在训练的中间阶段学习到的特征可能对分布外转移更有用。我们通过对图像数据的实验支持这一假设。

更新时间: 2024-05-27 16:00:45

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2405.17299v1

Think Before You Act: Decision Transformers with Working Memory

Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Inspired by this, we propose a working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in Atari games and Meta-World object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
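
A minimal Python sketch of a working-memory module in the spirit described above: a fixed set of slots read via attention and written with a gated blend. The slot count, batch-mean write, and gating are illustrative choices rather than the paper's exact design.

import torch
import torch.nn as nn

class WorkingMemory(nn.Module):
    def __init__(self, n_slots, d_model):
        super().__init__()
        self.register_buffer("slots", torch.zeros(n_slots, d_model))
        self.query_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, 1)

    def read(self, h):                               # h: [batch, d_model]
        attn = torch.softmax(self.query_proj(h) @ self.slots.t(), dim=-1)
        return attn @ self.slots                     # blended memory readout

    @torch.no_grad()
    def write(self, h):                              # store a summary of h
        c = h.mean(dim=0)
        g = torch.sigmoid(self.gate(
            torch.cat([self.slots, c.expand_as(self.slots)], dim=-1)))
        self.slots.copy_((1 - g) * self.slots + g * c)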

Updated: 2024-05-27 16:00:31

标题: 在行动之前三思:具有工作记忆的决策转换器

摘要: 基于决策变压器的决策制定代理已经展示出在多个任务之间泛化的能力。然而,它们的表现依赖于大量的数据和计算。我们认为这种低效源于遗忘现象,即模型在训练过程中通过参数记忆其行为。因此,对新任务进行训练可能会降低模型在先前任务上的表现。与LLMs的隐式记忆机制相反,人脑利用分布式存储记忆,有助于有效管理和组织多种技能,减轻遗忘现象。受此启发,我们提出了一个工作记忆模块,用于存储、混合和检索不同下游任务的信息。评估结果显示,所提出的方法改善了Atari游戏和Meta-World物体操纵任务的训练效率和泛化能力。此外,我们证明了记忆微调进一步增强了所提出架构的适应性。

更新时间: 2024-05-27 16:00:31

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2305.16338v2

Efficient Ensembles Improve Training Data Attribution

Training data attribution (TDA) methods aim to quantify the influence of individual training data points on the model predictions, with broad applications in data-centric AI, such as mislabel detection, data selection, and copyright compensation. However, existing methods in this field, which can be categorized as retraining-based and gradient-based, have struggled with the trade-off between computational efficiency and attribution efficacy. Retraining-based methods can accurately attribute complex non-convex models but are computationally prohibitive, while gradient-based methods are efficient but often fail for non-convex models. Recent research has shown that augmenting gradient-based methods with ensembles of multiple independently trained models can achieve significantly better attribution efficacy. However, this approach remains impractical for very large-scale applications. In this work, we discover that expensive, fully independent training is unnecessary for ensembling the gradient-based methods, and we propose two efficient ensemble strategies, DROPOUT ENSEMBLE and LORA ENSEMBLE, alternative to naive independent ensemble. These strategies significantly reduce training time (up to 80%), serving time (up to 60%), and space cost (up to 80%) while maintaining similar attribution efficacy to the naive independent ensemble. Our extensive experimental results demonstrate that the proposed strategies are effective across multiple TDA methods on diverse datasets and models, including generative settings, significantly advancing the Pareto frontier of TDA methods with better computational efficiency and attribution efficacy.
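
A minimal Python sketch of a dropout-based ensemble for a gradient-similarity attribution score: rather than training independent models, dropout stays active and the grad-dot-grad score is averaged over sampled masks. Each pass samples its own mask here, which is a simplification, not the paper's exact strategy.

import torch

def flat_grad(model, loss):
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def dropout_ensemble_influence(model, loss_fn, train_batch, test_batch, k=10):
    # Approximate influence of train_batch on test_batch, averaged over k
    # stochastic dropout masks of a single trained model.
    model.train()                                    # keep dropout active
    (x_tr, y_tr), (x_te, y_te) = train_batch, test_batch
    score = 0.0
    for _ in range(k):
        g_tr = flat_grad(model, loss_fn(model(x_tr), y_tr))
        g_te = flat_grad(model, loss_fn(model(x_te), y_te))
        score = score + torch.dot(g_tr, g_te) / k
    return score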

Updated: 2024-05-27 15:58:34

标题: 高效的合成提高训练数据的归因

摘要: 培训数据归因(TDA)方法旨在量化单个训练数据点对模型预测的影响,在数据中心人工智能领域具有广泛的应用,如误标签检测、数据选择和版权补偿。然而,目前在这一领域的现有方法可以归类为基于重新训练和基于梯度的方法,在计算效率和归因效果之间存在难以解决的折衷。重新训练方法可以准确归因复杂的非凸模型,但在计算上是禁锢的,而基于梯度的方法高效但通常在非凸模型上失败。最近的研究表明,将多个独立训练的模型集成到基于梯度的方法中可以显著提高归因效果。然而,这种方法对于非常大规模的应用仍然不切实际。 在这项工作中,我们发现昂贵的完全独立训练对于集成基于梯度的方法是不必要的,并提出了两种高效的集成策略,DROPOUT ENSEMBLE和LORA ENSEMBLE,作为天真独立集成的替代方法。这些策略显著减少了训练时间(高达80%)、服务时间(高达60%)和空间成本(高达80%),同时保持了与天真独立集成相似的归因效果。我们广泛的实验结果表明,所提出的策略在多个TDA方法上对不同的数据集和模型都是有效的,包括生成设置,显著推进了TDA方法的帕累托前沿,具有更好的计算效率和归因效果。

更新时间: 2024-05-27 15:58:34

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17293v1

Opinion-Guided Reinforcement Learning

Human guidance is often desired in reinforcement learning to improve the performance of the learning agent. However, human insights are often mere opinions and educated guesses rather than well-formulated arguments. While opinions are subject to uncertainty, e.g., due to partial informedness or ignorance about a problem, they also emerge earlier than hard evidence could be produced. Thus, guiding reinforcement learning agents through opinions offers the potential for more performant learning processes, but comes with the challenge of modeling and managing opinions in a formal way. In this article, we present a method to guide reinforcement learning agents through opinions. To this end, we provide an end-to-end method to model and manage advisors' opinions. To assess the utility of the approach, we evaluate it with synthetic and human advisors, at different levels of uncertainty, and under multiple advice strategies. Our results indicate that opinions, even if uncertain, improve the performance of reinforcement learning agents, resulting in higher rewards, more efficient exploration, and a better reinforced policy. Although we demonstrate our approach in a simplified topological running example, our approach is applicable to complex problems with higher dimensions as well.
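
A minimal Python sketch of one way opinion-based guidance can enter the reward signal: the advisor's judgment is discounted by its own stated uncertainty. The (belief, uncertainty) encoding is an assumption for illustration, not necessarily the paper's opinion model.

def shaped_reward(env_reward, opinion_belief, opinion_uncertainty, weight=1.0):
    # opinion_belief in [-1, 1]: how good the advisor judges the transition;
    # opinion_uncertainty in [0, 1]: 1 means the advisor is fully ignorant.
    trust = 1.0 - opinion_uncertainty
    return env_reward + weight * trust * opinion_belief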

Updated: 2024-05-27 15:52:27

标题: 意见引导的强化学习

摘要: 人类引导在强化学习中经常被期望用来提高学习代理的性能。然而,人类的见解通常只是意见和经验猜测,而不是经过深思熟虑的论据。尽管意见受到不确定性的影响,例如由于对问题的部分了解或无知,但它们通常比硬证据更早出现。因此,通过意见引导强化学习代理提供了更高效的学习过程的潜力,但也带来了以正式方式建模和管理意见的挑战。在本文中,我们提出了一种通过意见引导强化学习代理的方法。为此,我们提供了一种端到端的方法来模拟和管理顾问的意见。为了评估该方法的实用性,我们使用合成和人类顾问,在不同程度的不确定性和多种建议策略下进行评估。我们的结果表明,即使有不确定性,意见也会提高强化学习代理的性能,导致更高的奖励、更有效的探索以及更好的强化策略。尽管我们在一个简化的拓扑运行示例中演示了我们的方法,但我们的方法也适用于具有更高维度的复杂问题。

更新时间: 2024-05-27 15:52:27

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17287v1

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

Activation sparsity refers to the existence of considerable weakly-contributed elements among activation outputs. As a prevalent property of the models using the ReLU activation function, activation sparsity has been proven a promising paradigm to boost model inference efficiency. Nevertheless, most large language models (LLMs) adopt activation functions without intrinsic activation sparsity (e.g., GELU and Swish). Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance. This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity while maintaining comparable performance. Specifically, after substituting the activation function of LLMs with ReLU, ProSparse adopts progressive sparsity regularization with a factor smoothly increasing along the multi-stage sine curves. This can enhance activation sparsity and mitigate performance degradation by avoiding radical shifts in activation distributions. With ProSparse, we obtain high sparsity of 89.32% for LLaMA2-7B, 88.80% for LLaMA2-13B, and 87.89% for end-size MiniCPM-1B, respectively, achieving comparable performance to their original Swish-activated versions. These present the most sparsely activated models among open-source LLaMA versions and competitive end-size models, considerably surpassing ReluLLaMA-7B (66.98%) and ReluLLaMA-13B (71.56%). Our inference acceleration experiments further demonstrate the significant practical acceleration potential of LLMs with higher activation sparsity, obtaining up to 4.52$\times$ inference speedup.
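
A minimal Python sketch of a progressive regularization factor that rises smoothly in stages along half-sine ramps, scaling an L1 penalty on post-ReLU activations; the stage boundaries and peak values are illustrative, not the released schedule.

import math

def sparsity_factor(step, stages):
    # stages: list of (start_step, end_step, start_lambda, end_lambda).
    for s0, s1, l0, l1 in stages:
        if s0 <= step < s1:
            t = (step - s0) / (s1 - s0)
            return l0 + (l1 - l0) * math.sin(0.5 * math.pi * t)  # smooth ramp
    return stages[-1][3]

# Used as: loss = task_loss + sparsity_factor(step, stages) * relu_acts.abs().mean()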

Updated: 2024-05-27 15:49:58

标题: ProSparse:在大型语言模型中引入和增强内在激活稀疏性

摘要: 激活稀疏性指的是激活输出中存在大量贡献较弱的元素。作为使用ReLU激活函数的模型的一种普遍特性,激活稀疏性已被证明是提高模型推断效率的一种有前途的范式。然而,大多数大型语言模型(LLMs)采用没有内在激活稀疏性的激活函数(例如GELU和Swish)。一些最近的研究尝试引入ReLU或其变体作为替代激活函数,帮助LLMs实现激活稀疏性和推断加速,但很少能同时获得高稀疏性和可比的模型性能。本文介绍了一种简单有效的稀疏化方法,命名为"ProSparse",以促使LLMs实现更高的激活稀疏性,同时保持可比的性能。具体而言,在将LLMs的激活函数替换为ReLU后,ProSparse采用逐步稀疏正则化,其因子沿着多阶段正弦曲线平滑增加。这可以增强激活稀疏性,并通过避免激活分布中的激烈变化来减轻性能下降。通过ProSparse,我们分别获得了LLaMA2-7B的高达89.32%、LLaMA2-13B的88.80%和端尺寸MiniCPM-1B的87.89%的高稀疏性,实现了与它们原始Swish激活版本相当的性能。这些是开源LLaMA版本和竞争性端尺寸模型中最稀疏激活的模型,远远超过ReluLLaMA-7B(66.98%)和ReluLLaMA-13B(71.56%)。我们的推断加速实验进一步证明了具有更高激活稀疏性的LLMs具有显著的实际加速潜力,获得高达4.52倍的推断加速。

更新时间: 2024-05-27 15:49:58

领域: cs.LG,cs.AI,cs.CL,I.2.7

下载: http://arxiv.org/abs/2402.13516v3

An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications

Natural language processing (NLP) is rapidly developing for applications in educational assessment. In this paper, I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item specifications and content standards. This paper extends recent work by proposing and demonstrating the use of multivariate similarity based on embedding vectors for sentences or texts. In particular, a hybrid regression procedure is demonstrated for establishing the match of each content standard to multiple item specifications. The procedure is used to evaluate the match of the Common Core State Standards (CCSS) for mathematics at grade 4 to the corresponding item specifications for the 2026 National Assessment of Educational Progress (NAEP).
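
A minimal Python sketch of the embedding-based matching step: every (standard, spec) pair is scored by cosine similarity of sentence embeddings, and the top match per standard seeds the crosswalk. The encoder choice and the two example sentences are illustrative assumptions.

import numpy as np
from sentence_transformers import SentenceTransformer

standards = ["Use place value understanding to round multi-digit whole numbers."]
specs = ["Items may require rounding whole numbers to any place."]

model = SentenceTransformer("all-MiniLM-L6-v2")      # assumed encoder
E_std, E_spec = model.encode(standards), model.encode(specs)

def cosine_matrix(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine_matrix(E_std, E_spec)                   # [n_std, n_spec]
best_spec_per_standard = sim.argmax(axis=1)          # candidate crosswalk links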

Updated: 2024-05-27 15:47:46

标题: 一个关于常见核心州标准和全国评估教育进展项目规范之间的自然语言处理对应关系

摘要: 自然语言处理(NLP)正在迅速发展,应用于教育评估。在本文中,我描述了一种基于NLP的程序,可用于支持学科专家建立项目规范和内容标准之间的对应关系。本文通过提出并展示基于嵌入向量的句子或文本的多变量相似度的使用,扩展了最近的工作。特别地,展示了一种混合回归程序,用于确定每个内容标准与多个项目规范的匹配程度。该程序用于评估第四年级数学的共同核心国家标准(CCSS)与2026年全国教育进步评估(NAEP)对应项目规范的匹配情况。

更新时间: 2024-05-27 15:47:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17284v1

Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery

Current state-of-the-art synchrony-based models encode object bindings with complex-valued activations and compute with real-valued weights in feedforward architectures. We argue for the computational advantages of a recurrent architecture with complex-valued weights. We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction: at each iteration, a hidden layer bottleneck encodes statistically regular configurations of features in particular phase relationships; over iterations, local constraints propagate and the model converges to a globally consistent configuration of phase assignments. Binding is achieved simply by the matrix-vector product operation between complex-valued weights and activations, without the need for additional mechanisms that have been incorporated into current synchrony-based models. SynCx outperforms or is strongly competitive with current models for unsupervised object discovery. SynCx also avoids certain systematic grouping errors of current models, such as the inability to separate similarly colored objects without additional supervision.

Updated: 2024-05-27 15:47:03

标题: 复发性复杂加权自编码器用于无监督对象发现

摘要: 目前最先进的基于同步性的模型使用复值激活来编码对象绑定,并在前馈结构中使用实值权重进行计算。我们认为采用复值权重的循环结构具有计算优势。我们提出了一个全卷积自编码器SynCx,该自编码器执行迭代约束满足:在每次迭代中,隐藏层瓶颈将特征的统计规则配置编码为特定相位关系;随着迭代的进行,局部约束传播,模型收敛到全局一致的相位分配配置。绑定仅通过复值权重和激活之间的矩阵-向量乘法操作实现,无需额外机制,这些机制已被整合到当前基于同步性的模型中。SynCx在无监督对象发现方面优于当前模型或具有强竞争力。SynCx还避免了当前模型的某些系统性分组错误,例如无法在没有额外监督的情况下分离颜色相似的对象。

更新时间: 2024-05-27 15:47:03

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2405.17283v1

R-ODE: Ricci Curvature Tells When You Will be Informed

Information diffusion prediction is fundamental to understanding the structure and organization of online social networks, and plays a crucial role in blocking rumor spread, influence maximization, political propaganda, etc. So far, most existing solutions primarily predict, from historical cascades, the next user who will be informed, but ignore an important factor in the diffusion process - the time. Such limitation motivates us to pose the problem of time-aware personalized information diffusion prediction for the first time, telling the time when the target user will be informed. In this paper, we address this problem from a fresh geometric perspective of Ricci curvature, and propose a novel Ricci-curvature regulated Ordinary Differential Equation (R-ODE). In the diffusion process, R-ODE considers that the inter-correlated users are organized in a dynamic system in the representation space, and the cascades give the observations sampled from the continuous realm. At each infection time, the message diffuses along the largest Ricci curvature, signifying less transportation effort. In the continuous realm, the message triggers users' movement, whose trajectory in the space is parameterized by an ODE with a graph neural network. Consequently, R-ODE predicts the infection time of a target user by the movement trajectory learnt from the observations. Extensive experiments evaluate the personalized time prediction ability of R-ODE, and show R-ODE outperforms the state-of-the-art baselines.

Updated: 2024-05-27 15:46:52

标题: R-ODE:黎曼曲率告诉你何时会被通知

摘要: 信息传播预测是理解在线社交网络的结构和组织的基础,对于阻止谣言传播、最大化影响力、政治宣传等起着至关重要的作用。到目前为止,大多数现有的解决方案主要通过历史级联预测下一个被通知的用户,但忽略了传播过程中一个重要因素 - 时间。这种限制激发了我们首次提出时间感知的个性化信息传播预测问题,告诉目标用户何时会被通知。在本文中,我们从一个新颖的几何透视角度,即里奇曲率,解决了这个问题,并提出了一种新颖的里奇曲率调节的普通微分方程(R-ODE)。在传播过程中,R-ODE认为相互关联的用户在表示空间中组织成一个动态系统,级联提供了从连续领域采样的观察。在每个感染时间点,消息沿着最大的里奇曲率扩散,表示较少的运输工作。在连续领域中,消息触发用户的移动,其空间中的轨迹由带有图神经网络的ODE参数化。因此,R-ODE通过从观察中学到的移动轨迹来预测目标用户的感染时间。大量实验证明了R-ODE的个性化时间预测能力,并显示R-ODE胜过了最先进的基线模型。

更新时间: 2024-05-27 15:46:52

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2405.17282v1

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicability of results. We present code that successfully replicates results from six popular and recent graph recommendation models (NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF) on three common benchmark datasets (Gowalla, Yelp 2018, and Amazon Book). Additionally, we compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. Furthermore, we extend our study to two new datasets (Allrecipes and BookCrossing) that lack established setups in existing literature. As the performance on these datasets differs from the previous benchmarks, we analyze the impact of specific dataset characteristics on recommendation accuracy. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure. The code to reproduce our experiments is available at: https://github.com/sisinflab/Graph-RSs-Reproducibility.

Updated: 2024-05-27 15:42:54

标题: 挑战图协同过滤的神话:一种理性和可重复性驱动的分析

摘要: 基于图神经网络模型(GNNs)的成功显著推动了推荐系统,有效地将用户和物品建模为一个双部分、无向图。然而,许多原始基于图的作品常常采用基线论文的结果,而未验证其在特定配置下的有效性。我们的工作着重于结果的可重现性,通过一个成功复制六种流行且最新的图推荐模型(NGCF, DGCF, LightGCN, SGL, UltraGCN 和 GFCF)在三个常见基准数据集(Gowalla, Yelp 2018 和亚马逊图书)上的结果的代码。此外,我们将这些图模型与历史上在离线评估中表现良好的传统协同过滤模型进行比较。此外,我们将研究扩展到两个缺乏现有文献中建立设置的新数据集(Allrecipes 和 BookCrossing)。由于这些数据集的性能与先前的基准数据不同,我们分析了特定数据集特征对推荐准确性的影响。通过研究用户邻域传播的信息流,我们旨在确定哪些模型受到数据集结构内在特征的影响。我们的实验复现代码可在以下链接找到:https://github.com/sisinflab/Graph-RSs-Reproducibility。

更新时间: 2024-05-27 15:42:54

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2308.00404v2

Socially-Aware Shared Control Navigation for Assistive Mobile Robots in the Built Environment

As the number of Persons with Disabilities (PWD), particularly those with one or more physical impairments, increases, there is an increasing demand for assistive robotic technologies that can support independent mobility in the built environment and reduce the burden on caregivers. Current assistive mobility platforms (e.g., robotic wheelchairs) often fail to incorporate user preferences and control, leading to reduced trust and efficiency. Existing shared control algorithms do not allow the incorporation of the user control preferences inside the navigation framework or the path planning algorithm. In addition, existing dynamic local planner algorithms for robotic wheelchairs do not take into account the social spaces of people, potentially leading such platforms to infringe upon these areas and cause discomfort. To address these concerns, this work introduces a novel socially-aware shared autonomy-based navigation system for assistive mobile robotic platforms. Our navigation framework comprises a Global Planner and a Local Planner. To implement the Global Planner, the proposed approach introduces a novel User Preference Field (UPF) theory within its global planning framework, explicitly acknowledging user preferences to adeptly navigate away from congested areas. For the Local Planner, we propose a Socially-aware Shared Control-based Model Predictive Control with Dynamic Control Barrier Function (SS-MPC-DCBF) to adjust movements in real-time, integrating user preferences for safer, more autonomous navigation. Evaluation results show that our Global Planner aligns closely with user preferences compared to baselines, and our Local Planner demonstrates enhanced safety and efficiency in dynamic and static scenarios. This integrated approach fosters trust and autonomy, crucial for the acceptance of assistive mobility technologies in the built environment.

Updated: 2024-05-27 15:40:34

标题: 在建筑环境中的辅助移动机器人的社会感知共享控制导航

摘要: 随着残疾人士(尤其是那些有一个或多个身体功能障碍的人)的数量增加,对能够支持建筑环境中独立移动并减轻照护者负担的辅助机器人技术的需求日益增加。目前的辅助移动平台(例如,机器人轮椅)往往没有考虑用户偏好和控制,导致信任和效率降低。现有的共享控制算法不允许将用户控制偏好纳入导航框架或路径规划算法中。此外,现有的机器人轮椅动态局部规划算法没有考虑人们的社交空间,可能导致这些平台侵犯这些区域并引起不适。为了解决这些问题,本文介绍了一种新颖的基于社会意识的共享自治导航系统,用于辅助移动机器人平台。 我们的导航框架包括全局规划器和局部规划器。为了实现全局规划器,提出的方法在其全局规划框架中引入了一种新颖的用户偏好场(UPF)理论,明确承认用户偏好以巧妙地避开拥挤区域。对于局部规划器,我们提出了一种基于社会意识共享控制的模型预测控制与动态控制屏障功能(SS-MPC-DCBF),以实时调整运动,集成用户偏好以实现更安全、更自主的导航。评估结果表明,我们的全局规划器与基线相比更符合用户偏好,我们的局部规划器在动态和静态场景中展示了增强的安全性和效率。这种整合方法促进了信任和自治,对于建筑环境中辅助移动技术的接受至关重要。

更新时间: 2024-05-27 15:40:34

领域: cs.RO,cs.AI,cs.HC

下载: http://arxiv.org/abs/2405.17279v1

UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining

Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse. Prior work on table recognition (TR) has mainly centered around complex task-specific combinations of available inputs and tools. We present UniTable, a training framework that unifies both the training paradigm and training objective of TR. Its training paradigm combines the simplicity of purely pixel-level inputs with the effectiveness and scalability empowered by self-supervised pretraining from diverse unannotated tabular images. Our framework unifies the training objectives of all three TR tasks - extracting table structure, cell content, and cell bounding box - into a unified task-agnostic training objective: language modeling. Extensive quantitative and qualitative analyses highlight UniTable's state-of-the-art (SOTA) performance on four of the largest TR datasets. UniTable's table parsing capability has surpassed both existing TR methods and general large vision-language models, e.g., GPT-4o, GPT-4-turbo with vision, and LLaVA. Our code is publicly available at https://github.com/poloclub/unitable, featuring a Jupyter Notebook that includes the complete inference pipeline, fine-tuned across multiple TR datasets, supporting all three TR tasks.

Updated: 2024-05-27 15:39:51

标题: UniTable:通过自监督预训练实现表格识别的统一框架

摘要: 表格传达了由人类创建的隐式约定的事实和定量数据,这些约定往往对机器来说很具挑战性。以往关于表格识别(TR)的研究主要集中在复杂的任务特定组合的可用输入和工具上。我们提出了UniTable,一个统一了TR的训练范式和训练目标的训练框架。其训练范式结合了纯像素级输入的简单性和通过多样化未标记表格图像的自监督预训练赋予的有效性和可扩展性。我们的框架将所有三个TR任务的训练目标 - 提取表格结构、单元格内容和单元格边界框 - 统一为一个统一的与任务无关的训练目标:语言建模。广泛的定量和定性分析突显了UniTable在四个最大的TR数据集上的最新性能。UniTable的表格解析能力已经超越了现有的TR方法和通用的大型视觉语言模型,例如GPT-4o、GPT-4-turbo with vision和LLaVA。我们的代码公开可用,网址为https://github.com/poloclub/unitable,其中包括一个Jupyter Notebook,包括完整的推理流程,经过多个TR数据集的微调,支持所有三个TR任务。

更新时间: 2024-05-27 15:39:51

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.04822v2

Gradients of Functions of Large Matrices

Tuning scientific and probabilistic machine learning models -- for example, partial differential equations, Gaussian processes, or Bayesian neural networks -- often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state-of-the-art for evaluating these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to differentiate these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax when it comes to differentiating PDEs, GPyTorch for selecting Gaussian process models and beats standard factorisation methods for calibrating Bayesian neural networks. All this is achieved without any problem-specific code optimisation. Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree.

Updated: 2024-05-27 15:39:45

标题: 大矩阵函数的梯度

摘要: 调整科学和概率机器学习模型 - 例如,偏微分方程,高斯过程或贝叶斯神经网络 - 通常依赖于评估随数据集或参数数量增长的矩阵函数。虽然评估这些数量的最新技术几乎总是基于Lanczos和Arnoldi迭代,但本研究是第一个有效解释如何高效地区分这些数值线性代数的核心算法。为了达到这个目标,我们推导了Lanczos和Arnoldi迭代的先前未知的伴随系统,将它们实现在JAX中,并展示了得到的代码在区分PDEs方面可以与Diffrax竞争,在选择高斯过程模型方面可以超越GPyTorch,并且在校准贝叶斯神经网络方面可以击败标准的分解方法。所有这些都是在没有任何特定问题代码优化的情况下实现的。在https://github.com/pnkraemer/experiments-lanczos-adjoints找到代码,并使用pip install matfree安装该库。

更新时间: 2024-05-27 15:39:45

领域: cs.LG,cs.NA,math.NA,stat.ML

下载: http://arxiv.org/abs/2405.17277v1

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve the PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as its major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver) capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally under the control of a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive gains and endowing favorable generalizability and scalability.

Updated: 2024-05-27 15:34:35

标题: Unisolver:PDE-Conditional Transformers 是通用的 PDE 求解器

摘要: 深度模型最近出现作为解决偏微分方程(PDEs)的有前途的工具,被称为神经PDE求解器。虽然从仿真数据或物理信息损失训练的神经求解器可以相当好地解决PDEs,但它们主要受限于特定的一组PDEs,例如特定方程或一组有限的系数。这种瓶颈限制了神经求解器的泛化能力,这被广泛认为是它相对于数字求解器的主要优势。在本文中,我们提出了能够通过利用在不同数据上预训练的Transformer并受到不同PDEs条件的通用PDE求解器(Unisolver)。Unisolver并非简单地扩大数据和参数,而是源于PDE解决过程的理论分析。我们的关键发现是,PDE解决方案基本上受到一系列PDE组件的控制,例如方程符号、系数以及初始和边界条件。受到PDE的数学结构的启发,我们定义了一个完整的PDE组件集,并相应地将它们作为领域级(例如方程符号)和点级(例如边界)条件嵌入到Transformer PDE求解器中。将物理洞察力与最新的Transformer进展相结合,Unisolver在三个具有挑战性的大规模基准测试上取得了一致的最新成果,显示出令人印象深刻的收益,并赋予了良好的泛化能力和可扩展性。

更新时间: 2024-05-27 15:34:35

领域: cs.LG,cs.AI,cs.NA,math.NA

下载: http://arxiv.org/abs/2405.17527v1

DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems

The min-max vehicle routing problem (min-max VRP) traverses all given customers by assigning several routes and aims to minimize the length of the longest route. Recently, reinforcement learning (RL)-based sequential planning methods have exhibited advantages in solving efficiency and optimality. However, these methods fail to exploit the problem-specific properties in learning representations, resulting in less effective features for decoding optimal routes. This paper considers the sequential planning process of min-max VRPs as two coupled optimization tasks: customer partition for different routes and customer navigation in each route (i.e., partition and navigation). To effectively process min-max VRP instances, we present a novel attention-based Partition-and-Navigation encoder (P&N Encoder) that learns distinct embeddings for partition and navigation. Furthermore, we utilize an inherent symmetry in decoding routes and develop an effective agent-permutation-symmetric (APS) loss function. Experimental results demonstrate that the proposed Decoupling-Partition-Navigation (DPN) method significantly surpasses existing learning-based methods in both single-depot and multi-depot min-max VRPs. Our code is available at

Updated: 2024-05-27 15:33:16

标题: DPN:最小-最大车辆路径问题的神经求解器中分区和导航的解耦

摘要: 最小-最大车辆路径问题(最小-最大VRP)通过分配多条路径来遍历所有给定的客户,并旨在最小化最长路径的长度。最近,基于强化学习(RL)的序贯规划方法在解决效率和最优性方面表现出优势。然而,这些方法未能利用学习表示中的问题特定属性,导致对解码最佳路径的特征不够有效。本文将最小-最大VRP的序贯规划过程视为两个耦合优化任务:为不同路径进行客户划分和在每条路径中进行客户导航(即划分和导航)。为了有效处理最小-最大VRP实例,我们提出了一种新颖的基于注意力的划分和导航编码器(P&N编码器),该编码器学习了用于划分和导航的不同嵌入。此外,我们利用解码路径中的固有对称性,并开发了一种有效的代理-置换对称(APS)损失函数。实验结果表明,提出的解耦-划分-导航(DPN)方法在单仓库和多仓库最小-最大VRP中明显优于现有的基于学习的方法。我们的代码可以在提供的链接中找到。

更新时间: 2024-05-27 15:33:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17272v1

FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation

Federated learning (FL) is a popular privacy-preserving paradigm that enables distributed clients to collaboratively train models with a central server while keeping raw data locally. In practice, distinct model architectures, varying data distributions, and limited resources across local clients inevitably cause model performance degradation and a slowdown in convergence speed. However, existing FL methods can only solve some of the above heterogeneous challenges and have obvious performance limitations. Notably, a unified framework has not yet been explored to overcome these challenges. Accordingly, we propose FedHPL, a parameter-efficient unified $\textbf{Fed}$erated learning framework for $\textbf{H}$eterogeneous settings based on $\textbf{P}$rompt tuning and $\textbf{L}$ogit distillation. Specifically, we employ a local prompt tuning scheme that leverages a few learnable visual prompts to efficiently fine-tune the frozen pre-trained foundation model for downstream tasks, thereby accelerating training and improving model performance under limited local resources and data heterogeneity. Moreover, we design a global logit distillation scheme to handle the model heterogeneity and guide the local training. In detail, we leverage logits to implicitly capture local knowledge and design a weighted knowledge aggregation mechanism to generate global client-specific logits. We provide a theoretical guarantee on the generalization error bound for FedHPL. The experiments on various benchmark datasets under diverse settings of models and data demonstrate that our framework outperforms state-of-the-art FL approaches, with less computation overhead and training rounds.
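
A minimal Python sketch of the server-side step: clients upload per-class mean logits, and the server blends them with weights (sample counts here) into global client-guiding logits. This is a simplified stand-in for the paper's weighted knowledge aggregation.

import torch

def aggregate_logits(client_logits, client_counts):
    # client_logits: list of [C, C] per-class mean logits, one per client;
    # client_counts: local sample counts used as aggregation weights.
    w = torch.tensor(client_counts, dtype=torch.float32)
    w = w / w.sum()
    return (w.view(-1, 1, 1) * torch.stack(client_logits)).sum(dim=0)

# Clients then distill locally against the matching class row of the result.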

Updated: 2024-05-27 15:25:32

标题: FedHPL:具有提示调整和逻辑蒸馏的高效异构联邦学习

摘要: 联邦学习(FL)是一种流行的隐私保护范例,它使分布式客户端能够在保留原始数据的同时与中央服务器协作训练模型。在实践中,不同的模型架构、数据分布和本地客户端之间的有限资源不可避免地导致模型性能下降和收敛速度变慢。然而,现有的FL方法只能解决一些上述异构挑战,并且存在明显的性能限制。值得注意的是,尚未探索出一个统一的框架来克服这些挑战。因此,我们提出了FedHPL,一个基于Prompt调整和Logit蒸馏的参数高效的统一的联邦学习框架,适用于异构环境。具体地,我们采用一种本地Prompt调整方案,利用少量可学习的视觉提示来有效地微调冻结的预训练基础模型用于下游任务,从而加速训练并改善在有限的本地资源和数据异质性下的模型性能。此外,我们设计了一个全局Logit蒸馏方案来处理模型异质性并指导本地训练。具体来说,我们利用Logits隐含地捕捉本地知识,并设计了一个加权知识聚合机制来生成全局客户特定的Logits。我们为FedHPL提供了广义误差界的理论保证。在各种基准数据集和模型设置下的实验表明,我们的框架优于最先进的FL方法,并且计算开销和训练轮次更少。

更新时间: 2024-05-27 15:25:32

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.17267v1

Computational Complexity of Preferred Subset Repairs on Data-Graphs

Preferences are a pivotal component in practical reasoning, especially in tasks that involve decision-making over different options or courses of action that could be pursued. In this work, we focus on repairing and querying inconsistent knowledge bases in the form of graph databases, which involves finding a way to solve conflicts in the knowledge base and considering answers that are entailed from every possible repair, respectively. Without a priori domain knowledge, all possible repairs are equally preferred. Though that may be adequate for some settings, it seems reasonable to establish and exploit some form of preference order among the potential repairs. We study the problem of computing prioritized repairs over graph databases with data values, using a notion of consistency based on GXPath expressions as integrity constraints. We present several preference criteria based on the standard subset repair semantics, incorporating weights, multisets, and set-based priority levels. We show that it is possible to maintain the same computational complexity as in the case where no preference criterion is available for exploitation. Finally, we explore the complexity of consistent query answering in this setting and obtain tight lower and upper bounds for all the preference criteria introduced.

Updated: 2024-05-27 15:24:32

标题: 数据图上首选子集修复的计算复杂性

摘要: 偏好是实践推理中的一个关键组成部分,特别是在涉及对不同选项或可追求的行动方案进行决策的任务中。在这项工作中,我们专注于修复和查询以图数据库形式存在的不一致知识库,这涉及找到解决知识库中冲突的方法,并分别考虑从每种可能的修复中得出的答案。在没有先验领域知识的情况下,所有可能的修复都是同等优先的。尽管这对某些情景可能是足够的,但建立和利用潜在修复的某种偏好顺序似乎是合理的。我们研究了在带有数据值的图数据库上计算优先修复的问题,使用基于GXPath表达式的一致性概念作为完整性约束。我们提出了几种基于标准子集修复语义的偏好标准,包括权重、多重集和基于集合的优先级别。我们展示了可以保持与在没有可利用的偏好标准的情况下相同的计算复杂度是可能的。最后,我们探讨了在这种情境下的一致查询回答的复杂性,并为所有引入的偏好标准获得了严格的下限和上限。

更新时间: 2024-05-27 15:24:32

领域: cs.DB,cs.AI,cs.LO,68P15, 68T27, 03B70, 68T37

下载: http://arxiv.org/abs/2402.09265v2

Reinforcement Learning from Bagged Reward

In Reinforcement Learning (RL), it is commonly assumed that an immediate reward signal is generated for each action taken by the agent, helping the agent maximize cumulative rewards to obtain the optimal policy. However, in many real-world scenarios, immediate reward signals are not obtainable; instead, agents receive a single reward that is contingent upon a partial sequence or a complete trajectory. In this work, we define this challenging problem as Reinforcement Learning from Bagged Reward (RLBR), where sequences of data are treated as bags with non-Markovian bagged rewards. We provide a theoretical study to establish the connection between RLBR and standard RL in Markov Decision Processes (MDPs). To effectively explore the reward distributions within these bags and enhance policy training, we propose a Transformer-based reward model, the Reward Bag Transformer, which employs a bidirectional attention mechanism to interpret contextual nuances and temporal dependencies within each bag. Our empirical evaluations reveal that the challenge intensifies as the bag length increases, leading to the performance degradation due to reduced informational granularity. Nevertheless, our approach consistently outperforms existing methods, demonstrating the least decline in efficacy across varying bag lengths and excelling in approximating the original MDP's reward distribution.
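
A minimal Python sketch of the problem setup itself: an environment wrapper that withholds per-step rewards and releases their sum only at bag boundaries (a fixed bag length is a simplification; bags need not be uniform).

class BaggedRewardWrapper:
    def __init__(self, env, bag_size=10):
        self.env, self.bag_size = env, bag_size
        self._acc, self._t = 0.0, 0

    def reset(self):
        self._acc, self._t = 0.0, 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._acc += reward                          # reward stays hidden...
        self._t += 1
        bagged = 0.0
        if done or self._t % self.bag_size == 0:     # ...until the bag closes
            bagged, self._acc = self._acc, 0.0
        return obs, bagged, done, info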

Updated: 2024-05-27 15:23:31

标题: Bagged Reward的强化学习

摘要: 在强化学习(RL)中,通常假定每个动作都会为智能体产生一个即时奖励信号,帮助智能体最大化累积奖励以获得最优策略。然而,在许多现实场景中,即时奖励信号是无法获得的;相反,智能体接收到的是一个取决于部分序列或完整轨迹的单一奖励。在这项工作中,我们将这一具有挑战性的问题定义为从奖励袋中学习的强化学习(RLBR),其中数据序列被视为具有非马尔科夫奖励的袋。我们进行了理论研究,建立了RLBR与马尔科夫决策过程(MDPs)中的标准RL之间的关系。为了有效地探索这些袋中的奖励分布并增强策略训练,我们提出了一种基于Transformer的奖励模型,即奖励袋Transformer,它使用双向注意机制来解释每个袋中的上下文细微差异和时间依赖关系。我们的实证评估表明,随着袋长度的增加,挑战程度增加,由于信息粒度的降低导致性能下降。然而,我们的方法始终优于现有方法,在不同袋长度下表现出较小的效果下降,并在逼近原始MDP的奖励分布方面表现出色。

更新时间: 2024-05-27 15:23:31

领域: cs.LG

下载: http://arxiv.org/abs/2402.03771v2

Incremental Sequence Labeling: A Tale of Two Shifts

The incremental sequence labeling task involves continuously learning new classes over time while retaining knowledge of the previous ones. Our investigation identifies two significant semantic shifts: E2O (where the model mislabels an old entity as a non-entity) and O2E (where the model labels a non-entity or old entity as a new entity). Previous research has predominantly focused on addressing the E2O problem, neglecting the O2E issue. This negligence results in a model bias towards classifying new data samples as belonging to the new class during the learning process. To address these challenges, we propose a novel framework, Incremental Sequential Labeling without Semantic Shifts (IS3). Motivated by the identified semantic shifts (E2O and O2E), IS3 aims to mitigate catastrophic forgetting in models. As for the E2O problem, we use knowledge distillation to maintain the model's discriminative ability for old entities. Simultaneously, to tackle the O2E problem, we alleviate the model's bias towards new entities through debiased loss and optimization levels. Our experimental evaluation, conducted on three datasets with various incremental settings, demonstrates the superior performance of IS3 compared to the previous state-of-the-art method by a significant margin. The data, code, and scripts are publicly available at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm.

Updated: 2024-05-27 15:23:17

标题: 增量式序列标记:两次转变的故事

摘要: 增量序列标记任务涉及在保留先前知识的同时,随着时间不断学习新类别。我们的研究确定了两个重要的语义转变:E2O(模型将旧实体误标为非实体)和O2E(模型将非实体或旧实体标记为新实体)。先前的研究主要集中在解决E2O问题上,忽视了O2E问题。这种疏忽导致模型在学习过程中对新数据样本进行分类时存在偏见。为了解决这些挑战,我们提出了一个新颖的框架,即无语义转变的增量顺序标记(IS3)。受到确定的语义转变(E2O和O2E)的启发,IS3旨在减轻模型中的灾难性遗忘。针对E2O问题,我们使用知识蒸馏来维持模型对旧实体的区分能力。同时,为了解决O2E问题,我们通过去偏置损失和优化级别减轻模型对新实体的偏见。我们在三个具有不同增量设置的数据集上进行的实验评估表明,与先前的最先进方法相比,IS3具有显著优越的性能。数据、代码和脚本可在https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm 上公开获取。

更新时间: 2024-05-27 15:23:17

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.10447v2

On the Noise Robustness of In-Context Learning for Text Generation

Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significantly hurt the performance of in-context learning. To circumvent the issue, we propose a simple and effective approach called Local Perplexity Ranking (LPR), which replaces the "noisy" candidates with their nearest neighbors that are more likely to be clean. Our method is motivated by analyzing the perplexity deviation caused by noisy labels and decomposing perplexity into inherent perplexity and matching perplexity. Our key idea behind LPR is thus to decouple the matching perplexity by performing the ranking among the neighbors in semantic space. Our approach can prevent the selected demonstrations from including mismatched input-label pairs while preserving the effectiveness of the original selection methods. Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations.
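
A minimal Python sketch of Local Perplexity Ranking: each candidate demonstration is ranked against its semantic neighbors by perplexity, and the lowest-perplexity member wins. Here, embed and perplexity are assumed callables standing in for a sentence encoder and an LLM scoring routine.

import numpy as np

def lpr_select(candidates, pool, embed, perplexity, k=5):
    # candidates/pool: lists of (input, label) demonstration pairs.
    E_pool = np.stack([embed(x) for x, _ in pool])
    E_pool = E_pool / np.linalg.norm(E_pool, axis=1, keepdims=True)
    cleaned = []
    for x, y in candidates:
        e = embed(x)
        sims = E_pool @ (e / np.linalg.norm(e))      # semantic neighborhood
        neighbors = [pool[i] for i in np.argsort(-sims)[:k]]
        ranked = sorted(neighbors + [(x, y)], key=lambda d: perplexity(*d))
        cleaned.append(ranked[0])                    # likely-clean replacement
    return cleaned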

Updated: 2024-05-27 15:22:58

标题: 关于文本生成中上下文学习的噪声鲁棒性

摘要: 大型语言模型(LLMs)通过上下文学习(ICL)展示出对下游任务的出色性能,这在很大程度上取决于从大量注释示例中选择的演示质量。最近的研究声称在文本分类中,上下文学习对嘈杂的演示具有鲁棒性。在这项工作中,我们展示了在文本生成任务中,嘈杂的注释显著损害了上下文学习的性能。为了避免这个问题,我们提出了一种简单而有效的方法,称为局部困惑度排序(LPR),它用更可能是干净的最近邻来替换“嘈杂”的候选项。我们的方法是通过分析由嘈杂标签引起的困惑度偏差,并将困惑度分解为固有困惑度和匹配困惑度来激发的。LPR背后的关键思想是通过在语义空间中对邻居进行排名来解耦匹配困惑度。我们的方法可以防止所选演示包含不匹配的输入-标签对,同时保留原始选择方法的有效性。大量实验证明了LPR的有效性,在带有嘈杂注释的常见基准上,将EM分数提高了高达18.75。

更新时间: 2024-05-27 15:22:58

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17264v1

Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.

Updated: 2024-05-27 15:20:06

标题: 急性心肌梗死MRI的心肌分割和T2定量的深度学习同时进行

摘要: 在心脏磁共振成像(MRI)分析中,同时进行心肌分割和T2量化对于评估心肌病理情况至关重要。现有方法通常分别解决这些任务,限制了它们的协同潜力。为了解决这个问题,我们提出了SQNet,这是一个集成了Transformer和卷积神经网络(CNN)组件的双任务网络。SQNet具有一个T2-refine融合解码器用于定量分析,利用Transformer中的全局特征,以及一个带有多个局部区域监督的分割解码器,以提高准确性。一个紧密耦合模块对齐和融合CNN和Transformer分支特征,使SQNet能够专注于心肌区域。对健康对照组(HC)和急性心肌梗死患者(AMI)的评估显示,与最先进的方法相比,分割骰子分数(89.3/89.2)更高。T2量化与HC/AMI的标签值之间有很强的线性相关性(Pearson系数:0.84/0.93),表明准确的映射。放射科医师评估证实了SQNet在图像质量评分方面的优越性(分割为4.60/4.58,T2量化为4.32/4.42),超过了最先进的方法(分割为4.50/4.44,T2量化为3.59/4.37)。因此,SQNet提供了准确的同时分割和量化,增强了心脏疾病诊断,如AMI。

更新时间: 2024-05-27 15:20:06

领域: eess.IV,cs.AI

下载: http://arxiv.org/abs/2405.10570v2

Accelerating Simulation of Two-Phase Flows with Neural PDE Surrogates

Simulation is a powerful tool to better understand physical systems, but generally requires computationally expensive numerical methods. Downstream applications of such simulations can become computationally infeasible if they require many forward solves, for example in the case of inverse design with many degrees of freedom. In this work, we investigate and extend neural PDE solvers as a tool to aid in scaling simulations for two-phase flow problems, and simulations of oil expulsion from a pore specifically. We extend existing numerical methods for this problem to a more complex setting involving varying geometries of the domain to generate a challenging dataset. Further, we investigate three prominent neural PDE solver methods, namely the UNet, DRN and U-FNO, and extend them for characteristics of the oil-expulsion problem: (1) spatial conditioning on the geometry; (2) periodicity in the boundary; (3) approximate mass conservation. We scale all methods and benchmark their speed-accuracy trade-off, evaluate qualitative properties, and perform an ablation study. We find that the investigated methods can accurately model the droplet dynamics with up to three orders of magnitude speed-up, that our extensions improve performance over the baselines, and that the introduced varying geometries constitute a significantly more challenging setting over the previously considered oil expulsion problem.

Updated: 2024-05-27 15:18:12

标题: 用神经PDE代理加速两相流模拟

摘要: 模拟是了解物理系统的强大工具,但通常需要计算昂贵的数值方法。如果需要进行许多前向求解,例如在具有许多自由度的反向设计案例中,这种模拟的下游应用可能会变得计算上不可行。在这项工作中,我们研究并扩展神经PDE求解器作为一种辅助扩展两相流问题的模拟,特别是从孔隙中排油的模拟。我们将现有的数值方法扩展到涉及域的不同几何形状以生成具有挑战性的数据集。此外,我们研究了三种主要的神经PDE求解器方法,即UNet、DRN和U-FNO,并将它们扩展到油排放问题的特性上:(1)对几何形状的空间条件;(2)边界上的周期性;(3)近似质量守恒。我们对所有方法进行了扩展和基准测试,评估了速度-精度的权衡,评估了定性性质,并进行了消融研究。我们发现,所研究的方法可以准确地模拟液滴动力学,速度提高了三个数量级,我们的扩展改善了基线的性能,并且引入的不同几何形状构成了一个明显更具挑战性的设置,超过了以前考虑的油排放问题。

更新时间: 2024-05-27 15:18:12

领域: cs.LG,cs.CV,physics.flu-dyn

下载: http://arxiv.org/abs/2405.17260v1

$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modules need to be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both LLama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks.
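
A minimal Python sketch of the transfer loop: synthetic prompts are scored by the old base model plus its trained LoRA (the teacher), and a fresh LoRA on the new base is distilled to match. teacher, student, and the synthetic batches are placeholders for illustration, not the paper's API.

import torch
import torch.nn.functional as F

def transfer_lora(teacher, student, synth_batches, optimizer, temperature=2.0):
    # teacher: old base + trained LoRA (frozen); student: new base + fresh LoRA.
    teacher.eval()
    for batch in synth_batches:                      # no client data needed
        with torch.no_grad():
            t_logits = teacher(batch)
        s_logits = student(batch)
        loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                        F.softmax(t_logits / temperature, dim=-1),
                        reduction="batchmean") * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()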

Updated: 2024-05-27 15:15:08

标题: "Trans-LoRA:走向无数据的可转移参数高效微调"

摘要: 低秩适配器(LoRA)及其变体是流行的参数高效微调(PEFT)技术,可以在仅需要少量额外参数的情况下,与完整模型微调性能密切匹配。这些额外的LoRA参数是特定于被适配的基础模型的。当基础模型需要被淘汰并替换为新模型时,所有相关的LoRA模块都需要重新训练。这种重新训练需要访问用于训练原始基础模型LoRA的数据。对于商业云应用来说,这一点尤为棘手,因为LoRA模块和基础模型由可能不允许托管专有客户任务数据的服务提供商托管。为了解决这一挑战,我们提出了Trans-LoRA——一种用于在基础模型之间实现无损、几乎无数据传输LoRA的新方法。我们的方法依赖于合成数据来传输LoRA模块。利用大型语言模型,我们设计了一个合成数据生成器,来近似观察到的任务数据子集的生成过程。在生成的合成数据集上训练,将LoRA模块传输到新模型。我们展示了我们的方法在LLama和Gemma模型系列上的有效性。我们的方法在各种任务上在不同基础模型系列内部和之间,甚至在不同PEFT方法之间实现了无损(大多数得到改善)的LoRA传输。

更新时间: 2024-05-27 15:15:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17258v1

Wav-KAN: Wavelet Kolmogorov-Arnold Networks

In this paper, we introduce Wav-KAN, an innovative neural network architecture that leverages the Wavelet Kolmogorov-Arnold Networks (Wav-KAN) framework to enhance interpretability and performance. Traditional multilayer perceptrons (MLPs) and even recent advancements like Spl-KAN face challenges related to interpretability, training speed, robustness, computational efficiency, and performance. Wav-KAN addresses these limitations by incorporating wavelet functions into the Kolmogorov-Arnold network structure, enabling the network to capture both high-frequency and low-frequency components of the input data efficiently. Wavelet-based approximations employ orthogonal or semi-orthogonal bases and maintain a balance between accurately representing the underlying data structure and avoiding overfitting to the noise. While the continuous wavelet transform (CWT) has a lot of potential, we also employed the discrete wavelet transform (DWT) for multiresolution analysis, which obviated the need for recalculation of the previous steps in finding the details. Analogous to how water conforms to the shape of its container, Wav-KAN adapts to the data structure, resulting in enhanced accuracy, faster training speeds, and increased robustness compared to Spl-KAN and MLPs. Our results highlight the potential of Wav-KAN as a powerful tool for developing interpretable and high-performance neural networks, with applications spanning various fields. This work sets the stage for further exploration and implementation of Wav-KAN in frameworks such as PyTorch and TensorFlow, aiming to make wavelets in KAN as widespread as activation functions like ReLU and sigmoid in universal approximation theory (UAT). The codes to replicate the simulations are available at https://github.com/zavareh1/Wav-KAN.
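
A minimal Python sketch of one Wav-KAN-style layer: every input-output edge applies a Mexican-hat wavelet with a learnable scale and translation, and the results are summed. The choice of mother wavelet and the initialization are illustrative, not the released implementation.

import math
import torch
import torch.nn as nn

class WaveletKANLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d_out, d_in))
        self.shift = nn.Parameter(torch.zeros(d_out, d_in))
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / math.sqrt(d_in))

    def forward(self, x):                            # x: [batch, d_in]
        u = (x.unsqueeze(1) - self.shift) / self.scale
        mexican_hat = (1 - u ** 2) * torch.exp(-0.5 * u ** 2)
        return (self.weight * mexican_hat).sum(dim=-1)   # [batch, d_out]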

Updated: 2024-05-27 15:12:55

标题: Wav-KAN:小波科尔莫戈洛夫-阿诺德网络

摘要: 在这篇论文中,我们介绍了一种创新的神经网络架构Wav-KAN,它利用了Wavelet Kolmogorov-Arnold Networks(Wav-KAN)框架来增强可解释性和性能。传统的多层感知器(MLPs)甚至最近的进展如Spl-KAN都面临与解释性、训练速度、稳健性、计算效率和性能相关的挑战。Wav-KAN通过将小波函数整合到Kolmogorov-Arnold网络结构中,有效地捕获输入数据的高频和低频成分,从而解决了这些限制。基于小波的逼近使用正交或半正交基,并保持准确地表示底层数据结构并避免过拟合噪声之间的平衡。虽然连续小波变换(CWT)具有很大的潜力,但我们还采用了离散小波变换(DWT)进行多分辨率分析,这消除了在查找细节时需要重新计算先前步骤的需求。类似于水如何适应其容器的形状,Wav-KAN适应数据结构,结果是提高了准确性、更快的训练速度和比Spl-KAN和MLPs更强的稳健性。我们的结果突显了Wav-KAN作为开发可解释性和高性能神经网络的强大工具的潜力,应用领域涵盖各个领域。这项工作为在PyTorch和TensorFlow等框架中进一步探索和实施Wav-KAN奠定了基础,旨在使KAN中的小波像ReLU和Sigmoid这样的激活函数在通用逼近理论(UAT)中一样普遍。可复制模拟的代码可在https://github.com/zavareh1/Wav-KAN找到。

更新时间: 2024-05-27 15:12:55

领域: cs.LG,cs.AI,eess.SP,stat.ML

下载: http://arxiv.org/abs/2405.12832v2

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with Scaling Law. Unlike pre-training, we find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain why existing Scaling Law fails to capture this phase transition phenomenon both theoretically and empirically. To address this, we introduce the concept of "pre-learned data size" into our Rectified Scaling Law, which overcomes theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption, while other methods may provide negatively correlated selection.
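
A minimal Python sketch of fitting such a rectified curve with a "pre-learned data size" term D_l, L(D) = B / (D_l + D)^beta + E; the functional form follows the abstract's description, and the data points are made up purely to show the mechanics.

import numpy as np
from scipy.optimize import curve_fit

def rectified_law(D, B, D_l, beta, E):
    return B / (D_l + D) ** beta + E

D_obs = np.array([1e3, 3e3, 1e4, 3e4, 1e5])          # fine-tuning set sizes
L_obs = np.array([2.31, 2.28, 2.10, 1.80, 1.52])     # illustrative losses

params, _ = curve_fit(rectified_law, D_obs, L_obs,
                      p0=[10.0, 1e3, 0.3, 1.0], maxfev=20000)
predicted_full_budget_loss = rectified_law(1e6, *params)  # rank models by this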

Updated: 2024-05-27 15:11:22

标题: 选择大型语言模型进行微调的校正缩放定律

摘要: LLMs的生态系统不断增长,选择最适合微调的预训练模型在众多选择中构成了一项挑战。鉴于资源有限,微调所有模型然后进行选择是不现实的。在这项工作中,我们将这个资源受限的选择任务转化为预测微调性能,并展示其与缩放定律的自然联系。与预训练不同,我们发现微调缩放曲线不仅包括众所周知的“功率阶段”,还包括以前未观察到的“预功率阶段”。我们还解释了现有的缩放定律在理论上和实证上无法捕捉这种阶段转变现象。为了解决这个问题,我们在我们的矫正缩放定律中引入了“预学习数据大小”的概念,这克服了理论上的限制,并更好地适应实验结果。通过利用我们的定律,我们提出了一种新颖的LLM选择算法,可以在消耗资源的几百倍少的情况下选择接近最佳模型,而其他方法可能提供负相关的选择。

更新时间: 2024-05-27 15:11:22

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.02314v2

Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference

Variational inference (VI) can be cast as an optimization problem in which the variational parameters are tuned to closely align a variational distribution with the true posterior. The optimization task can be approached through vanilla gradient descent in black-box VI or natural-gradient descent in natural-gradient VI. In this work, we reframe VI as the optimization of an objective that concerns probability distributions defined over a \textit{variational parameter space}. Subsequently, we propose Wasserstein gradient descent for tackling this optimization problem. Notably, the optimization techniques, namely black-box VI and natural-gradient VI, can be reinterpreted as specific instances of the proposed Wasserstein gradient descent. To enhance the efficiency of optimization, we develop practical methods for numerically solving the discrete gradient flows. We validate the effectiveness of the proposed methods through empirical experiments on a synthetic dataset, supplemented by theoretical analyses.

Updated: 2024-05-27 15:09:48

标题: Wasserstein梯度流在变分推断的变分参数空间中的应用

摘要: 变分推断(VI)可以被视为一个优化问题,其中变分参数被调整以使变分分布与真实后验尽可能接近。优化任务可以通过黑盒VI中的普通梯度下降或自然梯度VI中的自然梯度下降来处理。在这项工作中,我们重新构建VI为一个关于定义在变分参数空间上的概率分布的目标的优化问题。随后,我们提出了Wasserstein梯度下降来解决这个优化问题。值得注意的是,优化技术,即黑盒VI和自然梯度VI,可以被重新解释为所提出的Wasserstein梯度下降的具体实例。为了增强优化的效率,我们开发了用于数值求解离散梯度流的实际方法。我们通过在一个合成数据集上进行的实证实验以及理论分析来验证所提出方法的有效性。

更新时间: 2024-05-27 15:09:48

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2310.16705v2

Gaussian Embedding of Temporal Networks

Representing the nodes of continuous-time temporal graphs in a low-dimensional latent space has wide-ranging applications, from prediction to visualization. Yet, analyzing continuous-time relational data with timestamped interactions introduces unique challenges due to its sparsity. Merely embedding nodes as trajectories in the latent space overlooks this sparsity, emphasizing the need to quantify uncertainty around the latent positions. In this paper, we propose TGNE (\textbf{T}emporal \textbf{G}aussian \textbf{N}etwork \textbf{E}mbedding), an innovative method that bridges two distinct strands of literature: the statistical analysis of networks via Latent Space Models (LSM)\cite{Hoff2002} and temporal graph machine learning. TGNE embeds nodes as piece-wise linear trajectories of Gaussian distributions in the latent space, capturing both structural information and uncertainty around the trajectories. We evaluate TGNE's effectiveness in reconstructing the original graph and modelling uncertainty. The results demonstrate that TGNE generates competitive time-varying embedding locations compared to common baselines for reconstructing unobserved edge interactions based on observed edges. Furthermore, the uncertainty estimates align with the time-varying degree distribution in the network, providing valuable insights into the temporal dynamics of the graph. To facilitate reproducibility, we provide an open-source implementation of TGNE at \url{https://github.com/aida-ugent/tgne}.
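
A minimal Python sketch of reading out a node's Gaussian at an arbitrary time from a piecewise-linear trajectory of (mean, variance) control points; interpolating the variance linearly is a simplification for illustration.

import numpy as np

def gaussian_at(t, knot_times, knot_means, knot_vars):
    # knot_times: sorted [K]; knot_means/knot_vars: [K, d] control points.
    i = np.clip(np.searchsorted(knot_times, t) - 1, 0, len(knot_times) - 2)
    w = (t - knot_times[i]) / (knot_times[i + 1] - knot_times[i])
    mean = (1 - w) * knot_means[i] + w * knot_means[i + 1]
    var = (1 - w) * knot_vars[i] + w * knot_vars[i + 1]
    return mean, var                  # latent N(mean, diag(var)) at time t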

Updated: 2024-05-27 15:07:57

标题: 时序网络的高斯嵌入

摘要: 将连续时间时序图的节点表示在低维潜在空间中具有从预测到可视化的广泛应用。然而,分析带有时间戳交互的连续时间关系数据因其稀疏性而带来独特挑战。仅将节点嵌入为潜在空间中的轨迹会忽略这种稀疏性,因此需要量化潜在位置周围的不确定性。在本文中,我们提出了TGNE(\textbf{T}emporal \textbf{G}aussian \textbf{N}etwork \textbf{E}mbedding),这是一种创新方法,桥接了两个不同的文献领域:通过潜在空间模型(LSM)进行网络统计分析\cite{Hoff2002}和时序图机器学习。TGNE将节点嵌入为潜在空间中高斯分布的分段线性轨迹,同时捕捉结构信息和轨迹周围的不确定性。我们评估了TGNE在重建原始图和建模不确定性方面的有效性。结果表明,在基于已观察边重建未观察到的边交互方面,TGNE生成的时变嵌入位置与常见基线相比具有竞争力。此外,不确定性估计与网络中时变的度分布相一致,为图的时间动态提供了有价值的洞察。为了促进可重现性,我们在\url{https://github.com/aida-ugent/tgne}提供了TGNE的开源实现。

更新时间: 2024-05-27 15:07:57

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17253v1

Assessing LLMs Suitability for Knowledge Graph Completion

Recent work has shown the capability of Large Language Models (LLMs) to solve tasks related to Knowledge Graphs, such as Knowledge Graph Completion, even in Zero- or Few-Shot paradigms. However, they are known to hallucinate answers, or to output results in a non-deterministic manner, thus leading to wrongly reasoned responses, even if they satisfy the user's demands. To highlight opportunities and challenges in knowledge graph-related tasks, we experiment with two distinguished LLMs, namely Mixtral-8x7B-Instruct-v0.1 and gpt-3.5-turbo-0125, on Knowledge Graph Completion for static knowledge graphs, using prompts constructed following the TELeR taxonomy, in Zero- and One-Shot contexts, on a Task-Oriented Dialogue system use case. When evaluated using both strict and flexible metrics, our results show that LLMs could be fit for such a task if prompts encapsulate sufficient information and relevant examples.
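
To make the setup concrete, here is a sketch of Zero- and One-Shot prompt construction for tail-entity prediction. The wording is only loosely in the spirit of the TELeR taxonomy (which grades prompts by level of detail); the phrasing, entities, and helper below are illustrative, not the paper's prompts:

```python
def kgc_prompt(head, relation, candidates, example=None):
    """Build a Zero-Shot (example=None) or One-Shot prompt for
    predicting the tail of an incomplete (head, relation, ?) triple."""
    lines = [
        "You are completing a task-oriented dialogue knowledge graph.",
        f"Given the head entity '{head}' and the relation '{relation}',",
        f"choose the correct tail entity from: {', '.join(candidates)}.",
        "Answer with the entity name only.",
    ]
    if example is not None:                      # One-Shot variant
        h, r, t = example
        lines.insert(1, f"Example: ({h}, {r}) -> {t}")
    return "\n".join(lines)

print(kgc_prompt("restaurant_1", "serves_cuisine",
                 ["italian", "thai", "mexican"],
                 example=("restaurant_2", "serves_cuisine", "thai")))
```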

Updated: 2024-05-27 15:04:50

标题: 评估LLM在知识图谱补全中的适用性

摘要: 最近的研究表明,大型语言模型(LLMs)有能力解决与知识图谱相关的任务,如知识图谱补全,即使在零次或少次样本学习范式下也是如此。然而,它们被认为会出现幻觉性答案,或以非确定性方式输出结果,因此可能会导致错误推理的回答,即使它们满足用户的需求。为了突显与知识图谱相关任务中的机遇和挑战,我们在静态知识图谱上使用TELeR分类法构建提示,在零次和一次样本学习背景下,通过实验使用两个杰出的LLMs,分别是Mixtral-8x7B-Instruct-v0.1和gpt-3.5-turbo-0125,对任务导向的对话系统应用案例进行知识图谱补全。当使用严格和灵活的度量方式进行评估时,我们的结果表明,如果提示包含足够的信息和相关示例,LLMs可能适用于这样的任务。

更新时间: 2024-05-27 15:04:50

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17249v1

Transformer In-Context Learning for Categorical Data

Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is drawn from a categorical distribution that depends on covariates $x_i\in\mathbb{R}^d$. Contextual outcomes in the $m$th set of contextual data, $\textsf{C}_m$, are modeled in terms of latent function $f_m(x)\in\textsf{F}$, where $\textsf{F}$ is a functional class with $(C-1)$-dimensional vector output. The probability of observing class $c\in\{0,\dots,C-1\}$ is modeled in terms of the output components of $f_m(x)$ via the softmax. The Transformer parameters may be trained with $M$ contextual examples, $\{\textsf{C}_m\}_{m=1,M}$, and the trained model is then applied to new contextual data $\textsf{C}_{M+1}$ for new $f_{M+1}(x)\in\textsf{F}$. The goal is for the Transformer to constitute the probability of each category $c\in\{0,\dots,C-1\}$ for a new query $x_{N_{M+1}+1}$. We assume each component of $f_m(x)$ resides in a reproducing kernel Hilbert space (RKHS), specifying $\textsf{F}$. Analysis and an extensive set of experiments suggest that on its forward pass the Transformer (with attention defined by the RKHS kernel) implements a form of gradient descent of the underlying function, connected to the latent vector function associated with the softmax. We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
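
A toy sketch of the mechanism being analyzed: a single nonlinear-attention pass in which an RBF (RKHS) kernel weights the context labels and a softmax turns the aggregated scores into class probabilities for the query. Shapes, the bandwidth, and the one-hot aggregation are illustrative simplifications:

```python
import numpy as np

def kernel_attention_predict(X_ctx, c_ctx, x_query, C, bandwidth=1.0):
    """Kernel-weighted aggregation of context labels, then softmax."""
    d2 = ((X_ctx - x_query) ** 2).sum(axis=1)
    a = np.exp(-d2 / (2 * bandwidth ** 2))  # attention from the RKHS kernel
    onehot = np.eye(C)[c_ctx]               # context labels as one-hot rows
    logits = a @ onehot                     # unnormalized class scores
    p = np.exp(logits - logits.max())
    return p / p.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
c = rng.integers(0, 3, size=32)
print(kernel_attention_predict(X, c, rng.normal(size=4), C=3))
```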

Updated: 2024-05-27 15:03:21

标题: Transformer上下文学习在分类数据中的应用

摘要: 最近的研究试图通过基于函数型数据的上下文学习来理解Transformer。我们延伸了这一研究方向,目标是更接近语言模型,考虑分类型输出、非线性基础模型和非线性注意力。上下文数据的形式为$\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$,其中每个$c_i\in\{0,\dots,C-1\}$都是从依赖于协变量$x_i\in\mathbb{R}^d$的分类分布中抽取的。第$m$组上下文数据$\textsf{C}_m$中的上下文输出用潜在函数$f_m(x)\in\textsf{F}$来建模,其中$\textsf{F}$是一个输出为$(C-1)$维向量的函数类。观察到类别$c\in\{0,\dots,C-1\}$的概率通过softmax由$f_m(x)$的输出分量建模。Transformer参数可以用$M$个上下文示例$\{\textsf{C}_m\}_{m=1,M}$进行训练,然后将训练好的模型应用于新的上下文数据$\textsf{C}_{M+1}$,对应新的$f_{M+1}(x)\in\textsf{F}$。目标是让Transformer给出新查询$x_{N_{M+1}+1}$属于每个类别$c\in\{0,\dots,C-1\}$的概率。我们假设$f_m(x)$的每个分量都位于一个再生核希尔伯特空间(RKHS)中,由此指定$\textsf{F}$。分析和大量实验表明,在前向传递中,(注意力由RKHS核定义的)Transformer实现了对底层函数的一种梯度下降,与softmax所关联的潜在向量函数相连。我们使用ImageNet数据集展示了据信是这种少样本学习方法的首个真实世界演示。

更新时间: 2024-05-27 15:03:21

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.17248v1

An Introduction to Vision-Language Modeling

Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.

Updated: 2024-05-27 15:01:23

标题: 一个关于视觉-语言建模的介绍

摘要: 随着大型语言模型(LLMs)近期的流行,已经有多次尝试将它们扩展到视觉领域。从能引导我们穿越陌生环境的视觉助手,到仅凭高层文本描述就能生成图像的生成模型,视觉-语言模型(VLM)的应用将深刻影响我们与技术的关系。然而,要提高这些模型的可靠性,仍有许多挑战需要解决。语言是离散的,而视觉则在一个维度高得多的空间中演化,其中的概念并不总是容易离散化。为了更好地理解将视觉映射到语言背后的机制,我们撰写了这篇VLM入门介绍,希望能帮助任何想进入该领域的人。首先,我们介绍VLM是什么、它们如何工作以及如何训练它们。然后,我们介绍并讨论评估VLM的方法。虽然这项工作主要集中在将图像映射到语言,但我们也讨论了将VLM扩展到视频。

更新时间: 2024-05-27 15:01:23

领域: cs.LG

下载: http://arxiv.org/abs/2405.17247v1

Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference

Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.

Updated: 2024-05-27 15:01:04

标题: Galaxy:一种用于原位Transformer推断的资源高效协作边缘人工智能系统

摘要: 基于Transformer的模型已经在边缘端解锁了大量强大的智能应用,例如智能家居中的语音助手。传统的部署方法将推理工作负载卸载到远程云服务器,这会给骨干网络带来巨大压力,同时也会引发用户的隐私担忧。为了解决这个问题,原位(in-situ)推理最近被视为实现边缘智能的途径,但它仍然面临密集工作负载与有限设备端计算资源之间冲突带来的重大挑战。在本文中,我们基于如下观察:许多边缘环境通常包含一组伴随的、拥有空闲资源的可信边缘设备,据此提出了Galaxy,一个协作边缘AI系统,它打破了异构边缘设备之间的资源壁垒,实现高效的Transformer推理加速。Galaxy引入了一种新颖的混合模型并行方式来编排协作推理,并配合异构感知的并行规划以充分挖掘资源潜力。此外,Galaxy设计了基于瓦片的细粒度通信与计算重叠,以减轻带宽受限的边缘环境下张量同步对推理延迟的影响。基于原型实现的广泛评估表明,Galaxy在各种边缘环境设置下明显优于最先进的方法,实现了最多2.5倍的端到端延迟降低。

更新时间: 2024-05-27 15:01:04

领域: cs.DC,cs.AI,cs.LG,cs.NI

下载: http://arxiv.org/abs/2405.17245v1

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page https://sites.google.com/view/surprise-adaptive-agents
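
A stripped-down sketch of the adaptive-objective idea: a two-armed UCB bandit that picks between entropy-minimizing and entropy-maximizing intrinsic rewards each round. The feedback used here is a random placeholder standing in for the paper's entropy-control signal:

```python
import math, random

arms = ["minimize_entropy", "maximize_entropy"]
counts, values = [0, 0], [0.0, 0.0]

def select_arm(t):
    for a in range(2):                     # play each arm once first
        if counts[a] == 0:
            return a
    return max(range(2), key=lambda a: values[a]
               + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 501):
    a = select_arm(t)
    # Placeholder: measured ability to control environment entropy
    # under objective `a` (the paper defines this feedback signal).
    feedback = random.gauss(0.6 if a == 0 else 0.4, 0.1)
    counts[a] += 1
    values[a] += (feedback - values[a]) / counts[a]

print(counts, [round(v, 3) for v in values])
```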

Updated: 2024-05-27 14:58:24

标题: 用于无监督强化学习的惊奇自适应内在动机

摘要: 熵最小化和熵最大化(好奇心)目标对于无监督强化学习(RL)在不同环境中已被证明是有效的,取决于环境的自然熵水平。然而,单独使用任何一种方法都不能使代理人在各种环境中始终学习到智能行为。为了找到一种基于熵的方法,可以在任何环境中促进新兴行为,我们提出了一种代理人,可以根据熵条件在线调整其目标,将选择框架化为多臂老虎机问题。我们为老虎机设计了一种新颖的内在反馈信号,捕捉了代理人在环境中控制熵的能力。我们证明这样的代理人可以学会控制熵,并在高熵和低熵制度中展示新兴行为,并且可以在基准任务中学会熟练的行为。训练代理人的视频和总结的发现可以在我们的项目页面上找到:https://sites.google.com/view/surprise-adaptive-agents.

更新时间: 2024-05-27 14:58:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17243v1

Stochastic Two Points Method for Deep Model Zeroth-order Optimization

Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. The zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the model. This paper introduces an efficient Stochastic Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under the general and relaxed smoothness assumptions, and the derived results help understand and inherently connect the two popular types of zeroth-order methods, basic random search and stochastic three-point method. The theoretical properties also shed light on a Variant of S2P (VS2P), through exploiting our new convergence properties that better represent the dynamics of deep models in training. Our comprehensive empirical results show that VS2P is highly effective in optimizing objectives for deep models. It outperforms or achieves competitive performance compared to standard methods across various model types and scales.
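
For reference, the classical stochastic two-point primitive that S2P builds on: probe a random direction with two forward passes and step along the estimated directional gradient. The paper's contribution lies in the analysis and step-size choices on top of this; the smoothing radius and learning rate below are illustrative:

```python
import numpy as np

def two_point_step(theta, loss_fn, mu=1e-3, lr=1e-2,
                   rng=np.random.default_rng(0)):
    """One zeroth-order update from two forward evaluations."""
    u = rng.normal(size=theta.shape)
    u /= np.linalg.norm(u)
    g_hat = (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2 * mu)
    return theta - lr * g_hat * u

theta = np.ones(10)
quadratic = lambda t: float(np.sum(t ** 2))
for _ in range(5000):
    theta = two_point_step(theta, quadratic)
print(quadratic(theta))
```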

Updated: 2024-05-27 14:56:01

标题: 随机两点方法用于深度模型零阶优化

摘要: 大型基础模型(如大型语言模型)在各种应用场景中表现出色。由于硬件预算限制或无法使用反向传播,构建或完全微调这类大型模型通常是不可行的。零阶方法为应对这一挑战提供了有前途的方向:只需前向传递即可更新模型。本文在无梯度(gradient-free)机制下提出了一种高效的随机两点(S2P)方法。我们在一般和放宽的光滑性假设下给出了S2P的理论收敛性质,这些结果有助于理解并内在地联系两类流行的零阶方法——基本随机搜索和随机三点法。这些理论性质还启发了S2P的一个变体(VS2P),它利用我们新的、更能刻画深度模型训练动态的收敛性质。我们全面的实证结果表明,VS2P在优化深度模型目标方面非常有效:在各种模型类型和规模上,它优于标准方法或取得与之相当的性能。

更新时间: 2024-05-27 14:56:01

领域: cs.LG

下载: http://arxiv.org/abs/2402.01621v3

LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Software is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice. While large language models (or LLMs) have shown impressive code generation capabilities, they cannot do complex reasoning over code to detect such vulnerabilities, especially because this task requires whole-repository analysis. In this work, we propose IRIS, the first approach that systematically combines LLMs with static analysis to perform whole-repository reasoning to detect security vulnerabilities. We curate a new dataset, CWE-Bench-Java, comprising 120 manually validated security vulnerabilities in real-world Java projects. These projects are complex, with an average of 300,000 lines of code and a maximum of up to 7 million. Out of 120 vulnerabilities in CWE-Bench-Java, IRIS detects 69 using GPT-4, while the state-of-the-art static analysis tool only detects 27. Further, IRIS also significantly reduces the number of false alarms (by more than 80% in the best case).

Updated: 2024-05-27 14:53:35

标题: LLM辅助的静态分析用于检测安全漏洞

摘要: 软件容易存在安全漏洞。检测这些漏洞的程序分析工具在实践中的效果有限。虽然大型语言模型(或LLMs)展示了令人印象深刻的代码生成能力,但它们无法进行对代码的复杂推理来检测这些漏洞,特别是因为这个任务需要对整个存储库进行分析。在这项工作中,我们提出了IRIS,这是第一个系统地将LLMs与静态分析相结合来进行整个存储库的推理以检测安全漏洞的方法。我们整理了一个新的数据集,名为CWE-Bench-Java,其中包含了120个在现实世界Java项目中手动验证的安全漏洞。这些项目很复杂,平均代码行数为30万行,最多可达700万行。在CWE-Bench-Java的120个漏洞中,IRIS使用GPT-4检测到了69个,而最先进的静态分析工具只能检测到27个。此外,IRIS还显著减少了误报的数量(在最好情况下超过80%)。

更新时间: 2024-05-27 14:53:35

领域: cs.CR,cs.PL,cs.SE

下载: http://arxiv.org/abs/2405.17238v1

Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models

Incremental Learning (IL) has been a long-standing problem in both vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress in various NLP downstream tasks, utilizing PLMs as backbones has become a common practice in recent research of IL in NLP. Most assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods and requires considerably less trainable parameters and training time. These findings urge us to revisit the IL with PLMs and encourage future studies to have a fundamental understanding of the catastrophic forgetting in PLMs. The data, code and scripts are publicly available at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm.

Updated: 2024-05-27 14:53:32

标题: 学习还是回忆?重新审视预训练语言模型的渐进学习

摘要: 增量学习(IL)一直是视觉和自然语言处理(NLP)社区中长期存在的问题。近年来,随着预训练语言模型(PLMs)在各种NLP下游任务中取得显著进展,以PLMs为骨干已成为近期NLP增量学习研究的常见做法。大多数工作假设灾难性遗忘是实现优越IL性能的最大障碍,并提出各种技术来克服这个问题。然而,我们发现这一假设是有问题的。具体来说,我们在两种最流行的IL设置(类增量和任务增量)下,重新审视了四个分类任务(文本分类、意图分类、关系抽取和命名实体识别)上的20多种方法,发现其中大多数严重低估了PLMs固有的抗遗忘能力。基于这一观察,我们提出了一种简单得令人意外的方法SEQ*,用于基于PLMs的增量学习。结果表明,与最先进(SOTA)的IL方法相比,SEQ*具有相当或更优的性能,且所需的可训练参数和训练时间都少得多。这些发现促使我们重新审视基于PLMs的增量学习,并鼓励未来研究对PLMs中的灾难性遗忘建立根本性的理解。数据、代码和脚本已公开于https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm。

更新时间: 2024-05-27 14:53:32

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2312.07887v4

Benchmarking General Purpose In-Context Learning

In-context learning (ICL) capability is becoming increasingly appealing for building general intelligence. Taking this concept one step further, we draw a parallel to humans and many animals, who primarily inherit learning capabilities but refine their memory and acquire diverse skills and knowledge through extensive lifelong experiences. This parallel inspires our approach to general purpose in-context learning (GPICL). This paper introduces two lightweight but insightful benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark encompasses a wide range of diverse tasks characterized by generation and interaction, minimal transferable knowledge, and long-term dependency. These features present significant challenges for models that primarily rely on context or interactions to enhance their proficiency. We hope that these benchmarks will not only advance research in GPICL but also contribute significantly to the broader field of general intelligence.

Updated: 2024-05-27 14:50:42

标题: 通用上下文学习的基准测试

摘要: 上下文学习(ICL)能力对于构建通用智能正变得越来越有吸引力。将这一概念更进一步,我们将其与人类和许多动物相类比:它们主要继承学习能力,再通过长期的生活经验完善记忆、获得各种技能和知识。这种类比启发了我们的通用上下文学习(GPICL)方法。本文介绍了两个轻量级但富有洞察力的基准,专门用于训练和评估GPICL功能。每个基准都涵盖广泛多样的任务,其特征是生成与交互、极少的可迁移知识以及长期依赖。这些特点对主要依赖上下文或交互来提高熟练度的模型提出了重大挑战。我们希望这些基准不仅能推动GPICL的研究,还能为更广泛的通用智能领域做出重大贡献。

更新时间: 2024-05-27 14:50:42

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17234v1

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

Parameter quantization for Large Language Models (LLMs) has attracted increasing attentions recently in reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original float point precision, in trade off of boosted model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve the state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code will be released soon.
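
A minimal sketch of the column-level K-means idea: quantize each column of a weight matrix to its own set of 2**bits centroids. The paper's adaptive bit-width search and outlier reservation are omitted, and the 3-bit setting is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def columnwise_kmeans_quantize(W, bits=3):
    """Replace each entry by its column's nearest K-means centroid."""
    W_q = np.empty_like(W)
    for j in range(W.shape[1]):
        col = W[:, j].reshape(-1, 1)
        km = KMeans(n_clusters=2 ** bits, n_init=4).fit(col)
        W_q[:, j] = km.cluster_centers_[km.labels_, 0]
    return W_q

W = np.random.randn(256, 8).astype(np.float32)
W_q = columnwise_kmeans_quantize(W, bits=3)
print("mean abs error:", float(np.abs(W - W_q).mean()))
```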

Updated: 2024-05-27 14:49:39

标题: CLAQ:推动LLMs的低位后训练量化的极限

摘要: 最近,对大型语言模型(LLMs)进行参数量化在减少内存成本和提高计算效率方面引起了越来越多的关注。早期的方法已被广泛采用。然而,现有的方法在低位(如2到3位)情况下表现不佳。本文介绍了一种新颖有效的基于列级自适应权重量化(CLAQ)框架,引入了三种不同的自适应策略来对LLM进行量化。首先,提出了基于K-Means聚类的算法,允许动态生成参数矩阵每列的量化中心。其次,设计了一种基于异常值引导的自适应精度搜索策略,可以动态为不同列分配不同的位宽。最后,开发了一种动态异常值保留方案,保留一些参数在其原始浮点精度中,以换取提升模型性能。对包括LLaMA-1、LLaMA-2和Yi在内的各种主流开源LLMs进行的实验表明,我们的方法在不同位设置下取得了最先进的结果,特别是在极低位情况下。代码即将发布。

更新时间: 2024-05-27 14:49:39

领域: cs.LG

下载: http://arxiv.org/abs/2405.17233v1

Towards Stability of Parameter-free Optimization

Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, \textsc{AdamG} (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying \textsc{AdamG} is our golden step size derived for the AdaGrad-Norm algorithm, which is expected to help AdaGrad-Norm preserve the tuning-free convergence and approximate the optimal step size in expectation w.r.t. various optimization scenarios. To better evaluate tuning-free performance, we propose a novel evaluation criterion, \textit{reliability}, to comprehensively assess the efficacy of parameter-free optimizers in addition to classical performance criteria. Empirical results demonstrate that compared with other parameter-free baselines, \textsc{AdamG} achieves superior performance, which is consistently on par with Adam using a manually tuned learning rate across various optimization tasks.
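
For context, the AdaGrad-Norm update that the golden step size is derived for: a single scalar accumulator of squared gradient norms scales a fixed base step. The golden step size itself is the paper's contribution and is not reproduced here; this sketch shows only the base algorithm:

```python
import numpy as np

def adagrad_norm(grad_fn, theta, eta=1.0, b0=1e-8, steps=1000):
    """theta <- theta - eta / sqrt(b0^2 + sum_t ||g_t||^2) * g_t."""
    acc = b0 ** 2
    for _ in range(steps):
        g = grad_fn(theta)
        acc += float(np.dot(g, g))
        theta = theta - eta / np.sqrt(acc) * g
    return theta

quad_grad = lambda t: 2.0 * t              # gradient of ||t||^2
print(adagrad_norm(quad_grad, np.ones(5)))
```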

Updated: 2024-05-27 14:46:21

标题: 朝向无参数优化的稳定性

摘要: 超参数调整,尤其是在自适应梯度训练方法中选择合适的学习率,仍然是一个挑战。为了解决这个挑战,在本文中,我们提出了一种新颖的无参数优化器\textsc{AdamG}(具有黄金步长的Adam),旨在自动适应各种优化问题,无需手动调整。 \textsc{AdamG}的核心技术是我们为AdaGrad-Norm算法推导出的黄金步长,预期将有助于AdaGrad-Norm保持无调整收敛性并在期望中逼近各种优化情景下的最佳步长。为了更好地评估无调整性能,我们提出了一个新的评估标准,\textit{可靠性},以全面评估无参数优化器的有效性,除了传统的性能标准。实证结果表明,与其他无参数基线相比,\textsc{AdamG}取得了卓越的性能,始终与手动调整学习率的Adam在各种优化任务中表现一致。

更新时间: 2024-05-27 14:46:21

领域: cs.LG

下载: http://arxiv.org/abs/2405.04376v3

A Retrospective of the Tutorial on Opportunities and Challenges of Online Deep Learning

Machine learning algorithms have become indispensable in today's world. They support and accelerate the way we make decisions based on the data at hand. This acceleration means that data structures that were valid at one moment could no longer be valid in the future. With these changing data structures, it is necessary to adapt machine learning (ML) systems incrementally to the new data. This is done with the use of online learning or continuous ML technologies. While deep learning technologies have shown exceptional performance on predefined datasets, they have not been widely applied to online, streaming, and continuous learning. In this retrospective of our tutorial titled Opportunities and Challenges of Online Deep Learning held at ECML PKDD 2023, we provide a brief overview of the opportunities but also the potential pitfalls for the application of neural networks in online learning environments using the frameworks River and Deep-River.
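
A minimal example of the incremental-learning loop the tutorial revolves around, using River's one-observation-at-a-time API (prequential evaluation: predict first, then learn). This assumes a recent River release; Deep-River exposes PyTorch models behind a similar interface:

```python
from river import datasets, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

for x, y in datasets.Phishing():      # a built-in binary-class stream
    y_pred = model.predict_one(x)     # test on the incoming example...
    metric.update(y, y_pred)
    model.learn_one(x, y)             # ...then update the model with it

print(metric)
```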

Updated: 2024-05-27 14:40:03

标题: 一篇关于在线深度学习机会与挑战的教程回顾

摘要: 机器学习算法在当今世界变得不可或缺。它们支持并加速我们基于手头数据做出决策的方式。这种加速意味着一个时刻有效的数据结构在未来可能不再有效。随着这些数据结构的变化,有必要逐步调整机器学习(ML)系统以适应新数据。这是通过使用在线学习或连续ML技术来实现的。虽然深度学习技术在预定义数据集上表现出色,但它们并未广泛应用于在线、流式和连续学习。在我们于ECML PKDD 2023举办的题为“在线深度学习的机遇与挑战”的教程的回顾中,我们简要概述了在使用River和Deep-River框架的在线学习环境中应用神经网络的机会,但也提到了潜在的风险。

更新时间: 2024-05-27 14:40:03

领域: cs.LG

下载: http://arxiv.org/abs/2405.17222v1

Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

Algorithmic decisions and recommendations are used in many high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.

Updated: 2024-05-27 14:39:43

标题: 贝叶斯安全策略学习与概率约束优化:应用于越战期间的军事安全评估

摘要: 算法决策和建议在许多高风险决策环境中被使用,如刑事司法、医学和公共政策。我们调查了在越南战争期间使用的安全评估算法是否有可能在1969年底引入后改进。这一实证应用提出了在高风险算法决策中经常出现的几个方法论挑战。首先,在实施新算法之前,必须对产生比现有算法更糟糕结果的风险进行表征和控制。其次,现有算法是确定性的,学习新算法需要透明的外推。第三,现有算法涉及难以优化的离散决策表。 为了解决这些挑战,我们引入了平均条件风险(ACRisk),首先量化新算法政策导致个体单位子群更糟糕结果的风险,然后将其平均化到子群分布上。我们还提出了一个贝叶斯政策学习框架,该框架在控制后验期望ACRisk的同时最大化后验期望值。该框架将异质处理效应的估计与政策优化分开,实现了对效应的灵活估计和对复杂政策类的优化。我们将由此产生的机会约束优化问题描述为受约束的线性规划问题。我们的分析显示,与越南战争期间使用的实际算法相比,学习到的算法将大多数地区评估为更安全,并强调经济和政治因素而不是军事因素。

更新时间: 2024-05-27 14:39:43

领域: cs.LG,stat.AP

下载: http://arxiv.org/abs/2307.08840v2

Generative Plant Growth Simulation from Sequence-Informed Environmental Conditions

A plant growth simulation can be characterized as a reconstructed visual representation of a plant or plant system. The phenotypic characteristics and plant structures are controlled by the scene environment and other contextual attributes. Considering the temporal dependencies and compounding effects of various factors on growth trajectories, we formulate a probabilistic approach to the simulation task by solving a frame synthesis and pattern recognition problem. We introduce a sequence-informed plant growth simulation framework (SI-PGS) that employs a conditional generative model to implicitly learn a distribution of possible plant representations within a dynamic scene from a fusion of low dimensional temporal sensor and context data. Methods such as controlled latent sampling and recurrent output connections are used to improve coherence in the plant structures between frames of predictions. In this work, we demonstrate that SI-PGS is able to capture temporal dependencies and continuously generate realistic frames of plant growth.

Updated: 2024-05-27 14:35:49

标题: 基于序列信息的环境条件生成植物生长模拟

摘要: 一种植物生长模拟可以被描述为对植物或植物系统的重建视觉表示。表型特征和植物结构受场景环境和其他情境属性的控制。考虑到各种因素对生长轨迹的时间依赖性和复合效应,我们通过解决帧合成和模式识别问题,提出了一种概率方法来进行模拟任务。我们引入了一个序列感知的植物生长模拟框架(SI-PGS),该框架利用条件生成模型从低维度时间传感器和环境数据的融合中隐式学习动态场景中可能的植物表示的分布。采用控制潜在抽样和经常性输出连接等方法来改善预测帧之间的植物结构的连贯性。在这项工作中,我们展示了SI-PGS能够捕捉时间依赖性并持续生成逼真的植物生长帧。

更新时间: 2024-05-27 14:35:49

领域: cs.CV,cs.AI,q-bio.QM

下载: http://arxiv.org/abs/2405.14796v2

Autoformalizing Euclidean Geometry

Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs). One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid's Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.

Updated: 2024-05-27 14:35:10

标题: 自动形式化欧几里得几何学

摘要: 自动形式化涉及将非正式数学自动翻译为机器可验证的形式定理和证明。欧几里德几何为研究自动形式化提供了一个有趣且可控的领域。在本文中,我们引入了一个神经符号框架用于自动形式化欧几里德几何,该框架结合了领域知识、SMT求解器和大型语言模型(LLM)。欧几里德几何中的一个挑战是非正式证明依赖于图表,导致文本中存在难以形式化的空白。为了解决这个问题,我们使用定理证明器自动填充这些图表信息,使得LLM只需自动形式化明确的文本步骤,从而使模型更容易处理。我们还为自动形式化的定理陈述提供自动语义评估。我们构建了LeanEuclid,一个由欧几里德《原本》和UniGeo数据集问题组成的自动形式化基准,并在Lean证明助手中进行了形式化。使用GPT-4和GPT-4V对自动形式化几何问题的能力和限制进行了实验。数据和代码可在https://github.com/loganrjmurphy/LeanEuclid 上找到。

更新时间: 2024-05-27 14:35:10

领域: cs.LG,cs.AI,cs.LO,stat.ML

下载: http://arxiv.org/abs/2405.17216v1

Spectral-Refiner: Fine-Tuning of Accurate Spatiotemporal Neural Operator for Turbulent Flows

Recent advancements in operator-type neural networks have shown promising results in approximating the solutions of spatiotemporal Partial Differential Equations (PDEs). However, these neural networks often entail considerable training expenses, and may not always achieve the desired accuracy required in many scientific and engineering disciplines. In this paper, we propose a new Spatiotemporal Fourier Neural Operator (SFNO) that learns maps between Bochner spaces, and a new learning framework to address these issues. This new paradigm leverages wisdom from traditional numerical PDE theory and techniques to refine the pipeline of commonly adopted end-to-end neural operator training and evaluations. Specifically, in the learning problems for the turbulent flow modeling by the Navier-Stokes Equations (NSE), the proposed architecture initiates the training with a few epochs for SFNO, concluding with the freezing of most model parameters. Then, the last linear spectral convolution layer is fine-tuned without the frequency truncation. The optimization uses a negative Sobolev norm for the first time as the loss in operator learning, defined through a reliable functional-type \emph{a posteriori} error estimator whose evaluation is almost exact thanks to the Parseval identity. This design allows the neural operators to effectively tackle low-frequency errors while the relief of the de-aliasing filter addresses high-frequency errors. Numerical experiments on commonly used benchmarks for the 2D NSE demonstrate significant improvements in both computational efficiency and accuracy, compared to end-to-end evaluation and traditional numerical PDE solvers.
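
A sketch of a negative Sobolev loss evaluated in Fourier space, where the Parseval identity makes the $H^{-s}$ norm a simple reweighting of the residual's spectrum. The grid setup and the order $s$ are illustrative, and this is only the standard Fourier-multiplier construction, not the paper's full a posteriori estimator:

```python
import torch

def neg_sobolev_loss(residual, s=1.0):
    """||r||_{H^{-s}}^2 = sum_k (1 + |k|^2)^(-s) |r_hat(k)|^2."""
    n = residual.shape[-1]
    r_hat = torch.fft.fft2(residual, norm="ortho")
    k = torch.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    kx, ky = torch.meshgrid(k, k, indexing="ij")
    weight = (1.0 + kx ** 2 + ky ** 2) ** (-s)   # damps high frequencies
    return (weight * r_hat.abs() ** 2).sum()

r = torch.randn(64, 64)                          # stand-in PDE residual
print(neg_sobolev_loss(r, s=1.0))
```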

Updated: 2024-05-27 14:33:06

标题: Spectral-Refiner:面向湍流的精确时空神经算子微调

摘要: 近来算子型神经网络的进展显示出在逼近时空偏微分方程(PDEs)解方面的前景。然而,这些神经网络往往需要相当高的训练成本,并且并不总能达到许多科学和工程学科所需的精度。在本文中,我们提出了一种新的时空傅里叶神经算子(SFNO),用于学习Bochner空间之间的映射,并提出了一个新的学习框架来解决这些问题。这一新范式借鉴传统数值PDE理论与技术的智慧,改进了通常采用的端到端神经算子训练与评估流程。具体而言,在用Navier-Stokes方程(NSE)建模湍流的学习问题中,所提出的架构先对SFNO训练少量轮次(epoch),随后冻结大部分模型参数;然后在不做频率截断的情况下微调最后一个线性谱卷积层。该优化首次将负Sobolev范数用作算子学习中的损失,它通过一个可靠的函数型\emph{后验}误差估计器来定义,得益于Parseval恒等式,其计算几乎是精确的。这一设计使神经算子能够有效处理低频误差,而解除去混叠滤波器则处理高频误差。在2D NSE常用基准上的数值实验表明,与端到端评估和传统数值PDE求解器相比,该方法在计算效率和准确性上均有显著提升。

更新时间: 2024-05-27 14:33:06

领域: cs.LG,cs.NA,math.NA,physics.flu-dyn,65M70 (Primary), 35Q30, 76M22, 65M50, 68T07 (Secondary)

下载: http://arxiv.org/abs/2405.17211v1

Language-guided Skill Learning with Temporal Variational Inference

We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment.

Updated: 2024-05-27 14:31:38

标题: 基于时间变分推断的语言引导技能学习

摘要: 我们提出了一种从专家演示中发现技能的算法。该算法首先利用大型语言模型(LLMs)提出轨迹的初始分割。随后,一个层次化变分推断框架将LLM生成的分割信息融入其中,通过合并轨迹段来发现可重复利用的技能。为了进一步控制压缩和可重用性之间的权衡,我们引入了一个基于最小描述长度原则的新辅助目标,帮助指导这个技能发现过程。我们的结果表明,使用我们的方法装备的代理能够发现有助于加速学习的技能,并在BabyAI(一个网格世界导航环境)以及ALFRED(一个家庭模拟环境)的新长期任务上表现优于基线技能学习方法。

更新时间: 2024-05-27 14:31:38

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.16354v2

MAML MOT: Multiple Object Tracking based on Meta-Learning

With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.
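
Since the tracking pipeline itself is beyond a short snippet, here is a generic sketch of the MAML inner/outer loop that "meta-learning-based training" refers to, shown on a toy sine-regression task with per-task adaptation on a support set and meta-updates from a query set:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 40), torch.nn.ReLU(),
                          torch.nn.Linear(40, 1))
outer_opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def forward_with(x, params):            # functional forward pass
    h = torch.relu(x @ params[0].t() + params[1])
    return h @ params[2].t() + params[3]

def sample_task():
    amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
    xs = torch.rand(20, 1) * 10 - 5
    return xs, amp * torch.sin(xs + phase)

for it in range(1000):
    params = list(net.parameters())
    x, y = sample_task()
    # Inner step: adapt on the support half of the task.
    loss = torch.nn.functional.mse_loss(forward_with(x[:10], params), y[:10])
    grads = torch.autograd.grad(loss, params, create_graph=True)
    fast = [p - 0.01 * g for p, g in zip(params, grads)]
    # Outer step: evaluate adapted weights on the query half.
    meta = torch.nn.functional.mse_loss(forward_with(x[10:], fast), y[10:])
    outer_opt.zero_grad(); meta.backward(); outer_opt.step()
```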

Updated: 2024-05-27 14:30:44

标题: MAML MOT:基于元学习的多目标跟踪

摘要: 随着视频分析技术的进步,涉及行人的复杂场景中的多目标跟踪(MOT)问题变得越来越重要。这一挑战主要涉及两个关键任务:行人检测和重新识别。近年来在行人检测任务中取得了显著进展,但提高重新识别任务的有效性仍然是一个持续的挑战。这一困难源于多目标跟踪数据集中大量行人样本和个体实例样本的稀缺性。受最近元学习技术的快速进展的启发,我们引入了一种基于元学习的多目标跟踪训练方法MAML MOT。该方法利用元学习的快速学习能力来解决行人重新识别任务中样本稀缺性的问题,旨在提高模型的泛化性能和鲁棒性。实验结果表明,所提出的方法在MOT挑战的主流数据集上实现了高精度,为行人多目标跟踪领域的研究提供了新的视角和解决方案。

更新时间: 2024-05-27 14:30:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.07272v2

PoCo: Policy Composition from and for Heterogeneous Robot Learning

Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile, and proprioceptive information, and collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See https://liruiw.github.io/policycomp for more details .
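
A sketch of the compositional mechanism such approaches rely on: at each denoising step, the noise predictions of several diffusion policies are combined with scalar weights. The models, weights, and simplified update below are placeholders, not PoCo's exact formulation:

```python
import torch

def composed_denoise_step(models, weights, a_t, t, alpha=0.95, sigma=0.1):
    """Reverse-diffusion step with a weighted sum of noise predictions."""
    eps = sum(w * m(a_t, t) for m, w in zip(models, weights))
    mean = (a_t - sigma * eps) / alpha      # simplified posterior mean
    return mean + sigma * torch.randn_like(a_t)

# Placeholder "policies": each maps (action, t) -> predicted noise.
models = [lambda a, t: 0.5 * a, lambda a, t: 0.2 * a]
a = torch.randn(4, 7)                       # batch of 7-DoF action vectors
for t in reversed(range(10)):
    a = composed_denoise_step(models, [0.6, 0.4], a, t)
print(a.shape)
```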

Updated: 2024-05-27 14:29:57

标题: PoCo:来自和针对异构机器人学习的策略组合

摘要: 从异构数据中训练通用机器人策略以用于不同任务是一个重要的挑战。现有的机器人数据集在颜色、深度、触觉和本体感等不同模态上存在差异,并且在模拟、真实机器人和人类视频等不同领域中收集。目前的方法通常会收集并汇总来自一个领域的所有数据,以训练一个单一策略来处理任务和领域的异质性,这是代价高昂且困难的。在这项工作中,我们提出了一种灵活的方法,称为策略组合,通过组合用扩散模型表示的不同数据分布,结合这些多样的模态和领域信息,以学习场景级别和任务级别的通用操纵技能。我们的方法可以使用任务级别的组合进行多任务操纵,并可以与分析成本函数组合,以在推理时调整策略行为。我们在模拟、人类和真实机器人数据上训练我们的方法,并在工具使用任务中进行评估。组合策略在不同场景和任务下表现出稳健和灵巧的性能,并在模拟和真实世界实验中优于来自单一数据源的基线。更多详情请参见https://liruiw.github.io/policycomp。

更新时间: 2024-05-27 14:29:57

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2402.02511v2

Federated Neuro-Symbolic Learning

Neuro-symbolic learning (NSL) models complex symbolic rule patterns into latent variable distributions by neural networks, which reduces rule search space and generates unseen rules to improve downstream task performance. Centralized NSL learning involves directly acquiring data from downstream tasks, which is not feasible for federated learning (FL). To address this limitation, we shift the focus from such a one-to-one interactive neuro-symbolic paradigm to one-to-many Federated Neuro-Symbolic Learning framework (FedNSL) with latent variables as the FL communication medium. Built on the basis of our novel reformulation of the NSL theory, FedNSL is capable of identifying and addressing rule distribution heterogeneity through a simple and effective Kullback-Leibler (KL) divergence constraint on rule distribution applicable under the FL setting. It further theoretically adjusts variational expectation maximization (V-EM) to reduce the rule search space across domains. This is the first incorporation of distribution-coupled bilevel optimization into FL. Extensive experiments based on both synthetic and real-world data demonstrate significant advantages of FedNSL compared to five state-of-the-art methods. It outperforms the best baseline by 17% and 29% in terms of unbalanced average training accuracy and unseen average testing accuracy, respectively.

Updated: 2024-05-27 14:29:29

标题: 联邦式神经符号学习

摘要: 神经符号学习(NSL)模型通过神经网络将复杂的符号规则模式转化为潜变量分布,从而减少规则搜索空间并生成未见规则以提高下游任务性能。集中式NSL学习涉及直接从下游任务中获取数据,这对于联邦学习(FL)来说是不可行的。为了解决这一限制,我们将重点从一对一的交互式神经符号范式转变为一对多的联邦神经符号学习框架(FedNSL),其中潜变量作为FL通信媒介。基于我们对NSL理论的新颖重新构造,FedNSL能够通过在FL设置下适用的简单有效的Kullback-Leibler(KL)散度约束来识别和解决规则分布异质性。它进一步在理论上调整变分期望最大化(V-EM)以减少跨领域的规则搜索空间。这是第一次将分布耦合的双层优化方法引入FL中。基于合成和真实数据的大量实验表明,与五种最先进的方法相比,FedNSL具有显著优势。在不平衡的平均训练准确率和未见平均测试准确率方面,它分别比最佳基准线提高了17%和29%。

更新时间: 2024-05-27 14:29:29

领域: cs.AI

下载: http://arxiv.org/abs/2308.15324v2

TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., diseases-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC. The code will be available at https://github.com/xiwenc1/TimeMIL.

Updated: 2024-05-27 14:26:21

标题: TimeMIL:通过时间感知的多实例学习推进多元时间序列分类

摘要: 深度神经网络(包括Transformer和卷积神经网络)显著改进了多元时间序列分类(MTSC)。然而,这些方法通常依赖于监督学习,无法充分考虑时间序列数据中模式的稀疏性和局部性(例如,心电图中与疾病相关的异常点)。为了应对这一挑战,我们正式将MTSC重新表述为一个弱监督问题,并引入一种新颖的多实例学习(MIL)框架,以更好地定位感兴趣的模式并建模时间序列内部的时间依赖。我们的新方法TimeMIL在时间感知的MIL池化中刻画时间相关性与顺序性,并利用带有专门可学习小波位置标记的tokenized Transformer。所提出的方法超越了26种最新的最先进方法,突显了弱监督TimeMIL在MTSC中的有效性。代码将在https://github.com/xiwenc1/TimeMIL提供。

更新时间: 2024-05-27 14:26:21

领域: cs.LG

下载: http://arxiv.org/abs/2405.03140v2

Efficient multi-prompt evaluation of LLMs

Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation. In this paper, we consider the problem of estimating the performance distribution across many prompt variants instead of finding a single prompt to evaluate with. We introduce PromptEval, a method for estimating performance across a large set of prompts borrowing strength across prompts and examples to produce accurate estimates under practical evaluation budgets. The resulting distribution can be used to obtain performance quantiles to construct various robust performance metrics (e.g., top 95% quantile or median). We prove that PromptEval consistently estimates the performance distribution and demonstrate its efficacy empirically on three prominent LLM benchmarks: MMLU, BIG-bench Hard, and LMentry. For example, PromptEval can accurately estimate performance quantiles across 100 prompt templates on MMLU with a budget equivalent to two single-prompt evaluations. Our code and data can be found at https://github.com/felipemaiapolo/prompt-eval.

Updated: 2024-05-27 14:24:47

标题: LLMs的高效多提示评估

摘要: 对于比较LLM模型而言,最流行的基准测试依赖于一组有限的提示模板,这可能无法充分捕捉LLM模型的能力,并可能影响排行榜上结果的可重复性。许多最近的研究通过实证验证提示的敏感性,并主张改变LLM模型评估方法。在本文中,我们考虑了估计跨多个提示变体的性能分布的问题,而不是找到一个单一的提示进行评估。我们介绍了PromptEval,一种估算跨大量提示的性能的方法,通过跨提示和示例借用强度来产生在实际评估预算下准确估计。得到的分布可用于获得性能分位数,以构建各种健壮的性能指标(例如,顶部95%分位数或中位数)。我们证明了PromptEval一致地估计了性能分布,并在三个知名的LLM基准测试上在实证上证明了其有效性:MMLU、BIG-bench Hard和LMentry。例如,PromptEval可以在等价于两个单提示评估的预算下准确估计MMLU上100个提示模板的性能分位数。我们的代码和数据可以在https://github.com/felipemaiapolo/prompt-eval 找到。

更新时间: 2024-05-27 14:24:47

领域: cs.CL,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.17202v1

Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks

Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods by several orders of magnitude. While promising, the proposed methods only scale to networks with a few thousand parameters due to the high computational cost to evaluate, store, and invert the curvature matrix. We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks. Our approach goes beyond the established KFAC for traditional deep learning problems as it captures contributions from a PDE's differential operator that are crucial for optimization. To establish KFAC for such losses, we use Taylor-mode automatic differentiation to describe the differential operator's computation graph as a forward network with shared weights. This allows us to apply KFAC thanks to a recently-developed general formulation for networks with weight sharing. Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.

Updated: 2024-05-27 14:23:46

标题: Kronecker分解近似曲率用于物理信息神经网络

摘要: 物理信息神经网络(PINNs)以难以训练而著称。最近,基于自然梯度和高斯-牛顿法的二阶方法显示出有希望的表现,将一阶方法所能达到的精度提高了几个数量级。尽管前景可观,这些方法由于评估、存储和求逆曲率矩阵的计算成本很高,仅能扩展到只有几千个参数的网络。我们为PINN损失提出了Kronecker分解近似曲率(KFAC),大大降低了计算成本,并可扩展到更大的网络。我们的方法超越了针对传统深度学习问题建立的KFAC,因为它捕捉了来自PDE微分算子的、对优化至关重要的贡献。为了对这类损失建立KFAC,我们使用Taylor模式自动微分,将微分算子的计算图描述为一个共享权重的前向网络。得益于最近针对权重共享网络提出的一般性公式,这使我们能够应用KFAC。实证上,我们发现基于KFAC的优化器在小问题上与昂贵的二阶方法相当,能更好地扩展到更高维的神经网络和PDE,并始终优于一阶方法和LBFGS。

更新时间: 2024-05-27 14:23:46

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2405.15603v2

SmoothGNN: Smoothing-based GNN for Unsupervised Node Anomaly Detection

The smoothing issue leads to indistinguishable node representations, which poses a significant challenge in the field of graph learning. However, this issue also presents an opportunity to reveal underlying properties behind different types of nodes, which have been overlooked in previous studies. Through empirical and theoretical analysis of real-world node anomaly detection (NAD) datasets, we observe that anomalous and normal nodes show different patterns in the smoothing process, which can be leveraged to enhance NAD tasks. Motivated by these findings, in this paper, we propose a novel unsupervised NAD framework. Specifically, according to our theoretical analysis, we design a Smoothing Learning Component. Subsequently, we introduce a Smoothing-aware Spectral Graph Neural Network, which establishes the connection between the spectral space of graphs and the smoothing process. Additionally, we demonstrate that the Dirichlet Energy, which reflects the smoothness of a graph, can serve as coefficients for node representations across different dimensions of the spectral space. Building upon these observations and analyses, we devise a novel anomaly measure for the NAD task. Extensive experiments on 9 real-world datasets show that SmoothGNN outperforms the best rival by an average of 14.66% in AUC and 7.28% in Precision, with 75x running time speed-up, which validates the effectiveness and efficiency of our framework.
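
The Dirichlet energy invoked here is cheap to compute from the graph Laplacian; a minimal sketch with the unnormalized Laplacian (the paper's per-dimension spectral weighting is more involved):

```python
import numpy as np

def dirichlet_energy(A, X):
    """E(X) = 1/2 * sum_ij A_ij ||x_i - x_j||^2 = trace(X^T L X)."""
    L = np.diag(A.sum(axis=1)) - A       # unnormalized graph Laplacian
    return float(np.trace(X.T @ L @ X))

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X_smooth = np.ones((3, 2))               # identical features -> energy 0
X_rough = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
print(dirichlet_energy(A, X_smooth), dirichlet_energy(A, X_rough))
```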

Updated: 2024-05-27 14:23:30

标题: SmoothGNN:基于平滑的GNN用于无监督节点异常检测

摘要: 平滑问题导致节点表示无法区分,这在图学习领域提出了显著挑战。然而,这个问题也为揭示不同类型节点背后的潜在属性提供了机会,这在先前的研究中被忽视了。通过对真实世界节点异常检测(NAD)数据集的经验和理论分析,我们观察到异常节点和正常节点在平滑过程中展现出不同的模式,这可以用来增强NAD任务。基于这些发现,在本文中,我们提出了一种新颖的无监督NAD框架。具体地,根据我们的理论分析,我们设计了一个平滑学习组件。随后,我们引入了一个平滑感知的谱图神经网络,建立了图的谱空间与平滑过程之间的联系。此外,我们证明了狄利克雷能量,反映了图的平滑性,可以作为跨不同维度的谱空间的节点表示的系数。基于这些观察和分析,我们设计了一种新颖的NAD任务异常度量。对9个真实世界数据集的广泛实验表明,SmoothGNN在AUC方面平均优于最佳竞争对手14.66%,在Precision方面优于7.28%,并且运行时间加速了75倍,验证了我们框架的有效性和效率。

更新时间: 2024-05-27 14:23:30

领域: cs.LG

下载: http://arxiv.org/abs/2405.17525v1

Convex Relaxation for Solving Large-Margin Classifiers in Hyperbolic Space

Hyperbolic spaces have increasingly been recognized for their outstanding performance in handling data with inherent hierarchical structures compared to their Euclidean counterparts. However, learning in hyperbolic spaces poses significant challenges. In particular, extending support vector machines to hyperbolic spaces is in general a constrained non-convex optimization problem. Previous and popular attempts to solve hyperbolic SVMs, primarily using projected gradient descent, are generally sensitive to hyperparameters and initializations, often leading to suboptimal solutions. In this work, by first rewriting the problem into a polynomial optimization, we apply semidefinite relaxation and sparse moment-sum-of-squares relaxation to effectively approximate the optima. From extensive empirical experiments, these methods are shown to perform better than the projected gradient descent approach.

Updated: 2024-05-27 14:19:53

标题: 在双曲空间中解决大间隔分类器的凸松弛

摘要: 与欧几里得空间相比,双曲空间在处理具有内在层次结构的数据方面的出色表现日益得到认可。然而,在双曲空间中学习存在重大挑战。特别是,将支持向量机推广到双曲空间通常是一个带约束的非凸优化问题。以往流行的求解双曲SVM的尝试主要使用投影梯度下降,通常对超参数和初始化敏感,往往导致次优解。在这项工作中,我们首先将问题重写为多项式优化,然后应用半定松弛和稀疏矩-平方和松弛来有效逼近最优解。大量实证实验表明,这些方法的表现优于投影梯度下降方法。

更新时间: 2024-05-27 14:19:53

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.17198v1

Effective Learning with Node Perturbation in Multi-Layer Neural Networks

Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided noise-based search process. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives together with input decorrelation at every layer strongly enhances performance of NP learning with large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.
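
A bare-bones sketch of vanilla node perturbation for a single linear layer: compare a clean and a noise-injected forward pass and reinforce weights along the noise direction in proportion to the loss change. The paper's directional-derivative alignment and layer-wise input decorrelation are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 5))     # one linear layer
x = rng.normal(size=5)
target = np.array([1.0, -1.0, 0.5])
sigma, lr = 0.05, 0.01

def loss(y):
    return 0.5 * np.sum((y - target) ** 2)

for _ in range(5000):
    y_clean = W @ x                        # clean forward pass
    noise = rng.normal(scale=sigma, size=3)
    dL = loss(W @ x + noise) - loss(y_clean)   # perturbed forward pass
    # In expectation (dL / sigma^2) * outer(noise, x) is the gradient.
    W -= lr * (dL / sigma ** 2) * np.outer(noise, x)

print(W @ x)                               # approaches the target
```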

Updated: 2024-05-27 14:15:45

标题: 多层神经网络中节点扰动的有效学习

摘要: 反向传播(BP)仍然是训练深度神经网络模型参数的主流且最成功的方法。然而,BP依赖于两个计算上不同的阶段,不能对生物学习给出令人满意的解释,并且难以用于训练具有不连续性或噪声节点动态的网络。相比之下,节点扰动(NP)提出通过向网络激活注入噪声并测量由此引起的损失变化来进行学习。NP只依赖两次前向(推理)传递,不使用网络导数,并被提出作为生物系统中学习的模型。然而,标准NP的搜索过程基于无引导的噪声,因此数据效率极低且不稳定。在这项工作中,我们研究了NP的不同表述形式,将其与方向导数的概念联系起来,并将其与逐层输入去相关机制相结合。我们发现,与方向导数更紧密的对齐加上每一层的输入去相关,可以显著增强NP学习的性能:参数收敛大幅改善,测试数据上的性能也高得多,接近BP的水平。此外,我们的新表述还适用于噪声过程本身不可访问的噪声系统。

更新时间: 2024-05-27 14:15:45

领域: cs.LG

下载: http://arxiv.org/abs/2310.00965v4

R2D2 image reconstruction with model uncertainty quantification in radio astronomy

The ``Residual-to-Residual DNN series for high-Dynamic range imaging'' (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process, by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate ``R2D2 samples'', from which empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimension; (iii) it is characterized by a very low model uncertainty.

Updated: 2024-05-27 14:14:55

标题: 在射电天文学中对R2D2图像重建进行模型不确定性量化

摘要: 最近,“面向高动态范围成像的残差到残差DNN系列”(R2D2)方法被引入天文学中的射电干涉(RI)成像。R2D2的重建由一系列残差图像构成,这些残差图像由深度神经网络(DNNs)迭代估计,DNN以前一次迭代的图像估计和相关数据残差作为输入。在这项工作中,我们通过研究其一系列学习模型的不确定性,考察R2D2图像估计过程的稳健性。采用集成平均方法,可以训练多个系列,它们源于每次迭代训练过程中不同的随机DNN初始化。由此得到的多个R2D2实例还可用来生成“R2D2样本”,其经验均值和标准差赋予算法联合估计与不确定性量化的功能。聚焦于RI成像并采用针对特定望远镜的方法,我们训练了多个R2D2实例,以涵盖甚大阵(Very Large Array, VLA)最一般的观测设置。模拟与真实数据实验证实:(i) R2D2的图像估计能力优于最先进的算法;(ii) 其超快的重建能力(源于仅含少量DNN的系列)使得即便在大图像尺寸下,计算多个重建样本和不确定性图也切实可行;(iii) 其模型不确定性非常低。

更新时间: 2024-05-27 14:14:55

领域: astro-ph.IM,cs.LG,eess.IV,eess.SP

下载: http://arxiv.org/abs/2403.18052v2

SoK: Leveraging Transformers for Malware Analysis

The introduction of transformers has been an important breakthrough for AI research and application as transformers are the foundation behind Generative AI. A promising application domain for transformers is cybersecurity, in particular the malware domain analysis. The reason is the flexibility of the transformer models in handling long sequential features and understanding contextual relationships. However, as the use of transformers for malware analysis is still in the infancy stage, it is critical to evaluate, systematize, and contextualize existing literature to foster future research. This Systematization of Knowledge (SoK) paper aims to provide a comprehensive analysis of transformer-based approaches designed for malware analysis. Based on our systematic analysis of existing knowledge, we structure and propose taxonomies based on: (a) how different transformers are adapted, organized, and modified across various use cases; and (b) how diverse feature types and their representation capabilities are reflected. We also provide an inventory of datasets used to explore multiple research avenues in the use of transformers for malware analysis and discuss open challenges with future research directions. We believe that this SoK paper will assist the research community in gaining detailed insights from existing work and will serve as a foundational resource for implementing novel research using transformers for malware analysis.

Updated: 2024-05-27 14:14:07

标题: SoK: 利用变压器进行恶意软件分析

摘要: Transformer的引入是人工智能研究和应用的一项重要突破,因为Transformer是生成式人工智能的基础。Transformer的一个有前途的应用领域是网络安全,特别是恶意软件领域的分析,原因在于Transformer模型在处理长序列特征和理解上下文关系方面的灵活性。然而,由于将Transformer用于恶意软件分析仍处于起步阶段,评估、系统化并在上下文中梳理现有文献以促进未来研究至关重要。这篇系统化知识(SoK)论文旨在对为恶意软件分析设计的基于Transformer的方法进行全面分析。基于对现有知识的系统分析,我们按以下两方面构建并提出分类法:(a) 不同的Transformer如何在各种用例中被适配、组织和修改;(b) 多样的特征类型及其表征能力如何得到体现。我们还提供了用于探索Transformer恶意软件分析多条研究路线的数据集清单,并讨论了开放挑战与未来研究方向。我们相信这篇SoK论文将帮助研究社区从现有工作中获得详细见解,并为利用Transformer开展恶意软件分析的新研究提供基础资源。

更新时间: 2024-05-27 14:14:07

领域: cs.CR

下载: http://arxiv.org/abs/2405.17190v1

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 3D mapping and 2D segmentation without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

Updated: 2024-05-27 14:11:17

标题: 记住重要之事:从多次遍历中涌现的场景分解

摘要: 人类自然会保留对持久元素的记忆,而短暂的瞬间往往从记忆中溜走。这种选择性保留对机器人的感知、定位和建图至关重要。为了赋予机器人这种能力,我们引入了3D高斯建图(3DGM),这是一个自监督、仅基于相机的离线建图框架,以3D Gaussian Splatting为基础。3DGM将来自同一区域的多次遍历RGB视频转换为基于高斯的环境地图,同时执行2D瞬态物体分割。我们的关键观察是:环境在多次遍历之间保持一致,而物体经常变化。这使我们能够利用重复遍历带来的自监督来实现环境-物体分解。更具体地,3DGM将多次遍历的环境建图表述为一个稳健的可微渲染问题,将环境像素和物体像素分别视为内点和外点。通过稳健特征蒸馏、特征残差挖掘和稳健优化,3DGM无需人工干预即可同时完成3D建图和2D分割。我们构建了源自Ithaca365和nuPlan数据集的Mapverse基准,以评估我们的方法在无监督2D分割、3D重建和神经渲染方面的表现。大量结果验证了我们的方法在自动驾驶和机器人领域的有效性与潜力。

更新时间: 2024-05-27 14:11:17

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.17187v1

Learning Personalized Decision Support Policies

Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but when will each form of support yield better outcomes? In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. Specifically, we propose the general problem of learning a decision support policy that, for a given input, chooses which form of support to provide to decision-makers for whom we initially have no prior information. We develop $\texttt{Modiste}$, an interactive tool to learn personalized decision support policies. $\texttt{Modiste}$ leverages stochastic contextual bandit techniques to personalize a decision support policy for each decision-maker and supports extensions to the multi-objective setting to account for auxiliary objectives like the cost of support. We find that personalized policies outperform offline policies, and, in the cost-aware setting, reduce the incurred cost with minimal degradation to performance. Our experiments include various realistic forms of support (e.g., expert consensus and predictions from a large language model) on vision and language tasks. Our human subject experiments validate our computational experiments, demonstrating that personalization can yield benefits in practice for real users, who interact with $\texttt{Modiste}$.

Updated: 2024-05-27 14:10:24

标题: 学习个性化决策支持策略

摘要: 不同的人类决策者可能从不同形式的支持中获益以改善决策结果,但每种支持形式何时会带来更好的结果?在这项工作中,我们认为个性化地提供决策支持工具,可以成为恰当使用AI辅助的有效机制。具体来说,我们提出了学习决策支持策略这一一般性问题:对于给定输入,为我们最初没有任何先验信息的决策者选择提供哪种形式的支持。我们开发了交互式工具$\texttt{Modiste}$来学习个性化决策支持策略。$\texttt{Modiste}$利用随机上下文老虎机(contextual bandit)技术为每个决策者个性化决策支持策略,并支持扩展到多目标设置,以考虑支持成本等辅助目标。我们发现,个性化策略优于离线策略,并且在成本感知设置中,能在性能几乎不下降的情况下降低产生的成本。我们的实验涵盖视觉和语言任务上多种真实的支持形式(例如专家共识和来自大型语言模型的预测)。我们的人类受试者实验验证了计算实验,表明个性化在实践中能为与$\texttt{Modiste}$交互的真实用户带来收益。

更新时间: 2024-05-27 14:10:24

领域: cs.LG,cs.AI,cs.CY,cs.HC

下载: http://arxiv.org/abs/2304.06701v2

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback. Specifically, we propose two frameworks to address heterogeneous human feedback in principled ways: personalization-based one and aggregation-based one. For the former, we propose two approaches based on representation learning and clustering, respectively, for learning multiple reward models that trades off the bias (due to preference heterogeneity) and variance (due to the use of fewer data for learning each model by personalization). We then establish sample complexity guarantees for both approaches. For the latter, we aim to adhere to the single-model framework, as already deployed in the current RLHF paradigm, by carefully aggregating diverse and truthful preferences from humans. We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarianism and Leximin approaches to aggregate individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions. Under the probabilistic-opinion-feedback model, we also develop an approach to handle strategic human labelers who may bias and manipulate the aggregated preferences with untruthful feedback. Based on the ideas in mechanism design, our approach ensures truthful preference reporting, with the induced aggregation rule maximizing social welfare functions.
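
A small sketch contrasting the two aggregation rules named above, given per-group reward estimates for a few candidate policies: the utilitarian rule maximizes the mean, while Leximin lexicographically maximizes the sorted (worst-first) rewards. The reward matrix is purely illustrative:

```python
import numpy as np

# rewards[i, g]: estimated reward of candidate policy i for group g.
rewards = np.array([[0.95, 0.10, 0.80],
                    [0.60, 0.50, 0.60],
                    [0.70, 0.50, 0.40]])

utilitarian = int(np.argmax(rewards.mean(axis=1)))     # -> policy 0

def leximin_best(R):
    """Maximize the worst-off group first, then the next-worst, etc."""
    return max(range(len(R)), key=lambda i: tuple(sorted(R[i])))

print("utilitarian:", utilitarian, "leximin:", leximin_best(rewards))
```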

Updated: 2024-05-27 14:08:40

标题: 通过个性化和偏好聚合来自异构反馈的RLHF

摘要: 人类反馈强化学习(RLHF)是一种有效的技术,可以将人工智能系统与人类价值观对齐,在最近微调大型语言模型方面取得了显著成功。大多数现有的RLHF范例假设人类偏好相对均匀,并可以通过单一奖励模型进行编码。本文着重解决人类偏好的固有异质性及其提供反馈时的潜在战略行为所引起的问题。具体而言,我们提出了两种框架以原则性地解决异质人类反馈:基于个性化和基于聚合。对于前者,我们提出了两种基于表示学习和聚类的方法,分别用于学习多个奖励模型,权衡了偏见(由于偏好异质性)和方差(由于使用较少数据来学习每个模型的个性化)。然后我们为这两种方法建立了样本复杂度保证。对于后者,我们旨在遵循单一模型框架,如已部署在当前RLHF范例中,通过仔细聚合来自人类的多样化和真实偏好。我们分别提出了两种基于奖励和偏好聚合的方法:前者利用效用主义和Leximin方法来聚合个体奖励模型,具有样本复杂度保证;后者直接以概率意见形式聚合人类反馈。在概率意见反馈模型下,我们还开发了一种处理可能会通过不真实反馈偏见和操纵聚合偏好的战略人类标记者的方法。基于机制设计的思想,我们的方法确保真实偏好报告,通过引发的聚合规则最大化社会福利函数。

更新时间: 2024-05-27 14:08:40

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.00254v2

Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms

Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks. However, accurately portraying the performance of DLP algorithms poses challenges that might impede progress in the field. Importantly, common evaluation pipelines usually calculate ranking or binary classification metrics, where the scores of observed interactions (positives) are compared with those of randomly generated ones (negatives). However, a single metric is not sufficient to fully capture the differences between DLP algorithms, and is prone to overly optimistic performance evaluation. Instead, an in-depth evaluation should reflect performance variations across different nodes, edges, and time segments. In this work, we contribute tools to perform such a comprehensive evaluation. (1) We propose Birth-Death diagrams, a simple but powerful visualization technique that illustrates the effect of time-based train-test splitting on the difficulty of DLP on a given dataset. (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time. (3) We carry out an empirical study of the effect of the different negative sampling strategies. Our comparison between heuristics and state-of-the-art memory-based methods on various real-world datasets confirms a strong effect of using different negative sampling strategies on the test Area Under the Curve (AUC). Moreover, we conduct a visual exploration of the predictions, with additional insights into which types of errors are prominent over time.
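
The sketch below shows how the choice of negative sampler enters the evaluation; `score_fn` is an assumed model interface, and the two strategies are just two entries of the kind of taxonomy described above (uniformly random corruption vs. harder historical negatives), not the paper's exact API.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_auc(score_fn, pos_edges, neg_edges):
    """Test AUC of a link-scoring function against a chosen negative set."""
    y = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    s = np.array([score_fn(u, v, t) for (u, v, t) in list(pos_edges) + list(neg_edges)])
    return roc_auc_score(y, s)

def random_negatives(pos_edges, n_nodes, rng):
    """Replace each destination with a uniformly random node (the common, easy choice)."""
    return [(u, int(rng.integers(n_nodes)), t) for (u, _, t) in pos_edges]

def historical_negatives(pos_edges, past_edges, rng):
    """Pairs that interacted in the past but not at test time (harder negatives)."""
    test_pairs = {(u, v) for (u, v, _) in pos_edges}
    pool = [(u, v) for (u, v, _) in past_edges if (u, v) not in test_pairs]
    idx = rng.choice(len(pool), size=len(pos_edges))
    return [(*pool[i], t) for i, (_, _, t) in zip(idx, pos_edges)]
```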

Updated: 2024-05-27 14:03:28

标题: 探究连续时间动态链路预测算法的性能

摘要: 动态链接预测(DLP)涉及在演变的网络中预测未来链接。然而,准确描绘DLP算法的性能会面临可能阻碍该领域进展的挑战。重要的是,常见的评估流程通常计算排名或二元分类指标,其中观察到的交互作用(正面)的得分与随机生成的交互作用(负面)的得分进行比较。然而,单一指标不足以充分捕捉DLP算法之间的差异,并且容易出现过于乐观的性能评估。相反,深入评估应该反映出不同节点、边缘和时间段之间的性能变化。在这项工作中,我们提出了工具来进行这样全面的评估。 (1)我们提出了出生-死亡图,这是一种简单但强大的可视化技术,它说明了基于时间的训练-测试分割对于给定数据集上DLP的困难程度的影响。 (2)我们描述了一种详尽的负采样方法分类法,可用于评估时使用。 (3)我们进行了对不同负采样策略效果的实证研究。我们在各种真实数据集上比较启发式方法和最先进的基于记忆的方法,证实了使用不同负采样策略对测试下曲线下面积(AUC)的影响。此外,我们对预测进行了视觉探索,并提供了关于随时间主要出现的不同类型错误的额外见解。

更新时间: 2024-05-27 14:03:28

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2405.17182v1

Spectral regularization for adversarially-robust representation learning

The vulnerability of neural network classifiers to adversarial attacks is a major obstacle to their deployment in safety-critical applications. Regularization of network parameters during training can be used to improve adversarial robustness and generalization performance. Usually, the network is regularized end-to-end, with parameters at all layers affected by regularization. However, in settings where learning representations is key, such as self-supervised learning (SSL), layers after the feature representation will be discarded when performing inference. For these models, regularizing up to the feature space is more suitable. To this end, we propose a new spectral regularizer for representation learning that encourages black-box adversarial robustness in downstream classification tasks. In supervised classification settings, we show empirically that this method is more effective in boosting test accuracy and robustness than previously-proposed methods that regularize all layers of the network. We then show that this method improves the adversarial robustness of classifiers using representations learned with self-supervised training or transferred from another classification task. In all, our work begins to unveil how representational structure affects adversarial robustness.
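
As a rough sketch of regularizing only up to the feature space, the snippet below penalizes the spectral norm (largest singular value) of the encoder's weights while leaving the discarded head untouched; the exact penalty form and layer selection in the paper may differ, so treat this as an assumption-laden illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spectral_penalty(module):
    """Sum of largest singular values of linear weights in `module`."""
    penalty = torch.zeros(())
    for m in module.modules():
        if isinstance(m, nn.Linear):
            penalty = penalty + torch.linalg.matrix_norm(m.weight, ord=2)
    return penalty

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 10)  # layers after the representation: not regularized
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = F.cross_entropy(head(encoder(x)), y) + 1e-3 * spectral_penalty(encoder)
loss.backward()
```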

Updated: 2024-05-27 14:01:42

标题: 对抗性强健表示学习的谱正则化

摘要: 神经网络分类器对敌对攻击的脆弱性是它们在安全关键应用中部署的主要障碍。在训练期间对网络参数进行正则化可以提高对抗性的稳健性和泛化性能。通常,网络是端到端地进行正则化的,所有层的参数都受到正则化的影响。然而,在学习表示是关键的情况下,例如自监督学习(SSL),在执行推断时会丢弃特征表示后的层。对于这些模型,向特征空间正则化更加适合。为此,我们提出了一种新的谱正则化器用于表示学习,在下游分类任务中鼓励黑盒对抗稳健性。在监督分类设置中,我们在实证上表明,这种方法比先前提出的对网络所有层进行正则化的方法更有效地提高了测试准确性和稳健性。然后我们展示,这种方法提高了使用自监督训练学习的表示或从另一个分类任务转移得到的分类器的对抗稳健性。总的来说,我们的工作开始揭示表示结构如何影响对抗稳健性。

更新时间: 2024-05-27 14:01:42

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.17181v1

Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents

Recent studies have uncovered the potential of Large Language Models (LLMs) in addressing complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.

Updated: 2024-05-27 14:00:23

标题: 大型语言模型作为训练强化学习代理的政策教师

摘要: 最近的研究揭示了大型语言模型(LLMs)在通过提供高级指导解决复杂的顺序决策任务方面的潜力。然而,基于LLM的代理在解决特定目标问题方面缺乏专业化,特别是在实时动态环境中。此外,在实际场景中部署基于LLM的代理可能既昂贵又耗时。另一方面,强化学习(RL)方法训练专门针对目标任务的代理,但往往遭受低采样效率和高探索成本的困扰。在本文中,我们引入了一个新颖的框架,通过使用LLM-based教师代理的指导,训练一个较小、专门的学生RL代理来解决这些挑战。通过融入教师代理的指导,学生代理可以将LLM的先前知识提炼到自己的模型中。因此,学生代理可以用明显更少的数据进行训练。此外,通过进一步接收环境反馈进行训练,学生代理超越其教师的能力,完成目标任务。我们在具有挑战性的MiniGrid和Habitat环境上进行了实验证明我们框架的有效性。结果清楚地表明,我们的方法与强基线方法相比实现了优异的性能。我们的代码可以在https://github.com/ZJLAB-AMMI/LLM4Teach 上找到。

更新时间: 2024-05-27 14:00:23

领域: cs.AI

下载: http://arxiv.org/abs/2311.13373v6

DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

2D diffusion models often contain unwanted baked-in shading effects and result in unrealistic rendering effects in downstream applications. Generating Physically Based Rendering (PBR) materials instead of just RGB textures would be a promising solution. However, directly distilling the PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition, such as baked-in shading effects in albedo. We introduce DreamMat, an innovative approach to resolve the aforementioned problem, to generate high-quality PBR materials from text descriptions. We find out that the main reason for the incorrect material distillation is that large-scale 2D diffusion models are only trained to generate final shading colors, resulting in insufficient constraints on material decomposition during distillation. To tackle this problem, we first finetune a new light-aware 2D diffusion model to condition on a given lighting environment and generate the shading results on this specific lighting condition. Then, by applying the same environment lights in the material distillation, DreamMat can generate high-quality PBR materials that are not only consistent with the given geometry but also free from any baked-in shading effects in albedo. Extensive experiments demonstrate that the materials produced through our methods exhibit greater visual appeal to users and achieve significantly superior rendering quality compared to baseline methods, which are preferable for downstream tasks such as game and film production.

Updated: 2024-05-27 13:55:08

标题: DreamMat: 凭借几何和光线感知扩散模型实现高质量的PBR材质生成

摘要: 2D扩散模型通常包含不必要的内置阴影效果,并导致下游应用中渲染效果不真实。生成基于物理的渲染(PBR)材质而不仅仅是RGB纹理可能是一个有前途的解决方案。然而,直接从2D扩散模型中提取PBR材质参数仍然存在错误的材质分解,比如Albedo中的内置阴影效果。我们引入了DreamMat,这是一种创新方法来解决上述问题,从文本描述中生成高质量的PBR材质。我们发现,不正确材质提炼的主要原因是大规模的2D扩散模型只被训练来生成最终的着色颜色,在提炼过程中对材质分解的约束不足。为了解决这个问题,我们首先微调一个新的光感知2D扩散模型,以一个给定的光照环境为条件,并在该特定光照条件下生成着色结果。然后,通过在材质提炼中应用相同的环境光,DreamMat可以生成高质量的PBR材质,不仅与给定几何一致,而且在Albedo中没有任何内置阴影效果。大量实验证明,通过我们的方法生成的材质对用户具有更大的视觉吸引力,并且在渲染质量方面比基准方法显著优越,这对于游戏和电影制作等下游任务是更为理想的。

更新时间: 2024-05-27 13:55:08

领域: cs.GR,cs.AI

下载: http://arxiv.org/abs/2405.17176v1

Joint Prediction Regions for time-series models

Machine Learning algorithms are notorious for providing point predictions but not prediction intervals. There are many applications where one requires confidence in predictions and prediction intervals. Strung together, these intervals give rise to joint prediction regions with the desired significance level. Computing Joint Prediction Regions (JPRs) is an easy task when the data are IID. However, the task becomes considerably more difficult when JPRs are needed for time series, because of the dependence between the observations. This project aims to implement Wolf and Wunderli's method for constructing JPRs and compare it with other methods (e.g. NP heuristic, Joint Marginals). The method under study is based on bootstrapping and is applied to different datasets (Min Temp, Sunspots), using different predictors (e.g. ARIMA and LSTM). One challenge of applying the method under study is deriving prediction standard errors for the models, since these cannot be obtained analytically. A novel method to estimate prediction standard errors for different predictors is also devised. Finally, the method is applied to a synthetic dataset to find empirical averages and empirical widths, and the results from the Wolf and Wunderli paper are consolidated. The experimental results show a narrowing of width with strong predictors like neural nets, widening of width with increasing forecast horizon H and decreasing significance level alpha, control of the width with parameter k in K-FWE, and loss of information using Joint Marginals.
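
A hedged sketch of the bootstrap construction in the spirit of Wolf and Wunderli's sup-t bands: given bootstrap forecast errors over the horizon, take the simultaneous quantile of the maximal standardized error; the paper's exact standardization and the novel standard-error estimator are omitted here.

```python
import numpy as np

def joint_prediction_region(point_fc, boot_errors, alpha=0.1):
    """point_fc: (H,) path forecast; boot_errors: (B, H) bootstrap errors."""
    se = boot_errors.std(axis=0, ddof=1)            # per-horizon prediction SE
    sup_t = np.abs(boot_errors / se).max(axis=1)    # max standardized error per path
    q = np.quantile(sup_t, 1 - alpha)               # simultaneous (joint) quantile
    return point_fc - q * se, point_fc + q * se     # JPR covering the whole path
```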

Updated: 2024-05-27 13:52:39

标题: 时间序列模型的联合预测区域

摘要: 机器学习算法以提供点预测而著称,但不提供预测区间。有许多应用场景需要对预测和预测区间有信心。将这些区间串联起来,形成具有所需显著性水平的联合预测区域。当数据是IID时,计算联合预测区域(JPR)是一项简单的任务。然而,当需要针对时间序列计算JPR时,由于观测之间的依赖关系,任务变得非常困难。本项目旨在实现Wolf和Wunderli构建JPR的方法,并与其他方法(例如NP启发式算法,联合边缘)进行比较。研究中的方法基于自举法,并应用于不同的数据集(例如最低温度,太阳黑子),使用不同的预测器(例如ARIMA和LSTM)。应用所研究方法的一个挑战是为模型推导预测标准误差,无法通过分析得到。还设计了一种用于估计不同预测器的预测标准误差的新方法。最后,该方法应用于一个合成数据集,以找到经验平均值和经验宽度,并整合Wolf和Wunderli论文的结果。实验结果显示,像神经网络这样的强预测器会使宽度变窄,随着预测时间跨度H的增加和显著性水平alpha的降低,宽度变宽,通过参数k在K-FWE中控制宽度,使用联合边缘会导致信息丢失。

更新时间: 2024-05-27 13:52:39

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.12234v2

Mean-Field Microcanonical Gradient Descent

Microcanonical gradient descent is a sampling procedure for energy-based models allowing for efficient sampling of distributions in high dimension. It works by transporting samples from a high-entropy distribution, such as Gaussian white noise, to a low-energy region using gradient descent. We put this model in the framework of normalizing flows, showing how it can often overfit by losing an unnecessary amount of entropy in the descent. As a remedy, we propose a mean-field microcanonical gradient descent that samples several weakly coupled data points simultaneously, allowing for better control of the entropy loss while paying little in terms of likelihood fit. We study these models in the context of financial time series, illustrating the improvements on both synthetic and real data.

Updated: 2024-05-27 13:50:55

标题: 均场微正则梯度下降

摘要: 微正则梯度下降是一种能够有效在高维度中采样分布的能量模型的采样过程。它通过使用梯度下降将样本从高熵分布(如高斯白噪声)传送到低能区域。我们将该模型置于归一化流的框架中,展示了它在下降过程中往往会过度拟合,因为失去了大量不必要的熵。为了解决这个问题,我们提出了一种均场微正则梯度下降,同时采样多个弱耦合的数据点,从而更好地控制熵的损失,同时在可能性拟合方面付出较少的代价。我们在金融时间序列的背景下研究了这些模型,展示了在合成数据和真实数据上的改进。

更新时间: 2024-05-27 13:50:55

领域: stat.ML,cs.LG,q-fin.ST,stat.CO

下载: http://arxiv.org/abs/2403.08362v2

Forecasting Four Business Cycle Phases Using Machine Learning: A Case Study of US and EuroZone

Understanding the business cycle is crucial for building economic stability, guiding business planning, and informing investment decisions. The business cycle refers to the recurring pattern of expansion and contraction in economic activity over time. Economic analysis is inherently complex, incorporating a myriad of factors (such as macroeconomic indicators, political decisions). This complexity makes it challenging to fully account for all variables when determining the current state of the economy and predicting its future trajectory in the upcoming months. The objective of this study is to investigate the capacity of machine learning models to automatically analyze the state of the economy, with the goal of forecasting business phases (expansion, slowdown, recession and recovery) in the United States and the EuroZone. We compared three different machine learning approaches to classify the phases of the business cycle, and among them, the Multinomial Logistic Regression (MLR) achieved the best results. Specifically, MLR achieved an accuracy of 65.25% (Top1) and 84.74% (Top2) for the EuroZone, and 75% (Top1) and 92.14% (Top2) for the United States. These results demonstrate the potential of machine learning techniques to predict business cycles accurately, which can aid in making informed decisions in the fields of economics and finance.
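
A minimal sketch of the best-performing classifier and the two reported metrics, on synthetic stand-in data (real inputs would be macroeconomic indicators; the feature set here is a placeholder assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))        # placeholder macro features
y = rng.integers(0, 4, 400)          # 0..3: expansion/slowdown/recession/recovery

clf = LogisticRegression(max_iter=1000).fit(X[:300], y[:300])  # multinomial by default
proba = clf.predict_proba(X[300:])
top1 = (proba.argmax(axis=1) == y[300:]).mean()
top2 = np.mean([yt in row.argsort()[-2:] for yt, row in zip(y[300:], proba)])
print(f"Top1={top1:.2%}  Top2={top2:.2%}")
```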

Updated: 2024-05-27 13:49:24

标题: 使用机器学习预测四个商业周期阶段:美国和欧元区的案例研究

摘要: 理解商业周期对建立经济稳定性、指导商业规划和指导投资决策至关重要。商业周期指的是经济活动在一段时间内扩张和收缩的周期性模式。经济分析固有地复杂,涵盖了诸多因素(如宏观经济指标、政治决策)。这种复杂性使得在确定当前经济状况并预测未来几个月内的轨迹时,很难充分考虑所有变量。本研究的目标是调查机器学习模型在自动分析经济状态方面的能力,旨在预测美国和欧元区的商业周期阶段(扩张、减缓、衰退和复苏)。我们比较了三种不同的机器学习方法来分类商业周期的阶段,其中,多项式逻辑回归(MLR)取得了最佳结果。具体来说,MLR在欧元区取得了65.25%(Top1)和84.74%(Top2)的准确率,而在美国分别为75%(Top1)和92.14%(Top2)。这些结果表明了机器学习技术准确预测商业周期的潜力,可以帮助在经济和金融领域做出明智的决策。

更新时间: 2024-05-27 13:49:24

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.17170v1

Can Generative Models Improve Self-Supervised Representation Learning?

The rapid advancement in self-supervised learning (SSL) has highlighted its potential to leverage unlabeled data for learning rich visual representations. However, the existing SSL techniques, particularly those employing different augmentations of the same image, often rely on a limited set of simple transformations that are not representative of real-world data variations. This constrains the diversity and quality of samples, which leads to sub-optimal representations. In this paper, we introduce a novel framework that enriches the SSL paradigm by utilizing generative models to produce semantically consistent image augmentations. By directly conditioning generative models on a source image representation, our method enables the generation of diverse augmentations while maintaining the semantics of the source image, thus offering a richer set of data for self-supervised learning. Our extensive experimental results on various SSL methods demonstrate that our framework significantly enhances the quality of learned visual representations by up to 10\% Top-1 accuracy in downstream tasks. This research demonstrates that incorporating generative models into the SSL workflow opens new avenues for exploring the potential of synthetic data. This development paves the way for more robust and versatile representation learning techniques.

Updated: 2024-05-27 13:49:10

标题: 生成模型能改善自监督表示学习吗?

摘要: 自监督学习(SSL)的快速发展突显了利用未标记数据学习丰富视觉表示的潜力。然而,现有的SSL技术,特别是采用同一图像的不同增强的技术,往往依赖于一组有限的简单变换,这些变换并不代表真实世界的数据变化。这限制了样本的多样性和质量,导致表示不佳。在本文中,我们介绍了一个新颖的框架,通过利用生成模型产生语义一致的图像增强来丰富SSL范式。通过直接将生成模型条件化为源图像表示,我们的方法能够生成多样化的增强,同时保持源图像的语义,从而为自监督学习提供更丰富的数据集。我们在各种SSL方法上进行的广泛实验结果表明,我们的框架将学习到的视觉表示的质量提高了高达10\%的Top-1准确率。这项研究表明,将生成模型纳入SSL工作流程开拓了探索合成数据潜力的新途径。这一发展为更稳健和多功能的表示学习技术铺平了道路。

更新时间: 2024-05-27 13:49:10

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.05966v2

Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

The recent advancement of spatial transcriptomics (ST) makes it possible to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.

Updated: 2024-05-27 13:43:30

标题: 跨模态扩散建模用于超分辨空间转录组学

摘要: 最近空间转录组学(ST)的进展允许对组织内的空间基因表达进行表征,以进行发现研究。然而,当前的ST平台存在分辨率低的问题,阻碍了对空间基因表达的深入理解。超分辨率方法承诺通过将组织切片的组织学图像与基因表达整合,增强ST地图。然而,当前的超分辨率方法受到恢复不确定性和模式崩溃的限制。虽然扩散模型已经显示出在捕捉多模态条件之间复杂交互作用方面的潜力,但将组织学图像和基因表达集成到超分辨ST地图中仍然是一个挑战。本文提出了一种用于在组织学图像的指导下对ST地图进行超分辨的跨模态条件扩散模型。具体来说,我们设计了一个多模态解缠网络,使用跨模态自适应调制来利用组织学图像和空间基因表达的互补信息。此外,我们提出了一种动态交叉注意力建模策略,用于从组织学图像中提取分层细胞到组织的信息。最后,我们提出了一个基于共表达的基因相关性图网络,用于建模多个基因之间的共表达关系。实验表明,我们的方法在三个公共数据集上的ST超分辨率方面优于其他最先进的方法。

更新时间: 2024-05-27 13:43:30

领域: eess.IV,cs.CV,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2404.12973v2

WeiPer: OOD Detection using Weight Perturbations of Class Projections

Recent advances in out-of-distribution (OOD) detection on image data show that pre-trained neural network classifiers can separate in-distribution (ID) from OOD data well, leveraging the class-discriminative ability of the model itself. Methods have been proposed that either use logit information directly or that process the model's penultimate layer activations. With "WeiPer", we introduce perturbations of the class projections in the final fully connected layer which creates a richer representation of the input. We show that this simple trick can improve the OOD detection performance of a variety of methods and additionally propose a distance-based method that leverages the properties of the augmented WeiPer space. We achieve state-of-the-art OOD detection results across multiple benchmarks of the OpenOOD framework, especially pronounced in difficult settings in which OOD samples are positioned close to the training set distribution. We support our findings with theoretical motivations and empirical observations, and run extensive ablations to provide insights into why WeiPer works.
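
A rough sketch of the core trick as described above: draw several noisy copies of the final-layer class projections and score inputs under the perturbed logits. Gaussian weight noise and a mean max-softmax score are simplifying assumptions, not necessarily the paper's exact perturbation or its distance-based scoring over the augmented WeiPer space.

```python
import torch

def weiper_like_score(features, W, b, n_perturb=16, noise=0.05):
    """features: (N, D) penultimate activations; W: (C, D), b: (C,) final layer."""
    scores = []
    for _ in range(n_perturb):
        W_p = W + noise * W.norm(dim=1, keepdim=True) * torch.randn_like(W)
        logits = features @ W_p.T + b                     # perturbed class projections
        scores.append(logits.softmax(dim=1).amax(dim=1))  # max softmax per input
    return torch.stack(scores).mean(dim=0)                # higher = more in-distribution
```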

Updated: 2024-05-27 13:38:28

标题: WeiPer:使用类别投影的权重扰动进行OOD检测

摘要: 最近在图像数据的异常外分布(OOD)检测方面取得了进展,表明预训练的神经网络分类器能够很好地区分内分布(ID)和OOD数据,利用模型本身的类别区分能力。已经提出了一些方法,其中一些直接使用logit信息,另一些处理模型的倒数第二层激活。通过“WeiPer”,我们引入了在最终完全连接层中的类别投影的扰动,从而创建了输入的更丰富表示。我们展示了这个简单的技巧可以提高各种方法的OOD检测性能,并另外提出了一种基于距离的方法,利用增强的WeiPer空间的属性。我们在OpenOOD框架的多个基准测试中取得了最先进的OOD检测结果,特别是在OOD样本接近训练集分布的困难情况下表现明显。我们通过理论动机和经验观察支持我们的发现,并进行了广泛的消融实验,以揭示为什么WeiPer起作用。

更新时间: 2024-05-27 13:38:28

领域: cs.LG

下载: http://arxiv.org/abs/2405.17164v1

Injecting Hamiltonian Architectural Bias into Deep Graph Networks for Long-Range Propagation

The dynamics of information diffusion within graphs is a critical open issue that heavily influences graph representation learning, especially when considering long-range propagation. This calls for principled approaches that control and regulate the degree of propagation and dissipation of information throughout the neural flow. Motivated by this, we introduce (port-)Hamiltonian Deep Graph Networks, a novel framework that models neural information flow in graphs by building on the laws of conservation of Hamiltonian dynamical systems. We reconcile under a single theoretical and practical framework both non-dissipative long-range propagation and non-conservative behaviors, introducing tools from mechanical systems to gauge the equilibrium between the two components. Our approach can be applied to general message-passing architectures, and it provides theoretical guarantees on information conservation in time. Empirical results prove the effectiveness of our port-Hamiltonian scheme in pushing simple graph convolutional architectures to state-of-the-art performance in long-range benchmarks.

Updated: 2024-05-27 13:36:50

标题: 将哈密顿架构偏置注入深度图网络以进行长程传播

摘要: 图中信息扩散的动态是一个重要的开放性问题,特别是在考虑长距离传播时,这会严重影响图表示学习。这需要有原则的方法来控制和调节信息在神经流中的传播和耗散程度。受此启发,我们引入了(端口)哈密顿深度图网络,这是一个新颖的框架,通过建立在哈密顿动力系统的守恒定律基础上,来模拟图中神经信息的流动。我们在一个理论和实践框架下统一了非耗散的长距离传播和非守恒行为,引入了力学系统的工具来衡量两个组成部分之间的平衡。我们的方法可以应用于一般的消息传递架构,并且在时间上提供了信息保持的理论保证。实证结果证明了我们的端口哈密顿方案推动简单的图卷积架构达到长距离基准测试的最先进性能。

更新时间: 2024-05-27 13:36:50

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.17163v1

The Scaling Law in Stellar Light Curves

Analyzing time series of fluxes from stars, known as stellar light curves, can reveal valuable information about stellar properties. However, most current methods rely on extracting summary statistics, and studies using deep learning have been limited to supervised approaches. In this research, we investigate the scaling law properties that emerge when learning from astronomical time series data using self-supervised techniques. By employing the GPT-2 architecture, we show the learned representation improves as the number of parameters increases from $10^4$ to $10^9$, with no signs of performance plateauing. We demonstrate that a self-supervised Transformer model achieves 3-10 times the sample efficiency compared to the state-of-the-art supervised learning model when inferring the surface gravity of stars as a downstream task. Our research lays the groundwork for analyzing stellar light curves by examining them through large-scale auto-regressive generative models.

Updated: 2024-05-27 13:31:03

标题: 恒星光变曲线中的尺度律

摘要: 分析来自恒星的通量时间序列,即恒星光变曲线,可以揭示有关恒星性质的宝贵信息。然而,大多数当前的方法依赖于提取摘要统计数据,并且使用深度学习的研究受限于监督方法。在这项研究中,我们调查了使用自监督技术从天文时间序列数据中学习时出现的标度律特性。通过使用GPT-2架构,我们展示了随着参数数量从$10^4$增加到$10^9$,学到的表示不断改进,且没有性能平稳的迹象。我们证明,一个自监督Transformer模型在推断作为下游任务的恒星表面重力时,比最先进的监督学习模型具有3-10倍的样本效率。我们的研究为通过大规模自回归生成模型检查恒星光变曲线奠定了基础。

更新时间: 2024-05-27 13:31:03

领域: astro-ph.IM,astro-ph.SR,cs.LG

下载: http://arxiv.org/abs/2405.17156v1

CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator selection as a second policy to be learned, concurrently being updated with the original signal-controlling policy. Specifically, the selection policy in real-time adaptively selects the best teammates according to phase- and intersection-level features. Empirical results on both synthetic and real-world datasets provide robust validation for the superiority of our approach, offering significant improvements over existing state-of-the-art methods. The code is available at https://github.com/AnonymousAccountss/CoSLight.

Updated: 2024-05-27 13:26:59

标题: CoSLight: 协同优化协作人选择和决策以增强交通信号控制

摘要: 有效的多交叉口协作对于基于强化学习的交通信号控制以减轻拥堵至关重要。现有工作主要选择邻近交叉口作为协作者。然而,相当多的拥堵,甚至一些广泛的拥堵,是由非邻居失败协作导致的。为了解决这些问题,我们建议将协作者选择作为第二个要学习的策略,与原始信号控制策略同时更新。具体地,实时选择策略根据阶段和交叉口级别特征自适应地选择最佳的队友。在合成和真实数据集上的实证结果为我们的方法的优越性提供了强有力的验证,相对于现有的最先进方法,提供了显著的改进。代码可在https://github.com/AnonymousAccountss/CoSLight 上找到。

更新时间: 2024-05-27 13:26:59

领域: cs.MA,cs.AI

下载: http://arxiv.org/abs/2405.17152v1

Smoke and Mirrors in Causal Downstream Tasks

Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where we assume binary effects that are recorded as high-dimensional images in a Randomized Controlled Trial (RCT). Despite being the simplest possible setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6,480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual data set controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences. All code and data will be released.
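
For reference, the downstream estimator in such an RCT is simple; the subtlety the paper highlights is that the effect labels come from a model reading high-dimensional images, so the biases of that prediction step propagate into the estimate. A minimal difference-in-means sketch of the downstream step only:

```python
import numpy as np

def rct_effect(y_treated, y_control):
    """Difference-in-means effect estimate with a normal-approximation 95% CI."""
    ate = y_treated.mean() - y_control.mean()
    se = np.sqrt(y_treated.var(ddof=1) / len(y_treated)
                 + y_control.var(ddof=1) / len(y_control))
    return ate, (ate - 1.96 * se, ate + 1.96 * se)
```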

Updated: 2024-05-27 13:26:34

标题: 烟雾和镜子:因果关系下游任务中的问题

摘要: 机器学习和人工智能有潜力改变数据驱动的科学发现,实现对几种科学现象的准确预测。由于许多科学问题本质上是因果关系,本文关注治疗效果估计的因果推断任务,在这里我们假设记录为高维图像的二进制效应在随机对照试验(RCT)中。尽管这是可能的设置最简单且非常适合深度学习,但我们在理论上发现文献中许多常见选择可能导致偏倚估计。为了测试这些考虑因素的实际影响,我们记录了第一个针对高维观察因果推断下游任务的真实世界基准,作为一项RCT研究,研究了花园蚂蚁(Lasius neglectus)如何对施加在其群体成员身上的微粒作出卫生理疗反应。通过比较从最先进的视觉骨干进行微调的6 480个模型,我们发现采样和建模选择显著影响因果估计的准确性,并且分类准确性并非其代理。我们进一步验证了分析结果,在控制因果模型的情况下,重复了对合成生成的视觉数据集的分析。我们的结果表明,未来的基准应该认真考虑真实的下游科学问题,特别是因果问题。此外,我们强调了代表性学习方法的指导方针,以帮助回答科学中的因果问题。所有代码和数据将被发布。

更新时间: 2024-05-27 13:26:34

领域: cs.LG

下载: http://arxiv.org/abs/2405.17151v1

Towards Weakly-Supervised Hate Speech Classification Across Datasets

As pointed out by several scholars, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Subsequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of the models trained on datasets labeled using different HS taxonomies cannot be compared. To ease this problem, we propose applying extremely weak supervision that only relies on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.

Updated: 2024-05-27 13:23:27

标题: 朝向跨数据集的弱监督仇恨言论分类

摘要: 正如几位学者所指出的,目前关于仇恨言论(HS)识别的研究以不系统的数据创建策略和不同的注释模式为特征。随后,监督学习模型往往会对它们没有经过训练的数据集进行泛化,而使用不同的HS分类法标记的数据集训练的模型的表现是无法比较的。为了缓解这个问题,我们提出应用极弱监督,仅仅依赖于类别名称而不是来自已标注数据的类别样本。我们展示了一种最先进的弱监督文本分类模型在各种数据集和跨数据集环境中的有效性。此外,我们对HS分类模型泛化能力差的来源进行了深入的定量和定性分析。

更新时间: 2024-05-27 13:23:27

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2305.02637v3

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predicting with the resultant table. As data lakes, the paper uses YADL (Yet Another Data Lake) -- a novel dataset we developed as a tool for benchmarking this data discovery task -- and Open Data US, a well-referenced real data lake. Through systematic exploration on both lakes, our study outlines the importance of accurately retrieving join candidates and the efficiency of simple merging methods. We report new insights on the benefits of existing solutions and on their limitations, aiming at guiding future research in this space.

Updated: 2024-05-27 13:21:05

标题: 检索、合并、预测:用数据湖增强表格

摘要: 我们对数据湖中的数据发现进行了深入分析,重点关注给定机器学习任务的表增强。我们分析了在检索可连接表、合并信息和使用结果表进行预测的三个主要步骤中使用的替代方法。作为数据湖,本文使用了我们开发的用作基准测试该数据发现任务的新型数据集YADL(又一个数据湖)和Open Data US,一个被广泛引用的真实数据湖。通过对这两个数据湖的系统性探索,我们的研究概述了准确检索连接候选者的重要性和简单合并方法的效率。我们报告了对现有解决方案的好处和局限性的新见解,旨在指导未来在这一领域的研究。

更新时间: 2024-05-27 13:21:05

领域: cs.DB,cs.LG

下载: http://arxiv.org/abs/2402.06282v4

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario -- continuous covariate shift -- in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.
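
A standard classifier-based density-ratio estimate can serve as a stand-in for the paper's online estimator (which additionally reuses history to cope with data scarcity at each time step); the sketch below trains a probabilistic classifier to distinguish train from test inputs and converts its output into importance weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(x_train, x_test_so_far):
    """Estimate w(x) = p_test(x) / p_train(x) on the training inputs."""
    X = np.vstack([x_train, x_test_so_far])
    z = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test_so_far))])
    g = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(x_train)[:, 1]
    g = np.clip(g, 1e-6, 1 - 1e-6)
    prior = len(x_train) / len(x_test_so_far)  # corrects for sample-size imbalance
    return prior * g / (1.0 - g)               # weights for the training losses
```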

Updated: 2024-05-27 13:18:12

标题: 通过在线密度比估计适应连续协变量转移

摘要: 处理分布转移是现代机器学习的核心挑战之一。一个基本情况是协变量转移,即数据的输入分布在训练和测试阶段发生变化,而输入条件输出分布保持不变。本文研究一个更具挑战性的场景 -- 连续协变量转移 -- 在这种情况下,测试数据按顺序出现,并且它们的分布可以连续变化。我们的目标是适应性地训练预测器,使其随着时间累积的预测风险最小化。我们从重要性加权学习开始,如果能准确估计测试和训练输入的时变密度比率,则该方法有效地工作。然而,由于每个时间步骤的数据稀缺,现有的密度比率估计方法会失败。为此,我们提出了一种在线方法,可以适当地重用历史信息。我们的密度比率估计方法已被证明通过享有动态遗憾界限而表现良好,最终为预测器提供了超额风险保证。实证结果也验证了其有效性。

更新时间: 2024-05-27 13:18:12

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2302.02552v2

Conformalized Selective Regression

Should prediction models always deliver a prediction? In the pursuit of maximum predictive performance, critical considerations of reliability and fairness are often overshadowed, particularly when it comes to the role of uncertainty. Selective regression, also known as the "reject option," allows models to abstain from predictions in cases of considerable uncertainty. Initially proposed seven decades ago, approaches to selective regression have mostly focused on distribution-based proxies for measuring uncertainty, particularly conditional variance. However, this focus neglects the significant influence of model-specific biases on a model's performance. In this paper, we propose a novel approach to selective regression by leveraging conformal prediction, which provides grounded confidence measures for individual predictions based on model-specific biases. In addition, we propose a standardized evaluation framework to allow proper comparison of selective regression approaches. Via an extensive experimental approach, we demonstrate that our proposed approach, conformalized selective regression, provides an advantage over multiple state-of-the-art baselines.
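
To illustrate the general mechanism (not the paper's bias-aware construction), here is a normalized split-conformal regressor with a reject option; `sigma` is an assumed per-example difficulty estimate, and the width threshold `tau` is an illustrative selection rule.

```python
import numpy as np

def selective_conformal(model, sigma, X_cal, y_cal, X_new, alpha=0.1, tau=2.0):
    """Predict with (1 - alpha) conformal intervals; abstain when too wide."""
    scores = np.abs(y_cal - model.predict(X_cal)) / sigma(X_cal)
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    width = 2 * q * sigma(X_new)                       # per-example interval width
    return model.predict(X_new), width, width <= tau   # abstain where False
```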

Updated: 2024-05-27 13:08:37

标题: 符合化的选择性回归

摘要: 预测模型是否总是要提供预测?在追求最大预测性能的过程中,可靠性和公平性往往被忽视,特别是当涉及到不确定性的作用时。选择性回归,也被称为“拒绝选项”,允许模型在存在相当不确定性的情况下放弃预测。虽然选择性回归最初是在七十年前提出的,但对于测量不确定性的基于分布的代理方法,尤其是条件方差,目前大多集中在这方面。然而,这种关注忽视了模型特定偏差对模型性能的重要影响。在本文中,我们提出了一种利用符合预测的新颖选择性回归方法,该方法基于模型特定偏差为个别预测提供了可信度测量。此外,我们提出了一个标准化评估框架,以便正确比较选择性回归方法。通过广泛的实验方法,我们展示了我们提出的方法,符合预测的选择性回归,在多个最新基线模型上具有优势。

更新时间: 2024-05-27 13:08:37

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.16300v2

SymbolicAI: A framework for logic-based approaches combining generative models and solvers

We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for multi-modal data that connects multi-step generative processes and aligns their outputs with user objectives in complex workflows. As a result, we can transition between the capabilities of various foundation models with in-context learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. Through these operations based on in-context learning our framework enables the creation and evaluation of explainable computational graphs. Finally, we introduce a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.

Updated: 2024-05-27 13:05:13

标题: SymbolicAI:一个结合生成模型和求解器的基于逻辑的方法框架

摘要: 我们介绍了SymbolicAI,这是一个多功能且模块化的框架,采用基于逻辑的方法来进行概念学习和流程管理的生成过程。SymbolicAI通过将大型语言模型(LLMs)视为语义解析器来实现生成模型与各种求解器的无缝集成,这些解析器可以根据自然语言和形式语言指令执行任务,从而弥合符号推理与生成AI之间的差距。我们利用概率编程原则来解决复杂任务,并利用可微分编程和经典编程范式及其各自的优势。该框架引入了一组多态、组合和自引用操作,用于连接多模态数据的多步生成过程,并将它们的输出与用户在复杂工作流程中的目标对齐。因此,我们可以在具有上下文学习能力的各种基础模型和专业的、经过优化的模型或求解器之间进行平滑过渡,这些模型或求解器擅长解决特定问题。通过基于上下文学习的这些操作,我们的框架能够创建和评估可解释的计算图。最后,我们介绍了一种用于评估这些计算图的质量度量及其实证得分,并提出了一个基准,用于比较一系列复杂工作流程中各种最先进的LLMs。我们将这个实证得分称为"关系轨迹评估的向量嵌入通过交叉相似度",简称为VERTEX得分。框架代码库和基准测试已经链接如下。

更新时间: 2024-05-27 13:05:13

领域: cs.LG,cs.AI,cs.SC,cs.SE

下载: http://arxiv.org/abs/2402.00854v3

Local Model Reconstruction Attacks in Federated Learning and their Uses

In this paper, we initiate the study of local model reconstruction attacks for federated learning, where an honest-but-curious adversary eavesdrops on the messages exchanged between a targeted client and the server, and then reconstructs the local/personalized model of the victim. The local model reconstruction attack allows the adversary to trigger other classical attacks in a more effective way, since the local model only depends on the client's data and can leak more private information than the global model learned by the server. Additionally, we propose a novel model-based attribute inference attack in federated learning leveraging the local model reconstruction attack. We provide an analytical lower-bound for this attribute inference attack. Empirical results using real world datasets confirm that our local reconstruction attack works well for both regression and classification tasks. Moreover, we benchmark our novel attribute inference attack against the state-of-the-art attacks in federated learning. Our attack results in higher reconstruction accuracy, especially when the clients' datasets are heterogeneous. Our work provides a new angle for designing powerful and explainable attacks to effectively quantify the privacy risk in FL.

Updated: 2024-05-27 13:04:34

标题: 《联邦学习中的本地模型重建攻击及其应用》

摘要: 在这篇论文中,我们开始研究联邦学习中的本地模型重建攻击,其中一个诚实但好奇的对手窃听目标客户端和服务器之间交换的消息,然后重建受害者的本地/个性化模型。本地模型重建攻击允许对手以更有效的方式触发其他经典攻击,因为本地模型仅取决于客户端的数据,并且可能泄露比服务器学习的全局模型更多的私人信息。此外,我们提出了一种新颖的基于模型的属性推断攻击,利用了联邦学习中的本地模型重建攻击。我们为这种属性推断攻击提供了一个分析下界。使用真实世界数据集的实证结果证实了我们的本地重建攻击对回归和分类任务都有效。此外,我们将我们的新型属性推断攻击与联邦学习中的最先进攻击进行了基准测试。我们的攻击结果在客户端数据集异构性较高时尤其具有更高的重建准确性。我们的工作为设计强大且可解释的攻击提供了一个新的角度,以有效量化联邦学习中的隐私风险。

更新时间: 2024-05-27 13:04:34

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2210.16205v3

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets), have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data and training objective, we find that these architectures have notably different representations, different classification performance across datasets, and different robustness properties to certain types of image perturbations. Our findings indicate a remarkable possible synergy across backbones by leveraging their respective strengths. In principle, classification accuracy could be improved by over 40 percentage points with an informed selection of the optimal backbone per test example. Using this insight, we develop a straightforward yet powerful approach to adaptively ensemble multiple backbones. The approach uses as few as one labeled example per class to tune the adaptive combination of backbones. On a large collection of datasets, the method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone, well beyond traditional ensembles.
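
One simple instantiation of per-example backbone selection, using a single labeled embedding per class as a prototype classifier and picking the most confident backbone for each input; the paper tunes a learned adaptive combination instead, so this is an assumption-level sketch.

```python
import numpy as np

def adaptive_backbone_predict(embeds, protos, temp=10.0):
    """embeds: name -> (N, D) L2-normalized test embeddings;
    protos: name -> (C, D) one normalized labeled embedding per class."""
    best_conf, best_pred = None, None
    for name in embeds:
        p = np.exp(temp * (embeds[name] @ protos[name].T))  # prototype similarities
        p /= p.sum(axis=1, keepdims=True)
        conf, pred = p.max(axis=1), p.argmax(axis=1)
        if best_conf is None:
            best_conf, best_pred = conf, pred
        else:
            take = conf > best_conf                # keep the more confident backbone
            best_pred = np.where(take, pred, best_pred)
            best_conf = np.maximum(conf, best_conf)
    return best_pred
```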

Updated: 2024-05-27 12:59:35

标题: CLIP中的协同作用和多样性:通过自适应主干集成提高性能

摘要: 对比语言-图像预训练(CLIP)作为一种突出的图像表示学习方法。从视觉变换器(ViTs)到卷积网络(ResNets)等各种架构都经过CLIP训练,用作解决不同视觉任务的通用方法。本文探讨了各种经过CLIP训练的视觉骨干之间的差异。尽管使用相同的数据和训练目标,我们发现这些架构具有明显不同的表示、在数据集上不同的分类性能,以及对某些类型的图像扰动具有不同的鲁棒性特性。我们的发现表明,通过利用各个骨干的各自优势,可能存在显著的协同作用。原则上,通过在测试示例中选择最佳骨干,分类准确率可以提高超过40%。利用这一见解,我们开发了一种简单而强大的方法,以自适应地集成多个骨干。该方法使用每类仅一个标记样本来调整骨干的自适应组合。在大量数据集上,这种方法的准确率增加了最多39.1%,远远超过传统集成方法。

更新时间: 2024-05-27 12:59:35

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17139v1

Locally Testing Model Detections for Semantic Global Concepts

Ensuring the quality of black-box Deep Neural Networks (DNNs) has become ever more significant, especially in safety-critical domains such as automated driving. While global concept encodings generally enable a user to test a model for a specific concept, linking global concept encodings to the local processing of single network inputs reveals their strengths and limitations. Our proposed framework global-to-local Concept Attribution (glCA) uses approaches from local (why a specific prediction originates) and global (how a model works generally) eXplainable Artificial Intelligence (xAI) to test DNNs for a predefined semantic concept locally. The approach allows for conditioning local, post-hoc explanations on predefined semantic concepts encoded as linear directions in the model's latent space. Pixel-exact scoring concerning the global concept usage assists the tester in further understanding the model processing of single data points for the selected concept. Our approach has the advantage of fully covering the model-internal encoding of the semantic concept and allowing the localization of relevant concept-related information. The results show major differences in the local perception and usage of individual global concept encodings and call for further investigation into obtaining thorough semantic concept encodings.

Updated: 2024-05-27 12:52:45

标题: 在本地测试模型检测语义全局概念

摘要: 确保黑盒深度神经网络(DNNs)的质量变得越来越重要,特别是在自动驾驶等安全关键领域。虽然全局概念编码通常使用户能够测试特定概念的模型,但将全局概念编码与单个网络输入的局部处理联系起来揭示了它们的优势和局限性。我们提出的全局到局部概念归因(glCA)框架使用来自局部(为什么特定预测源于何处)和全局(模型通常如何工作)可解释人工智能(xAI)的方法,为预定义的语义概念在局部进行DNNs测试。该方法允许将预定义的语义概念编码为模型潜在空间中的线性方向,并对局部后验解释进行条件化。关于全局概念使用的像素精确评分有助于测试者进一步了解模型处理所选概念的单个数据点。我们的方法具有完全覆盖语义概念的模型内部编码并允许定位相关概念相关信息的优势。结果显示在个别全局概念编码的局部感知和使用方面存在重大差异,并要求进一步调查获取彻底的语义概念编码。

更新时间: 2024-05-27 12:52:45

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17523v1

A Concept-Value Network as a Brain Model

This paper suggests a statistical framework for describing the relations between the physical and conceptual entities of a brain-like model. Features and concept instances are put into context, where the paper suggests that features may be the electrical wiring, although chemical connections are also possible. With this idea, the actual length of the connection is important, because it is related to firing rates and neuron synchronization, but the signal type is less important. The paper then suggests that concepts are neuron groups that link feature sets, and that concept instances are determined by chemical signals from those groups. Therefore, features become the static horizontal framework of the neural system and concepts are vertically interconnected combinations of these. This would also suggest that features can be distributed entities and not concentrated in a single area.

Updated: 2024-05-27 12:50:58

标题: 一个概念-价值网络作为大脑模型

摘要: 本文提出了一个描述类似大脑模型的物理和概念实体之间关系的统计框架。特征和概念实例被放置在上下文中,文章建议特征可能是电气布线,尽管化学连接也是可能的。根据这个想法,连接的实际长度很重要,因为它与发射速率和神经元同步有关,但信号类型不太重要。然后,文章建议概念是连接特征集的神经元群,概念实例由这些群体的化学信号确定。因此,特征成为神经系统的静态水平框架,概念是这些框架的垂直互相连接的组合。这也意味着特征可以是分布式实体,而不是集中在一个单一区域。

更新时间: 2024-05-27 12:50:58

领域: cs.NE,cs.AI,q-bio.NC

下载: http://arxiv.org/abs/1904.04579v3

Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation methods are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook the impact of the decision path that users take when conducting behaviors; that is, users ultimately exhibit different behaviors based on various intents. To this end, we propose HIER, a novel Hierarchical decIsion path Enhanced Representation learning for cross-domain recommendation. With the help of graph neural networks for high-order topological information of the knowledge graph between multi-source behaviors, we further adaptively learn decision paths through well-designed exemplar-level and information bottleneck based contrastive learning. Extensive experiments in online and offline environments show the superiority of HIER.

Updated: 2024-05-27 12:49:07

标题: 您在使用多源行为进行预训练工业推荐系统时的决策路径确实很重要

摘要: 在线服务平台通过小程序提供各种服务,对于那些明确目的是寻找感兴趣服务的用户来说变得至关重要。为了实现有效的内容传递,跨领域推荐被引入以从数据丰富的场景中转移行为以学习高质量的表示。然而,这些方法忽视了用户在进行行为时所采取的决策路径的影响,也就是说,用户基于不同的意图最终展现出不同的行为。因此,我们提出了HIER,一种新颖的用于跨领域推荐的层次决策路径增强表示学习方法。通过利用图神经网络来获取多源行为之间知识图的高阶拓扑信息,我们进一步通过精心设计的示例级别和基于信息瓶颈的对比学习来自适应地学习决策路径。在线和离线环境中的广泛实验显示了HIER的优越性。

更新时间: 2024-05-27 12:49:07

领域: cs.LG

下载: http://arxiv.org/abs/2405.17132v1

Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training

Despite being a heavily researched topic, Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons: (i) the gained robustness is frequently accompanied by a drop in generalization and (ii) generating adversarial examples (AEs) is computationally prohibitively expensive. To address these limitations, we propose SMAAT, a new AT algorithm that leverages the manifold conjecture, stating that off-manifold AEs lead to better robustness while on-manifold AEs result in better generalization. Specifically, SMAAT aims at generating a higher proportion of off-manifold AEs by perturbing the intermediate deepnet layer with the lowest intrinsic dimension. This systematically results in better scalability compared to classical AT as it reduces the PGD chain length required for generating the AEs. Additionally, our study provides, to the best of our knowledge, the first explanation for the difference in the generalization and robustness trends between vision and language models, i.e., AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged. We show that vision transformers and decoder-based models tend to have low intrinsic dimensionality in the earlier layers of the network (more off-manifold AEs), while encoder-based models have low intrinsic dimensionality in the later layers. We demonstrate the efficacy of SMAAT on several tasks, including robustifying (i) sentiment classifiers, (ii) safety filters in decoder-based models, and (iii) retrievers in RAG setups. SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications and maintaining comparable generalization.

Updated: 2024-05-27 12:48:30

标题: 利用深度模型的分层内在维度进行实用对抗训练

摘要: 尽管对Adversarial Training(AT)进行了大量研究,但由于两个主要原因,AT很少甚至从未在实际AI系统中部署:(i)获得的鲁棒性通常伴随着泛化能力的下降;(ii)生成对抗样本(AEs)在计算上是极其昂贵的。为解决这些限制,我们提出了一种新的AT算法SMAAT,利用流形猜想,即流形之外的AEs导致更好的鲁棒性,而流形上的AEs导致更好的泛化。具体而言,SMAAT旨在通过扰动具有最低内在维度的中间深度网络层生成更高比例的流形外AEs。与传统AT相比,这在系统上实现更好的可扩展性,因为它减少了生成AEs所需的PGD链长度。此外,我们的研究提供了截至目前的关于视觉和语言模型之间泛化和鲁棒性趋势差异的首个解释,即AT导致视觉模型泛化能力下降,而基于编码器的语言模型中,泛化能力要么改善,要么保持不变。我们展示了视觉Transformer和基于解码器的模型在网络的早期层具有较低的内在维度(更多的流形外AEs),而基于编码器的模型在后期层具有较低的内在维度。我们展示了SMAAT的有效性;在多个任务中,包括加固(i)情感分类器、(ii)基于解码器模型的安全过滤器和(iii)RAG设置中的检索器。与标准AT相比,SMAAT仅需要25-33%的GPU时间,同时在所有应用中显著提高鲁棒性,并保持可比较的泛化能力。

更新时间: 2024-05-27 12:48:30

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.17130v1

TEII: Think, Explain, Interact and Iterate with Large Language Models to Solve Cross-lingual Emotion Detection

Cross-lingual emotion detection allows us to analyze global trends, public opinion, and social phenomena at scale. We participated in the Explainability of Cross-lingual Emotion Detection (EXALT) shared task, achieving an F1-score of 0.6046 on the evaluation set for the emotion detection sub-task. Our system outperformed the baseline by more than 0.16 F1-score absolute, and ranked second amongst competing systems. We conducted experiments using fine-tuning, zero-shot learning, and few-shot learning for Large Language Model (LLM)-based models as well as embedding-based BiLSTM and KNN for non-LLM-based techniques. Additionally, we introduced two novel methods: the Multi-Iteration Agentic Workflow and the Multi-Binary-Classifier Agentic Workflow. We found that LLM-based approaches provided good performance on multilingual emotion detection. Furthermore, ensembles combining all our experimented models yielded higher F1-scores than any single approach alone.

Updated: 2024-05-27 12:47:40

标题: TEII: 利用大型语言模型进行思考、解释、互动和迭代,解决跨语言情感检测

摘要: 跨语言情感检测使我们能够以规模化的方式分析全球趋势、公众意见和社会现象。我们参与了跨语言情感检测(EXALT)共享任务,在情感检测子任务的评估集上取得了0.6046的F1分数。我们的系统比基准性能提高了超过0.16的F1绝对分数,并在竞争系统中排名第二。我们进行了使用微调、零样本学习和少样本学习的实验,对基于大型语言模型(LLM)的模型以及基于嵌入式BiLSTM和KNN的非LLM技术进行了实验。此外,我们引入了两种新方法:多迭代主体工作流和多二元分类器主体工作流。我们发现,基于LLM的方法在多语言情感检测中表现良好。此外,将所有我们实验过的模型组合在一起的集成模型比任何单一方法都产生了更高的F1分数。

更新时间: 2024-05-27 12:47:40

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17129v1

FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim to devise financial-specialized LLM-based toolchains and democratize access to them through open-source initiatives, promoting wider AI adoption in financial decision-making. In this paper, we introduce FinRobot, a novel open-source AI agent platform supporting multiple financially specialized AI agents, each powered by an LLM. Specifically, the platform consists of four major layers: 1) the Financial AI Agents layer that formulates Financial Chain-of-Thought (CoT) by breaking sophisticated financial problems down into logical sequences; 2) the Financial LLM Algorithms layer dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer produces accurate models by applying training/fine-tuning techniques and using task-relevant data; 4) the Multi-source LLM Foundation Models layer that integrates various LLMs and enables the above layers to access them directly. Finally, FinRobot provides hands-on tools for both professional-grade analysts and laypersons to utilize powerful AI techniques for advanced financial analysis. We open-source FinRobot at \url{https://github.com/AI4Finance-Foundation/FinRobot}.

Updated: 2024-05-27 12:43:42

标题: FinRobot:一种使用大型语言模型的金融应用的开源AI代理平台

摘要: 随着金融机构和专业人士越来越多地将大型语言模型(LLMs)纳入其工作流程中,金融部门和人工智能社区之间仍存在重大障碍,包括专有数据和专业知识。这些挑战阻碍了人工智能社区有效增强金融任务的能力。鉴于金融分析的关键作用,我们旨在设计基于金融专业的LLM工具链,并通过开源倡议使其普及,促进更广泛的人工智能在金融决策中的采用。在本文中,我们介绍了FinRobot,这是一个支持多个专门针对金融的AI代理的新型开源平台,每个代理都由LLM驱动。具体来说,该平台包括四个主要层次:1)金融AI代理层,通过将复杂的金融问题分解成逻辑序列来构建金融思维链;2)金融LLM算法层,为特定任务动态配置适当的模型应用策略;3)LLMOps和DataOps层,通过应用训练/微调技术和使用任务相关数据生成准确的模型;4)多源LLM基础模型层,集成各种LLM并使上述层次能够直接访问它们。最后,FinRobot为专业分析师和普通人提供了实践机会,以利用强大的人工智能技术进行高级金融分析。我们在\url{https://github.com/AI4Finance-Foundation/FinRobot}上开源了FinRobot。

更新时间: 2024-05-27 12:43:42

领域: q-fin.ST,cs.CL,cs.LG,q-fin.TR

下载: http://arxiv.org/abs/2405.14767v2

P-split formulations: A class of intermediate formulations between big-M and convex hull for disjunctive constraints

We develop a class of mixed-integer formulations for disjunctive constraints intermediate to the big-M and convex hull formulations in terms of relaxation strength. The main idea is to capture the best of both the big-M and convex hull formulations: a computationally light formulation with a tight relaxation. The "P-split" formulations are based on a lifted transformation that splits convex additively separable constraints into P partitions and forms the convex hull of the linearized and partitioned disjunction. The "P-split" formulations are derived for disjunctive constraints with convex constraints within each disjunct, and we generalize the results for the case with nonconvex constraints within the disjuncts. We analyze the continuous relaxation of the P-split formulations and show that, under certain assumptions, the formulations form a hierarchy starting from a big-M equivalent and converging to the convex hull. The goal of the P-split formulations is to form strong approximations of the convex hull through a computationally simpler formulation. We computationally compare the P-split formulations against big-M and convex hull formulations on 344 test instances. The test problems include K-means clustering, semi-supervised clustering, P_ball problems, and optimization over trained ReLU neural networks. The computational results show promising potential of the P-split formulations. For many of the test problems, P-split formulations are solved with a similar number of explored nodes as the convex hull formulation, while reducing the solution time by an order of magnitude and outperforming big-M both in time and number of explored nodes.
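
A hedged reconstruction of the splitting step from the abstract alone (notation ours, not the paper's): for a disjunct whose convex constraint is additively separable, $g^i(x) = \sum_{p=1}^{P} g^i_p(x_p) \le 0$, introduce split variables $s_p$ and rewrite the disjunct as

$$g^i_p(x_p) \le s_p, \quad p = 1, \dots, P, \qquad \sum_{p=1}^{P} s_p \le 0,$$

so that, once the $g^i_p$ are linearized, the disjunction is linear in $(x, s)$ and its convex hull can be formed there. Consistent with the hierarchy stated above, few partitions behave like big-M while more partitions tighten the relaxation toward the convex hull.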

Updated: 2024-05-27 12:41:17

标题: P-分割公式:一种介于大M和凸包之间的离散约束中间公式类别

摘要: 我们开发了一类混合整数形式,用于处理紧邻大M和凸包形式之间的松弛强度。主要思想是捕捉大M和凸包形式的优点:一种计算轻便的形式,同时具有紧密的松弛。基于分裂转换的“P分裂”形式将凸可加可分离约束拆分为P个分区,并形成线性化和分区间隔的凸包。针对每个disjuct中的凸约束,推导了“P-split”形式的不相交约束,并将结果推广到disjuncts内具有非凸约束的情况。我们分析了P-split形式的连续松弛,并展示,在某些假设条件下,这些形式形成了一个层次结构,从一个大M等价开始,收敛到凸包。P分裂形式的目标是通过一种计算简单的形式形成凸包的强近似。我们在344个测试实例上对P分裂形式与大M和凸包形式进行了计算比较。测试问题包括K均值聚类、半监督聚类、P球问题以及在经过训练的ReLU神经网络上的优化。计算结果显示P分裂形式具有潜在的优势。对于许多测试问题,P分裂形式的解法所探索的节点数量与凸包形式相似,同时将解决时间缩短一个数量级,并在时间和被探索节点数量方面优于大M。

更新时间: 2024-05-27 12:41:17

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2202.05198v2

Dual VC Dimension Obstructs Sample Compression by Embeddings

This work studies embedding of arbitrary VC classes in well-behaved VC classes, focusing particularly on extremal classes. Our main result expresses an impossibility: such embeddings necessarily require a significant increase in dimension. In particular, we prove that for every $d$ there is a class with VC dimension $d$ that cannot be embedded in any extremal class of VC dimension smaller than exponential in $d$. In addition to its independent interest, this result has an important implication in learning theory, as it reveals a fundamental limitation of one of the most extensively studied approaches to tackling the long-standing sample compression conjecture. Concretely, the approach proposed by Floyd and Warmuth entails embedding any given VC class into an extremal class of a comparable dimension, and then applying an optimal sample compression scheme for extremal classes. However, our results imply that this strategy would in some cases result in a sample compression scheme at least exponentially larger than what is predicted by the sample compression conjecture. The above implications follow from a general result we prove: any extremal class with VC dimension $d$ has dual VC dimension at most $2d+1$. This bound is exponentially smaller than the classical bound $2^{d+1}-1$ of Assouad, which applies to general concept classes (and is known to be unimprovable for some classes). We in fact prove a stronger result, establishing that $2d+1$ upper bounds the dual Radon number of extremal classes. This theorem represents an abstraction of the classical Radon theorem for convex sets, extending its applicability to a wider combinatorial framework, without relying on the specifics of Euclidean convexity. The proof utilizes the topological method and is primarily based on variants of the Topological Radon Theorem.

Updated: 2024-05-27 12:38:25

标题: 双重VC维度阻碍嵌入式样本压缩

摘要: 这项工作研究了任意VC类在良好的VC类中的嵌入,特别关注极端类。我们的主要结果表达了一个不可能性:这种嵌入必然需要显著增加维度。特别地,我们证明对于每个$d$,存在一个VC维度为$d$的类无法嵌入到任何VC维度小于$d$的指数级的极端类中。 除了具有独立的兴趣外,这一结果在学习理论中具有重要意义,因为它揭示了解决长期存在的样本压缩猜想的一个最广泛研究方法的基本局限性。具体而言,Floyd和Warmuth提出的方法包括将任何给定的VC类嵌入到一个具有可比维度的极端类中,然后应用极端类的最佳样本压缩方案。然而,我们的结果意味着在某些情况下,这种策略将导致一个至少指数级更大的样本压缩方案,超过了样本压缩猜想的预测。 以上推论源自我们证明的一个一般结果:任何VC维度为$d$的极端类的对偶VC维度最多为$2d+1$。这个上界在一般概念类的经典上界$2^{d+1}-1$的Assouad的基础上是指数级更小的(已知对于某些类无法改进)。事实上,我们证明了一个更强的结果,确定$2d+1$界限了极端类的对偶Radon数。这个定理是凸集的经典Radon定理的一个抽象,扩展了它适用的范围到更广泛的组合框架,而不依赖于欧几里德凸性的具体细节。证明利用了拓扑方法,主要基于Topological Radon Theorem的变体。

更新时间: 2024-05-27 12:38:25

领域: cs.DM,cs.LG,I.2.6; G.2.1

下载: http://arxiv.org/abs/2405.17120v1

Mixtures of Unsupervised Lexicon Classification

This paper presents a mixture version of method-of-moments unsupervised lexicon classification, obtained by incorporating a Dirichlet process.

Updated: 2024-05-27 12:33:47

标题: 混合无监督词库分类

摘要: 本文提出了一种混合版本的矩法无监督词典分类方法,通过引入狄利克雷过程来实现。

更新时间: 2024-05-27 12:33:47

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.17116v1

Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

Diffusion-based representation learning has achieved substantial attention due to its promising capabilities in latent representation and sample generation. Recent studies have employed an auxiliary encoder to identify a corresponding representation from a sample and to adjust the dimensionality of a latent variable z. Meanwhile, this auxiliary structure invokes an information-split problem, because the diffusion and the auxiliary encoder would divide the information from the sample into two representations for each model. In particular, the information modeled by the diffusion becomes over-regularized because of the static prior distribution on xT. To address this problem, we introduce Diffusion Bridge AutoEncoders (DBAE), which enable z-dependent endpoint xT inference through a feed-forward architecture. This structure creates an information bottleneck at z, so xT becomes dependent on z in its generation. This has two consequences: 1) z holds the full information of samples, and 2) xT becomes a learnable distribution, no longer static. We propose an objective function for DBAE to enable both reconstruction and generative modeling, with their theoretical justification. Empirical evidence supports the effectiveness of the intended design in DBAE, which notably enhances downstream inference quality, reconstruction, and disentanglement. Additionally, DBAE generates high-fidelity samples in the unconditional generation.

Updated: 2024-05-27 12:28:17

标题: 扩散桥自编码器用于无监督表示学习

摘要: 基于扩散的表示学习已经引起了相当大的关注,因为它在潜在表示和样本生成方面具有很大的潜力。最近的研究采用了辅助编码器来识别样本中的相应表示,并调整潜在变量z的维度。同时,这种辅助结构引发了信息分割问题,因为扩散和辅助编码器会将样本中的信息分成两个表示给每个模型。特别是,由扩散建模的信息由于xT上的静态先验分布而变得过度正则化。为了解决这个问题,我们引入了Diffusion Bridge AutoEncoders(DBAE),通过前馈架构实现了z依赖的端点xT推断。这种结构在z处创建了一个信息瓶颈,因此xT在生成时依赖于z。这带来了两个结果:1)z包含样本的全部信息,2)xT变成了可学习的分布,不再是静态的。我们提出了一个用于DBAE的目标函数,使其能够进行重构和生成建模,并给出了理论证明。实证证据支持了DBAE中设计的有效性,显著提高了下游推断质量、重构和解缠。此外,DBAE在无条件生成中生成了高保真度的样本。

更新时间: 2024-05-27 12:28:17

领域: cs.LG

下载: http://arxiv.org/abs/2405.17111v1

Rethinking Intermediate Layers design in Knowledge Distillation for Kidney and Liver Tumor Segmentation

Knowledge distillation (KD) has demonstrated remarkable success across various domains, but its application to medical imaging tasks, such as kidney and liver tumor segmentation, has encountered challenges. Many existing KD methods are not specifically tailored for these tasks. Moreover, prevalent KD methods often lack a careful consideration of 'what' and 'from where' to distill knowledge from the teacher to the student. This oversight may lead to issues like the accumulation of training bias within shallower student layers, potentially compromising the effectiveness of KD. To address these challenges, we propose Hierarchical Layer-selective Feedback Distillation (HLFD). HLFD strategically distills knowledge from a combination of middle layers to earlier layers and transfers final layer knowledge to intermediate layers at both the feature and pixel levels. This design allows the model to learn higher-quality representations from earlier layers, resulting in a robust and compact student model. Extensive quantitative evaluations reveal that HLFD outperforms existing methods by a significant margin. For example, in the kidney segmentation task, HLFD surpasses the student model (without KD) by over 10%, significantly improving its focus on tumor-specific features. From a qualitative standpoint, the student model trained using HLFD excels at suppressing irrelevant information and can focus sharply on tumor-specific details, which opens a new pathway for more efficient and accurate diagnostic tools. Code is available at https://github.com/vangorade/RethinkingKD_ISBI24.

Updated: 2024-05-27 12:27:16

标题: 重新思考知识蒸馏中的中间层设计:用于肾脏和肝脏肿瘤分割

摘要: 知识蒸馏(KD)在各个领域取得了显著的成功,但将其应用于医学成像任务,如肾脏和肝脏肿瘤分割,却遇到了挑战。许多现有的KD方法并不特别针对这些任务。此外,流行的KD方法往往缺乏对从教师到学生的知识蒸馏的“什么”和“从哪里”进行仔细考虑。这种疏忽可能导致训练偏差在较浅的学生层内积累,可能损害KD的有效性。为了解决这些挑战,我们提出了分层层选择性反馈蒸馏(HLFD)。HLFD从中间层的组合中策略性地蒸馏知识到较早的层,并将最终层的知识转移到特征和像素级别的中间层。这种设计使模型能够从较早的层中学习更高质量的表示,从而产生一个稳健且紧凑的学生模型。广泛的定量评估显示,HLFD在很大程度上优于现有方法。例如,在肾脏分割任务中,HLFD超过了学生模型(无KD)超过10%,显着改善了其对肿瘤特异性特征的关注。从定性角度来看,使用HLFD训练的学生模型在抑制无关信息方面表现出色,并能够专注于肿瘤特异性细节,为更高效和准确的诊断工具打开了一条新的途径。代码可在此处找到:https://github.com/vangorade/RethinkingKD_ISBI24。

更新时间: 2024-05-27 12:27:16

领域: cs.CV,cs.AI,cs.LG,q-bio.TO

下载: http://arxiv.org/abs/2311.16700v2

Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification

Insufficient prior knowledge of a captured hyperspectral image (HSI) scene may lead the experts or the automatic labeling systems to offer incorrect labels or ambiguous labels (i.e., assigning each training sample to a group of candidate labels, among which only one of them is valid; this is also known as partial label learning) during the labeling process. Accordingly, how to learn from such data with ambiguous labels is a problem of great practical importance. In this paper, we propose a novel superpixelwise low-rank approximation (LRA)-based partial label learning method, namely SLAP, which is the first to take into account partial label learning in HSI classification. SLAP is mainly composed of two phases: disambiguating the training labels and acquiring the predictive model. Specifically, in the first phase, we propose a superpixelwise LRA-based model, preparing the affinity graph for the subsequent label propagation process while extracting the discriminative representation to enhance the following classification task of the second phase. Then to disambiguate the training labels, label propagation propagates the labeling information via the affinity graph of training pixels. In the second phase, we take advantage of the resulting disambiguated training labels and the discriminative representations to enhance the classification performance. The extensive experiments validate the advantage of the proposed SLAP method over state-of-the-art methods.

Updated: 2024-05-27 12:26:49

标题: 基于超像素低秩逼近的部分标签学习用于高光谱图像分类

摘要: 对于捕获的高光谱图像(HSI)场景的先验知识不足可能会导致专家或自动标注系统在标注过程中提供不正确的标签或模棱两可的标签(即,将每个训练样本分配给一组候选标签,其中只有一个是有效的;这也被称为部分标签学习)。因此,如何从具有模糊标签的数据中学习是一个非常重要的问题。在本文中,我们提出了一种新颖的基于超像素低秩逼近(LRA)的部分标签学习方法,命名为SLAP,这是第一个在HSI分类中考虑部分标签学习的方法。SLAP主要由两个阶段组成:消除训练标签的模棱两可性和获取预测模型。具体来说,在第一阶段中,我们提出了一个基于超像素LRA的模型,为后续的标签传播过程准备亲和图,同时提取辨别表征以增强第二阶段的分类任务。然后为了消除训练标签的模棱两可性,标签传播通过训练像素的亲和图传播标注信息。在第二阶段,我们利用结果消除的训练标签和辨别表征来增强分类性能。广泛的实验验证了所提出的SLAP方法相对于最先进方法的优势。

更新时间: 2024-05-27 12:26:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.17110v1

Finding good policies in average-reward Markov Decision Processes without prior knowledge

We revisit the identification of an $\varepsilon$-optimal policy in average-reward Markov Decision Processes (MDP). In such MDPs, two measures of complexity have appeared in the literature: the diameter, $D$, and the optimal bias span, $H$, which satisfy $H\leq D$. Prior work has studied the complexity of $\varepsilon$-optimal policy identification only when a generative model is available. In this case, it is known that there exists an MDP with $D \simeq H$ for which the sample complexity to output an $\varepsilon$-optimal policy is $\Omega(SAD/\varepsilon^2)$ where $S$ and $A$ are the sizes of the state and action spaces. Recently, an algorithm with a sample complexity of order $SAH/\varepsilon^2$ has been proposed, but it requires the knowledge of $H$. We first show that the sample complexity required to estimate $H$ is not bounded by any function of $S,A$ and $H$, ruling out the possibility of easily making the previous algorithm agnostic to $H$. By relying instead on a diameter estimation procedure, we propose the first algorithm for $(\varepsilon,\delta)$-PAC policy identification that does not need any form of prior knowledge on the MDP. Its sample complexity scales in $SAD/\varepsilon^2$ in the regime of small $\varepsilon$, which is near-optimal. In the online setting, our first contribution is a lower bound which implies that a sample complexity polynomial in $H$ cannot be achieved in this setting. Then, we propose an online algorithm with a sample complexity in $SAD^2/\varepsilon^2$, as well as a novel approach based on a data-dependent stopping rule that we believe is promising to further reduce this bound.

Updated: 2024-05-27 12:24:14

标题: 在没有先验知识的情况下,在平均回报马尔可夫决策过程中找到良好的策略

摘要: 我们重新审视在平均奖励马尔可夫决策过程(MDP)中识别出一个$\varepsilon$-最优策略。在这样的MDP中,文献中出现了两种复杂度度量:直径$D$和最优偏差跨度$H$,满足$H\leq D$。先前的研究仅在有生成模型可用时研究了$\varepsilon$-最优策略识别的复杂度。在这种情况下,已知存在一个MDP,其中$D \simeq H$,以便输出一个$\varepsilon$-最优策略的样本复杂度为$\Omega(SAD/\varepsilon^2)$,其中$S$和$A$分别为状态空间和动作空间的大小。最近提出了一个样本复杂度为$SAH/\varepsilon^2$的算法,但需要知道$H$。我们首先证明了估计$H$所需的样本复杂度不受$S,A$和$H$的任何函数限制,排除了轻松使先前算法对$H$不可知的可能性。相反,通过依赖直径估计程序,我们提出了第一个在不需要对MDP有任何形式的先验知识的$(\varepsilon,\delta)$-PAC策略识别的算法。在小$\varepsilon$区域,其样本复杂度按$SAD/\varepsilon^2$缩放,这是接近最优的。在在线设置中,我们的第一个贡献是一个下界,暗示在这种情况下无法实现与$H$多项式复杂度有关的样本复杂度。然后,我们提出了一个在线算法,其样本复杂度为$SAD^2/\varepsilon^2$,以及基于数据相关停止规则的新方法,我们认为这有望进一步减小这个界限。

更新时间: 2024-05-27 12:24:14

领域: cs.LG

下载: http://arxiv.org/abs/2405.17108v1

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

Visual grounding is an essential tool that links user-provided text queries with query-specific regions within an image. Despite advancements in visual grounding models, their ability to comprehend complex queries remains limited. To overcome this limitation, we introduce LLM-Optic, an innovative method that utilizes Large Language Models (LLMs) as an optical lens to enhance existing visual grounding models in comprehending complex text queries involving intricate text structures, multiple objects, or object spatial relationships, situations that current models struggle with. LLM-Optic first employs an LLM as a Text Grounder to interpret complex text queries and accurately identify objects the user intends to locate. Then a pre-trained visual grounding model is used to generate candidate bounding boxes given the refined query by the Text Grounder. After that, LLM-Optic annotates the candidate bounding boxes with numerical marks to establish a connection between text and specific image regions, thereby linking two distinct modalities. Finally, it employs a Large Multimodal Model (LMM) as a Visual Grounder to select the marked candidate objects that best correspond to the original text query. Through LLM-Optic, we have achieved universal visual grounding, which allows for the detection of arbitrary objects specified by arbitrary human language input. Importantly, our method achieves this enhancement without requiring additional training or fine-tuning. Extensive experiments across various challenging benchmarks demonstrate that LLM-Optic achieves state-of-the-art zero-shot visual grounding capabilities.

Updated: 2024-05-27 12:23:08

标题: LLM-Optic:揭示大型语言模型在通用视觉定位方面的能力

摘要: 视觉定位(visual grounding)是将用户提供的文本查询与图像内特定区域联系起来的重要工具。尽管视觉定位模型取得了进展,但它们理解复杂查询的能力仍然有限。为了克服这一限制,我们引入了LLM-Optic,这是一种创新方法,利用大型语言模型(LLMs)作为光学透镜,增强现有的视觉定位模型,使其能够理解涉及复杂文本结构、多个对象或对象空间关系的复杂文本查询,这些是当前模型难以应对的情况。LLM-Optic首先将LLM用作文本定位器(Text Grounder),解释复杂文本查询并准确识别用户想要定位的对象。然后,使用预训练的视觉定位模型,根据文本定位器精炼后的查询生成候选边界框。之后,LLM-Optic用数字标记注释候选边界框,建立文本与特定图像区域之间的连接,从而联系起两种不同的模态。最后,它利用大型多模态模型(LMM)作为视觉定位器(Visual Grounder),选择与原始文本查询最匹配的已标记候选对象。通过LLM-Optic,我们实现了通用的视觉定位,可以检测由任意人类语言输入指定的任意对象。重要的是,我们的方法无需额外的训练或微调即可实现这种增强。在各种具有挑战性的基准测试中进行的大量实验表明,LLM-Optic实现了最先进的零样本视觉定位能力。

更新时间: 2024-05-27 12:23:08

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.17104v1

Empowering Character-level Text Infilling by Eliminating Sub-Tokens

In infilling tasks, sub-tokens, representing instances where a complete token is segmented into two parts, often emerge at the boundaries of prefixes, middles, and suffixes. Traditional methods focused on training models at the token level, leading to sub-optimal performance in character-level infilling tasks during the inference stage. Alternatively, some approaches considered character-level infilling, but they relied on predicting sub-tokens at inference time, a strategy that diminished performance in character-level infilling tasks due to the model's large perplexity on sub-tokens. In this paper, we introduce FIM-SE, which stands for Fill-In-the-Middle with both Starting and Ending character constraints. The proposed method addresses character-level infilling tasks by utilizing a line-level format to avoid predicting any sub-token in inference. In addition, we incorporate two special tokens to signify the rest of the incomplete lines, thereby enhancing generation guidance. Extensive experiments demonstrate that our proposed approach surpasses previous methods, offering a significant advantage. Code is available at https://github.com/SenseLLM/FIM-SE.

Updated: 2024-05-27 12:21:48

标题: 通过消除子标记,增强字符级文本填充

摘要: 在填充任务中,通常在前缀、中间和后缀的边界处出现表示将完整标记分割为两部分的次标记。传统方法侧重于在标记级别训练模型,在推理阶段导致字符级填充任务性能不佳。另一方面,一些方法考虑了字符级填充,但它们依赖于在推理中预测次标记,然而这种策略由于模型对次标记的大困惑而降低了字符级填充任务的能力。在本文中,我们介绍了FIM-SE,它代表Fill-In-the-Middle,并同时具有起始和结束字符约束。所提出的方法通过利用一种行级格式来解决字符级填充任务,从而避免在推理中预测任何次标记。此外,我们还引入了两个特殊标记来表示不完整行的其余部分,从而增强了生成指导。大量实验证明我们提出的方法超越了先前的方法,提供了显著优势。代码可在https://github.com/SenseLLM/FIM-SE找到。

更新时间: 2024-05-27 12:21:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17103v1

SoK: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This lack of comprehensive analysis presents a challenge for VCS designers in fully understanding and mitigating the security issues within these systems. Addressing this gap, our study introduces a hierarchical model structure for VCS, providing a novel lens for categorizing and analyzing existing literature in a systematic manner. We classify attacks based on their technical principles and thoroughly evaluate various attributes, such as their methods, targets, vectors, and behaviors. Furthermore, we consolidate and assess the defense mechanisms proposed in current research, offering actionable recommendations for enhancing VCS security. Our work makes a significant contribution by simplifying the complexity inherent in VCS security, aiding designers in effectively identifying and countering potential threats, and setting a foundation for future advancements in VCS security research.

Updated: 2024-05-27 12:18:46

标题: SoK:语音控制系统的全面安全概述、挑战和未来发展方向

摘要: 语音控制系统(VCS)的整合进智能设备以及它们在日常生活中不断增加的存在,突显了它们的安全性的重要性。当前的研究已经揭示了VCS中的许多漏洞,给用户的隐私和安全带来了重大风险。然而,对这些漏洞以及相应解决方案进行一致和系统的审查仍然缺乏。这种缺乏全面分析的情况对VCS设计者在完全理解和减轻这些系统内的安全问题方面构成了挑战。 针对这一差距,我们的研究引入了一个VCS的层次模型结构,为对现有文献进行分类和分析提供了一种新颖的视角。我们根据攻击的技术原则对其进行分类,并彻底评估各种属性,例如方法、目标、向量和行为。此外,我们整合和评估了当前研究中提出的防御机制,提供了增强VCS安全性的可行建议。我们的工作通过简化VCS安全性中固有的复杂性,帮助设计者有效地识别和对抗潜在威胁,并为VCS安全性研究的未来进展奠定了基础,做出了重要贡献。

更新时间: 2024-05-27 12:18:46

领域: cs.CR,cs.SD,eess.AS

下载: http://arxiv.org/abs/2405.17100v1

Guiding Enumerative Program Synthesis with Large Language Models

Pre-trained Large Language Models (LLMs) are beginning to dominate the discourse around automatic code generation with natural language specifications. In contrast, the best-performing synthesizers in the domain of formal synthesis with precise logical specifications are still based on enumerative algorithms. In this paper, we evaluate the abilities of LLMs to solve formal synthesis benchmarks by carefully crafting a library of prompts for the domain. When one-shot synthesis fails, we propose a novel enumerative synthesis algorithm, which integrates calls to an LLM into a weighted probabilistic search. This allows the synthesizer to provide the LLM with information about the progress of the enumerator, and the LLM to provide the enumerator with syntactic guidance in an iterative loop. We evaluate our techniques on benchmarks from the Syntax-Guided Synthesis (SyGuS) competition. We find that GPT-3.5 as a stand-alone tool for formal synthesis is easily outperformed by state-of-the-art formal synthesis algorithms, but our approach integrating the LLM into an enumerative synthesis algorithm shows significant performance gains over both the LLM and the enumerative synthesizer alone and the winning SyGuS competition tool.

Updated: 2024-05-27 12:18:40

标题: 使用大型语言模型引导枚举式程序合成

摘要: 预训练的大型语言模型(LLMs)开始主导关于使用自然语言规范自动生成代码的讨论。相比之下,在具有精确逻辑规范的形式综合领域中,表现最佳的综合器仍然基于枚举算法。本文评估了LLMs解决形式综合基准的能力,通过精心设计一个领域提示库。当一次性综合失败时,我们提出了一种新颖的枚举综合算法,将LLM的调用集成到加权概率搜索中。这使得综合器能够向LLM提供有关枚举器进展的信息,而LLM则向枚举器提供语法指导,形成迭代循环。我们在来自语法引导综合(SyGuS)竞赛的基准上评估了我们的技术。我们发现,作为独立工具进行形式综合的GPT-3.5很容易被最先进的形式综合算法超越,但我们的方法将LLM集成到枚举综合算法中,显示出明显的性能增益,超过了单独的LLM和枚举综合器,以及获胜的SyGuS竞赛工具。

更新时间: 2024-05-27 12:18:40

领域: cs.AI

下载: http://arxiv.org/abs/2403.03997v2

Improving Token-Based World Models with Parallel Observation Prediction

Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observations results in a severe bottleneck, leading to long training times, poor GPU utilization, and limited representations. To resolve this bottleneck, we devise a novel Parallel Observation Prediction (POP) mechanism. POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting. We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs. REM attains superhuman performance on 12 out of 26 games of the Atari 100K benchmark, while training in less than 12 hours. Our code is available at https://github.com/leor-c/REM.

Updated: 2024-05-27 12:18:18

标题: 通过并行观测预测改进基于令牌的世界模型

摘要: 受到将Transformer应用于离散符号序列取得成功的启发,最近提出了基于标记的世界模型(TBWMs)作为样本高效的方法。在TBWMs中,世界模型将代理经验作为类似语言的令牌序列消耗,其中每个观察构成子序列。然而,在想象过程中,按顺序逐个生成下一个观察结果导致严重瓶颈,导致训练时间长、GPU利用率低且表示受限。为了解决这一瓶颈,我们设计了一种新颖的并行观察预测(POP)机制。POP为保留网络(RetNet)增加了一种针对我们的强化学习设定定制的新型前向模式。我们将POP结合到一个名为REM(保留环境模型)的新型TBWM代理中,展示了比以前的TBWMs快15.4倍的想象速度。REM在Atari 100K基准测试的26个游戏中的12个游戏中获得了超人类表现,而在不到12小时内完成训练。我们的代码可在https://github.com/leor-c/REM上找到。

更新时间: 2024-05-27 12:18:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.05643v3

Efficient Model Compression for Hierarchical Federated Learning

Federated learning (FL), as an emerging collaborative learning paradigm, has garnered significant attention due to its capacity to preserve privacy within distributed learning systems. In these systems, clients collaboratively train a unified neural network model using their local datasets and share model parameters rather than raw data, enhancing privacy. Predominantly, FL systems are designed for mobile and edge computing environments where training typically occurs over wireless networks. Consequently, as model sizes increase, the conventional FL frameworks increasingly consume substantial communication resources. To address this challenge and improve communication efficiency, this paper introduces a novel hierarchical FL framework that integrates the benefits of clustered FL and model compression. We present an adaptive clustering algorithm that identifies a core client and dynamically organizes clients into clusters. Furthermore, to enhance transmission efficiency, each core client implements a local aggregation with compression (LC aggregation) algorithm after collecting compressed models from other clients within the same cluster. Simulation results affirm that our proposed algorithms not only maintain comparable predictive accuracy but also significantly reduce energy consumption relative to existing FL mechanisms.
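
A minimal sketch of the cluster-level idea, assuming top-k magnitude sparsification as the compression operator (the abstract does not specify one, so compress_topk, lc_aggregate, and the ratio below are illustrative placeholders, not the paper's LC aggregation algorithm):

```python
import numpy as np

def compress_topk(update, ratio=0.1):
    """Keep only the largest-magnitude fraction of entries: a common
    compression choice, standing in for whatever operator the paper uses."""
    k = max(1, int(ratio * update.size))
    flat = update.ravel().copy()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(update.shape)

def lc_aggregate(core_update, member_updates, ratio=0.1):
    """Core client aggregates compressed updates from its cluster members."""
    compressed = [compress_topk(u, ratio) for u in member_updates]
    return (core_update + sum(compressed)) / (1 + len(member_updates))

rng = np.random.default_rng(0)
members = [rng.normal(size=(4, 4)) for _ in range(3)]
core = rng.normal(size=(4, 4))
print(lc_aggregate(core, members))
```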

Updated: 2024-05-27 12:17:47

标题: 分层联邦学习的高效模型压缩

摘要: 分布式学习系统中,联邦学习(FL)作为一种新兴的协作学习范式,因其在保护隐私方面的能力而受到重视。在这些系统中,客户端通过使用本地数据集共同训练一个统一的神经网络模型,并共享模型参数,而不是原始数据,从而增强隐私保护。主要情况下,FL系统设计用于移动和边缘计算环境,训练通常在无线网络上进行。因此,随着模型大小的增加,传统的FL框架会越来越消耗大量的通信资源。为了解决这一挑战并提高通信效率,本文引入了一个集成了聚类FL和模型压缩优势的新型分层FL框架。我们提出了一种自适应聚类算法,可以识别核心客户端并动态将客户端组织成簇。此外,为了增强传输效率,每个核心客户端在从同一簇中的其他客户端收集压缩模型后,实施了本地聚合和压缩(LC聚合)算法。模拟结果证实,我们提出的算法不仅保持了可比较的预测准确性,而且相对于现有的FL机制,显著减少了能量消耗。

更新时间: 2024-05-27 12:17:47

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.17522v1

Causal Temporal Regime Structure Learning

We address the challenge of structure learning from multivariate time series that are characterized by a sequence of different, unknown regimes. We introduce a new optimization-based method (CASTOR) that concurrently learns the Directed Acyclic Graph (DAG) for each regime and determines the number of regimes along with their sequential arrangement. Through the optimization of a score function via an expectation maximization (EM) algorithm, CASTOR alternates between learning the regime indices (Expectation step) and inferring causal relationships in each regime (Maximization step). We further prove the identifiability of regimes and DAGs within the CASTOR framework. We conduct extensive experiments and show that our method consistently outperforms causal discovery models across various settings (linear and nonlinear causal relationships) and datasets (synthetic and real data).

Updated: 2024-05-27 12:15:52

标题: 因果时间制度结构学习

摘要: 我们解决了从具有一系列不同未知机制的多变量时间序列中学习结构的挑战。我们引入了一种新的基于优化的方法(CASTOR),该方法同时学习每个机制的有向无环图(DAG)并确定机制的数量及其顺序排列。通过通过期望最大化(EM)算法优化得分函数,CASTOR在学习机制指数(期望步骤)和推断每个机制中的因果关系(最大化步骤)之间交替进行。我们进一步证明了CASTOR框架内机制和DAG的可识别性。我们进行了大量实验,并显示我们的方法在各种设置(线性和非线性因果关系)和数据集(合成和真实数据)上均始终优于因果发现模型。

更新时间: 2024-05-27 12:15:52

领域: cs.LG,cs.AI,stat.ME

下载: http://arxiv.org/abs/2311.01412v2

Q-value Regularized Transformer for Offline Reinforcement Learning

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns within individual trajectories and the optimal returns across multiple trajectories. Fortunately, Dynamic Programming (DP) methods offer a solution by leveraging a value function to approximate optimal future returns for each state, while these techniques are prone to unstable learning behaviors, particularly in long-horizon and sparse-reward scenarios. Building upon these insights, we propose the Q-value regularized Transformer (QT), which combines the trajectory modeling ability of the Transformer with the predictability of optimal future returns from DP methods. QT learns an action-value function and integrates a term maximizing action-values into the training loss of CSM, which aims to seek optimal actions that align closely with the behavior policy. Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting the potential of QT to enhance the state-of-the-art in offline RL.

Updated: 2024-05-27 12:12:39

标题: Q值正则化变换器用于离线强化学习

摘要: 最近离线强化学习(RL)领域的进展突显了条件序列建模(CSM)的能力,这是一种基于历史轨迹和每个状态的目标回报学习动作分布的范式。然而,这些方法通常难以将次优轨迹与最优轨迹结合起来,因为单个轨迹中采样回报与多个轨迹中的最优回报之间存在不一致性。幸运的是,动态规划(DP)方法通过利用值函数来逼近每个状态的最优未来回报提供了一种解决方案,尽管这些技术在长时间跨度和稀疏奖励场景中容易出现不稳定的学习行为。基于这些见解,我们提出了Q值正则化变压器(QT),它将变压器的轨迹建模能力与DP方法中的最优未来回报的可预测性结合起来。QT学习一个动作值函数,并将最大化动作值的项整合到CSM的训练损失中,旨在寻找与行为策略密切一致的最优行动。在D4RL基准数据集上的实证评估显示了QT相对于传统DP和CSM方法的优越性,突显了QT在离线RL领域的潜力。

更新时间: 2024-05-27 12:12:39

领域: cs.LG

下载: http://arxiv.org/abs/2405.17098v1

Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation

While a number of promising uncertainty quantification methods have been proposed to address the prevailing shortcomings of deep neural networks like overconfidence and lack of explainability, quantifying predictive uncertainties in the context of joint semantic segmentation and monocular depth estimation has not been explored yet. Since many real-world applications are multi-modal in nature and, hence, have the potential to benefit from multi-task learning, this is a substantial gap in current literature. To this end, we conduct a comprehensive series of experiments to study how multi-task learning influences the quality of uncertainty estimates in comparison to solving both tasks separately.

Updated: 2024-05-27 12:12:26

标题: 评估联合语义分割和单目深度估计中的多任务不确定性

摘要: 尽管已经提出了许多有希望的不确定性量化方法来解决深度神经网络存在的问题,如过度自信和缺乏可解释性,但在联合语义分割和单目深度估计的背景下量化预测不确定性尚未被探讨。由于许多现实世界的应用是多模态的,并且因此有潜力从多任务学习中受益,这是当前文献中的一个重大空白。为此,我们进行了一系列全面的实验,研究多任务学习如何影响不确定性估计质量,与分别解决两个任务相比。

更新时间: 2024-05-27 12:12:26

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17097v1

Dual feature reduction for the sparse-group lasso and its adaptive variant

The sparse-group lasso performs both variable and group selection, making simultaneous use of the strengths of the lasso and group lasso. It has found widespread use in genetics, a field that regularly involves the analysis of high-dimensional data, due to its sparse-group penalty, which allows it to utilize grouping information. However, the sparse-group lasso can be computationally more expensive than both the lasso and group lasso, due to the added shrinkage complexity, and its additional hyper-parameter that needs tuning. In this paper, a novel feature reduction method, Dual Feature Reduction (DFR), is presented that uses strong screening rules for the sparse-group lasso and the adaptive sparse-group lasso to reduce their input space before optimization. DFR applies two layers of screening and is based on the dual norms of the sparse-group lasso and adaptive sparse-group lasso. Through synthetic and real numerical studies, it is shown that the proposed feature reduction approach is able to drastically reduce the computational cost in many different scenarios.
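
To make the objects concrete, here is a minimal sketch of the sparse-group lasso penalty together with a correlation-based group screening check in the spirit of strong rules; DFR's exact dual-norm rules and its two screening layers are not reproduced, and all function names and thresholds are illustrative assumptions:

```python
import numpy as np

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse-group lasso penalty: alpha weights the lasso part,
    (1 - alpha) the group-lasso part."""
    lasso = np.sum(np.abs(beta))
    group = sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
    return lam * (alpha * lasso + (1 - alpha) * group)

def screen_groups(X, y, groups, lam, alpha):
    """Discard groups whose correlation with the response is too small for
    the group to enter the model at this lambda (an illustrative check)."""
    kept = []
    for g in groups:
        corr = X[:, g].T @ y
        # soft-threshold the per-feature part, then test the group norm
        thresh = np.maximum(np.abs(corr) - lam * alpha, 0.0)
        if np.linalg.norm(thresh) > lam * (1 - alpha) * np.sqrt(len(g)):
            kept.append(g)
    return kept

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 12)), rng.normal(size=100)
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
print(screen_groups(X, y, groups, lam=5.0, alpha=0.5))
```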

Updated: 2024-05-27 12:10:07

标题: 双特征降维在稀疏组Lasso及其自适应变体中的应用

摘要: 稀疏组套索同时进行变量选择和组选择,兼得套索和组套索的优势。由于其稀疏组惩罚能够利用分组信息,它在经常需要分析高维数据的遗传学领域得到了广泛应用。然而,由于额外的收缩复杂性以及需要调节的额外超参数,稀疏组套索的计算开销可能高于套索和组套索。本文提出了一种新颖的特征缩减方法Dual Feature Reduction(DFR),它对稀疏组套索和自适应稀疏组套索使用强筛选规则,在优化之前缩减它们的输入空间。DFR应用两层筛选,并基于稀疏组套索和自适应稀疏组套索的对偶范数。通过合成与真实数据的数值研究表明,所提出的特征缩减方法能够在许多不同场景下大幅降低计算成本。

更新时间: 2024-05-27 12:10:07

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.17094v1

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

Updated: 2024-05-27 12:09:57

标题: 超越静态人工智能评估:推进LLM(大规模语言模型)伤害和风险的人类交互评估

摘要: 模型评估是理解人工智能系统的安全性、风险和社会影响的核心。虽然大多数现实世界的人工智能应用涉及人机交互,但大多数当前的评估(例如常见基准)不涉及人类因素。相反,它们以有限的方式纳入人类因素,评估模型的安全性,从而未能捕捉人模型交互的复杂性。在本文中,我们讨论并操作化了一种新兴评估类别的定义 - “人机交互评估”(HIEs),重点关注人与模型的交互评估或人类使用模型的过程和结果。首先,我们认为HIEs可以用于增加安全评估的有效性,评估直接的人类影响和交互特定的伤害,并指导未来对模型社会影响的评估。其次,我们提出了一个以安全为重点的HIE设计框架 - 包含人-LLM交互分类法 - 包括三个阶段:(1)确定风险或伤害领域,(2)表征使用环境,(3)选择评估参数。第三,我们将我们的框架应用于过度依赖和说服风险的两种潜在评估。最后,我们总结了针对HIE成本、可复制性和不代表性的担忧提出了切实的建议。

更新时间: 2024-05-27 12:09:57

领域: cs.CY,cs.AI,cs.HC

下载: http://arxiv.org/abs/2405.10632v3

Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

Deep neural networks are applied in more and more areas of everyday life. However, they still lack essential abilities, such as robustly dealing with spatially transformed input signals. Approaches to mitigate this severe robustness issue are limited to two pathways: Either models are implicitly regularised by increased sample variability (data augmentation) or explicitly constrained by hard-coded inductive biases. The limiting factor of the former is the size of the data space, which renders sufficient sample coverage intractable. The latter is limited by the engineering effort required to develop such inductive biases for every possible scenario. Instead, we take inspiration from human behaviour, where percepts are modified by mental or physical actions during inference. We propose a novel technique to emulate such an inference process for neural nets. This is achieved by traversing a sparsified inverse transformation tree during inference using parallel energy-based evaluations. Our proposed inference algorithm, called Inverse Transformation Search (ITS), is model-agnostic and equips the model with zero-shot pseudo-invariance to spatially transformed inputs. We evaluated our method on several benchmark datasets, including a synthesised ImageNet test set. ITS outperforms the utilised baselines on all zero-shot test scenarios.
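
The paper's Inverse Transformation Search traverses a sparsified inverse transformation tree with parallel energy-based evaluations; the sketch below flattens that to a simple loop over candidate rotations and uses the negative maximum logit as a stand-in energy, so it should be read as a conceptual illustration under those assumptions rather than the actual algorithm:

```python
import torch
from torchvision.transforms import functional as TF

# Dummy classifier; any pretrained model would take its place.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

def inverse_transform_search(image, angles=(-30, -15, 0, 15, 30)):
    """Evaluate the classifier on several candidate 'un-transformed' inputs
    and keep the prediction with the lowest energy."""
    best_logits, best_energy = None, float("inf")
    for a in angles:
        x = TF.rotate(image, a).unsqueeze(0)   # hypothesize an inverse tilt
        logits = model(x)
        energy = -logits.max().item()          # confidence proxy as energy
        if energy < best_energy:
            best_logits, best_energy = logits, energy
    return best_logits

img = torch.randn(3, 32, 32)
print(inverse_transform_search(img).shape)  # torch.Size([1, 10])
```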

Updated: 2024-05-27 12:09:08

标题: 倾斜你的头:激活分类器的隐藏空间不变性

摘要: 深度神经网络在日常生活的更多领域得到应用。然而,它们仍然缺乏一些关键能力,比如在处理空间转换输入信号时具有稳健性。缓解这个严重稳健性问题的方法有限,要么通过增加样本变异性(数据增强)来隐式正则化模型,要么通过硬编码的归纳偏见来显式约束模型。前者的限制因素是数据空间的大小,使得足够的样本覆盖变得困难。后者受到为每种可能情况开发这种归纳偏见所需的工程工作的限制。相反,我们从人类行为中汲取灵感,在推断过程中通过心理或物理行为修改感知。我们提出了一种新颖的技术来模拟神经网络的这种推断过程。这是通过在推断过程中使用并行能量评估来遍历稀疏化的逆转换树来实现的。我们提出的推断算法称为逆转换搜索(ITS),与模型无关,并为模型提供了对空间转换输入的零样本伪不变性。我们在几个基准数据集上评估了我们的方法,包括一个合成的ImageNet测试集。在所有零样本测试场景中,ITS都优于所使用的基线。

更新时间: 2024-05-27 12:09:08

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.03730v2

Phase Transitions in the Output Distribution of Large Language Models

In a physical system, changing parameters such as temperature can induce a phase transition: an abrupt change from one state of matter to another. Analogous phenomena have recently been observed in large language models. Typically, the task of identifying phase transitions requires human analysis and some prior understanding of the system to narrow down which low-dimensional properties to monitor and analyze. Statistical methods for the automated detection of phase transitions from data have recently been proposed within the physics community. These methods are largely system agnostic and, as shown here, can be adapted to study the behavior of large language models. In particular, we quantify distributional changes in the generated output via statistical distances, which can be efficiently estimated with access to the probability distribution over next-tokens. This versatile approach is capable of discovering new phases of behavior and unexplored transitions -- an ability that is particularly exciting in light of the rapid development of language models and their emergent capabilities.
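
A minimal sketch of the core measurement, under illustrative assumptions (GPT-2 as the model, sampling temperature as the control parameter, Hellinger distance as the statistical distance): compute the next-token distribution at several parameter values and look for abrupt jumps in successive distances.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_dist(prompt, temperature):
    """Probability distribution over the next token at a given temperature."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1] / temperature
    return torch.softmax(logits, dim=-1)

def hellinger(p, q):
    return torch.sqrt(0.5 * torch.sum((torch.sqrt(p) - torch.sqrt(q)) ** 2))

prompt = "The weather today is"
temps = [0.5, 0.75, 1.0, 1.25, 1.5]
dists = [next_token_dist(prompt, t) for t in temps]
# A sharp peak in successive distances would hint at a phase transition.
for i in range(len(temps) - 1):
    d = hellinger(dists[i], dists[i + 1])
    print(f"H(T={temps[i]}, T={temps[i+1]}) = {d:.4f}")
```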

Updated: 2024-05-27 12:04:36

标题: 大型语言模型输出分布的相变

摘要: 在一个物理系统中,改变参数如温度可以引发相变:物质从一种状态突然转变为另一种状态。类似的现象最近在大型语言模型中被观察到。通常,识别相变的任务需要人类分析和对系统的一些先前了解,以缩小需要监测和分析的低维属性范围。最近物理学界提出了用于从数据中自动检测相变的统计方法。这些方法在很大程度上不受系统影响,如本文所示,可以被调整用来研究大型语言模型的行为。具体而言,我们通过统计距离量化生成输出中的分布变化,这些统计距离可以通过访问下一个令牌的概率分布来有效地估计。这种多功能的方法能够发现新的行为相和未被探索的转变--这一能力在考虑到语言模型的快速发展和它们新兴的能力时特别令人兴奋。

更新时间: 2024-05-27 12:04:36

领域: cs.LG,cond-mat.stat-mech,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.17088v1

Effective Layer Pruning Through Similarity Metric Perspective

Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, are restricted by a high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction, most efforts focus on removing weights or filters. Studies have also been devoted to layer pruning as it promotes superior computational gains. However, layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates. This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods. Our method estimates the relative importance of a layer using the Centered Kernel Alignment (CKA) metric, employed to measure the similarity between the representations of the unpruned model and a candidate layer for pruning. We confirm the effectiveness of our method on standard architectures and benchmarks, in which it outperforms existing layer-pruning strategies and other state-of-the-art pruning techniques. Particularly, we remove more than 75% of computation while improving predictive ability. At higher compression regimes, our method exhibits negligible accuracy drop, while other methods notably deteriorate model accuracy. Apart from these benefits, our pruned models exhibit robustness to adversarial and out-of-distribution samples.
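
Linear CKA has a simple closed form, and the abstract's criterion can be sketched as scoring each candidate layer by how similar the pruned network's representation stays to the unpruned one. The ranking loop below is an illustrative reading of that idea, not the paper's exact procedure:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices
    of shape (n_samples, n_features); higher means more similar."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))

def rank_layers(unpruned_acts, pruned_acts_per_layer):
    """Rank candidate layers: removing a layer that barely changes the
    representation (high CKA with the unpruned model) costs little."""
    scores = {name: linear_cka(unpruned_acts, acts)
              for name, acts in pruned_acts_per_layer.items()}
    return sorted(scores, key=scores.get, reverse=True)
```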

Updated: 2024-05-27 11:54:51

标题: 相似性度量视角下的有效层剪枝

摘要: 深度神经网络一直是解决认知任务的机器学习中的主导范式。然而,这些模型受到高计算开销的限制,限制了它们的适用性并阻碍了该领域的进展。大量研究表明,从这些模型中修剪结构是减少网络复杂性的一种简单方法。在这方面,大多数工作集中在去除权重或滤波器上。研究还致力于层修剪,因为它促进了更好的计算收益。然而,层修剪通常在高压缩率下损害网络的预测能力(即准确性)。本研究介绍了一种有效的层修剪策略,满足修剪方法追求的所有基本属性。我们的方法使用中心核对齐(CKA)度量来估计层的相对重要性,用于衡量未修剪模型和候选层之间的表示之间的相似性。我们在标准架构和基准测试中证实了我们方法的有效性,在这些测试中,它优于现有的层修剪策略和其他最先进的修剪技术。特别是,我们在提高预测能力的同时减少了超过75%的计算量。在更高的压缩范围内,我们的方法表现出几乎可以忽略的准确性下降,而其他方法明显降低了模型的准确性。除了这些好处,我们修剪的模型对对抗性和超出分布样本表现出鲁棒性。

更新时间: 2024-05-27 11:54:51

领域: cs.LG

下载: http://arxiv.org/abs/2405.17081v1

Unified Hallucination Detection for Multimodal Large Language Models

Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.

Updated: 2024-05-27 11:52:56

标题: 多模态大型语言模型统一幻觉检测

摘要: 尽管在多模态任务方面取得了显著进展,但多模态大型语言模型(MLLMs)仍然受到幻觉的严重问题困扰。因此,在MLLMs中可靠地检测这种幻觉已成为模型评估和实际应用部署的重要方面。在这个领域的先前研究受到了对单一任务的狭窄关注、对幻觉类别的不足涉及以及缺乏详细细致的限制。针对这些挑战,我们的工作扩展了幻觉检测的调查视野。我们提出了一个新颖的元评估基准,MHaluBench,精心设计以促进幻觉检测方法的进展评估。此外,我们揭示了一个新颖的统一多模态幻觉检测框架,UNIHD,利用一套辅助工具来强大地验证幻觉的发生。我们通过细致的评估和全面的分析证明了UNIHD的有效性。我们还提供了关于应用特定工具来解决各种幻觉类别的战略见解。

更新时间: 2024-05-27 11:52:56

领域: cs.CL,cs.AI,cs.IR,cs.LG,cs.MM

下载: http://arxiv.org/abs/2402.03190v4

Learning with User-Level Local Differential Privacy

User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially different. In this paper, we first analyze the mean estimation problem and then apply it to stochastic optimization, classification, and regression. In particular, we propose adaptive strategies to achieve optimal performance at all privacy levels. Moreover, we also obtain information-theoretic lower bounds, which show that the proposed methods are minimax optimal up to logarithmic factors. Unlike the central DP model, where user-level DP always leads to slower convergence, our result shows that under the local model, the convergence rates are nearly the same between user-level and item-level cases for distributions with bounded support. For heavy-tailed distributions, the user-level rate is even faster than the item-level one.
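
For the mean estimation problem, a minimal user-level LDP baseline is the Laplace mechanism applied to each user's local average; this is a standard primitive, shown here only to make the setting concrete (the paper's adaptive strategies are more refined, and the clipping range below is an assumption):

```python
import numpy as np

def ldp_user_mean(user_samples, eps, lo=0.0, hi=1.0):
    """Each user averages their own items, clips to [lo, hi], and adds
    Laplace noise locally so only a privatized value leaves the device."""
    local_mean = np.clip(np.mean(user_samples), lo, hi)
    sensitivity = hi - lo          # range of the clipped local mean
    noise = np.random.laplace(scale=sensitivity / eps)
    return local_mean + noise

rng = np.random.default_rng(1)
users = [rng.uniform(size=20) for _ in range(1000)]  # 20 items per user
reports = [ldp_user_mean(u, eps=1.0) for u in users]
print("estimated mean:", np.mean(reports))  # true mean is 0.5
```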

Updated: 2024-05-27 11:52:24

标题: 学习用户级本地差分隐私

摘要: 用户级隐私在分布式系统中至关重要。先前的研究主要集中在中心模型上,而本地模型却受到了更少的关注。在中心模型下,用户级差分隐私严格强于项目级差分隐私。然而,在本地模型下,用户级和项目级局部差分隐私之间的关系变得更加复杂,因此分析至关重要。在本文中,我们首先分析了均值估计问题,然后将其应用于随机优化、分类和回归。特别地,我们提出了自适应策略,以实现在所有隐私级别上的最佳性能。此外,我们还获得了信息论下界,表明所提出的方法在对数因子上是极小化最优的。与中心差分隐私模型不同,在那里用户级差分隐私总是导致收敛速度较慢,我们的结果表明,在本地模型下,对于支持受限的分布,用户级和项目级情况之间的收敛速度几乎相同。对于重尾分布,用户级速度甚至比项目级更快。

更新时间: 2024-05-27 11:52:24

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.17079v1

Leveraging small language models for Text2SPARQL tasks to improve the resilience of AI assistance

In this work we will show that language models with less than one billion parameters can be used to translate natural language to SPARQL queries after fine-tuning. Using three different datasets ranging from academic to real world, we identify prerequisites that the training data must fulfill in order for the training to be successful. The goal is to empower users of semantic web technology to use AI assistance with affordable commodity hardware, making them more resilient against external factors.

Updated: 2024-05-27 11:47:21

标题: 将小型语言模型用于Text2SPARQL任务以提高AI辅助的弹性

摘要: 在这项工作中,我们将展示拥有不到十亿参数的语言模型经过微调后可以用于将自然语言翻译成SPARQL查询。通过使用从学术到真实世界的三个不同数据集,我们确定了训练数据必须满足的先决条件,以确保训练成功。我们的目标是让语义网络技术的用户能够利用价格实惠的通用硬件获得人工智能辅助,使他们更具抗干扰能力。

更新时间: 2024-05-27 11:47:21

领域: cs.AI,cs.CL,cs.IR

下载: http://arxiv.org/abs/2405.17076v1

Interaction-Force Transport Gradient Flows

This paper presents a new type of gradient flow geometries over non-negative and probability measures motivated via a principled construction that combines the optimal transport and interaction forces modeled by reproducing kernels. Concretely, we propose the interaction-force transport (IFT) gradient flows and its spherical variant via an infimal convolution of the Wasserstein and spherical MMD Riemannian metric tensors. We then develop a particle-based optimization algorithm based on the JKO-splitting scheme of the mass-preserving spherical IFT gradient flows. Finally, we provide both theoretical global exponential convergence guarantees and empirical simulation results for applying the IFT gradient flows to the sampling task of MMD-minimization studied by Arbel et al. [2019]. Furthermore, we prove that the spherical IFT gradient flow enjoys the best of both worlds by providing the global exponential convergence guarantee for both the MMD and KL energy.
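
To make the sampling task concrete, here is a plain MMD particle gradient flow in the spirit of Arbel et al. [2019], the task the abstract targets; the interaction-force and mass-preserving spherical structure of IFT itself is not implemented, and the kernel, step size, and particle counts are arbitrary choices:

```python
import numpy as np

def rbf_grad_x(x, y, sigma=1.0):
    """Gradient of the Gaussian kernel k(x, y) with respect to x."""
    diff = x - y
    return -diff / sigma**2 * np.exp(-np.sum(diff**2) / (2 * sigma**2))

def mmd_flow_step(particles, target, lr=0.5):
    """One explicit Euler step of the MMD gradient flow: each particle
    descends the witness-function gradient against the target sample."""
    new = particles.copy()
    for i, x in enumerate(particles):
        grad = (sum(rbf_grad_x(x, y) for y in particles) / len(particles)
                - sum(rbf_grad_x(x, z) for z in target) / len(target))
        new[i] = x - lr * grad
    return new

rng = np.random.default_rng(0)
target = rng.normal(loc=3.0, size=(200, 2))
parts = rng.normal(loc=0.0, size=(50, 2))
for _ in range(100):
    parts = mmd_flow_step(parts, target)
print("particle mean ~", parts.mean(axis=0))  # should drift toward 3
```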

Updated: 2024-05-27 11:46:14

标题: 相互作用力传输梯度流

摘要: 这篇论文介绍了一种定义在非负测度和概率测度上的新型梯度流几何结构,其构造有原则地结合了最优输运与由再生核建模的相互作用力。具体而言,我们通过Wasserstein与球形MMD黎曼度量张量的下确界卷积(infimal convolution),提出了相互作用力输运(IFT)梯度流及其球形变体。然后,我们基于保持质量的球形IFT梯度流的JKO分裂方案,开发了一种基于粒子的优化算法。最后,对于将IFT梯度流应用于Arbel等人[2019]研究的MMD最小化抽样任务,我们提供了理论上的全局指数收敛保证和经验模拟结果。此外,我们证明球形IFT梯度流兼具两者之长,对MMD能量和KL能量均提供全局指数收敛保证。

更新时间: 2024-05-27 11:46:14

领域: cs.LG,math.AP,stat.ML

下载: http://arxiv.org/abs/2405.17075v1

From Sparse to Soft Mixtures of Experts

Sparse mixture of expert architectures (MoEs) scale model capacity without significant increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to scale the number of experts, or ineffective finetuning. In this work, we propose Soft MoE, a fully-differentiable sparse Transformer that addresses these challenges, while maintaining the benefits of MoEs. Soft MoE performs an implicit soft assignment by passing different weighted combinations of all input tokens to each expert. As in other MoEs, experts in Soft MoE only process a subset of the (combined) tokens, enabling larger model capacity (and performance) at lower inference cost. In the context of visual recognition, Soft MoE greatly outperforms dense Transformers (ViTs) and popular MoEs (Tokens Choice and Experts Choice). Furthermore, Soft MoE scales well: Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14, with only 2% increased inference time, and substantially better quality.
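
The soft dispatch/combine idea can be sketched compactly: a per-slot softmax over tokens builds the expert inputs, and a per-token softmax over slots mixes the outputs back. The single-matrix parameterization below is a simplified reading of the paper's design, with shapes chosen so slots divide evenly among experts:

```python
import torch

def soft_moe(x, phi, experts):
    """Soft MoE layer: every slot is a softmax-weighted mix of ALL input
    tokens, each expert processes its slots, and outputs are mixed back.
    x: (n_tokens, d); phi: (d, n_slots) learnable; experts: list of modules."""
    logits = x @ phi                            # (n_tokens, n_slots)
    dispatch = torch.softmax(logits, dim=0)     # normalize over tokens
    combine = torch.softmax(logits, dim=1)      # normalize over slots
    slots = dispatch.T @ x                      # (n_slots, d) soft inputs
    per_expert = slots.shape[0] // len(experts) # assumes even split
    outs = torch.cat([e(slots[i * per_expert:(i + 1) * per_expert])
                      for i, e in enumerate(experts)])
    return combine @ outs                       # (n_tokens, d)

d, n_tokens, n_experts, n_slots = 16, 10, 4, 8
x = torch.randn(n_tokens, d)
phi = torch.randn(d, n_slots)
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
print(soft_moe(x, phi, experts).shape)  # torch.Size([10, 16])
```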

Updated: 2024-05-27 11:44:51

标题: 从稀疏到软专家混合

摘要: 稀疏混合专家架构(MoEs)扩展了模型容量,而不会显著增加训练或推断成本。尽管取得了成功,MoEs仍然面临一些问题:训练不稳定、标记丢失、无法扩展专家数量或无效的微调。在这项工作中,我们提出了Soft MoE,这是一个完全可微的稀疏Transformer,可以解决这些挑战,同时保持MoEs的优点。Soft MoE通过将所有输入标记的不同加权组合传递给每个专家来执行隐式软分配。与其他MoEs一样,Soft MoE中的专家只处理(组合的)标记的子集,从而实现更大的模型容量(和性能),并降低推断成本。在视觉识别的背景下,Soft MoE远远优于密集Transformer(ViTs)和流行的MoEs(Tokens Choice和Experts Choice)。此外,Soft MoE具有良好的扩展性:具有128个专家在16个MoE层中的Soft MoE Huge/14比ViT Huge/14具有40倍以上的参数,仅增加了2%的推断时间,并且质量更好。

更新时间: 2024-05-27 11:44:51

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2308.00951v2

A novel framework for systematic propositional formula simplification based on existential graphs

This paper presents a novel simplification calculus for propositional logic derived from Peirce's existential graphs' rules of inference and implication graphs. Our rules can be applied to propositional logic formulae in nested form, are equivalence-preserving, guarantee a monotonically decreasing number of variables, clauses and literals, and maximise the preservation of structural problem information. Our techniques can also be seen as higher-level SAT preprocessing, and we show how one of our rules (TWSR) generalises and streamlines most of the known equivalence-preserving SAT preprocessing methods. In addition, we propose a simplification procedure based on the systematic application of two of our rules (EPR and TWSR) which is solver-agnostic and can be used to simplify large Boolean satisfiability problems and propositional formulae in arbitrary form, and we provide a formal analysis of its algorithmic complexity in terms of space and time. Finally, we show how our rules can be further extended with a novel n-ary implication graph to capture all known equivalence-preserving preprocessing procedures.

Updated: 2024-05-27 11:42:46

标题: 基于存在图的系统命题公式简化的新框架

摘要: 本文介绍了一种新颖的命题逻辑简化演算法,该演算法源自皮尔斯的存在图推理规则和蕴含图。我们的规则可应用于嵌套形式的命题逻辑公式,保持等价性,确保变量、子句和文字数量单调递减,并最大化保留结构性问题信息。我们的技术也可以看作是高级SAT预处理,我们展示了我们的一条规则(TWSR)如何泛化并简化大多数已知的保持等价性的SAT预处理方法。此外,我们提出了一种简化程序,基于我们的两条规则(EPR和TWSR)的系统应用,与求解器无关,可用于简化大型布尔可满足性问题和任意形式的命题公式,并在空间和时间方面提供了其算法复杂性的形式分析。最后,我们展示了如何进一步扩展我们的规则,使用一种新颖的n元蕴含图来捕捉所有已知的保持等价性的预处理程序。

更新时间: 2024-05-27 11:42:46

领域: cs.LO,cs.AI,math.LO,03B35, 03B70, 68N17, 68T27,F.4.1; I.2.2; I.2.3; I.2.4

下载: http://arxiv.org/abs/2405.17072v1

Efficient mid-term forecasting of hourly electricity load using generalized additive models

Accurate mid-term (weeks to one year) hourly electricity load forecasts are essential for strategic decision-making in power plant operation, ensuring supply security and grid stability, and energy trading. While numerous models effectively predict short-term (hours to a few days) hourly load, mid-term forecasting solutions remain scarce. In mid-term load forecasting, besides daily, weekly, and annual seasonal and autoregressive effects, capturing weather and holiday effects, as well as socio-economic non-stationarities in the data, poses significant modeling challenges. To address these challenges, we propose a novel forecasting method using Generalized Additive Models (GAMs) built from interpretable P-splines and enhanced with autoregressive post-processing. This model uses smoothed temperatures, Error-Trend-Seasonal (ETS) modeled non-stationary states, a nuanced representation of holiday effects with weekday variations, and seasonal information as input. The proposed model is evaluated on load data from 24 European countries. This analysis demonstrates that the model not only has significantly enhanced forecasting accuracy compared to state-of-the-art methods but also offers valuable insights into the influence of individual components on predicted load, given its full interpretability. Achieving performance akin to day-ahead TSO forecasts in fast computation times of a few seconds for several years of hourly data underscores the model's potential for practical application in the power system industry.
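
A minimal GAM of this flavour can be sketched with the pygam library (assuming it is installed); the synthetic features below (temperature, hour, day of year, weekday) are placeholders for the paper's richer inputs, and the ETS states, holiday treatment, and autoregressive post-processing are omitted:

```python
import numpy as np
from pygam import LinearGAM, s, f  # pip install pygam

rng = np.random.default_rng(0)
n = 8760  # one year of hourly data
temp = 10 + 10 * np.sin(2 * np.pi * np.arange(n) / 8760) + rng.normal(0, 2, n)
hour = np.arange(n) % 24
doy = (np.arange(n) // 24) % 365
wday = (np.arange(n) // 24) % 7
load = 50 - 1.2 * temp + 8 * np.sin(2 * np.pi * hour / 24) \
       + 3 * (wday < 5) + rng.normal(0, 1.5, n)

X = np.column_stack([temp, hour, doy, wday])
# P-spline smooths for the continuous drivers (cyclic for hour and day of
# year), plus a factor term for the weekday effect.
gam = LinearGAM(s(0) + s(1, basis="cp") + s(2, basis="cp") + f(3)).fit(X, load)
gam.summary()
```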

Updated: 2024-05-27 11:41:41

标题: 使用广义加性模型高效中期预测每小时电力负荷

摘要: 准确的中期(数周到一年)每小时电力负荷预测对于电厂运营的战略决策、确保供应安全和电网稳定以及能源交易至关重要。虽然许多模型能有效预测短期(几小时到几天)每小时负荷,中期预测解决方案仍然很少见。在中期负荷预测中,除了每日、每周和年度的季节性及自回归效应外,还要捕捉天气和节假日效应以及数据中的社会经济非平稳性,这带来了重大的建模挑战。为了解决这些挑战,我们提出了一种基于广义加性模型(GAMs)的新型预测方法,该模型由可解释的P样条构建,并辅以自回归后处理。该模型以平滑后的温度、用Error-Trend-Seasonal(ETS)建模的非平稳状态、区分工作日差异的细致节假日效应表示以及季节信息作为输入。所提出的模型在来自24个欧洲国家的负荷数据上进行了评估。分析表明,该模型不仅比最先进的方法具有显著更高的预测准确性,而且由于完全可解释,还能提供关于各个组成部分如何影响预测负荷的宝贵见解。该模型只需数秒即可完成对多年每小时数据的计算,并达到与日前TSO预测相当的性能,突显了其在电力系统行业中实际应用的潜力。

更新时间: 2024-05-27 11:41:41

领域: stat.AP,cs.LG,econ.GN,q-fin.EC,q-fin.ST

下载: http://arxiv.org/abs/2405.17070v1

Training-free Editioning of Text-to-Image Models

Inspired by the software industry's practice of offering different editions or versions of a product tailored to specific user groups or use cases, we propose a novel task, namely, training-free editioning, for text-to-image models. Specifically, we aim to create variations of a base text-to-image model without retraining, enabling the model to cater to the diverse needs of different user groups or to offer distinct features and functionalities. To achieve this, we propose that different editions of a given text-to-image model can be formulated as concept subspaces in the latent space of its text encoder (e.g., CLIP). In such a concept subspace, all points satisfy a specific user need (e.g., generating images of a cat lying on the grass/ground/falling leaves). Technically, we apply Principal Component Analysis (PCA) to obtain the desired concept subspaces from representative text embedding that correspond to a specific user need or requirement. Projecting the text embedding of a given prompt into these low-dimensional subspaces enables efficient model editioning without retraining. Intuitively, our proposed editioning paradigm enables a service provider to customize the base model into its "cat edition" (or other editions) that restricts image generation to cats, regardless of the user's prompt (e.g., dogs, people, etc.). This introduces a new dimension for product differentiation, targeted functionality, and pricing strategies, unlocking novel business models for text-to-image generators. Extensive experimental results demonstrate the validity of our approach and its potential to enable a wide range of customized text-to-image model editions across various domains and applications.
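
A concept subspace of this kind can be sketched with CLIP's text encoder and scikit-learn PCA; the idea of projecting via PCA transform/inverse-transform follows the abstract, while the specific checkpoint, prompt list, and subspace dimension are illustrative assumptions:

```python
import torch
from sklearn.decomposition import PCA
from transformers import CLIPTokenizer, CLIPTextModel

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def embed(prompts):
    batch = tok(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        return enc(**batch).pooler_output.numpy()   # (n, 512)

# Representative prompts describing the hypothetical "cat edition".
E = embed([
    "a cat lying on the grass",
    "a cat lying on the ground",
    "a cat lying on falling leaves",
])
pca = PCA(n_components=2).fit(E)   # the concept subspace

def editioned_embedding(prompt):
    """Project an arbitrary prompt into the concept subspace, confining
    generation to the edition regardless of what the user asked for."""
    z = embed([prompt])
    return pca.inverse_transform(pca.transform(z))

print(editioned_embedding("a dog running on the beach").shape)  # (1, 512)
```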

Updated: 2024-05-27 11:40:50

标题: 无需训练的文本到图像模型编辑

摘要: 受软件行业提供针对特定用户群体或用例定制不同版本或版本的做法启发,我们提出了一项新颖的任务,即无需训练的版本控制,用于文本到图像模型。具体而言,我们旨在创建基于基本文本到图像模型的变体,而无需重新训练,使模型能够满足不同用户群体的多样化需求或提供不同的功能和特性。为了实现这一目标,我们建议给定文本到图像模型的不同版本可以被构建为其文本编码器(例如CLIP)的潜在空间中的概念子空间。在这样的概念子空间中,所有点都满足特定用户需求(例如生成猫躺在草地/地面/落叶上的图像)。从代表性文本嵌入中应用主成分分析(PCA)来获得所需的概念子空间,这些嵌入对应于特定用户需求或要求。将给定提示的文本嵌入投影到这些低维子空间中,可以实现有效的模型版本控制而无需重新训练。直观地说,我们提出的版本控制范式使服务提供商能够将基本模型定制为其“猫版本”(或其他版本),限制图像生成为猫,而不管用户的提示是什么(例如狗、人等)。这为产品差异化、目标功能和定价策略引入了新的维度,解锁了文本到图像生成器的新商业模型。大量实验证明了我们方法的有效性,并显示了它在各个领域和应用中实现各种定制文本到图像模型版本的潜力。

更新时间: 2024-05-27 11:40:50

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.17069v1

The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models

Langevin Dynamics is a Stochastic Differential Equation (SDE) central to sampling and generative modeling and is implemented via time discretization. Langevin Monte Carlo (LMC), based on the Euler-Maruyama discretization, is the simplest and most studied algorithm. LMC can suffer from slow convergence, requiring a large number of steps of small step-size to obtain good quality samples. This becomes stark in the case of diffusion models, where a large number of steps gives the best samples but the quality degrades rapidly with a smaller number of steps. The Randomized Midpoint Method has recently been proposed as a better discretization of Langevin dynamics for sampling from strongly log-concave distributions. However, important applications such as diffusion models involve non-log-concave densities and contain time-varying drift. We propose its variant, the Poisson Midpoint Method, which approximates small step-size LMC using large step-sizes. We prove that this can obtain a quadratic speed up of LMC under very weak assumptions. We apply our method to diffusion models for image generation and show that it matches the quality of DDPM with 1000 neural network calls using just 50-80 calls and outperforms ODE-based methods with similar compute.
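
For reference, the Euler-Maruyama baseline that the abstract defines as LMC is a few lines; the Poisson midpoint variant itself is not reproduced here, and the target, step size, and step count are arbitrary illustrative choices:

```python
import numpy as np

def grad_log_p(x):
    """Score of a standard Gaussian target, log p(x) = -||x||^2 / 2."""
    return -x

def lmc(n_steps, h, dim, rng):
    """Langevin Monte Carlo with the Euler-Maruyama discretization, the
    baseline whose step-size bottleneck the Poisson midpoint method relaxes."""
    x = rng.normal(size=dim)
    for _ in range(n_steps):
        x = x + h * grad_log_p(x) + np.sqrt(2 * h) * rng.normal(size=dim)
    return x

rng = np.random.default_rng(0)
samples = np.array([lmc(1000, 0.01, 2, rng) for _ in range(500)])
print("sample cov ~ identity:\n", np.cov(samples.T))
```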

Updated: 2024-05-27 11:40:42

标题: Langevin动力学的泊松中点法:扩散模型的可证明高效离散化

摘要: Langevin动力学是一种在采样和生成建模中处于核心地位的随机微分方程(SDE),通过时间离散化来实现。基于Euler-Maruyama离散化的Langevin蒙特卡洛(LMC)是最简单、研究最充分的算法。LMC可能收敛缓慢:需要大量的小步长迭代才能获得高质量样本。这一问题在扩散模型中尤为突出:步数多时样本质量最佳,而步数减少时质量迅速下降。最近提出的随机中点法是对Langevin动力学更好的离散化,用于从强对数凹分布中采样。然而,扩散模型等重要应用涉及非对数凹密度,并包含时变漂移。我们提出其变体,即泊松中点法,用大步长近似小步长的LMC。我们证明,在非常弱的假设下,该方法可使LMC获得二次加速。我们将该方法应用于图像生成的扩散模型,结果表明它仅用50-80次神经网络调用即可保持使用1000次调用的DDPM的质量,并在相近计算量下优于基于ODE的方法。

更新时间: 2024-05-27 11:40:42

领域: cs.LG,cs.NA,math.NA,stat.ML

下载: http://arxiv.org/abs/2405.17068v1

Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization

Large Language Models (LLMs) have shown remarkable capabilities in language understanding and generation. Nonetheless, it has also been observed that LLMs tend to produce inaccurate responses to specific queries. This deficiency can be traced to the tokenization step LLMs must undergo, which is an inevitable limitation inherent to all LLMs. In fact, incorrect tokenization is the critical point that hinders LLMs in understanding the input precisely, thus leading to unsatisfactory output. To demonstrate this flaw of LLMs, we construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), which draws upon the vocabularies of various open-source LLMs to challenge LLMs' tokenization. ADT consists of two subsets: the manually constructed ADT-Human and the automatically generated ADT-Auto. Our empirical results reveal that our ADT is highly effective at challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, and Qwen2.5-max, thus degrading these LLMs' capabilities. Moreover, our method of automatic data generation has been proven efficient and robust, and can be applied to any open-source LLMs. To the best of our knowledge, our study is the first to investigate LLMs' vulnerability by challenging their token segmentation, which will shed light on subsequent research into improving LLMs' capabilities by optimizing their tokenization process and algorithms.
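
The premise is easy to inspect with any open-source tokenizer: print how a string is segmented and note where a complete word is split into sub-tokens. The GPT-2 tokenizer and example strings below are illustrative; the paper constructs its adversarial strings from each model's own vocabulary.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def show_segmentation(text):
    """Print how the tokenizer splits a string; the paper's premise is
    that unnatural splits at these boundaries degrade LLM answers."""
    ids = tok(text)["input_ids"]
    pieces = [tok.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces}")

show_segmentation("unhappiness")
show_segmentation("notwithstanding")
```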

Updated: 2024-05-27 11:39:59

标题: 标记化很重要!通过挑战其标记化来降低大型语言模型的性能

摘要: 大型语言模型(LLMs)在语言理解和生成方面展现出了非凡的能力。然而,也有人发现LLMs往往会对特定查询产生不准确的响应。这种缺陷可以追溯到LLMs必须经历的分词步骤,这是所有LLMs固有的不可避免的限制。事实上,错误的分词是阻碍LLMs准确理解输入的关键点,从而导致不令人满意的输出。为了证明LLMs的这一缺陷,我们构建了一个对抗数据集,命名为ADT(Tokenizer的对抗数据集),它利用各种开源LLMs的词汇挑战LLMs的分词。ADT包括两个子集:手动构建的ADT-Human和自动生成的ADT-Auto。我们的实证结果表明,我们的ADT对挑战领先的LLMs,包括GPT-4o、Llama-3、Qwen2.5-max等,在分词方面非常有效,从而降低了这些LLMs的能力。此外,我们的自动生成数据的方法已被证明是高效且稳健的,可以应用于任何开源LLMs。据我们所知,我们的研究是第一个探讨LLMs在挑战它们的分词分割方面的脆弱性,这将为通过优化它们的分词过程和算法来提高LLMs能力的后续研究提供启示。

更新时间: 2024-05-27 11:39:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17067v1

Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation

Generative molecular design for drug discovery has very recently achieved a wave of experimental validation, with language-based backbones being the most common architectures employed. The most important factor for downstream success is whether an in silico oracle is well correlated with the desired end-point. To this end, current methods use cheaper proxy oracles with higher throughput before evaluating the most promising subset with high-fidelity oracles. The ability to directly optimize high-fidelity oracles would greatly enhance generative design and be expected to improve hit rates. However, current models are not efficient enough to consider such a prospect, exemplifying the sample efficiency problem. In this work, we introduce Saturn, which leverages the Augmented Memory algorithm and demonstrates the first application of the Mamba architecture for generative molecular design. We elucidate how experience replay with data augmentation improves sample efficiency and how Mamba synergistically exploits this mechanism. Saturn outperforms 22 models on multi-parameter optimization tasks relevant to drug discovery and may possess sufficient sample efficiency to consider the prospect of directly optimizing high-fidelity oracles.

Updated: 2024-05-27 11:37:36

标题: 土星:利用记忆操作进行高效生成分子设计

摘要: 面向药物发现的生成式分子设计最近取得了一波实验验证,其中基于语言的骨干网络是最常用的架构。下游成功的最重要因素是计算预测模型(oracle)是否与期望的终点良好相关。为此,当前方法先使用吞吐量更高、成本更低的代理预测模型,然后再用高保真度预测模型评估其中最有前景的子集。直接优化高保真度预测模型的能力将极大增强生成式设计,并有望提高命中率。然而,当前模型的效率不足以考虑这一前景,这正体现了样本效率问题。在这项工作中,我们介绍了Saturn,它利用增强记忆(Augmented Memory)算法,并展示了Mamba架构在生成式分子设计中的首次应用。我们阐明了带数据增强的经验重播如何提高样本效率,以及Mamba如何协同利用这一机制。Saturn在与药物发现相关的多参数优化任务中胜过22个模型,并可能具备足够的样本效率来考虑直接优化高保真度预测模型的前景。

更新时间: 2024-05-27 11:37:36

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2405.17066v1

Multi-task learning via robust regularized clustering with non-convex group penalties

Multi-task learning (MTL) aims to improve estimation and prediction performance by sharing common information among related tasks. One natural assumption in MTL is that tasks are classified into clusters based on their characteristics. However, existing MTL methods based on this assumption often ignore outlier tasks that have large task-specific components or no relation to other tasks. To address this issue, we propose a novel MTL method called Multi-Task Learning via Robust Regularized Clustering (MTLRRC). MTLRRC incorporates robust regularization terms inspired by robust convex clustering, which is further extended to handle non-convex and group-sparse penalties. The extension allows MTLRRC to simultaneously perform robust task clustering and outlier task detection. The connection between the extended robust clustering and the multivariate M-estimator is also established. This provides an interpretation of the robustness of MTLRRC against outlier tasks. An efficient algorithm based on a modified alternating direction method of multipliers is developed for the estimation of the parameters. The effectiveness of MTLRRC is demonstrated through simulation studies and application to real data.

Updated: 2024-05-27 11:37:12

标题: 多任务学习通过具有非凸组罚项的稳健正则化聚类

摘要: 多任务学习(MTL)旨在通过在相关任务之间共享共同信息来改善估计和预测性能。MTL中的一个自然假设是根据任务的特征将任务分类为簇。然而,基于这一假设的现有MTL方法通常忽略具有较大特定于任务的组件或与其他任务无关的异常任务。为解决这个问题,我们提出了一种名为多任务学习通过稳健正则化聚类(MTLRRC)的新型MTL方法。MTLRRC结合了受稳健凸聚类启发的稳健正则化项,进一步扩展为处理非凸和组稀疏惩罚。这种扩展使MTLRRC能够同时执行稳健任务聚类和异常任务检测。扩展的稳健聚类与多变量M-估计器之间的联系也得到建立,这为MTLRRC对异常任务的稳健性提供了一种解释。我们基于改进的交替方向乘子法(ADMM)开发了用于参数估计的高效算法。通过模拟研究和实际数据的应用证明了MTLRRC的有效性。

更新时间: 2024-05-27 11:37:12

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03250v2

Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its benefits, introducing non-linear function approximation raises significant challenges in both computational and statistical efficiency. The best-known method of Hwang and Oh [2023] has achieved an $\widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K})$ regret, where $\kappa$ is a problem-dependent quantity, $d$ is the feature space dimension, $H$ is the episode length, and $K$ is the number of episodes. While this result attains the same rate in $K$ as the linear cases, the method requires storing all historical data and suffers from an $\mathcal{O}(K)$ computation cost per episode. Moreover, the quantity $\kappa$ can be exponentially small, leading to a significant gap for the regret compared to the linear cases. In this work, we first address the computational concerns by proposing an online algorithm that achieves the same regret with only $\mathcal{O}(1)$ computation cost. Then, we design two algorithms that leverage local information to enhance statistical efficiency. They not only maintain an $\mathcal{O}(1)$ computation cost per episode but achieve improved regrets of $\widetilde{\mathcal{O}}(\kappa^{-1/2}dH^2\sqrt{K})$ and $\widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2)$ respectively. Finally, we establish a lower bound, justifying the optimality of our results in $d$ and $K$. To the best of our knowledge, this is the first work that achieves almost the same computational and statistical efficiency as linear function approximation while employing non-linear function approximation for reinforcement learning.

Updated: 2024-05-27 11:31:54

标题: 基于多项Logit函数逼近的可证明高效强化学习

摘要: 我们研究了一类新的MDP,它采用多项式Logit(MNL)函数逼近来确保状态空间上的有效概率分布。尽管它具有许多优点,引入非线性函数逼近在计算和统计效率方面带来了重大挑战。Hwang和Oh [2023]提出的最佳方法已经实现了一个$\widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K})$的后悔值,其中$\kappa$是与问题相关的数量,$d$是特征空间的维度,$H$是每一集的长度,$K$是集的数量。虽然这个结果在$K$方面达到了与线性情况相同的速率,但该方法需要存储所有历史数据,并且每集的计算成本为$\mathcal{O}(K)$。此外,数量$\kappa$可能非常小,导致与线性情况相比的后悔值存在显著差距。在这项工作中,我们首先通过提出一种在线算法来解决计算上的问题,该算法仅需要$\mathcal{O}(1)$的计算成本就能达到相同的后悔值。然后,我们设计了两种算法,利用局部信息来提高统计效率。它们不仅每集保持$\mathcal{O}(1)$的计算成本,还分别实现了改进的后悔值$\widetilde{\mathcal{O}}(\kappa^{-1/2}dH^2\sqrt{K})$和$\widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2)$。最后,我们建立了一个下界,证明了我们结果在$d$和$K$方面的最优性。据我们所知,这是第一个在强化学习中采用非线性函数逼近实现几乎与线性函数逼近相同的计算和统计效率的工作。

更新时间: 2024-05-27 11:31:54

领域: cs.LG

下载: http://arxiv.org/abs/2405.17061v1

Graph Neural Networks on Quantum Computers

Graph Neural Networks (GNNs) are powerful machine learning models that excel at analyzing structured data represented as graphs, demonstrating remarkable performance in applications like social network analysis and recommendation systems. However, classical GNNs face scalability challenges when dealing with large-scale graphs. This paper proposes frameworks for implementing GNNs on quantum computers to potentially address the challenges. We devise quantum algorithms corresponding to the three fundamental types of classical GNNs: Graph Convolutional Networks, Graph Attention Networks, and Message-Passing GNNs. A complexity analysis of our quantum implementation of the Simplified Graph Convolutional (SGC) Network shows potential quantum advantages over its classical counterpart, with significant improvements in time and space complexities. Our complexities can have trade-offs between the two: when optimizing for minimal circuit depth, our quantum SGC achieves logarithmic time complexity in the input sizes (albeit at the cost of linear space complexity). When optimizing for minimal qubit usage, the quantum SGC exhibits space complexity logarithmic in the input sizes, offering an exponential reduction compared to classical SGCs, while still maintaining better time complexity. These results suggest our Quantum GNN frameworks could efficiently process large-scale graphs. This work paves the way for implementing more advanced Graph Neural Network models on quantum computers, opening new possibilities in quantum machine learning for analyzing graph-structured data.

Updated: 2024-05-27 11:31:08

标题: 在量子计算机上的图神经网络

摘要: 图神经网络(GNNs)是强大的机器学习模型,在分析以图形式表示的结构化数据方面表现出色,在社交网络分析和推荐系统等应用中表现出卓越性能。然而,传统的GNN在处理大规模图形时面临可扩展性挑战。本文提出了在量子计算机上实现GNN的框架,以潜在地解决这些挑战。我们设计了相应于三种基本类型的经典GNN的量子算法:图卷积网络、图注意力网络和消息传递GNN。我们对简化图卷积(SGC)网络的量子实现进行了复杂性分析,显示出与其经典对应物相比的潜在量子优势,在时间和空间复杂性方面取得了显著改进。我们的复杂性可以在时间和空间两方面进行权衡:当优化最小电路深度时,我们的量子SGC实现在输入规模方面实现了对数时间复杂性(尽管以线性空间复杂度为代价)。当优化最小量子比特使用时,量子SGC在输入规模方面呈现出对数空间复杂性,与经典SGC相比实现了指数级减少,同时仍保持更好的时间复杂性。这些结果表明我们的量子GNN框架可以高效处理大规模图形。这项工作为在量子计算机上实现更先进的图神经网络模型铺平了道路,为分析图结构数据的量子机器学习开辟了新的可能性。

更新时间: 2024-05-27 11:31:08

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17060v1

Comparative Study of Machine Learning Algorithms in Detecting Cardiovascular Diseases

The detection of cardiovascular diseases (CVD) using machine learning techniques represents a significant advancement in medical diagnostics, aiming to enhance early detection, accuracy, and efficiency. This study explores a comparative analysis of various machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost. By utilising a structured workflow encompassing data collection, preprocessing, model selection and hyperparameter tuning, training, evaluation, and choice of the optimal model, this research addresses the critical need for improved diagnostic tools. The findings highlight the efficacy of ensemble methods and advanced algorithms in providing reliable predictions, thereby offering a comprehensive framework for CVD detection that can be readily implemented and adapted in clinical settings.
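
The described workflow maps naturally onto a standard scikit-learn comparison loop. The snippet below is a minimal sketch on synthetic stand-in data (the study's clinical dataset, its XGBoost model, and the exact hyperparameter grids are not reproduced here).

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic stand-in for a tabular CVD dataset.
    X, y = make_classification(n_samples=1000, n_features=13, random_state=0)

    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(),
        "RandomForest": RandomForestClassifier(),
        "GradientBoosting": GradientBoostingClassifier(),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(),
    }
    for name, clf in models.items():
        pipe = make_pipeline(StandardScaler(), clf)   # preprocessing + model
        scores = cross_val_score(pipe, X, y, cv=5)    # evaluation step
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")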

Updated: 2024-05-27 11:29:54

标题: 机器学习算法在检测心血管疾病中的比较研究

摘要: 使用机器学习技术检测心血管疾病(CVD)代表了医学诊断的重大进展,旨在提高早期检测、准确性和效率。本研究探讨了各种机器学习算法的比较分析,包括逻辑回归、决策树、随机森林、梯度提升、支持向量机(SVM)、K-最近邻(KNN)和XGBoost。通过利用结构化工作流程,包括数据收集、预处理、模型选择和超参数调整、训练、评估和选择最佳模型,这项研究解决了改进诊断工具的关键需求。研究结果突出了集成方法和先进算法在提供可靠预测方面的有效性,从而为CVD检测提供了全面框架,可以在临床环境中方便实施和调整。

更新时间: 2024-05-27 11:29:54

领域: cs.LG

下载: http://arxiv.org/abs/2405.17059v1

Self-Training: A Survey

Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations. Because this framework is relevant in many applications, such algorithms have received a lot of interest in both academia and industry. Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years. These models are designed to find the decision boundary on low density regions without making additional assumptions about the data distribution, and use the unsigned output score of a learned classifier, or its margin, as an indicator of confidence. The working principle of self-training algorithms is to learn a classifier iteratively by assigning pseudo-labels to the set of unlabeled training samples with a margin greater than a certain threshold. The pseudo-labeled examples are then used to enrich the labeled training data and to train a new classifier in conjunction with the labeled training set. In this paper, we present self-training methods for binary and multi-class classification, as well as their variants and two related approaches, namely consistency-based approaches and transductive learning. We examine the impact of significant self-training features on various methods, using different general and image classification benchmarks, and we discuss our ideas for future research in self-training. To the best of our knowledge, this is the first thorough and complete survey on this subject.
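
The iterative pseudo-labeling principle described above can be sketched in a few lines; the confidence threshold and base classifier below are illustrative choices, not a prescription from the survey.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
        # Iteratively pseudo-label high-confidence unlabeled points and retrain.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(max_rounds):
            if len(X_unlab) == 0:
                break
            proba = clf.predict_proba(X_unlab)
            conf = proba.max(axis=1)          # confidence proxy for the margin
            keep = conf >= threshold
            if not keep.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[keep]])
            y_lab = np.concatenate(
                [y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
            X_unlab = X_unlab[~keep]
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        return clf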

Updated: 2024-05-27 11:27:47

标题: 自我训练:一项调查

摘要: 半监督算法旨在从一小组标记观察和一大组未标记观察中学习预测函数。由于这个框架在许多应用中都很重要,因此在学术界和工业界都受到了很多关注。在现有技术中,自训练方法在近年来无疑受到了更多关注。这些模型旨在在低密度区域找到决策边界,而不对数据分布做出额外的假设,并使用学习分类器的无符号输出分数或其边缘作为置信度指标。自训练算法的工作原理是通过为具有大于某个阈值的边缘的未标记训练样本集分配伪标签来迭代地学习分类器。然后使用伪标记示例来丰富标记的训练数据,并与标记的训练集一起训练一个新的分类器。在本文中,我们介绍了二元和多类分类的自训练方法,以及它们的变体和两种相关方法,即基于一致性的方法和直推学习。我们使用不同的通用基准和图像分类基准,考察了重要的自训练特性对各种方法的影响,并讨论我们对未来自训练研究的想法。据我们所知,这是关于这个主题的第一份彻底和完整的调查报告。

更新时间: 2024-05-27 11:27:47

领域: cs.LG

下载: http://arxiv.org/abs/2202.12040v5

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the compiler. Inspired by this, we present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedback to improve one-off code generation performance. Furthermore, we propose reflection self-distillation and dynamically masked distillation to effectively utilize these reflection sequences. Extensive experiments on three benchmarks, i.e., HumanEval (+), MBPP (+), and MultiPl-E, demonstrate that models fine-tuned with our method achieve state-of-the-art performance. Notably, ReflectionCoder-DeepSeek-Coder-33B reaches pass@1 of 82.9 (76.8) on HumanEval (+) and 84.1 (72.0) on MBPP (+), on par with GPT-3.5-Turbo and Claude-3-opus, and surpasses early GPT-4. Beyond the code domain, we believe this approach can benefit other domains that focus on final results and require long reasoning paths. Code and data are available at https://github.com/SenseLLM/ReflectionCoder.

Updated: 2024-05-27 11:27:00

标题: ReflectionCoder:从反射序列中学习,以增强一次性代码生成

摘要: 代码生成在各种任务中发挥着至关重要的作用,如代码自动完成和数学推理。先前的工作已经提出了许多方法来增强代码生成性能,包括集成来自编译器的反馈。受此启发,我们提出了ReflectionCoder,一种新颖的方法,通过整合编译器反馈构建反射序列,有效地提高一次性代码生成性能。此外,我们提出了反射自蒸馏和动态遮罩蒸馏,以有效利用这些反射序列。在三个基准测试上进行了广泛的实验,即HumanEval (+)、MBPP (+)和MultiPl-E,结果表明,使用我们的方法进行微调的模型实现了最先进的性能。值得注意的是,ReflectionCoder-DeepSeek-Coder-33B在HumanEval (+)上达到了82.9(76.8)的pass@1,在MBPP (+)上达到了84.1(72.0),与GPT-3.5-Turbo和Claude-3-opus持平,并超越了早期的GPT-4。在代码领域之外,我们相信这种方法可以使其他专注于最终结果并需要长时间推理路径的领域受益。代码和数据可在https://github.com/SenseLLM/ReflectionCoder获取。

更新时间: 2024-05-27 11:27:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.17057v1

Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan-and-solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard-to-decompose questions, and it does not explicitly cater to correlated knowledge updates resulting as a consequence of knowledge edits. This has a detrimental impact on the overall consistency of the updated knowledge. To address these issues, in this paper, we propose a novel framework named RULE-KE, i.e., RULE-based Knowledge Editing, which serves as a cherry on top, augmenting the performance of all existing MQA methods under KE. Specifically, RULE-KE leverages rule discovery to discover a set of logical rules. Then, it uses these discovered rules to update knowledge about facts highly correlated with the edit. Experimental evaluation using existing and newly curated datasets (i.e., RKE-EVAL) shows that RULE-KE improves the performance of both parameter-based and memory-based solutions by up to 92% and 112.9%, respectively.

Updated: 2024-05-27 11:24:59

标题: 利用逻辑规则进行知识编辑:锦上添花

摘要: 多跳问题回答(MQA)在知识编辑(KE)下是大型语言模型(LLMs)中的一个关键挑战。虽然在这个领域表现最佳的解决方案使用计划和解决范式将问题分解为子问题,然后生成响应,但我们认为这种方法是次优的,因为它无法解决难以分解的问题,并且它没有明确地考虑到由于知识编辑而导致的相关知识更新。这对更新后的知识的整体一致性产生了不利影响。为了解决这些问题,在本文中,我们提出了一个名为RULE-KE的新框架,即基于规则的知识编辑,这是对所有现有MQA方法在KE下性能的增强。具体来说,RULE-KE利用规则发现来发现一组逻辑规则。然后,它使用这些发现的规则来更新与编辑高度相关的事实的知识。使用现有和新策划的数据集(即RKE-EVAL)进行实验评估显示,RULE-KE有助于将基于参数和基于内存的解决方案的性能分别提高了92%和112.9%。

更新时间: 2024-05-27 11:24:59

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.15452v2

Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing

Randomized Smoothing (RS) has been proven a promising method for endowing an arbitrary image classifier with certified robustness. However, the substantial uncertainty inherent in the high-dimensional isotropic Gaussian noise imposes the curse of dimensionality on RS. Specifically, the upper bound of ${\ell_2}$ certified robustness radius provided by RS exhibits a diminishing trend with the expansion of the input dimension $d$, proportionally decreasing at a rate of $1/\sqrt{d}$. This paper explores the feasibility of providing ${\ell_2}$ certified robustness for high-dimensional input through the utilization of dual smoothing in the lower-dimensional space. The proposed Dual Randomized Smoothing (DRS) down-samples the input image into two sub-images and smooths the two sub-images in lower dimensions. Theoretically, we prove that DRS guarantees a tight ${\ell_2}$ certified robustness radius for the original input and reveal that DRS attains a superior upper bound on the ${\ell_2}$ robustness radius, which decreases proportionally at a rate of $(1/\sqrt m + 1/\sqrt n )$ with $m+n=d$. Extensive experiments demonstrate the generalizability and effectiveness of DRS, which exhibits a notable capability to integrate with established methodologies, yielding substantial improvements in both accuracy and ${\ell_2}$ certified robustness over the RS baselines on the CIFAR-10 and ImageNet datasets. Code is available at https://github.com/xiasong0501/DRS.
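
For reference, a minimal sketch of the quantities involved: the standard RS certified radius in the style of Cohen et al., plus one illustrative way to split an image into two lower-dimensional sub-images. The interleaving rule here is an assumption for illustration, not necessarily the paper's exact down-sampling scheme.

    import numpy as np
    from scipy.stats import norm

    def rs_certified_radius(p_a, sigma):
        # Standard l2 radius for a Gaussian-smoothed classifier whose
        # top-class probability lower bound is p_a.
        return sigma * norm.ppf(p_a)

    def split_into_sub_images(x):
        # Illustrative down-sampling of a (C, H, W) image into two
        # lower-dimensional sub-images via interleaved columns; DRS then
        # smooths each sub-image in its own lower-dimensional space.
        return x[:, :, 0::2], x[:, :, 1::2]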

Updated: 2024-05-27 11:23:37

标题: 通过双重随机平滑缓解认证鲁棒性的维度诅咒

摘要: 随机平滑(RS)已被证明是一种为任意图像分类器赋予认证鲁棒性的有希望的方法。然而,高维各向同性高斯噪声固有的实质性不确定性给RS带来了维度诅咒。具体而言,RS提供的${\ell_2}$认证鲁棒性半径的上界随着输入维度$d$的扩展呈现出递减的趋势,比例下降速率为$1/\sqrt{d}$。本文探讨了通过在低维空间利用双重平滑来为高维输入提供${\ell_2}$认证鲁棒性的可行性。提出的双重随机平滑(DRS)将输入图像下采样为两个子图像,并在较低维度中平滑这两个子图像。理论上,我们证明了DRS保证了原始输入的紧密${\ell_2}$认证鲁棒性半径,并揭示了DRS取得了${\ell_2}$鲁棒性半径的优越上界,其下降速率为$(1/\sqrt m + 1/\sqrt n)$,其中$m+n=d$。大量实验表明了DRS的泛化能力和有效性,它展示了与已建立方法的显着整合能力,显著改善了CIFAR-10和ImageNet数据集上RS的准确性和${\ell_2}$认证鲁棒性基线。源代码可在https://github.com/xiasong0501/DRS上找到。

更新时间: 2024-05-27 11:23:37

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.09586v3

Improving Data-aware and Parameter-aware Robustness for Continual Learning

The goal of the Continual Learning (CL) task is to continuously learn multiple new tasks sequentially while achieving a balance between the plasticity and stability of new and old knowledge. This paper argues that the failure to achieve this balance arises from the ineffective handling of outliers, which leads to abnormal gradients and unexpected model updates. To address this issue, we enhance the data-aware and parameter-aware robustness of CL, proposing a Robust Continual Learning (RCL) method. From the data perspective, we develop a contrastive loss based on the concepts of uniformity and alignment, forming a feature distribution that is more applicable to outliers. From the parameter perspective, we present a forward strategy for worst-case perturbation and apply robust gradient projection to the parameters. The experimental results on three benchmarks show that the proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results. The code is available at: https://github.com/HanxiXiao/RCL
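
The uniformity and alignment concepts referenced here have well-known loss formulations (Wang and Isola, 2020); the sketch below shows those standard definitions, on which the paper's contrastive loss builds (the paper's exact variant may differ).

    import torch

    def alignment_loss(x, y, alpha=2):
        # x, y: L2-normalized embeddings of positive pairs, shape (n, d).
        return (x - y).norm(p=2, dim=1).pow(alpha).mean()

    def uniformity_loss(x, t=2):
        # Log of the mean Gaussian potential over all pairs of embeddings.
        return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()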

Updated: 2024-05-27 11:21:26

标题: 改善持续学习中的数据感知与参数感知鲁棒性

摘要: 持续学习(CL)任务的目标是在连续学习多个新任务的同时,在新旧知识之间实现可塑性和稳定性之间的平衡。本文分析了这种不足之处是由于对离群值的处理不足,导致异常梯度和意外模型更新。为了解决这个问题,我们增强了CL的数据感知和参数感知鲁棒性,提出了一种鲁棒持续学习(RCL)方法。从数据角度来看,我们基于一致性和对齐性概念提出了一种对比损失,形成了一个更适用于离群值的特征分布。从参数角度来看,我们提出了一种最坏情况扰动的前向策略,并将鲁棒梯度投影应用于参数。在三个基准测试上的实验结果表明,所提出的方法有效地保持了鲁棒性,并取得了新的最先进结果。代码可在以下链接找到:https://github.com/HanxiXiao/RCL

更新时间: 2024-05-27 11:21:26

领域: cs.LG

下载: http://arxiv.org/abs/2405.17054v1

WirelessLLM: Empowering Large Language Models Towards Wireless Intelligence

The rapid evolution of wireless technologies and the growing complexity of network infrastructures necessitate a paradigm shift in how communication networks are designed, configured, and managed. Recent advancements in Large Language Models (LLMs) have sparked interest in their potential to revolutionize wireless communication systems. However, existing studies on LLMs for wireless systems are limited to a direct application for telecom language understanding. To empower LLMs with knowledge and expertise in the wireless domain, this paper proposes WirelessLLM, a comprehensive framework for adapting and enhancing LLMs to address the unique challenges and requirements of wireless communication networks. We first identify three foundational principles that underpin WirelessLLM: knowledge alignment, knowledge fusion, and knowledge evolution. Then, we investigate the enabling technologies to build WirelessLLM, including prompt engineering, retrieval augmented generation, tool usage, multi-modal pre-training, and domain-specific fine-tuning. Moreover, we present three case studies to demonstrate the practical applicability and benefits of WirelessLLM for solving typical problems in wireless networks. Finally, we conclude this paper by highlighting key challenges and outlining potential avenues for future research.

Updated: 2024-05-27 11:18:25

标题: 无线LLM:赋能大型语言模型实现无线智能

摘要: 无线技术的快速发展和网络基础设施日益复杂,需要在设计、配置和管理通信网络方面进行范式转变。最近大型语言模型(LLMs)的进展引起了人们对其革新无线通信系统潜力的兴趣。然而,现有关于LLMs用于无线系统的研究仅限于直接应用于电信语言理解。为了赋予LLMs在无线领域知识和专业知识,本文提出了WirelessLLM,这是一个全面的框架,用于调整和增强LLMs以解决无线通信网络的独特挑战和要求。我们首先确定了支撑WirelessLLM的三个基本原则:知识对齐、知识融合和知识演化。然后,我们调查了构建WirelessLLM的能力技术,包括提示工程、检索增强生成、工具使用、多模态预训练和领域特定微调。此外,我们提出了三个案例研究,以展示WirelessLLM在解决无线网络中典型问题方面的实际适用性和益处。最后,我们通过强调主要挑战并概述未来研究的潜在途径来总结本文。

更新时间: 2024-05-27 11:18:25

领域: cs.NI,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17053v1

Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

As Natural Language Processing (NLP) systems are increasingly employed in intricate social environments, a pressing query emerges: Can these NLP systems mirror human-esque collaborative intelligence, in a multi-agent society consisting of multiple large language models (LLMs)? This paper probes the collaboration mechanisms among contemporary NLP systems by melding practical experiments with theoretical insights. We fabricate four unique `societies' comprised of LLM agents, where each agent is characterized by a specific `trait' (easy-going or overconfident) and engages in collaboration with a distinct `thinking pattern' (debate or reflection). Through evaluating these multi-agent societies on three benchmark datasets, we discern that certain collaborative strategies not only outshine previous top-tier approaches, but also optimize efficiency (using fewer API tokens). Moreover, our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring foundational social psychology theories. In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. We commit to sharing our code and datasets (https://github.com/zjunlp/MachineSoM), hoping to catalyze further research in this promising avenue.

Updated: 2024-05-27 11:12:45

标题: 探索LLM代理的协作机制:社会心理学视角

摘要: 随着自然语言处理(NLP)系统在复杂社会环境中的应用越来越广泛,一个紧迫的问题出现了:这些NLP系统能否模拟类似于人类的协作智能,在由多个大型语言模型(LLMs)组成的多智能体社会中?本文通过将实际实验与理论见解相结合,探讨了当代NLP系统之间的协作机制。我们构建了由LLM代理组成的四个独特的“社会”,其中每个代理都具有特定的“特质”(随和或过于自信),并与不同的“思维模式”(辩论或反思)进行协作。通过在三个基准数据集上评估这些多智能体社会,我们发现某些协作策略不仅超越了先前的顶级方法,还优化了效率(使用更少的API令牌)。此外,我们的结果进一步说明,LLM代理表现出类似于人类的社会行为,如顺从和达成共识,反映了基础社会心理学理论。总之,我们整合了社会心理学的见解,对LLM代理的协作进行了情境化,激发了进一步研究LLMs的协作机制的兴趣。我们致力于分享我们的代码和数据集(https://github.com/zjunlp/MachineSoM),希望在这一有前途的领域推动进一步的研究。

更新时间: 2024-05-27 11:12:45

领域: cs.CL,cs.AI,cs.CY,cs.LG,cs.MA

下载: http://arxiv.org/abs/2310.02124v3

BeamVQ: Aligning Space-Time Forecasting Model via Self-training on Physics-aware Metrics

Data-driven deep learning has emerged as the new paradigm to model complex physical space-time systems. These data-driven methods learn patterns by optimizing statistical metrics and tend to overlook the adherence to physical laws, unlike traditional model-driven numerical methods. Thus, they often generate predictions that are not physically realistic. On the other hand, by sampling a large amount of high quality predictions from a data-driven model, some predictions will be more physically plausible than the others and closer to what will happen in the future. Based on this observation, we propose Beam search by Vector Quantization (BeamVQ) to enhance the physical alignment of data-driven space-time forecasting models. The key of BeamVQ is to train the model on self-generated samples filtered with physics-aware metrics. To flexibly support different backbone architectures, BeamVQ leverages a code bank to transform the continuous state space of any encoder-decoder model into discrete codes. Afterwards, it iteratively employs beam search to sample high-quality sequences, retains those with the highest physics-aware scores, and trains the model on the new dataset. Comprehensive experiments show that BeamVQ not only yields an average statistical skill score boost of more than 32% across ten backbones on five datasets, but also significantly enhances physics-aware metrics.

Updated: 2024-05-27 11:07:47

标题: BeamVQ:通过在物理感知度度量上进行自我训练来对齐时空预测模型

摘要: 数据驱动的深度学习已经成为建模复杂物理时空系统的新范式。这些数据驱动的方法通过优化统计指标来学习模式,并倾向于忽略对物理定律的遵守,与传统的基于模型的数值方法不同。因此,它们通常会产生不符合物理实际的预测。另一方面,通过从数据驱动模型中采样大量高质量的预测,一些预测会比其他预测更加符合物理实际并更接近未来的发展。基于这一观察,我们提出了一种名为“基于向量量化的波束搜索”(BeamVQ)的方法,以增强数据驱动时空预测模型的物理对齐性。BeamVQ的关键在于使用经过物理感知指标筛选的自动生成样本来训练模型。为了灵活支持不同的骨干架构,BeamVQ利用一个码库将任何编码器-解码器模型的连续状态空间转换为离散码。然后,它迭代地使用波束搜索来采样高质量的序列,保留具有最高物理感知分数的序列,并在新数据集上训练模型。全面的实验表明,BeamVQ不仅使十个骨干在五个数据集上的平均统计技能分数提升了超过32%,还显著提升了物理感知指标。

更新时间: 2024-05-27 11:07:47

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17051v1

Enhancing Graph Transformers with Hierarchical Distance Structural Encoding

Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in various graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method to model node distances in a graph, focusing on its multi-level, hierarchical nature. We introduce a novel framework to seamlessly integrate HDSE into the attention mechanism of existing graph transformers, allowing for simultaneous application with other positional encodings. To apply graph transformers with HDSE to large-scale graphs, we further propose a high-level HDSE that effectively biases the linear transformers towards graph hierarchies. We theoretically prove the superiority of HDSE over shortest path distances in terms of expressivity and generalization. Empirically, we demonstrate that graph transformers with HDSE excel in graph classification, regression on 7 graph-level datasets, and node classification on 11 large-scale graphs, including those with up to a billion nodes.
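
To make concrete how a distance-based structural encoding can enter attention, here is a minimal single-head PyTorch sketch: a learned scalar bias per (clipped) node distance is added to the attention logits. The paper's multi-level hierarchical construction is richer; this shows only the general mechanism.

    import torch
    import torch.nn as nn

    def biased_attention(Q, K, V, dist, dist_emb):
        # Q, K, V: (n, d) node features; dist: (n, n) LongTensor of clipped
        # structural distances; dist_emb: nn.Embedding(max_dist + 1, 1)
        # giving one learned scalar bias per distance value.
        scores = Q @ K.transpose(-1, -2) / Q.size(-1) ** 0.5
        scores = scores + dist_emb(dist).squeeze(-1)   # distance-dependent bias
        return torch.softmax(scores, dim=-1) @ V

    # Example wiring (shapes only):
    n, d, max_dist = 8, 16, 5
    emb = nn.Embedding(max_dist + 1, 1)
    x = torch.randn(n, d)
    dist = torch.randint(0, max_dist + 1, (n, n))
    out = biased_attention(x, x, x, dist, emb)         # (n, d)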

Updated: 2024-05-27 11:04:29

标题: 使用分层距离结构编码增强图变换器

摘要: 图形转换器需要强大的归纳偏差来推导有意义的注意力分数。然而,当前的方法往往在捕捉更长范围、层次结构或社区结构方面表现不佳,而这些在分子、社交网络和引用网络等各种图形中是常见的。本文提出了一种Hierarchical Distance Structural Encoding (HDSE)方法,用于模拟图形中的节点距离,重点关注其多层次、层次结构的特性。我们引入了一个新颖的框架,将HDSE无缝集成到现有图形转换器的注意力机制中,允许与其他位置编码同时应用。为了将带有HDSE的图形转换器应用于大规模图形,我们进一步提出了一个高级HDSE,有效地使线性转换器偏向于图形层次结构。我们在表达能力和泛化方面理论上证明了HDSE优于最短路径距离。从实证上,我们展示了带有HDSE的图形转换器在图形分类、7个图形级数据集上的回归以及11个大规模图形上的节点分类方面表现出色,包括那些具有多达十亿个节点的图形。

更新时间: 2024-05-27 11:04:29

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2308.11129v4

HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity

Clustering nodes in heterophilous graphs presents unique challenges due to the asymmetric relationships often overlooked by traditional methods, which moreover assume that good clustering corresponds to high intra-cluster and low inter-cluster connectivity. To address these issues, we introduce HeNCler - a novel approach for Heterophilous Node Clustering. Our method begins by defining a weighted kernel singular value decomposition to create an asymmetric similarity graph, applicable to both directed and undirected graphs. We further establish that the dual problem of this formulation aligns with asymmetric kernel spectral clustering, interpreting learned graph similarities without relying on homophily. We demonstrate the ability to solve the primal problem directly, circumventing the computational difficulties of the dual approach. Experimental evidence confirms that HeNCler significantly enhances performance in node clustering tasks within heterophilous graph contexts.

Updated: 2024-05-27 11:04:05

标题: HeNCler:通过学习的非对称相似性在异质图中进行节点聚类

摘要: 在异质图中对节点进行聚类面临着独特的挑战,因为传统方法往往忽略了不对称关系,而且这些方法通常假设良好的聚类对应于高内部集群连接和低集群间连接。为了解决这些问题,我们引入了HeNCler - 一种新颖的异质节点聚类方法。我们的方法首先通过定义加权核奇异值分解来创建一个不对称相似图,适用于有向和无向图。我们进一步建立了这个公式的对偶问题与不对称核谱聚类相一致,解释了学习到的图相似性,而不依赖于同质性。我们展示了能够直接解决原始问题,避开了对偶方法的计算困难。实验证据证实了HeNCler在异质图环境中节点聚类任务中显著提升性能的能力。

更新时间: 2024-05-27 11:04:05

领域: cs.LG

下载: http://arxiv.org/abs/2405.17050v1

Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization

This paper explores methods for verifying the properties of Binary Neural Networks (BNNs), focusing on robustness against adversarial attacks. Despite their lower computational and memory needs, BNNs, like their full-precision counterparts, are also sensitive to input perturbations. Established methods for solving this problem are predominantly based on Satisfiability Modulo Theories and Mixed-Integer Linear Programming techniques, which are characterized by NP complexity and often face scalability issues. We introduce an alternative approach using Semidefinite Programming relaxations derived from sparse Polynomial Optimization. Our approach, compatible with continuous input space, not only mitigates numerical issues associated with floating-point calculations but also enhances verification scalability through the strategic use of tighter first-order semidefinite relaxations. We demonstrate the effectiveness of our method in verifying robustness against both $\|.\|_\infty$ and $\|.\|_2$-based adversarial attacks.

Updated: 2024-05-27 11:03:48

标题: 使用稀疏多项式优化验证二进制神经网络的特性

摘要: 本文探讨了验证二进制神经网络(BNNs)属性的方法,重点关注其对抗性攻击的稳健性。尽管BNNs具有较低的计算和存储需求,但与其完整精度对应物一样,它们也对输入扰动敏感。解决这个问题的已建立方法主要基于可满足性模理论和混合整数线性规划技术,这些方法具有NP复杂性并且经常面临可扩展性问题。我们引入了一种替代方法,使用从稀疏多项式优化导出的半定规划松弛方法。我们的方法与连续输入空间兼容,不仅可以缓解与浮点计算相关的数值问题,还通过战略使用更紧密的一阶半定松弛来增强验证的可扩展性。我们展示了我们的方法在验证对抗攻击中的稳健性方面的有效性,包括对$\|.\|_\infty$和$\|.\|_2$基础对抗攻击的验证。

更新时间: 2024-05-27 11:03:48

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.17049v1

Interpretable Robotic Manipulation from Language

Humans naturally employ linguistic instructions to convey knowledge, a process that proves significantly more complex for machines, especially within the context of multitask robotic manipulation environments. Natural language, moreover, serves as the primary medium through which humans acquire new knowledge, presenting a potentially intuitive bridge for translating concepts understandable by humans into formats that can be learned by machines. In pursuit of facilitating this integration, we introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks. This agent is distinguished by its hierarchical structure, which incorporates natural language to enhance the learning process. At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids. We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.

Updated: 2024-05-27 11:02:21

标题: 从语言中解释的机器人操作

摘要: 人类自然地利用语言指令来传达知识,这个过程对机器来说更加复杂,特别是在多任务机器人操作环境中。自然语言还作为人类获取新知识的主要媒介,为将人类可理解的概念转化为机器可学习的格式提供了可能直观的桥梁。为了促进这种整合,我们引入了一种名为Ex-PERACT的可解释行为克隆代理,专门设计用于操作任务。该代理以其分层结构而著名,其中融入自然语言以增强学习过程。在顶层,模型被赋予学习离散技能代码的任务,而在底层,策略网络将问题转化为体素化网格,并将离散化的动作映射到体素网格。我们在使用RLBench基准测试的八个具有挑战性的操作任务上评估了我们的方法,表明Ex-PERACT不仅实现了竞争性的策略性能,还有效地弥合了复杂环境中人类指令与机器执行之间的差距。

更新时间: 2024-05-27 11:02:21

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.17047v1

Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models

Advanced artificial intelligence (AI) systems with access to millions of research papers could inspire new research ideas that may not be conceived by humans alone. However, how interesting are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, a system that uses an evolving knowledge graph built from more than 58 million scientific papers to generate personalized research ideas via an interface to GPT-4. We conducted a large-scale human evaluation with over 100 research group leaders from the Max Planck Society, who ranked more than 4,000 personalized research ideas based on their level of interest. This evaluation allows us to understand the relationships between scientific interest and the core properties of the knowledge graph. We find that data-efficient machine learning can predict research interest with high precision, allowing us to optimize the interest-level of generated research ideas. This work represents a step towards an artificial scientific muse that could catalyze unforeseen collaborations and suggest interesting avenues for scientists.

Updated: 2024-05-27 11:00:51

标题: 使用知识图谱和大型语言模型生成和人工专家评估有趣的研究想法

摘要: 具有对数百万研究论文的访问权限的先进人工智能(AI)系统可能会启发人类无法想象的新研究思路。然而,这些由AI生成的想法有多有趣,我们如何提高它们的质量?在这里,我们介绍了SciMuse,它利用从超过5800万篇科学论文构建的不断发展的知识图谱,通过与GPT-4的接口生成个性化的研究想法。我们与来自马克斯·普朗克学会的100多位研究小组领导进行了大规模的人类评估,他们根据兴趣程度对4000多个个性化研究想法进行了排名。这项评估使我们能够了解科学兴趣与知识图谱的核心属性之间的关系。我们发现,数据效率机器学习可以高精度地预测研究兴趣,从而使我们能够优化生成的研究想法的兴趣水平。这项工作代表了迈向人工科学缪斯的一步,它可能促进意想不到的合作,并为科学家提供有趣的研究方向。

更新时间: 2024-05-27 11:00:51

领域: cs.AI,cs.CL,cs.DL,cs.LG

下载: http://arxiv.org/abs/2405.17044v1

Worldwide Federated Training of Language Models

The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally, federated learning requires collaboration across heterogeneous legal, security, and privacy regimes while accounting for the inherent locality of language data; this further exacerbates the established challenge of federated statistical heterogeneity. We propose a Worldwide Federated Language Model Training~(WorldLM) system based on federations of federations, where each federation has the autonomy to account for factors such as its industry, operating jurisdiction, or competitive environment. WorldLM enables such autonomy in the presence of statistical heterogeneity via partial model localization by allowing sub-federations to attentively aggregate key layers from their constituents. Furthermore, it can adaptively share information across federations via residual layer embeddings. Evaluations of language modeling on naturally heterogeneous datasets show that WorldLM outperforms standard federations by up to $1.91\times$, approaches the personalized performance of fully local models, and maintains these advantages under privacy-enhancing techniques.

Updated: 2024-05-27 10:59:22

标题: 全球联合训练语言模型

摘要: 语言模型训练对大量计算和广泛的数据集依赖,这些数据集可能来自质量低劣、受版权保护或敏感的数据,这在实践、法律和道德上引发了质疑。联邦学习通过使先前未被利用的数据可以从合作组织中自愿收集,提供了一个可行的替代方案。然而,当在全球范围内扩展时,联邦学习需要跨越异构的法律、安全和隐私制度进行协作,同时考虑到语言数据的固有局部性;这进一步加剧了联邦统计异质性所面临的挑战。我们提出了一种基于联邦的联邦语言模型训练(WorldLM)系统,其中每个联邦都有自主权来考虑其行业、运营管辖区域或竞争环境等因素。WorldLM通过允许子联邦从其组成部分中专注地聚合关键层来通过部分模型本地化实现这种自主权,从而应对统计异质性的存在。此外,它可以通过残差层嵌入自适应地在联邦之间共享信息。对自然异构数据集上的语言建模评估显示,WorldLM的性能比标准联邦方法高出至多1.91倍,接近完全本地模型的个性化性能,并在隐私增强技术下保持这些优势。

更新时间: 2024-05-27 10:59:22

领域: cs.LG,cs.AI,cs.CL,cs.DC,I.2.7

下载: http://arxiv.org/abs/2405.14446v2

Dynamics Harmonic Analysis of Robotic Systems: Application in Data-Driven Koopman Modelling

We introduce the use of harmonic analysis to decompose the state space of symmetric robotic systems into orthogonal isotypic subspaces. These are lower-dimensional spaces that capture distinct, symmetric, and synergistic motions. For linear dynamics, we characterize how this decomposition leads to a subdivision of the dynamics into independent linear systems on each subspace, a property we term dynamics harmonic analysis (DHA). To exploit this property, we use Koopman operator theory to propose an equivariant deep-learning architecture that leverages the properties of DHA to learn a global linear model of the system dynamics. Our architecture, validated on synthetic systems and the dynamics of locomotion of a quadrupedal robot, exhibits enhanced generalization, sample efficiency, and interpretability, with fewer trainable parameters and computational costs.

Updated: 2024-05-27 10:58:39

标题: 机器人系统的动态谐波分析:在数据驱动的库普曼建模中的应用

摘要: 我们介绍了将谐波分析应用于将对称机器人系统的状态空间分解为正交同构子空间的方法。这些是捕捉不同、对称和协同运动的低维空间。对于线性动力学,我们表征了这种分解如何将动力学划分为每个子空间上的独立线性系统,我们将此属性称为动力学谐波分析(DHA)。为了利用这一特性,我们使用Koopman算子理论提出了一种利用DHA属性学习系统动力学全局线性模型的等变深度学习架构。我们的架构在合成系统和四足机器人运动动力学上得到验证,表现出增强的泛化能力、样本效率和可解释性,可用更少的可训练参数和计算成本。

更新时间: 2024-05-27 10:58:39

领域: cs.RO,cs.AI,cs.LG,cs.SY,eess.SY,43-08

下载: http://arxiv.org/abs/2312.07457v2

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

To evaluate code large language models (LLMs), research has relied on a few small manually curated benchmarks, such as HumanEval and MBPP, which represent a narrow part of the real-world software domains. In this work, we introduce round-trip correctness (RTC) as an alternative evaluation method. RTC allows Code LLM evaluation on a broader spectrum of real-world software domains without the need for costly human curation. RTC rests on the idea that we can ask a model to make a prediction (e.g., describe some code using natural language), feed that prediction back (e.g., synthesize code from the predicted description), and check if this round-trip leads to code that is semantically equivalent to the original input. We show how to employ RTC to evaluate code synthesis and editing. We find that RTC strongly correlates with model performance on existing narrow-domain code synthesis benchmarks while allowing us to expand to a much broader set of domains and tasks which was not previously possible without costly human annotations.
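
The RTC loop itself is simple to express. In the sketch below, describe and synthesize stand in for model calls and tests[i] for a semantic-equivalence check (e.g., unit tests); all three are placeholders rather than the paper's exact harness.

    def round_trip_correctness(code_samples, describe, synthesize, tests):
        # Forward pass: predict a description; backward pass: synthesize code
        # from it; then check semantic equivalence against the original.
        passed = 0
        for code, test in zip(code_samples, tests):
            description = describe(code)    # e.g., a natural-language docstring
            regenerated = synthesize(description)
            if test(regenerated):           # semantics preserved by the round trip?
                passed += 1
        return passed / len(code_samples)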

Updated: 2024-05-27 10:55:06

标题: 无监督评估具有往返正确性的代码LLMs

摘要: 为评估代码大型语言模型(LLMs),研究依赖于一些小型手工策划的基准测试,例如HumanEval和MBPP,这些基准测试代表了真实世界软件领域的一个狭窄部分。在这项工作中,我们引入了往返正确性(RTC)作为一种替代评估方法。RTC允许在更广泛的真实世界软件领域对代码LLM进行评估,而无需进行昂贵的人工策划。RTC的基础是我们可以要求模型做出预测(例如,用自然语言描述某些代码),将该预测反馈(例如,从预测的描述中合成代码),并检查这种往返是否导致代码在语义上等同于原始输入。我们展示了如何利用RTC来评估代码合成和编辑。我们发现RTC与现有狭窄领域代码合成基准测试中的模型性能强相关,同时允许我们扩展到更广泛的领域和任务,这在没有昂贵的人工注释的情况下以前是不可能的。

更新时间: 2024-05-27 10:55:06

领域: cs.SE,cs.LG

下载: http://arxiv.org/abs/2402.08699v2

LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation

Split learning, as one of the most common architectures in vertical federated learning, has gained widespread use in industry due to its privacy-preserving characteristics. In this architecture, the party holding the labels seeks cooperation from other parties to improve model performance due to insufficient feature data. Each of these participants has a self-defined bottom model to learn hidden representations from its own feature data and uploads the embedding vectors to the top model held by the label holder for final predictions. This design allows participants to conduct joint training without directly exchanging data. However, existing research points out that malicious participants may still infer label information from the uploaded embeddings, leading to privacy leakage. In this paper, we first propose an embedding extension attack that manually modifies embeddings to undermine existing defense strategies, which rely on constraining the correlation between the embeddings uploaded by participants and the labels. Subsequently, we propose a new label obfuscation defense strategy, called `LabObf', which randomly maps each original one-hot vector label to multiple numerical soft labels with values intertwined, significantly increasing the difficulty for attackers to infer the labels. We conduct experiments on four different types of datasets, and the results show that LabObf can reduce the attacker's success rate to near random guessing while maintaining an acceptable model accuracy.
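
A toy version of the obfuscation step: each one-hot label is replaced by several numerical soft labels whose values are intertwined across classes. The random construction below is an illustrative stand-in for the paper's mapping, not its exact scheme.

    import numpy as np

    def obfuscate_labels(y, n_classes, k=4, seed=0):
        # Replace each one-hot label with k soft labels whose values are
        # intertwined across classes but remain biased toward the true class,
        # making label inference from embeddings harder for an attacker.
        rng = np.random.default_rng(seed)
        soft = np.empty((len(y), k, n_classes))
        for i, label in enumerate(y):
            for j in range(k):
                v = rng.dirichlet(np.ones(n_classes))  # intertwined random mass
                v[label] += 1.0                        # keep a recoverable signal
                soft[i, j] = v / v.sum()
        return soft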

Updated: 2024-05-27 10:54:42

标题: LabObf:通过标签混淆实现垂直联邦学习的标签保护方案

摘要: 分裂学习作为垂直联邦学习中最常见的架构之一,由于其保护隐私的特性而在工业中得到广泛应用。在这种架构中,持有标签的一方寻求其他方的合作,以改善模型性能,因为特征数据不足。这些参与者中的每一个都有一个自定义的底层模型,可以从自己的特征数据中学习隐藏表示,并将嵌入向量上传到标签持有者持有的顶层模型进行最终预测。这种设计允许参与者进行联合训练,而无需直接交换数据。然而,现有研究指出,恶意参与者仍可能从上传的嵌入中推断标签信息,导致隐私泄漏。在本文中,我们首先提出了一种嵌入扩展攻击,手动修改嵌入以破坏现有的防御策略,这些策略依赖于约束参与者上传的嵌入与标签之间的相关性。随后,我们提出了一种新的标签混淆防御策略,称为“LabObf”,它将每个原始的单热向量标签随机映射到多个数值软标签中,这些数值相互交织,极大地增加了攻击者推断标签的难度。我们在四种不同类型的数据集上进行实验,结果显示LabObf可以将攻击者的成功率降低到接近随机猜测的水平,同时保持可接受的模型准确性。

更新时间: 2024-05-27 10:54:42

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2405.17042v1

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Large language models (LLMs) have catalyzed a paradigm shift in natural language processing, yet their limited controllability poses a significant challenge for downstream applications. We aim to address this by drawing inspiration from the neural mechanisms of the human brain, specifically Broca's and Wernicke's areas, which are crucial for language generation and comprehension, respectively. In particular, Broca's area receives cognitive decision signals from Wernicke's area, treating the language generation as an intricate decision-making process, which differs from the fully auto-regressive language generation of existing LLMs. In a similar vein, our proposed system, the BWArea model, conceptualizes language generation as a decision-making task. This model has three components: a language world model, an inverse dynamics model, and a cognitive policy. Like Wernicke's area, the inverse dynamics model is designed to deduce the underlying cognitive intentions, or latent actions, behind each token. The BWArea model is amenable to both pre-training and fine-tuning like existing LLMs. With 30B clean pre-training tokens, we have trained a BWArea model, which achieves competitive performance with LLMs of equal size (1B parameters). Unlike fully auto-regressive LLMs, its pre-training performance does not degenerate if dirty data unintentionally appears. This shows the advantage of a decomposed structure of BWArea model in reducing efforts in laborious data selection and labeling. Finally, we reveal that the BWArea model offers enhanced controllability via fine-tuning the cognitive policy with downstream reward metrics, thereby facilitating alignment with greater simplicity. On 9 out of 10 tasks from two suites, TextWorld and BigBench Hard, our method shows superior performance to auto-regressive LLMs.

Updated: 2024-05-27 10:45:49

标题: BWArea模型:学习世界模型、逆动力学和可控语言生成策略

摘要: 大型语言模型(LLMs)在自然语言处理领域引发了一场范式转变,然而它们的受控性有限给下游应用带来了重大挑战。我们的目标是从人类大脑的神经机制中汲取灵感,具体来说是布洛卡区和维尼克区,它们分别对语言生成和理解至关重要。特别是,布洛卡区接收来自维尼克区的认知决策信号,将语言生成视为一个复杂的决策过程,这与现有LLMs的完全自回归语言生成不同。在类似的思路下,我们提出的系统,BWArea模型,将语言生成概念化为一个决策任务。该模型有三个组成部分:语言世界模型、逆动力学模型和认知策略。与维尼克区类似,逆动力学模型旨在推断每个标记背后的潜在认知意图或潜在行为。BWArea模型像现有的LLMs一样适用于预训练和微调。通过30B个干净的预训练标记,我们训练了一个BWArea模型,其性能与相同大小的LLMs(1B参数)相当。与完全自回归的LLMs不同,如果不慎出现脏数据,其预训练性能不会下降。这显示了BWArea模型分解结构在减少繁琐的数据选择和标记工作方面的优势。最后,我们揭示了BWArea模型通过微调认知策略与下游奖励指标相结合,从而促进对齐的简化,提供了增强的可控性。在两个套件TextWorld和BigBench Hard的10个任务中,我们的方法表现出比自回归LLMs更优异的性能。

更新时间: 2024-05-27 10:45:49

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.17039v1

Advancements in Tactile Hand Gesture Recognition for Enhanced Human-Machine Interaction

Motivated by the growing interest in enhancing intuitive physical Human-Machine Interaction (HRI/HVI), this study proposes a robust tactile hand gesture recognition system. We performed a comprehensive evaluation of different hand gesture recognition approaches for a large area tactile sensing interface (touch interface) constructed from conductive textiles. Our evaluation encompassed traditional feature engineering methods, as well as contemporary deep learning techniques capable of real-time interpretation of a range of hand gestures, accommodating variations in hand sizes, movement velocities, applied pressure levels, and interaction points. Our extensive analysis of the various methods makes a significant contribution to tactile-based gesture recognition in the field of human-machine interaction.

Updated: 2024-05-27 10:44:27

标题: 触觉手势识别技术在增强人机交互方面的进展

摘要: 受到增强直观物理人机交互(HRI/HVI)日益增长的兴趣的驱使,本研究旨在提出一个强大的触觉手势识别系统。我们对不同的手势识别方法进行了全面评估,针对一种由导电纺织品构建的大面积触觉传感界面(触摸界面)。我们的评估涵盖了传统的特征工程方法,以及能够实时解释一系列手势的当代深度学习技术,适应手大小、移动速度、施加的压力水平和交互点的变化。我们对各种方法的广泛分析在人机交互领域的基于触觉的手势识别方面做出了重要贡献。

更新时间: 2024-05-27 10:44:27

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2405.17038v1

PRISM: Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration for EHR Data Sparsity Mitigation

Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHRs data often presents significant challenges for predictive modeling. Conventional imputation methods inadequately distinguish between real and imputed data, leading to potential inaccuracies of patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering missing statuses. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, eICU datasets demonstrate PRISM's superior performance in predicting in-hospital mortality and 30-day readmission tasks, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.

Updated: 2024-05-27 10:44:17

标题: PRISM:利用原型患者表示与特征缺失感知校准,减轻电子健康记录数据稀疏性

摘要: 电子健康记录(EHRs)包含大量患者数据;然而,EHRs数据的稀疏性经常给预测建模带来重大挑战。传统的插补方法未能充分区分真实数据和插补数据,导致患者表征的潜在不准确性。为了解决这些问题,我们引入了PRISM,一个通过利用相似患者的原型表征间接插补数据的框架,从而确保保留患者信息的紧凑表征。PRISM还包括一个特征置信学习模块,评估每个特征在考虑缺失状态时的可靠性。此外,PRISM引入了一种新的患者相似度度量标准,考虑了特征置信度,避免过度依赖不精确的插补值。我们在MIMIC-III、MIMIC-IV、PhysioNet Challenge 2012、eICU数据集上进行了大量实验,展示了PRISM在预测住院死亡和30天再入院任务中的卓越性能,展示了其在处理EHR数据稀疏性方面的有效性。为了便于再现性和进一步研究,我们已经将代码公开发布在https://github.com/yhzhu99/PRISM。

更新时间: 2024-05-27 10:44:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.04160v5

Glauber Generative Model: Discrete Diffusion Models via Binary Classification

We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models, to obtain new samples from a distribution given samples from a discrete space. GGM deploys a discrete Markov chain called the heat bath dynamics (or the Glauber dynamics) to denoise a sequence of noisy tokens to a sample from a joint distribution of discrete tokens. Our novel conceptual framework provides an exact reduction of the task of learning the denoising Markov chain to solving a class of binary classification tasks. More specifically, the model learns to classify a given token in a noisy sequence as signal or noise. In contrast, prior works on discrete diffusion models either solve regression problems to learn importance ratios, or minimize loss functions given by variational approximations. We apply GGM to language modeling and image generation, where images are discretized using image tokenizers like VQGANs. We show that it outperforms existing discrete diffusion models in language generation, and demonstrates strong performance for image generation without using dataset-specific image tokenizers. We also show that our model is capable of performing well in zero-shot control settings like text and image infilling.
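
The denoising dynamics can be sketched as a heat-bath sweep in which a binary classifier decides, position by position, whether the current token is signal or noise. In the sketch below, is_signal and propose are placeholders for the learned classifier and the model's proposal distribution, respectively.

    import torch

    def glauber_denoise(tokens, is_signal, propose, n_sweeps=5):
        # tokens: 1-D LongTensor of (initially noisy) tokens.
        # is_signal(tokens, i): classifier's probability that token i is signal.
        # propose(tokens, i):   replacement token sampled for position i.
        for _ in range(n_sweeps):
            for i in torch.randperm(len(tokens)):
                if torch.rand(()) > is_signal(tokens, i):   # judged to be noise
                    tokens[i] = propose(tokens, i)          # heat-bath resample
        return tokens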

Updated: 2024-05-27 10:42:13

标题: 格劳伯生成模型:通过二元分类实现离散扩散模型

摘要: 我们介绍了格劳伯生成模型(Glauber Generative Model, GGM),这是一类新型的离散扩散模型,用于在给定离散空间样本的情况下从目标分布中生成新样本。GGM部署了一种称为热浴动力学(或Glauber动力学)的离散马尔可夫链,将一系列嘈杂的标记去噪,得到一个离散标记的联合分布样本。我们的新概念框架将学习去噪马尔可夫链的任务精确简化为解决一类二元分类任务。具体而言,该模型学习将嘈杂序列中的给定标记分类为信号或噪声。相比之下,先前的离散扩散模型要么解决回归问题以学习重要性比率,要么最小化由变分逼近给出的损失函数。我们将GGM应用于语言建模和图像生成,其中图像使用像VQGANs这样的图像标记器进行离散化。我们展示了它在语言生成方面优于现有的离散扩散模型,并在不使用特定数据集图像标记器的情况下,展现出图像生成的强大性能。我们还展示了我们的模型能够在零样本控制设置中表现良好,如文本和图像填充。

更新时间: 2024-05-27 10:42:13

领域: cs.LG

下载: http://arxiv.org/abs/2405.17035v1

FUGNN: Harmonizing Fairness and Utility in Graph Neural Networks

Fairness-aware Graph Neural Networks (GNNs) often face a challenging trade-off, where prioritizing fairness may require compromising utility. In this work, we re-examine fairness through the lens of spectral graph theory, aiming to reconcile fairness and utility within the framework of spectral graph learning. We explore the correlation between sensitive features and spectrum in GNNs, using theoretical analysis to delineate the similarity between original sensitive features and those after convolution under different spectrum. Our analysis reveals a reduction in the impact of similarity when the eigenvectors associated with the largest magnitude eigenvalue exhibit directional similarity. Based on these theoretical insights, we propose FUGNN, a novel spectral graph learning approach that harmonizes the conflict between fairness and utility. FUGNN ensures algorithmic fairness and utility by truncating the spectrum and optimizing eigenvector distribution during the encoding process. The fairness-aware eigenvector selection reduces the impact of convolution on sensitive features while concurrently minimizing the sacrifice of utility. FUGNN further optimizes the distribution of eigenvectors through a transformer architecture. By incorporating the optimized spectrum into the graph convolution network, FUGNN effectively learns node representations. Experiments on six real-world datasets demonstrate the superiority of FUGNN over baseline methods. The codes are available at https://github.com/yushuowiki/FUGNN.

Updated: 2024-05-27 10:40:21

标题: FUGNN:在图神经网络中协调公平性和效用

摘要: 公平感知图神经网络(GNNs)经常面临一个具有挑战性的折衷,即优先考虑公平可能需要牺牲效用。在这项工作中,我们通过谱图论的视角重新审视公平,旨在在谱图学习框架内协调公平和效用。我们探索GNNs中敏感特征和谱之间的相关性,使用理论分析来勾勒原始敏感特征和在不同谱下卷积后的相似性之间的相似性。我们的分析揭示了当与最大幅度特征值相关的特征向量表现出方向相似性时,相似性影响的减少。基于这些理论见解,我们提出了FUGNN,一种新颖的谱图学习方法,可以协调公平和效用之间的冲突。FUGNN通过在编码过程中截断谱并优化特征向量分布来确保算法公平性和效用性。公平感知的特征向量选择减少了卷积对敏感特征的影响,同时最小化了效用的牺牲。FUGNN通过转换器架构进一步优化特征向量的分布。通过将优化谱整合到图卷积网络中,FUGNN有效地学习节点表示。对六个真实世界数据集的实验表明FUGNN优于基准方法。代码可在https://github.com/yushuowiki/FUGNN 上找到。

更新时间: 2024-05-27 10:40:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17034v1

GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks

Graph neural networks (GNNs) learn to represent nodes by aggregating information from their neighbors. As GNNs increase in depth, their receptive field grows exponentially, leading to high memory costs. Several existing methods address this by sampling a small subset of nodes, scaling GNNs to much larger graphs. These methods are primarily evaluated on homophilous graphs, where neighboring nodes often share the same label. However, most of these methods rely on static heuristics that may not generalize across different graphs or tasks. We argue that the sampling method should be adaptive, adjusting to the complex structural properties of each graph. To this end, we introduce GRAPES, an adaptive sampling method that learns to identify the set of nodes crucial for training a GNN. GRAPES trains a second GNN to predict node sampling probabilities by optimizing the downstream task objective. We evaluate GRAPES on various node classification benchmarks, involving homophilous as well as heterophilous graphs. We demonstrate GRAPES' effectiveness in accuracy and scalability, particularly in multi-label heterophilous graphs. Unlike other sampling methods, GRAPES maintains high accuracy even with smaller sample sizes and, therefore, can scale to massive graphs. Our code is publicly available at https://github.com/dfdazac/grapes.

Updated: 2024-05-27 10:37:04

标题: GRAPES:学习如何对图进行采样以实现可扩展的图神经网络

摘要: 图神经网络(GNNs)通过聚合邻居节点的信息来学习表示节点。随着GNNs的深度增加,它们的感受野呈指数级增长,导致高昂的内存成本。一些现有方法通过对节点进行抽样来解决这个问题,将GNNs扩展到更大的图形。这些方法主要在同质图上进行评估,其中邻近节点通常具有相同的标签。然而,大多数这些方法依赖于静态启发式方法,可能无法推广到不同的图形或任务。我们认为抽样方法应该是自适应的,能够调整到每个图形的复杂结构特性。为此,我们引入了GRAPES,一种自适应抽样方法,它学习识别对训练GNN至关重要的节点集。GRAPES通过优化下游任务目标来训练第二个GNN以预测节点抽样概率。我们在涉及同质和异质图形的各种节点分类基准上评估了GRAPES。我们展示了GRAPES在准确性和可扩展性方面的有效性,特别是在多标签异质图形中。与其他抽样方法不同,GRAPES即使在较小的样本大小下也能保持高准确性,因此可以扩展到大规模图形。我们的代码可以在https://github.com/dfdazac/grapes上公开获取。

更新时间: 2024-05-27 10:37:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.03399v2

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

Model-based methods in reinforcement learning offer a promising approach to enhance data efficiency by facilitating policy exploration within a dynamics model. However, accurately predicting sequential steps in the dynamics model remains a challenge due to bootstrapped prediction, in which the next state is predicted from the model's own prediction of the current state. This leads to accumulated errors during model roll-out. In this paper, we propose the Any-step Dynamics Model (ADM) to mitigate the compounding error by reducing bootstrapped prediction to direct prediction. ADM allows for the use of variable-length plans as inputs for predicting future states without frequent bootstrapping. We design two algorithms, ADMPO-ON and ADMPO-OFF, which apply ADM in online and offline model-based frameworks, respectively. In the online setting, ADMPO-ON demonstrates improved sample efficiency compared to previous state-of-the-art methods. In the offline setting, ADMPO-OFF not only demonstrates superior performance compared to recent state-of-the-art offline approaches but also offers better quantification of model uncertainty using only a single ADM.
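
The contrast between bootstrapped and any-step prediction can be stated in a few lines; the model(state, action_plan) signature below is an assumption for illustration, not the paper's interface.

    def rollout_bootstrapped(model, state, actions):
        # Conventional one-step model: each prediction feeds the next one,
        # so errors compound with the rollout horizon.
        for a in actions:
            state = model(state, [a])
        return state

    def rollout_any_step(model, state, actions):
        # Any-step model: condition on the whole variable-length action plan
        # and predict the resulting state directly, with no bootstrapping.
        return model(state, actions)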

Updated: 2024-05-27 10:33:53

标题: 任意步骤动态模型提高了在线和离线强化学习的未来预测

摘要: 强化学习中基于模型的方法为提高数据效率提供了一种有前途的途径,通过在动态模型中促进政策探索。然而,准确预测动态模型中的连续步骤仍然是一个挑战,因为引导预测会导致将下一个状态归因于当前状态的预测。这会导致模型展开过程中累积误差。在本文中,我们提出了任意步骤动态模型(ADM),通过将引导预测减少到直接预测,以减轻复合误差。ADM允许使用可变长度的计划作为输入,以预测未来状态而无需频繁引导。我们设计了两种算法,ADMPO-ON和ADMPO-OFF,分别将ADM应用于在线和离线基于模型的框架中。在在线设置中,与先前的最先进方法相比,ADMPO-ON表现出更高的样本效率。在离线设置中,ADMPO-OFF不仅与最近的最先进离线方法相比表现出更好的性能,而且仅使用单个ADM就可以更好地量化模型不确定性。

更新时间: 2024-05-27 10:33:53

领域: cs.LG

下载: http://arxiv.org/abs/2405.17031v1

SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving

We present a novel synthetically generated multi-modal dataset, SCaRL, to enable the training and validation of autonomous driving solutions. Multi-modal datasets are essential to attain the robustness and high accuracy required by autonomous systems in applications such as autonomous driving. As deep learning-based solutions are becoming more prevalent for object detection, classification, and tracking tasks, there is great demand for datasets combining camera, lidar, and radar sensors. Existing real/synthetic datasets for autonomous driving lack synchronized data collection from a complete sensor suite. SCaRL provides synchronized Synthetic data from RGB, semantic/instance, and depth Cameras; Range-Doppler-Azimuth/Elevation maps and raw data from Radar; and 3D point clouds/2D maps of semantic, depth and Doppler data from coherent Lidar. SCaRL is a large dataset based on the CARLA Simulator, which provides data for diverse, dynamic scenarios and traffic conditions. SCaRL is the first dataset to include synthetic synchronized data from coherent Lidar and MIMO radar sensors. The dataset can be accessed here: https://fhr-ihs-sva.pages.fraunhofer.de/asp/scarl/

Updated: 2024-05-27 10:31:26

标题: SCaRL- 用于自动驾驶的合成多模态数据集

摘要: 我们提出了一个新颖的综合生成的多模态数据集SCaRL,以便训练和验证自动驾驶解决方案。多模态数据集对于实现自动系统在应用如自动驾驶中所需的健壮性和高准确性至关重要。随着基于深度学习的解决方案在目标检测、分类和跟踪任务中变得越来越普遍,对于结合摄像头、激光雷达和雷达传感器的数据集有着巨大需求。现有的用于自动驾驶的真实/合成数据集缺乏来自完整传感器组的同步数据收集。SCaRL提供了来自RGB、语义/实例和深度摄像机的同步合成数据;来自雷达的Range-Doppler-方位/仰角图和原始数据;以及来自相干激光雷达的语义、深度和多普勒数据的3D点云/2D地图。SCaRL是基于CARLA模拟器的大型数据集,提供各种多样、动态的场景和交通条件的数据。SCaRL是第一个包含来自相干激光雷达和MIMO雷达传感器的合成同步数据的数据集。可以在以下链接中访问该数据集:https://fhr-ihs-sva.pages.fraunhofer.de/asp/scarl/

更新时间: 2024-05-27 10:31:26

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.17030v1

Supervised Batch Normalization

Batch Normalization (BN), a widely-used technique in neural networks, enhances generalization and expedites training by normalizing each mini-batch to the same mean and variance. However, its effectiveness diminishes when confronted with diverse data distributions. To address this challenge, we propose Supervised Batch Normalization (SBN), a pioneering approach. We expand normalization beyond traditional single mean and variance parameters, enabling the identification of data modes prior to training. This ensures effective normalization for samples sharing common features. We define contexts as modes, categorizing data with similar characteristics. These contexts are explicitly defined, such as domains in domain adaptation or modalities in multimodal systems, or implicitly defined through clustering algorithms based on data similarity. We illustrate the superiority of our approach over BN and other commonly employed normalization techniques through various experiments on both single and multi-task datasets. Integrating SBN with Vision Transformer results in a remarkable 15.13% accuracy enhancement on CIFAR-100. Additionally, in domain adaptation scenarios, employing AdaMatch demonstrates an impressive 22.25% accuracy improvement on MNIST and SVHN compared to BN.
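
A minimal PyTorch sketch of the idea, assuming the context (mode) id of each sample is known at run time: one normalization branch per context instead of a single global one. This is an illustration of context-wise normalization, not the paper's exact module.

    import torch
    import torch.nn as nn

    class SupervisedBatchNorm1d(nn.Module):
        # One BatchNorm branch per context (mode), selected by a context id
        # known before training (e.g., a domain label or a cluster id).
        def __init__(self, num_features, num_contexts):
            super().__init__()
            self.bns = nn.ModuleList(
                nn.BatchNorm1d(num_features) for _ in range(num_contexts))

        def forward(self, x, context_ids):
            # Assumes each context present in the batch has more than one
            # sample during training (a BatchNorm requirement).
            out = torch.empty_like(x)
            for c, bn in enumerate(self.bns):
                mask = context_ids == c
                if mask.any():
                    out[mask] = bn(x[mask])
            return out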

Updated: 2024-05-27 10:30:21

标题: 受监督的批量归一化

摘要: 批量归一化(BN)是神经网络中广泛使用的技术,通过将每个小批量归一化到相同的平均值和方差,增强了泛化能力并加快了训练速度。然而,当面对不同的数据分布时,其效果会减弱。为了解决这一挑战,我们提出了监督批量归一化(SBN)这一开创性方法。我们将归一化扩展到传统的单一平均值和方差参数之外,使其能够在训练之前识别数据模式。这确保了具有共同特征的样本进行有效归一化。我们将上下文定义为模式,对具有相似特征的数据进行分类。这些上下文可以明确定义,例如领域自适应中的领域或多模态系统中的模态,或通过基于数据相似性的聚类算法隐式定义。通过在单一和多任务数据集上进行各种实验,我们展示了我们的方法相对于BN和其他常用的归一化技术的优越性。将SBN与Vision Transformer集成,使CIFAR-100的准确率提高了显著的15.13%。此外,在领域自适应场景中,采用AdaMatch相对于BN在MNIST和SVHN上表现出了出色的22.25%的准确率提升。

更新时间: 2024-05-27 10:30:21

领域: cs.LG

下载: http://arxiv.org/abs/2405.17027v1

SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs

Efficiently supporting long context length is crucial for Transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding window-based static sparse attention mitigates the problem by limiting the attention scope of the input tokens, reducing the theoretical complexity from quadratic to linear. Although the sparsity induced by window attention is highly structured, it does not align perfectly with the microarchitecture of the conventional accelerators, leading to suboptimal implementation. In response, we propose a dataflow-aware FPGA-based accelerator design, SWAT, that efficiently leverages the sparsity to achieve scalable performance for long input. The proposed microarchitecture is based on a design that maximizes data reuse by using a combination of row-wise dataflow, kernel fusion optimization, and an input-stationary design considering the distributed memory and computation resources of FPGA. Consequently, it achieves up to 22$\times$ and 5.7$\times$ improvement in latency and energy efficiency compared to the baseline FPGA-based accelerator and 15$\times$ energy efficiency compared to GPU-based solution.
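
For reference, a dense PyTorch sketch of the window-attention pattern SWAT accelerates; a real kernel (and the FPGA dataflow) would never materialize the full n-by-n score matrix, which is exactly the structured sparsity the accelerator exploits.

    import torch

    def sliding_window_attention(Q, K, V, window=128):
        # Token i attends only to tokens j with |i - j| <= window, the static
        # sparsity pattern discussed above (materialized densely for clarity).
        n = Q.size(0)
        scores = Q @ K.T / Q.size(-1) ** 0.5
        idx = torch.arange(n)
        mask = (idx[:, None] - idx[None, :]).abs() > window
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ V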

Updated: 2024-05-27 10:25:08

标题: SWAT:基于窗口注意力的Transformer在FPGAs上的可扩展和高效加速

摘要: 高效支持长上下文长度对于Transformer模型至关重要。自注意力计算的二次复杂性困扰传统的Transformer模型。基于滑动窗口的静态稀疏注意力可以缓解这个问题,通过限制输入标记的注意力范围,将理论复杂性从二次降低至线性。虽然窗口注意力引起的稀疏性高度结构化,但与传统加速器的微体系结构并不完全对齐,导致实现不够优化。因此,我们提出了一种数据流感知的基于FPGA的加速器设计SWAT,它有效利用稀疏性,为长输入实现可扩展的性能。所提出的微体系结构通过结合按行数据流、内核融合优化,以及考虑FPGA分布式内存和计算资源的输入驻留(input-stationary)设计,最大化数据重用。因此,与基线FPGA加速器相比,延迟和能效分别提高了22倍和5.7倍,与基于GPU的解决方案相比,能效提高了15倍。

更新时间: 2024-05-27 10:25:08

领域: cs.AR,cs.AI

下载: http://arxiv.org/abs/2405.17025v1

Compositional Few-Shot Class-Incremental Learning

Few-shot class-incremental learning (FSCIL) aims to continually learn novel classes from only a few samples after (pre-)training on base classes with sufficient data. However, this remains a challenge. In contrast, humans can easily recognize novel classes with a few samples. Cognitive science demonstrates that an important component of such human capability is compositional learning. This involves identifying visual primitives from learned knowledge and then composing new concepts using these transferred primitives, making incremental learning both effective and interpretable. To imitate human compositional learning, we propose a cognitive-inspired method for the FSCIL task. We define and build a compositional model based on set similarities, and then equip it with a primitive composition module and a primitive reuse module. In the primitive composition module, we propose to utilize the Centered Kernel Alignment (CKA) similarity to approximate the similarity between primitive sets, allowing the training and evaluation based on primitive compositions. In the primitive reuse module, we enhance primitive reusability by classifying inputs based on primitives replaced with the closest primitives from other classes. Experiments on three datasets validate our method, showing it outperforms current state-of-the-art methods with improved interpretability. Our code is available at https://github.com/Zoilsen/Comp-FSCIL.
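
The CKA similarity used in the primitive composition module has a standard linear form; the sketch below implements linear CKA between two feature sets (the paper applies it to primitive sets; the feature shapes here are generic assumptions).

    import torch

    def linear_cka(X, Y):
        # Linear CKA between feature matrices X: (n, d1) and Y: (n, d2),
        # computed on column-centered features; returns a value in [0, 1].
        X = X - X.mean(dim=0, keepdim=True)
        Y = Y - Y.mean(dim=0, keepdim=True)
        num = (X.T @ Y).norm() ** 2                   # ||X^T Y||_F^2
        den = (X.T @ X).norm() * (Y.T @ Y).norm()
        return num / den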

Updated: 2024-05-27 10:21:38

标题: 组合式少样本类增量学习

摘要: 少样本类增量学习(FSCIL)旨在在基类数据充足的(预)训练之后,仅用少量样本持续地学习新类别。然而,这仍然是一个挑战。相比之下,人类可以轻松地用少量样本识别新类别。认知科学表明,这种人类能力的一个重要组成部分是组合学习。这涉及从学习的知识中识别视觉基元,然后利用这些传递的基元组合新概念,使增量学习既有效又可解释。为了模仿人类的组合学习,我们提出了一种针对FSCIL任务的受认知启发的方法。我们基于集合相似性定义和构建了一个组合模型,然后配备了一个基元组合模块和一个基元重用模块。在基元组合模块中,我们提出利用中心核对齐(CKA)相似性来近似基元集之间的相似性,从而允许基于基元组合进行训练和评估。在基元重用模块中,我们通过将输入中的基元替换为其他类别中最接近的基元来进行分类,增强了基元的可重用性。对三个数据集的实验验证了我们的方法,表明它优于当前最先进的方法,并具有更好的可解释性。我们的代码可在https://github.com/Zoilsen/Comp-FSCIL 上找到。

更新时间: 2024-05-27 10:21:38

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.17022v1

Isotropy, Clusters, and Classifiers

Whether embedding spaces use all their dimensions equally, i.e., whether they are isotropic, has been a recent subject of discussion. Evidence has been accrued both for and against enforcing isotropy in embedding spaces. In the present paper, we stress that isotropy imposes requirements on the embedding space that are not compatible with the presence of clusters -- which also negatively impacts linear classification objectives. We demonstrate this fact both mathematically and empirically and use it to shed light on previous results from the literature.

Updated: 2024-05-27 10:21:08

标题: 各向同性、簇和分类器

摘要: 嵌入空间是否均等地使用其所有维度,即是否各向同性,是最近讨论的一个话题。支持和反对在嵌入空间中强制各向同性的证据均已有积累。在本文中,我们强调各向同性对嵌入空间施加的要求与聚类的存在不相容,而这也会对线性分类目标产生负面影响。我们在数学和经验上证明了这一事实,并利用它来阐明以往文献中的结果。

更新时间: 2024-05-27 10:21:08

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.03191v3

Sketch-and-Project Meets Newton Method: Global $\mathcal O(k^{-2})$ Convergence with Low-Rank Updates

In this paper, we propose the first sketch-and-project Newton method with a fast $\mathcal O(k^{-2})$ global convergence rate for self-concordant functions. Our method, SGN, can be viewed in three ways: i) as a sketch-and-project algorithm projecting updates of the Newton method, ii) as a cubically regularized Newton method in sketched subspaces, and iii) as a damped Newton method in sketched subspaces. SGN inherits the best of all three worlds: the cheap iteration costs of sketch-and-project methods, the state-of-the-art $\mathcal O(k^{-2})$ global convergence rate of full-rank Newton-like methods, and the algorithmic simplicity of damped Newton methods. Finally, we demonstrate its comparable empirical performance to baseline algorithms.
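
A toy sketch of one SGN iteration under our reading of the abstract: draw a Gaussian sketch, solve the Newton system only in the sketched subspace, and take a damped step. The paper's adaptive step-size rule for self-concordant functions is simplified to a fixed damping here:

```python
import numpy as np

rng = np.random.default_rng(2)

def sgn_step(grad, hess, x, tau=5, damping=1.0):
    """One sketch-and-project Newton step: restrict the Newton system to
    a random tau-dimensional subspace, so each iteration needs only a
    tau x tau solve instead of a full d x d one."""
    S = rng.standard_normal((x.size, tau))   # Gaussian sketch matrix
    g = S.T @ grad(x)                        # sketched gradient  (tau,)
    H = S.T @ hess(x) @ S                    # sketched Hessian   (tau, tau)
    return x - damping * (S @ np.linalg.solve(H, g))

# quadratic test problem: f(x) = 0.5 x^T A x - b^T x with d = 50
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = np.zeros(50)
for _ in range(300):
    x = sgn_step(lambda z: A @ z - b, lambda z: A, x)
print(np.linalg.norm(A @ x - b))  # residual shrinks toward 0
```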

Updated: 2024-05-27 10:10:06

标题: 草图和投影遇上牛顿法:低秩更新实现全局$\mathcal O(k^{-2})$收敛

摘要: 在本文中,我们提出了第一个对自和谐(self-concordant)函数具有快速$\mathcal O(k^{-2})$全局收敛速率的草图-投影牛顿方法。我们的方法SGN可以从三个方面来看待:i)作为一种对牛顿法更新进行投影的草图-投影算法,ii)作为草图子空间中的三次正则化牛顿法,以及iii)作为草图子空间中的阻尼牛顿法。SGN继承了这三者的优点:草图-投影方法的低迭代成本,全秩类牛顿方法最先进的$\mathcal O(k^{-2})$全局收敛速率,以及阻尼牛顿方法的算法简单性。最后,我们展示了它与基准算法相当的实证性能。

更新时间: 2024-05-27 10:10:06

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2305.13082v4

Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games

Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents in the infinite limit of number and size of groups. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm to solve MFCG in a model-free approach from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces updated at each discrete time-step over an infinite horizon. In [Angiuli et al., 2023], we proved convergence of two-timescale algorithms for MFG and MFC separately highlighting the need to follow multiple population distributions in the MFC case. Here, we integrate this feature for MFCG as well as three rates of update decreasing to zero in the proper ratios. Our technique of proof uses a generalization to three timescales of the two-timescale analysis in [Borkar, 1997]. We give a simple example satisfying the various hypothesis made in the proof of convergence and illustrating the performance of the algorithm.

Updated: 2024-05-27 10:01:52

标题: 多尺度强化Q学习算法在均场控制游戏中的分析

摘要: Mean Field Control Games(MFCG)是在[Angiuli等,2022a]中引入的,代表了在群体数量和大小的无限极限下,大量大型协作代理组之间的竞争游戏。在本文中,我们证明了一个三时间尺度的强化Q学习(RL)算法在从代表性代理的角度以模型无关的方式解决MFCG的收敛性。我们的分析使用一个Q表,用于有限状态和动作空间,在无限时间跨度内每个离散时间步更新。在[Angiuli等,2023]中,我们证明了MFG和MFC分别的两时间尺度算法的收敛性,强调了在MFC情况下需要遵循多个人口分布。在这里,我们将这个特性整合到MFCG中,并且更新速率以适当比例递减到零。我们的证明技术使用了[Borkar,1997]中两时间尺度分析的三时间尺度的推广。我们给出一个简单的例子,满足证明收敛性的各种假设,并展示算法的性能。

更新时间: 2024-05-27 10:01:52

领域: math.OC,cs.LG,cs.MA

下载: http://arxiv.org/abs/2405.17017v1

Position: Foundation Agents as the Paradigm Shift for Decision Making

Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision has showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with its fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

Updated: 2024-05-27 09:54:50

标题: 立场:基础代理作为决策制定的范式转变

摘要: 决策制定需要感知、记忆和推理之间复杂的相互作用,以确定最佳策略。传统的决策方法面临着与低样本效率和泛化能力差相关的挑战。相比之下,语言和视觉基础模型展示了对各种新任务的快速适应能力。因此,我们主张将基础代理构建为学习范式中的一次转变性转变。这一提议基于对基础代理的制定,其基本特征和挑战是受到大型语言模型(LLMs)成功的启发。此外,我们详细说明了基础代理的路线图,从大规模交互式数据收集或生成,到自监督预训练和适应,以及与LLMs的知识和价值对齐。最后,我们指出了从制定中得出的关键研究问题,并勾勒了基础代理的发展趋势,支持真实世界用例,解决技术和理论方面的问题,推动该领域朝着更全面和有影响力的未来发展。

更新时间: 2024-05-27 09:54:50

领域: cs.AI

下载: http://arxiv.org/abs/2405.17009v1

Human-in-the-loop: Towards Label Embeddings for Measuring Classification Difficulty

Uncertainty in machine learning models is a timely and vast field of research. In supervised learning, uncertainty can already occur in the first stage of the training process, the annotation phase. This scenario is particularly evident when some instances cannot be definitively classified. In other words, there is inevitable ambiguity in the annotation step and hence, not necessarily a "ground truth" associated with each instance. The main idea of this work is to drop the assumption of a ground truth label and instead embed the annotations into a multidimensional space. This embedding is derived from the empirical distribution of annotations in a Bayesian setup, modeled via a Dirichlet-Multinomial framework. We estimate the model parameters and posteriors using a stochastic Expectation Maximization algorithm with Markov Chain Monte Carlo steps. The methods developed in this paper readily extend to various situations where multiple annotators independently label instances. To showcase the generality of the proposed approach, we apply our approach to three benchmark datasets for image classification and Natural Language Inference. Besides the embeddings, we can investigate the resulting correlation matrices, which reflect the semantic similarities of the original classes very well for all three exemplary datasets.
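
The posterior mean of a Dirichlet-Multinomial model already yields a label embedding in the simplex; a minimal sketch with a fixed symmetric prior (the paper instead estimates the prior with stochastic EM and MCMC steps):

```python
import numpy as np

def label_embedding(counts, alpha):
    """Posterior mean of a Dirichlet-Multinomial model: the instance is
    embedded as a point in the probability simplex rather than forced
    onto a single 'ground truth' class."""
    post = counts + alpha        # Dirichlet posterior parameters
    return post / post.sum()

votes = np.array([3.0, 2.0, 0.0])    # 5 annotators, 3 classes
alpha = np.full(3, 0.5)              # symmetric prior (assumed, not estimated)
print(label_embedding(votes, alpha)) # ~[0.54 0.38 0.08]: ambiguity preserved
```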

Updated: 2024-05-27 09:53:01

标题: 人在循环中:朝向用于衡量分类困难度的标签嵌入

摘要: 机器学习模型中的不确定性是一个及时且广泛的研究领域。在监督学习中,不确定性可能已经在训练过程的第一阶段,即注释阶段中出现。当一些实例无法明确分类时,这种情况特别明显。换句话说,在注释步骤中不可避免地存在模糊性,因此,并非每个实例都有一个“地面真相”相关联。这项工作的主要思想是放弃地面真相标签的假设,而是将注释嵌入到一个多维空间中。这种嵌入是通过贝叶斯设置中的注释的经验分布导出的,通过Dirichlet-Multinomial框架建模。我们使用随机期望最大化算法和马尔可夫链蒙特卡洛步骤估计模型参数和后验。本文开发的方法很容易扩展到多个注释者独立标记实例的各种情况。为展示所提出方法的普遍性,我们将我们的方法应用于三个图像分类和自然语言推理的基准数据集。除了嵌入外,我们还可以调查结果相关矩阵,这些矩阵非常好地反映了三个示例数据集的原始类的语义相似性。

更新时间: 2024-05-27 09:53:01

领域: cs.LG,stat.AP

下载: http://arxiv.org/abs/2311.08874v2

Polyhedral Complex Derivation from Piecewise Trilinear Networks

Recent advancements in visualizing deep neural networks provide insights into their structures and mesh extraction from Continuous Piecewise Affine (CPWA) functions. Meanwhile, developments in neural surface representation learning incorporate non-linear positional encoding, addressing issues like spectral bias; however, this poses challenges in applying mesh extraction techniques based on CPWA functions. Focusing on trilinear interpolating methods as positional encoding, we present theoretical insights and an analytical mesh extraction, showing the transformation of hypersurfaces to flat planes within the trilinear region under the eikonal constraint. Moreover, we introduce a method for approximating intersecting points among three hypersurfaces contributing to broader applications. We empirically validate correctness and parsimony through chamfer distance and efficiency, and angular distance, while examining the correlation between the eikonal loss and the planarity of the hypersurfaces.

Updated: 2024-05-27 09:50:32

标题: 从分段三线性网络推导多面体复合体

摘要: 最近在可视化深度神经网络方面取得的进展为我们提供了对其结构的洞察,并从连续分段仿射(CPWA)函数中提取网格。与此同时,神经表面表示学习的发展包括非线性位置编码,解决了光谱偏差等问题;然而,这在应用基于CPWA函数的网格提取技术时带来了挑战。着重于三线性插值方法作为位置编码,我们提出了理论见解和分析网格提取,展示了在eikonal约束下将超曲面转换为平面的过程。此外,我们介绍了一种用于近似计算三个超曲面之间的交点的方法,以扩展其应用范围。我们通过chamfer距离和效率以及角度距离来经验性地验证正确性和简洁性,同时检查eikonal损失与超曲面的平面性之间的相关性。

更新时间: 2024-05-27 09:50:32

领域: cs.LG,cs.AI,cs.CV,cs.GR

下载: http://arxiv.org/abs/2402.10403v2

Graph Condensation for Open-World Graph Learning

The burgeoning volume of graph data presents significant computational challenges in training graph neural networks (GNNs), critically impeding their efficiency in various applications. To tackle this challenge, graph condensation (GC) has emerged as a promising acceleration solution, focusing on the synthesis of a compact yet representative graph for efficiently training GNNs while retaining performance. Despite the potential to promote scalable use of GNNs, existing GC methods are limited to aligning the condensed graph with merely the observed static graph distribution. This limitation significantly restricts the generalization capacity of condensed graphs, particularly in adapting to dynamic distribution changes. In real-world scenarios, however, graphs are dynamic and constantly evolving, with new nodes and edges being continually integrated. Consequently, due to the limited generalization capacity of condensed graphs, applications that employ GC for efficient GNN training end up with sub-optimal GNNs when confronted with evolving graph structures and distributions in dynamic real-world situations. To overcome this issue, we propose open-world graph condensation (OpenGC), a robust GC framework that integrates structure-aware distribution shift to simulate evolving graph patterns and exploit the temporal environments for invariance condensation. This approach is designed to extract temporal invariant patterns from the original graph, thereby enhancing the generalization capabilities of the condensed graph and, subsequently, the GNNs trained on it. Extensive experiments on both real-world and synthetic evolving graphs demonstrate that OpenGC outperforms state-of-the-art (SOTA) GC methods in adapting to dynamic changes in open-world graph environments.

Updated: 2024-05-27 09:47:09

标题: 图收缩用于开放世界图学习

摘要: 图形数据的不断增长量在训练图神经网络(GNNs)方面提出了重大的计算挑战,严重影响了它们在各种应用中的效率。为了解决这一挑战,图形浓缩(GC)已经成为一种有前途的加速解决方案,重点是合成一个紧凑而具有代表性的图形,以有效地训练GNNs同时保持性能。尽管有可能促进GNNs的可扩展使用,但现有的GC方法仅限于将浓缩图与仅观察到的静态图分布进行对齐。这一限制显著限制了浓缩图的泛化能力,特别是在适应动态分布变化方面。然而,在现实场景中,图形是动态的,不断发展,新的节点和边缘不断集成。因此,由于浓缩图的有限泛化能力,将GC用于高效GNN训练的应用在面对动态现实世界情况下的演变图结构和分布时,最终得到的是次优GNN。为了克服这个问题,我们提出了开放世界图形浓缩(OpenGC),这是一个强大的GC框架,它整合了结构感知的分布转移,模拟演变的图形模式,利用时间环境进行不变性浓缩。这种方法旨在从原始图形中提取时间不变模式,从而增强浓缩图的泛化能力,进而增强在其上训练的GNN的泛化能力。对真实世界和合成演变图形的广泛实验表明,OpenGC在适应开放世界图形环境中的动态变化方面优于最先进的GC方法。

更新时间: 2024-05-27 09:47:09

领域: cs.LG

下载: http://arxiv.org/abs/2405.17003v1

Almost sure convergence rates of stochastic gradient methods under gradient domination

Stochastic gradient methods are among the most important algorithms in training machine learning problems. While classical assumptions such as strong convexity allow a simple analysis they are rarely satisfied in applications. In recent years, global and local gradient domination properties have shown to be a more realistic replacement of strong convexity. They were proved to hold in diverse settings such as (simple) policy gradient methods in reinforcement learning and training of deep neural networks with analytic activation functions. We prove almost sure convergence rates $f(X_n)-f^*\in o\big( n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic gradient descent (with and without momentum) under global and local $\beta$-gradient domination assumptions. The almost sure rates get arbitrarily close to recent rates in expectation. Finally, we demonstrate how to apply our results to the training task in both supervised and reinforcement learning.

Updated: 2024-05-27 09:43:50

标题: 随机梯度方法在梯度支配条件下的几乎必然收敛速率

摘要: 随机梯度方法是训练机器学习问题中最重要的算法之一。虽然传统的假设,如强凸性,可以进行简单的分析,但在应用中很少被满足。近年来,全局和局部梯度支配性质被证明是强凸性的更现实的替代品。它们被证明在不同的设置中成立,例如(简单的)政策梯度方法在强化学习中以及具有解析激活函数的深度神经网络的训练中。我们证明了在全局和局部β-梯度支配假设下,随机梯度下降(带动量和不带动量)的最后迭代的几乎必然收敛速率$f(X_n)-f^* \in o\big( n^{-\frac{1}{4\beta-1}+\epsilon}\big)$。几乎必然的收敛速率接近最近的期望速率。最后,我们展示了如何将我们的结果应用于监督学习和强化学习中的训练任务。

更新时间: 2024-05-27 09:43:50

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.13592v2

Semi-supervised Multimodal Representation Learning through a Global Workspace

Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image generation). However, current approaches mainly rely on brute-force supervised training over large multimodal datasets. In contrast, humans (and other animals) can learn useful multimodal representations from only sparse experience with matched cross-modal data. Here we evaluate the capabilities of a neural network architecture inspired by the cognitive notion of a "Global Workspace": a shared representation for two (or more) input modalities. Each modality is processed by a specialized system (pretrained on unimodal data, and subsequently frozen). The corresponding latent representations are then encoded to and decoded from a single shared workspace. Importantly, this architecture is amenable to self-supervised training via cycle-consistency: encoding-decoding sequences should approximate the identity function. For various pairings of vision-language modalities and across two datasets of varying complexity, we show that such an architecture can be trained to align and translate between two modalities with very little need for matched data (from 4 to 7 times less than a fully supervised approach). The global workspace representation can be used advantageously for downstream classification tasks and for robust transfer learning. Ablation studies reveal that both the shared workspace and the self-supervised cycle-consistency training are critical to the system's performance.
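
The self-supervised signal is cycle-consistency through the shared workspace: encode in, decode out, and penalize deviation from the identity. A PyTorch-style sketch with toy linear maps (module names and dimensions are ours, not the paper's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# frozen unimodal systems are assumed to have produced z_img and z_txt;
# the four linear maps move in and out of the shared workspace
dim_img, dim_txt, dim_ws = 64, 48, 32
enc_i, dec_i = nn.Linear(dim_img, dim_ws), nn.Linear(dim_ws, dim_img)
enc_t, dec_t = nn.Linear(dim_txt, dim_ws), nn.Linear(dim_ws, dim_txt)

def cycle_losses(z_img, z_txt):
    """Self-supervised terms needing NO paired data: each modality must
    round-trip through the workspace (demi-cycle) and through the other
    modality and back (full cycle), approximating the identity."""
    demi = F.mse_loss(dec_i(enc_i(z_img)), z_img) + \
           F.mse_loss(dec_t(enc_t(z_txt)), z_txt)
    full = F.mse_loss(dec_i(enc_t(dec_t(enc_i(z_img)))), z_img) + \
           F.mse_loss(dec_t(enc_i(dec_i(enc_t(z_txt)))), z_txt)
    return demi + full  # a supervised translation term is added only on
                        # the few matched cross-modal pairs available

print(cycle_losses(torch.randn(8, dim_img), torch.randn(8, dim_txt)))
```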

Updated: 2024-05-27 09:43:02

标题: 半监督多模态表示学习通过全局工作空间

摘要: 最近的深度学习模型能够有效地将不同模态(例如图像和文本)的输入结合起来,并学习对齐它们的潜在表示,或者将一个领域的信号转换到另一个领域(如图像字幕或文本到图像生成)。然而,当前的方法主要依赖于对大型多模态数据集进行蛮力监督训练。相比之下,人类(和其他动物)可以仅通过与匹配的跨模态数据的稀疏经验学习有用的多模态表示。在这里,我们评估了一个灵感来自“全局工作区”认知概念的神经网络结构的能力:两个(或更多)输入模态的共享表示。每个模态都由一个专门的系统处理(在单模态数据上预训练,然后冻结)。然后将相应的潜在表示编码到一个共享工作区中,然后解码。重要的是,这种架构适用于通过循环一致性进行自监督训练:编码-解码序列应该近似于身份函数。通过视觉-语言模态的各种配对以及跨两个不同复杂度的数据集,我们展示了这样的架构可以在非常少的匹配数据的情况下进行训练,实现两个模态之间的对齐和转换(比完全监督方法少 4 到 7 倍)。全局工作区表示可以有利地用于下游分类任务和鲁棒的迁移学习。消融研究表明,共享工作区和自监督循环一致性训练对系统的性能至关重要。

更新时间: 2024-05-27 09:43:02

领域: cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2306.15711v2

CECILIA: Comprehensive Secure Machine Learning Framework

Since ML algorithms have proven their success in many different applications, there is also a big interest in privacy preserving (PP) ML methods for building models on sensitive data. Moreover, the increase in the number of data sources and the high computational power required by those algorithms force individuals to outsource the training and/or the inference of a ML model to the clouds providing such services. To address this, we propose a secure 3-party computation framework, CECILIA, offering PP building blocks to enable complex operations privately. In addition to the adapted and common operations like addition and multiplication, it offers multiplexer, most significant bit and modulus conversion. The first two are novel in terms of methodology and the last one is novel in terms of both functionality and methodology. CECILIA also has two complex novel methods, which are the exact exponential of a public base raised to the power of a secret value and the inverse square root of a secret Gram matrix. We use CECILIA to realize the private inference on pre-trained RKNs, which require more complex operations than most other DNNs, on the structural classification of proteins as the first study ever accomplishing the PP inference on RKNs. In addition to the successful private computation of basic building blocks, the results demonstrate that we perform the exact and fully private exponential computation, which is done by approximation in the literature so far. Moreover, they also show that we compute the exact inverse square root of a secret Gram matrix up to a certain privacy level, which has not been addressed in the literature at all. We also analyze the scalability of CECILIA to various settings on a synthetic dataset. The framework shows a great promise to make other ML algorithms as well as further computations privately computable by the building blocks of the framework.

Updated: 2024-05-27 09:42:39

标题: CECILIA:全面安全的机器学习框架

摘要: 由于机器学习算法在许多不同应用中已经证明了其成功,因此对于在敏感数据上构建模型的隐私保护(PP)机器学习方法也引起了很大兴趣。此外,数据源数量的增加和这些算法所需的高计算能力迫使个人将机器学习模型的训练和/或推断外包给提供此类服务的云。为了解决这个问题,我们提出了一个安全的三方计算框架CECILIA,提供PP构建模块以实现私密的复杂操作。除了适应和常见的操作如加法和乘法,它还提供了多路复用器、最高位和模转换。前两者在方法论上是新颖的,而最后一个在功能和方法论上都是新颖的。CECILIA还有两种复杂的新方法,即将公共基数的精确指数提高到秘密值的幂和秘密Gram矩阵的倒数平方根。我们使用CECILIA在预训练的RKN上实现私密推断,这需要比大多数其他DNN更复杂的操作,在蛋白质的结构分类上作为第一项研究成果实现了对RKN的PP推断。除了成功地进行基本构建块的私密计算外,结果表明我们执行了精确且完全私密的指数计算,迄今为止在文献中都是通过近似方法完成的。此外,他们还表明我们在一定的隐私级别上计算了秘密Gram矩阵的精确倒数平方根,这在文献中尚未得到解决。我们还分析了CECILIA在合成数据集上不同设置的可扩展性。该框架显示了使其他机器学习算法以及进一步计算通过框架的构建块私密计算的巨大潜力。

更新时间: 2024-05-27 09:42:39

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2202.03023v3

Vision-and-Language Navigation Generative Pretrained Transformer

In the Vision-and-Language Navigation (VLN) field, agents are tasked with navigating real-world scenes guided by linguistic instructions. Enabling the agent to adhere to instructions throughout the process of navigation represents a significant challenge within the domain of VLN. To address this challenge, common approaches often rely on encoders to explicitly record past locations and actions, increasing model complexity and resource consumption. Our proposal, the Vision-and-Language Navigation Generative Pretrained Transformer (VLN-GPT), adopts a transformer decoder model (GPT2) to model trajectory sequence dependencies, bypassing the need for historical encoding modules. This method allows for direct historical information access through trajectory sequence, enhancing efficiency. Furthermore, our model separates the training process into offline pre-training with imitation learning and online fine-tuning with reinforcement learning. This distinction allows for more focused training objectives and improved performance. Performance assessments on the VLN dataset reveal that VLN-GPT surpasses complex state-of-the-art encoder-based models.

Updated: 2024-05-27 09:42:04

标题: 视觉和语言导航生成预训练变压器

摘要: 在视觉与语言导航(VLN)领域,代理人被赋予任务根据语言指令导航现实世界场景。使代理人在整个导航过程中遵循指令代表了VLN领域内的一个重要挑战。为了解决这一挑战,常见的方法通常依赖于编码器来明确记录过去的位置和动作,增加了模型复杂性和资源消耗。 我们的提议,即视觉与语言导航生成预训练变换器(VLN-GPT),采用了变换器解码器模型(GPT2)来建模轨迹序列依赖性,绕过了历史编码模块的需要。这种方法通过轨迹序列允许直接访问历史信息,提高了效率。此外,我们的模型将训练过程分为离线预训练和模仿学习,以及在线微调和强化学习。这种区分允许更加专注的训练目标和改进的性能。 在VLN数据集上的性能评估显示,VLN-GPT超越了复杂的最新基于编码器的模型。

更新时间: 2024-05-27 09:42:04

领域: cs.AI,cs.CL,cs.CV,cs.RO

下载: http://arxiv.org/abs/2405.16994v1

Smooth Kolmogorov Arnold networks enabling structural knowledge representation

Kolmogorov-Arnold Networks (KANs) offer an efficient and interpretable alternative to traditional multi-layer perceptron (MLP) architectures due to their finite network topology. However, according to the results of Kolmogorov and Vitushkin, the representation of generic smooth functions by KAN implementations using analytic functions constrained to a finite number of cutoff points cannot be exact. Hence, the convergence of KAN throughout the training process may be limited. This paper explores the relevance of smoothness in KANs, proposing that smooth, structurally informed KANs can achieve equivalence to MLPs in specific function classes. By leveraging inherent structural knowledge, KANs may reduce the data required for training and mitigate the risk of generating hallucinated predictions, thereby enhancing model reliability and performance in computational biomedicine.

Updated: 2024-05-27 09:32:35

标题: 平滑的科尔莫戈洛夫阿诺德网络实现结构化知识表示

摘要: Kolmogorov-Arnold网络(KANs)通过其有限的网络拓扑结构,提供了一种高效且可解释的替代传统多层感知器(MLP)架构的方法。然而,根据Kolmogorov和Vitushkin的研究结果,使用解析函数受限于有限数量的截止点的KAN实现来表示通用的平滑函数是不精确的。因此,在训练过程中KAN的收敛可能会受到限制。本文探讨了KAN中平滑性的相关性,提出了具有平滑性和结构信息的KAN可以在特定函数类中达到与MLP的等效性。通过利用内在的结构知识,KAN可以减少训练所需的数据量,并减轻产生幻觉预测的风险,从而提高计算生物医学中模型的可靠性和性能。

更新时间: 2024-05-27 09:32:35

领域: cs.LG,cond-mat.dis-nn,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.11318v2

Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods.
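
The filter's logic fits in a few lines: forecast each client's next update from past rounds, score submitted updates by their distance to the forecast, and aggregate only the least anomalous ones. A simplified sketch in which FLANDERS' matrix autoregressive forecaster is replaced by a naive last-round forecast:

```python
import numpy as np

def pre_aggregation_filter(history, current, keep):
    """Score each client's update by its distance to a forecast from past
    rounds and aggregate only the `keep` least anomalous clients. The
    forecast here is simply last round's update; FLANDERS fits a matrix
    autoregressive model instead."""
    scores = np.linalg.norm(current - history[-1], axis=1)
    kept = np.argsort(scores)[:keep]
    return current[kept].mean(axis=0), kept

rng = np.random.default_rng(3)
hist = rng.standard_normal((3, 10, 5)) * 0.1       # 3 rounds, 10 clients
cur = hist[-1] + rng.standard_normal((10, 5)) * 0.05
cur[0] += 10.0                                     # client 0 poisons its model
agg, kept = pre_aggregation_filter(hist, cur, keep=8)
print(0 in kept)                                   # False: attacker excluded
```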

Updated: 2024-05-27 09:30:37

标题: 通过多维时间序列异常检测保护联邦学习免受极端模型毒化攻击

摘要: 当前在联邦学习(FL)系统中针对模型毒化攻击的防御机制已经被证明在恶意客户端达到一定阈值时是有效的。在这项工作中,我们引入了FLANDERS,这是一种新颖的针对大规模模型毒化攻击具有弹性的FL预聚合过滤器,即当恶意客户端远远超过合法参与者时。FLANDERS将每一轮FL中客户端发送的本地模型序列视为矩阵值时间序列。然后,通过将服务器维护的矩阵自回归预测模型生成的估计与实际观察进行比较,将恶意客户端更新识别为该时间序列中的异常值。在几个非独立同分布的FL设置中进行的实验表明,当与标准和现有的强大聚合方法配对时,FLANDERS显著提高了对各种攻击的稳健性。

更新时间: 2024-05-27 09:30:37

领域: cs.LG,cs.AI,cs.CR,stat.ML

下载: http://arxiv.org/abs/2303.16668v2

Hacking Task Confounder in Meta-Learning

Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we construct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as "Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.

Updated: 2024-05-27 09:29:21

标题: 破解元学习中的任务混杂因素

摘要: 元学习通过从各种任务中学习知识,实现对新任务的快速泛化。直觉上认为,随着训练的进行,模型将获得更丰富的知识,从而实现更好的泛化性能。然而,我们的实验揭示了一个意外的结果:任务之间存在负面知识转移,影响了泛化性能。为了解释这一现象,我们进行了因果分析的结构性因果模型(SCMs)。我们的研究揭示了元学习中任务特定因果因素与标签之间存在虚假相关性。此外,混杂因素在不同批次之间也有所不同。我们将这些混杂因素称为“任务混杂因素”。基于这些发现,我们提出了一个即插即用的元学习因果表示学习器(MetaCRL),以消除任务混杂因素。它将来自多个任务的解耦生成因素进行编码,并利用基于不变量的双层优化机制来确保它们在元学习中的因果性。对各种基准数据集的大量实验证明,我们的工作实现了业界领先的性能。

更新时间: 2024-05-27 09:29:21

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.05771v4

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs

Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.

Updated: 2024-05-27 09:24:19

标题: 面向一般成本的能量引导连续熵重心估计

摘要: 最优输运(OT)重心是一种数学基础的方法,可以在捕捉概率分布的几何属性的同时对其进行平均。简而言之,重心任务是在给定OT差异方面对一组概率分布进行平均。我们提出了一种新颖的算法,用于近似连续熵OT(EOT)重心,适用于任意OT成本函数。我们的方法建立在基于弱OT的EOT问题的对偶重构基础上,最近引起了ML社区的关注。除了其新颖性外,我们的方法具有几个有利特性:(i)我们为恢复的解决方案建立了质量界限;(ii)这种方法与基于能量的模型(EBM)学习过程无缝连接,可以利用经过良好调整的算法解决感兴趣的问题;(iii)它提供了直观的优化方案,避免了极小-极大、强化和其他复杂的技术技巧。为了验证,我们考虑了几种低维场景和图像空间设置,包括非欧几里德成本函数。此外,我们研究了在由预训练生成模型生成的图像流形上学习重心的实际任务,开辟了真实世界应用的新方向。

更新时间: 2024-05-27 09:24:19

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2310.01105v3

Publicly-Detectable Watermarking for Language Models

We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice.
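
The embedding mechanism is a rejection-sampling loop: resample the next token until a publicly computable bit of it matches the next signature bit. A toy sketch in which a hash stands in for the cryptographic signature and a fixed dictionary stands in for the LLM's next-token distribution; note that this naive loop, unlike the actual scheme, is not distortion-free:

```python
import hashlib
import random

def token_bit(context, token):
    """A bit anyone can recompute from public data -- a stand-in for the
    scheme's publicly verifiable cryptographic signal."""
    return hashlib.sha256(f"{context}|{token}".encode()).digest()[0] & 1

def sample_with_bit(dist, context, want_bit, max_tries=64):
    """Rejection-sample from the next-token distribution until the drawn
    token carries the required signature bit. Low-entropy contexts may
    never yield the bit; the paper handles that with error correction,
    here we simply give up after max_tries."""
    token = None
    for _ in range(max_tries):
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token_bit(context, token) == want_bit:
            return token
    return token  # bit not embedded: an erasure for the error-correcting code

dist = {"cat": 0.5, "dog": 0.3, "owl": 0.2}  # toy next-token distribution
print(sample_with_bit(dist, "The pet is a", want_bit=1))
```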

Updated: 2024-05-27 09:24:16

标题: 语言模型的公开可检测数字水印技术

摘要: 我们提出了一种高度可检测的、无需信任的数字水印方案,适用于LLMs:检测算法不包含任何秘密信息,任何人都可以执行。我们使用拒绝取样将一个公开可验证的加密签名嵌入到LLM输出中。我们证明了我们的方案在密码学上是正确的、可靠的和无失真的。我们对纠错技术进行了新颖的运用,以克服低熵时期的障碍,这是所有先前水印方案的问题。我们实施了我们的方案,并在2.7B到70B参数范围内的开放模型中进行了实证测量。我们的实验表明我们的正式声明在实践中得到了验证。

更新时间: 2024-05-27 09:24:16

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2310.18491v2

Accelerating Parallel Sampling of Diffusion Models

Diffusion models have emerged as state-of-the-art generative models for image generation. However, sampling from diffusion models is usually time-consuming due to the inherent autoregressive nature of their sampling process. In this work, we propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process. Specifically, we reformulate the sampling process as solving a system of triangular nonlinear equations through fixed-point iteration. With this innovative formulation, we explore several systematic techniques to further reduce the iteration steps required by the solving process. Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm that can leverage extra computational and memory resources to increase the sampling speed. Our experiments demonstrate that ParaTAA can decrease the inference steps required by common sequential sampling algorithms such as DDIM and DDPM by a factor of 4$\sim$14 times. Notably, when applying ParaTAA with 100 steps DDIM for Stable Diffusion, a widely-used text-to-image diffusion model, it can produce the same images as the sequential sampling in only 7 inference steps. The code is available at https://github.com/TZW1998/ParaTAA-Diffusion.
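
The reformulation treats the T sequential denoising steps x_t = f_t(x_{t-1}) as one triangular nonlinear system and relaxes all unknowns per sweep, so the per-step evaluations can run in parallel. A minimal sketch with contractive affine maps standing in for the denoising updates:

```python
import numpy as np

def parallel_fixed_point(steps, x0, n_sweeps):
    """Fixed-point (Jacobi-style) iteration on the whole trajectory: every
    sweep updates ALL T unknowns from the previous iterate, so the T step
    functions can be evaluated in parallel on separate devices. Exact
    after at most T sweeps, and far sooner when the steps are contractive."""
    T = len(steps)
    xs = [x0] * (T + 1)                    # initial trajectory guess
    for _ in range(n_sweeps):
        xs = [x0] + [steps[t](xs[t]) for t in range(T)]
    return xs[-1]

rng = np.random.default_rng(4)
# toy stand-ins for T = 20 denoising updates: contractive affine maps
steps = [(lambda a, b: (lambda x: a * x + b))(0.5, rng.standard_normal(3))
         for _ in range(20)]
serial = np.zeros(3)
for f in steps:
    serial = f(serial)                     # the usual sequential sampler
for k in (3, 6, 9):
    err = np.linalg.norm(parallel_fixed_point(steps, np.zeros(3), k) - serial)
    print(k, err)                          # error collapses well before 20 sweeps
```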

Updated: 2024-05-27 09:23:24

标题: 加速扩散模型的并行采样

摘要: 扩散模型已成为图像生成的最先进生成模型。然而,由于其采样过程的固有自回归性质,从扩散模型中抽样通常是耗时的。在这项工作中,我们提出了一种新颖的方法,通过并行化自回归过程加速扩散模型的抽样。具体而言,我们将抽样过程重新制定为通过固定点迭代解决一系列三角非线性方程。通过这种创新性的制定,我们探索了几种系统技术,进一步减少解决过程所需的迭代步骤。应用这些技术,我们引入了ParaTAA,这是一种通用且无需训练的并行抽样算法,可以利用额外的计算和内存资源来提高抽样速度。我们的实验表明,ParaTAA可以将常见的顺序抽样算法(如DDIM和DDPM)所需的推断步骤减少4$\sim$14倍。值得注意的是,当将ParaTAA应用于100步的DDIM用于稳定扩散时,一种广泛使用的文本到图像扩散模型,它只需要7个推断步骤就可以生成与顺序抽样相同的图像。源代码可在https://github.com/TZW1998/ParaTAA-Diffusion获取。

更新时间: 2024-05-27 09:23:24

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.09970v2

OSLO: One-Shot Label-Only Membership Inference Attacks

We introduce One-Shot Label-Only (OSLO) membership inference attacks (MIAs), which accurately infer a given sample's membership in a target model's training set with high precision using just \emph{a single query}, where the target model only returns the predicted hard label. This is in contrast to state-of-the-art label-only attacks which require $\sim6000$ queries, yet get attack precisions lower than OSLO's. OSLO leverages transfer-based black-box adversarial attacks. The core idea is that a member sample exhibits more resistance to adversarial perturbations than a non-member. We compare OSLO against state-of-the-art label-only attacks and demonstrate that, despite requiring only one query, our method significantly outperforms previous attacks in terms of precision and true positive rate (TPR) under the same false positive rates (FPR). For example, compared to previous label-only MIAs, OSLO achieves a TPR that is 7$\times$ to 28$\times$ stronger under a 0.1\% FPR on CIFAR10 for a ResNet model. We evaluated multiple defense mechanisms against OSLO.
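
Schematically, the attack spends its single query as shown below; both callables are placeholders, since the adversarial direction is crafted offline on surrogate models and only the final perturbed input ever touches the target:

```python
def oslo_membership(x, label, query_target, transfer_attack, eps_star):
    """Schematic one-shot, label-only membership test. The adversarial
    direction is crafted OFFLINE on local surrogate models
    (transfer_attack), so the target model is queried exactly once;
    membership is declared when the hard label survives a perturbation
    strong enough to flip typical non-members. The magnitude eps_star is
    calibrated on surrogate data beforehand."""
    delta = transfer_attack(x, label)        # no target queries used here
    return query_target(x + eps_star * delta) == label
```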

Updated: 2024-05-27 09:21:40

标题: OSLO:一次性仅标签成员推断攻击

摘要: 我们引入了一次性仅标签(One-Shot Label-Only,OSLO)成员推断攻击(MIAs),仅使用一次查询即可高精度地推断给定样本是否属于目标模型的训练集,其中目标模型只返回预测的硬标签。相比之下,最先进的仅标签攻击需要大约6000次查询,而其攻击精度仍低于OSLO。OSLO利用基于迁移的黑盒对抗攻击,其核心思想是成员样本对对抗性扰动的抵抗力强于非成员。我们将OSLO与最先进的仅标签攻击进行比较,并证明尽管仅需一次查询,我们的方法在相同假阳性率(FPR)下的精度和真阳性率(TPR)方面明显优于先前的攻击。例如,与先前的仅标签MIAs相比,对于ResNet模型,在CIFAR10上0.1\%的FPR下,OSLO实现了7倍到28倍的TPR。我们还评估了针对OSLO的多种防御机制。

更新时间: 2024-05-27 09:21:40

领域: cs.LG

下载: http://arxiv.org/abs/2405.16978v1

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

The LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to severely degrade and even fail to benefit from the finetuning of LoRA. This paper proposes a novel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate through information retention. The proposed IR-QLoRA mainly relies on two technologies derived from the perspective of unified information: (1) statistics-based Information Calibration Quantization allows the quantized parameters of LLM to retain original information accurately; (2) finetuning-based Information Elastic Connection makes LoRA utilize elastic representation transformation with diverse information. Comprehensive experiments show that IR-QLoRA can significantly improve accuracy across LLaMA and LLaMA2 families under 2-4 bit-widths, e.g., 4-bit LLaMA-7B achieves 1.4% improvement on MMLU compared with the state-of-the-art methods. The significant performance gain requires only a tiny 0.31% additional time consumption, revealing the satisfactory efficiency of our IR-QLoRA. We highlight that IR-QLoRA enjoys excellent versatility, compatible with various frameworks (e.g., NormalFloat and Integer quantization) and brings general accuracy gains. The code is available at https://github.com/htqin/ir-qlora.

Updated: 2024-05-27 09:20:35

标题: 通过信息保留实现LLMs的准确LoRA微调量化

摘要: LoRA微调量化LLM的研究已经得到广泛关注,以获取准确而紧凑的LLM,以便在资源受限的硬件上部署。然而,现有方法会导致量化的LLM严重退化,甚至无法从LoRA的微调中受益。本文提出了一种新颖的IR-QLoRA,通过信息保留推动带有LoRA的量化LLM变得高度准确。提出的IR-QLoRA主要依赖于从统一信息角度得出的两项技术:(1)基于统计的信息校准量化允许LLM的量化参数准确保留原始信息;(2)基于微调的信息弹性连接使LoRA利用具有多样信息的弹性表示变换。综合实验表明,IR-QLoRA可以在2-4位宽度下显著提高LLaMA和LLaMA2系列的准确性,例如,4位LLaMA-7B相比最先进的方法在MMLU上实现了1.4%的改进。显著的性能增益仅需要额外的0.31%的时间消耗,显示了我们的IR-QLoRA的令人满意的效率。我们强调IR-QLoRA具有出色的通用性,与各种框架兼容(例如,NormalFloat和整数量化),并带来普遍的准确性增益。代码可在https://github.com/htqin/ir-qlora获取。

更新时间: 2024-05-27 09:20:35

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2402.05445v2

Towards Optimizing with Large Language Models

In this work, we conduct an assessment of the optimization capabilities of LLMs across various tasks and data sizes. Each of these tasks corresponds to unique optimization domains, and LLMs are required to execute these tasks with interactive prompting. That is, in each optimization step, the LLM generates new solutions from the past generated solutions with their values, and then the new solutions are evaluated and considered in the next optimization step. Additionally, we introduce three distinct metrics for a comprehensive assessment of task performance from various perspectives. These metrics offer the advantage of being applicable for evaluating LLM performance across a broad spectrum of optimization tasks and are less sensitive to variations in test samples. By applying these metrics, we observe that LLMs exhibit strong optimization capabilities when dealing with small-sized samples. However, their performance is significantly influenced by factors like data size and values, underscoring the importance of further research in the domain of optimization tasks for LLMs.

Updated: 2024-05-27 09:13:26

标题: 朝向利用大型语言模型进行优化

摘要: 在这项工作中,我们对LLMs在各种任务和数据规模下的优化能力进行评估。每个任务对应于独特的优化领域,LLMs需要通过交互式提示来执行这些任务。也就是说,在每个优化步骤中,LLMs会根据过去生成的解决方案及其值生成新的解决方案,然后评估和考虑这些新的解决方案在下一个优化步骤中的表现。此外,我们引入了三个不同的指标,以从各种角度全面评估任务性能。这些指标具有评估LLMs在广泛的优化任务中的表现,并对测试样本变化不太敏感的优势。通过应用这些指标,我们观察到LLMs在处理小规模样本时表现出强大的优化能力。然而,它们的性能受到数据规模和值等因素的显著影响,突显了在LLMs的优化任务领域进一步研究的重要性。

更新时间: 2024-05-27 09:13:26

领域: cs.LG

下载: http://arxiv.org/abs/2310.05204v3

A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis

Advancements in science rely on data sharing. In medicine, where personal data are often involved, synthetic tabular data generated by generative adversarial networks (GANs) offer a promising avenue. However, existing GANs struggle to capture the complexities of real-world tabular data, which often contain a mix of continuous and categorical variables with potential imbalances and dependencies. We propose a novel correlation- and mean-aware loss function designed to address these challenges as a regularizer for GANs. To ensure a rigorous evaluation, we establish a comprehensive benchmarking framework using ten real-world datasets and eight established tabular GAN baselines. The proposed loss function demonstrates statistically significant improvements over existing methods in capturing the true data distribution, significantly enhancing the quality of synthetic data generated with GANs. The benchmarking framework shows that the enhanced synthetic data quality leads to improved performance in downstream machine learning (ML) tasks, ultimately paving the way for easier data sharing.
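
One plausible reading of the regularizer (the exact formulation and weighting in the paper may differ) penalizes mismatches in per-feature means and in Pearson correlation matrices between real and synthetic batches:

```python
import torch

def corr_mean_loss(real, fake, lam=1.0):
    """Regularizer added to the generator loss: match per-feature means
    and Pearson correlation matrices of real vs. synthetic tabular batches."""
    mean_term = torch.mean((real.mean(dim=0) - fake.mean(dim=0)) ** 2)
    corr_term = torch.mean((torch.corrcoef(real.T) - torch.corrcoef(fake.T)) ** 2)
    return mean_term + lam * corr_term

real = torch.randn(256, 8)
fake = torch.randn(256, 8) * 1.5 + 0.3   # wrong scale and shifted mean
print(corr_mean_loss(real, fake))        # positive penalty for the generator
```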

Updated: 2024-05-27 09:08:08

标题: 一个相关性和均值感知的损失函数和基准框架,用于改进基于GAN的表格数据合成

摘要: 科学的进步依赖于数据共享。在医学领域,个人数据经常涉及其中,由生成对抗网络(GANs)生成的合成表格数据提供了一个有前途的途径。然而,现有的GANs往往难以捕捉实际表格数据的复杂性,这些数据通常包含连续和分类变量的混合,具有潜在的不平衡和依赖关系。我们提出了一种新颖的基于相关性和均值的损失函数,旨在作为GANs的正则化器来解决这些挑战。为了确保严格的评估,我们建立了一个综合基准框架,使用十个真实世界的数据集和八个已建立的表格GAN基线。所提出的损失函数在捕捉真实数据分布方面表现出明显的统计学显著性改进,显著提高了使用GANs生成的合成数据的质量。基准框架显示,增强的合成数据质量导致下游机器学习(ML)任务中表现的提高,最终为更容易的数据共享铺平了道路。

更新时间: 2024-05-27 09:08:08

领域: cs.LG

下载: http://arxiv.org/abs/2405.16971v1

Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline

Edge computing solutions that enable the extraction of high-level information from a variety of sensors are in increasingly high demand. This is due to the increasing number of smart devices that require sensory processing for their application on the edge. To tackle this problem, we present a smart vision sensor System on Chip (SoC), featuring an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture embedded on a single chip. By combining both sensor and processing on a single die, we can lower unit production costs significantly. Moreover, the simple end-to-end nature of the SoC facilitates small stand-alone applications as well as functioning as an edge node in larger systems. The event-driven nature of the vision sensor delivers high-speed signals in a sparse data stream. This is reflected in the processing pipeline, which focuses on optimising highly sparse computation and minimising the latency of the 9 sCNN layers to 3.36{\mu}s for an incoming event. Overall, this results in an extremely low-latency visual processing pipeline deployed on a small form factor with a low energy budget and sensor cost. We present the asynchronous architecture, the individual blocks, and the sCNN processing principle, and benchmark against other sCNN-capable processors.

Updated: 2024-05-27 09:06:35

标题: Speck:一种具有低延迟327K神经元卷积神经网络处理流水线的智能事件驱动视觉传感器

摘要: 越来越多的智能设备需要在边缘进行感知处理,因此对能够从各种传感器中提取高级信息的边缘计算解决方案需求日益增加。为了解决这一问题,我们提出了一种智能视觉传感器SoC(片上系统),其特点是具有事件驱动相机和低功耗异步脉冲卷积神经网络(sCNN)计算架构,嵌入在单个芯片上。通过将传感器和处理器组合在一个芯片上,我们可以显著降低单元生产成本。此外,SoC的端对端简单性有助于实现小型独立应用,并可用作较大系统中的边缘节点。视觉传感器的事件驱动特性提供了稀疏数据流中的高速信号。这体现在处理管道中,重点是优化高度稀疏计算并将9个sCNN层的延迟最小化至3.36微秒以处理传入事件。总体而言,这导致在小型尺寸、低能耗和传感器成本的预算下部署了极低延迟的视觉处理管道。我们介绍了异步架构、各个模块以及sCNN处理原则,并与其他支持sCNN的处理器进行了对比。

更新时间: 2024-05-27 09:06:35

领域: cs.NE,cs.LG,eess.IV

下载: http://arxiv.org/abs/2304.06793v2

WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.
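
The core operation is cheap to state: every few steps, each weight coordinate is, with small probability p, permuted across the replicas, so only those coordinates ever cross the network. A sketch for in-memory replicas stored as flat vectors:

```python
import numpy as np

def wash_shuffle(models, p=0.01, rng=None):
    """With probability p per coordinate, permute that weight across the
    replicas. Models stay in one loss basin yet remain diverse, and in a
    distributed run only the shuffled coordinates are communicated."""
    rng = rng or np.random.default_rng()
    stacked = np.stack(models)                     # (n_models, n_params)
    for j in np.where(rng.random(stacked.shape[1]) < p)[0]:
        stacked[:, j] = stacked[rng.permutation(stacked.shape[0]), j]
    return list(stacked)

rng = np.random.default_rng(7)
models = [rng.standard_normal(1000) for _ in range(4)]
for _ in range(10):          # interleaved with the usual local SGD steps
    models = wash_shuffle(models, p=0.05, rng=rng)
averaged = np.mean(models, axis=0)  # final weight-averaged model
print(averaged.shape)
```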

Updated: 2024-05-27 09:02:57

标题: WASH:通过高效通信的权重重排来训练您的集成模型,然后进行平均

摘要: 深度神经网络的性能通过集成方法得到增强,这些方法会平均多个模型的输出。然而,这会增加推断的成本。权重平均方法旨在通过对模型集合的参数进行平均来平衡集成的泛化能力和单个模型的推断速度。然而,简单的平均会导致性能不佳,因为模型会收敛到不同的损失盆地,并且对齐模型以改善平均性能具有挑战性。受分布式训练的启发,类似于DART和PAPA的方法已被提出,以并行训练多个模型,使它们最终进入相同的盆地,从而实现良好的平均精度。然而,这些方法要么会牺牲集成的准确性,要么在训练期间需要模型之间大量的通信。在本文中,我们介绍了一种名为WASH的新型分布式方法,用于训练模型集合以进行权重平均,从而实现最先进的图像分类准确性。WASH通过在训练过程中随机洗牌一小部分权重来保持模型在同一盆地内,从而产生多样化的模型,并与标准参数平均方法相比具有更低的通信成本。

更新时间: 2024-05-27 09:02:57

领域: cs.LG,cs.CV,cs.NE,stat.ML

下载: http://arxiv.org/abs/2405.17517v1

Time Elastic Neural Networks

We introduce and detail an atypical neural network architecture, called time elastic neural network (teNN), for multivariate time series classification. The novelty compared to classical neural network architectures is that it explicitly incorporates time warping ability, as well as a new way of considering attention. In addition, this architecture is capable of learning a dropout strategy, thus optimizing its own architecture. Behind the design of this architecture, our overall objective is threefold: firstly, we aim to improve the accuracy of instance-based classification approaches, which show quite good performance as long as enough training data is available. Secondly, we seek to reduce the computational complexity inherent to these methods to improve their scalability. Ideally, we seek to find an acceptable balance between these first two criteria. And finally, we seek to enhance the explainability of the decisions provided by this kind of neural architecture. The experiments demonstrate that the stochastic gradient descent implemented to train a teNN is quite effective. To the extent that the selection of some critical meta-parameters is correct, convergence is generally smooth and fast. While maintaining good accuracy, we obtain a drastic gain in scalability by first reducing the required number of reference time series, i.e. the number of teNN cells required. Secondly, we demonstrate that, during the training process, the teNN succeeds in reducing the number of neurons required within each cell. Finally, we show that the analysis of the activation and attention matrices as well as the reference time series after training provides relevant information to interpret and explain the classification results. The comparative study that we have carried out, covering around thirty diverse multivariate datasets, shows that the teNN obtains results comparable to the state of the art, in particular similar to those of a network mixing LSTM and CNN architectures, for example.

Updated: 2024-05-27 09:01:30

标题: 时间弹性神经网络

摘要: 我们介绍并详细介绍了一种名为时间弹性神经网络(teNN)的非典型神经网络架构,用于多变量时间序列分类。与传统神经网络架构相比的新颖之处在于它明确地融入了时间弯曲能力,以及一种新的考虑注意力的方式。此外,这种架构能够学习一种丢弃策略,从而优化自身架构。在设计这种架构背后,我们的总体目标是三方面:首先,我们的目标是改善基于实例的分类方法的准确性,只要有足够的训练数据,它们就表现出相当不错的性能。其次,我们试图减少这些方法固有的计算复杂性,以提高其可扩展性。理想情况下,我们希望在这两个标准之间找到一个可接受的平衡。最后,我们试图增强这种神经架构提供的决策的可解释性。实验表明,用于训练teNN的随机梯度下降是非常有效的。在某种程度上,只要选择一些关键的元参数是正确的,收敛通常是平稳且快速的。在保持良好准确性的同时,我们通过首先减少所需的参考时间序列数量,即所需的teNN单元数量,获得了显著的可扩展性增益。其次,我们证明,在训练过程中,teNN成功地减少了每个单元内所需的神经元数量。最后,我们展示,在训练后分析激活和注意力矩阵以及参考时间序列提供了解释和解释分类结果的相关信息。我们进行的比较研究涉及约三十个不同的多变量数据集,显示出teNN获得了与现有技术水平相当的结果,特别是与混合LSTM和CNN架构的网络类似的结果。

更新时间: 2024-05-27 09:01:30

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17516v1

Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data

We consider the distributed learning problem with data dispersed across multiple workers under the orchestration of a central server. Asynchronous Stochastic Gradient Descent (SGD) has been widely explored in such a setting to reduce the synchronization overhead associated with parallelization. However, the performance of asynchronous SGD algorithms often depends on a bounded dissimilarity condition among the workers' local data, a condition that can drastically affect their efficiency when the workers' data are highly heterogeneous. To overcome this limitation, we introduce the \textit{dual-delayed asynchronous SGD (DuDe-ASGD)} algorithm designed to neutralize the adverse effects of data heterogeneity. DuDe-ASGD makes full use of stale stochastic gradients from all workers during asynchronous training, leading to two distinct time lags in the model parameters and data samples utilized in the server's iterations. Furthermore, by adopting an incremental aggregation strategy, DuDe-ASGD maintains a per-iteration computational cost that is on par with traditional asynchronous SGD algorithms. Our analysis demonstrates that DuDe-ASGD achieves a near-minimax-optimal convergence rate for smooth nonconvex problems, even when the data across workers are extremely heterogeneous. Numerical experiments indicate that DuDe-ASGD compares favorably with existing asynchronous and synchronous SGD-based algorithms.

Updated: 2024-05-27 09:00:30

标题: 双延迟异步随机梯度下降用于任意异构数据

摘要: 我们考虑了数据分布在多个工作节点上,由中央服务器进行协调的分布式学习问题。在这种情况下,已经广泛探讨了异步随机梯度下降(SGD)算法,以减少与并行化相关的同步开销。然而,异步SGD算法的性能通常取决于工作节点本地数据之间的有界不相似性条件,这种条件可能在工作节点数据高度异质化时极大影响其效率。为了克服这一限制,我们引入了\textit{双延迟异步SGD(DuDe-ASGD)}算法,旨在抵消数据异质性的不利影响。DuDe-ASGD充分利用了所有工作节点的陈旧随机梯度,在异步训练期间引入了两个不同的时间滞后,用于服务器迭代中使用的模型参数和数据样本。此外,通过采用增量聚合策略,DuDe-ASGD保持了与传统异步SGD算法相当的每次迭代计算成本。我们的分析表明,DuDe-ASGD对于平滑非凸问题实现了近似极小极大收敛速度,即使在工作节点之间的数据极其异质化时也是如此。数值实验表明,DuDe-ASGD与现有的异步和同步SGD算法相比具有明显优势。

更新时间: 2024-05-27 09:00:30

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2405.16966v1

Exploring the LLM Journey from Cognition to Expression with Linear Representations

This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, similar to the neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, which bridge the gap between cognitive and expressive capabilities. This research reveals the potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of their training processes.

Updated: 2024-05-27 08:57:04

标题: 用线性表示探索从认知到表达的LLM之旅

摘要: 本文对大型语言模型(LLMs)中认知和表达能力的演变和互动进行了深入研究,特别关注了Baichuan-7B和Baichuan-33B,这是一种先进的双语(中英文)LLM系列。我们通过线性表示跨三个关键阶段(预训练、监督微调(SFT)和从人类反馈中进行强化学习(RLHF))定义和探索了模型的认知和表达能力。认知能力被定义为网络内神经元输出向量传达的信息的数量和质量,类似于人类认知中的神经信号处理。表达能力被定义为模型生成单词级输出的能力。我们的研究揭示了一个顺序发展模式,其中认知能力在预训练阶段主要建立,而表达能力在SFT和RLHF阶段主要发展。统计分析验证了两种能力之间的显著相关性,表明认知能力可能限制表达潜力。本文还探讨了这些不同发展轨迹的理论基础及其与LLMs的架构设计的联系。此外,我们评估了各种与优化无关的策略,如少样本学习和重复取样,以弥合认知和表达能力之间的差距。这项研究揭示了隐藏空间与输出空间之间的潜在联系,为他们的训练过程的可解释性和可控性提供了宝贵的见解。

更新时间: 2024-05-27 08:57:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.16964v1

Blind Data Adaptation to tackle Covariate Shift in Operational Steganalysis

The proliferation of image manipulation for unethical purposes poses significant challenges in social networks. One particularly concerning method is Image Steganography, allowing individuals to hide illegal information in digital images without arousing suspicion. Such a technique poses severe security risks, making it crucial to develop effective steganalysis methods that can detect manipulated images used for clandestine communications. Although significant advancements have been achieved with machine learning models, a critical issue remains: the disparity between the controlled datasets used to train steganalysis models and the real-world datasets of forensic practitioners, which severely undermines the practical effectiveness of standardized steganalysis models. In this paper, we address this issue by focusing on a realistic scenario where practitioners lack crucial information about the limited target set of images under analysis, including details about their development process and even whether it contains manipulated images at all. By leveraging geometric alignment and distribution matching of source and target residuals, we develop TADA (Target Alignment through Data Adaptation), a novel methodology enabling the emulation of sources aligned with specific targets in steganalysis, which is also relevant for highly unbalanced targets. The emulator is a light convolutional network trained to align the distributions of image residuals. Experimental validation demonstrates the potential of our strategy over traditional methods for fighting covariate shift in steganalysis.

Updated: 2024-05-27 08:55:22

标题: 盲数据适应以应对操作隐写分析中的协变量转移

摘要: 图像操纵在社交网络中的不道德目的的激增带来了重大挑战。一种特别令人担忧的方法是图像隐写术,允许个人在数字图像中隐藏非法信息而不引起怀疑。这种技术带来严重的安全风险,因此至关重要开发有效的隐写分析方法,以便检测用于秘密通信的操纵图像。尽管通过机器学习模型取得了重大进展,但一个关键问题仍然存在:训练隐写分析模型的受控数据集与法医实践者使用的真实数据集之间存在巨大差异,严重削弱了标准隐写分析模型的实际有效性。在本文中,我们针对这一问题,关注一种现实场景,即从业人员缺乏关于受分析图像集的有限目标集的关键信息,包括关于其开发过程的细节,甚至包括是否包含操纵图像。通过利用源和目标残差的几何对齐和分布匹配,我们开发了TADA(通过数据适应实现目标对齐),这是一种新颖的方法论,能够在隐写分析中模拟与特定目标对齐的源,也适用于高度不平衡的目标。该仿真器由一个轻量级卷积网络表示,经过训练可对齐图像残差的分布。实验验证表明,我们的策略相对于传统方法在隐写分析中对抗协变量转移具有潜力。

更新时间: 2024-05-27 08:55:22

领域: eess.IV,cs.AI,cs.CR,cs.MM

下载: http://arxiv.org/abs/2405.16961v1

Large Deviations of Gaussian Neural Networks with ReLU activation

We prove a large deviation principle for deep neural networks with Gaussian weights and (at most linearly growing) activation functions. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and give power-series expansions for the ReLU case.

Updated: 2024-05-27 08:53:24

标题: 用ReLU激活函数的高斯神经网络的大偏差

摘要: 我们证明了具有高斯权重和(至多线性增长的)激活函数的深度神经网络的大偏差原理。这一结果推广了先前只考虑有界且连续激活函数的工作。在实践中,像ReLU这样的线性增长激活函数是最常用的。此外,我们简化了之前对速率函数的表达式,并给出了ReLU情况下的幂级数展开。

更新时间: 2024-05-27 08:53:24

领域: stat.ML,cs.LG,math.PR,60F10, 68T07

下载: http://arxiv.org/abs/2405.16958v1

LLM meets Vision-Language Models for Zero-Shot One-Class Classification

We consider the problem of zero-shot one-class visual classification, extending traditional one-class classification to scenarios where only the label of the target class is available. This method aims to discriminate between positive and negative query samples without requiring examples from the target class. We propose a two-step solution that first queries large language models for visually confusing objects and then relies on vision-language pre-trained models (e.g., CLIP) to perform classification. By adapting large-scale vision benchmarks, we demonstrate the ability of the proposed method to outperform adapted off-the-shelf alternatives in this setting. Namely, we propose a realistic benchmark where negative query samples are drawn from the same original dataset as positive ones, including a granularity-controlled version of iNaturalist, where negative samples are at a fixed distance in the taxonomy tree from the positive ones. To our knowledge, we are the first to demonstrate the ability to discriminate a single category from other semantically related ones using only its label.
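
The two steps compose naturally: ask an LLM for visually confusable classes of the target label, then let a CLIP-like model check whether the query image is closer to the target prompt than to any confuser. A hedged sketch in which a toy encoder stands in for the real text and image towers:

```python
import torch

def one_class_decision(image_emb, target, confusers, encode_text):
    """Positive iff the image matches the target prompt better than every
    LLM-proposed visually confusing class. `encode_text` stands in for a
    vision-language text encoder such as CLIP's; `image_emb` is assumed
    to be the unit-norm image embedding from the matching image tower."""
    prompts = [f"a photo of a {c}" for c in [target] + confusers]
    text = torch.stack([encode_text(p) for p in prompts])
    return bool((image_emb @ text.T).argmax() == 0)

# stand-in encoder (assumption) so the sketch runs without model weights;
# in practice the confusers come from an LLM query such as
# "list objects easily confused with a red fox in photos"
torch.manual_seed(0)
_cache = {}
def fake_encode(p):
    v = _cache.setdefault(p, torch.randn(64))
    return v / v.norm()

img = fake_encode("a photo of a red fox")  # pretend the image matches the target
print(one_class_decision(img, "red fox", ["coyote", "dingo"], fake_encode))  # True
```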

Updated: 2024-05-27 08:53:15

标题: LLM遇见视觉-语言模型:零样本单类分类

摘要: 我们考虑了零样本单类视觉分类的问题,将传统的单类分类扩展到只有目标类别标签可用的情况。该方法旨在在无需来自目标类别的示例的情况下区分正负查询样本。我们提出了一个两步解决方案,首先查询大型语言模型以找到视觉上混淆的对象,然后依赖于视觉-语言预训练模型(例如,CLIP)进行分类。通过调整大规模视觉基准,我们展示了所提方法在这种情况下胜过自适应的现成替代方案的能力。换句话说,我们提出了一个现实基准,其中负查询样本来自与正样本相同的原始数据集,包括一个粒度受控版本的iNaturalist,其中负样本在分类树中与正样本固定距离。据我们所知,我们是第一个演示能够仅使用其标签区分单一类别与其他语义相关类别的能力的研究。

更新时间: 2024-05-27 08:53:15

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.00675v3

Functional Programming Paradigm of Python for Scientific Computation Pipeline Integration

The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data control system to facilitate the integration of varying libraries. This integration is of profound significance in accelerating prototype verification, optimising algorithm performance and minimising maintenance costs. This paper presents a novel functional programming (FP) paradigm based on the Python architecture and associated suites in programming practice, designed for the integration of pipelines of different data mapping operations. In particular, the solution is intended for the integration of scientific computation flows, which affords a robust yet flexible solution for the aforementioned challenges.
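
A minimal sketch of the kind of composition operator such a paradigm rests on, in plain Python (the names are illustrative, not the paper's API):

```python
from functools import reduce

def pipeline(*stages):
    """Compose data-mapping stages left to right into a single callable,
    so heterogeneous library calls share one uniform control flow."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

# three stages that could come from different scientific libraries
clean     = lambda xs: [x for x in xs if x is not None]
normalise = lambda xs: [(x - min(xs)) / (max(xs) - min(xs)) for x in xs]
summarise = lambda xs: {"n": len(xs), "mean": sum(xs) / len(xs)}

run = pipeline(clean, normalise, summarise)
print(run([3, None, 7, 1]))  # {'n': 3, 'mean': 0.444...}
```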

Updated: 2024-05-27 08:46:57

标题: Python的函数式编程范式在科学计算管道集成中的应用

摘要: 现代数据处理技术的出现导致了跨学科研究的趋势增加,这经常涉及不同技术方法的引入。因此,迫切需要一个统一的数据控制系统,以促进不同库的集成。这种集成对于加速原型验证、优化算法性能和减少维护成本具有重要意义。本文提出了一种基于Python架构和相关编程实践套件的新颖函数式编程(FP)范式,旨在用于集成不同数据映射操作的流水线。特别是,该解决方案旨在用于集成科学计算流程,为上述挑战提供了强大而灵活的解决方案。

更新时间: 2024-05-27 08:46:57

领域: cs.LG,cs.AI,cs.CE,cs.SE

下载: http://arxiv.org/abs/2405.16956v1

Convergence of SGD with momentum in the nonconvex case: A novel time window-based analysis

We propose a novel time window-based analysis technique to investigate the convergence behavior of the stochastic gradient descent method with momentum (SGDM) in nonconvex settings. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and challenges in controlling stochastic errors in an almost sure sense. To address these challenges, we study the behavior of SGDM over specific time windows, rather than examining the descent of consecutive iterates as in traditional analyses. This time window-based approach simplifies the convergence analysis and enables us to establish the first iterate convergence result for SGDM under the Kurdyka-Lojasiewicz (KL) property. Based on the underlying KL exponent and the utilized step size scheme, we further characterize local convergence rates of SGDM.

Updated: 2024-05-27 08:46:28

标题: SGD在非凸情况下的动量收敛:一种基于时间窗口的新颖分析

摘要: 我们提出了一种新颖的基于时间窗口的分析技术,用于研究带动量的随机梯度下降方法(SGDM)在非凸设置中的收敛行为。尽管SGDM很受欢迎,但在非凸情况下其收敛行为仍不太清楚。这主要是由于缺乏足够的下降特性和在几乎肯定意义下控制随机误差的挑战。为了解决这些挑战,我们研究了SGDM在特定时间窗口内的行为,而不是像传统分析中那样检查连续迭代的下降。这种基于时间窗口的方法简化了收敛分析,并使我们能够在Kurdyka-Lojasiewicz(KL)性质下建立SGDM的第一个迭代收敛结果。根据基础的KL指数和所使用的步长方案,我们进一步表征了SGDM的局部收敛速度。

更新时间: 2024-05-27 08:46:28

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.16954v1

Fast ML-driven Analog Circuit Layout using Reinforcement Learning and Steiner Trees

This paper presents an artificial intelligence driven methodology to reduce the bottleneck often encountered in the analog ICs layout phase. We frame the floorplanning problem as a Markov Decision Process and leverage reinforcement learning for automatic placement generation under established topological constraints. Consequently, we introduce Steiner tree-based methods for the global routing step and generate guiding paths to be used to connect every circuit block. Finally, by integrating these solutions into a procedural generation framework, we present a unified pipeline that bridges the divide between circuit design and verification steps. Experimental results demonstrate the efficacy in generating complete layouts, eventually reducing runtimes to 1.5% compared to manual efforts.

Updated: 2024-05-27 08:42:42

标题: 使用强化学习和斯坦纳树的快速机器学习驱动模拟电路布局

摘要: 本文提出了一种人工智能驱动的方法论,用于缓解模拟集成电路布局阶段经常遇到的瓶颈。我们将布图规划问题构建为马尔可夫决策过程,并利用强化学习在既定拓扑约束下自动生成布局。随后,我们引入基于斯坦纳树的方法用于全局布线步骤,并生成用于连接各电路块的引导路径。最后,通过将这些解决方案集成到程序化生成框架中,我们提出了一个统一的流程,弥合了电路设计和验证步骤之间的鸿沟。实验结果证明了该方法在生成完整布局方面的有效性,最终将运行时间缩短至人工设计的1.5%。

更新时间: 2024-05-27 08:42:42

领域: cs.LG

下载: http://arxiv.org/abs/2405.16951v1

Context-Former: Stitching via Latent Conditioned Sequence Modeling

Offline reinforcement learning (RL) algorithms can learn better decision-making compared to behavior policies by stitching together suboptimal trajectories to derive more optimal ones. Meanwhile, Decision Transformer (DT) abstracts RL as sequence modeling, showcasing competitive performance on offline RL benchmarks. However, recent studies demonstrate that DT lacks stitching capacity, so endowing DT with stitching capability is vital to further improving its performance. In order to endow DT with stitching capability, we abstract trajectory stitching as expert matching and introduce our approach, ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. To validate our approach, we conduct experiments from two perspectives: 1) We conduct extensive experiments on D4RL benchmarks under IL settings, and the experimental results demonstrate that ContextFormer can achieve competitive performance in multiple IL settings. 2) More importantly, we compare ContextFormer with various competitive DT variants using identical training datasets. The experimental results unveil ContextFormer's superiority, as it outperforms all other variants, showcasing its remarkable performance.

Updated: 2024-05-27 08:38:18

标题: ContextFormer:通过潜在条件序列建模进行拼接

摘要: 离线强化学习(RL)算法可以通过将次优轨迹串联起来以获得更优化的决策,相较于行为策略可以学习更好的决策。同时,决策变换器(DT)将RL抽象为序列建模,展示了在离线RL基准上具有竞争性表现。然而,最近的研究表明DT缺乏串联能力,因此对DT进行串联能力的利用对进一步提高其性能至关重要。为了赋予DT串联能力,我们将轨迹串联抽象为专家匹配,并引入我们的方法ContextFormer,该方法结合基于上下文信息的模仿学习(IL)和序列建模,通过模拟有限数量的专家轨迹的表示来串联次优轨迹片段。为验证我们的方法,我们从两个角度进行实验:1)我们在IL设置下对D4RL基准进行广泛实验,实验结果表明ContextFormer在多个IL设置下可以取得竞争性表现。2)更重要的是,我们使用相同的训练数据集对ContextFormer与各种竞争性DT变体进行比较。实验结果揭示了ContextFormer的优越性,它优于所有其他变体,展示了其卓越的性能。

更新时间: 2024-05-27 08:38:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.16452v3

Biological Neurons Compete with Deep Reinforcement Learning in Sample Efficiency in a Simulated Gameworld

How do biological systems and machine learning algorithms compare in the number of samples required to show significant improvements in completing a task? We compared the learning efficiency of in vitro biological neural networks to the state-of-the-art deep reinforcement learning (RL) algorithms in a simplified simulation of the game `Pong'. Using DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multi-electrode array, we contrasted the learning rate and the performance of these biological systems against time-matched learning from three state-of-the-art deep RL algorithms (i.e., DQN, A2C, and PPO) in the same game environment. This allowed a meaningful comparison between biological neural systems and deep RL. We find that when samples are limited to a real-world time course, even these very simple biological cultures outperformed deep RL algorithms across various game performance characteristics, implying a higher sample efficiency. Ultimately, even when tested across multiple types of information input to assess the impact of higher dimensional data input, biological neurons showcased faster learning than all deep reinforcement learning agents.

Updated: 2024-05-27 08:38:17

标题: 在模拟游戏世界中,生物神经元与深度强化学习在样本效率方面竞争

摘要: 在完成任务并取得显著改进时,生物系统与机器学习算法所需的样本数量如何比较?我们在简化的“Pong”游戏模拟中,比较了体外生物神经网络与最先进的深度强化学习(RL)算法的学习效率。使用DishBrain,一个通过高密度多电极阵列将体外神经网络与硅内计算相结合的系统,我们对比了这些生物系统的学习速度和性能,以及相同游戏环境中三种最先进的深度RL算法(即DQN、A2C和PPO)在时间匹配条件下的学习。这使得生物神经系统和深度RL之间的比较具有意义。我们发现,当样本限制在现实世界的时间进程内时,即使是这些非常简单的生物培养物也在各种游戏性能特征上优于深度RL算法,这意味着更高的样本效率。最终,即使在评估更高维数据输入影响的多种信息输入情况下进行测试,生物神经元也展示出比所有深度强化学习代理更快的学习速度。

更新时间: 2024-05-27 08:38:17

领域: q-bio.NC,cs.AI

下载: http://arxiv.org/abs/2405.16946v1

Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.

Updated: 2024-05-27 08:31:47

标题: 使用通信成本低于18千字节的联邦式全参数调整十亿规模语言模型

摘要: 预训练的大型语言模型(LLMs)需要微调以提高其对自然语言指令的响应性。联邦学习提供了一种使用终端设备上丰富数据进行LLMs微调的方式,而不会损害数据隐私。大多数现有的针对LLMs的联邦微调方法依赖于参数高效的微调技术,这可能无法达到全参数微调所能达到的性能高度。然而,由于通信成本巨大,LLMs的联邦全参数微调是一个非平凡的问题。本文介绍了FedKSeed,它采用零阶优化和有限的随机种子集。它显著减少了服务器和客户端之间的传输需求,仅需要几个随机种子和标量梯度,总量仅为几千字节,使得在设备上对十亿级LLMs进行联邦全参数微调成为可能。在此基础上,我们开发了一种策略,实现概率差异化的种子采样,优先考虑对模型准确性影响更大的扰动。通过六种不同LLMs、数据集和数据分区的场景的实验表明,我们的方法在通信效率和新任务泛化方面均优于现有的联邦LLM微调方法。

更新时间: 2024-05-27 08:31:47

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2312.06353v5
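
The communication saving described above is easy to illustrate. Below is a toy rendition of the seed-based zeroth-order update (our own sketch, not the authors' implementation; the quadratic loss, the pool size K, and the constants mu and lr are assumptions): the client uploads only a seed index and one scalar, and the server regenerates the identical perturbation from that seed.

    import numpy as np

    K, dim, mu, lr = 1024, 10, 1e-3, 1e-2   # finite seed pool, as in the abstract
    rng = np.random.default_rng(0)

    def loss(w):
        # stand-in objective; a real client would evaluate its local data batch
        return float(np.sum((w - 1.0) ** 2))

    def client_step(w):
        # the client uploads only (seed index, scalar): a few bytes, not the model
        k = int(rng.integers(K))
        z = np.random.default_rng(k).standard_normal(dim)     # perturbation from seed k
        g = (loss(w + mu * z) - loss(w - mu * z)) / (2 * mu)  # two-point ZO estimate
        return k, g

    def server_apply(w, k, g):
        # the server regenerates the identical perturbation from the seed
        z = np.random.default_rng(k).standard_normal(dim)
        return w - lr * g * z

    w = np.zeros(dim)
    for _ in range(500):
        w = server_apply(w, *client_step(w))
    print(loss(w))   # far below the initial loss of 10.0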

AbstractBeam: Enhancing Bottom-Up Program Synthesis using Library Learning

LambdaBeam is a state-of-the-art execution-guided algorithm for program synthesis that incorporates higher-order functions, lambda functions, and iterative loops into the Domain-Specific Language (DSL). LambdaBeam generates every program from the start. Yet, many program blocks or subprograms occur frequently in a given domain, e.g., loops to traverse a list. Thus, repeating programs can be used to enhance the synthesis algorithm. However, LambdaBeam fails to leverage this potential. For this purpose, we introduce AbstractBeam: A novel program synthesis framework that employs Library Learning to identify such program repetitions, integrates them into the DSL, and thus utilizes their potential to boost LambdaBeam's synthesis algorithm. Our experimental evaluations demonstrate that AbstractBeam significantly improves LambdaBeam's performance in the LambdaBeam integer list manipulation domain. Additionally, AbstractBeam's program generation is more efficient compared to LambdaBeam's synthesis. Finally, our findings indicate that Library Learning is effective in domains not specifically crafted to highlight its benefits.

Updated: 2024-05-27 08:31:12

标题: AbstractBeam:利用库学习增强自底向上的程序合成

摘要: LambdaBeam是一种先进的执行引导程序合成算法,它将高阶函数、lambda函数和迭代循环结合到特定领域语言(DSL)中。LambdaBeam从头开始生成每个程序。然而,在给定领域中,许多程序块或子程序经常出现,例如用于遍历列表的循环。因此,重复出现的程序可以用来增强合成算法。然而,LambdaBeam未能充分利用这一潜力。为此,我们引入了AbstractBeam:一种新颖的程序合成框架,利用库学习识别此类重复出现的程序模式,将它们整合到DSL中,从而利用它们的潜力来提升LambdaBeam的合成算法。我们的实验评估表明,AbstractBeam在LambdaBeam整数列表操作领域显著提高了LambdaBeam的性能。此外,与LambdaBeam的合成相比,AbstractBeam的程序生成更加高效。最后,我们的研究结果表明,库学习在并非专门设计以突出其优点的领域中也是有效的。

更新时间: 2024-05-27 08:31:12

领域: cs.SE,cs.AI,cs.PL

下载: http://arxiv.org/abs/2405.17514v1

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

Large language models (LLMs) have gained considerable attention for their revolutionary capabilities. However, there is also growing concern about their safety implications, making a comprehensive safety evaluation for LLMs urgently needed before model deployment. In this work, we propose S-Eval, a new comprehensive, multi-dimensional and open-ended safety evaluation benchmark. At the core of S-Eval is a novel LLM-based automatic test prompt generation and selection framework, which trains an expert testing LLM Mt combined with a range of test selection strategies to automatically construct a high-quality test suite for the safety evaluation. The key to the automation of this process is a novel expert safety-critique LLM Mc able to quantify the riskiness score of an LLM's response, and additionally produce risk tags and explanations. Besides, the generation process is also guided by a carefully designed risk taxonomy with four different levels, covering comprehensive and multi-dimensional safety risks of concern. Based on these, we systematically construct a new and large-scale safety evaluation benchmark for LLMs consisting of 220,000 evaluation prompts, including 20,000 base risk prompts (10,000 in Chinese and 10,000 in English) and 200,000 corresponding attack prompts derived from 10 popular adversarial instruction attacks against LLMs. Moreover, considering the rapid evolution of LLMs and accompanied safety threats, S-Eval can be flexibly configured and adapted to include new risks, attacks and models. S-Eval is extensively evaluated on 20 popular and representative LLMs. The results confirm that S-Eval can better reflect and inform the safety risks of LLMs compared to existing benchmarks. We also explore the impacts of parameter scales, language environments, and decoding parameters on the evaluation, providing a systematic methodology for evaluating the safety of LLMs.

Updated: 2024-05-27 08:27:29

标题: S-Eval:用于大型语言模型安全性评估基准测试的自动化和自适应测试生成

摘要: 大型语言模型因其革命性能力而引起了广泛关注。然而,人们也越来越担心它们的安全隐患,因此在模型部署之前迫切需要进行全面的安全评估。在这项工作中,我们提出了S-Eval,这是一个新的综合、多维和开放式的安全评估基准。S-Eval的核心是一个新颖的基于LLM的自动测试提示生成和选择框架,该框架训练一个专家测试LLM Mt,结合一系列测试选择策略,自动构建一个高质量的安全评估测试套件。这一过程的自动化关键在于一个新颖的专家安全批评LLM Mc,能够量化LLM响应的风险评分,并额外生成风险标签和解释。此外,生成过程还受到一个精心设计的风险分类法的指导,该分类法包括四个不同级别,涵盖了综合和多维度的关注安全风险。基于这些,我们系统地构建了一个新的大规模的LLM安全评估基准,包括22万个评估提示,其中包括2万个基本风险提示(1万个中文和1万个英文),以及从10种流行的对抗性指令攻击中导出的20万个攻击提示。此外,考虑到LLM的快速发展和伴随的安全威胁,S-Eval可以灵活配置和适应新的风险、攻击和模型。S-Eval在20个流行和代表性的LLM上进行了广泛评估。结果证实,与现有基准相比,S-Eval能更好地反映和告知LLM的安全风险。我们还探讨了参数规模、语言环境和解码参数对评估的影响,为评估LLM安全性提供了系统方法。

更新时间: 2024-05-27 08:27:29

领域: cs.CR,cs.CL

下载: http://arxiv.org/abs/2405.14191v2

Decoupled Sequence and Structure Generation for Realistic Antibody Design

Antibody design plays a pivotal role in advancing therapeutics. Although deep learning has made rapid progress in this field, existing methods jointly generate antibody sequences and structures, limiting task-specific optimization. In response, we propose an antibody sequence-structure decoupling (ASSD) framework, which separates sequence generation and structure prediction. Although our approach is simple, such a decoupling strategy has been overlooked in previous works. We also find that the widely used non-autoregressive generators promote sequences with overly repeating tokens. Such sequences are both out-of-distribution and prone to undesirable developability properties that can trigger harmful immune responses in patients. To resolve this, we introduce a composition-based objective that allows an efficient trade-off between high performance and low token repetition. Our results demonstrate that ASSD consistently outperforms existing antibody design models, while the composition-based objective successfully mitigates token repetition of non-autoregressive models. Our code is available at \url{https://github.com/lkny123/ASSD_public}.

Updated: 2024-05-27 08:24:55

标题: 解耦序列和结构生成用于现实抗体设计

摘要: 抗体设计在推动治疗发展方面发挥着关键作用。尽管深度学习在这一领域取得了快速进展,但现有方法同时生成抗体序列和结构,限制了针对特定任务的优化。为此,我们提出了一种抗体序列-结构解耦(ASSD)框架,将序列生成和结构预测分离开来。尽管我们的方法很简单,但这种解耦策略在先前的研究中被忽视了。我们还发现,广泛使用的非自回归生成器会促使序列中出现过多重复的标记。这种序列既偏离数据分布,又容易出现不良的可开发性(developability)特性,可能引发患者的有害免疫反应。为了解决这个问题,我们引入了一种基于组成的目标函数,可以在高性能和低标记重复之间进行有效权衡。我们的结果表明,ASSD始终优于现有的抗体设计模型,而基于组成的目标函数成功缓解了非自回归模型中的标记重复问题。我们的代码可以在\url{https://github.com/lkny123/ASSD_public}上找到。

更新时间: 2024-05-27 08:24:55

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2402.05982v2

Uncertainty Management in the Construction of Knowledge Graphs: a Survey

Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A or recommendation systems. To build a KG it is a common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG, therefore such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, which represents a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present constructions of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, introducing additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion on the remaining challenges and perspectives when constructing a KG taking into account uncertainty.

Updated: 2024-05-27 08:22:52

标题: 知识图谱建设中的不确定性管理:一项调查

摘要: 由于知识图谱(KGs)在数据表示方面的极大灵活性及其众多应用(例如词汇共享、问答或推荐系统),它们已成为公司的重要资产。构建KG时,通常会依赖于从各种异构来源提取知识的自动方法。但在一个嘈杂和不确定的世界中,知识可能不可靠,数据来源之间可能发生冲突。整合不可靠数据将直接影响KG的使用,因此必须解决这些冲突。这可以通过手动选择最佳数据进行整合来完成。这种人工方法非常准确,但成本高且耗时。这正是近期研究聚焦于自动方法的原因;这是一项具有挑战性的任务,因为它需要在知识整合进KG的整个过程中处理所提取知识的不确定性。我们在这个方向上调查了最新的方法,并介绍了开放和企业KG的构建以及它们的质量如何被维护。然后我们描述了不同的知识提取方法,它们会引入额外的不确定性。我们还讨论了知识获取后的下游任务,包括使用嵌入模型进行KG补全、知识对齐和知识融合,以解决KG构建中的知识不确定性问题。最后,我们讨论了在考虑不确定性时构建KG所面临的剩余挑战和展望。

更新时间: 2024-05-27 08:22:52

领域: cs.AI

下载: http://arxiv.org/abs/2405.16929v1

Demystifying amortized causal discovery with transformers

Supervised learning approaches for causal discovery from observational data often achieve competitive performance despite seemingly avoiding explicit assumptions that traditional methods make for identifiability. In this work, we investigate CSIvA (Ke et al., 2023), a transformer-based model promising to train on synthetic data and transfer to real data. First, we bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations. Consistent with classical approaches, good performance is achieved when we have a good prior on the test data, and the underlying model is identifiable. At the same time, we find new trade-offs. Training on datasets generated from different classes of causal models, unambiguously identifiable in isolation, improves the test generalization. Performance is still guaranteed, as the ambiguous cases resulting from the mixture of identifiable causal models are unlikely to occur (which we formally prove). Overall, our study finds that amortized causal discovery still needs to obey identifiability theory, but it also differs from classical methods in how the assumptions are formulated, trading more reliance on assumptions on the noise type for fewer hypotheses on the mechanisms.

Updated: 2024-05-27 08:17:49

标题: 揭秘使用变压器进行摊销因果发现

摘要: 用于从观测数据中发现因果关系的监督学习方法,尽管似乎避免了传统方法为保证可识别性所做的明确假设,但通常能取得有竞争力的表现。在这项工作中,我们研究了基于变压器的模型CSIvA(Ke等人,2023年),该模型承诺在合成数据上训练并迁移到真实数据上。首先,我们弥合了与现有可识别性理论之间的差距,表明对训练数据分布的约束隐含地定义了测试观测值的先验。与经典方法一致,当我们对测试数据有良好的先验,并且基础模型可识别时,可以实现良好性能。与此同时,我们发现了新的权衡。在由不同类别的因果模型(各自单独来看均可明确识别)生成的数据集上训练,可以提高测试泛化能力。性能仍然得到保证,因为由可识别因果模型混合而产生的模糊情况不太可能发生(我们对此给出了形式化证明)。总的来说,我们的研究发现,摊销因果发现仍然需要遵守可识别性理论,但它在假设的表述方式上与经典方法不同:以更多关于噪声类型的假设,换取更少关于机制的假设。

更新时间: 2024-05-27 08:17:49

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.16924v1

E2USD: Efficient-yet-effective Unsupervised State Detection for Multivariate Time Series

Cyber-physical system sensors emit multivariate time series (MTS) that monitor physical system processes. Such time series generally capture unknown numbers of states, each with a different duration, that correspond to specific conditions, e.g., "walking" or "running" in human-activity monitoring. Unsupervised identification of such states facilitates storage and processing in subsequent data analyses, as well as enhances result interpretability. Existing state-detection proposals face three challenges. First, they introduce substantial computational overhead, rendering them impractical in resource-constrained or streaming settings. Second, although state-of-the-art (SOTA) proposals employ contrastive learning for representation, insufficient attention to false negatives hampers model convergence and accuracy. Third, SOTA proposals predominantly emphasize offline, non-streaming deployment; we highlight an urgent need to optimize online streaming scenarios. We propose E2Usd that enables efficient-yet-accurate unsupervised MTS state detection. E2Usd exploits a Fast Fourier Transform-based Time Series Compressor (fftCompress) and a Decomposed Dual-view Embedding Module (ddEM) that together encode input MTSs at low computational overhead. Additionally, we propose a False Negative Cancellation Contrastive Learning method (fnccLearning) to counteract the effects of false negatives and to achieve more cluster-friendly embedding spaces. To reduce computational overhead further in streaming settings, we introduce Adaptive Threshold Detection (adaTD). Comprehensive experiments with six baselines and six datasets offer evidence that E2Usd is capable of SOTA accuracy at significantly reduced computational overhead.

Updated: 2024-05-27 08:14:20

标题: E2USD:用于多元时间序列的高效而有效的无监督状态检测

摘要: 网络物理系统传感器发出多变量时间序列(MTS),用于监测物理系统过程。这种时间序列通常捕捉未知数量的状态,每个状态持续时间不同,对应特定条件,例如在人体活动监测中的“步行”或“跑步”。无监督识别这些状态有助于后续数据分析中的存储和处理,同时提高结果的可解释性。 现有的状态检测提议面临三大挑战。首先,它们引入了大量的计算开销,使它们在资源受限或流式设置下不切实际。其次,尽管最先进的提议采用对比学习进行表示,但对假阴性的关注不足阻碍了模型的收敛和准确性。第三,最先进的提议主要强调离线非流式部署,我们强调迫切需要优化在线流式场景。我们提出了E2Usd,它实现了高效而准确的无监督MTS状态检测。E2Usd利用基于快速傅里叶变换的时间序列压缩器(fftCompress)和分解的双视图嵌入模块(ddEM),共同以较低的计算开销对输入MTS进行编码。此外,我们提出了一种假阴性消除对比学习方法(fnccLearning),以抵消假阴性的影响,并获得对聚类更友好的嵌入空间。为了在流式设置中进一步降低计算开销,我们引入了自适应阈值检测(adaTD)。通过六个基线和六个数据集的全面实验,证明E2Usd能以显著降低的计算开销实现最先进的准确性。

更新时间: 2024-05-27 08:14:20

领域: cs.LG,cs.AI,cs.DB

下载: http://arxiv.org/abs/2402.14041v6
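
The low-overhead encoding can be illustrated with a minimal low-pass sketch (our simplification; the paper's fftCompress module is more elaborate, and the keep parameter is an assumed hyper-parameter): only the lowest-frequency rFFT coefficients of each channel are retained.

    import numpy as np

    def fft_compress(x, keep=8):
        # keep only the `keep` lowest-frequency rFFT coefficients per channel
        X = np.fft.rfft(x, axis=0)
        X[keep:] = 0.0
        return np.fft.irfft(X, n=x.shape[0], axis=0)

    t = np.linspace(0.0, 1.0, 256)
    clean = np.stack([np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 5 * t)], axis=1)
    noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(clean.shape)
    print(np.abs(fft_compress(noisy) - clean).mean())   # small reconstruction error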

Theories of synaptic memory consolidation and intelligent plasticity for continual learning

Humans and animals learn throughout life. Such continual learning is crucial for intelligence. In this chapter, we examine the pivotal role plasticity mechanisms with complex internal synaptic dynamics could play in enabling this ability in neural networks. By surveying theoretical research, we highlight two fundamental enablers for continual learning. First, synaptic plasticity mechanisms must maintain and evolve an internal state over several behaviorally relevant timescales. Second, plasticity algorithms must leverage the internal state to intelligently regulate plasticity at individual synapses to facilitate the seamless integration of new memories while avoiding detrimental interference with existing ones. Our chapter covers successful applications of these principles to deep neural networks and underscores the significance of synaptic metaplasticity in sustaining continual learning capabilities. Finally, we outline avenues for further research to understand the brain's superb continual learning abilities and harness similar mechanisms for artificial intelligence systems.

Updated: 2024-05-27 08:13:39

标题: 突触记忆巩固理论和智能可塑性理论对持续学习的影响

摘要: 人类和动物在整个生命中都在学习。这种持续的学习对智力至关重要。在本章中,我们探讨了具有复杂内部突触动态的可塑性机制在神经网络中发挥的关键作用,使得这种能力成为可能。通过调查理论研究,我们强调了持续学习的两个基本要素。首先,突触可塑性机制必须在几个行为相关的时间尺度上维持和演变内部状态。其次,可塑性算法必须利用内部状态,智能地调节单个突触的可塑性,以促进新记忆的无缝整合,同时避免对现有记忆产生不利干扰。我们的章节涵盖了这些原则在深度神经网络中的成功应用,并强调了突触元可塑性在维持持续学习能力方面的重要性。最后,我们概述了进一步研究的途径,以了解大脑出色的持续学习能力,并利用类似的机制用于人工智能系统。

更新时间: 2024-05-27 08:13:39

领域: q-bio.NC,cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2405.16922v1

SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $\epsilon$. To overcome this, we propose SCAFFLSA a new variant of FedLSA that uses control variates to correct for client drift, and establish its sample and communication complexities. We show that for statistically heterogeneous agents, its communication complexity scales logarithmically with the desired accuracy, similar to Scaffnew. An important finding is that, compared to the existing results for Scaffnew, the sample complexity scales with the inverse of the number of agents, a property referred to as linear speed-up. Achieving this linear speed-up requires completely new theoretical arguments. We apply the proposed method to federated temporal difference learning with linear function approximation and analyze the corresponding complexity improvements.

Updated: 2024-05-27 08:13:02

标题: SCAFFLSA: 在联邦线性随机逼近和TD学习中驯服异质性

摘要: 在这篇论文中,我们分析了联邦线性随机逼近(FedLSA)算法的样本和通信复杂度。我们明确量化了局部训练与代理异质性的影响。我们展示了FedLSA的通信复杂度与所需精度$\epsilon$的倒数呈多项式关系。为了克服这一问题,我们提出了SCAFFLSA,这是FedLSA的一种新变体,它使用控制变量来校正客户端漂移,并建立了其样本和通信复杂度。我们发现,对于统计异质的代理,其通信复杂度与所需精度呈对数关系,类似于Scaffnew。一个重要的发现是,与Scaffnew的现有结果相比,其样本复杂度与代理数量成反比,这一性质被称为线性加速。实现这种线性加速需要全新的理论论证。我们将所提出的方法应用于具有线性函数逼近的联邦时序差分学习,并分析相应的复杂度改进。

更新时间: 2024-05-27 08:13:02

领域: stat.ML,cs.LG,math.OC

下载: http://arxiv.org/abs/2402.04114v2
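
A toy rendition of the control-variate idea may help (a Scaffold-style correction applied to federated linear stochastic approximation; the problem instance, step sizes, and the exact control-variate update below are our assumptions, not the paper's algorithm):

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, eta, H = 5, 3, 0.05, 10    # agents, dimension, local step size, local steps

    # heterogeneous linear systems: agent i follows the direction b_i - A_i @ theta
    A = [np.eye(d) * d + 0.5 * rng.standard_normal((d, d)) for _ in range(N)]
    A = [0.5 * (a + a.T) for a in A]                  # symmetric, well-conditioned
    b = [rng.standard_normal(d) for _ in range(N)]
    target = np.linalg.solve(sum(A), sum(b))          # the global fixed point

    theta = np.zeros(d)
    c = [np.zeros(d) for _ in range(N)]               # per-agent control variates
    c_glob = np.zeros(d)
    for _ in range(300):
        locals_, new_c = [], []
        for i in range(N):
            th = theta.copy()
            for _ in range(H):
                g = A[i] @ th - b[i] + 0.01 * rng.standard_normal(d)  # noisy LSA direction
                th -= eta * (g - c[i] + c_glob)       # drift correction, Scaffold-style
            new_c.append(c[i] - c_glob + (theta - th) / (H * eta))
            locals_.append(th)
        theta = np.mean(locals_, axis=0)
        c, c_glob = new_c, np.mean(new_c, axis=0)
    print(np.linalg.norm(theta - target))             # close to zero up to noise

Without the correction term, the heterogeneous local directions bias the averaged iterate away from the global fixed point; the control variates cancel this client drift.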

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

While large multi-modal models (LMMs) have exhibited impressive capabilities across diverse tasks, their effectiveness in handling complex tasks has been limited by the prevailing single-step reasoning paradigm. To this end, this paper proposes VoCoT, a multi-step Visually grounded object-centric Chain-of-Thought reasoning framework tailored for inference with LMMs. VoCoT is characterized by two key features: (1) object-centric reasoning paths that revolve around cross-modal shared object-level information, and (2) visually grounded representation of object concepts in a multi-modal interleaved and aligned manner, which effectively bridges the modality gap within LMMs during long-term generation. Additionally, we construct an instruction dataset to facilitate LMMs in adapting to reasoning with VoCoT. By introducing VoCoT into the prevalent open-source LMM architecture, we introduce VolCano. With only 7B parameters and limited input resolution, VolCano demonstrates excellent performance across various scenarios, surpassing SOTA models, including GPT-4V, in tasks requiring complex reasoning. Our code, data and model will be available at https://github.com/RupertLuo/VoCoT.

Updated: 2024-05-27 08:12:00

标题: VoCoT:在大型多模态模型中释放基于视觉的多步推理

摘要: 尽管大型多模态模型(LMMs)在各种任务中展示出令人印象深刻的能力,但它们在处理复杂任务方面的有效性受到了当前单步推理范式的限制。为此,本文提出了VoCoT,这是一个专为LMMs推理而设计的多步、视觉锚定、以对象为中心的思维链框架。VoCoT具有两个关键特征:(1)以对象为中心的推理路径,围绕跨模态共享的对象级信息展开;(2)以多模态交错且对齐的方式对对象概念进行视觉锚定表示,在长程生成过程中有效弥合了LMMs内部的模态差距。此外,我们构建了一个指令数据集,以帮助LMMs适应基于VoCoT的推理。通过将VoCoT引入流行的开源LMM架构,我们提出了VolCano。仅具有7B参数和有限的输入分辨率,VolCano在各种场景中展现出出色的性能,并在需要复杂推理的任务中超越了包括GPT-4V在内的SOTA模型。我们的代码、数据和模型将在https://github.com/RupertLuo/VoCoT 上提供。

更新时间: 2024-05-27 08:12:00

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.16919v1

The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective

Flatness of the loss surface not only correlates positively with generalization but is also related to adversarial robustness, since perturbations of inputs relate non-linearly to perturbations of weights. In this paper, we empirically analyze the relation between adversarial examples and relative flatness with respect to the parameters of one layer. We observe a peculiar property of adversarial examples: during an iterative first-order white-box attack, the flatness of the loss surface measured around the adversarial example first becomes sharper until the label is flipped, but if we keep the attack running it runs into a flat uncanny valley where the label remains flipped. We find this phenomenon across various model architectures and datasets. Our results also extend to large language models (LLMs), but due to the discrete nature of the input space and comparatively weak attacks, the adversarial examples rarely reach a truly flat region. Most importantly, this phenomenon shows that flatness alone cannot explain adversarial robustness unless we can also guarantee the behavior of the function around the examples. We theoretically connect relative flatness to adversarial robustness by bounding the third derivative of the loss surface, underlining the need for flatness in combination with a low global Lipschitz constant for a robust model.

Updated: 2024-05-27 08:10:46

标题: 恐怖谷:从平坦性视角探索对抗鲁棒性

摘要: 损失曲面的平坦度不仅与泛化正相关,而且与对抗鲁棒性有关,因为输入的扰动与权重的扰动之间存在非线性关系。在本文中,我们针对某一层的参数,实证分析了对抗样本与相对平坦度之间的关系。我们观察到对抗样本的一个奇特特性:在迭代的一阶白盒攻击中,围绕对抗样本测量的损失曲面的平坦度首先变得更加尖锐,直到标签翻转;但如果我们继续攻击,它会进入一个平坦的“恐怖谷”,标签在其中保持翻转状态。我们发现这种现象在各种模型架构和数据集中都存在。我们的结果也适用于大型语言模型(LLMs),但由于输入空间的离散性和相对较弱的攻击,对抗样本很少到达真正平坦的区域。最重要的是,这种现象表明,除非我们还能保证函数在样本周围的行为,否则单靠平坦度无法解释对抗鲁棒性。我们通过给损失曲面的三阶导数设界,从理论上将相对平坦度与对抗鲁棒性联系起来,强调了鲁棒模型需要平坦度与较低的全局利普希茨常数相结合。

更新时间: 2024-05-27 08:10:46

领域: cs.LG

下载: http://arxiv.org/abs/2405.16918v1
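
The non-monotone sharpness profile along an attack can be reproduced even in a toy one-layer model (a 2-D illustration of our own, not the paper's networks; the step size, the perturbation scale eps, and the random-perturbation sharpness proxy are assumptions): the loss-gap proxy is largest near the decision boundary, where the label flips, and then decays as the attack keeps running.

    import numpy as np

    rng = np.random.default_rng(0)
    w = np.array([2.0, -1.0])                     # toy linear classifier

    def loss(x, y):                               # logistic loss, labels y in {-1, +1}
        return float(np.log1p(np.exp(-y * (w @ x))))

    def grad_x(x, y):                             # gradient of the loss w.r.t. the input
        return -y * w / (1.0 + np.exp(y * (w @ x)))

    def sharpness(x, y, eps=0.05, n=200):
        # mean loss increase under random input perturbations: a crude flatness proxy
        base = loss(x, y)
        return np.mean([loss(x + eps * rng.standard_normal(2), y) - base
                        for _ in range(n)])

    x, y = np.array([1.0, 1.0]), 1                # correctly classified point
    for step in range(30):                        # iterative first-order white-box attack
        x = x + 0.05 * np.sign(grad_x(x, y))      # ascend the loss, FGSM-style steps
        if step % 5 == 4:
            print(step, "flipped:", bool(w @ x < 0),
                  "sharpness:", round(sharpness(x, y), 5))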

Identifiability of total effects from abstractions of time series causal graphs

We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations, and the summary causal graph which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and provide sufficient conditions for identifiability in summary causal graphs. We furthermore provide adjustment sets allowing to estimate the total effect whenever it is identifiable.

Updated: 2024-05-27 08:10:40

标题: 时间序列因果图的抽象化中总效应的可识别性

摘要: 我们研究了在实践中常见的情况下,即只能访问真实因果图的抽象时,从观测时间序列中识别干预的总效应的可识别性问题。我们在这里考虑了两种抽象:扩展摘要因果图,将所有滞后因果关系混合在一起,但区分滞后和瞬时关系,以及摘要因果图,不提供有关因果关系之间滞后的任何指示。我们表明总效应在扩展摘要因果图中总是可识别的,并提供在摘要因果图中可识别性的充分条件。此外,我们提供调整集,允许在总效应可识别时估计总效应。

更新时间: 2024-05-27 08:10:40

领域: math.ST,cs.AI,stat.TH

下载: http://arxiv.org/abs/2310.14691v6

Multilingual Diversity Improves Vision-Language Representations

Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text pairs and discard many potentially useful non-English samples. Our work questions this practice. Multilingual data is inherently enriching not only because it provides a gateway to learn about culturally salient concepts, but also because it depicts common concepts differently from monolingual data. We thus conduct a systematic study to explore the performance benefits of using more samples of non-English origins with respect to English vision tasks. By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set. Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet, ImageNet distribution shifts, image-English-text retrieval and on average across 38 tasks from the DataComp benchmark. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa. In addition, we quantitatively show that English and non-English data are significantly different in both image and (translated) text space. We hope that our findings motivate future work to be more intentional about including multicultural and multilingual data, not just when non-English or geographically diverse tasks are involved, but to enhance model capabilities at large.

Updated: 2024-05-27 08:08:51

标题: 多语言多样性改善视觉-语言表示

摘要: 大规模网络爬取的图像-文本数据集为最近多模态学习的进展奠定了基础。这些数据集旨在训练模型在标准计算机视觉基准上表现良好,然而,其中许多基准已被证明以英语为中心(例如,ImageNet)。因此,现有的数据筛选技术倾向于使用以英语为主的图像-文本对,并丢弃许多潜在有用的非英语样本。我们的工作对这种做法提出了质疑。多语言数据本质上是丰富的,不仅因为它提供了学习文化相关概念的通道,还因为它以不同于单语数据的方式描绘共同概念。因此,我们进行了系统研究,探讨在英语视觉任务中使用更多非英语来源样本带来的性能优势。通过将原始网络爬取的所有多语言图像-文本对翻译成英语并重新筛选,我们提高了(翻译后的)多语言数据在结果训练集中的占比。在这个数据集上的预训练,在ImageNet、ImageNet分布偏移、图像-英语文本检索以及DataComp基准中38个任务的平均表现上,均优于仅使用英语或以英语为主的数据集。在GeoDE这样地理多样化的任务上,我们也观察到所有地区都有改进,其中非洲地区的增益最大。此外,我们定量地表明,英语和非英语数据在图像和(翻译后的)文本空间中都存在显著差异。我们希望我们的发现能激励未来的工作更有意识地纳入多文化和多语言数据:不仅仅是在涉及非英语或地理多样化任务时,而是为了提升模型的整体能力。

更新时间: 2024-05-27 08:08:51

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.16915v1

Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

Recently, interpretable machine learning has re-explored concept bottleneck models (CBM). An advantage of this model class is the user's ability to intervene on predicted concept values, affecting the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, only given a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of our techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes are more intervenable than CBMs. Lastly, we establish that our methods are still effective under vision-language-model-based concept annotations, alleviating the need for a human-annotated validation set.

Updated: 2024-05-27 08:07:49

标题: 超越概念瓶颈模型:如何使黑匣子可干预?

摘要: 最近,可解释的机器学习重新探索了概念瓶颈模型(CBM)。这种模型类别的一个优势是用户能够干预预测的概念数值,从而影响下游输出。在这项工作中,我们介绍了一种方法,可以在预训练的神经网络上执行基于概念的干预,这些网络从设计上来说不可解释,只需要一个带有概念标签的小型验证集。此外,我们将可干预性形式化为基于概念的干预有效性的度量,并利用这个定义来微调黑盒模型。在经验上,我们探索了黑盒分类器在合成表格和自然图像基准上的可干预性。我们关注不同复杂度的骨干架构,从简单的全连接神经网络到Stable Diffusion。我们证明了所提出的微调改善了干预效果,并经常产生更好校准的预测。为了展示我们技术的实际效用,我们将它们应用于深度胸部X射线分类器,并展示微调后的黑盒比CBMs更容易干预。最后,我们确立了我们的方法在基于视觉-语言模型的概念注释下仍然有效,减轻了对人工注释验证集的需求。

更新时间: 2024-05-27 08:07:49

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2401.13544v2

Continuous Diffusion for Mixed-Type Tabular Data

Score-based generative models (or diffusion models for short) have proven successful for generating text and image data. However, the adaption of this model family to tabular data of mixed-type has fallen short so far. In this paper, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. Specifically, we combine score matching and score interpolation to ensure a common continuous noise distribution for both continuous and categorical features alike. We counteract the high heterogeneity inherent to data of mixed-type with distinct, adaptive noise schedules per feature or per data type. The learnable noise schedules ensure optimally allocated model capacity and balanced generative capability. We homogenize the data types further with model-specific loss calibration and initialization schemes tailored to mixed-type tabular data. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models, captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts the sample quality.

Updated: 2024-05-27 08:07:39

标题: 混合类型表格数据的连续扩散

摘要: 基于得分的生成模型(或简称为扩散模型)已被证明在生成文本和图像数据方面取得成功。然而,将这一模型家族应用于混合类型的表格数据迄今为止还存在一定不足。在本文中,我们提出了CDTD,一种适用于混合类型表格数据的连续扩散模型。具体来说,我们结合了得分匹配和得分插值,以确保连续和分类特征均具有共同的连续噪声分布。我们通过针对每个特征或每种数据类型采用不同的自适应噪声计划来抵消混合类型数据固有的高异质性。可学习的噪声计划确保了模型容量的最佳分配和平衡的生成能力。我们进一步通过针对混合类型表格数据量身定制的模型特定损失校准和初始化方案来使数据类型更加均质化。我们的实验结果表明,CDTD始终优于最先进的基准模型,异常好地捕捉特征之间的相关性,并且在噪声计划设计中的异质性提高了样本质量。

更新时间: 2024-05-27 08:07:39

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.10431v2

Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

In this paper, we study a sequential decision-making problem faced by e-commerce carriers related to when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the expected number of parcels that can be delivered during service hours. We propose two reinforcement learning (RL) approaches for solving this problem. These approaches rely on a look-ahead strategy in which future release dates are sampled in a Monte-Carlo fashion and a batch approach is used to approximate future routes. Both RL approaches are based on value function approximation - one combines it with a consensus function (VFA-CF) and the other one with a two-stage stochastic integer linear programming model (VFA-2S). VFA-CF and VFA-2S do not need extensive training as they are based on very few hyper-parameters and make good use of integer linear programming (ILP) and branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for partial characterization of optimal policy and integrate them into VFA-CF/VFA-2S. In an empirical study, we conduct a competitive analysis using upper bounds with perfect information. We also show that VFA-CF and VFA-2S greatly outperform alternative approaches that: 1) do not rely on future information, or 2) are based on point estimation of future information, or 3) employ heuristics rather than exact methods, or 4) use exact evaluations of future rewards.

Updated: 2024-05-27 08:03:48

标题: 强化学习方法应用于具有随机和动态发布日期的定向问题

摘要: 在这篇论文中,我们研究了电子商务承运商面临的一个顺序决策问题,即何时从中央仓库派车去为客户提供服务,并以何种顺序提供服务,假设包裹到达仓库的时间是随机动态的。目标是在服务时间内最大化可送达包裹的数量。我们提出了两种强化学习(RL)方法来解决这个问题。这些方法依赖于一个前瞻策略,其中未来的发布日期以蒙特卡洛方式抽样,并使用批处理方法来近似未来的路线。这两种RL方法都基于值函数逼近 - 其中一种将其与一致性函数(VFA-CF)结合,另一种将其与两阶段随机整数线性规划模型(VFA-2S)结合。VFA-CF和VFA-2S不需要大量的训练,因为它们基于非常少的超参数,并充分利用整数线性规划(ILP)和基于分支和割的精确方法来改进决策的质量。我们还建立了充分条件来部分表征最优策略,并将其整合到VFA-CF/VFA-2S中。在一个实证研究中,我们进行了使用完美信息的上界的竞争分析。我们还展示了VFA-CF和VFA-2S远远优于其他方法,这些方法:1)不依赖未来信息,或2)基于未来信息的点估计,或3)使用启发式而不是精确方法,或4)使用未来奖励的精确评估。

更新时间: 2024-05-27 08:03:48

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2207.00885v3

Solar Panel Segmentation :Self-Supervised Learning Solutions for Imperfect Datasets

The increasing adoption of solar energy necessitates advanced methodologies for monitoring and maintenance to ensure optimal performance of solar panel installations. A critical component in this context is the accurate segmentation of solar panels from aerial or satellite imagery, which is essential for identifying operational issues and assessing efficiency. This paper addresses the significant challenges in panel segmentation, particularly the scarcity of annotated data and the labour-intensive nature of manual annotation for supervised learning. We explore and apply Self-Supervised Learning (SSL) to solve these challenges. We demonstrate that SSL significantly enhances model generalization under various conditions and reduces dependency on manually annotated data, paving the way for robust and adaptable solar panel segmentation solutions.

Updated: 2024-05-27 07:59:33

标题: 太阳能电池板分割:针对不完美数据集的自监督学习解决方案

摘要: 随着太阳能的日益普及,需要先进的监测和维护方法来确保太阳能电池板安装的最佳性能。在这种情况下,一个关键组成部分是准确地从航空或卫星图像中分割太阳能电池板,这对于识别运行问题和评估效率至关重要。本文探讨了在电池板分割中面临的重大挑战,特别是标注数据的稀缺性和手动标注用于监督学习的劳动密集性。我们探索并应用自监督学习(SSL)来解决这些挑战。我们证明了SSL在各种条件下显著提高了模型的泛化能力,并减少了对手动标注数据的依赖,为太阳能电池板分割解决方案的稳健和适应性铺平了道路。

更新时间: 2024-05-27 07:59:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2402.12843v2

Analysis of Atom-level pretraining with Quantum Mechanics (QM) data for Graph Neural Networks Molecular property models

Despite the rapid and significant advancements in deep learning for Quantitative Structure-Activity Relationship (QSAR) models, the challenge of learning robust molecular representations that effectively generalize in real-world scenarios to novel compounds remains an elusive and unresolved task. This study examines how atom-level pretraining with quantum mechanics (QM) data can mitigate violations of assumptions regarding the distributional similarity between training and test data and therefore improve performance and generalization in downstream tasks. In the public dataset Therapeutics Data Commons (TDC), we show how pretraining on atom-level QM improves performance overall and makes the activation of the features distributes more Gaussian-like which results in a representation that is more robust to distribution shifts. To the best of our knowledge, this is the first time that hidden state molecular representations are analyzed to compare the effects of molecule-level and atom-level pretraining on QM data.

Updated: 2024-05-27 07:56:06

标题: 原子级别预训练与量子力学数据在图神经网络分子性质模型中的分析

摘要: 尽管深度学习在定量构效关系(QSAR)模型方面取得了快速且显著的进展,但学习能够在真实世界情景中有效泛化到新化合物的稳健分子表示,仍然是一个难以解决的挑战。本研究探讨了如何利用量子力学(QM)数据进行原子级预训练,以减轻关于训练数据和测试数据之间分布相似性的假设被违反的问题,从而提高下游任务的性能和泛化能力。在公共数据集Therapeutics Data Commons(TDC)中,我们展示了原子级QM预训练如何提高整体性能,并使特征的激活分布更接近高斯分布,从而产生对分布偏移更具鲁棒性的表示。据我们所知,这是首次通过分析隐藏状态分子表示来比较分子级与原子级QM数据预训练的效果。

更新时间: 2024-05-27 07:56:06

领域: cs.LG,physics.chem-ph,quant-ph

下载: http://arxiv.org/abs/2405.14837v2

GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the quality of offline datasets, leading to suboptimal results. In response, we introduce \textbf{GTA}, Generative Trajectory Augmentation, a novel generative data augmentation approach designed to enrich offline data by augmenting trajectories to be both high-rewarding and dynamically plausible. GTA applies a diffusion model within the data augmentation framework. GTA partially noises original trajectories and then denoises them with classifier-free guidance via conditioning on amplified return value. Our results show that GTA, as a general data augmentation strategy, enhances the performance of widely used offline RL algorithms in both dense and sparse reward settings. Furthermore, we conduct a quality analysis of data augmented by GTA and demonstrate that GTA improves the quality of the data. Our code is available at https://github.com/Jaewoopudding/GTA

Updated: 2024-05-27 07:55:45

标题: GTA:具有离线强化学习指导的生成轨迹增强

摘要: 离线强化学习(Offline RL)面临着在没有任何在线交互的情况下,从静态数据集中学习有效决策策略的挑战。数据增强技术,如注入噪声和数据合成,旨在通过平滑所学习的状态-动作区域来改善Q函数的逼近。然而,这些方法往往无法直接改善离线数据集的质量,导致次优结果。为此,我们引入了GTA(Generative Trajectory Augmentation),一种新颖的生成式数据增强方法,旨在通过增强轨迹使离线数据既具有高回报又在动态上合理。GTA在数据增强框架内应用扩散模型。GTA对原始轨迹进行部分加噪,然后以放大的回报值为条件,利用无分类器引导(classifier-free guidance)进行去噪。我们的结果表明,作为一种通用数据增强策略,GTA提高了在稠密和稀疏奖励设置下广泛使用的离线RL算法的性能。此外,我们对GTA增强的数据进行了质量分析,并展示了GTA改善数据质量的效果。我们的代码可在https://github.com/Jaewoopudding/GTA找到。

更新时间: 2024-05-27 07:55:45

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16907v1
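
A compact sketch of the two mechanisms named in the abstract, partial noising and classifier-free guidance (toy stand-ins of our own for the noise predictors, schedule, and constants; the paper trains a real diffusion model over trajectories):

    import numpy as np

    T = 50
    alphas = np.linspace(0.999, 0.98, T)          # toy variance schedule
    abar = np.cumprod(alphas)

    def eps_uncond(x, t):                          # stand-in unconditional predictor
        return 0.1 * x

    def eps_cond(x, t, amp_ret):                   # stand-in return-conditioned predictor
        return 0.1 * x - 0.05 * amp_ret

    def guided_eps(x, t, amp_ret, w=2.0):
        # classifier-free guidance: extrapolate from the unconditional prediction
        # towards the prediction conditioned on the amplified return value
        return (1.0 + w) * eps_cond(x, t, amp_ret) - w * eps_uncond(x, t)

    def augment(traj, t_partial=10, amp_ret=1.0, rng=np.random.default_rng(0)):
        # partial noising: corrupt only up to step t_partial < T, so the augmented
        # trajectory stays close to the dynamics of the original one
        x = (np.sqrt(abar[t_partial]) * traj
             + np.sqrt(1.0 - abar[t_partial]) * rng.standard_normal(traj.shape))
        for t in range(t_partial, -1, -1):         # deterministic DDPM-style reverse pass
            eps = guided_eps(x, t, amp_ret)
            x = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(alphas[t])
        return x

    print(augment(np.zeros((5, 2))).shape)         # a (horizon, state_dim) trajectory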

Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift

Transfer learning enhances prediction accuracy on a target distribution by leveraging data from a source distribution, demonstrating significant benefits in various applications. This paper introduces a novel dissimilarity measure that utilizes vicinity information, i.e., the local structure of data points, to analyze the excess error in classification under covariate shift, a transfer learning setting where marginal feature distributions differ but conditional label distributions remain the same. We characterize the excess error using the proposed measure and demonstrate faster or competitive convergence rates compared to previous techniques. Notably, our approach is effective in situations where the non-absolute continuousness assumption, which often appears in real-world applications, holds. Our theoretical analysis bridges the gap between current theoretical findings and empirical observations in transfer learning, particularly in scenarios with significant differences between source and target distributions.

Updated: 2024-05-27 07:55:27

标题: 利用邻域信息分析提升在协变量转移下的分类能力

摘要: 迁移学习通过利用来自源分布的数据来提高对目标分布的预测准确性,在各种应用中表现出明显的优势。本文介绍了一种利用邻近信息的新颖差异度度量,即数据点的局部结构,用于分析在协变量转移下分类中的额外误差,协变量转移是指边际特征分布不同但条件标签分布保持不变的情况。我们利用提出的度量来表征额外误差,并展示与先前技术相比更快或竞争性的收敛速度。值得注意的是,我们的方法在现实应用中经常出现的非绝对连续性假设成立的情况下是有效的。我们的理论分析填补了当前理论发现与迁移学习中的经验观察之间的差距,特别是在源分布和目标分布之间存在显著差异的情况下。

更新时间: 2024-05-27 07:55:27

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.16906v1

Predicting from a Different Perspective in Re-ranking Model for Inductive Knowledge Graph Completion

Rule-induction models have shown great power in the inductive setting of knowledge graph completion. In this setting, the models are tested on a knowledge graph entirely composed of unseen entities. These models learn relation patterns as rules by utilizing subgraphs. Given the same input, different rules cause differences in the model's predictions. In this paper, we focus on this behavior of the model. We propose a re-ranking-based model called ReDistLP (Re-ranking with a Distinct Model for Link Prediction). This model enhances the effectiveness of re-ranking by leveraging the difference in predictions between the initial retriever and the re-ranker. ReDistLP outperforms the state-of-the-art methods on 2 out of 3 datasets.

Updated: 2024-05-27 07:50:09

标题: 归纳式知识图谱补全中基于重排序模型的不同视角预测

摘要: 规则归纳模型在知识图谱补全的归纳设置中展示了巨大的能力。在这种设置中,模型在一个完全由未见实体组成的知识图谱上进行测试。这些模型通过利用子图将关系模式学习为规则。相同的输入在不同规则下会导致模型预测的差异。在本文中,我们关注模型的这种行为。我们提出了一种基于重排序的模型,称为ReDistLP(Re-ranking with a Distinct Model for Link Prediction,使用不同模型进行重排序的链接预测)。该模型通过利用初始检索器和重排序器之间的预测差异来增强重排序的有效性。ReDistLP在3个数据集中的2个上优于最先进的方法。

更新时间: 2024-05-27 07:50:09

领域: cs.LG

下载: http://arxiv.org/abs/2405.16902v1
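
The two-stage pipeline itself is simple to sketch (a generic re-ranker in the spirit of ReDistLP; the scoring functions and names below are our own placeholders, not the paper's models):

    def rerank(candidates, retriever, reranker, top_k=10):
        # stage 1: the retriever proposes a shortlist; stage 2: a *distinct*
        # re-ranker rescores it, so the two models' differing predictions
        # can complement each other
        shortlist = sorted(candidates, key=retriever, reverse=True)[:top_k]
        return sorted(shortlist, key=reranker, reverse=True)

    # toy usage: scores over candidate tail entities for a query (h, r, ?)
    cands = ["e1", "e2", "e3", "e4"]
    retr = {"e1": 0.9, "e2": 0.8, "e3": 0.2, "e4": 0.1}.get
    rerk = {"e1": 0.3, "e2": 0.7, "e3": 0.9, "e4": 0.5}.get
    print(rerank(cands, retr, rerk, top_k=3))   # ['e3', 'e2', 'e1']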

Recurrent and Convolutional Neural Networks in Classification of EEG Signal for Guided Imagery and Mental Workload Detection

The Guided Imagery technique is reported to be used by therapists all over the world in order to increase the comfort of patients suffering from a variety of disorders, ranging from mental-health to oncological ones, and has proved successful in numerous ways. One possible form of support for therapists is estimating the time at which a subject enters deep relaxation. This paper presents the results of a study of a cohort of 26 students exposed to the Guided Imagery relaxation technique and mental task workloads, recorded with a dense-array electroencephalographic amplifier. The research reported herein aimed to verify whether it is possible to detect differences between these two states and to classify them using deep learning methods and recurrent neural networks such as EEGNet, a Long Short-Term Memory-based classifier, a 1D Convolutional Neural Network, and a hybrid of a 1D Convolutional Neural Network and Long Short-Term Memory. The data processing pipeline is presented, from data acquisition through initial data cleaning, preprocessing, and postprocessing. Classification was based on two datasets: one using the 26 so-called cognitive electrodes and the other using signals collected from 256 channels; no such comparison has previously been reported for this application. Classification results are reported for each case using validation metrics: accuracy, recall, precision, F1-score, and loss. It turned out that it is not necessary to collect signals from all electrodes, as classification using the cognitive ones gives results similar to those obtained for the full signal, and extending the input to 256 channels adds little value. In the Discussion, an optimal classifier is proposed, along with suggestions concerning the prospective development of the project.

Updated: 2024-05-27 07:49:30

标题: 循环神经网络与卷积神经网络在脑电信号分类中的应用:引导想象与心理负荷检测

摘要: 据报道,引导想象技术被世界各地的治疗师用于提高患有从心理疾病到肿瘤等各种疾病的患者的舒适感,并已在许多方面被证明是成功的。对治疗师的一种可能支持是估计受试者进入深度放松状态的时间。本文介绍了对一组26名学生进行的研究结果,他们接受了引导想象放松技术和心理任务负荷,研究使用密集阵列脑电放大器进行记录。本文报告的研究旨在验证是否可能检测这两种状态之间的差异,并使用深度学习方法和循环神经网络对其进行分类,例如EEGNet、基于长短期记忆的分类器、1D卷积神经网络以及1D卷积神经网络与长短期记忆的混合模型。数据处理流程从数据获取开始,经过初始数据清理、预处理和后处理。分类基于两个数据集:一个使用26个所谓的认知电极,另一个使用从256个通道收集的信号。迄今为止,在所讨论的应用中还没有进行过这样的比较。分类结果通过验证指标(如准确率、召回率、精确率、F1分数和损失)进行展示。结果表明,没有必要收集所有电极的信号,因为对认知电极的分类结果与对完整信号获得的结果相似,并且将输入扩展到256个通道并不增加太多价值。在讨论部分提出了一个最优的分类器,以及关于项目未来发展的一些建议。

更新时间: 2024-05-27 07:49:30

领域: cs.LG

下载: http://arxiv.org/abs/2405.16901v1
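
For concreteness, a hedged PyTorch sketch of the hybrid 1D Convolutional Neural Network and Long Short-Term Memory classifier of the kind compared in the paper (the layer sizes, kernel widths, and the 26-channel, 512-sample input are our assumptions):

    import torch
    import torch.nn as nn

    class CNNLSTM(nn.Module):
        """Hybrid 1D-CNN + LSTM; layer sizes are illustrative, not the paper's."""
        def __init__(self, n_channels=26, n_classes=2):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(4),
            )
            self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
            self.head = nn.Linear(64, n_classes)

        def forward(self, x):            # x: (batch, channels, time)
            z = self.conv(x)             # (batch, 64, time / 16)
            z = z.transpose(1, 2)        # LSTM expects (batch, time, features)
            _, (h, _) = self.lstm(z)
            return self.head(h[-1])      # logits: relaxation vs. mental workload

    x = torch.randn(8, 26, 512)          # 8 epochs of 26-channel EEG, 512 samples each
    print(CNNLSTM()(x).shape)            # torch.Size([8, 2])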

Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents

In neuroscience, one of the key behavioral tests for determining whether a subject of study exhibits model-based behavior is to study its adaptiveness to local changes in the environment. In reinforcement learning, however, recent studies have shown that modern model-based agents display poor adaptivity to such changes. The main reason for this is that modern agents are typically designed to improve sample efficiency in single task settings and thus do not take into account the challenges that can arise in other settings. In local adaptation settings, one particularly important challenge is in quickly building and maintaining a sufficiently accurate model after a local change. This is challenging for deep model-based agents as their models and replay buffers are monolithic structures lacking distribution shift handling capabilities. In this study, we show that the conceptually simple idea of partial models can allow deep model-based agents to overcome this challenge and thus allow for building locally adaptive model-based agents. By modeling the different parts of the state space through different models, the agent can not only maintain a model that is accurate across the state space, but it can also quickly adapt it in the presence of a local change in the environment. We demonstrate this by showing that the use of partial models in agents such as deep Dyna-Q, PlaNet and Dreamer can allow for them to effectively adapt to the local changes in their environments.

Updated: 2024-05-27 07:46:36

标题: 部分模型用于构建自适应基于模型的强化学习代理

摘要: 在神经科学领域,确定研究对象是否表现出基于模型的行为的关键行为测试之一是研究其对环境局部变化的适应性。然而,在强化学习中,最近的研究表明,现代基于模型的代理显示出对这种变化的适应性较差。这主要原因是现代代理通常设计用于提高单一任务环境下的样本效率,因此没有考虑其他环境中可能出现的挑战。在局部适应设置中,一个特别重要的挑战是在局部变化后迅速建立并维护一个足够准确的模型。这对于深度基于模型的代理来说是具有挑战性的,因为它们的模型和重放缓冲区是缺乏分布转移处理能力的整体结构。在这项研究中,我们展示了部分模型的概念简单地可以让深度基于模型的代理克服这一挑战,从而允许构建局部适应的基于模型的代理。通过通过不同模型对状态空间的不同部分建模,代理不仅可以维护一个在整个状态空间上准确的模型,还可以在环境中发生局部变化时迅速适应。我们通过展示在代理如deep Dyna-Q,PlaNet和Dreamer中使用部分模型可以使它们有效地适应环境中的局部变化来证明这一点。

更新时间: 2024-05-27 07:46:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16899v1
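
The core idea, one small model per region of the state space so that a local change invalidates only one model, can be sketched in a few lines (a toy 1-D rendition with our own partitioning and running-mean "models"; the paper instead combines partial models with deep agents such as Dyna-Q, PlaNet, and Dreamer):

    class PartialModels:
        # one tiny transition model per region; a local change in the environment
        # only invalidates (and requires re-fitting) the model of that region
        def __init__(self, n_regions):
            self.n_regions = n_regions
            self.models = [{"sum": 0.0, "n": 0} for _ in range(n_regions)]

        def region(self, s):                # toy partition of a 1-D state space
            return int(s) % self.n_regions

        def update(self, s, s_next):        # running mean of observed deltas
            m = self.models[self.region(s)]
            m["sum"] += s_next - s
            m["n"] += 1

        def predict(self, s):
            m = self.models[self.region(s)]
            return s + (m["sum"] / m["n"] if m["n"] else 0.0)

    pm = PartialModels(n_regions=4)
    pm.update(1.2, 2.2)                     # region 1 moves states by +1.0
    pm.update(2.7, 2.2)                     # region 2 moves states by -0.5
    print(pm.predict(1.5), pm.predict(2.5)) # 2.5 and 2.0: independent models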

Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering

The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.

Updated: 2024-05-27 07:44:57

标题: 谱聚类中特征向量的渐近高斯波动

摘要: 谱聚类的性能依赖于相似性矩阵特征向量条目的波动,而这一点直到现在还没有被刻画。在这封信中,我们展示了一般尖峰随机矩阵模型的信号+噪声结构会转移到相应格拉姆(Gram)核矩阵的特征向量上,并且其条目的波动在大维情形下是高斯的。这一类似中心极限定理(CLT)的结果是精确预测谱聚类分类性能所缺失的最后一块。所提出的证明非常通用,仅依赖于噪声的旋转不变性。对合成和真实数据的数值实验说明了这一现象的普遍性。

更新时间: 2024-05-27 07:44:57

领域: stat.ML,cs.LG,math.PR

下载: http://arxiv.org/abs/2402.12302v2
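
A small numerical check in the spirit of the letter (the spiked model, sample sizes, and signal strength below are our own choices): rescaled, class-aligned entries of the top eigenvector of the Gram kernel matrix concentrate around a mean with approximately Gaussian fluctuations.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, snr, trials = 300, 200, 4.0, 50
    u = np.ones(p) / np.sqrt(p)                   # planted spike direction
    entries = []
    for _ in range(trials):
        labels = rng.choice([-1.0, 1.0], size=n)  # two hidden classes
        y = rng.standard_normal((n, p)) + np.sqrt(snr) * labels[:, None] * u
        K = y @ y.T / p                           # Gram kernel matrix
        v = np.linalg.eigh(K)[1][:, -1]           # top eigenvector
        v *= np.sign(v @ labels)                  # fix the global sign ambiguity
        entries.append(np.sqrt(n) * v[0] * labels[0])  # class-aligned, rescaled entry
    entries = np.array(entries)
    print(entries.mean(), entries.std())          # roughly Gaussian around a positive mean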

A Taxman's guide to taxation of crypto assets

The financial system has witnessed rapid technological changes. The rise of Bitcoin and other crypto assets based on Distributed Ledger Technology marks a fundamental change in the way people transact and transmit value over a decentralized network, spread across geographies. This has created regulatory and tax policy blind spots, as governments and tax administrations take time to understand and provide policy responses to this innovative, revolutionary, and fast-paced technology. Due to the breakneck speed of innovation in blockchain technology and the advent of Decentralized Finance, Decentralized Autonomous Organizations and the Metaverse, it is unlikely that the policy interventions and guidance by regulatory authorities or tax administrations would be ahead of or in sync with the pace of innovation. This paper tries to explain the principles on which crypto assets function and their underlying technology, and relates them to the tax issues and taxable events which arise within this ecosystem. It also provides instances of tax and regulatory policy responses already in effect in various jurisdictions, including the recent changes in reporting standards by the FATF and the OECD. It further explains the rationale behind existing laws and policies and the challenges in their implementation, attempts to present a ballpark estimate of the tax potential of this asset class, and suggests the creation of global public digital infrastructure that can address issues related to pseudonymity and extra-territoriality. The paper analyses both direct and indirect taxation issues related to crypto assets and discusses more recent aspects like proof-of-stake and maximal extractable value in greater detail.

Updated: 2024-05-27 07:42:56

标题: 一个税务人员对加密资产税收的指南

摘要: 金融系统已经见证了快速的技术变革。比特币和其他基于分布式账本技术的加密资产的兴起,标志着人们在跨越地域的去中心化网络上进行交易和传递价值的方式发生了根本性变化。这造成了监管和税收政策的盲点,因为政府和税务管理部门需要时间来理解这种创新、革命性且快节奏的技术并给出政策回应。由于区块链技术以及去中心化金融、去中心化自治组织和元宇宙的创新速度惊人,监管机构或税务管理部门的政策干预和指导不太可能领先于或同步于创新的步伐。本文试图解释加密资产运作的原理及其基础技术,并将其与这一生态系统中产生的税收问题和应税事件联系起来。文中还提供了各司法管辖区已经生效的税收和监管政策回应的实例,包括FATF和OECD最近对报告标准的修改。本文还解释了现有法律和政策背后的理据及其实施中的挑战,尝试对这一资产类别的税收潜力给出粗略估计,并建议创建能够解决假名性和治外法权相关问题的全球公共数字基础设施。本文分析了与加密资产相关的直接和间接税收问题,并更详细地讨论了权益证明和最大可提取价值等较新的方面。

更新时间: 2024-05-27 07:42:56

领域: q-fin.GN,cs.CR

下载: http://arxiv.org/abs/2403.15074v2

Explaining Explanations in Probabilistic Logic Programming

The emergence of tools based on artificial intelligence has also led to the need to produce explanations that are understandable by a human being. In most approaches, the system is considered a \emph{black box}, making it difficult to generate appropriate explanations. In this work, though, we consider a setting where models are \emph{transparent}: probabilistic logic programming (PLP), a paradigm that combines logic programming for knowledge representation and probability to model uncertainty. However, given a query, the usual notion of \emph{explanation} is associated with a set of choices, one for each random variable of the model. Unfortunately, such a set does not explain \emph{why} the query is true and, in fact, it may contain choices that are actually irrelevant for the considered query. To improve this situation, we present in this paper an approach to explaining explanations which is based on defining a new query-driven inference mechanism for PLP where proofs are labeled with \emph{choice expressions}, a compact and easy to manipulate representation for sets of choices. The combination of proof trees and choice expressions allows one to produce comprehensible query justifications with a causal structure.

Updated: 2024-05-27 07:38:10

标题: 在概率逻辑编程中解释解释

摘要: 基于人工智能的工具的出现也导致了需要产生人类可以理解的解释的需求。在大多数方法中,系统被视为一个“黑匣子”,这使得生成适当的解释变得困难。然而,在这项工作中,我们考虑了一种模型透明的设置:概率逻辑编程(PLP),这是一种将逻辑编程用于知识表示并利用概率来建模不确定性的范式。然而,对于一个查询,通常的“解释”概念与模型的每个随机变量的一个选择相关联。不幸的是,这样的集合并不能解释为什么查询是真实的,事实上,它可能包含对所考虑的查询实际上是无关紧要的选择。为了改善这种情况,我们在本文中提出了一种基于为PLP定义一个新的基于查询的推理机制来解释解释的方法,其中证明被标记为“选择表达式”,这是一种紧凑且易于操作的选择集表示。证明树和选择表达式的结合使得可以生成具有因果结构的易于理解的查询解释。

更新时间: 2024-05-27 07:38:10

领域: cs.AI,cs.PL

下载: http://arxiv.org/abs/2401.17045v2

On Fairness of Low-Rank Adaptation of Large Models

Low-rank adaptation of large models, particularly LoRA, has gained traction due to its computational efficiency. This efficiency, contrasted with the prohibitive costs of full-model fine-tuning, means that practitioners often turn to LoRA and sometimes without a complete understanding of its ramifications. In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility, calibration, and resistance to membership inference across different subgroups (e.g., genders, races, religions) compared to a full-model fine-tuning baseline. We present extensive experiments across vision and language domains and across classification and generation tasks using ViT-Base, Swin-v2-Large, Llama-2 7B, and Mistral 7B. Intriguingly, experiments suggest that while one can isolate cases where LoRA exacerbates model bias across subgroups, the pattern is inconsistent -- in many cases, LoRA has equivalent or even improved fairness compared to the base model or its full fine-tuning baseline. We also examine the complications of evaluating fine-tuning fairness relating to task design and model token bias, calling for more careful fairness evaluations in future work.

Updated: 2024-05-27 07:37:43

标题: 关于大型模型低秩适应的公平性

摘要: 大型模型的低秩适应,特别是LoRA,由于其计算效率而受到关注。与完整模型微调的高昂成本相比,这种效率意味着从业者经常转向LoRA,有时甚至没有完全理解其影响。在这项研究中,我们关注公平性,并追问与完整模型微调基线相比,LoRA是否在不同子群体(例如性别、种族、宗教)上对效用、校准以及对成员推断的抵抗力产生了未经审视的影响。我们在视觉和语言领域以及分类和生成任务中进行了广泛实验,使用了ViT-Base、Swin-v2-Large、Llama-2 7B和Mistral 7B。有趣的是,实验表明,虽然可以找出LoRA在不同子群体间加剧模型偏见的个别情况,但这种模式并不一致:在许多情况下,LoRA与基础模型或其完整微调基线相比具有相当甚至更好的公平性。我们还研究了与任务设计和模型标记偏见相关的微调公平性评估的复杂性,呼吁在未来工作中进行更加细致的公平性评估。

更新时间: 2024-05-27 07:37:43

领域: cs.LG,cs.AI,cs.CY

下载: http://arxiv.org/abs/2405.17512v1
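
For readers unfamiliar with the adapter being audited: LoRA freezes the pretrained weight and trains only a low-rank update. A minimal PyTorch sketch (our own, with an assumed rank and scaling; the fairness audit would then compare per-subgroup metrics of such a fine-tuned model against a full fine-tuning baseline):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen weight W plus a trainable low-rank update B @ A (rank r)."""
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # the full weights stay frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(768, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable fraction: {trainable / total:.3%}")  # only low-rank factors train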

Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.

Updated: 2024-05-27 07:33:45

标题: 公平推荐与有限敏感属性:一种分布鲁棒优化方法

摘要: 随着推荐系统在诸如求职和电子商务等各个领域的不可或缺性,为具有不同敏感属性的用户提供公平的推荐成为迫切需求。先前用于增强推荐系统公平性的方法假定所有敏感属性都是可得的,但由于隐私问题或捕捉这些属性的手段不足而难以获得。在实践中,这些方法的有效性受到限制,促使我们研究在有限敏感属性信息的情况下促进公平的方法。 为实现这一目标,重建缺失的敏感属性是重要的。然而,由于现实世界敏感属性重建问题的复杂性和法律法规,重建错误是不可避免的。因此,我们追求对重建错误具有鲁棒性的公平学习方法。为此,我们提出了分布鲁棒公平优化(DRFO),它通过最小化所有潜在概率分布上的最坏不公平性,而不是重建的概率分布,来考虑重建错误的影响。我们提供理论和经验证据表明,我们的方法可以在仅有限敏感属性可访问时有效确保推荐系统的公平性。

更新时间: 2024-05-27 07:33:45

领域: cs.IR,cs.CY,cs.LG

下载: http://arxiv.org/abs/2405.01063v2

Boosting Robustness by Clipping Gradients in Distributed Learning

Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. The learning guarantee of SOTA Robust-DGD cannot be further improved when model initialization is done arbitrarily. However, we show that it is possible to circumvent the lower bound, and improve the learning performance, when the workers' gradients at model initialization are assumed to be bounded. We prove this by proposing pre-aggregation clipping of workers' gradients, using a novel scheme called adaptive robust clipping (ARC). Incorporating ARC in Robust-DGD provably improves the learning, under the aforementioned assumption on model initialization. The factor of improvement is prominent when the tolerable fraction of misbehaving workers approaches the breakdown point. ARC induces this improvement by constricting the search space, while preserving the robustness property of the original aggregation scheme at the same time. We validate this theoretical finding through exhaustive experiments on benchmark image classification tasks.

Updated: 2024-05-27 07:25:40

标题: 通过在分布式学习中剪裁梯度来提高鲁棒性

摘要: 鲁棒分布式学习旨在在存在行为异常的工作节点的情况下仍实现良好的学习性能。最先进(SOTA)的鲁棒分布式梯度下降(Robust-DGD)方法依赖于鲁棒聚合,已被证明是最优的:它们的学习误差与在标准异质性模型$(G, B)$-梯度差异性下建立的下界相匹配。当模型初始化是任意的时,SOTA Robust-DGD的学习保证无法进一步改进。然而,我们表明,当假定工作节点在模型初始化时的梯度有界时,可以绕过该下界并提高学习性能。我们通过提出对工作节点梯度进行预聚合裁剪来证明这一点,所用的新方案称为自适应鲁棒裁剪(ARC)。在上述模型初始化假设下,将ARC纳入Robust-DGD可被证明改善学习。当可容忍的异常工作节点比例接近崩溃点时,改进因子尤为显著。ARC通过收缩搜索空间来实现这种改进,同时保留原始聚合方案的鲁棒性。我们通过在基准图像分类任务上进行详尽的实验验证了这一理论发现。

更新时间: 2024-05-27 07:25:40

领域: cs.LG

下载: http://arxiv.org/abs/2405.14432v2
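
A hedged sketch of pre-aggregation clipping (a generic stand-in of our own: the threshold is a quantile of the received norms and the aggregator is a coordinate-wise median; the paper's ARC rule and its breakdown-point analysis are more specific):

    import numpy as np

    def clip_to(g, c):
        n = np.linalg.norm(g)
        return g if n <= c else g * (c / n)

    def pre_aggregation_clipping(grads, keep_frac=0.8):
        # adaptive threshold: the keep_frac-quantile of the received gradient norms,
        # so the clipping level tracks what honest workers typically send
        c = np.quantile([np.linalg.norm(g) for g in grads], keep_frac)
        return [clip_to(g, c) for g in grads]

    def coordinate_wise_median(grads):            # a standard robust aggregator
        return np.median(np.stack(grads), axis=0)

    rng = np.random.default_rng(0)
    honest = [rng.standard_normal(5) for _ in range(9)]   # ~unit-norm gradients
    attack = [1e6 * np.ones(5)]                           # one misbehaving worker
    agg = coordinate_wise_median(pre_aggregation_clipping(honest + attack))
    print(np.linalg.norm(agg))                    # bounded despite the attacker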

Functional Protein Design with Local Domain Alignment

The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore to generate protein using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates the textual annotations extracted from protein database for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a nearly sixfold increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 8.7% in the immunoglobulin domain) in comparison to the existing model.

Updated: 2024-05-27 07:23:26

Domains: q-bio.QM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.16866v2

HyperInterval: Hypernetwork approach to training weight interval regions in continual learning

Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting, called Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval, a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings. Interval arithmetic works with a more manageable, lower-dimensional embedding space rather than directly preparing intervals in a high-dimensional weight space. Our model allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, hypernetwork is used only for training and can be seen as a meta-trainer. HyperInterval obtains significantly better results than InterContiNet and gives SOTA results on several benchmarks.
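
The interval-arithmetic primitive underlying such approaches is easy to state; below is a minimal sketch of exact interval propagation through a linear layer (the standard positive/negative weight split). How HyperInterval couples this with hypernetwork-generated weights and trained interval embeddings is described only at the level of the abstract, so that wiring is left out.

    import torch

    def interval_linear(lower, upper, weight, bias):
        # Propagate the box [lower, upper] through y = x @ W.T + b exactly:
        # positive weights map lower->lower and upper->upper, negative weights swap them.
        w_pos, w_neg = weight.clamp(min=0), weight.clamp(max=0)
        y_lower = lower @ w_pos.t() + upper @ w_neg.t() + bias
        y_upper = upper @ w_pos.t() + lower @ w_neg.t() + bias
        return y_lower, y_upper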

Updated: 2024-05-27 07:22:58

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.15444v2

Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features

In recent years, model explanation methods have been designed to interpret model decisions faithfully and intuitively so that users can easily understand them. In this paper, we propose a framework, Faithful Attention Explainer (FAE), capable of generating faithful textual explanations regarding the attended-to features. Towards this goal, we deploy an attention module that takes the visual feature maps from the classifier for sentence generation. Furthermore, our method successfully learns the association between features and words, which allows a novel attention enforcement module for attention explanation. Our model achieves promising performance in caption quality metrics and a faithful decision-relevance metric on two datasets (CUB and ACT-X). In addition, we show that FAE can interpret gaze-based human attention, as human gaze indicates the discriminative features that humans use for decision-making, demonstrating the potential of deploying human gaze for advanced human-AI interaction.

Updated: 2024-05-27 07:20:52

Domains: cs.CL,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.13032v2

Cross-Validated Off-Policy Evaluation

In this paper, we study the problem of estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory-based approaches, which provide only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
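
To make the idea concrete, here is one plausible instantiation of cross-validated estimator selection, scoring each candidate by the squared distance between its train-fold estimate and an unbiased inverse-propensity-scoring (IPS) reference on the held-out fold. The hold-out criterion and the candidate set are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def ips(a, r, p, pi):      # vanilla inverse propensity scoring
        w = pi[np.arange(len(a)), a] / p
        return np.mean(w * r)

    def snips(a, r, p, pi):    # self-normalized variant
        w = pi[np.arange(len(a)), a] / p
        return np.sum(w * r) / np.sum(w)

    def cv_select(estimators, a, r, p, pi, k=5, seed=0):
        # a: logged actions, r: rewards, p: logging propensities, pi: (n, n_actions) target policy.
        folds = np.array_split(np.random.default_rng(seed).permutation(len(a)), k)
        scores = {name: 0.0 for name in estimators}
        for i in range(k):
            va = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            ref = ips(a[va], r[va], p[va], pi[va])   # unbiased (if noisy) hold-out reference
            for name, est in estimators.items():
                scores[name] += (est(a[tr], r[tr], p[tr], pi[tr]) - ref) ** 2
        return min(scores, key=scores.get)

    # best = cv_select({"ips": ips, "snips": snips}, a, r, p, pi)  # with your logged data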

Updated: 2024-05-27 07:15:52

Domains: cs.LG

Download: http://arxiv.org/abs/2405.15332v2

SUNY: A Visual Interpretation Framework for Convolutional Neural Networks from a Necessary and Sufficient Perspective

Researchers have proposed various methods for visually interpreting the Convolutional Neural Network (CNN) via saliency maps, which include Class-Activation-Map (CAM) based approaches as a leading family. However, in terms of the internal design logic, existing CAM-based approaches often overlook the causal perspective that answers the core "why" question to help humans understand the explanation. Additionally, current CNN explanations lack the consideration of both necessity and sufficiency, two complementary sides of a desirable explanation. This paper presents a causality-driven framework, SUNY, designed to rationalize the explanations toward better human understanding. Using the CNN model's input features or internal filters as hypothetical causes, SUNY generates explanations by bi-directional quantifications on both the necessary and sufficient perspectives. Extensive evaluations justify that SUNY not only produces more informative and convincing explanations from the angles of necessity and sufficiency, but also achieves performances competitive to other approaches across different CNN architectures over large-scale datasets, including ILSVRC2012 and CUB-200-2011.

Updated: 2024-05-27 07:11:49

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2303.00244v3

A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor

As productivity advances, customer demand for multi-variety, small-batch production is increasing, placing higher requirements on manufacturing systems. When production tasks change frequently due to this demand, traditional manufacturing systems often cannot respond promptly. Multi-agent manufacturing systems have been proposed to address this problem. However, because of technical limitations, the negotiation among agents in such systems is realized through predefined heuristic rules, which are not intelligent enough to deal with multi-variety, small-batch production. To this end, a Large Language Model-based (LLM-based) multi-agent manufacturing system for the intelligent shopfloor is proposed in the present study. This system delineates the diverse agents and defines their collaborative methods. The roles of the agents encompass Machine Server Agent (MSA), Bid Inviter Agent (BIA), Bidder Agent (BA), Thinking Agent (TA), and Decision Agent (DA). With the support of LLMs, TA and DA acquire the ability to analyze the shopfloor condition and choose the most suitable machine, as opposed to executing a predefined program artificially. The negotiation between BAs and the BIA is the most crucial step in connecting manufacturing resources. With the support of TA and DA, the BIA finalizes the distribution of orders, relying on the information about each machine returned by the BAs. MSAs bear the responsibility for connecting the agents with the physical shopfloor. This system aims to distribute and transmit workpieces through the collaboration of agents with these distinct roles, distinguishing it from other scheduling approaches. Comparative experiments were also conducted to validate the performance of this system.

Updated: 2024-05-27 07:10:04

Domains: cs.AI,cs.MA,cs.RO

Download: http://arxiv.org/abs/2405.16887v1

ReFusion: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion

Retrieval-based augmentations (RA) incorporating knowledge from an external database into language models have greatly succeeded in various knowledge-intensive (KI) tasks. However, integrating retrievals in non-knowledge-intensive (NKI) tasks is still challenging. Existing works focus on concatenating retrievals with inputs to improve model performance. Unfortunately, the use of retrieval concatenation-based augmentations causes an increase in the input length, substantially raising the computational demands of attention mechanisms. This paper proposes a new paradigm of RA named \textbf{ReFusion}, a computation-efficient Retrieval representation Fusion with bi-level optimization. Unlike previous works, ReFusion directly fuses the retrieval representations into the hidden states of models. Specifically, ReFusion leverages an adaptive retrieval integrator to seek the optimal combination of the proposed ranking schemes across different model layers. Experimental results demonstrate that the proposed ReFusion can achieve superior and robust performance in various NKI tasks.
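
A minimal sketch of the general idea of fusing retrieval representations into hidden states rather than concatenating retrieved text to the input. The gating form, mean pooling, and per-layer placement here are assumptions; the paper's adaptive retrieval integrator and bi-level optimization are not reproduced.

    import torch
    import torch.nn as nn

    class RetrievalFusion(nn.Module):
        def __init__(self, d_model):
            super().__init__()
            self.proj = nn.Linear(d_model, d_model)
            self.gate = nn.Parameter(torch.zeros(1))  # start with no retrieval influence

        def forward(self, hidden, retrieved):
            # hidden: (batch, seq, d_model); retrieved: (batch, k, d_model) retrieval vectors.
            pooled = self.proj(retrieved.mean(dim=1, keepdim=True))   # pool the k retrievals
            return hidden + torch.sigmoid(self.gate) * pooled         # broadcast over seq

Because the sequence length is unchanged, attention cost does not grow with the number of retrievals, which is the efficiency argument the abstract makes against concatenation.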

Updated: 2024-05-27 07:04:19

Domains: cs.CL,cs.AI

Download: http://arxiv.org/abs/2401.02993v2

Scorch: A Library for Sparse Deep Learning

The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficient sparse tensor computation into the PyTorch ecosystem, with an initial focus on inference workloads on CPUs. Scorch provides a flexible and intuitive interface for sparse tensors, supporting diverse sparse data structures. Scorch introduces a compiler stack that automates key optimizations, including automatic loop ordering, tiling, and format inference. Combined with a runtime that adapts its execution to both dense and sparse data, Scorch delivers substantial speedups over hand-written PyTorch Sparse (torch.sparse) operations without sacrificing usability. More importantly, Scorch enables efficient computation of complex sparse operations that lack hand-optimized PyTorch implementations. This flexibility is crucial for exploring novel sparse architectures. We demonstrate Scorch's ease of use and performance gains on diverse deep learning models across multiple domains. With only minimal code changes, Scorch achieves 1.05-5.78x speedups over PyTorch Sparse on end-to-end tasks. Scorch's seamless integration and performance gains make it a valuable addition to the PyTorch ecosystem. We believe Scorch will enable wider exploration of sparsity as a tool for scaling deep learning and inform the development of other sparse libraries.
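
Scorch's own API is not spelled out in the abstract, so rather than guess at it, the snippet below shows the torch.sparse baseline it is benchmarked against; the shapes and sparsity level are arbitrary choices for illustration.

    import torch

    # A 10,000 x 10,000 COO sparse matrix with 100k nonzeros times a dense feature matrix.
    idx = torch.randint(0, 10_000, (2, 100_000))
    val = torch.randn(100_000)
    A = torch.sparse_coo_tensor(idx, val, (10_000, 10_000)).coalesce()
    X = torch.randn(10_000, 64)

    Y = torch.sparse.mm(A, X)  # the hand-written kernel path Scorch reports speedups over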

Updated: 2024-05-27 06:59:20

Domains: cs.LG,cs.AI,cs.MS,cs.PL

Download: http://arxiv.org/abs/2405.16883v1

Revisiting the Power of Prompt for Visual Tuning

Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt length, and subpar performance in self-supervised pretraining, hindering successful contextual adaptation. This study commences by exploring how the correlation between prompts and patch tokens evolves during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. The strategic initialization, a stand-in for the previous initialization, substantially improves performance in fine-tuning. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no increase in computational expenses compared to VPT. Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin. For instance, it surpasses full fine-tuning in 19 out of 24 tasks, using less than 0.4% of learnable parameters on the FGVC and VTAB-1K benchmarks. Notably, our method significantly advances the adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%. Besides, the experimental results demonstrate the proposed SPT is robust to prompt lengths and scales well with model capacity and training data size. We finally provide an insightful exploration into the amount of target data facilitating the adaptation of pre-trained models to downstream tasks. The code is available at https://github.com/WangYZ1608/Self-Prompt-Tuning.

Updated: 2024-05-27 06:51:07

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.02382v3

Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the requirement since even a single simulation may take hours or days of computation. To address this issue, we propose reference neural operators (RNO), a novel way of implementing neural operators, i.e., to learn the smooth dependence of solutions on geometric deformations. Specifically, given a reference solution, RNO can predict solutions corresponding to arbitrary deformations of the referred geometry. This approach turns out to be much more data efficient. Through extensive experiments, we show that RNO can learn the dependence across various types and different numbers of geometry objects with relatively small datasets. RNO outperforms baseline models in accuracy by a large lead and achieves up to 80% error reduction.

Updated: 2024-05-27 06:50:17

Domains: cs.LG

Download: http://arxiv.org/abs/2405.17509v1

Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning

Feature transformation derives a new feature set from the original features to augment the AI power of data. In many science domains such as material performance screening, while feature transformation can model material formula interactions and compositions and discover performance drivers, supervised labels are collected from expensive and lengthy experiments. This issue motivates an Unsupervised Feature Transformation Learning (UFTL) problem. Prior literature, such as manual transformation, supervised feedback guided search, and PCA, either relies on domain knowledge or expensive supervised feedback, or suffers from large search space, or overlooks non-linear feature-feature interactions. UFTL imposes a major challenge on existing methods: how to design a new unsupervised paradigm that captures complex feature interactions and avoids large search space? To fill this gap, we connect graph, contrastive, and generative learning to develop a measurement-pretrain-finetune paradigm for UFTL. For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective and develop an unsupervised metric in the style of mean discounted cumulative gain to evaluate feature set utility. For unsupervised feature set representation pretraining, we regard a feature set as a feature-feature interaction graph, and develop an unsupervised graph contrastive learning encoder to embed feature sets into vectors. For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation. We develop a deep generative feature transformation model that coordinates the pretrained feature set encoder and the gradient information extracted from a feature set utility evaluator to optimize a transformed feature generator.

Updated: 2024-05-27 06:50:00

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16879v1

Are Self-Attentions Effective for Time Series Forecasting?

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformer models have dramatically shifted the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift focus from the overall architecture of the Transformer to the effectiveness of self-attentions for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional Transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiment across various datasets demonstrates that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
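
A minimal sketch of the core design as the abstract describes it: learnable, horizon-dependent query parameters cross-attend to embedded input patches, with no self-attention over the input. The single-layer structure, patching, and dimensions are illustrative assumptions rather than the paper's exact architecture.

    import torch
    import torch.nn as nn

    class CrossAttentionForecaster(nn.Module):
        def __init__(self, d_model=64, horizon=96, n_heads=4, patch_dim=16):
            super().__init__()
            self.embed = nn.Linear(patch_dim, d_model)
            self.queries = nn.Parameter(torch.randn(horizon, d_model))  # one query per future step
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.head = nn.Linear(d_model, 1)

        def forward(self, patches):                      # patches: (batch, n_patches, patch_dim)
            kv = self.embed(patches)
            q = self.queries.unsqueeze(0).expand(patches.size(0), -1, -1)
            out, _ = self.attn(q, kv, kv)                # cross-attention only, no self-attention
            return self.head(out).squeeze(-1)            # (batch, horizon)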

Updated: 2024-05-27 06:49:39

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16877v1

Transfer Learning for Diffusion Models

Diffusion models, a specific type of generative model, have achieved unprecedented performance in recent years and consistently produce high-quality synthetic samples. A critical prerequisite for their notable success lies in the presence of a substantial number of training samples, which can be impractical in real-world applications due to high collection costs or associated risks. Consequently, various finetuning and regularization approaches have been proposed to transfer knowledge from existing pre-trained models to specific target domains with limited data. This paper introduces the Transfer Guided Diffusion Process (TGDP), a novel approach distinct from conventional finetuning and regularization methods. We prove that the optimal diffusion model for the target domain integrates pre-trained diffusion models on the source domain with additional guidance from a domain classifier. We further extend TGDP to a conditional version for modeling the joint distribution of data and its corresponding labels, together with two additional regularization terms to enhance the model performance. We validate the effectiveness of TGDP on Gaussian mixture simulations and on real electrocardiogram (ECG) datasets.

Updated: 2024-05-27 06:48:58

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.16876v1

Automated discovery of symbolic laws governing skill acquisition from naturally occurring data

Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and algorithmic explosion in searching. Initially a deep learning model is employed to determine the learner's cognitive state and assess the feature importance. Subsequently, symbolic regression algorithms are utilized to parse the neural network model into algebraic equations. Experimental results show the algorithm can accurately restore preset laws within a noise range in continuous feedback settings. When applied to Lumosity training data, the method outperforms traditional and recent models in fitness terms. The study reveals two new forms of skill acquisition laws and reaffirms some previous findings.

Updated: 2024-05-27 06:48:09

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.05689v2

Return-Aligned Decision Transformer

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize the returns, but align the actual return with a specified target return, giving control over the agent's performance. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and is equipped with a mechanism to control the agent using the target return. However, the action generation is hardly influenced by the target return because DT's self-attention allocates scarce attention scores to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. RADT utilizes features extracted by paying attention solely to the return, enabling the action generation to consistently depend on the target return. Extensive experiments show that RADT reduces the discrepancies between the actual return and the target return of DT-based methods.

Updated: 2024-05-27 06:41:34

Domains: cs.LG

Download: http://arxiv.org/abs/2402.03923v3

Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion

Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs), which is achieved by collaborative modeling the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel MMKGC framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal embedding under intricate relational contexts. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multi-modalities to achieve comprehensive decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.

Updated: 2024-05-27 06:36:17

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.16869v1

Clustering-based Learning for UAV Tracking and Pose Estimation

UAV tracking and pose estimation plays an imperative role in various UAV-related missions, such as formation control and anti-UAV measures. Accurately detecting and tracking UAVs in a 3D space remains a particularly challenging problem, as it requires extracting sparse features of micro UAVs from different flight environments and continuously matching correspondences, especially during agile flight. Generally, cameras and LiDARs are the two main types of sensors used to capture UAV trajectories in flight. However, both sensors have limitations in UAV classification and pose estimation. This technical report briefly introduces the method proposed by our team "NTU-ICG" for the CVPR 2024 UG2+ Challenge Track 5. This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs, namely Livox Avia and LiDAR 360. We combine the information from the two data sources to locate drones in 3D. We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment. The point cloud of OOIs is clustered using the DBSCAN method, with the midpoint of the largest cluster assumed to be the UAV position. Furthermore, we utilize historical estimations to fill in missing data. The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.
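
The clustering step is concrete enough to sketch directly from the description. Parameter values (eps, min_samples) and the bounding-box midpoint (a centroid would be equally plausible) are assumptions.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def estimate_uav_position(ooi_points, last_known=None, eps=0.5, min_samples=5):
        # ooi_points: (N, 3) points left after separating objects of interest from the environment.
        if len(ooi_points) == 0:
            return last_known                        # fill in with the historical estimate
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(ooi_points)
        kept = labels[labels >= 0]                   # drop DBSCAN noise points (label -1)
        if kept.size == 0:
            return last_known
        largest = np.bincount(kept).argmax()         # largest cluster is assumed to be the UAV
        cluster = ooi_points[labels == largest]
        return 0.5 * (cluster.min(axis=0) + cluster.max(axis=0))  # midpoint of the cluster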

Updated: 2024-05-27 06:33:25

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.16867v1

An Investigation of Conformal Isometry Hypothesis for Grid Cells

This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space, the vector rotates in a 2D manifold in the neural space, driven by a recurrent neural network. The conformal isometry hypothesis proposes that this 2D manifold in the neural space is a conformally isometric embedding of the 2D physical space, in the sense that local displacements of the vector in neural space are proportional to local displacements of the agent in the physical space. Thus the 2D manifold forms an internal map of the 2D physical space, equipped with an internal metric. In this paper, we conduct numerical experiments to show that this hypothesis underlies the hexagon periodic patterns of grid cells. We also conduct theoretical analysis to further support this hypothesis. In addition, we propose a conformal modulation of the input velocity of the agent so that the recurrent neural network of grid cells satisfies the conformal isometry hypothesis automatically. To summarize, our work provides numerical and theoretical evidences for the conformal isometry hypothesis for grid cells and may serve as a foundation for further development of normative models of grid cells and beyond.
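
The hypothesis reduces to a checkable local condition, sketched numerically below: for small displacements dx in any direction, the displacement of the population vector should have norm scale * ||dx|| for a single scale factor. The function names and the residual form are illustrative, not the paper's implementation.

    import numpy as np

    def conformal_residual(v, x, scale, eps=1e-3, n_dirs=32, seed=0):
        # v: maps a 2D position to the high-dimensional population activity vector.
        rng = np.random.default_rng(seed)
        res = []
        for _ in range(n_dirs):
            theta = rng.uniform(0.0, 2.0 * np.pi)
            dx = eps * np.array([np.cos(theta), np.sin(theta)])
            res.append(np.linalg.norm(v(x + dx) - v(x)) - scale * eps)
        return float(np.mean(np.square(res)))  # ~0 iff the embedding is locally conformal at x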

Updated: 2024-05-27 06:31:39

Domains: q-bio.NC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.16865v1

DSEG-LIME: Improving Image Explanation by Hierarchical Data-Driven Segmentation

Explainable Artificial Intelligence is critical in unraveling decision-making processes in complex machine learning models. LIME (Local Interpretable Model-agnostic Explanations) is a well-known XAI framework for image analysis. It utilizes image segmentation to create features to identify relevant areas for classification. Consequently, poor segmentation can compromise the consistency of the explanation and undermine the importance of the segments, affecting the overall interpretability. Addressing these challenges, we introduce DSEG-LIME (Data-Driven Segmentation LIME), featuring: i) a data-driven segmentation for human-recognized feature generation, and ii) a hierarchical segmentation procedure through composition. We benchmark DSEG-LIME on pre-trained models with images from the ImageNet dataset - scenarios without domain-specific knowledge. The analysis includes a quantitative evaluation using established XAI metrics, complemented by a qualitative assessment through a user study. Our findings demonstrate that DSEG outperforms in most of the XAI metrics and enhances the alignment of explanations with human-recognized concepts, significantly improving interpretability. The code is available under: https://github.com/patrick-knab/DSEG-LIME

Updated: 2024-05-27 06:28:28

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2403.07733v2

NCIDiff: Non-covalent Interaction-generative Diffusion Model for Improving Reliability of 3D Molecule Generation Inside Protein Pocket

Advancements in deep generative modeling have changed the paradigm of drug discovery. Among such approaches, target-aware methods that exploit 3D structures of protein pockets were spotlighted for generating ligand molecules with their plausible binding modes. While docking scores superficially assess the quality of generated ligands, closer inspection of the binding structures reveals the inconsistency in local interactions between a pocket and generated ligands. Here, we address the issue by explicitly generating non-covalent interactions (NCIs), which are universal patterns throughout protein-ligand complexes. Our proposed model, NCIDiff, simultaneously denoises NCI types of protein-ligand edges along with a 3D graph of a ligand molecule during the sampling. With the NCI-generating strategy, our model generates ligands with more reliable NCIs, especially outperforming the baseline diffusion-based models. We further adopted inpainting techniques on NCIs to further improve the quality of the generated molecules. Finally, we showcase the applicability of NCIDiff on drug design tasks for real-world settings with specialized objectives by guiding the generation process with desired NCI patterns.

Updated: 2024-05-27 06:26:55

Domains: q-bio.BM,cs.LG,physics.bio-ph

Download: http://arxiv.org/abs/2405.16861v1

Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks

Gender bias in vision-language models (VLMs) can reinforce harmful stereotypes and discrimination. In this paper, we focus on mitigating gender bias towards vision-language tasks. We identify object hallucination as the essence of gender bias in VLMs. Existing VLMs tend to focus on salient or familiar attributes in images but ignore contextualized nuances. Moreover, most VLMs rely on the co-occurrence between specific objects and gender attributes to infer the ignored features, ultimately resulting in gender bias. We propose GAMA, a task-agnostic generation framework to mitigate gender bias. GAMA consists of two stages: narrative generation and answer inference. During narrative generation, GAMA yields all-sided but gender-obfuscated narratives, which prevents premature concentration on localized image features, especially gender attributes. During answer inference, GAMA integrates the image, generated narrative, and a task-specific question prompt to infer answers for different vision-language tasks. This approach allows the model to rethink gender attributes and answers. We conduct extensive experiments on GAMA, demonstrating its debiasing and generalization ability.

Updated: 2024-05-27 06:20:58

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.16860v1

Timely Fusion of Surround Radar/Lidar for Object Detection in Autonomous Driving Systems

Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surroundings for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling at minimal cost, making them promising sensing hardware solutions for autonomous driving systems. However, due to intrinsic physical constraints, the rotating speed of surround Radar, and thus the frequency at which Radar data frames are generated, is much lower than that of surround Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems. This paper develops techniques to fuse surround Radar/Lidar at a working frequency limited only by the faster surround Lidar rather than the slower surround Radar, based on the state-of-the-art object detection model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place at any time when a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key insight of this paper is that we can achieve high output frequency with little accuracy loss by enhancing the training procedure to exploit the temporal redundancy in MVDNet so that it can tolerate the temporal unalignment of input data. We explore several different ways of training enhancement and compare them quantitatively with experiments.

Updated: 2024-05-27 06:09:43

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2309.04806v3

A Systematic Review of Low-Rank and Local Low-Rank Matrix Approximation in Big Data Medical Imaging

The large volume and complexity of medical imaging datasets are bottlenecks for storage, transmission, and processing. To tackle these challenges, the application of low-rank matrix approximation (LRMA) and its derivative, local LRMA (LLRMA), has demonstrated potential. A detailed analysis of the literature identifies LRMA and LLRMA methods applied to various imaging modalities, and the challenges and limitations associated with existing LRMA and LLRMA methods are addressed. We note a significant shift towards a preference for LLRMA in the medical imaging field since 2015, demonstrating its potential and effectiveness in capturing complex structures in medical data compared to LRMA. Acknowledging the limitations of shallow similarity methods used with LLRMA, we suggest advanced semantic image segmentation for the similarity measure, explaining in detail how it can be used to measure similar patches and its feasibility. We note that LRMA and LLRMA are mainly applied to unstructured medical data, and we propose extending their application to different medical data types, including structured and semi-structured data. This paper also discusses how LRMA and LLRMA can be applied to regular data with missing entries, and the impact of inaccuracies in predicting missing values. We discuss the impact of patch size and propose the use of random search (RS) to determine the optimal patch size. To enhance feasibility, a hybrid approach using Bayesian optimization and RS is proposed, which could improve the application of LRMA and LLRMA in medical imaging.

Updated: 2024-05-27 06:00:15

Domains: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.14045v3

Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

Uncertainty estimation is crucial for machine learning models to detect out-of-distribution (OOD) inputs. However, conventional discriminative deep learning classifiers produce uncalibrated closed-set predictions for OOD data. More robust classifiers with uncertainty estimation typically require a potentially unavailable OOD dataset for outlier exposure training, or a considerable amount of additional memory and compute to build ensemble models. In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative Split-Ensemble method. Specifically, we propose a novel subtask-splitting ensemble training objective, where a common multiclass classification task is split into several complementary subtasks. Then, each subtask's training data can be considered as OOD to the other subtasks. Diverse submodels can therefore be trained on each subtask with OOD-aware objectives. The subtask-splitting objective enables us to share low-level features across submodels to avoid parameter and computational overheads. In particular, we build a tree-like Split-Ensemble architecture by performing iterative splitting and pruning from a shared backbone model, where each branch serves as a submodel corresponding to a subtask. This leads to improved accuracy and uncertainty estimation across submodels under a fixed ensemble computation budget. An empirical study with a ResNet-18 backbone shows that Split-Ensemble, without additional computation cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single-model baseline by, correspondingly, 2.2%, 8.1%, and 29.6% mean AUROC.
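
One way to realize the subtask split is a simple label remapping, sketched below: the class set is partitioned, and for each subtask every class owned by the other subtasks is collapsed into an extra "OOD" label. The paper's shared-backbone splitting/pruning and the OOD-aware loss are not reproduced here.

    import numpy as np

    def split_into_subtasks(labels, n_subtasks):
        classes = np.unique(labels)
        groups = np.array_split(classes, n_subtasks)   # complementary class groups
        remapped = []
        for g in groups:
            local = {c: i for i, c in enumerate(g)}
            ood = len(g)                               # extra index: "someone else's classes"
            remapped.append(np.array([local.get(y, ood) for y in labels]))
        return groups, remapped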

Updated: 2024-05-27 05:59:06

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2312.09148v2

EM Distillation for One-step Diffusion Models

While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilizes the distillation process. We further reveal an interesting connection of our method with existing methods that minimize mode-seeking KL. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.

Updated: 2024-05-27 05:55:22

Domains: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2405.16852v1

Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning

Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency due to precise spiking times and sparse spikes with event-driven computation. A significant question is how SNNs can emulate human-like graph-based reasoning of concepts and relations, especially leveraging the temporal domain optimally. This paper reveals that SNNs, when amalgamated with synaptic delay and temporal coding, are proficient in executing (knowledge) graph reasoning. It is elucidated that spiking time can function as an additional dimension to encode relation properties via a neural-generalized path formulation. Empirical results highlight the efficacy of temporal delay in relation processing and showcase exemplary performance in diverse graph reasoning tasks. The spiking model is theoretically estimated to achieve $20\times$ energy savings compared to non-spiking counterparts, deepening insights into the capabilities and potential of biologically inspired SNNs for efficient reasoning. The code is available at https://github.com/pkuxmq/GRSNN.

Updated: 2024-05-27 05:53:30

Domains: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.16851v1

UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, ``\textbf{UniCompress}'', innovatively extends the compression capabilities of INR by being the first to compress multiple medical data blocks using a single INR network. By employing wavelet transforms and quantization, we introduce a codebook containing frequency domain information as a prior input to the INR network. This enhances the representational power of INR and provides distinctive conditioning for different image blocks. Furthermore, our research introduces a new technique for the knowledge distillation of implicit representations, simplifying complex model knowledge into more manageable formats to improve compression ratios. Extensive testing on CT and electron microscopy (EM) datasets has demonstrated that UniCompress outperforms traditional INR methods and commercial compression solutions like HEVC, especially in complex and high compression scenarios. Notably, compared to existing INR techniques, UniCompress achieves a 4$\sim$5 times increase in compression speed, marking a significant advancement in the field of medical image compression. Codes will be publicly available.

Updated: 2024-05-27 05:52:13

Domains: eess.IV,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.16850v1

TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

Autoregressive next-token prediction is a standard pretraining method for large-scale language models, but its application to vision tasks is hindered by the non-sequential nature of image data, leading to cumulative errors. Most vision models employ masked autoencoder (MAE) based pretraining, which faces scalability issues. To address these challenges, we introduce \textbf{TokenUnify}, a novel pretraining method that integrates random token prediction, next-token prediction, and next-all token prediction. We provide theoretical evidence demonstrating that TokenUnify mitigates cumulative errors in visual autoregression. Cooperated with TokenUnify, we have assembled a large-scale electron microscopy (EM) image dataset with ultra-high resolution, ideal for creating spatially correlated long sequences. This dataset includes over 120 million annotated voxels, making it the largest neuron segmentation dataset to date and providing a unified benchmark for experimental validation. Leveraging the Mamba network inherently suited for long-sequence modeling on this dataset, TokenUnify not only reduces the computational complexity but also leads to a significant 45\% improvement in segmentation performance on downstream EM neuron segmentation tasks compared to existing methods. Furthermore, TokenUnify demonstrates superior scalability over MAE and traditional autoregressive methods, effectively bridging the gap between pretraining strategies for language and vision models. Code is available at \url{https://github.com/ydchen0806/TokenUnify}.
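
The abstract names three objectives without giving their exact form; the sketch below is one guess at how they could look over a discrete visual token sequence, with "next-all" read as each position being penalized against every remaining token from its own logits. All of this is an assumption for illustration, not the paper's loss.

    import torch
    import torch.nn.functional as F

    def tokenunify_loss(logits, tokens, mode):
        # logits: (B, T, V) per-position predictions; tokens: (B, T) discrete visual tokens.
        B, T, V = logits.shape
        if mode == "next":       # standard next-token prediction
            return F.cross_entropy(logits[:, :-1].reshape(-1, V), tokens[:, 1:].reshape(-1))
        if mode == "random":     # predict the token at one random position per sample
            pos = torch.randint(0, T, (B,))
            rows = torch.arange(B)
            return F.cross_entropy(logits[rows, pos], tokens[rows, pos])
        if mode == "next_all":   # position t is penalized against all tokens after t
            losses = []
            for t in range(T - 1):
                rep = logits[:, t].unsqueeze(1).expand(-1, T - 1 - t, -1)
                losses.append(F.cross_entropy(rep.reshape(-1, V), tokens[:, t + 1:].reshape(-1)))
            return torch.stack(losses).mean()
        raise ValueError(mode)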

Updated: 2024-05-27 05:45:51

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.16847v1

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objective function in-context. However, whether the practical non-convex training dynamics will converge to the ideal mesa-optimizer is still unclear. Towards filling this gap, we investigate the non-convex dynamics of a one-layer linear causal self-attention model autoregressively trained by gradient flow, where the sequences are generated by an AR process $x_{t+1} = W x_t$. First, under a certain condition of data distribution, we prove that an autoregressively trained transformer learns $W$ by implementing one step of gradient descent to minimize an ordinary least squares (OLS) problem in-context. It then applies the learned $\widehat{W}$ for next-token prediction, thereby verifying the mesa-optimization hypothesis. Next, under the same data conditions, we explore the capability limitations of the obtained mesa-optimizer. We show that a stronger assumption related to the moments of data is the sufficient and necessary condition that the learned mesa-optimizer recovers the distribution. Besides, we conduct exploratory analyses beyond the first data condition and prove that generally, the trained transformer will not perform vanilla gradient descent for the OLS problem. Finally, our simulation results verify the theoretical results.
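
The setting is concrete enough for a tiny worked example: generate a sequence from x_{t+1} = W x_t, form the one-step gradient-descent estimate of the in-context OLS loss starting from zero, W_hat = eta * sum_t x_{t+1} x_t^T, and use it for next-token prediction. The step size and the orthogonal choice of W are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 4, 64
    W, _ = np.linalg.qr(rng.standard_normal((d, d)))   # orthogonal transition keeps x_t bounded

    xs = [rng.standard_normal(d)]
    for _ in range(T):
        xs.append(W @ xs[-1])
    X = np.stack(xs)                                   # in-context sequence x_0 .. x_T

    # One GD step on L(W') = 0.5 * sum_t ||x_{t+1} - W' x_t||^2, starting from W' = 0:
    eta = 1.0 / sum(float(x @ x) for x in X[:-1])      # illustrative step size
    W_hat = eta * sum(np.outer(X[t + 1], X[t]) for t in range(T))

    # Next-token prediction with the one-step estimate (only approximate; the paper
    # characterizes when and how well this construction recovers W).
    print(np.linalg.norm(W_hat @ X[-1] - W @ X[-1]))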

Updated: 2024-05-27 05:41:06

Domains: cs.LG,cs.CL,stat.ML

Download: http://arxiv.org/abs/2405.16845v1

Growth in products of matrices: fastest, average, and generic

The problems that we consider in this paper are as follows. Let A and B be 2x2 matrices (over reals). Let w(A, B) be a word of length n. After evaluating w(A, B) as a product of matrices, we get a 2x2 matrix, call it W. What is the largest (by the absolute value) possible entry of W, over all w(A, B) of length n, as a function of n? What is the expected absolute value of the largest (by the absolute value) entry in a random product of n matrices, where each matrix is A or B with probability 0.5? What is the Lyapunov exponent for a random matrix product like that? We give partial answer to the first of these questions and an essentially complete answer to the second question. For the third question (the most difficult of the three), we offer a very simple method to produce an upper bound on the Lyapunov exponent in the case where all entries of the matrices A and B are nonnegative.
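
The second and third questions are easy to probe numerically. The Monte Carlo sketch below estimates the top Lyapunov exponent of a random product of A and B by tracking the growth of a vector and renormalizing at every step; for a generic starting vector this converges almost surely by Oseledets' theorem. The example pair of matrices is an arbitrary nonnegative choice.

    import numpy as np

    def lyapunov_mc(A, B, n=10_000, trials=20, seed=0):
        # Estimates lim (1/n) log ||M_n ... M_1 v||, each M_i drawn uniformly from {A, B}.
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(trials):
            v, log_growth = np.array([1.0, 0.0]), 0.0
            for _ in range(n):
                v = (A if rng.random() < 0.5 else B) @ v
                s = np.linalg.norm(v)
                log_growth += np.log(s)   # accumulate growth, then renormalize to avoid overflow
                v /= s
            total += log_growth / n
        return total / trials

    A = np.array([[1.0, 1.0], [0.0, 1.0]])    # example pair with nonnegative entries
    B = np.array([[1.0, 0.0], [1.0, 1.0]])
    print(lyapunov_mc(A, B))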

Updated: 2024-05-27 05:33:27

标题: 矩阵乘积的增长:最快、平均和一般情况

摘要: 在这篇论文中我们考虑的问题如下。设A和B是2x2矩阵(实数)。设w(A, B)是长度为n的字。在将w(A, B)作为矩阵乘积进行评估后,我们得到一个2x2矩阵,称为W。在所有长度为n的w(A, B)中,W的最大(绝对值)可能的元素是多少,作为n的函数?在n个矩阵的随机乘积中,每个矩阵为A或B的概率为0.5,最大(绝对值)元素的期望绝对值是多少?这种随机矩阵乘积的Lyapunov指数是多少?我们对第一个问题给出了部分答案,并对第二个问题基本上给出了完整答案。对于第三个问题(三个问题中最困难的),我们提供了一个非常简单的方法,在矩阵A和B的所有元素非负的情况下,产生Lyapunov指数的上限。

更新时间: 2024-05-27 05:33:27

领域: math.GR,cs.CR,math.CO,math.DS,math.PR

下载: http://arxiv.org/abs/2405.00610v4

Non-stochastic Bandits With Evolving Observations

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.

Updated: 2024-05-27 05:32:46

标题: 具有不断演化观察的非随机老虎机

摘要: 我们介绍了一种新颖的在线学习框架,统一和概括了预先建立的模型,如延迟和损坏的反馈,以涵盖行动反馈随时间演变的对抗性环境。在这种情况下,观察到的损失是任意的,可能与实际发生的损失不相关,每一轮都会对先前的观察进行对抗性更新。我们提出了适用于全信息和老虎机设置的后悔最小化算法,后悔界由反馈相对于真实损失的平均准确性来量化。我们的算法与许多特殊情况下已知的后悔界相匹配,同时也引入了以前未知的边界。

更新时间: 2024-05-27 05:32:46

领域: cs.LG

下载: http://arxiv.org/abs/2405.16843v1

How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective

Recommendation Systems (RS) are often plagued by popularity bias. When training a recommendation model on a typically long-tailed dataset, the model tends to not only inherit this bias but often exacerbate it, resulting in over-representation of popular items in the recommendation lists. This study conducts comprehensive empirical and theoretical analyses to expose the root causes of this phenomenon, yielding two core insights: 1) Item popularity is memorized in the principal spectrum of the score matrix predicted by the recommendation model; 2) The dimension collapse phenomenon amplifies the relative prominence of the principal spectrum, thereby intensifying the popularity bias. Building on these insights, we propose a novel debiasing strategy that leverages a spectral norm regularizer to penalize the magnitude of the principal singular value. We have developed an efficient algorithm to expedite the calculation of the spectral norm by exploiting the spectral property of the score matrix. Extensive experiments across seven real-world datasets and three testing paradigms have been conducted to validate the superiority of the proposed method.
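
A minimal sketch of the proposed penalty, assuming a toy matrix-factorization recommender and plain power iteration for the principal singular value (the paper's dedicated algorithm accelerates this step by exploiting the score matrix's spectral structure; lam and the base loss below are illustrative):

    import torch

    def sigma_max(scores: torch.Tensor, n_iter: int = 20) -> torch.Tensor:
        """Largest singular value via power iteration (differentiable, so it
        can sit inside a training loss)."""
        v = torch.randn(scores.shape[1], device=scores.device)
        v = v / v.norm()
        u = scores @ v
        for _ in range(n_iter):
            u = scores @ v
            u = u / (u.norm() + 1e-12)
            v = scores.T @ u
            v = v / (v.norm() + 1e-12)
        return u @ scores @ v

    # Toy matrix-factorization recommender with the spectral penalty added.
    n_users, n_items, dim, lam = 128, 256, 32, 0.1
    U = torch.randn(n_users, dim, requires_grad=True)
    V = torch.randn(n_items, dim, requires_grad=True)
    clicks = (torch.rand(n_users, n_items) < 0.05).float()   # toy implicit feedback

    scores = U @ V.T
    loss = torch.nn.functional.binary_cross_entropy_with_logits(scores, clicks) \
           + lam * sigma_max(scores)                         # penalize sigma_1
    loss.backward()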

Updated: 2024-05-27 05:28:57

标题: 推荐模型如何放大流行偏见?来自频谱视角的分析

摘要: 推荐系统经常受到流行度偏见的困扰。当在典型的长尾数据集上训练推荐模型时,模型往往不仅继承了这种偏见,而且常常加剧了这种偏见,导致推荐列表中流行物品的过度呈现。本研究进行了全面的实证和理论分析,揭示了这一现象的根本原因,得出了两个核心见解:1)物品流行度被记忆在推荐模型所预测得分矩阵的主谱中;2)维度崩溃现象增强了主谱的相对突出性,从而加剧了流行度偏见。基于这些见解,我们提出了一种新的去偏见策略,利用谱范数正则化器来惩罚主奇异值的大小。我们已经开发了一种有效的算法,通过利用得分矩阵的谱特性来加速计算谱范数。在七个真实世界数据集和三种测试范式上进行了广泛的实验证实了所提方法的优越性。

更新时间: 2024-05-27 05:28:57

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2404.12008v2

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

In this paper, we present the findings of our Project ALPINE which stands for ``Autoregressive Learning for Planning In NEtworks." Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamic of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that the Transformer indeed learns the adjacency matrix and an incomplete reachability matrix, which aligns with the predictions made in our theoretical analysis. Additionally, when applying our methodology to a real-world planning benchmark, called Blocksworld, our observations remain consistent. Our theoretical and empirical analyses further unveil a potential limitation of Transformer in path-finding: it cannot identify reachability relationships through transitivity, and thus would fail when path concatenation is needed to generate a path. In summary, our findings shed new light on how the internal mechanisms of autoregressive learning enable planning in networks. This study may contribute to our understanding of the general planning capabilities in other related domains.
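
The following toy numpy example illustrates the adjacency-plus-reachability picture and the transitivity failure; the multiplicative scoring rule and the tiny graph are our own simplifications, not the paper's training setup:

    import numpy as np

    n = 5
    A = np.zeros((n, n))                     # adjacency: 0->1->2->3, and 0->4
    for s, t in [(0, 1), (1, 2), (2, 3), (0, 4)]:
        A[s, t] = 1.0

    # Reachability as *observed* in training (source, target) pairs, not the
    # transitive closure: (1, 3) and (0, 3) were never seen in any training path.
    R = np.zeros((n, n))
    for s, t in [(0, 1), (0, 2), (1, 2), (2, 3)]:
        R[s, t] = 1.0

    def next_node_scores(cur, target):
        """Score successors as A[cur, :] * R[:, target]: a valid edge whose
        endpoint can (observably) still reach the target."""
        return A[cur] * R[:, target]

    print(next_node_scores(0, 2))   # [0, 1, 0, 0, 0]: go to node 1
    print(next_node_scores(0, 3))   # all zeros: reaching 3 from 0 needs transitivity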

Updated: 2024-05-27 05:25:05

标题: 《ALPINE: 揭示自回归学习在语言模型中的规划能力》

摘要: 在这篇论文中,我们介绍了我们的ALPINE项目的研究结果,ALPINE代表“Autoregressive Learning for Planning In NEtworks”。ALPINE项目通过自回归学习机制对基于Transformer的语言模型中规划能力的发展进行了理论研究,旨在确定它们规划能力中的潜在限制。我们将规划抽象为网络路径查找任务,目标是从指定的源节点生成到指定目标节点的有效路径。在表达能力方面,我们展示了Transformer能够通过将邻接和可达矩阵嵌入其权重中执行路径查找。我们对Transformer基于梯度学习动态的理论分析揭示了Transformer能够学习邻接矩阵和有限形式的可达矩阵。这些理论洞见随后通过实验得到验证,实验证明Transformer确实学习了邻接矩阵和不完整的可达矩阵,这与我们理论分析中的预测一致。此外,当将我们的方法应用于名为Blocksworld的真实世界规划基准时,我们的观察结果保持一致。我们的理论和实证分析进一步揭示了Transformer在路径查找中的潜在限制:它无法通过传递性识别可达关系,因此在需要路径串联来生成路径时会失败。总之,我们的研究结果为我们了解自回归学习内部机制如何在网络中实现规划提供了新视角。这项研究可能有助于我们理解其他相关领域的一般规划能力。

更新时间: 2024-05-27 05:25:05

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.09220v2

On the Distance from Calibration in Sequential Prediction

We study a sequential binary prediction setting where the forecaster is evaluated in terms of the calibration distance, which is defined as the $L_1$ distance between the predicted values and the set of predictions that are perfectly calibrated in hindsight. This is analogous to a calibration measure recently proposed by B{\l}asiok, Gopalan, Hu and Nakkiran (STOC 2023) for the offline setting. The calibration distance is a natural and intuitive measure of deviation from perfect calibration, and satisfies a Lipschitz continuity property which does not hold for many popular calibration measures, such as the $L_1$ calibration error and its variants. We prove that there is a forecasting algorithm that achieves an $O(\sqrt{T})$ calibration distance in expectation on an adversarially chosen sequence of $T$ binary outcomes. At the core of this upper bound is a structural result showing that the calibration distance is accurately approximated by the lower calibration distance, which is a continuous relaxation of the former. We then show that an $O(\sqrt{T})$ lower calibration distance can be achieved via a simple minimax argument and a reduction to online learning on a Lipschitz class. On the lower bound side, an $\Omega(T^{1/3})$ calibration distance is shown to be unavoidable, even when the adversary outputs a sequence of independent random bits, and has an additional ability to early stop (i.e., to stop producing random bits and output the same bit in the remaining steps). Interestingly, without this early stopping, the forecaster can achieve a much smaller calibration distance of $\mathrm{polylog}(T)$.
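
For contrast, the classic L1 calibration error mentioned above is easy to compute; the sketch below also shows the Lipschitz failure that motivates the calibration distance (perturbing one prediction splits its group and jumps the error):

    import numpy as np

    def l1_calibration_error(p, y):
        """Classic (unnormalized) L1 calibration error: group timesteps by the
        predicted value v and accumulate |sum_{t: p_t=v} (y_t - v)|. The paper's
        calibration distance instead measures the L1 distance to the *nearest*
        prediction sequence that is perfectly calibrated in hindsight."""
        p, y = np.asarray(p, float), np.asarray(y, float)
        return sum(abs(np.sum(y[p == v] - v)) for v in np.unique(p))

    p = [0.3, 0.3, 0.7, 0.7, 0.7]
    y = [0, 1, 1, 1, 0]
    print(l1_calibration_error(p, y))          # 0.4 + 0.1 = 0.5
    # Nudging a single prediction (0.7 -> 0.7001) splits its group and changes
    # the error discontinuously, the Lipschitz failure the abstract refers to.
    print(l1_calibration_error([0.3, 0.3, 0.7001, 0.7, 0.7], y))   # ~1.1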

Updated: 2024-05-27 05:25:05

标题: 关于顺序预测中校准距离的研究

摘要: 我们研究了一个顺序二进制预测设置,在这个设置中,预测者根据校准距离进行评估,校准距离被定义为预测值与事后完全校准预测集之间的$L_1$距离。这类似于最近由B{\l}asiok、Gopalan、Hu和Nakkiran (STOC 2023)提出的离线设置中的校准度量。校准距离是一种自然直观的衡量完美校准偏差的方法,并满足一个Lipschitz连续性性质,这种性质对于许多流行的校准度量,如$L_1$校准误差及其变体,是不成立的。 我们证明存在一种预测算法,在对手选择的$T$个二进制结果序列上,能够实现期望中的$O(\sqrt{T})$校准距离。这个上界的核心是一个结构性结果,显示校准距离可以通过较低的校准距离准确近似,后者是前者的连续放宽。然后我们证明,可以通过简单的极小化论证和将其归约到一个Lipschitz类的在线学习,实现$O(\sqrt{T})$的较低校准距离。 在下界方面,我们展示了一个$\Omega(T^{1/3})$的校准距离是不可避免的,即使对手输出一系列独立的随机比特,并且具有提前停止的额外能力(即,在剩余步骤中停止生成随机比特并输出相同的比特)。有趣的是,没有这种提前停止,预测者可以实现一个较小的$\mathrm{polylog}(T)$的校准距离。

更新时间: 2024-05-27 05:25:05

领域: cs.LG,cs.DS,stat.ML

下载: http://arxiv.org/abs/2402.07458v2

Enhancing Accuracy in Generative Models via Knowledge Transfer

This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source task. Building on the "Shared Embedding" concept, which bridges the source and target tasks, we introduce a novel framework for transfer learning under distribution metrics such as the Kullback-Leibler divergence. This framework underscores the importance of leveraging inherent similarities between diverse tasks despite their distinct data distributions. Our theory suggests that the shared structures can augment the generation accuracy for a target task, reliant on the capability of a source model to identify shared structures and effective knowledge transfer from source to target learning. To demonstrate the practical utility of this framework, we explore the theoretical implications for two specific generative models: diffusion and normalizing flows. The results show enhanced performance in both models over their non-transfer counterparts, indicating advancements for diffusion models and providing fresh insights into normalizing flows in transfer and non-transfer settings. These results highlight the significant contribution of knowledge transfer in boosting the generation capabilities of these models.

Updated: 2024-05-27 05:10:49

标题: 通过知识转移提高生成模型的准确性

摘要: 本文研究了生成模型的准确性以及知识转移对它们生成精度的影响。具体来说,我们研究了一个用于目标任务的生成模型,它使用来自源任务的预训练模型进行微调。基于连接源任务与目标任务的“共享嵌入”概念,我们引入了一个新颖的基于分布度量(如Kullback-Leibler散度)的迁移学习框架。这个框架强调了在数据分布不同的任务之间利用内在相似性的重要性。我们的理论表明,共享结构可以提升目标任务的生成准确性,这依赖于源模型识别共享结构的能力以及从源到目标学习的有效知识转移。为了展示这一框架的实际效用,我们探讨了其对两种特定生成模型的理论影响:扩散模型和标准化流模型。结果显示,这两种模型相比其非迁移版本都表现出了更强的性能,表明了扩散模型的进步,并为标准化流在迁移与非迁移环境中的表现提供了新的见解。这些结果突显了知识转移在提升这些模型生成能力方面的重要贡献。

更新时间: 2024-05-27 05:10:49

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.16837v1

Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node

Fast feedforward networks (FFFs) are a class of neural networks that exploit the observation that different regions of the input space activate distinct subsets of neurons in wide networks. FFFs partition the input space into separate sections using a differentiable binary tree of neurons and during inference descend the binary tree in order to improve computational efficiency. Inspired by Mixture of Experts (MoE) research, we propose the incorporation of load balancing and Master Leaf techniques into the FFF architecture to improve performance and simplify the training process. We reproduce experiments found in the literature and present results on FFF models enhanced using these techniques. The proposed architecture and training recipe achieve absolute classification accuracy increases of up to 16.3% in training and 3% in test accuracy, respectively, compared to the original FFF architecture. Additionally, we observe a smaller variance in the results compared to those reported in prior research. These findings demonstrate the potential of integrating MoE-inspired techniques into FFFs for developing more accurate and efficient models.
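
A minimal sketch of the FFF forward pass with an MoE-style balancing term; sizes are illustrative, the balancing loss here is computed on hard leaf counts for brevity (a soft-routing version would be used in training), and the Master Leaf is omitted:

    import torch
    import torch.nn as nn

    class TinyFFF(nn.Module):
        """Sketch of a fast feedforward layer: a depth-D binary tree of routing
        neurons; each input descends to exactly one of 2^D leaf MLPs."""
        def __init__(self, dim, depth=3, hidden=64, out_dim=10):
            super().__init__()
            self.depth = depth
            self.router = nn.Linear(dim, 2 ** depth - 1)     # one neuron per internal node
            self.leaves = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
                for _ in range(2 ** depth))

        def forward(self, x):
            logits = self.router(x)                           # (B, 2^D - 1)
            node = torch.zeros(x.shape[0], dtype=torch.long)  # heap-indexed tree position
            for _ in range(self.depth):
                go_right = (logits.gather(1, node[:, None]).squeeze(1) > 0).long()
                node = 2 * node + 1 + go_right                # descend one level
            leaf = node - (2 ** self.depth - 1)
            out = torch.stack([self.leaves[l](xi) for l, xi in zip(leaf.tolist(), x)])
            # Load-balancing penalty: push leaf usage toward uniform.
            usage = torch.bincount(leaf, minlength=len(self.leaves)).float()
            balance = ((usage / usage.sum() - 1.0 / len(self.leaves)) ** 2).sum()
            return out, balance

    out, balance = TinyFFF(dim=16)(torch.randn(32, 16))
    print(out.shape, float(balance))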

Updated: 2024-05-27 05:06:24

标题: 通过负载平衡和主要叶节点增强快速前馈网络

摘要: 快速前馈网络(FFFs)是一类利用观察到输入空间的不同区域在广泛网络中激活不同神经元子集的神经网络。 FFFs利用可微分的神经元二叉树将输入空间划分为单独的部分,并在推理过程中沿着二叉树下降以提高计算效率。受到专家混合(MoE)研究的启发,我们提出将负载平衡和Master Leaf技术纳入FFF体系结构以提高性能并简化训练过程。我们重现了文献中发现的实验,并展示了使用这些技术增强的FFF模型的结果。与原始FFF架构相比,所提出的架构和训练配方在训练和测试精度上分别实现了高达16.3%和3%的绝对分类准确度提高。此外,与先前研究中报告的结果相比,我们观察到结果的差异较小。这些发现表明了将MoE启发技术整合到FFF中以开发更准确和高效模型的潜力。

更新时间: 2024-05-27 05:06:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16836v1

Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fine-tuning such as LoRA have emerged, allowing users to fine-tune LLMs without the need for considerable computing resources, with little performance degradation compared to fine-tuning all parameters. Unfortunately, recent studies indicate that fine-tuning can increase the risk to the safety of LLMs, even when data does not contain malicious content. To address this challenge, we propose Safe LoRA, a simple one-liner patch to the original LoRA implementation by introducing the projection of LoRA weights from selected layers to the safety-aligned subspace, effectively reducing the safety risks in LLM fine-tuning while maintaining utility. It is worth noting that Safe LoRA is a training-free and data-free approach, as it only requires the knowledge of the weights from the base and aligned LLMs. Our extensive experiments demonstrate that when fine-tuning on purely malicious data, Safe LoRA retains similar safety performance as the original aligned model. Moreover, when the fine-tuning dataset contains a mixture of both benign and malicious data, Safe LoRA mitigates the negative effect made by malicious data while preserving performance on downstream tasks.
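
As a rough sketch of the idea, assuming a simplified projection onto the single direction vec(W_aligned - W_base) rather than the paper's exact subspace operator (tau and all sizes below are illustrative):

    import torch

    def safe_lora_project(delta_w, w_base, w_aligned, tau=0.5):
        """Sketch of the Safe LoRA idea, not the paper's exact operator: treat
        v = vec(W_aligned - W_base) as the 'alignment direction' for a layer;
        if the LoRA update B @ A strays too far from it, replace the update
        with its projection onto that direction."""
        v = (w_aligned - w_base).flatten()
        d = delta_w.flatten()
        cos = torch.dot(d, v) / (d.norm() * v.norm() + 1e-12)
        if cos.abs() >= tau:                       # similar enough: keep as-is
            return delta_w
        proj = (torch.dot(d, v) / (v.norm() ** 2 + 1e-12)) * v
        return proj.view_as(delta_w)

    # One layer's LoRA factors; the patch needs only the base/aligned weights.
    dim, rank = 64, 8
    B, A = torch.randn(dim, rank) * 0.02, torch.randn(rank, dim) * 0.02
    w_base, w_aligned = torch.randn(dim, dim), torch.randn(dim, dim)
    safe_delta = safe_lora_project(B @ A, w_base, w_aligned)
    print(safe_delta.shape)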

Updated: 2024-05-27 05:04:05

标题: 安全的LoRA:在微调大型语言模型时减少安全风险的好处

摘要: 尽管大型语言模型(LLMs)如Llama-2或GPT-4展示出了令人印象深刻的零次迁移性能,但仍然需要微调以增强它们在定制数据集、领域特定任务或其他私人需求中的性能。然而,微调LLMs的所有参数需要大量硬件资源,这对于普通用户来说可能是不切实际的。因此,出现了像LoRA这样的参数高效微调方法,允许用户在不需要大量计算资源的情况下微调LLMs,与微调所有参数相比,性能下降很小。然而,最近的研究表明,即使数据不包含恶意内容,微调也会增加LLMs的安全风险。为了解决这一挑战,我们提出了Safe LoRA,这是对原始LoRA实现的一个简单的一行修补,通过将LoRA权重从选定的层投影到与安全对齐的子空间,有效降低LLM微调中的安全风险,同时保持效用。值得注意的是,Safe LoRA是一种无需训练和数据的方法,因为它只需要基础和对齐LLMs的权重知识。我们的广泛实验证明,当在纯恶意数据上微调时,Safe LoRA保持了与原始对齐模型相似的安全性能。此外,当微调数据集包含良性和恶意数据的混合时,Safe LoRA减轻了恶意数据造成的负面影响,同时保持了对下游任务的性能。

更新时间: 2024-05-27 05:04:05

领域: cs.LG

下载: http://arxiv.org/abs/2405.16833v1

AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation

In recent advancements within the domain of Large Language Models (LLMs), there has been a notable emergence of agents capable of addressing Robotic Process Automation (RPA) challenges through enhanced cognitive capabilities and sophisticated reasoning. This development heralds a new era of scalability and human-like adaptability in goal attainment. In this context, we introduce AUTONODE (Autonomous User-interface Transformation through Online Neuro-graphic Operations and Deep Exploration). AUTONODE employs advanced neuro-graphical techniques to facilitate autonomous navigation and task execution on web interfaces, thereby obviating the necessity for predefined scripts or manual intervention. Our engine empowers agents to comprehend and implement complex workflows, adapting to dynamic web environments with unparalleled efficiency. Our methodology synergizes cognitive functionalities with robotic automation, endowing AUTONODE with the ability to learn from experience. We have integrated an exploratory module, DoRA (Discovery and mapping Operation for graph Retrieval Agent), which is instrumental in constructing a knowledge graph that the engine utilizes to optimize its actions and achieve objectives with minimal supervision. The versatility and efficacy of AUTONODE are demonstrated through a series of experiments, highlighting its proficiency in managing a diverse array of web-based tasks, ranging from data extraction to transaction processing.

Updated: 2024-05-27 05:03:09

标题: AUTONODE:一种用于认知GUI自动化的神经图形自学习引擎

摘要: 在大型语言模型(LLMs)领域的最新进展中,出现了一批能够通过增强的认知能力和复杂的推理来解决机器人流程自动化(RPA)挑战的代理人。这一发展预示着在目标实现方面出现了可扩展性和类人适应性的新时代。在这种背景下,我们介绍了AUTONODE(通过在线神经图操作和深度探索实现自主用户界面转换)。AUTONODE采用先进的神经图技术,以促进对网络界面的自主导航和任务执行,从而消除了对预定义脚本或手动干预的必要性。我们的引擎赋予代理人理解和实施复杂工作流程的能力,以无与伦比的效率适应动态网络环境。我们的方法将认知功能与机器人自动化相结合,赋予AUTONODE从经验中学习的能力。我们已经集成了一个探索模块DoRA(用于图检索代理的发现和映射操作),该模块有助于构建一个知识图,引擎利用该图来优化其行动并在最少监督下实现目标。通过一系列实验展示了AUTONODE的多功能性和高效性,突出了它在处理各种基于网络的任务(从数据提取到交易处理)中的熟练程度。

更新时间: 2024-05-27 05:03:09

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2403.10171v2

Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation

We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator. We introduce a representation of the dynamic environment, separating human and obstacle representations. Humans are represented through detected states, while obstacles are represented as computed point clouds based on maps and robot localization. This representation enables RL policies trained in a low-fidelity simulator to deploy in real world with a reduced sim2real gap. Additionally, we propose a spatio-temporal graph to model the interactions between agents and obstacles. Based on the graph, we use attention mechanisms to capture the robot-human, human-human, and human-obstacle interactions. Our method significantly improves navigation performance in both simulated and real-world environments. Video demonstrations can be found at https://sites.google.com/view/constrained-crowdnav/home.

Updated: 2024-05-27 04:53:09

标题: 受限机器人群体导航的结构化图网络与低保真仿真

摘要: 我们研究了利用低保真度模拟器部署强化学习(RL)策略来进行受限人群导航的可行性。我们引入了一个动态环境的表示,将人类和障碍物分开表示。人类通过检测状态表示,而障碍物则根据地图和机器人定位计算的点云表示。这种表示使得在低保真度模拟器中训练的RL策略可以在现实世界中部署,减少了模拟到现实的差距。此外,我们提出了一个时空图来模拟代理和障碍物之间的交互。基于该图,我们使用注意机制来捕捉机器人-人类、人类-人类和人类-障碍物之间的交互。我们的方法显著提高了在模拟和现实环境中的导航性能。视频演示可以在以下网址找到:https://sites.google.com/view/constrained-crowdnav/home.

更新时间: 2024-05-27 04:53:09

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16830v1

EDEFuzz: A Web API Fuzzer for Excessive Data Exposures

APIs often transmit far more data to client applications than they need, and in the context of web applications, often do so over public channels. This issue, termed Excessive Data Exposure (EDE), was OWASP's third most significant API vulnerability of 2019. However, there are few automated tools -- either in research or industry -- to effectively find and remediate such issues. This is unsurprising as the problem lacks an explicit test oracle: the vulnerability does not manifest through explicit abnormal behaviours (e.g., program crashes or memory access violations). In this work, we develop a metamorphic relation to tackle that challenge and build the first fuzzing tool -- that we call EDEFuzz -- to systematically detect EDEs. EDEFuzz can significantly reduce false negatives that occur during manual inspection and ad-hoc text-matching techniques, the current most-used approaches. We tested EDEFuzz against the sixty-nine applicable targets from the Alexa Top-200 and found 33,365 potential leaks -- illustrating our tool's broad applicability and scalability. In a more-tightly controlled experiment of eight popular websites in Australia, EDEFuzz achieved a high true positive rate of 98.65% with minimal configuration, illustrating our tool's accuracy and efficiency.
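
A stripped-down sketch of the underlying metamorphic check, with a pure function standing in for the replay-and-snapshot machinery (the field names and the toy client are hypothetical):

    import copy

    def field_paths(obj, prefix=()):
        """Paths to every leaf field of a nested JSON-like object."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                yield from field_paths(v, prefix + (k,))
        elif isinstance(obj, list):
            for i, v in enumerate(obj):
                yield from field_paths(v, prefix + (i,))
        else:
            yield prefix

    def delete_path(obj, path):
        for key in path[:-1]:
            obj = obj[key]
        del obj[path[-1]]

    def find_excessive_fields(response, render):
        """Delete one API field at a time, re-render, and compare against the
        baseline page: an unchanged page means the client never used the field,
        flagging a candidate excessive exposure."""
        baseline = render(response)
        leaks = []
        for path in list(field_paths(response)):
            mutated = copy.deepcopy(response)
            delete_path(mutated, path)
            try:
                unchanged = render(mutated) == baseline
            except Exception:
                unchanged = False                  # rendering broke: field was used
            if unchanged:
                leaks.append(path)
        return leaks

    render = lambda r: f"<h1>{r['user']['name']}</h1>"   # toy client: shows only the name
    api_response = {"user": {"name": "alice", "email": "a@example.com", "phone": "555-0100"}}
    print(find_excessive_fields(api_response, render))   # [('user','email'), ('user','phone')]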

Updated: 2024-05-27 04:49:43

标题: EDEFuzz:用于大量数据暴露的Web API模糊测试工具

摘要: API通常向客户端应用程序传输远多于其所需的数据,并且在Web应用程序的上下文中,这些数据通常通过公共通道传输。这个问题被称为过度数据暴露(EDE),在OWASP 2019年的API漏洞排名中位列第三。然而,无论在研究界还是工业界,都很少有自动化工具能够有效地发现和修复此类问题。这并不奇怪,因为这个问题缺乏明确的测试预言(test oracle):漏洞不会通过明显的异常行为(例如程序崩溃或内存访问违规)显现。在这项工作中,我们开发了一种蜕变关系(metamorphic relation)来应对这一挑战,并构建了第一个能够系统检测EDE的模糊测试工具,我们称之为EDEFuzz。EDEFuzz可以显著减少手动检查和临时文本匹配技术(目前最常用的两种方法)中出现的漏报。我们对Alexa Top-200中的69个适用目标进行了EDEFuzz测试,发现了33,365个潜在泄漏,说明了我们工具的广泛适用性和可扩展性。在对澳大利亚八个热门网站进行的控制更严格的实验中,EDEFuzz以极少的配置实现了高达98.65%的真阳性率,说明了我们工具的准确性和效率。

更新时间: 2024-05-27 04:49:43

领域: cs.CR

下载: http://arxiv.org/abs/2301.09258v2

Kernel-based optimally weighted conformal prediction intervals

Conformal prediction has been a popular distribution-free framework for uncertainty quantification. In this paper, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). Specifically, KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of KOWCPI on real time-series against state-of-the-art methods, where KOWCPI achieves narrower confidence intervals without losing coverage.
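
A hedged sketch of the mechanism: kernel-weighted quantiles of past nonconformity scores set the interval width. The fixed exponential kernel below is a placeholder for the learned, data-adaptive RNW weights:

    import numpy as np

    def weighted_quantile_interval(scores, alpha=0.1, bandwidth=20.0):
        """Weight the past nonconformity scores with a decay kernel over time
        lags and take the weighted (1 - alpha)-quantile as the interval
        half-width (the method itself learns optimal RNW weights instead)."""
        n = len(scores)
        w = np.exp(-np.arange(n)[::-1] / bandwidth)     # recent scores weigh more
        w = w / w.sum()
        order = np.argsort(scores)
        cdf = np.cumsum(w[order])
        idx = min(int(np.searchsorted(cdf, 1 - alpha)), n - 1)
        return scores[order][idx]

    rng = np.random.default_rng(0)
    residuals = np.abs(rng.normal(size=200))            # |y_t - yhat_t| so far
    q = weighted_quantile_interval(residuals)
    y_hat = 0.42                                        # placeholder point forecast
    print(f"prediction interval: [{y_hat - q:.2f}, {y_hat + q:.2f}]")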

Updated: 2024-05-27 04:49:41

标题: 基于核的最优加权符合预测区间

摘要: 共形预测已经成为一种流行的无分布不确定性量化框架。在本文中,我们提出了一种新颖的用于时间序列的共形预测方法,称为基于核的最优加权共形预测区间(KOWCPI)。具体来说,KOWCPI将经典的Reweighted Nadaraya-Watson(RNW)估计器调整用于依赖数据上的分位数回归,并学习最优的数据自适应权重。在理论上,我们解决了在非一致性分数满足强混合条件时,为不可交换数据建立条件覆盖保证的挑战。我们展示了KOWCPI在真实时间序列上相对于最先进方法的优越性能:KOWCPI在不损失覆盖率的情况下实现了更窄的置信区间。

更新时间: 2024-05-27 04:49:41

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2405.16828v1

Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing methods of single-image editing with self-attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D image text-to-image (T2I) diffusion model. Specifically, we design a sampling method that facilitates editing consecutive images while maintaining semantic consistency by utilizing shared self-attention features during both reference and consecutive image sampling processes. Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.

Updated: 2024-05-27 04:44:36

标题: 通过解离式自注意注入实现全景、3D场景和视频的统一编辑

摘要: 虽然文本到图像模型在图像生成和编辑方面取得了令人印象深刻的能力,但它们在各种模态之间的应用通常需要训练单独的模型。受现有的单图像编辑自注意注入和视频编辑共享注意力方法的启发,我们提出了一种新颖的统一编辑框架,通过仅利用基本的2D图像文本到图像(T2I)扩散模型,结合了这两种方法的优势。具体来说,我们设计了一种采样方法,通过在参考和连续图像采样过程中利用共享自注意力特征,实现对连续图像的编辑,同时保持语义一致性。实验证实,我们的方法能够跨多种模态进行编辑,包括3D场景、视频和全景图像。

更新时间: 2024-05-27 04:44:36

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.16823v1

Simplified Diffusion Schrödinger Bridge

This paper introduces a novel theoretical simplification of the Diffusion Schr\"odinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs), addressing the limitations of DSB in complex data generation and enabling faster convergence and enhanced performance. By employing SGMs as an initial solution for DSB, our approach capitalizes on the strengths of both frameworks, ensuring a more efficient training process and improving the performance of SGM. We also propose a reparameterization technique that, despite theoretical approximations, practically improves the network's fitting capabilities. Our extensive experimental evaluations confirm the effectiveness of the simplified DSB, demonstrating its significant improvements. We believe the contributions of this work pave the way for advanced generative modeling. The code is available at https://github.com/checkcrab/SDSB.

Updated: 2024-05-27 04:44:22

标题: 简化扩散薛定谔桥

摘要: 这篇论文介绍了一种新颖的Diffusion Schr\"odinger Bridge(DSB)理论简化,使其能够与基于得分的生成模型(SGMs)统一,解决了DSB在复杂数据生成中的局限性,实现了更快的收敛速度和更好的性能。通过将SGMs作为DSB的初始解决方案,我们的方法充分利用了两种框架的优势,确保了更高效的训练过程,并提高了SGM的性能。我们还提出了一种重新参数化技术,尽管存在理论近似,但实际上提高了网络的拟合能力。我们进行了广泛的实验评估,证实了简化的DSB的有效性,并展示了其显著的改进。我们相信这项工作的贡献为先进的生成建模铺平了道路。代码可在https://github.com/checkcrab/SDSB 上找到。

更新时间: 2024-05-27 04:44:22

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2403.14623v3

A Multi-Perspective Analysis of Memorization in Large Language Models

Large Language Models (LLMs), trained on massive corpora with billions of parameters, show unprecedented performance in various fields. Though surprised by their excellent performances, researchers also noticed some special behaviors of those LLMs. One of those behaviors is memorization, in which LLMs can generate the same content used to train them. Though previous research has discussed memorization, the memorization of LLMs still lacks explanation, especially the cause of memorization and the dynamics of generating it. In this research, we comprehensively discussed memorization from various perspectives and extended the discussion scope to not only just the memorized content but also less and unmemorized content. Through various studies, we found that: (1) Through experiments, we revealed the relation of memorization between model size, continuation size, and context size. Further, we showed how unmemorized sentences transition to memorized sentences. (2) Through embedding analysis, we showed the distribution and decoding dynamics across model size in embedding space for sentences with different memorization scores. (3) An analysis over n-gram and entropy decoding dynamics discovered a boundary effect when the model starts to generate memorized sentences or unmemorized sentences. (4) We trained a Transformer model to predict the memorization of different models, showing that it is possible to predict memorization from context.

Updated: 2024-05-27 04:41:02

标题: 大型语言模型中记忆的多角度分析

摘要: 大型语言模型(LLMs)在训练时使用数十亿个参数的大型语料库,展示了在各个领域中前所未有的性能。尽管研究人员对它们的出色表现感到惊讶,但也注意到了这些LLMs的一些特殊行为。其中之一是记忆,即LLMs能够生成用于训练它们的相同内容。尽管先前的研究已经讨论过记忆,但LLMs的记忆仍然缺乏解释,特别是记忆的原因和生成它们的动态。在这项研究中,我们从各个角度全面讨论了记忆,并将讨论范围扩展到不仅仅是记忆的内容,还包括较少和未记忆的内容。通过各种研究,我们发现:(1)通过实验,我们揭示了模型大小、延续大小和上下文大小之间的记忆关系。此外,我们展示了未记忆句子如何过渡为记忆句子。(2)通过嵌入分析,我们展示了在嵌入空间中具有不同记忆分数的句子随模型大小变化的分布和解码动态。(3)对n-gram和熵解码动态的分析发现,当模型开始生成记忆句子或未记忆句子时存在边界效应。(4)我们训练了一个Transformer模型来预测不同模型的记忆,表明可以通过上下文来预测记忆。

更新时间: 2024-05-27 04:41:02

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.11577v2

Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings

The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-weight models as incompatible with requirements for transparency, privacy, adaptability, and standards of evidence. Yet the performance penalty in using open-weight models, especially in low-data and low-resource settings, is unclear. We assess the feasibility of using smaller, open-weight models to replace GPT-4-Turbo in zero-shot, few-shot, and fine-tuned regimes, assuming access to only a single, low-cost GPU. We assess value-sensitive issues around bias, privacy, and abstention on three additional tasks relevant to those topics. We find that with relatively low effort, very low absolute monetary cost, and relatively little data for fine-tuning, small open-weight models can achieve competitive performance in domain-adapted tasks without sacrificing generality. We then run experiments considering practical issues in bias, privacy, and hallucination risk, finding that open models offer several benefits over closed models. We intend this work as a case study in understanding the opportunity cost of reproducibility and transparency over for-profit state-of-the-art zero shot performance, finding this cost to be marginal under realistic settings.

Updated: 2024-05-27 04:38:10

标题: 实验室规模的人工智能:开放权重模型即使在资源匮乏的情况下也能与ChatGPT竞争力相当

摘要: 生成式人工智能的快速增长引发了关于低参数、本地可调整、开放权重模型与高参数、API保护、封闭权重模型在性能、领域适应性、成本和泛化能力方面的竞争力的问题。以政府、研究和医疗保健等资源匮乏但不容风险的环境为中心,我们认为营利性的封闭权重模型与透明度、隐私、适应性和证据标准方面的要求不兼容。然而,使用开放权重模型的性能损失,尤其是在低数据和低资源环境中,尚不明确。 我们评估了在仅能访问单个低成本GPU的假设下,使用更小的开放权重模型在零样本、少样本和微调制度中取代GPT-4-Turbo的可行性。我们还在与偏见、隐私和弃权相关的另外三个任务上评估了这些价值敏感问题。我们发现,在投入相对较低、绝对货币成本非常低、微调数据相对较少的情况下,小型开放权重模型可以在领域适应任务中取得有竞争力的性能,而不牺牲泛化能力。随后,我们进行了考虑偏见、隐私和幻觉风险等实际问题的实验,发现开放模型相对于封闭模型具有若干优势。我们将这项工作视为一项案例研究,用以理解相对于营利性最先进零样本性能,选择可复现性和透明度所付出的机会成本,并发现在现实设置下这一成本微乎其微。

更新时间: 2024-05-27 04:38:10

领域: cs.LG,cs.AI,cs.CY,cs.HC

下载: http://arxiv.org/abs/2405.16820v1

Federated Learning with Blockchain-Enhanced Machine Unlearning: A Trustworthy Approach

With the growing need to comply with privacy regulations and respond to user data deletion requests, integrating machine unlearning into IoT-based federated learning has become imperative. Traditional unlearning methods, however, often lack verifiable mechanisms, leading to challenges in establishing trust. This paper delves into the innovative integration of blockchain technology with federated learning to surmount these obstacles. Blockchain fortifies the unlearning process through its inherent qualities of immutability, transparency, and robust security. It facilitates verifiable certification, harmonizes security with privacy, and sustains system efficiency. We introduce a framework that melds blockchain with federated learning, thereby ensuring an immutable record of unlearning requests and actions. This strategy not only bolsters the trustworthiness and integrity of the federated learning model but also adeptly addresses efficiency and security challenges typical in IoT environments. Our key contributions encompass a certification mechanism for the unlearning process, the enhancement of data security and privacy, and the optimization of data management to ensure system responsiveness in IoT scenarios.

Updated: 2024-05-27 04:35:49

标题: 使用区块链增强的机器遗忘的联邦学习:一种可信的方法

摘要: 随着对隐私法规的遵守需求不断增长,并且需要响应用户数据删除请求,将机器遗忘集成到基于物联网的联邦学习中已成为必要。然而,传统的遗忘方法通常缺乏可验证的机制,导致建立信任面临挑战。本文深入探讨了将区块链技术与联邦学习创新地整合以克服这些障碍。区块链通过其不可变性、透明性和强大的安全性增强了遗忘过程。它促进了可验证的认证,将安全性与隐私保护协调一致,并维持了系统的效率。我们介绍了一个将区块链与联邦学习融合的框架,从而确保对遗忘请求和操作的不可变记录。这一策略不仅增强了联邦学习模型的可信度和完整性,还巧妙地解决了物联网环境中典型的效率和安全性挑战。我们的主要贡献包括遗忘过程的认证机制,数据安全和隐私的增强,以及优化数据管理,确保在物联网场景中系统的响应性。

更新时间: 2024-05-27 04:35:49

领域: cs.CR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2405.20776v1

Automatic Domain Adaptation by Transformers in In-Context Learning

Selecting or designing an appropriate domain adaptation algorithm for a given problem remains challenging. This paper presents a Transformer model that can provably approximate and opt for domain adaptation methods for a given dataset in the in-context learning framework, where a foundation model performs new tasks without updating its parameters at test time. Specifically, we prove that Transformers can approximate instance-based and feature-based unsupervised domain adaptation algorithms and automatically select an algorithm suited for a given dataset. Numerical results indicate that in-context learning demonstrates an adaptive domain adaptation surpassing existing methods.

Updated: 2024-05-27 04:33:53

标题: Transformers在上下文学习中的自动领域适应

摘要: 选择或设计适合特定问题的域自适应算法仍然具有挑战性。本文介绍了一种Transformer模型,可以在上下文学习框架中,可证明地逼近并选择适用于给定数据集的领域自适应方法,其中一个基础模型在测试时执行新任务而无需更新其参数。具体来说,我们证明了Transformers可以逼近基于实例和基于特征的无监督领域自适应算法,并自动选择适合给定数据集的算法。数值结果表明,在上下文学习中展示了超越现有方法的自适应领域自适应能力。

更新时间: 2024-05-27 04:33:53

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.16819v1

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations

Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods are unable to refine the structured states with rewards due to a lack of efficiency. Accessibility also remains to be an issue, as extensive domain knowledge is required to interpret symbolic policies. In this paper, we present a framework for learning structured states and symbolic policies jointly, whose key idea is to distill vision foundation models into a scalable perception module and refine it during policy learning. Moreover, we design a pipeline to generate language explanations for policies and decisions using large language models. In experiments on nine Atari tasks, we verify the efficacy of our approach, and we also present explanations for policies and decisions.

Updated: 2024-05-27 04:30:01

标题: 洞察:具有语言解释的端到端神经符号视觉强化学习

摘要: 神经符号强化学习(NS-RL)已经成为一种有前途的可解释决策制定范式,其特点是符号策略的可解释性。NS-RL包括针对具有视觉观察的任务的结构化状态表示,但是先前的方法由于缺乏效率而无法通过奖励来完善结构化状态。可访问性也仍然是一个问题,因为需要广泛的领域知识来解释符号策略。在本文中,我们提出了一个学习结构化状态和符号策略的框架,其关键思想是将视觉基础模型提炼为可扩展的感知模块,并在策略学习过程中对其进行完善。此外,我们设计了一个流水线来使用大型语言模型为策略和决策生成语言解释。在九个Atari任务的实验中,我们验证了我们方法的有效性,并提供了策略和决策的解释。

更新时间: 2024-05-27 04:30:01

领域: cs.AI

下载: http://arxiv.org/abs/2403.12451v2

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

Given a graph with textual attributes, we enable users to `chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions, our method provides textual replies and highlights the relevant parts of the graph. While existing works integrate large language models (LLMs) and graph neural networks (GNNs) in various ways, they mostly focus on either conventional graph tasks (such as node, edge, and graph classification), or on answering simple graph queries on small or synthetic graphs. In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. Toward this goal, we first develop a Graph Question Answering (GraphQA) benchmark with data collected from different tasks. Then, we propose our G-Retriever method, introducing the first retrieval-augmented generation (RAG) approach for general textual graphs, which can be fine-tuned to enhance graph understanding via soft prompting. To resist hallucination and to allow for textual graphs that greatly exceed the LLM's context window size, G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem. Empirical evaluations show that our method outperforms baselines on textual graph tasks from multiple domains, scales well with larger graph sizes, and mitigates hallucination. Our code and datasets are available at: https://github.com/XiaoxinHe/G-Retriever
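
A simplified sketch of the retrieval step, assuming random vectors in place of text encoders and networkx's Steiner-tree approximation in place of the full Prize-Collecting Steiner Tree formulation (no node prizes or edge-cost trade-offs here):

    import networkx as nx
    import numpy as np
    from networkx.algorithms.approximation import steiner_tree

    rng = np.random.default_rng(0)
    G = nx.karate_club_graph()                              # stand-in textual graph
    node_emb = {n: rng.normal(size=16) for n in G.nodes}    # stand-in text embeddings
    nx.set_edge_attributes(G, 1.0, "weight")

    def retrieve_subgraph(query_emb, k=4):
        """Score nodes against the query, keep the top-k as 'prized' terminals,
        and connect them with an approximate Steiner tree."""
        sims = {n: float(query_emb @ node_emb[n]) for n in G.nodes}
        terminals = sorted(sims, key=sims.get, reverse=True)[:k]
        return steiner_tree(G, terminals, weight="weight")

    sub = retrieve_subgraph(rng.normal(size=16))
    print(sorted(sub.nodes))   # compact, connected subgraph to verbalize for the LLM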

Updated: 2024-05-27 04:04:40

标题: G-Retriever:用于文本图理解和问答的检索增强生成模型

摘要: 鉴于一个具有文本属性的图形,我们使用户能够“与他们的图形交谈”:也就是说,使用对话界面向图形提出问题。针对用户的问题,我们的方法提供文本回复并突出显示图形的相关部分。虽然现有的作品以各种方式整合大型语言模型(LLMs)和图神经网络(GNNs),但它们大多集中在传统的图任务(如节点、边缘和图分类)上,或者在小型或合成图上回答简单的图查询。相比之下,我们开发了一个灵活的问答框架,针对现实世界的文本图形,适用于包括场景图理解、常识推理和知识图推理在内的多个应用。为了实现这一目标,我们首先开发了一个从不同任务中收集数据的图问题回答(GraphQA)基准。然后,我们提出了我们的G-Retriever方法,引入了第一个适用于一般文本图形的检索增强生成(RAG)方法,可以进行微调以通过软提示增强图形理解。为了抵抗幻觉并允许文本图形大大超过LLM的上下文窗口大小,G-Retriever通过将这个任务形式化为一个奖励收集斯坦纳树优化问题,在图上执行RAG。实证评估表明,我们的方法在多个领域的文本图任务中优于基线,在更大的图尺寸上表现良好,并减轻了幻觉。【我们的代码和数据集可在以下网址找到:\url{https://github.com/XiaoxinHe/G-Retriever}】

更新时间: 2024-05-27 04:04:40

领域: cs.LG

下载: http://arxiv.org/abs/2402.07630v3

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^π$-Realizability and Concentrability

We consider offline reinforcement learning (RL) in $H$-horizon Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where the action-value function of every policy is linear with respect to a given $d$-dimensional feature function. The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP. Foster et al. [2021] have shown this to be impossible even under $\textit{concentrability}$, a data coverage assumption where a coefficient $C_\text{conc}$ bounds the extent to which the state-action distribution of any policy can veer off the data distribution. However, the data in this previous work was in the form of a sequence of individual transitions. This leaves open the question of whether the negative result mentioned could be overcome if the data was composed of sequences of full trajectories. In this work we answer this question positively by proving that with trajectory data, a dataset of size $\text{poly}(d,H,C_\text{conc})/\epsilon^2$ is sufficient for deriving an $\epsilon$-optimal policy, regardless of the size of the state space. The main tool that makes this result possible is due to Weisz et al. [2023], who demonstrate that linear MDPs can be used to approximate linearly $q^\pi$-realizable MDPs. The connection to trajectory data is that the linear MDP approximation relies on "skipping" over certain states. The associated estimation problems are thus easy when working with trajectory data, while they remain nontrivial when working with individual transitions. The question of computational efficiency under our assumptions remains open.

Updated: 2024-05-27 03:59:13

标题: 轨迹数据足以在具有线性$q^π$可实现性和集中性的离线强化学习中实现统计高效学习

摘要: 我们考虑离线强化学习(RL)在$H$-horizon马尔可夫决策过程(MDPs)中,基于线性$q^\pi$-可实现性假设,即每个策略的动作值函数相对于给定的$d$维特征函数是线性的。在这种设置中,希望能够在样本量不随MDP状态数量增长的情况下学到一个好的策略。Foster等人[2021]已经证明,即使在$\textit{concentrability}$(一种数据覆盖假设,其中系数$C_\text{conc}$限制了任何策略的状态-动作分布偏离数据分布的程度)下,这也是不可能的。然而,先前工作中的数据是以单个转移序列的形式存在的。这留下了一个悬而未决的问题:如果数据由完整轨迹序列组成,是否可以克服上述负面结果。在这项工作中,我们对这一问题给出了肯定回答:我们证明,使用轨迹数据时,大小为$\text{poly}(d,H,C_\text{conc})/\epsilon^2$的数据集就足以得出一个$\epsilon$-最优策略,无论状态空间有多大。使这一结果成为可能的主要工具归功于Weisz等人[2023],他们证明线性MDP可以用来近似线性$q^\pi$-可实现的MDP。与轨迹数据的联系在于,线性MDP近似依赖于“跳过”某些状态。因此,相关的估计问题在处理轨迹数据时很容易,而在处理单个转移时则依然并非易事。在我们的假设下,计算效率问题仍然悬而未决。

更新时间: 2024-05-27 03:59:13

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.16809v1

Extreme Compression of Adaptive Neural Images

Implicit Neural Representations (INRs) and Neural Fields are a novel paradigm for signal representation, from images and audio to 3D scenes and videos. The fundamental idea is to represent a signal as a continuous and differentiable neural network. This idea offers unprecedented benefits such as continuous resolution and memory efficiency, enabling new compression techniques. However, representing data as neural networks poses new challenges. For instance, given a 2D image as a neural network, how can we further compress such a neural image? In this work, we present a novel analysis on compressing neural fields, with the focus on images. We also introduce Adaptive Neural Images (ANI), an efficient neural representation that enables adaptation to different inference or transmission requirements. Our proposed method allows reducing the bits-per-pixel (bpp) of the neural image by 4x, without losing sensitive details or harming fidelity. We achieve this thanks to our successful implementation of 4-bit neural representations. Our work offers a new framework for developing compressed neural fields.
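
A minimal sketch of 4-bit weight quantization applied to a toy coordinate-based INR; the layer sizes, the per-tensor uniform scheme, and the post-hoc application are illustrative assumptions, not the paper's exact recipe:

    import torch
    import torch.nn as nn

    def quantize_4bit(w):
        """Uniform 4-bit (16-level) per-tensor quantization with a scale and
        zero point."""
        lo, hi = w.min(), w.max()
        scale = (hi - lo) / 15.0 + 1e-12
        codes = torch.clamp(torch.round((w - lo) / scale), 0, 15).to(torch.uint8)
        return codes, scale, lo

    def dequantize(codes, scale, lo):
        return codes.float() * scale + lo

    # Toy coordinate->RGB image INR; weights are quantized after fitting.
    inr = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 3))
    with torch.no_grad():
        for p in inr.parameters():
            codes, scale, lo = quantize_4bit(p.data)
            p.data = dequantize(codes, scale, lo)   # simulate 4-bit storage (~8x smaller than fp32)

    print(inr(torch.rand(5, 2)).shape)              # the quantized INR still evaluates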

Updated: 2024-05-27 03:54:09

标题: 自适应神经图像的极端压缩

摘要: Implicit Neural Representations (INRs)和Neural Fields是一种新颖的信号表示范式,从图像和音频到3D场景和视频。基本思想是将信号表示为连续且可微的神经网络。这个想法带来了前所未有的好处,如连续分辨率和内存效率,从而实现新的压缩技术。然而,将数据表示为神经网络也带来了新的挑战。例如,给定一个2D图像作为神经网络,我们如何进一步压缩这样一个神经图像?在这项工作中,我们提出了一个关于压缩神经场的新颖分析,重点放在图像上。我们还介绍了Adaptive Neural Images (ANI),这是一种高效的神经表示,可以适应不同的推理或传输需求。我们提出的方法可以将神经图像的比特数每像素减少4倍,而不会丢失敏感细节或损害保真度。我们之所以能够实现这一点,要归功于我们成功实现了4位神经表示。我们的工作为开发压缩神经场提供了一个新的框架。

更新时间: 2024-05-27 03:54:09

领域: cs.CV,cs.AI,cs.GR,cs.MM

下载: http://arxiv.org/abs/2405.16807v1

Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images

With the advancement of generation models, AI-generated content (AIGC) is becoming more realistic, flooding the Internet. A recent study suggests that this phenomenon causes source bias in text retrieval for web search. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. Firstly, we successfully construct a suitable benchmark to explore the existence of the bias. Subsequent extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, our experiments show that text-image retrieval models tend to rank the AI-generated images higher than the real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that the inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. The above phenomenon triggers a vicious cycle, which makes the invisible relevance bias become more and more serious. To elucidate the potential causes of invisible relevance and address the aforementioned issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. Subsequently, we apply our proposed debiasing method to retroactively identify the causes of invisible relevance, revealing that the AI-generated images induce the image encoder to embed additional information into their representation. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.

Updated: 2024-05-27 03:53:05

标题: 看不见的相关性偏见:文本-图像检索模型更偏爱人工智能生成的图像

摘要: 随着生成模型的进步,人工智能生成的内容(AIGC)变得越来越逼真,充斥着互联网。最近的一项研究表明,这种现象导致了网络搜索文本检索中的信息源偏见。具体而言,神经检索模型往往会将生成的文本排名高于人类编写的文本。本文将这种偏见的研究扩展到跨模态检索。首先,我们成功构建了一个适合探索这种偏见存在的基准。随后在这个基准上进行的大量实验揭示了人工智能生成的图像在文本-图像检索模型中引入了看不见的相关性偏见。具体来说,我们的实验表明,文本-图像检索模型倾向于将人工智能生成的图像排名高于真实图像,即使人工智能生成的图像在视觉上并不比真实图像展示更相关的特征给查询。这种看不见的相关性偏见在具有不同训练数据和架构的检索模型中普遍存在。此外,我们的后续探索揭示了将人工智能生成的图像包含在检索模型的训练数据中会加剧看不见的相关性偏见。上述现象触发了一种恶性循环,使得看不见的相关性偏见变得越来越严重。为了阐明看不见的相关性的潜在原因并解决上述问题,我们引入了一种旨在缓解看不见的相关性偏见的有效训练方法。随后,我们将我们提出的去偏见方法应用于事后识别看不见的相关性的原因,揭示人工智能生成的图像会导致图像编码器将额外信息嵌入到它们的表示中。这些信息在具有不同语义的生成图像中表现出一定的一致性,并且可以使检索器估计出更高的相关性得分。

更新时间: 2024-05-27 03:53:05

领域: cs.IR,cs.AI,cs.CV

下载: http://arxiv.org/abs/2311.14084v4

Entity Alignment with Noisy Annotations from Large Language Models

Entity alignment (EA) aims to merge two knowledge graphs (KGs) by identifying equivalent entity pairs. While existing methods heavily rely on human-generated labels, it is prohibitively expensive to incorporate cross-domain experts for annotation in real-world scenarios. The advent of Large Language Models (LLMs) presents new avenues for automating EA with annotations, inspired by their comprehensive capability to process semantic information. However, it is nontrivial to directly apply LLMs for EA since the annotation space in real-world KGs is large. LLMs could also generate noisy labels that may mislead the alignment. To this end, we propose a unified framework, LLM4EA, to effectively leverage LLMs for EA. Specifically, we design a novel active learning policy to significantly reduce the annotation space by prioritizing the most valuable entities based on the entire inter-KG and intra-KG structure. Moreover, we introduce an unsupervised label refiner to continuously enhance label accuracy through in-depth probabilistic reasoning. We iteratively optimize the policy based on the feedback from a base EA model. Extensive experiments demonstrate the advantages of LLM4EA on four benchmark datasets in terms of effectiveness, robustness, and efficiency.

Updated: 2024-05-27 03:52:55

标题: 使用大型语言模型中的嘈杂标注进行实体对齐

摘要: 实体对齐(EA)旨在通过识别等价实体对来合并两个知识图谱(KGs)。现有方法严重依赖人工生成的标签,而在现实场景中聘请跨领域专家进行标注的成本高得令人望而却步。大型语言模型(LLMs)的出现为利用注释自动化EA提供了新的途径,这受到其处理语义信息的综合能力的启发。然而,直接将LLMs应用于EA并非易事,因为现实世界KGs中的注释空间很大,LLMs也可能产生误导对齐的嘈杂标签。因此,我们提出了一个统一框架LLM4EA,以有效利用LLMs进行EA。具体来说,我们设计了一种新颖的主动学习策略,根据整个KG间和KG内结构优先考虑最有价值的实体,从而显著减少注释空间。此外,我们引入了一个无监督标签精炼器,通过深入的概率推理持续提升标签准确性。我们根据基础EA模型的反馈迭代优化该策略。大量实验证明了LLM4EA在四个基准数据集上在效果、鲁棒性和效率方面的优势。

更新时间: 2024-05-27 03:52:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.16806v1

Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

We study nonconvex zeroth-order optimization (ZOO) in a high-dimensional space $\mathbb R^d$ for functions with approximately $s$-sparse gradients. To reduce the dependence on the dimensionality $d$ in the query complexity, high-dimensional ZOO methods seek to leverage gradient sparsity to design gradient estimators. The previous best method needs $O\big(s\log\frac ds\big)$ queries per step to achieve $O\big(\frac1T\big)$ rate of convergence w.r.t. the number T of steps. In this paper, we propose *Gradient Compressed Sensing* (GraCe), a query-efficient and accurate estimator for sparse gradients that uses only $O\big(s\log\log\frac ds\big)$ queries per step and still achieves $O\big(\frac1T\big)$ rate of convergence. To our best knowledge, we are the first to achieve a *double-logarithmic* dependence on $d$ in the query complexity under weaker assumptions. Our proposed GraCe generalizes the Indyk--Price--Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions. Furthermore, since the IPW algorithm is purely theoretical due to its impractically large constant, we improve the IPW algorithm via our *dependent random partition* technique together with our corresponding novel analysis and successfully reduce the constant by a factor of nearly 4300. Our GraCe is not only theoretically query-efficient but also achieves strong empirical performance. We benchmark our GraCe against 12 existing ZOO methods with 10000-dimensional functions and demonstrate that GraCe significantly outperforms existing methods.
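
To convey the compressed-sensing intuition (this is not the paper's IPW-based routine), the sketch below recovers an s-sparse gradient from m << d directional finite differences with orthogonal matching pursuit; s, m, and delta are illustrative:

    import numpy as np

    def sparse_grad_estimate(f, x, s=3, m=25, delta=1e-4, seed=0):
        """m directional finite differences give approximately linear
        measurements y = U g of the s-sparse gradient g, recovered here with
        orthogonal matching pursuit."""
        rng = np.random.default_rng(seed)
        d = x.size
        U = rng.standard_normal((m, d)) / np.sqrt(m)
        y = np.array([(f(x + delta * u) - f(x)) / delta for u in U])   # y_i ~ u_i . g
        support, residual, coef = [], y.copy(), np.zeros(0)
        for _ in range(s):                       # orthogonal matching pursuit
            j = int(np.argmax(np.abs(U.T @ residual)))
            if j not in support:
                support.append(j)
            coef, *_ = np.linalg.lstsq(U[:, support], y, rcond=None)
            residual = y - U[:, support] @ coef
        g = np.zeros(d)
        g[support] = coef
        return g

    f = lambda z: 3 * z[0] - 2 * z[7] + z[42] ** 2   # gradient is 3-sparse
    g = sparse_grad_estimate(f, np.ones(100))
    print(np.nonzero(g)[0])                          # typically recovers {0, 7, 42}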

Updated: 2024-05-27 03:52:53

标题: 梯度压缩感知:一种用于高维零阶优化的查询高效梯度估计器

摘要: 我们研究在具有近似$s$-稀疏梯度的函数的高维空间$\mathbb R^d$中的非凸零阶优化(ZOO)。为了减少对维度$d$的查询复杂性的依赖,高维ZOO方法试图利用梯度稀疏性设计梯度估计器。先前最佳方法每步需要$O\big(s\log\frac ds\big)$个查询,以实现与步数$T$相关的$O\big(\frac1T\big)$的收敛速度。在本文中,我们提出了*梯度压缩感知*(GraCe),这是一种查询高效且准确的稀疏梯度估计器,每步仅使用$O\big(s\log\log\frac ds\big)$个查询,仍然实现$O\big(\frac1T\big)$的收敛速度。据我们所知,我们是第一个在较弱假设下实现对维度$d$的查询复杂性具有*双对数*依赖的研究。我们提出的GraCe将压缩感知中的Indyk-Price-Woodruff(IPW)算法从线性测量推广到非线性函数。此外,由于IPW算法在实践中具有不切实际的大常数,我们通过我们的*依赖随机分区*技术以及相应的新型分析改进了IPW算法,并成功将常数减少了近4300倍。我们的GraCe不仅在理论上是查询高效的,还取得了强大的实证表现。我们将我们的GraCe与12种现有的ZOO方法进行了基准测试,使用10000维函数,并展示了GraCe明显优于现有方法。

更新时间: 2024-05-27 03:52:53

领域: cs.LG

下载: http://arxiv.org/abs/2405.16805v1

Explore until Confident: Efficient Exploration for Embodied Question Answering

We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions. However, there are two main challenges when using VLMs in EQA: they do not have an internal memory for mapping the scene to be able to plan how to explore over time, and their confidence can be miscalibrated and can cause the robot to prematurely stop exploration or over-explore. We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM - leveraging its vast knowledge of relevant regions of the scene for exploration. Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration - leading to a more calibrated and efficient exploration strategy. To test our framework in simulation, we also contribute a new EQA dataset with diverse, realistic human-robot scenarios and scenes built upon the Habitat-Matterport 3D Research Dataset (HM3D). Both simulated and real robot experiments show our proposed approach improves the performance and efficiency over baselines that do no leverage VLM for exploration or do not calibrate its confidence. Webpage with experiment videos and code: https://explore-eqa.github.io/
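
A sketch of the generic split-conformal recipe that such calibration typically follows; the scores, alpha, and the toy options are illustrative, and the paper's exact nonconformity score and stopping rule may differ:

    import numpy as np

    def conformal_qhat(true_answer_conf, alpha=0.1):
        """Split conformal calibration: true_answer_conf[i] is the VLM's
        confidence in the *correct* answer for held-out question i."""
        s = 1.0 - np.asarray(true_answer_conf)            # nonconformity scores
        n = len(s)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        return np.quantile(s, level, method="higher")

    def prediction_set(option_conf, qhat):
        return [a for a, c in option_conf.items() if c >= 1 - qhat]

    rng = np.random.default_rng(0)
    qhat = conformal_qhat(rng.uniform(0.4, 1.0, size=200))
    options = {"kitchen": 0.82, "bedroom": 0.11, "garage": 0.07}
    answers = prediction_set(options, qhat)
    print(answers, "-> stop exploring" if len(answers) == 1 else "-> keep exploring")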

Updated: 2024-05-27 03:47:11

标题: 探索直至自信:面向具体问题回答的有效探索

摘要: 我们考虑了具身问答(EQA)问题,这指的是这样一种情形:具身智能体(如机器人)需要主动探索环境以收集信息,直到对问题的答案有把握。在这项工作中,我们利用大型视觉语言模型(VLMs)的强大语义推理能力来高效地探索和回答这类问题。然而,在EQA中使用VLMs时存在两个主要挑战:它们没有内部记忆来映射场景以规划如何随时间探索,而且它们的置信度可能被误校准,可能导致机器人过早停止探索或过度探索。我们提出了一种方法,首先基于深度信息和通过视觉提示VLM构建场景的语义地图,利用其对场景的相关区域的广泛知识进行探索。接下来,我们使用共形预测来校准VLM的问题回答置信度,使机器人知道何时停止探索,从而实现更加校准和高效的探索策略。为了在模拟中测试我们的框架,我们还贡献了一个新的EQA数据集,其中包含多样化、逼真的人机场景和基于Habitat-Matterport 3D研究数据集(HM3D)构建的场景。模拟和真实机器人实验显示我们提出的方法提高了性能和效率,超过了不利用VLM进行探索或不校准其置信度的基线。实验视频和代码网页:https://explore-eqa.github.io/

更新时间: 2024-05-27 03:47:11

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.15941v2

AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation

In this work, we propose a novel method named \textbf{Auto}mated Process Labeling via \textbf{C}onfidence \textbf{V}ariation (\textbf{\textsc{AutoCV}}) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification's confidence scores across reasoning steps to automatically annotate the reasoning process. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches. We experimentally validate that the confidence variations learned by the verification model trained on the final answer correctness can effectively identify errors in the reasoning steps. Subsequently, we demonstrate that the process annotations generated by \textsc{AutoCV} can improve the accuracy of the verification model in selecting the correct answer from multiple outputs generated by LLMs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of \textsc{AutoCV} is available at \url{https://github.com/rookie-joe/AUTOCV}.
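
A minimal sketch of turning confidence variation into process labels; the fixed drop threshold is an illustrative stand-in for the paper's detection criterion:

    def label_steps(step_confidence, drop_threshold=0.2):
        """Turn the verifier's per-prefix confidences (probability of reaching
        a correct final answer) into automatic process labels: a sharp drop at
        step t marks step t as the likely error."""
        labels = []
        for prev, cur in zip(step_confidence, step_confidence[1:]):
            labels.append("bad" if prev - cur > drop_threshold else "ok")
        return labels

    # Verifier confidence after each of steps 1..5 of a solution (made-up values):
    conf = [0.92, 0.90, 0.88, 0.41, 0.39]
    print(label_steps(conf))   # ['ok', 'ok', 'bad', 'ok'] -> step 4 introduced the error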

Updated: 2024-05-27 03:44:24

标题: AutoCV:通过置信度变化实现自动化过程标记的推理能力增强

摘要: 在这项工作中,我们提出了一种名为\textbf{Auto}mated Process Labeling via \textbf{C}onfidence \textbf{V}ariation(\textbf{\textsc{AutoCV}})的新方法,通过自动标注推理步骤来增强大型语言模型(LLMs)的推理能力。我们的方法首先通过训练一个验证模型来判断最终答案的正确性,使其能够生成自动的过程注释。这个验证模型为每个推理步骤分配一个置信度分数,表示从那一点出发到达正确最终答案的概率。我们检测验证的置信度分数在推理步骤中的相对变化,以自动标注推理过程。这减轻了许多手动注释或与模型诱导注释方法相关的高计算成本的需求。我们通过实验证实,通过在最终答案正确性上训练的验证模型学习到的置信度变化可以有效地识别推理步骤中的错误。随后,我们证明了\textsc{AutoCV}生成的过程注释可以提高验证模型在从LLMs生成的多个输出中选择正确答案的准确性。值得注意的是,我们在数学和常识推理的五个数据集中取得了显着的改进。\textsc{AutoCV}的源代码可在\url{https://github.com/rookie-joe/AUTOCV}上找到。

更新时间: 2024-05-27 03:44:24

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16802v1

TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios. Despite the potential for deeper insights, existing TAG representation learning primarily relies on supervised methods, necessitating extensive labeled data and limiting applicability across diverse contexts. This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which overcomes these constraints by integrating TAGs' structural and semantic dimensions. TAGA constructs two complementary views: Text-of-Graph view, which organizes node texts into structured documents based on graph topology, and the Graph-of-Text view, which converts textual nodes and connections into graph data. By aligning representations from both views, TAGA captures joint textual and structural information. In addition, a novel structure-preserving random walk algorithm is proposed for efficient training on large-sized TAGs. Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.

Updated: 2024-05-27 03:40:16

标题: TAGA:通过协同作用的图和文本相互转换进行文本属性图自监督学习

摘要: 文本属性图(TAGs)通过自然语言描述增强图结构,从而能够在广泛的现实场景中对数据及其关系进行详细表示。尽管有潜力获得更深入的洞察,现有的TAG表示学习主要依赖监督方法,需要大量标注数据,并限制了其在不同场景中的适用性。本文介绍了一种新的自监督学习框架,即文本与图多视图对齐(TAGA),通过整合TAGs的结构和语义维度来克服这些限制。TAGA构建了两种互补视图:图之文本(Text-of-Graph)视图,根据图拓扑将节点文本组织成结构化文档;文本之图(Graph-of-Text)视图,将文本节点及其连接转换为图数据。通过对齐两个视图的表示,TAGA捕捉了联合的文本与结构信息。此外,还提出了一种新颖的保持结构的随机游走算法,用于在大规模TAG上高效训练。我们的框架在八个真实世界数据集上的零样本和少样本场景中展现了强大的性能。

更新时间: 2024-05-27 03:40:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16800v1

Dual-State Personalized Knowledge Tracing with Emotional Incorporation

Knowledge tracing has been widely used in online learning systems to guide the students' future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets and exploring the relationships between them, but ignore the personalized student behavioral information in the learning process. This limits the models' ability to accurately capture the personalized knowledge states of students and reasonably predict their performances. To alleviate this limitation, we explicitly model the personalized learning process by incorporating emotions, a representative personalized behavior in the learning process, into the KT framework. Specifically, we present a novel Dual-State Personalized Knowledge Tracing with Emotional Incorporation model to achieve this goal: Firstly, we incorporate emotional information into the modeling process of knowledge state, resulting in the Knowledge State Boosting Module. Secondly, we design an Emotional State Tracing Module to monitor students' personalized emotional states, and propose an emotion prediction method based on personalized emotional states. Finally, we apply the predicted emotions to enhance students' response prediction. Furthermore, to extend the generalization capability of our model across different datasets, we design a transferred version of DEKT, named the Transfer Learning-based Self-loop model (T-DEKT). Extensive experiments show our method achieves state-of-the-art performance.

Updated: 2024-05-27 03:39:34

标题: 具有情感融合的双状态个性化知识追踪

摘要: 知识追踪在在线学习系统中被广泛应用,用于指导学生未来的学习。然而,大多数现有的知识追踪模型主要集中于从问题集中提取丰富信息并探索它们之间的关系,但忽略了学生在学习过程中的个性化行为信息。这将限制模型准确捕捉学生的个性化知识状态并合理预测他们的表现。为了缓解这一限制,我们通过将情绪作为代表性个性化行为信息,明确地模拟了个性化学习过程,并将其整合到知识追踪框架中。具体地,我们提出了一种新颖的带情绪整合的双状态个性化知识追踪模型,以实现这一目标:首先,我们将情绪信息整合到知识状态建模过程中,得到知识状态增强模块。其次,我们设计了一个情绪追踪模块,用于监测学生的个性化情绪状态,并提出了基于个性化情绪状态的情绪预测方法。最后,我们将预测的情绪应用于增强学生的反应预测。此外,为了扩展我们模型在不同数据集上的泛化能力,我们设计了一个名为基于迁移学习的自环模型(T-DEKT)的DEKT的转移版本。大量实验证明我们的方法达到了最先进的性能水平。

更新时间: 2024-05-27 03:39:34

领域: cs.LG

下载: http://arxiv.org/abs/2405.16799v1

Exploring Fairness in Educational Data Mining in the Context of the Right to be Forgotten

In education data mining (EDM) communities, machine learning has achieved remarkable success in discovering patterns and structures to tackle educational challenges. Notably, fairness and algorithmic bias have gained attention in learning analytics of EDM. With the increasing demand for the right to be forgotten, there is a growing need for machine learning models to forget sensitive data and its impact, particularly within the realm of EDM. The paradigm of selective forgetting, also known as machine unlearning, has been extensively studied to address this need by eliminating the influence of specific data from a pre-trained model without complete retraining. However, existing research assumes that interactive data removal operations are conducted in secure and reliable environments, neglecting potential malicious unlearning requests to undermine the fairness of machine learning systems. In this paper, we introduce a novel class of selective forgetting attacks designed to compromise the fairness of learning models while maintaining their predictive accuracy, thereby preventing the model owner from detecting the degradation in model performance. Additionally, we propose an innovative optimization framework for selective forgetting attacks, capable of generating malicious unlearning requests across various attack scenarios. We validate the effectiveness of our proposed selective forgetting attacks on fairness through extensive experiments using diverse EDM datasets.

Updated: 2024-05-27 03:35:50

标题: 在被遗忘权背景下探索教育数据挖掘中的公平性

摘要: 在教育数据挖掘(EDM)社区中,机器学习在发现模式和结构以解决教育挑战方面取得了显著成功。值得注意的是,在EDM的学习分析中,公平性和算法偏见已经引起了关注。随着对被遗忘权利的需求不断增加,对于机器学习模型忘记敏感数据及其影响的需求也在不断增加,特别是在EDM领域内。选择性遗忘的范式,也称为机器取消学习,已被广泛研究以解决这一需求,通过从预训练模型中消除特定数据的影响,而不必完全重新训练。然而,现有研究假设交互式数据删除操作在安全可靠的环境中进行,忽视了可能的恶意取消学习请求,从而破坏了机器学习系统的公平性。在本文中,我们介绍了一种新型的选择性遗忘攻击类型,旨在破坏学习模型的公平性,同时保持其预测准确性,从而防止模型所有者检测到模型性能的降级。此外,我们提出了一种创新的选择性遗忘攻击优化框架,能够在各种攻击场景下生成恶意取消学习请求。我们通过使用不同的EDM数据集进行广泛实验证实了我们提出的选择性遗忘攻击对公平性的有效性。

更新时间: 2024-05-27 03:35:50

领域: cs.LG

下载: http://arxiv.org/abs/2405.16798v1

A Real-Time Voice Activity Detection Based On Lightweight Neural Network

Voice activity detection (VAD) is the task of detecting speech in an audio stream, which is challenging due to numerous unseen noises and low signal-to-noise ratios in real environments. Recently, neural network-based VADs have alleviated the degradation of performance to some extent. However, the majority of existing studies have employed excessively large models and incorporated future context, while neglecting to evaluate the operational efficiency and latency of the models. In this paper, we propose a lightweight and real-time neural network called MagicNet, which utilizes causal and depthwise-separable 1-D convolutions and a GRU. Without relying on future features as input, our proposed model is compared with two state-of-the-art algorithms on synthesized in-domain and out-of-domain test datasets. The evaluation results demonstrate that MagicNet can achieve improved performance and robustness with fewer parameter costs.
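
A tiny PyTorch sketch in the spirit of the described architecture, with left-only padding so the convolutions see no future frames; all sizes are illustrative, and this is not the published MagicNet:

    import torch
    import torch.nn as nn

    class TinyVAD(nn.Module):
        """Causal depthwise-separable 1-D convolutions followed by a GRU and
        a per-frame speech probability."""
        def __init__(self, n_mels=40, ch=32, k=5):
            super().__init__()
            self.pad = k - 1                                     # causal left padding
            self.depthwise = nn.Conv1d(n_mels, n_mels, k, groups=n_mels)
            self.pointwise = nn.Conv1d(n_mels, ch, 1)
            self.gru = nn.GRU(ch, ch, batch_first=True)
            self.head = nn.Linear(ch, 1)

        def forward(self, feats):                                # (B, n_mels, T)
            x = nn.functional.pad(feats, (self.pad, 0))          # pad the past, not the future
            x = torch.relu(self.pointwise(self.depthwise(x)))    # (B, ch, T)
            h, _ = self.gru(x.transpose(1, 2))                   # (B, T, ch)
            return torch.sigmoid(self.head(h)).squeeze(-1)       # per-frame P(speech)

    probs = TinyVAD()(torch.randn(2, 40, 100))
    print(probs.shape)   # torch.Size([2, 100])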

Updated: 2024-05-27 03:31:16

标题: 基于轻量级神经网络的实时语音活动检测

摘要: 语音活动检测(VAD)是检测音频流中的语音的任务,由于现实环境中存在大量看不见的噪音和低信噪比,这是具有挑战性的。最近,基于神经网络的VAD已经在一定程度上缓解了性能下降的问题。然而,大多数现有研究都采用了过度庞大的模型,并结合了未来的背景,同时忽视了对模型的操作效率和延迟的评估。本文提出了一种轻量级和实时的神经网络MagicNet,它利用了因果且深度可分离的1-D卷积和GRU。在不依赖未来特征作为输入的情况下,我们的模型与两种最先进的算法在合成的领域内和领域外测试数据集上进行了比较。评估结果表明,MagicNet可以在更少的参数成本下实现更好的性能和稳健性。

更新时间: 2024-05-27 03:31:16

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2405.16797v1

Exploring Prompting Methods for Mitigating Class Imbalance through Synthetic Data Generation with Large Language Models

Large language models (LLMs) have demonstrated impressive in-context learning capabilities across various domains. Inspired by this, our study explores the effectiveness of LLMs in generating realistic tabular data to mitigate class imbalance. We investigate and identify key prompt design elements such as data format, class presentation, and variable mapping to optimize the generation performance. Our findings indicate that using CSV format, balancing classes, and employing unique variable mapping produces realistic and reliable data, significantly enhancing machine learning performance for minor classes in imbalanced datasets. Additionally, these approaches improve the stability and efficiency of LLM data generation. We validate our approach using six real-world datasets and a toy dataset, achieving state-of-the-art performance in classification tasks. The code is available at: https://github.com/seharanul17/synthetic-tabular-LLM
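A minimal sketch of the kind of prompt assembly the study investigates (CSV format with class balancing); the function and its defaults are illustrative, and the paper's unique variable mapping is noted but omitted:

```python
import random

def build_prompt(rows, label_col, n_shots_per_class=3):
    """Assemble a class-balanced, CSV-formatted few-shot prompt for an LLM.

    `rows` is a list of dicts; column names are kept as-is so the model
    sees a consistent header (the paper's unique variable mapping would
    additionally rename columns to distinctive tokens -- omitted here)."""
    header = ",".join(rows[0].keys())
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_col], []).append(r)
    shots = []
    for cls_rows in by_class.values():          # balance the classes
        for r in random.sample(cls_rows, min(n_shots_per_class, len(cls_rows))):
            shots.append(",".join(str(v) for v in r.values()))
    random.shuffle(shots)
    return ("Generate new rows in the same CSV format, matching the column "
            f"semantics.\n{header}\n" + "\n".join(shots) + "\n")
```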

Updated: 2024-05-27 03:29:18

标题: 探索通过大型语言模型生成合成数据来缓解类别不平衡的提示方法

摘要: 大型语言模型(LLM)已经在各个领域展示了令人印象深刻的上下文学习能力。受此启发,我们的研究探讨了LLM在生成真实表格数据以缓解类别不平衡方面的有效性。我们调查并确定了关键提示设计元素,如数据格式、类别展示和变量映射,以优化生成性能。我们的研究结果表明,使用CSV格式、平衡类别,并采用独特的变量映射可以产生真实可靠的数据,显著提升了在不平衡数据集中次要类别的机器学习性能。此外,这些方法改善了LLM数据生成的稳定性和效率。我们使用六个真实世界数据集和一个玩具数据集验证了我们的方法,在分类任务中实现了最先进的性能。该代码可在以下链接找到:https://github.com/seharanul17/synthetic-tabular-LLM

更新时间: 2024-05-27 03:29:18

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.12404v2

Generalized Category Discovery with Large Language Models in the Loop

Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of supervision and category information, current methods usually perform poorly on novel categories and struggle to reveal semantic meanings of the discovered clusters, which limits their applications in the real world. To mitigate the above issues, we propose Loop, an end-to-end active-learning framework that introduces Large Language Models (LLMs) into the training loop, which can boost model performance and generate category names without relying on any human efforts. Specifically, we first propose Local Inconsistent Sampling (LIS) to select samples that have a higher probability of falling to wrong clusters, based on neighborhood prediction consistency and entropy of cluster assignment probabilities. Then we propose a Scalable Query strategy to allow LLMs to choose true neighbors of the selected samples from multiple candidate samples. Based on the feedback from LLMs, we perform Refined Neighborhood Contrastive Learning (RNCL) to pull samples and their neighbors closer to learn clustering-friendly representations. Finally, we select representative samples from clusters corresponding to novel categories to allow LLMs to generate category names for them. Extensive experiments on three benchmark datasets show that Loop outperforms SOTA models by a large margin and generates accurate category names for the discovered clusters. Code and data are available at https://github.com/Lackel/LOOP.
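A hedged sketch of the Local Inconsistent Sampling idea as described (assignment entropy plus neighborhood prediction consistency); the equal weighting of the two terms is an assumption, not the paper's exact score:

```python
import numpy as np

def lis_scores(probs, neighbors):
    """probs: (n, k) soft cluster assignments; neighbors: (n, m) indices of
    each sample's nearest neighbors in embedding space. Higher score =>
    more likely to sit in a wrong cluster, so query the LLM about it."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # assignment entropy
    hard = probs.argmax(axis=1)
    consistency = (hard[neighbors] == hard[:, None]).mean(axis=1)
    return entropy + (1.0 - consistency)   # equal weighting is an assumption

n, k = 100, 5
p = np.random.dirichlet(np.ones(k), size=n)
nbrs = np.random.randint(0, n, size=(n, 8))
query_order = np.argsort(-lis_scores(p, nbrs))   # most inconsistent first
```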

Updated: 2024-05-27 03:27:57

标题: 利用大型语言模型在循环中进行广义类别发现

摘要: 广义类别发现(GCD)是一个关键任务,旨在通过利用少量已知类别的标记数据,从一组未标记数据中识别已知和新颖的类别。由于缺乏监督和类别信息,当前方法通常在新颖类别上表现不佳,并且难以揭示已发现聚类的语义含义,这限制了它们在现实世界中的应用。为了减轻上述问题,我们提出了Loop,这是一个端到端主动学习框架,引入了大型语言模型(LLMs)到训练循环中,可以提升模型性能并生成类别名称,而无需依赖任何人力。具体来说,我们首先提出了局部不一致采样(LIS)来选择那些有更高概率落入错误聚类的样本,基于邻域预测一致性和聚类分配概率的熵。然后,我们提出了一个可扩展的查询策略,让LLMs从多个候选样本中选择选定样本的真实邻居。根据LLMs的反馈,我们执行精细化邻域对比学习(RNCL)来拉近样本及其邻居以学习适合聚类的表示。最后,我们从对应于新颖类别的聚类中选择代表性样本,以便LLMs为它们生成类别名称。对三个基准数据集的广泛实验表明,Loop在性能上远远超过SOTA模型,并为已发现的聚类生成准确的类别名称。代码和数据可在https://github.com/Lackel/LOOP找到。

更新时间: 2024-05-27 03:27:57

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.10897v2

Laurel: Generating Dafny Assertions Using Large Language Models

Dafny is a popular verification language, which automates proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires guidance in the form of helper assertions creating a burden for the proof engineer. In this paper, we propose Laurel, a tool that uses large language models (LLMs) to automatically generate helper assertions for Dafny programs. To improve the success rate of LLMs in this task, we design two domain-specific prompting techniques. First, we help the LLM determine the location of the missing assertion by analyzing the verifier's error message and inserting an assertion placeholder at that location. Second, we provide the LLM with example assertions from the same codebase, which we select based on a new lemma similarity metric. We evaluate our techniques on a dataset of helper assertions we extracted from three real-world Dafny codebases. Our evaluation shows that Laurel is able to generate over 50% of the required helper assertions given only a few attempts, making LLMs a usable and affordable tool to further automate practical program verification.

Updated: 2024-05-27 03:26:01

标题: Laurel:利用大型语言模型生成Dafny断言

摘要: Dafny是一种流行的验证语言,通过将证明外包给SMT求解器来自动化证明。然而,这种自动化并不完美,求解器通常需要辅助断言的指导,为证明工程师带来负担。在本文中,我们提出了一种名为Laurel的工具,它利用大型语言模型(LLMs)自动生成Dafny程序的辅助断言。为了提高LLMs在此任务中的成功率,我们设计了两种领域特定的提示技术。首先,通过分析验证器的错误消息并在该位置插入断言占位符,帮助LLM确定缺失断言的位置。其次,我们提供来自相同代码库的示例断言,根据一种新的引理相似度度量进行选择。我们在从三个真实世界的Dafny代码库中提取的辅助断言数据集上评估了我们的技术。我们的评估表明,Laurel能够在仅经过几次尝试的情况下生成超过50%所需的辅助断言,使LLMs成为进一步自动化实际程序验证的可用且可负担的工具。

更新时间: 2024-05-27 03:26:01

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2405.16792v1

Finding and Exploring Promising Search Space for the 0-1 Multidimensional Knapsack Problem

The 0-1 Multidimensional Knapsack Problem (MKP) is a classical NP-hard combinatorial optimization problem with many engineering applications. In this paper, we propose a novel algorithm combining evolutionary computation with the exact algorithm to solve the 0-1 MKP. It maintains a set of solutions and utilizes the information from the population to extract good partial assignments. To find high-quality solutions, an exact algorithm is applied to explore the promising search space specified by the good partial assignments. The new solutions are used to update the population. Thus, the good partial assignments evolve towards a better direction with the improvement of the population. Extensive experimentation with commonly used benchmark sets shows that our algorithm outperforms the state-of-the-art heuristic algorithms, TPTEA and DQPSO, as well as the commercial solver CPlex. It finds better solutions than the existing algorithms and provides new lower bounds for 10 large and hard instances.
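A minimal sketch of extracting a good partial assignment from an elite population, under the assumption that variables on which (nearly) all elites agree are fixed while the exact solver explores the remaining subspace:

```python
def good_partial_assignment(population, agree=0.9):
    """population: list of 0/1 lists (elite MKP solutions).
    Returns {index: value} for variables that (nearly) all elites agree on;
    the agreement threshold is an illustrative choice."""
    n = len(population[0])
    fixed = {}
    for j in range(n):
        ones = sum(sol[j] for sol in population) / len(population)
        if ones >= agree:
            fixed[j] = 1
        elif ones <= 1 - agree:
            fixed[j] = 0
    return fixed  # free variables define the exact solver's search space

elites = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
print(good_partial_assignment(elites))  # {0: 1, 1: 0, 3: 1}
```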

Updated: 2024-05-27 03:19:04

标题: 寻找和探索0-1多维背包问题的有前景的搜索空间

摘要: 0-1多维背包问题(MKP)是一个经典的NP难组合优化问题,在工程中有许多应用。本文提出了一种新颖的算法,将进化计算与精确算法相结合,以解决0-1 MKP。该算法维护一组解,并利用种群信息提取良好的部分赋值。为了找到高质量的解,精确算法被应用于探索由良好部分赋值指定的有前途的搜索空间。新的解用于更新种群。因此,随着种群的改进,良好的部分赋值朝着更好的方向演变。通过对常用基准集的大量实验表明,我们的算法表现优于最先进的启发式算法TPTEA和DQPSO,以及商业求解器CPlex。它比现有算法找到更好的解,并为10个大型和困难实例提供新的下界。

更新时间: 2024-05-27 03:19:04

领域: cs.AI

下载: http://arxiv.org/abs/2210.03918v3

LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking with Point Clouds

Online multi-object tracking (MOT) plays a pivotal role in autonomous systems. The state-of-the-art approaches usually employ a tracking-by-detection method, and data association plays a critical role. This paper proposes a learning and graph-optimized (LEGO) modular tracker to improve data association performance in the existing literature. The proposed LEGO tracker integrates graph optimization and self-attention mechanisms, which efficiently formulate the association score map, facilitating the accurate and efficient matching of objects across time frames. To further enhance the state update process, the Kalman filter is added to ensure consistent tracking by incorporating temporal coherence in the object states. Our proposed method utilizing LiDAR alone has shown exceptional performance compared to other online tracking approaches, including LiDAR-based and LiDAR-camera fusion-based methods. LEGO ranked 1st at the time of submitting results to the KITTI object tracking evaluation ranking board and remains 2nd at the time of submitting this paper, among all online trackers in the KITTI MOT benchmark for cars.
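A generic constant-velocity Kalman filter sketch of the state-update step the abstract mentions; the state layout and noise settings are assumptions, not LEGO's actual configuration:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal Kalman filter over state [x, y, z, vx, vy, vz] (an assumed
    layout); LEGO's actual state and noise settings may differ."""
    def __init__(self, dt=0.1, q=1e-2, r=1e-1):
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)   # motion model
        self.H = np.eye(3, 6)                                  # observe position
        self.Q, self.R = q * np.eye(6), r * np.eye(3)
        self.x, self.P = np.zeros(6), np.eye(6)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):                        # z: matched detection (x, y, z)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x
```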

Updated: 2024-05-27 03:12:50

标题: LEGO:用于基于点云的在线多目标跟踪的学习和图优化模块化跟踪器

摘要: 在线多目标跟踪(MOT)在自主系统中起着至关重要的作用。当前最先进的方法通常采用基于检测的跟踪方法,数据关联起着至关重要的作用。本文提出了一种名为学习和图优化(LEGO)模块化跟踪器,以改进现有文献中的数据关联性能。所提出的LEGO跟踪器集成了图优化和自注意机制,有效地制定了关联分数图,促进了对象在时间帧之间的准确和高效匹配。为了进一步增强状态更新过程,添加了卡尔曼滤波器以确保通过在对象状态中整合时间连贯性实现一致的跟踪。我们提出的仅利用LiDAR的方法相比其他在线跟踪方法(包括基于LiDAR和基于LiDAR-摄像机融合的方法)表现出了出色的性能。在向KITTI目标跟踪评估排行榜提交结果时,LEGO在KITTI MOT汽车类别基准的所有在线跟踪器中排名第一,并在提交本文时仍保持第二名。

更新时间: 2024-05-27 03:12:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2308.09908v2

The second-order zero differential uniformity of the swapped inverse functions over finite fields

The Feistel Boomerang Connectivity Table (FBCT) was proposed as the feistel counterpart of the Boomerang Connectivity Table. The entries of the FBCT are actually related to the second-order zero differential spectrum. Recently, several results on the second-order zero differential uniformity of some functions were introduced. However, almost all of them were focused on power functions, and there are only few results on non-power functions. In this paper, we investigate the second-order zero differential uniformity of the swapped inverse functions, which are functions obtained from swapping two points in the inverse function. We also present the second-order zero differential spectrum of the swapped inverse functions for certain cases. In particular, this paper is the first result to characterize classes of non-power functions with the second-order zero differential uniformity equal to 4, in even characteristic.
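For reference, the standard definitions in even characteristic (the paper's contribution is evaluating these quantities for the swapped inverse functions):

```latex
% FBCT entries and the second-order zero differential uniformity of
% F : \mathbb{F}_{2^n} \to \mathbb{F}_{2^n} (standard definitions):
\nabla_F(a,b) \;=\; \#\{\, x \in \mathbb{F}_{2^n} :
  F(x) + F(x+a) + F(x+b) + F(x+a+b) = 0 \,\},
\qquad
\nabla_F \;=\; \max_{\substack{a,b \in \mathbb{F}_{2^n}^{*} \\ a \neq b}} \nabla_F(a,b).
```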

Updated: 2024-05-27 03:11:57

标题: 有限域上交换的逆函数的二阶零微分一致性

摘要: Feistel Boomerang连接表(FBCT)被提议作为Boomerang连接表的Feistel对应物。FBCT的条目实际上与二阶零差分谱有关。最近,有关一些函数的二阶零差分均匀性的一些结果被介绍。然而,几乎所有这些结果都集中在幂函数上,对非幂函数的结果很少。本文研究了交换逆函数的二阶零差分均匀性,这些函数是通过交换逆函数中的两个点获得的。我们还为某些情况下的交换逆函数呈现了二阶零差分谱。特别地,本文是第一篇表征具有二阶零差分均匀性等于4的非幂函数类别的结果,在偶特征下。

更新时间: 2024-05-27 03:11:57

领域: cs.IT,cs.CR,math.IT

下载: http://arxiv.org/abs/2405.16784v1

TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

One key challenge in backdoor attacks against large foundation models is the resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation model, such as Llama-3-70B, especially with limited computational resources. In this paper, we propose TrojFM, a novel backdoor attack tailored for very large foundation models. Our primary technical contribution is the development of a novel backdoor injection method. This method forces a backdoored model to generate similar hidden representations for poisoned inputs regardless of their actual semantics. Our approach injects such backdoors by fine-tuning only a very small proportion of model parameters. This enables TrojFM to efficiently launch downstream task-agnostic backdoor attacks against very large foundation models under limited computational resources. Moreover, we optimize the fine-tuning process with our customized QLoRA technique, enabling launching our attack via only~\textit{one A100 GPU}. Furthermore, we design a new trigger injection method to ensure our attack stealthiness. Through extensive experiments, we first demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models without jeopardizing their normal functionalities (and outperforming existing attacks on BERT-style models). Furthermore, we show that TrojFM is resilient to SOTA defenses and is insensitive to changes in key hyper-parameters. Finally, we conduct a resource analysis to quantify that our method can significantly save computational and memory costs compared to existing backdoor attacks.
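A heavily hedged sketch of the representation-collapse objective the abstract describes (poisoned inputs pushed toward one shared hidden representation while clean behavior is preserved); all names are hypothetical and this is not TrojFM's released code:

```python
import torch
import torch.nn.functional as F

def backdoor_loss(model, poisoned_batch, clean_batch, clean_ref):
    """Push hidden states of trigger-bearing inputs toward one shared
    anchor (ignoring their semantics) while keeping clean behavior close
    to a frozen reference; in practice only a small adapter subset of the
    model's parameters would be trainable."""
    h_poison = model(poisoned_batch)            # (B, T, d) hidden states
    anchor = h_poison.mean(dim=(0, 1), keepdim=True).detach()
    collapse = 1 - F.cosine_similarity(
        h_poison.flatten(0, 1), anchor.flatten(0, 1), dim=-1).mean()
    utility = F.mse_loss(model(clean_batch), clean_ref)   # preserve clean use
    return collapse + utility

# Toy usage with a stand-in backbone:
torch.manual_seed(0)
enc = torch.nn.Linear(16, 16)
model = lambda x: torch.tanh(enc(x))
x_p, x_c = torch.randn(4, 8, 16), torch.randn(4, 8, 16)
backdoor_loss(model, x_p, x_c, clean_ref=model(x_c).detach()).backward()
```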

Updated: 2024-05-27 03:10:57

标题: TrojFM:针对非常庞大基础模型的资源高效后门攻击

摘要: 大型基础模型后门攻击面临的一个关键挑战是资源限制。后门攻击通常需要重新训练目标模型,这对于非常大的基础模型来说是不切实际的。现有的后门攻击主要设计用于监督分类器或小型基础模型(例如BERT)。这些攻击均未能成功攻破非常大的基础模型,比如Llama-3-70B,尤其是在有限的计算资源下。在本文中,我们提出了TrojFM,一种专为非常大的基础模型定制的新型后门攻击。我们的主要技术贡献在于开发了一种新颖的后门注入方法。这种方法强制后门模型为被投毒的输入生成相似的隐藏表示,无论其实际语义如何。我们的方法只通过微调模型的极小比例参数来注入这些后门。这使TrojFM能够在有限的计算资源下,对非常大的基础模型有效地发起与下游任务无关的后门攻击。此外,我们通过定制的QLoRA技术优化了微调过程,使我们的攻击只需一块A100 GPU即可实施。此外,我们设计了一种新的触发器注入方法,以确保攻击的隐蔽性。通过大量实验,我们首先展示了TrojFM可以对广泛使用的大型GPT风格模型发起有效的后门攻击,而不损害其正常功能(并且在BERT风格模型上超越现有攻击)。此外,我们表明TrojFM对最先进的防御措施具有韧性,并且对关键超参数的变化不敏感。最后,我们进行了资源分析,量化了我们的方法相比现有后门攻击可以显著节省计算和内存成本。

更新时间: 2024-05-27 03:10:57

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16783v1

Causal Bayesian Optimization via Exogenous Distribution Learning

Maximizing a target variable as an operational objective in a structural causal model is an important problem. Existing Causal Bayesian Optimization~(CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward; or introduce action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. In this paper, a novel method is introduced to learn the distribution of exogenous variables, which is typically ignored or marginalized through expectation by existing methods. Exogenous distribution learning improves the approximation accuracy of structural causal models in a surrogate model that is usually trained with limited observational data. Moreover, the learned exogenous distribution extends existing CBO to general causal schemes beyond Additive Noise Models~(ANM). The recovery of exogenous variables allows us to use a more flexible prior for noise or unobserved hidden variables. We develop a new CBO method by leveraging the learned exogenous distribution. Experiments on different datasets and applications show the benefits of our proposed method.

Updated: 2024-05-27 03:03:07

标题: 因果贝叶斯优化通过外生分布学习

摘要: 将目标变量最大化作为结构因果模型中的操作目标是一个重要问题。现有的因果贝叶斯优化(CBO)方法要么依赖于改变因果结构以最大化奖励的硬干预;要么引入动作节点到内生变量,使数据生成机制被调整以实现目标。本文介绍了一种新方法,用于学习外生变量的分布,这通常被现有方法忽视或通过期望边缘化。外生分布学习提高了结构因果模型在通常使用有限观测数据训练的替代模型中的逼近精度。此外,学习的外生分布将现有的CBO扩展到超出加性噪声模型(ANM)的一般因果方案。恢复外生变量使我们能够为噪声或未观察到的隐藏变量使用更灵活的先验。我们通过利用学习的外生分布开发了一种新的CBO方法。在不同数据集和应用上的实验显示了我们提出的方法的好处。

更新时间: 2024-05-27 03:03:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.02277v6

Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development

The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.

Updated: 2024-05-27 02:55:45

标题: 利用多模态数据集增强不良药物事件检测:语料库创建与模型开发

摘要: 药物不良事件(ADEs)的挖掘在药物监管中至关重要,通过识别与药物相关的潜在风险,促进早期发现不良事件,并指导监管决策,增强患者安全。传统的ADE检测方法可靠但缓慢,不易适应大规模运营,并提供有限信息。随着社交媒体内容、生物医学文献和电子医疗记录(EMR)等数据源的指数增长,从这些非结构化文本中提取相关ADE信息至关重要。以往的ADE挖掘研究主要集中在基于文本的方法上,忽视了视觉线索,限制了上下文理解,阻碍了准确解释。为了弥补这一差距,我们提出了一个多模态不良药物事件(MMADE)检测数据集,将与ADE相关的文本信息与视觉辅助手段相结合。此外,我们引入了一个框架,利用LLMs和VLMs的能力进行ADE检测,生成描述描绘ADE的医学图像的详细描述,帮助医疗专业人员在视觉上识别不良事件。使用我们的MMADE数据集,我们展示了整合来自图像的视觉线索以增强整体性能的重要性。这种方法对患者安全、ADE意识和医疗可访问性具有潜力,为个性化医疗的进一步探索铺平了道路。

更新时间: 2024-05-27 02:55:45

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2405.15766v2

Balancing User Preferences by Social Networks: A Condition-Guided Social Recommendation Model for Mitigating Popularity Bias

Social recommendation models weave social interactions into their design to provide uniquely personalized recommendation results for users. However, social networks not only amplify the popularity bias in recommendation models, resulting in more frequent recommendation of hot items and fewer long-tail items, but also include a substantial amount of redundant information that is essentially meaningless for the model's performance. Existing social recommendation models fail to address the issues of popularity bias and the redundancy of social information, as they directly characterize social influence across the entire social network without making targeted adjustments. In this paper, we propose a Condition-Guided Social Recommendation Model (named CGSoRec) to mitigate the model's popularity bias by denoising the social network and adjusting the weights of user's social preferences. More specifically, CGSoRec first includes a Condition-Guided Social Denoising Model (CSD) to remove redundant social relations in the social network for capturing users' social preferences with items more precisely. Then, CGSoRec calculates users' social preferences based on the denoised social network and adjusts the weights in users' social preferences so that they can counteract the popularity bias present in the recommendation model. Finally, CGSoRec includes a Condition-Guided Diffusion Recommendation Model (CGD) that introduces the adjusted social preferences as conditions to steer the recommendation results in a debiased direction. Comprehensive experiments on three real-world datasets demonstrate the effectiveness of our proposed method. The code is at: https://github.com/hexin5515/CGSoRec.

Updated: 2024-05-27 02:45:01

标题: 通过社交网络平衡用户偏好:一种基于条件引导的社交推荐模型,用于减轻流行度偏见

摘要: 社交推荐模型将社交互动融入设计中,为用户提供独特个性化的推荐结果。然而,社交网络不仅放大了推荐模型中的流行偏见,导致更频繁地推荐热门物品和较少的长尾物品,还包含大量基本对模型性能无意义的冗余信息。现有的社交推荐模型未能解决流行偏见和社交信息冗余的问题,因为它们直接对整个社交网络的社交影响进行特征化,而没有进行有针对性的调整。在本文中,我们提出了一种条件引导的社交推荐模型(命名为CGSoRec),通过去噪社交网络和调整用户社交偏好的权重来减轻模型的流行偏见。具体来说,CGSoRec首先包括一个条件引导的社交去噪模型(CSD),以去除社交网络中的冗余社交关系,更精确地捕捉用户的社交偏好与物品。然后,CGSoRec根据去噪后的社交网络计算用户的社交偏好,并调整用户社交偏好中的权重,使其能够抵消推荐模型中存在的流行偏见。最后,CGSoRec包括一个条件引导的扩散推荐模型(CGD),将调整后的社交偏好作为条件引入,控制推荐结果朝着无偏向的方向。在三个真实数据集上进行的全面实验证明了我们提出的方法的有效性。代码在:https://github.com/hexin5515/CGSoRec。

更新时间: 2024-05-27 02:45:01

领域: cs.SI,cs.LG

下载: http://arxiv.org/abs/2405.16772v1

ARC: A Generalist Graph Anomaly Detector with In-Context Learning

Graph anomaly detection (GAD), which aims to identify abnormal nodes that differ from the majority within a graph, has garnered significant attention. However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when being applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a ``one-for-all'' GAD model to detect anomalies across various graph datasets on-the-fly. Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset using few-shot normal samples at the inference stage, without the need for retraining or fine-tuning on the target dataset. ARC comprises three components that are well-crafted for capturing universal graph anomaly patterns: 1) smoothness-based feature Alignment module that unifies the features of different datasets into a common and anomaly-sensitive space; 2) ego-neighbor Residual graph encoder that learns abnormality-related node embeddings; and 3) cross-attentive in-Context anomaly scoring module that predicts node abnormality by leveraging few-shot normal samples. Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.

Updated: 2024-05-27 02:42:33

标题: ARC:一种具有上下文学习功能的通用图异常检测器

摘要: 图形异常检测(GAD)旨在识别在图形中与大多数节点不同的异常节点,引起了广泛关注。然而,当前的GAD方法需要针对每个数据集进行特定训练,导致训练成本高昂、数据需求大、且应用于新数据集和领域时的泛化能力有限。为解决这些限制,本文提出了ARC,一种通用的GAD方法,能够实时跨多个图形数据集检测异常。ARC具有上下文学习功能,可以在推断阶段直接从目标数据集中提取特定模式,无需在目标数据集上重新训练或微调。ARC包括三个组件,精心设计用于捕捉通用图形异常模式:1)基于平滑度的特征对齐模块,将不同数据集的特征统一到一个常见且对异常敏感的空间中;2)自我相邻残差图编码器,学习与异常相关的节点嵌入;和3)交叉关注的上下文异常评分模块,通过利用少量正常样本来预测节点异常。对来自不同领域的多个基准数据集进行的大量实验表明,ARC具有优越的异常检测性能、效率和泛化能力。

更新时间: 2024-05-27 02:42:33

领域: cs.LG

下载: http://arxiv.org/abs/2405.16771v1

Physics informed cell representations for variational formulation of multiscale problems

With the rapid advancement of graphical processing units, Physics-Informed Neural Networks (PINNs) are emerging as a promising tool for solving partial differential equations (PDEs). However, PINNs are not well suited for solving PDEs with multiscale features, particularly suffering from slow convergence and poor accuracy. To address this limitation of PINNs, this article proposes physics-informed cell representations for resolving multiscale Poisson problems using a model architecture consisting of multilevel multiresolution grids coupled with a multilayer perceptron (MLP). The grid parameters (i.e., the level-dependent feature vectors) and the MLP parameters (i.e., the weights and biases) are determined using gradient-descent based optimization. The variational (weak) form based loss function accelerates computation by allowing the linear interpolation of feature vectors within grid cells. This cell-based MLP model also facilitates the use of a decoupled training scheme for Dirichlet boundary conditions and a parameter-sharing scheme for periodic boundary conditions, delivering superior accuracy compared to conventional PINNs. Furthermore, the numerical examples highlight improved speed and accuracy in solving PDEs with nonlinear or high-frequency boundary conditions and provide insights into hyperparameter selection. In essence, the cell-based MLP model, together with the parallel tiny-cuda-nn library, allows our implementation to improve convergence speed and numerical accuracy.
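A single-level, 2-D sketch of the cell representation: trainable feature vectors at grid-cell corners, bilinearly interpolated and fed to an MLP (the paper uses multilevel multiresolution grids; sizes here are assumptions):

```python
import torch
import torch.nn as nn

class GridMLP(nn.Module):
    """One resolution level of a feature grid on [0,1]^2; feature vectors
    live at cell corners and are bilinearly interpolated, which is what
    makes the variational loss cheap to evaluate inside each cell."""
    def __init__(self, res=32, feat=8, hidden=64):
        super().__init__()
        self.res = res
        self.grid = nn.Parameter(0.01 * torch.randn(res + 1, res + 1, feat))
        self.mlp = nn.Sequential(nn.Linear(feat, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, xy):                      # xy: (n, 2) points in [0,1]^2
        s = xy * self.res
        i = s.long().clamp(max=self.res - 1)    # cell index
        f = s - i                               # local coordinates in the cell
        g, fx, fy = self.grid, f[:, 0, None], f[:, 1, None]
        v = ((1 - fx) * (1 - fy) * g[i[:, 0],     i[:, 1]]
             + fx     * (1 - fy) * g[i[:, 0] + 1, i[:, 1]]
             + (1 - fx) * fy     * g[i[:, 0],     i[:, 1] + 1]
             + fx     * fy       * g[i[:, 0] + 1, i[:, 1] + 1])
        return self.mlp(v)                      # approximate solution u(x, y)

u = GridMLP()(torch.rand(128, 2))               # (128, 1)
```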

Updated: 2024-05-27 02:42:16

标题: 用于多尺度问题变分形式的物理信息单元表示

摘要: 随着图形处理单元的快速发展,物理信息神经网络(PINNs)正成为解决偏微分方程(PDEs)的一个有前途的工具。然而,PINNs并不适合解决具有多尺度特征的PDEs,特别是在收敛速度缓慢和精度低下方面遇到困难。为了解决PINNs的这一局限性,本文提出了一种基于物理信息的单元表示来解决多尺度泊松问题,使用由多级多分辨率网格和多层感知器(MLP)组成的模型架构。通过基于梯度下降的优化确定网格参数(即级别相关的特征向量)和MLP参数(即权重和偏置)。基于变分(弱)形式的损失函数通过允许网格单元内特征向量的线性插值加速计算。这种基于单元的MLP模型还促进了对迪利克雷边界条件的解耦训练方案和对周期边界条件的参数共享方案的使用,与传统的PINNs相比提供了更高的准确性。此外,数值示例突显了在解决具有非线性或高频边界条件的PDEs时,速度和准确性的提高,并提供了超参数选择的见解。本质上,通过基于单元的MLP模型以及并行的tiny-cuda-nn库,我们的实现提高了收敛速度和数值精度。

更新时间: 2024-05-27 02:42:16

领域: cs.LG

下载: http://arxiv.org/abs/2405.16770v1

Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation

Zero-shot object navigation (ZSON) addresses the situation in which an agent navigates to an unseen object that is not present in the training set. Previous works mainly train the agent using seen objects with known labels, and ignore seen objects without labels. In this paper, we introduce seen objects without labels, herein termed ``unknown objects'', into the training procedure to enrich the agent's knowledge base with distinguishable but previously overlooked information. Furthermore, we propose the label-wise meta-correlation module (LWMCM) to harness relationships among objects with and without labels, and obtain enhanced object information. Specifically, we propose a target feature generator (TFG) to generate the feature representation of unlabeled target objects. Subsequently, the unlabeled object identifier (UOI) module assesses whether the unlabeled target object appears in the current observation frame captured by the camera and produces an adapted target feature representation specific to the observed context. In the meta contrastive feature modifier (MCFM), the target features are modified by approaching the features of objects within the observation frame while distancing themselves from features of unobserved objects. Finally, the meta object-graph learner (MOGL) module is utilized to calculate the relationships among objects based on the features. Experiments conducted on AI2THOR and RoboTHOR platforms demonstrate the effectiveness of our proposed method.

Updated: 2024-05-27 02:39:39

标题: 利用未知对象构建标记-未标记元关系,用于零样本对象导航

摘要: 零样本物体导航(ZSON)处理的是代理导航到训练集中不存在的未见物体的情形。先前的研究主要是使用已知标签的已见物体来训练代理,并忽略了没有标签的已见物体。本文引入了没有标签的已见物体,这里称为“未知物体”,进入训练过程,以丰富代理的知识库,并获得可区分但先前被忽视的信息。此外,我们提出了标签级元相关模块(LWMCM)来利用具有和没有标签的物体之间的关系,并获得增强的物体信息。特别地,我们提出了目标特征生成器(TFG)来生成未标记目标物体的特征表示。随后,未标记物体识别器(UOI)模块评估了未标记目标物体是否出现在相机捕捉到的当前观察帧中,并产生适应观察上下文的调整目标特征表示。在元对比特征修改器(MCFM)中,目标特征通过接近观察帧内的物体特征而远离未观察到的物体特征进行修改。最后,元对象图学习器(MOGL)模块被用来基于特征计算物体之间的关系。在AI2THOR和RoboTHOR平台上进行的实验表明了我们提出的方法的有效性。

更新时间: 2024-05-27 02:39:39

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.15222v2

Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation

The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias to how the conditional distribution to be generated changes with respect to conditions. This can result in unexpected behavior in the task of style transfer, for example. In this research, we introduce extended flow matching (EFM), a direct extension of flow matching that learns a ``matrix field'' corresponding to the continuous map from the space of conditions to the space of distributions. We show that we can introduce inductive bias to the conditional generation through the matrix field and demonstrate this fact with MMOT-EFM, a version of EFM that aims to minimize the Dirichlet energy or the sensitivity of the distribution with respect to conditions. We will present our theory along with experimental results that support the competitiveness of EFM in conditional generation.

Updated: 2024-05-27 02:38:08

标题: 扩展流匹配:一种带有广义连续性方程的条件生成方法

摘要: 条件生成任务是生成模型中最重要的应用之一,迄今已基于著名的基于流的模型开发了许多方法。然而,今天许多使用的基于流的模型并未设计成允许引入明确的归纳偏差,以控制生成条件分布随条件变化的方式。这可能导致在风格转移任务中出现意外行为。在本研究中,我们引入了扩展流匹配(EFM),这是流匹配的直接扩展,学习了一个与条件空间到分布空间的连续映射对应的“矩阵场”。我们展示了通过矩阵场可以引入归纳偏差到条件生成,并通过MMOT-EFM证明了这一事实,该版本的EFM旨在最小化分布对条件的Dirichlet能量或灵敏度。我们将展示支持EFM在条件生成中竞争力的理论和实验结果。

更新时间: 2024-05-27 02:38:08

领域: cs.LG,math.AP,math.FA,math.OC,math.PR,68T07 (Primary), 49Q22 (Secondary)

下载: http://arxiv.org/abs/2402.18839v4

HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

In recent years, the remarkable advancements in deep neural networks have brought tremendous convenience. However, the training process of a highly effective model necessitates a substantial quantity of samples, which poses serious potential threats, such as unauthorized exploitation and privacy leakage. In response, we propose a framework named HiddenSpeaker, embedding imperceptible perturbations within the training speech samples and rendering them unlearnable for deep-learning-based speaker verification systems that employ large-scale speakers for efficient training. HiddenSpeaker utilizes a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. Additionally, a hybrid objective function is employed for human perceptual optimization, ensuring the perturbation is imperceptible to human listeners. We conduct extensive experiments on multiple state-of-the-art (SOTA) models in the speaker verification domain to evaluate HiddenSpeaker. Our results demonstrate that HiddenSpeaker not only deceives the model with unlearnable samples but also enhances the imperceptibility of the perturbations, showcasing strong transferability across different models.
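A hedged sketch of single-level error-minimizing noise: with the model fixed, descend the training loss with respect to an input perturbation so the perturbed samples carry no learnable signal; the L-inf budget and step sizes are illustrative:

```python
import torch

def error_minimizing_noise(model, loss_fn, x, y, eps=8/255, steps=20, lr=0.01):
    """Single-level variant: keep the model fixed (its parameters are
    assumed frozen) and descend the training loss w.r.t. an input
    perturbation, so a later model finds nothing left to learn from
    (x + delta). The L-inf budget `eps` is illustrative."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()     # minimize (not maximize) the loss
            delta.clamp_(-eps, eps)             # imperceptibility budget
        delta.grad.zero_()
    return delta.detach()
```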

Updated: 2024-05-27 02:33:22

标题: HiddenSpeaker:为说话人验证系统生成难以察觉且不可学习的音频

摘要: 近年来,深度神经网络取得了显著的进展,带来了巨大的便利。然而,高效模型的训练过程需要大量样本,这带来了潜在的巨大威胁,如未经授权的利用和隐私泄露。为此,我们提出了一个名为HiddenSpeaker的框架,将难以察觉的扰动嵌入训练语音样本中,使其对采用大规模说话人进行高效训练的基于深度学习的说话人验证系统不可学习。HiddenSpeaker利用一种名为单层错误最小化(Single-Level Error-Minimizing, SLEM)的简化错误最小化方法来生成具体有效的扰动。此外,还采用了混合目标函数进行人类感知优化,确保人类听者无法察觉扰动。我们对说话人验证领域的多个最新技术模型进行了广泛实验,评估了HiddenSpeaker。我们的结果表明,HiddenSpeaker不仅能以不可学习的样本欺骗模型,还增强了扰动的难以察觉性,展示了跨不同模型的强大可转移性。

更新时间: 2024-05-27 02:33:22

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.15655v2

Oblivious Monitoring for Discrete-Time STL via Fully Homomorphic Encryption

When monitoring a cyber-physical system (CPS) from a remote server, keeping the monitored data secret is crucial, particularly when they contain sensitive information, e.g., biological or location data. Recently, Banno et al. (CAV'22) proposed a protocol for online LTL monitoring that keeps data concealed from the server using Fully Homomorphic Encryption (FHE). We build on this protocol to allow arithmetic operations over encrypted values, e.g., to compute a safety measurement combining distance, velocity, and so forth. Overall, our protocol enables oblivious online monitoring of discrete-time real-valued signals against signal temporal logic (STL) formulas. Our protocol combines two FHE schemes, CKKS and TFHE, leveraging their respective strengths. We employ CKKS to evaluate arithmetic predicates in STL formulas while utilizing TFHE to process them using a DFA derived from the STL formula. We conducted case studies on monitoring blood glucose levels and vehicles' behavior against the Responsibility-Sensitive Safety (RSS) rules. Our results suggest the practical relevance of our protocol.
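A plaintext analogue of the pipeline's data flow (the protocol would evaluate the arithmetic predicate under CKKS and the DFA transition under TFHE; the predicate, states, and threshold below are toy choices):

```python
def monitor(signal, accepting):
    """Plaintext analogue: an arithmetic predicate per sample (done under
    CKKS in the protocol) drives a 2-state DFA for G(speed*2 + dist < 100)
    (done under TFHE). State 1 is a trap state reached on any violation."""
    delta = {(0, True): 0, (0, False): 1, (1, True): 1, (1, False): 1}
    state = 0
    for speed, dist in signal:
        bit = (speed * 2 + dist) < 100      # arithmetic STL predicate
        state = delta[(state, bit)]         # oblivious DFA step
    return state in accepting

print(monitor([(10, 30), (20, 40)], accepting={0}))  # True: predicate held globally
```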

Updated: 2024-05-27 02:32:16

标题: 基于全同态加密的离散时间STL不经意监控

摘要: 在从远程服务器监视网络物理系统(CPS)时,保持监视数据的机密性至关重要,尤其是当它们包含敏感信息,例如生物或位置数据。最近,Banno等人(CAV'22)提出了一种在线LTL监控协议,使用全同态加密(FHE)使数据对服务器保密。我们基于该协议,允许对加密值进行算术运算,例如,计算结合距离、速度等的安全测量。总体而言,我们的协议实现了对离散时间实值信号针对信号时间逻辑(STL)公式的不经意在线监控。我们的协议结合了两种FHE方案,CKKS和TFHE,充分利用它们各自的优势。我们使用CKKS来评估STL公式中的算术谓词,同时利用TFHE通过从STL公式派生的DFA来处理它们。我们对监控血糖水平以及车辆行为是否符合责任敏感安全(RSS)规则进行了案例研究。我们的结果表明了我们协议的实际相关性。

更新时间: 2024-05-27 02:32:16

领域: cs.CR,cs.FL

下载: http://arxiv.org/abs/2405.16767v1

Reframing the Relationship in Out-of-Distribution Detection

The remarkable achievements of Large Language Models (LLMs) have captivated the attention of both academia and industry, transcending their initial role in dialogue generation. The utilization of LLMs as intermediary agents in various tasks has yielded promising results, sparking a wave of innovation in artificial intelligence. Building on these breakthroughs, we introduce a novel approach that integrates the agent paradigm into the Out-of-distribution (OOD) detection task, aiming to enhance its robustness and adaptability. Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process. These agents function as dynamic observers and communication hubs, interacting with both In-distribution (ID) labels and data inputs to form vector triangle relationships. This triangular framework offers a more nuanced approach than the traditional binary relationship, allowing for better separation and identification of ID and OOD inputs. Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods in a diverse array of real-world scenarios.

Updated: 2024-05-27 02:27:28

标题: 重新构建分布外检测中的关系

摘要: 大型语言模型(LLMs)取得的显著成就吸引了学术界和工业界的关注,超越了它们最初在对话生成中的作用。LLMs作为各种任务中介代理的利用产生了有希望的结果,引发了人工智能领域的创新浪潮。在这些突破的基础上,我们介绍了一种将代理范式整合到分布外(OOD)检测任务中的新方法,旨在增强其稳健性和适应性。我们提出的方法,概念匹配与代理(CMA),利用中立提示作为代理来增强基于CLIP的OOD检测过程。这些代理作为动态观察者和通信中心,与分布内(ID)标签和数据输入进行交互,形成向量三角关系。这种三角形框架比传统的二元关系提供了更细致的方法,可以更好地分离和识别ID和OOD输入。我们的广泛实验结果展示了CMA在各种真实场景中的优越性能,超越了零样本和需要训练的方法。

更新时间: 2024-05-27 02:27:28

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16766v1

Study of Robust Direction Finding Based on Joint Sparse Representation

Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences, we propose a novel DOA estimation method based on sparse signal recovery (SSR). Furthermore, to address the issue of grid mismatch, we utilize an alternating optimization approach that relies on the estimated outlier matrix and the on-grid DOA estimates to obtain the off-grid DOA estimates. Simulation results demonstrate that the proposed method exhibits robustness against large outliers.
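One plausible way to write the mixed-noise model and the joint sparse recovery objective it suggests (the exact regularizers are an assumption):

```latex
% Observations = array manifold x sparse spatial spectrum + sparse outliers + Gaussian noise
\mathbf{Y} = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{X} + \mathbf{E} + \mathbf{N},
\qquad
\min_{\mathbf{X},\,\mathbf{E}}\;
\tfrac{1}{2}\,\|\mathbf{Y}-\mathbf{A}\mathbf{X}-\mathbf{E}\|_F^2
+\lambda_1 \|\mathbf{X}\|_{2,1}
+\lambda_2 \|\mathbf{E}\|_1 .
```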

Updated: 2024-05-27 02:26:37

标题: 基于联合稀疏表示的鲁棒测向研究

摘要: 标准到达方向(DOA)估计方法通常基于高斯噪声假设推导而来,使其对异常值非常敏感。因此,在存在冲击噪声的情况下,这些方法的性能可能显著恶化。在本文中,我们将冲击噪声建模为高斯噪声混合稀疏异常值。通过利用它们的统计差异,我们提出了一种基于稀疏信号恢复(SSR)的新型DOA估计方法。此外,为了解决网格不匹配问题,我们利用交替优化方法,依赖于估计的异常值矩阵和在网格上的DOA估计,获得离网格的DOA估计。模拟结果表明,所提出的方法对大异常值表现出鲁棒性。

更新时间: 2024-05-27 02:26:37

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2405.16765v1

Variational DAG Estimation via State Augmentation With Stochastic Permutations

Estimating the structure of a Bayesian network, in the form of a directed acyclic graph (DAG), from observational data is a statistically and computationally hard problem with essential applications in areas such as causal discovery. Bayesian approaches are a promising direction for solving this task, as they allow for uncertainty quantification and deal with well-known identifiability issues. From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying combinatorial space. We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations. We carry out posterior estimation via variational inference, where we exploit continuous relaxations of discrete distributions. We show that our approach performs competitively when compared with a wide range of Bayesian and non-Bayesian benchmarks on a range of synthetic and real datasets.

Updated: 2024-05-27 02:25:19

标题: 使用随机排列进行状态增广的变分DAG估计

摘要: 从观测数据中估计贝叶斯网络的结构,以有向无环图(DAG)的形式,是一个在统计和计算上具有困难的问题,在因果发现等领域具有重要应用。贝叶斯方法是解决这一任务的一个有前途的方向,因为它们允许对不确定性进行量化,并处理众所周知的可识别性问题。从概率推断的角度来看,主要的挑战是(i)表示满足DAG约束的图的分布和(ii)估计潜在组合空间上的后验分布。我们提出了一种方法,通过在扩展的DAG和排列空间上制定一个联合分布来解决这些挑战。我们通过变分推断进行后验估计,利用了离散分布的连续松弛。我们展示了我们的方法在一系列合成和真实数据集上与各种贝叶斯和非贝叶斯基准比较时表现竞争力。

更新时间: 2024-05-27 02:25:19

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.02644v2

Transport of Algebraic Structure to Latent Embeddings

Machine learning often aims to produce latent embeddings of inputs which lie in a larger, abstract mathematical space. For example, in the field of 3D modeling, subsets of Euclidean space can be embedded as vectors using implicit neural representations. Such subsets also have a natural algebraic structure including operations (e.g., union) and corresponding laws (e.g., associativity). How can we learn to "union" two sets using only their latent embeddings while respecting associativity? We propose a general procedure for parameterizing latent space operations that are provably consistent with the laws on the input space. This is achieved by learning a bijection from the latent space to a carefully designed mirrored algebra which is constructed on Euclidean space in accordance with desired laws. We evaluate these structural transport nets for a range of mirrored algebras against baselines that operate directly on the latent space. Our experiments provide strong evidence that respecting the underlying algebraic structure of the input space is key for learning accurate and self-consistent operations.
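A toy sketch of the core construction: a learned bijection phi into a mirrored algebra where the operation satisfies the law by design, here an invertible affine map and elementwise max standing in for the paper's richer bijections and operations:

```python
import torch
import torch.nn as nn

class MirroredUnion(nn.Module):
    """Associativity for free: elementwise max on Euclidean space is
    associative, so phi^{-1}(max(phi(z1), phi(z2))) is associative in the
    latent space whenever phi is a bijection. Here phi is an invertible
    affine map for brevity; the paper learns richer bijections."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))
        self.b = nn.Parameter(torch.zeros(dim))

    def phi(self, z):     return z @ self.W + self.b
    def phi_inv(self, m): return (m - self.b) @ torch.linalg.inv(self.W)

    def union(self, z1, z2):
        return self.phi_inv(torch.maximum(self.phi(z1), self.phi(z2)))

op = MirroredUnion(4)
z1, z2, z3 = torch.randn(3, 4)
lhs = op.union(op.union(z1, z2), z3)
rhs = op.union(z1, op.union(z2, z3))
print(torch.allclose(lhs, rhs, atol=1e-5))   # True up to numerics
```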

Updated: 2024-05-27 02:24:57

标题: 将代数结构传输到潜在嵌入

摘要: 机器学习通常旨在生成输入的潜在嵌入,这些输入位于更大的、抽象的数学空间中。例如,在3D建模领域中,可以使用隐式神经表示将欧几里德空间的子集嵌入为向量。这些子集还具有自然的代数结构,包括操作(例如并集)和相应的定律(例如结合律)。我们如何学习仅使用它们的潜在嵌入来"并集"两个集合,同时尊重结合律?我们提出了一种通用过程,用于参数化与输入空间上的法则明显一致的潜在空间操作。这通过学习从潜在空间到经过精心设计的镜像代数的双射来实现,该镜像代数是根据所需的法则在欧几里德空间上构建的。我们对一系列镜像代数的结构传输网络进行评估,并将其与直接在潜在空间上操作的基线进行比较。我们的实验证据强烈表明,尊重输入空间的基础代数结构对学习准确和自洽的操作至关重要。

更新时间: 2024-05-27 02:24:57

领域: cs.LG

下载: http://arxiv.org/abs/2405.16763v1

PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

With the rapid scaling of large language models (LLMs), serving numerous low-rank adaptations (LoRAs) concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy. As a superset of LoRA, PRoLoRA retains its advantages and effectively circumvents the drawbacks of peer parameter-sharing methods with superior model capacity, practical feasibility, and broad applicability. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA in both specific parameter budget and performance target scenarios, and its scalability to larger LLMs. Notably, with half as many trainable parameters, PRoLoRA still outperforms LoRA on multiple instruction tuning datasets. Subsequently, an ablation study is conducted to validate the necessity of individual components and highlight the superiority of PRoLoRA over three potential variants. Hopefully, the conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.

Updated: 2024-05-27 02:24:25

标题: PRoLoRA:部分旋转增强更高效参数的LoRA

摘要: 随着大型语言模型(LLMs)的快速扩展,同时提供大量低秩适应(LoRA)变得越来越不切实际,导致成本不可承受,并需要更具参数效率的微调方法。在这项工作中,我们引入了部分旋转增强低秩适应(PRoLoRA),这是一种由四个基本组件组成的层内共享机制:广播减少、旋转增强、部分共享改进和修正初始化策略。作为LoRA的超集,PRoLoRA保留了其优势,并有效地规避了同类参数共享方法的缺点,具有更高的模型容量、实际可行性和广泛适用性。实证实验表明,在特定参数预算和性能目标场景下,PRoLoRA具有显著更高的参数效率,并且可以扩展到更大的LLMs。值得注意的是,在可训练参数减半的情况下,PRoLoRA在多个指令微调数据集上仍优于LoRA。随后,进行了消融研究,以验证各个组件的必要性,并突出PRoLoRA相对于三个潜在变体的优越性。希望显著更高的参数效率可以将PRoLoRA确立为LoRA的资源友好替代方案。

更新时间: 2024-05-27 02:24:25

领域: cs.LG

下载: http://arxiv.org/abs/2402.16902v2

Addressing Discretization-Induced Bias in Demographic Prediction

Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions -- e.g., based on name and geography -- and then to $\textit{discretize}$ the predictions by selecting the most likely class (argmax). We study how this practice produces $\textit{discretization bias}$. In particular, we show that argmax labeling, as used by a prominent commercial voter file vendor to impute race/ethnicity, results in a substantial under-count of African-American voters, e.g., by 28.2% points in North Carolina. This bias can have substantial implications in downstream tasks that use such labels. We then introduce a $\textit{joint optimization}$ approach -- and a tractable $\textit{data-driven thresholding}$ heuristic -- that can eliminate this bias, with negligible individual-level accuracy loss. Finally, we theoretically analyze discretization bias, show that calibrated continuous models are insufficient to eliminate it, and that an approach such as ours is necessary. Broadly, we warn researchers and practitioners against discretizing continuous demographic predictions without considering downstream consequences.
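A toy numerical illustration of the bias and of a count-matching threshold (a simple stand-in for the paper's data-driven heuristic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy electorate: every voter's P(class A) is below 0.5, so argmax labels
# nobody as A even though a calibrated model expects ~25 such voters.
p = rng.uniform(0.0, 0.5, size=100)
argmax_count = (p >= 0.5).sum()          # 0 -> severe undercount
expected_count = p.sum()                 # ~25 in expectation

# Data-driven threshold: pick t so the number of assigned labels roughly
# matches the expected count, removing the aggregate bias.
t = np.quantile(p, 1 - expected_count / p.size)
print(argmax_count, round(expected_count, 1), (p >= t).sum())
```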

Updated: 2024-05-27 02:22:43

标题: 解决离散化引起的人口预测偏差

摘要: 种族和其他人口统计学推断在许多应用中是必要的,特别是在审计不平等和政治活动中的外展目标方面。经典方法是构建连续预测 -- 例如,基于姓名和地理位置 -- 然后通过选择最可能的类别(argmax)来$\textit{离散化}$这些预测。我们研究了这种做法如何产生$\textit{离散化偏差}$。特别是,我们发现,像一个知名的商业选民档案供应商用于推断种族/族裔的argmax标记法在北卡罗来纳州会导致对非裔美国选民数量的实质性低估,例如低估了28.2%。这种偏差可能对使用这些标签的下游任务产生重要影响。 然后,我们引入了一种$\textit{联合优化}$方法 -- 和一种可行的$\textit{数据驱动阈值}$启发式方法 -- 可以消除这种偏差,几乎没有个体级准确性损失。最后,我们从理论上分析了离散化偏差,显示校准的连续模型不足以消除它,我们这样的方法是必要的。总的来说,我们警告研究人员和从业者不要在考虑下游后果之前将连续人口预测离散化。

更新时间: 2024-05-27 02:22:43

领域: cs.CY,cs.LG,K.4.0

下载: http://arxiv.org/abs/2405.16762v1

A mechanism-driven reinforcement learning framework for shape optimization of airfoils

In this paper, a novel mechanism-driven reinforcement learning framework is proposed for airfoil shape optimization. To validate the framework, a reward function is designed and analyzed, from which the equivalence between maximizing the cumulative reward and achieving the optimization objectives is guaranteed theoretically. To establish a quality exploration, and to obtain an accurate reward from the environment, an efficient solver for steady Euler equations is employed in the reinforcement learning method. The solver utilizes the B\'ezier curve to describe the shape of the airfoil, and a Newton-geometric multigrid method for the solution. In particular, a dual-weighted residual-based h-adaptive method is used for efficient calculation of the target functional. To effectively streamline the airfoil shape during the deformation process, we introduce Laplacian smoothing, and propose a B\'ezier fitting strategy, which not only mitigates mesh tangling but also guarantees a precise manipulation of the geometry. In addition, a neural network architecture is designed based on an attention mechanism to make the learning process more sensitive to minor changes of the airfoil geometry. Numerical experiments demonstrate that our framework can handle the optimization problem with hundreds of design variables. It is worth mentioning that, prior to this work, few works have combined such a high-fidelity partial differential equation framework with advanced reinforcement learning algorithms for design problems of such high dimensionality.
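A minimal sketch of the Bézier parameterization: the airfoil surface as a Bernstein-polynomial curve whose control points are the design variables the agent manipulates (the control polygon below is hypothetical):

```python
import numpy as np
from math import comb

def bezier(control_points, n_samples=200):
    """Evaluate a Bezier curve: B(t) = sum_i C(n,i) t^i (1-t)^(n-i) P_i.
    Moving the control points (the design variables) deforms the airfoil
    smoothly, which is what the RL agent acts on."""
    P = np.asarray(control_points, dtype=float)   # (n+1, 2)
    n = len(P) - 1
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    B = sum(comb(n, i) * t**i * (1 - t)**(n - i) * P[i] for i in range(n + 1))
    return B                                       # (n_samples, 2) surface points

# Hypothetical upper-surface control polygon of an airfoil-like shape:
upper = bezier([(0, 0), (0.1, 0.08), (0.5, 0.12), (1, 0)])
```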

Updated: 2024-05-27 02:21:28

标题: 一个基于机制的强化学习框架用于翼型优化

摘要: 在这篇论文中,提出了一种新颖的基于机制的强化学习框架,用于翼型优化。为了验证该框架,设计并分析了一个奖励函数,理论上保证了最大化累积奖励与实现优化目标之间的等价性。为了建立高质量的探索,并从环境中获得准确的奖励,强化学习方法中采用了用于稳态Euler方程的高效求解器。求解器利用贝塞尔曲线描述翼型的形状,并采用牛顿-几何多重网格方法进行求解。特别地,还使用基于双加权残差的h自适应方法来高效计算目标泛函。为了在变形过程中有效地调整翼型形状,引入了拉普拉斯平滑,并提出了一个贝塞尔拟合策略,不仅可以减轻网格缠结,还可以保证几何形状的精确操作。此外,基于注意机制设计了一个神经网络架构,使学习过程对翼型几何的微小变化更加敏感。数值实验表明,我们的框架可以处理具有数百个设计变量的优化问题。值得一提的是,在此工作之前,很少有将如此高保真度的偏微分方程框架与先进的强化学习算法结合起来,用于具有如此高维度的设计问题。

更新时间: 2024-05-27 02:21:28

领域: math.NA,cs.CE,cs.LG,cs.NA

下载: http://arxiv.org/abs/2403.04329v2

Masked Face Recognition with Generative-to-Discriminative Representations

Masked face recognition is important for social good but challenged by diverse occlusions that cause insufficient or inaccurate representations. In this work, we propose a unified deep network to learn generative-to-discriminative representations for facilitating masked face recognition. To this end, we split the network into three modules and learn them on synthetic masked faces in a greedy module-wise pretraining manner. First, we leverage a generative encoder pretrained for face inpainting and finetune it to represent masked faces into category-aware descriptors. Attribute to the generative encoder's ability in recovering context information, the resulting descriptors can provide occlusion-robust representations for masked faces, mitigating the effect of diverse masks. Then, we incorporate a multi-layer convolutional network as a discriminative reformer and learn it to convert the category-aware descriptors into identity-aware vectors, where the learning is effectively supervised by distilling relation knowledge from off-the-shelf face recognition model. In this way, the discriminative reformer together with the generative encoder serves as the pretrained backbone, providing general and discriminative representations towards masked faces. Finally, we cascade one fully-connected layer following by one softmax layer into a feature classifier and finetune it to identify the reformed identity-aware vectors. Extensive experiments on synthetic and realistic datasets demonstrate the effectiveness of our approach in recognizing masked faces.

Updated: 2024-05-27 02:20:55

标题: 用生成到判别表示进行面部遮蔽识别

摘要: 遮蔽人脸识别对社会具有重要意义,但受到各种遮挡的挑战,这会导致不足或不准确的表示。在这项工作中,我们提出了一个统一的深度网络,用于学习生成到判别式表示,以促进遮蔽人脸识别。为此,我们将网络分为三个模块,并以贪婪的逐模块预训练方式在合成遮蔽人脸上进行学习。首先,我们利用一个预训练用于人脸修补的生成编码器,并微调它以将遮蔽人脸表示为类别感知描述符。由于生成编码器恢复上下文信息的能力,得到的描述符可以为遮蔽人脸提供抗干扰表示,从而减轻各种面具的影响。然后,我们将一个多层卷积网络作为判别式改革者并学习将类别感知描述符转换为身份感知向量,其中学习有效地受到来自现成人脸识别模型的关系知识的监督。通过这种方式,判别式改革者与生成编码器一起作为预训练的骨干,提供通用和判别性表示以应对遮蔽人脸。最后,我们将一个全连接层和一个softmax层级联成一个特征分类器,并微调它以识别改革后的身份感知向量。在合成和现实数据集上进行的大量实验表明了我们的方法在识别遮蔽人脸方面的有效性。

更新时间: 2024-05-27 02:20:55

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16761v1

LoRA Meets Dropout under a Unified Framework

With the remarkable capabilities, large language models (LLMs) have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach for model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all the parameters updated, alleviates overfitting associated with excessive parameter redundancy. Hence, a possible contradiction arises from negligible trainable parameters of LoRA and the effectiveness of previous dropout methods, which has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also overfitting-prone. We then revisit transformer-specific dropout methods, and establish their equivalence and distinctions mathematically and empirically. Building upon this comparative analysis, we introduce a unified framework for a comprehensive investigation, which instantiates these methods based on dropping position, structural pattern and compensation measure. Through this framework, we reveal the new preferences and performance comparisons of them when involved with limited trainable parameters. This framework also allows us to amalgamate the most favorable aspects into a novel dropout method named HiddenKey. Extensive experiments verify the remarkable superiority and sufficiency of HiddenKey across multiple models and tasks, which highlights it as the preferred approach for high-performance and parameter-efficient finetuning of LLMs.

Updated: 2024-05-27 02:16:43

标题: LoRA在统一框架下与Dropout相遇

摘要: 具有显著能力的大型语言模型(LLMs)已经成为众多NLP应用中不可或缺的要素,而参数高效的微调,特别是LoRA,作为一种轻量级的模型定制方法已经变得越来越受欢迎。同时,各种dropout方法最初设计用于全参数微调,通过减轻与过多参数冗余相关的过拟合问题。因此,一个可能的矛盾在于LoRA的可训练参数微不足道,而之前的dropout方法的有效性却被大多数人忽视。为了填补这一空白,我们首先确认参数高效的LoRA也容易出现过拟合问题。然后,我们重新审视了特定于transformer的dropout方法,并在数学和实证上建立它们的等价性和区别。基于这种比较分析,我们引入了一个统一的框架,用于全面调查,该框架基于丢弃位置、结构模式和补偿措施实例化这些方法。通过这个框架,我们揭示了当涉及有限可训练参数时它们的新偏好和性能比较。这个框架还允许我们将最有利的方面融合成一种名为HiddenKey的新型dropout方法。广泛的实验验证了HiddenKey在多个模型和任务中的显著优越性和充分性,这凸显了它作为LLMs高性能和参数高效微调的首选方法。

更新时间: 2024-05-27 02:16:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.00812v2

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models, without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely, those responsible for text-to-image alignment {\it vs.} high-resolution rendering. We first demonstrate the benefits of scaling a {\it Shallow UNet}, with no down(up)-sampling enc(dec)oder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets. This enables a single-stage model capable of generating high-resolution images without the need for a super-resolution cascade. Our key results rely on public datasets and show that we are able to train non-cascaded models up to 8B parameters with no further regularization schemes. Vermeer, our full-pipeline model trained with internal datasets to produce 1024x1024 images without cascades, is preferred by human evaluators over SDXL by 44.0% vs. 21.4%.

Updated: 2024-05-27 02:12:39

标题: 贪婪生长实现高分辨率基于像素的扩散模型

摘要: 我们解决了如何在规模上学习有效的基于像素的图像扩散模型的长期问题,引入了一种非常简单的贪婪生长方法,用于稳定训练大规模、高分辨率模型,而无需级联超分辨率组件。关键见解源自对核心组件的精心预训练,即负责文本到图像对齐和高分辨率渲染的组件。我们首先展示了通过扩展Shallow UNet模型,没有下(上)采样的编码(解码)器,带来的好处。扩展其深核心层被证明可以改善对齐、对象结构和构图。在此核心模型的基础上,我们提出了一种贪婪算法,将架构扩展为高分辨率端到端模型,同时保持预训练表示的完整性,稳定训练,并减少对大规模高分辨率数据集的需求。这使得单阶段模型能够生成高分辨率图像,而无需超分辨率级联。我们的关键结果依赖于公共数据集,并显示我们能够训练多达80亿参数的非级联模型,而无需进一步的正则化方案。我们的完整管道模型Vermeer使用内部数据集训练,可在不使用级联的情况下生成1024x1024图像,在人类评估中以44.0%对21.4%的偏好率优于SDXL。

更新时间: 2024-05-27 02:12:39

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.16759v1

Symmetry-Informed Governing Equation Discovery

Despite the advancements in learning governing differential equations from observations of dynamical systems, data-driven methods are often unaware of fundamental physical laws, such as frame invariance. As a result, these algorithms may search an unnecessarily large space and discover equations that are less accurate or overly complex. In this paper, we propose to leverage symmetry in automated equation discovery to compress the equation search space and improve the accuracy and simplicity of the learned equations. Specifically, we derive equivariance constraints from the time-independent symmetries of ODEs. Depending on the types of symmetries, we develop a pipeline for incorporating symmetry constraints into various equation discovery algorithms, including sparse regression and genetic programming. In experiments across a diverse range of dynamical systems, our approach demonstrates better robustness against noise and recovers governing equations with significantly higher probability than baselines without symmetry.

Updated: 2024-05-27 01:58:23

标题: 基于对称性的控制方程发现

摘要: 尽管在从动力系统观测中学习控制微分方程方面取得了进展,数据驱动方法通常不了解基本的物理定律,比如框架不变性。因此,这些算法可能搜索一个不必要大的空间,并发现精度较低或过于复杂的方程。在本文中,我们提出利用对称性来压缩自动方程发现的搜索空间,提高学习方程的准确度和简单性。具体来说,我们从ODE的时间独立对称性推导出等变性约束。根据对称性的类型,我们开发了一个流水线,将对称性约束融入各种方程发现算法,包括稀疏回归和遗传编程。在跨越各种动态系统的实验中,我们的方法表现出更好的抗噪声性,并以显著高于无对称性基线的概率恢复控制方程。

更新时间: 2024-05-27 01:58:23

领域: cs.LG

下载: http://arxiv.org/abs/2405.16756v1

CHESS: Contextual Harnessing for Efficient SQL Synthesis

Utilizing large language models (LLMs) for transforming natural language questions into SQL queries (text-to-SQL) is a promising yet challenging approach, particularly when applied to real-world databases with complex and extensive schemas. In particular, effectively incorporating data catalogs and database values for SQL generation remains an obstacle, leading to suboptimal solutions. We address this problem by proposing a new pipeline that effectively retrieves relevant data and context, selects an efficient schema, and synthesizes correct and efficient SQL queries. To increase retrieval precision, our pipeline introduces a hierarchical retrieval method leveraging model-generated keywords, locality-sensitive hashing indexing, and vector databases. Additionally, we have developed an adaptive schema pruning technique that adjusts based on the complexity of the problem and the model's context size. Our approach generalizes to both frontier proprietary models like GPT-4 and open-source models such as Llama-3-70B. Through a series of ablation studies, we demonstrate the effectiveness of each component of our pipeline and its impact on the end-to-end performance. Our method achieves new state-of-the-art performance on the cross-domain challenging BIRD dataset.

Updated: 2024-05-27 01:54:16

标题: CHESS: 上下文利用以实现高效的SQL合成

摘要: 利用大型语言模型(LLMs)将自然语言问题转化为SQL查询(文本到SQL)是一种有前途但具有挑战性的方法,特别是当应用于具有复杂和广泛架构的真实数据库时。特别是,有效地将数据目录和数据库值纳入SQL生成仍然是一个障碍,导致次优解。我们通过提出一个新的流程来解决这个问题,该流程有效地检索相关数据和上下文,选择高效的架构,并合成正确和高效的SQL查询。为了提高检索精度,我们的流程引入了一种利用模型生成的关键词、局部敏感哈希索引和向量数据库的分层检索方法。此外,我们还开发了一种自适应架构剪枝技术,根据问题的复杂性和模型的上下文大小进行调整。我们的方法适用于像GPT-4这样的前沿专有模型和开源模型,比如Llama-3-70B。通过一系列消融研究,我们展示了我们的流程的每个组件的有效性以及对端到端性能的影响。我们的方法在跨领域具有挑战性的BIRD数据集上实现了新的最先进性能。

更新时间: 2024-05-27 01:54:16

领域: cs.LG,cs.AI,cs.DB

下载: http://arxiv.org/abs/2405.16755v1

Model Ensembling for Constrained Optimization

There is a long history in machine learning of model ensembling, beginning with boosting and bagging and continuing to the present day. Much of this history has focused on combining models for classification and regression, but recently there is interest in more complex settings such as ensembling policies in reinforcement learning. Strong connections have also emerged between ensembling and multicalibration techniques. In this work, we further investigate these themes by considering a setting in which we wish to ensemble models for multidimensional output predictions that are in turn used for downstream optimization. More precisely, we imagine we are given a number of models mapping a state space to multidimensional real-valued predictions. These predictions form the coefficients of a linear objective that we would like to optimize under specified constraints. The fundamental question we address is how to improve and combine such models in a way that outperforms the best of them in the downstream optimization problem. We apply multicalibration techniques that lead to two provably efficient and convergent algorithms. The first of these (the white box approach) requires being given models that map states to output predictions, while the second (the \emph{black box} approach) requires only policies (mappings from states to solutions to the optimization problem). For both, we provide convergence and utility guarantees. We conclude by investigating the performance and behavior of the two algorithms in a controlled experimental setting.
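In the paper's setting, each model's predictions form the coefficients of a linear objective optimized under constraints; one way to write the induced policy (notation assumed):

```latex
% Given state x, a model f maps x to predicted coefficients f(x) \in \mathbb{R}^d;
% the downstream solver returns the induced policy
\pi_f(x) \;\in\; \arg\max_{y \in \mathcal{Y}} \; f(x)^{\top} y ,
\qquad \mathcal{Y} = \{\, y : A y \le b \,\},
% and the goal is an ensemble whose induced policy attains a higher true
% objective c(x)^{\top} \pi(x) than the best single model's policy.
```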

Updated: 2024-05-27 01:48:07

标题: 模型集成用于受限优化

摘要: 机器学习中有一个长期的模型集成历史,从boosting和bagging开始,一直延续到现在。很多这方面的研究集中在将模型组合用于分类和回归,但最近人们对更复杂的情况,比如在强化学习中集成策略,产生了兴趣。集成和多校准技术之间也出现了密切的联系。在这项工作中,我们进一步研究了这些主题,考虑了一个情景,我们希望为多维输出预测集成模型,然后用于下游优化。更确切地说,我们想象我们获得了一些将状态空间映射到多维实值预测的模型。这些预测构成了一个线性目标的系数,我们希望在指定约束条件下优化。我们要解决的根本问题是如何改进和组合这些模型,使其在下游优化问题中超越最优模型。我们应用多校准技术,提出了两种证明有效和收敛的算法。第一种算法(白盒方法)需要提供将状态映射到输出预测的模型,而第二种算法(黑盒方法)仅需要策略(将状态映射到优化问题解决方案)。对于两种方法,我们提供了收敛性和效用保证。最后,我们在一个受控的实验设置中研究了这两种算法的性能和行为。

更新时间: 2024-05-27 01:48:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.16752v1

LLM-Based Cooperative Agents using Information Relevance and Plan Validation

We address the challenge of multi-agent cooperation, where agents achieve a common goal by interacting with a 3D scene and cooperating with decentralized agents under complex partial observations. This involves managing communication costs and optimizing interaction trajectories in dynamic environments. Our research focuses on three primary limitations of existing cooperative agent systems. Firstly, current systems demonstrate inefficiency in managing acquired information through observation, resulting in declining planning performance as the environment becomes more complex with additional objects or goals. Secondly, the neglect of false plans in partially observable settings leads to suboptimal cooperative performance, as agents struggle to adapt to environmental changes influenced by the unseen actions of other agents. Lastly, the failure to incorporate spatial data into decision-making processes restricts the agent's ability to construct optimized trajectories. To overcome these limitations, we propose the RElevance and Validation-Enhanced Cooperative Language Agent (REVECA), a novel cognitive architecture powered by GPT-3.5. REVECA leverages relevance assessment, plan validation, and spatial information to enhance the efficiency and robustness of agent cooperation in dynamic and partially observable environments while minimizing continuous communication costs and effectively managing irrelevant dummy objects. Our extensive experiments demonstrate the superiority of REVECA over previous approaches, including those driven by GPT-4.0. Additionally, a user study highlights REVECA's potential for achieving trustworthy human-AI cooperation. We expect that REVECA will have significant applications in gaming, XR applications, educational tools, and humanoid robots, contributing to substantial economic, commercial, and academic advancements.

Updated: 2024-05-27 01:47:14

标题: 基于LLM的合作代理人利用信息相关性和计划验证

摘要: 我们解决了多智能体合作的挑战,智能体通过与3D场景互动并在复杂的部分观察条件下与分散的智能体合作来实现共同目标。这涉及管理通信成本和优化动态环境中的交互轨迹。我们的研究集中在现有合作智能体系统的三个主要局限性上。首先,当前系统在管理通过观察获取的信息方面效率低下,导致当环境因额外对象或目标而变得更加复杂时,规划性能下降。其次,在部分可观察设置中忽略错误计划会导致合作绩效不佳,因为智能体难以适应由其他智能体的看不见行动所影响的环境变化。最后,未能将空间数据纳入决策过程限制了智能体构建优化轨迹的能力。为了克服这些限制,我们提出了基于GPT-3.5的RElevance和Validation-Enhanced Cooperative Language Agent(REVECA)的新型认知架构。REVECA利用相关性评估、计划验证和空间信息来增强智能体在动态和部分可观察环境中的合作效率和鲁棒性,同时最小化持续通信成本并有效管理无关的干扰对象。我们的大量实验证明了REVECA相对于以前的方法的优越性,包括那些由GPT-4.0驱动的方法。此外,用户研究突出了REVECA在实现可信赖的人工智能合作方面的潜力。我们预计REVECA将在游戏、XR应用、教育工具和人形机器人领域有重要应用,为经济、商业和学术的重大进步做出贡献。

更新时间: 2024-05-27 01:47:14

领域: cs.AI,cs.CL,cs.CV,cs.MA

下载: http://arxiv.org/abs/2405.16751v1

Rethinking Independent Cross-Entropy Loss For Graph-Structured Data

Graph neural networks (GNNs) have exhibited prominent performance in learning graph-structured data. Considering the node classification task, based on the i.i.d. assumption among node labels, traditional supervised learning simply sums up the cross-entropy losses of the independent training nodes and applies the average loss to optimize GNNs' weights. But unlike other data formats, the nodes are naturally connected. We find that the independent distribution modeling of node labels restricts GNNs' capability to generalize over the entire graph and defend against adversarial attacks. In this work, we propose a new framework, termed joint-cluster supervised learning, to model the joint distribution of each node with its corresponding cluster. We learn the joint distribution of node and cluster labels conditioned on their representations, and train GNNs with the obtained joint loss. In this way, the data-label reference signals extracted from the local cluster explicitly strengthen the discrimination ability on the target node. Extensive experiments demonstrate that our joint-cluster supervised learning can effectively bolster GNNs' node classification accuracy. Furthermore, benefiting from the reference signals, which may be free from malicious interference, our learning paradigm significantly protects the node classification from being affected by adversarial attacks.
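
A hedged sketch of one way such a joint loss could look, assuming the joint distribution over (node label, cluster label) pairs is modeled by summing node-level and cluster-level logits; the paper's exact construction may differ.

# Hedged sketch of a joint node-cluster cross-entropy. The outer-sum joint
# logits below are our own illustrative modeling choice, not necessarily the
# paper's exact loss.
import torch
import torch.nn.functional as F

def joint_cluster_loss(node_logits, cluster_logits, y_node, y_cluster):
    """node_logits: (N, C_node); cluster_logits: (N, C_clu) for the cluster
    each node belongs to; targets are the true (node label, cluster label)."""
    N, C_node = node_logits.shape
    C_clu = cluster_logits.shape[1]
    # Joint logits over label pairs: (N, C_node, C_clu) -> flattened classes
    joint = node_logits.unsqueeze(2) + cluster_logits.unsqueeze(1)
    joint = joint.reshape(N, C_node * C_clu)
    target = y_node * C_clu + y_cluster   # index of the true label pair
    return F.cross_entropy(joint, target)

# toy usage
node_logits = torch.randn(8, 5)
cluster_logits = torch.randn(8, 3)
y_node = torch.randint(0, 5, (8,))
y_cluster = torch.randint(0, 3, (8,))
print(joint_cluster_loss(node_logits, cluster_logits, y_node, y_cluster))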

Updated: 2024-05-27 01:42:32

标题: 重新思考图结构数据的独立交叉熵损失

摘要: 图神经网络(GNNs)在学习图结构数据方面表现出卓越的性能。考虑到节点分类任务,基于节点标签之间的i.i.d假设,传统的监督学习简单地将独立训练节点的交叉熵损失相加,并将平均损失应用于优化GNNs的权重。但与其他数据格式不同,节点自然地相互连接。研究发现,节点标签的独立分布建模限制了GNNs在整个图上泛化和抵御对抗性攻击的能力。在这项工作中,我们提出了一个新的框架,称为联合簇监督学习,来建模每个节点与其相应簇的联合分布。我们学习节点和簇标签在它们的表示条件下的联合分布,并用获得的联合损失训练GNNs。这样,从本地簇中提取的数据标签参考信号明确地增强了目标节点的判别能力。广泛的实验表明,我们的联合簇监督学习可以有效地增强GNNs的节点分类准确性。此外,由于参考信号可能不受恶意干扰,我们的学习范式显著地保护了节点分类免受对抗性攻击的影响。

更新时间: 2024-05-27 01:42:32

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.15564v2

User Decision Guidance with Selective Explanation Presentation from Explainable-AI

This paper addresses the challenge of selecting explanations for XAI (Explainable AI)-based Intelligent Decision Support Systems (IDSSs). IDSSs have shown promise in improving user decisions through XAI-generated explanations along with AI predictions, and the development of XAI made it possible to generate a variety of such explanations. However, how IDSSs should select explanations to enhance user decision-making remains an open question. This paper proposes X-Selector, a method for selectively presenting XAI explanations. It enables IDSSs to strategically guide users to an AI-suggested decision by predicting the impact of different combinations of explanations on a user's decision and selecting the combination that is expected to minimize the discrepancy between an AI suggestion and a user decision. We compared the efficacy of X-Selector with two naive strategies (all possible explanations and explanations only for the most likely prediction) and two baselines (no explanation and no AI support). The results suggest the potential of X-Selector to guide users to AI-suggested decisions and improve task performance under the condition of a high AI accuracy.
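
A minimal sketch of the selection rule as we read it, assuming a hypothetical user_model that predicts how likely the user is to follow the AI suggestion under a given set of explanations; the actual X-Selector impact model is not specified in the abstract.

# Hedged sketch of X-Selector-style selection: enumerate explanation subsets,
# predict the user's decision under each subset with a (hypothetical) user
# model, and present the subset minimizing the user-AI discrepancy.
from itertools import combinations

def select_explanations(explanations, user_model):
    """user_model(subset) -> predicted probability the user follows the AI
    suggestion. Returns the subset expected to minimize the discrepancy."""
    best_subset, best_gap = (), float("inf")
    for r in range(len(explanations) + 1):
        for subset in combinations(explanations, r):
            gap = 1.0 - user_model(subset)   # expected discrepancy proxy
            if gap < best_gap:
                best_subset, best_gap = subset, gap
    return best_subset

# toy user model: more explanations -> more likely to follow the suggestion
toy_user = lambda subset: min(1.0, 0.5 + 0.1 * len(subset))
print(select_explanations(["saliency", "counterfactual", "example"], toy_user))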

Updated: 2024-05-27 01:40:54

标题: 用户决策指导:可解释人工智能提供选择性解释展示

摘要: 这篇论文讨论了为基于可解释人工智能(XAI)的智能决策支持系统(IDSSs)选择解释的挑战。IDSSs通过XAI生成的解释以及人工智能预测显示出改进用户决策的潜力,而XAI的发展使得能够生成各种解释成为可能。然而,IDSSs应该如何选择解释以增强用户决策仍然是一个悬而未决的问题。本文提出了X-Selector,一种用于选择性呈现XAI解释的方法。它使IDSSs能够通过预测不同解释组合对用户决策的影响,并选择预计能够最小化人工智能建议与用户决策之间差异的组合,从而战略地引导用户到一个人工智能建议的决策。我们将X-Selector的功效与两种朴素策略(所有可能的解释和仅针对最可能的预测的解释)以及两个基线(无解释和无人工智能支持)进行了比较。结果表明,在高人工智能准确性的条件下,X-Selector有潜力引导用户做出人工智能建议的决策,并提高任务绩效。

更新时间: 2024-05-27 01:40:54

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2402.18016v3

DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models

Pretrained diffusion models (DMs) have recently been popularly used in solving inverse problems (IPs). Existing methods mostly interleave steps of the reverse diffusion process with steps that pull the iterates closer to satisfying the measurement constraint. However, such interleaving methods struggle to produce final results that look like natural objects of interest (i.e., manifold feasibility) and fit the measurement (i.e., measurement feasibility), especially for nonlinear IPs. Moreover, their ability to deal with noisy IPs with unknown types and levels of measurement noise is unknown. In this paper, we advocate viewing the reverse process in DMs as a function and propose a novel plug-in method for solving IPs using pretrained DMs, dubbed DMPlug. DMPlug addresses the issues of manifold feasibility and measurement feasibility in a principled manner, and also shows great potential for being robust to unknown types and levels of noise. Through extensive experiments across various IP tasks, including two linear and three nonlinear IPs, we demonstrate that DMPlug consistently outperforms state-of-the-art methods, often by large margins, especially for nonlinear IPs. The code is available at https://github.com/sun-umn/DMPlug.
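
A hedged sketch of the "reverse process as a function" idea: optimize the initial noise so that the generated output fits the measurement. Here reverse_diffusion and forward_op are placeholders (with toy stand-ins below), not the actual DMPlug implementation.

# Hedged sketch: treat the DM's full reverse pass as a function of the seed z
# and optimize z so the measurement of the generated output fits y. The
# functions passed in are placeholders, not the exact DMPlug code.
import torch

def dmplug_style_solve(reverse_diffusion, forward_op, y, shape, steps=200, lr=1e-2):
    z = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = reverse_diffusion(z)                   # noise -> image manifold
        loss = ((forward_op(x) - y) ** 2).mean()   # measurement feasibility
        loss.backward()
        opt.step()
    return reverse_diffusion(z).detach()

# toy stand-ins so the sketch runs end to end
toy_dm = lambda z: torch.tanh(z)     # placeholder "generator"
toy_A = lambda x: x[..., ::2]        # placeholder downsampling operator
y = torch.zeros(1, 8)
x_hat = dmplug_style_solve(toy_dm, toy_A, y, shape=(1, 16))
print(x_hat.shape)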

Updated: 2024-05-27 01:38:30

标题: DMPlug: 一种用扩散模型解决逆问题的插件方法

摘要: 预训练扩散模型(DMs)最近在解决反问题(IPs)方面被广泛使用。现有方法主要交错进行反向扩散过程的迭代步骤和将迭代次数接近满足测量约束的步骤。然而,这种交错方法往往难以产生看起来像自然感兴趣对象(即流形可行性)并符合测量(即测量可行性)的最终结果,尤其是对于非线性IPs。此外,它们处理带有未知类型和水平的噪声的IPs的能力也是未知的。在本文中,我们提倡将DMs中的逆向过程视为一个函数,并提出了一种新颖的使用预训练DMs解决IPs的插件方法,称为DMPlug。DMPlug以原则性的方式解决了流形可行性和测量可行性的问题,并且显示出对未知类型和水平噪声具有很大的鲁棒性。通过在各种IP任务上进行广泛实验,包括两个线性和三个非线性IPs,我们证明DMPlug始终优于最先进的方法,尤其是对于非线性IPs,往往优势明显。代码可在https://github.com/sun-umn/DMPlug 上找到。

更新时间: 2024-05-27 01:38:30

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.16749v1

Hypergraph Laplacian Eigenmaps and Face Recognition Problems

Face recognition is a very important topic in data science and biometric security research. It has multiple applications in the military, finance, and retail sectors, to name a few. In this paper, a novel hypergraph Laplacian Eigenmaps method is proposed and combined with the k-nearest-neighbor method and/or the kernel ridge regression method to solve the face recognition problem. Experimental results illustrate that the accuracy of combining the novel hypergraph Laplacian Eigenmaps with a specific classification system is similar to the accuracy of combining the old symmetric normalized hypergraph Laplacian Eigenmaps method with the same classification system.
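
For context, a minimal sketch of the classical pipeline the paper builds on, using the standard symmetric normalized hypergraph Laplacian L = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} followed by eigenmap embedding and k-NN classification; the paper's novel Laplacian variant is not specified in the abstract.

# Hedged sketch: classical symmetric normalized hypergraph Laplacian (the
# "old" baseline mentioned in the abstract), eigenmap embedding, and k-NN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hypergraph_eigenmaps(H, w, dim):
    """H: (n_vertices, n_edges) incidence matrix; w: hyperedge weights."""
    W = np.diag(w)
    d_v = H @ w                      # vertex degrees
    d_e = H.sum(axis=0)              # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    L = np.eye(H.shape[0]) - Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]        # skip the trivial eigenvector

# toy usage: 4 vertices, 2 hyperedges
H = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)
X = hypergraph_eigenmaps(H, w=np.ones(2), dim=2)
knn = KNeighborsClassifier(n_neighbors=1).fit(X[:2], [0, 1])
print(knn.predict(X[2:]))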

Updated: 2024-05-27 01:35:14

标题: 超图拉普拉斯特征映射与人脸识别问题

摘要: 人脸识别是数据科学和生物特征安全研究领域中非常重要的话题。它在军事、金融和零售等领域有多种应用。本文提出了一种新颖的超图拉普拉斯特征映射方法,并结合k最近邻方法和/或核岭回归方法来解决人脸识别问题。实验结果表明,新颖的超图拉普拉斯特征映射和一个特定分类系统的组合的准确性与旧对称归一化超图拉普拉斯特征映射方法和一个特定分类系统的组合的准确性类似。

更新时间: 2024-05-27 01:35:14

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.16748v1

Dissecting Query-Key Interaction in Vision Transformers

Self-attention in vision transformers is often thought to perform perceptual grouping where tokens attend to other tokens with similar embeddings, which could correspond to semantically similar features of an object. However, attending to dissimilar tokens can be beneficial by providing contextual information. We propose to use the Singular Value Decomposition to dissect the query-key interaction (i.e. ${\textbf{W}_q}^\top\textbf{W}_k$). We find that early layers attend more to similar tokens, while late layers show increased attention to dissimilar tokens, providing evidence corresponding to perceptual grouping and contextualization, respectively. Many of these interactions between features represented by singular vectors are interpretable and semantic, such as attention between relevant objects, between parts of an object, or between the foreground and background. This offers a novel perspective on interpreting the attention mechanism, which contributes to understanding how transformer models utilize context and salient features when processing images.
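
A small sketch of the dissection itself: with PyTorch-style projection weights of shape (d_head, d_model), the attention logit between two tokens factors through M = W_q^T W_k, whose SVD decomposes the logit into per-singular-direction contributions. Dimensions below are arbitrary toy choices.

# Hedged sketch of the SVD dissection: the attention logit between tokens
# x_i, x_j is x_i^T (W_q^T W_k) x_j, so we factor M = W_q^T W_k and verify
# that the singular directions reproduce the logit exactly.
import numpy as np

d_model, d_head = 64, 16
rng = np.random.default_rng(0)
W_q = rng.standard_normal((d_head, d_model))
W_k = rng.standard_normal((d_head, d_model))

M = W_q.T @ W_k                      # (d_model, d_model) query-key interaction
U, S, Vt = np.linalg.svd(M)

# Projections onto the left/right singular vectors decompose the logit:
# x_i^T M x_j = sum_r S[r] * (U[:, r]^T x_i) * (Vt[r] @ x_j)
x_i = rng.standard_normal(d_model)
x_j = rng.standard_normal(d_model)
logit_direct = x_i @ M @ x_j
logit_svd = np.sum(S * (U.T @ x_i) * (Vt @ x_j))
print(np.allclose(logit_direct, logit_svd))   # True: same attention logit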

Updated: 2024-05-27 01:31:56

标题: "在视觉变换器中剖析查询键交互"

摘要: 视觉变换器中的自注意力通常被认为执行感知分组,其中令牌会关注具有相似嵌入的其他令牌,这可能对应于对象的语义相似特征。然而,关注不相似的令牌可以通过提供上下文信息而受益。我们建议使用奇异值分解来解剖查询-键交互(即${\textbf{W}_q}^\top\textbf{W}_k$)。我们发现早期层更多地关注相似的令牌,而后期层则显示出对不相似令牌的增加关注,分别提供了与感知分组和上下文化相对应的证据。许多由奇异向量表示的特征之间的这些交互是可以解释和语义化的,例如相关对象之间的关注、对象的部分之间的关注,或前景与背景之间的关注。这为解释注意机制提供了一种新的视角,有助于理解变换器模型在处理图像时如何利用上下文和显著特征。

更新时间: 2024-05-27 01:31:56

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.14880v2

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

The two-stage fine-tuning (FT) method, linear probing then fine-tuning (LP-FT), consistently outperforms linear probing (LP) and FT alone in terms of accuracy for both in-distribution (ID) and out-of-distribution (OOD) data. This success is largely attributed to the preservation of pre-trained features, achieved through a near-optimal linear head obtained during LP. However, despite the widespread use of large language models, the exploration of complex architectures such as Transformers remains limited. In this paper, we analyze the training dynamics of LP-FT for classification models on the basis of the neural tangent kernel (NTK) theory. Our analysis decomposes the NTK matrix into two components, highlighting the importance of the linear head norm alongside the prediction accuracy at the start of the FT stage. We also observe a significant increase in the linear head norm during LP, stemming from training with the cross-entropy (CE) loss, which effectively minimizes feature changes. Furthermore, we find that this increased norm can adversely affect model calibration, a challenge that can be addressed by temperature scaling. Additionally, we extend our analysis with the NTK to the low-rank adaptation (LoRA) method and validate its effectiveness. Our experiments with a Transformer-based model on natural language processing tasks across multiple benchmarks confirm our theoretical analysis and demonstrate the effectiveness of LP-FT in fine-tuning language models. Code is available at https://github.com/tom4649/lp-ft_ntk.
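
A minimal sketch of the two-stage LP-FT recipe under analysis, with a toy backbone standing in for a Transformer: stage 1 trains only the linear head on frozen features, and stage 2 fine-tunes everything starting from the probed head (temperature scaling, mentioned as a calibration fix, would follow stage 2).

# Hedged sketch of LP-FT. Backbone and dimensions are toy placeholders.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # stands in for a Transformer
head = nn.Linear(64, 5)
X, y = torch.randn(128, 32), torch.randint(0, 5, (128,))
ce = nn.CrossEntropyLoss()

# Stage 1: linear probing (backbone frozen, only the head trains)
for p in backbone.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad(); ce(head(backbone(X)), y).backward(); opt.step()

# Stage 2: full fine-tuning from the probed head (smaller learning rate)
for p in backbone.parameters():
    p.requires_grad_(True)
opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-4)
for _ in range(100):
    opt.zero_grad(); ce(head(backbone(X)), y).backward(); opt.step()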

Updated: 2024-05-27 01:31:40

标题: 从 NTK 角度理解线性探测,然后微调语言模型

摘要: 两阶段微调(FT)方法,线性探测然后微调(LP-FT),在准确性方面始终优于仅线性探测(LP)和仅微调(FT),无论是在分布内(ID)还是分布外(OOD)数据上。这一成功主要归因于通过LP期间获得的近乎最优线性头部实现的预训练特征的保留。然而,尽管大型语言模型被广泛使用,但对Transformer等复杂架构的探索仍然有限。在本文中,我们基于神经切线核(NTK)理论分析了基于分类模型的LP-FT的训练动态。我们的分析将NTK矩阵分解为两个组成部分,突出了FT阶段开始时线性头部规范的重要性以及预测准确性。我们还观察到,在LP期间线性头部规范显著增加,这源自使用交叉熵(CE)损失进行训练,有效地最小化特征变化。此外,我们发现这种增加的规范可能会对模型校准产生不利影响,这是可以通过温度缩放来解决的挑战。此外,我们将我们的NTK分析扩展到低秩适应(LoRA)方法,并验证其有效性。我们在多个基准测试上使用基于Transformer的模型进行自然语言处理任务的实验证实了我们的理论分析,并展示了LP-FT在微调语言模型中的有效性。代码可在https://github.com/tom4649/lp-ft_ntk上找到。

更新时间: 2024-05-27 01:31:40

领域: cs.LG

下载: http://arxiv.org/abs/2405.16747v1

Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning

Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose \textsc{Retro}, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to $66.9\%$, $69.3\%$, and $69.8\%$, respectively, with significantly fewer parameters.
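
A hedged sketch of the head-reuse idea as we read it, assuming the student's feature dimension matches the teacher head's input (or is adapted): the student passes through the teacher's frozen projection head and is trained to match the teacher's projected embedding. The real method uses contrastive and consistency objectives; the cosine loss here is illustrative only.

# Hedged sketch: the student reuses the teacher's frozen projection head so
# its embeddings live in the teacher's projection space. Toy modules stand in
# for ResNet-50 / EfficientNet-B0; assumes matching feature dimensions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_backbone = nn.Linear(32, 128)        # stands in for ResNet-50
teacher_head = nn.Linear(128, 64)            # teacher projection head (frozen)
student_backbone = nn.Linear(32, 128)        # stands in for EfficientNet-B0
for p in list(teacher_backbone.parameters()) + list(teacher_head.parameters()):
    p.requires_grad_(False)

opt = torch.optim.Adam(student_backbone.parameters(), lr=1e-3)
x = torch.randn(64, 32)
with torch.no_grad():
    t = teacher_head(teacher_backbone(x))    # teacher's projected embedding
s = teacher_head(student_backbone(x))        # student reuses the same head
loss = 1 - F.cosine_similarity(s, t, dim=-1).mean()
loss.backward(); opt.step()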

Updated: 2024-05-27 01:22:52

标题: Retro:通过自监督学习重用教师投影头,在轻量级模型上实现高效嵌入蒸馏

摘要: 自监督学习(SSL)因其能够利用大量未标记数据学习有效表示而备受关注。 轻量级模型可以通过对比和一致性约束从较大的自监督预训练模型中提炼出来。 然而,投影头的不同大小使学生难以准确模仿教师的嵌入。 我们提出了\textsc{Retro},它重新利用教师的投影头来为学生服务,我们的实验结果证明在所有轻量级模型上都取得了显著的改进。 例如,当使用ResNet-50/101/152作为教师训练EfficientNet-B0时,我们的方法将ImageNet上的线性结果分别提高到$66.9\%$、$69.3\%$和$69.8\%$,而参数数量明显较少。

更新时间: 2024-05-27 01:22:52

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.15311v2

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving

We survey the large language model (LLM) serving area to understand the intricate dynamics between cost-efficiency and accuracy, which is magnified by the growing need for longer contextual understanding when deploying models at a massive scale. Our findings reveal that works in this space optimize along three distinct but conflicting goals: improving serving context length (C), improving serving accuracy (A), and improving serving performance (P). Drawing inspiration from the CAP theorem in databases, we propose a CAP principle for LLM serving, which suggests that any optimization can improve at most two of these three goals simultaneously. Our survey categorizes existing works within this framework. We find the definition and continuity of user-perceived measurement metrics are crucial in determining whether a goal has been met, akin to prior CAP databases in the wild. We recognize the CAP principle for LLM serving as a guiding principle, rather than a formal theorem, to inform designers of the inherent and dynamic trade-offs in serving models. As serving accuracy and performance have been extensively studied, this survey focuses on works that extend serving context length and address the resulting challenges.

Updated: 2024-05-27 01:09:07

标题: LLM服务的CAP原则:长上下文大语言模型服务调查

摘要: 我们对大型语言模型(LLM)服务领域进行调查,以了解成本效率和准确性之间的复杂动态,这一动态在部署大规模模型时需要更长的上下文理解。我们的研究发现,该领域的工作优化了三个独立但相互冲突的目标:提高服务上下文长度(C)、提高服务准确性(A)和提高服务性能(P)。受数据库中CAP定理的启发,我们提出了一个适用于LLM服务的CAP原则,该原则表明任何优化最多同时可以改善这三个目标中的两个。我们将现有的工作分类在这一框架内。我们发现用户感知测量指标的定义和连续性对于确定是否已达到目标至关重要,类似于以前在野外的CAP数据库。我们认识到LLM服务的CAP原则是一项指导性原则,而不是正式定理,可为设计人员提供有关服务模型中固有和动态权衡的信息。鉴于服务准确性和性能已得到广泛研究,本调查重点关注扩展服务上下文长度并解决相关挑战的工作。

更新时间: 2024-05-27 01:09:07

领域: cs.DB,cs.LG

下载: http://arxiv.org/abs/2405.11299v2

Oracle-Efficient Reinforcement Learning for Max Value Ensembles

Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function approximation and policy gradient techniques often scale poorly and suffer from instability and high variance). One line of research attempting to address these difficulties makes the natural assumption that we are given a collection of heuristic base or $\textit{constituent}$ policies upon which we would like to improve in a scalable manner. In this work we aim to compete with the $\textit{max-following policy}$, which at each state follows the action of whichever constituent policy has the highest value. The max-following policy is always at least as good as the best constituent policy, and may be considerably better. Our main result is an efficient algorithm that learns to compete with the max-following policy, given only access to the constituent policies (but not their value functions). In contrast to prior work in similar settings, our theoretical results require only the minimal assumption of an ERM oracle for value function approximation for the constituent policies (and not the global optimal policy or the max-following policy itself) on samplable distributions. We illustrate our algorithm's experimental effectiveness and behavior on several robotic simulation testbeds.
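
A minimal sketch of the max-following policy itself, with placeholder value estimates; the paper's contribution is learning such estimates via an ERM oracle, which is not shown here.

# Hedged sketch: at state s, follow the action of whichever constituent
# policy has the highest estimated value. Value functions are placeholders.
import numpy as np

def max_following_action(state, policies, value_estimates):
    """policies[i](state) -> action; value_estimates[i](state) -> V^{pi_i}(state)."""
    values = [v(state) for v in value_estimates]
    best = int(np.argmax(values))          # constituent with the highest value
    return policies[best](state)

# toy usage with two constituent policies
pi = [lambda s: 0, lambda s: 1]
V = [lambda s: 1.0 - s, lambda s: s]       # placeholder value estimates
print(max_following_action(0.8, pi, V))    # follows the second policy here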

Updated: 2024-05-27 01:08:23

标题: 面向最大值集成的Oracle高效强化学习

摘要: 在大型或无限状态空间中的强化学习(RL)在理论上(最坏情况下的样本和计算复杂性必须与状态空间的基数成比例)和实验上(函数逼近和策略梯度技术通常缩放较差,并且容易出现不稳定性和高方差)都是极具挑战性的。一种研究方向试图解决这些困难,做出自然假设,即我们已经获得一组启发式基本或“组成”策略,希望以可扩展的方式改进它们。在这项工作中,我们的目标是与“最大跟随策略”竞争,该策略在每个状态下遵循具有最高价值的组成策略的动作。最大跟随策略始终至少与最佳组成策略一样好,并且可能更好。我们的主要结果是一种有效算法,仅通过访问组成策略(而不是它们的价值函数)就能学会与最大跟随策略竞争。与类似设置中先前的工作相比,我们的理论结果仅需要对可取样分布的组成策略的价值函数进行近似的ERM预言机的最小假设(而不是全局最优策略或最大跟随策略本身)。我们展示了我们的算法在几个机器人仿真测试平台上的实验有效性和行为。

更新时间: 2024-05-27 01:08:23

领域: cs.LG,cs.AI,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.16739v1

TSIS with A Comparative Study on Linear Molecular Representation

Encoding is the carrier of information. AI models possess basic capabilities in syntax, semantics, and reasoning, but these capabilities are sensitive to specific inputs. In this study, we introduce an encoding algorithm, TSIS (Simplified TSID), to the t-SMILES family as a fragment-based linear molecular representation. TSID has been demonstrated to significantly outperform classical SMILES, DeepSMILES, and SELFIES in previous work. A further comparative analysis in this study reveals that the tree structure used by TSID is more easily learned than anticipated, regardless of whether Transformer or LSTM models are used. Furthermore, TSIS demonstrates performance comparable to TSID and significantly outperforms SMILES, SELFIES, and SAFE. Meanwhile, SELFIES and SAFE present significant challenges in semantic and syntactic analysis, respectively, due to their inherent complexity.

Updated: 2024-05-27 01:07:17

标题: 使用线性分子表示的TSIS的比较研究

摘要: 编码是信息的载体。人工智能模型在语法、语义和推理方面具有基本能力,但这些能力对特定输入具有敏感性。在本研究中,我们引入了一种编码算法TSIS(简化的TSID),将其应用于t-SMILES家族作为基于片段的线性分子表示。之前的研究表明,TSID在性能上显著优于经典的SMILES、DeepSMILES和SELFIES。本研究中进一步的比较分析表明,TSID使用的树结构比预期中更容易学习,无论是使用Transformer还是LSTM模型。此外,TSIS表现出与TSID相媲美的性能,明显优于SMILES、SELFIES和SAFE。而SELFIES和SAFE在语义和句法分析方面分别存在重大挑战,这是由于它们固有的复杂性造成的。

更新时间: 2024-05-27 01:07:17

领域: cs.AI,q-bio.BM

下载: http://arxiv.org/abs/2402.02164v2

Scaling Law for Time Series Forecasting

A scaling law that rewards large datasets, complex models, and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input horizons may hurt performance for some models. We propose a theory of the scaling law for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of the scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting, in future works. Code for our experiments will be made public at: https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting.

Updated: 2024-05-27 01:02:08

标题: 时间序列预测的尺度定律

摘要: 深度学习领域已经观察到了奖励大数据集、复杂模型和增强数据粒度的比例律。然而,关于时间序列预测的研究对深度学习方法在时间序列预测中的规模行为提出了质疑:尽管更多的训练数据可以提高性能,但更强大的模型并不总是能够胜过性能较差的模型,而更长的输入视野可能会损害某些模型的性能。我们提出了一个关于时间序列预测的比例律理论,可以解释这些看似异常的行为。我们考虑了数据集大小和模型复杂性的影响,以及时间序列数据的粒度,特别关注回看视野,这是以前理论中未曾探讨的方面。此外,我们通过使用多样的时间序列预测数据集对各种模型进行了实证评估,这既验证了在时间序列预测领域内关于数据集大小和模型复杂性的比例律的有效性,也验证了我们的理论框架,特别是关于回看视野的影响。我们希望我们的发现可以激发新模型,针对有限大小的时间序列预测数据集,以及未来工作中的大型基础数据集和模型进行时间序列预测。我们的实验代码将在以下地址公开:https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting。

更新时间: 2024-05-27 01:02:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.15124v2

Faster Sampling via Stochastic Gradient Proximal Sampler

Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and establish the convergence theory accordingly. We show that the convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on Stochastic gradient Langevin dynamics (SGLD) and Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve $\epsilon$-sampling error in total variation (TV) distance within $\tilde{\mathcal{O}}(d\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d^{1/2}\epsilon^{-2})$ gradient complexities, which outperform the best-known result by at least an $\tilde{\mathcal{O}}(d^{1/3})$ factor. This enhancement in performance is corroborated by our empirical studies on synthetic data with various dimensions, demonstrating the efficiency of our proposed algorithm.

Updated: 2024-05-27 00:53:18

标题: 更快的采样:通过随机梯度近端采样器

摘要: 随机梯度已被广泛整合到基于Langevin方法中,以提高其可扩展性和解决大规模采样问题的效率。然而,在确定性情境中,表现出比Langevin算法更快收敛速度的近端采样器Lee等人(2021)尚未在其随机变体中进行探索。本文研究了用于从非对数凹分布中进行采样的随机近端采样器(SPS)。我们首先建立了一个实现随机近端采样器的通用框架,并相应地建立了收敛理论。我们表明,只要算法轨迹的二阶矩有界且受限高斯预言可以被很好地近似,就可以保证收敛到目标分布。然后,我们基于随机梯度Langevin动力学(SGLD)和Metropolis调整Langevin算法(MALA)提供了两种可实现的变体,形成SPS-SGLD和SPS-MALA。我们进一步表明,SPS-SGLD和SPS-MALA可以在总变差(TV)距离内实现$\epsilon$-采样误差,其梯度复杂度为$\tilde{\mathcal{O}}(d\epsilon^{-2})$和$\tilde{\mathcal{O}}(d^{1/2}\epsilon^{-2})$,优于目前已知结果至少一个$\tilde{\mathcal{O}}(d^{1/3})$因子。通过我们在具有各种维度的合成数据上的实证研究,证实了我们提出的算法的效率提升。

更新时间: 2024-05-27 00:53:18

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.16734v1

Debiased Distribution Compression

Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.

Updated: 2024-05-27 00:47:25

标题: 去偏置的分布压缩

摘要: 现代压缩方法可以比独立同分布采样更简洁地总结目标分布$\mathbb{P}$,但需要访问一个类似于快速收敛到$\mathbb{P}$的马尔可夫链的低偏差输入序列。我们引入了一套适用于带偏序列压缩的新压缩方法。给定$n$个针对错误分布的点和二次时间,Stein核稀疏化(SKT)返回$\sqrt{n}$个等权重点,其与$\mathbb{P}$的最大均值差异(MMD)为$\widetilde{O}(n^{-1/2})$。对于更大规模的压缩任务,低秩SKT使用自适应低秩去偏程序以次二次时间实现相同的效果,这一程序本身可能具有独立的研究价值。对于支持单纯形或保持常数权重的下游任务,Stein重组和Stein乔列斯基实现了更大的简洁性,与SKT的保证相匹配,只需$\text{poly-log}(n)$个加权点。这些进展的基础是关于单纯形加权核心集质量、核矩阵谱衰减以及Stein核希尔伯特空间覆盖数的新保证。在我们的实验中,我们的技术提供了简洁准确的后验摘要,同时克服了由预烧期(burn-in)、近似马尔可夫链蒙特卡洛和回火(tempering)引起的偏差。

更新时间: 2024-05-27 00:47:25

领域: stat.ML,cs.LG,stat.CO,stat.ME

下载: http://arxiv.org/abs/2404.12290v2

Adaptive Batch Normalization Networks for Adversarial Robustness

Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, preventing its wide deployment in practical applications. In this paper, we aim at a non-AT defense: How to design a defense method that gets rid of AT but is still robust against strong adversarial attacks? To answer this question, we resort to adaptive Batch Normalization (BN), inspired by the recent advances in test-time domain adaptation. We propose a novel defense accordingly, referred to as the Adaptive Batch Normalization Network (ABNN). ABNN employs a pre-trained substitute model to generate clean BN statistics and sends them to the target model. The target model is exclusively trained on clean data and learns to align the substitute model's BN statistics. Experimental results show that ABNN consistently improves adversarial robustness against both digital and physically realizable attacks on both image and video datasets. Furthermore, ABNN can achieve higher clean data performance and significantly lower training time complexity compared to AT-based approaches.
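
A hedged sketch of the BN-statistics hand-off we infer from the abstract: the substitute model re-estimates batch-norm statistics on the incoming batch, and those statistics are copied into the target model's BN layers. The actual alignment training of the target model is omitted.

# Hedged sketch: run the substitute in train mode so its BN layers recompute
# batch statistics on x, then copy those statistics into the target model's
# BN layers. Architectures are toy stand-ins and assumed identical.
import torch
import torch.nn as nn

def copy_bn_stats(substitute, target, x):
    substitute.train()                 # BN re-estimates batch stats on x
    with torch.no_grad():
        substitute(x)
    for s_mod, t_mod in zip(substitute.modules(), target.modules()):
        if isinstance(s_mod, nn.BatchNorm2d) and isinstance(t_mod, nn.BatchNorm2d):
            t_mod.running_mean.copy_(s_mod.running_mean)
            t_mod.running_var.copy_(s_mod.running_var)

make_net = lambda: nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
substitute, target = make_net(), make_net()
copy_bn_stats(substitute, target, torch.randn(16, 3, 32, 32))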

Updated: 2024-05-27 00:38:08

标题: 自适应批量归一化网络用于对抗鲁棒性

摘要: 深度网络对抗性样本具有脆弱性。由于其显著的有效性,Adversarial Training (AT) 已成为现代对抗性防御方法的标准基础。然而,AT 非常耗时,限制了其在实际应用中的广泛部署。本文旨在提出一种非 AT 防御方法:如何设计一种在不使用 AT 的情况下仍然能够抵御强对抗性攻击的防御方法?为了回答这个问题,我们借鉴了最近在测试阶段领域自适应方面的进展,提出了一种新颖的防御方法,称为自适应批量归一化网络(ABNN)。ABNN 利用预训练的替代模型生成干净的 BN 统计信息,并将其发送给目标模型。目标模型仅在干净数据上进行训练,并学习调整替代模型的 BN 统计信息。实验结果显示,ABNN 在图像和视频数据集上持续提高对抗性鲁棒性,对数字和可实现的攻击均有效。此外,与基于 AT 的方法相比,ABNN 可以实现更高的干净数据性能和显著降低的训练时间复杂度。

更新时间: 2024-05-27 00:38:08

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.11708v2

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $\alpha>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $\theta_k$ and Markovian data $x_k$. This enables us to overcome the obstacles in existing analysis and establish for the first time the weak convergence of the joint process $(x_k, \theta_k)_{k\geq0}$. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates, given by $\mathbb{E}[\theta_\infty]-\theta^\ast=\alpha(b_\text{m}+b_\text{n}+b_\text{c})+O(\alpha^{3/2})$. Here, $b_\text{m}$ is associated with the Markovian noise, $b_\text{n}$ is tied to the nonlinearity, and notably, $b_\text{c}$ represents a multiplicative interaction between the Markovian noise and nonlinearity, which is absent in previous works. As a by-product of our analysis, we derive finite-time bounds on higher moment $\mathbb{E}[\|\theta_k-\theta^\ast\|^{2p}]$ and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.
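
A small numerical illustration of the constant-stepsize bias: a toy nonlinear SA update driven by a two-state Markov chain, run at several stepsizes so the time-averaged bias can be seen shrinking with $\alpha$. The update $f$ and the chain are our own toy choices, not from the paper.

# Hedged simulation: nonlinear SA with Markovian noise at constant stepsizes.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # Markov transition matrix
noise = np.array([-1.0, 0.5])              # state-dependent noise values
pi = np.ones(2) / 2
for _ in range(200):
    pi = pi @ P                            # stationary distribution (power iteration)

def f(theta, s):
    # nonlinear update with mean-field root theta* = 0 after centering the noise
    return theta + 0.1 * theta**3 + noise[s] - pi @ noise

for alpha in [0.1, 0.05, 0.025]:
    theta, s, acc = 0.0, 0, 0.0
    burn, T = 5_000, 100_000
    for k in range(T):
        s = rng.choice(2, p=P[s])          # Markovian data x_k
        theta -= alpha * f(theta, s)       # SA iterate theta_k
        if k >= burn:
            acc += theta
    print(f"alpha={alpha}: estimated bias ~ {acc / (T - burn):+.4f}")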

Updated: 2024-05-27 00:23:42

标题: 恒定步长下随机逼近中记忆与非线性的共谋

摘要: 在这项工作中,我们研究了具有马尔可夫数据和非线性更新的随机逼近(SA),步长$\alpha>0$恒定。现有工作主要集中在独立同分布数据或线性更新规则上。我们采取了新的视角,仔细研究了数据的马尔可夫依赖性和非线性更新规则的同时存在,描绘了这两种结构之间的相互作用如何导致以前的技术未能捕捉到的复杂性。通过利用SA更新的平滑性和循环性质,我们对SA迭代$\theta_k$与马尔可夫数据$x_k$之间的相关性进行了精细分析。这使我们能够克服现有分析中的障碍,并首次建立了联合过程$(x_k, \theta_k)_{k\geq0}$的弱收敛性。此外,我们提出了SA迭代的渐近偏差的精确表征,即 $\mathbb{E}[\theta_\infty]-\theta^\ast=\alpha(b_\text{m}+b_\text{n}+b_\text{c})+O(\alpha^{3/2})$。 这里,$b_\text{m}$与马尔可夫噪声相关,$b_\text{n}$与非线性相关,值得注意的是,$b_\text{c}$表示马尔可夫噪声和非线性之间的乘法相互作用,在先前的工作中是不存在的。作为我们分析的副产品,我们推导了关于更高矩的有限时间界限$\mathbb{E}[\|\theta_k-\theta^\ast\|^{2p}]$,并提出了迭代的非渐近几何收敛速度,以及一个中心极限定理。

更新时间: 2024-05-27 00:23:42

领域: stat.ML,cs.LG,math.OC,math.ST,stat.TH

下载: http://arxiv.org/abs/2405.16732v1

Pretraining with Random Noise for Fast and Robust Learning without Weight Transport

The brain prepares for learning even before interacting with the environment, by refining and optimizing its structures through spontaneous neural activity that resembles random noise. However, the mechanism of such a process has yet to be thoroughly understood, and it is unclear whether this process can benefit machine learning algorithms. Here, we study this issue using a neural network with a feedback alignment algorithm, demonstrating that pretraining neural networks with random noise increases learning efficiency as well as generalization abilities without weight transport. First, we found that random noise training modifies forward weights to match backward synaptic feedback, which is necessary for teaching errors by feedback alignment. As a result, a network with pre-aligned weights learns notably faster than a network without random noise training, even reaching a convergence speed comparable to that of a backpropagation algorithm. Sequential training with both random noise and data brings weights closer to synaptic feedback than training solely with data, enabling more precise credit assignment and faster learning. We also found that each readout probability approaches the chance level and that the effective dimensionality of weights decreases in a network pretrained with random noise. This pre-regularization allows the network to learn simple solutions of a low rank, reducing the generalization loss during subsequent training. This also enables the network to generalize robustly to a novel, out-of-distribution dataset. Lastly, we confirmed that random noise pretraining reduces the amount of meta-loss, enhancing the network's ability to adapt to various tasks. Overall, our results suggest that random noise training with feedback alignment offers a straightforward yet effective method of pretraining that facilitates quick and reliable learning without weight transport.
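
A hedged sketch of feedback alignment with a random-noise pretraining phase: errors are routed backward through a fixed random matrix B rather than the transposed forward weights (no weight transport), and pretraining on pure noise lets the forward weights drift toward alignment with B. Dimensions and hyperparameters are toy choices.

# Hedged sketch: two-layer network trained with feedback alignment (FA) on
# random noise inputs and random targets, then checked for forward-backward
# weight alignment. Illustrative of the mechanism, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 20, 32, 5
W1 = rng.standard_normal((d_h, d_in)) * 0.1
W2 = rng.standard_normal((d_out, d_h)) * 0.1
B = rng.standard_normal((d_h, d_out)) * 0.1   # fixed random feedback weights

def fa_step(x, y, lr=0.01):
    global W1, W2
    h = np.tanh(W1 @ x)
    e = W2 @ h - y                      # output error
    W2 -= lr * np.outer(e, h)
    dh = (B @ e) * (1 - h**2)           # FA: route error through B, not W2.T
    W1 -= lr * np.outer(dh, x)

# Pretraining phase: pure random-noise inputs and random targets
for _ in range(2000):
    fa_step(rng.standard_normal(d_in), rng.standard_normal(d_out))

# Diagnostic: alignment between forward weights and the feedback matrix
cos = np.sum(W2 * B.T) / (np.linalg.norm(W2) * np.linalg.norm(B))
print(f"cosine(W2, B^T) after noise pretraining: {cos:.3f}")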

Updated: 2024-05-27 00:12:51

标题: 使用随机噪声进行预训练,实现快速和稳健的学习,无需权重传输

摘要: 大脑在与环境互动之前甚至准备学习,通过类似随机噪音的自发神经活动来完善和优化其结构。然而,这种过程的机制尚未被彻底理解,也不清楚这一过程是否有助于机器学习算法。在这里,我们使用带有反馈对齐算法的神经网络研究了这个问题,证明了使用随机噪音对神经网络进行预训练可以提高学习效率和泛化能力,而无需权重传递。首先,我们发现随机噪音训练修改前向权重以匹配反向突触反馈,这对通过反馈对齐教导错误是必要的。因此,具有预对齐权重的网络学习速度显著快于没有随机噪音训练的网络,甚至达到与反向传播算法相媲美的收敛速度。随机噪音和数据的顺序训练使权重更接近突触反馈,比仅使用数据训练更容易进行精确的信用分配和更快的学习。我们还发现随机噪音预训练的网络中,每个输出概率都接近机会水平,而权重的有效维度减小。这种预正则化使网络能够学习低秩简单解决方案,在后续训练过程中减少泛化损失。这还使网络能够稳健地推广到一个新的、超出分布的数据集。最后,我们确认随机噪音预训练减少了元损失的数量,增强了网络适应各种任务的能力。总的来说,我们的结果表明,使用反馈对齐的随机噪音训练提供了一种简单而有效的预训练方法,促进了快速和可靠的学习,而无需权重传输。

更新时间: 2024-05-27 00:12:51

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2405.16731v1

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues include but are not limited to high sample complexity, which relates to inaccurate approximation of the black-box function; and insufficient coverage and exploration of input design modes, which leads to suboptimal proposal of new input designs. In this work, we consider finding a latent space that serves as a compressed yet accurate representation of the design-value joint space, enabling effective latent exploration of high-value input design modes. To this end, we formulate a learnable energy-based latent space, and propose the Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov Chain Monte Carlo. The optimization process is then the exploration of high-value designs guided by the learned energy-based model in the latent space, formulated as gradient-based sampling from a latent-variable-parameterized inverse model. We show that our particular parameterization encourages expanded exploration around high-value design modes, motivated by inverting a fundamental result on conditional covariance matrices typically used for variance reduction. We observe that our method, backed by an accurately learned informative latent space and an expanding-exploration model design, yields significant improvements over strong previous methods on both synthetic and real-world datasets such as the design-bench suite.

Updated: 2024-05-27 00:11:53

标题: 潜在能量驱动的奥德赛:通过在基于能量的潜在空间中进行扩展探索实现黑盒优化

摘要: 离线黑盒优化(BBO)旨在利用预先收集的函数值和相应输入设计的离线数据集的知识来优化黑盒函数。然而,黑盒函数的高维和高度多模态的输入设计空间对大多数现有的直接建模和操作输入设计的方法提出了固有的挑战。这些问题包括但不限于高样本复杂性,与黑盒函数的不准确逼近有关;以及输入设计模式的覆盖范围和探索不足,导致新输入设计的次优提议。在这项工作中,我们考虑找到一个作为设计-值联合空间的压缩但准确表示的潜在空间,从而实现高价值输入设计模式的有效潜在探索。为此,我们制定了一个可学习的基于能量的潜在空间,并提出了噪声增强的望远密度比估计(NTRE)方案,用于变分学习准确的潜在空间模型,而无需昂贵的马尔可夫链蒙特卡洛。然后,优化过程是在潜在空间中由学习的基于能量的模型引导的高价值设计的探索,被规定为从一个潜在变量参数化的逆模型中进行梯度采样。我们展示了我们特定的参数化鼓励围绕高价值设计模式进行扩展探索,这受到通常用于方差缩减的条件协方差矩阵基本结果的反转思维的激励。我们观察到,我们的方法,支持准确学习的信息丰富的潜在空间和扩展探索模型设计,对合成和真实世界数据集(如设计基准套件)均取得了显著的改进。

更新时间: 2024-05-27 00:11:53

领域: cs.LG,cs.AI,stat.AP

下载: http://arxiv.org/abs/2405.16730v1

Free-Space Optical Channel Turbulence Prediction: A Machine Learning Approach

Channel turbulence presents a formidable obstacle for free-space optical (FSO) communication. Anticipation of turbulence levels is highly important for mitigating disruptions. We study the application of machine learning (ML) to FSO data streams to rapidly predict channel turbulence levels with no additional sensing hardware. An optical bit stream was transmitted through a controlled channel in the lab under six distinct turbulence levels, and the efficacy of using ML to classify turbulence levels was examined. ML-based turbulence level classification was found to be >98% accurate with multiple ML training parameters, but highly dependent upon the timescale of changes between turbulence levels.
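
A minimal sketch of the kind of pipeline the abstract implies, with synthetic signals standing in for the lab measurements: window the received intensity, extract simple statistics (including a scintillation-index-like feature), and classify the turbulence level with an off-the-shelf model. Feature and model choices here are our assumptions.

# Hedged sketch: classify synthetic turbulence levels from windowed
# signal statistics. Synthetic data stands in for the lab bit stream.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
levels, windows, n = 6, 300, 256
X, y = [], []
for lvl in range(levels):
    for _ in range(windows):
        sig = 1.0 + rng.normal(0, 0.05 * (lvl + 1), n)   # stronger fading per level
        # mean, std, and a scintillation-index-like feature var/mean^2
        X.append([sig.mean(), sig.std(), sig.var() / sig.mean() ** 2])
        y.append(lvl)
X, y = np.array(X), np.array(y)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
print(f"held-out accuracy: {clf.score(Xte, yte):.3f}")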

Updated: 2024-05-27 00:08:36

标题: 自由空间光通道湍流预测:一种机器学习方法

摘要: 通道湍流对自由空间光(FSO)通信构成了巨大的障碍。预测湍流水平对于减轻干扰至关重要。我们研究了将机器学习(ML)应用于FSO数据流,以快速预测通道湍流水平,而无需额外的传感硬件。在实验室中通过受控通道传输了光比特流,涵盖了六个不同的湍流水平,并检查了使用ML对湍流水平进行分类的有效性。发现基于ML的湍流水平分类在多个ML训练参数下的准确率超过98%,但高度取决于湍流水平之间的变化时间尺度。

更新时间: 2024-05-27 00:08:36

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2405.16729v1

By Xinhai (Sean) Zou.