What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
Severe data imbalance naturally exists in web-scale vision-language datasets. Despite this, we find that CLIP pre-trained on such data exhibits notable robustness to the data imbalance compared to supervised learning, and is significantly more effective at learning generalizable representations. To investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present at each training step. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, all of which are inaccessible to supervised learning. Our study not only uncovers the mechanisms behind CLIP's generalizability beyond data imbalance but also provides transferable insights for the research community. The findings are validated in both supervised and self-supervised learning, enabling models trained on imbalanced data to achieve CLIP-level performance on diverse recognition tasks. Code and data are available at: https://github.com/CVMI-Lab/clip-beyond-tail.
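The "dynamic classification" view can be made concrete with a minimal NumPy sketch of CLIP's symmetric contrastive objective: in each batch, every image is classified against only the captions that happen to co-occur in that batch, so the effective label set changes every step. Function names and the temperature value are illustrative, not taken from the paper's code.

```python
import numpy as np

def clip_batch_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over one batch: each image's own caption is the
    positive; only the captions present in this batch act as negatives,
    so the 'classes' of the classification problem change every step."""
    # L2-normalize both embedding sets
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B) similarity matrix
    labels = np.arange(len(logits))      # positives sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matched image/text pairs should yield a lower loss than randomly paired embeddings, reflecting the per-batch classification structure.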
Updated: 2024-10-27 23:53:20
Categories: cs.CV,cs.CL,cs.LG
Language Models And A Second Opinion Use Case: The Pocket Professional
This research tests the role of Large Language Models (LLMs) as formal second-opinion tools in professional decision-making, particularly focusing on complex medical cases where even experienced physicians seek peer consultation. The work analyzed 183 challenging medical cases from Medscape over a 20-month period, testing multiple LLMs' performance against crowd-sourced physician responses. A key finding was the high overall score attainable by the latest foundational models (>80% accuracy compared to consensus opinion), which exceeds most human metrics reported on the same clinical cases (450 pages of patient profiles and test results). The study documents the LLMs' performance disparity between straightforward cases (>81% accuracy) and complex scenarios (43% accuracy), particularly those cases that generated substantial debate among human physicians. The research demonstrates that LLMs may be valuable as generators of comprehensive differential diagnoses rather than as primary diagnostic tools, potentially helping to counter cognitive biases in clinical decision-making, reduce cognitive load, and thus remove some sources of medical error. The inclusion of a second, comparative legal dataset (Supreme Court cases, N=21) provides added empirical context for using AI to foster second opinions, though these legal challenges proved considerably easier for LLMs to analyze. In addition to its original empirical evidence on LLM accuracy, the research aggregated a novel benchmark that lets others score answer reliability on highly contested questions, for both LLMs and disagreeing human practitioners. These results suggest that the optimal deployment of LLMs in professional settings may differ substantially from current approaches that emphasize automation of routine tasks.
Updated: 2024-10-27 23:48:47
Categories: cs.CR,cs.AI
Learning Continually by Spectral Regularization
Loss of plasticity is a phenomenon where neural networks can become more difficult to train over the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good performance while maintaining network trainability. We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning. From this perspective, we derive a new spectral regularizer for continual learning that better sustains these beneficial initialization properties throughout training. In particular, the regularizer keeps the maximum singular value of each layer close to one. Spectral regularization directly ensures that gradient diversity is maintained throughout training, which promotes continual trainability, while minimally interfering with performance in a single task. We present an experimental analysis that shows how the proposed spectral regularizer can sustain trainability and performance across a range of model architectures in continual supervised and reinforcement learning settings. Spectral regularization is less sensitive to hyperparameters while demonstrating better training in individual tasks, sustaining trainability as new tasks arrive, and achieving better generalization performance.
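As a hedged illustration of the idea that the regularizer "keeps the maximum singular value of each layer close to one", one could penalize the squared deviation of each layer's top singular value from one. The exact form used in the paper may differ, and a practical implementation would likely estimate the top singular value with power iteration rather than a full SVD.

```python
import numpy as np

def spectral_penalty(weights):
    """Sum of (sigma_max - 1)^2 over layers: pushes each layer's largest
    singular value toward one, the initialization-like regime the
    abstract describes. Illustrative form, full SVD for simplicity."""
    return sum((np.linalg.svd(W, compute_uv=False)[0] - 1.0) ** 2
               for W in weights)
```

The penalty vanishes for orthogonal-like layers (all singular values at one) and grows as any layer's spectral norm drifts away from it.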
Updated: 2024-10-27 23:45:38
Categories: cs.LG
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Despite the superior practical performance, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a theoretical framework to analyze the impact of approximate inference in stochastic linear bandits and conduct regret analysis on two Bayesian bandit algorithms, Linear Thompson sampling (LinTS) and the extension of Bayesian Upper Confidence Bound, namely Linear Bayesian Upper Confidence Bound (LinBUCB). We demonstrate that when applied in the presence of approximate inference, LinTS and LinBUCB can preserve their original rates of regret upper bound but with a sacrifice of larger constant terms. These results hold for general Bayesian inference approaches, assuming the inference error measured by two different $\alpha$-divergences is bounded. Additionally, by introducing a new definition of well-behaved distributions, we show that LinBUCB expedites the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate. To our knowledge, this work provides the first regret bounds in the setting of stochastic linear bandits with bounded approximate inference errors.
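For readers unfamiliar with the baseline being analyzed, here is a minimal NumPy sketch of one round of Linear Thompson Sampling with exact (not approximate) Gaussian posterior inference; the paper's contribution concerns what happens when this posterior is replaced by an approximation with bounded α-divergence error. The noise scale and posterior form below are textbook assumptions.

```python
import numpy as np

def lin_ts_step(V, b, arms, rng, sigma2=1.0):
    """One round of Linear Thompson Sampling: sample a parameter from the
    Gaussian posterior N(V^-1 b, sigma2 * V^-1), then play the arm whose
    feature vector maximizes the sampled reward estimate."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                                  # posterior mean
    theta_s = rng.multivariate_normal(theta_hat, sigma2 * V_inv)
    return int(np.argmax(arms @ theta_s))

def lin_ts_update(V, b, x, reward):
    """Rank-one posterior update after observing `reward` for feature x."""
    return V + np.outer(x, x), b + reward * x
```

With a tight posterior concentrated on one arm, the step reliably selects that arm; approximate inference perturbs the sampling distribution, which is exactly the regime the regret analysis covers.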
Updated: 2024-10-27 23:43:15
Categories: stat.ML,cs.LG
Enhancing Domain Adaptation through Prompt Gradient Alignment
Prior Unsupervised Domain Adaptation (UDA) methods often aim to train a domain-invariant feature extractor, which may hinder the model from learning sufficiently discriminative features. To tackle this, a line of works based on prompt learning leverages the power of large-scale pre-trained vision-language models to learn both domain-invariant and specific features through a set of domain-agnostic and domain-specific learnable prompts. Those studies typically enforce invariant constraints on representation, output, or prompt space to learn such prompts. Differently, we cast UDA as a multiple-objective optimization problem in which each objective is represented by a domain loss. Under this new framework, we propose aligning per-objective gradients to foster consensus between them. Additionally, to prevent potential overfitting when fine-tuning this deep learning architecture, we penalize the norm of these gradients. To achieve these goals, we devise a practical gradient update procedure that can work under both single-source and multi-source UDA. Empirically, our method consistently surpasses other prompt-based baselines by a large margin on different UDA benchmarks.
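A rough surrogate for the two ingredients named in the abstract, rewarding gradient consensus across domain objectives and penalizing gradient norms, might look as follows. The cosine-based consensus measure and the coefficients are illustrative assumptions, not the paper's actual gradient update procedure.

```python
import numpy as np

def alignment_objective(g_src, g_tgt, lam=1.0, gamma=0.1):
    """Illustrative surrogate: (1 - cosine similarity) between the two
    per-domain gradients encourages consensus, and the norm term curbs
    overfitting during fine-tuning, as the abstract motivates."""
    cos = g_src @ g_tgt / (np.linalg.norm(g_src) * np.linalg.norm(g_tgt))
    norms = np.linalg.norm(g_src) + np.linalg.norm(g_tgt)
    return lam * (1.0 - cos) + gamma * norms
```

Identical gradients incur only the norm penalty; opposed gradients additionally pay the full misalignment cost.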
Updated: 2024-10-27 23:40:32
Categories: cs.LG,cs.CV
Plastic Learning with Deep Fourier Features
Deep neural networks can struggle to learn continually in the face of non-stationarity. This phenomenon is known as loss of plasticity. In this paper, we identify underlying principles that lead to plastic algorithms. In particular, we provide theoretical results showing that linear function approximation, as well as a special case of deep linear networks, do not suffer from loss of plasticity. We then propose deep Fourier features, which are the concatenation of a sine and cosine in every layer, and we show that this combination provides a dynamic balance between the trainability obtained through linearity and the effectiveness obtained through the nonlinearity of neural networks. Deep networks composed entirely of deep Fourier features are highly trainable and sustain their trainability over the course of learning. Our empirical results show that continual learning performance can be drastically improved by replacing ReLU activations with deep Fourier features. These results hold for different continual learning scenarios (e.g., label noise, class incremental learning, pixel permutations) on all major supervised learning datasets used for continual learning research, such as CIFAR10, CIFAR100, and tiny-ImageNet.
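The deep Fourier feature layer described above, "the concatenation of a sine and cosine in every layer", can be sketched directly. A side effect worth noting: the activations are automatically bounded, since sin² + cos² = 1 for each pre-activation unit. Weight shapes here are illustrative.

```python
import numpy as np

def fourier_layer(x, W, b):
    """One deep Fourier layer: apply sin and cos to the same
    pre-activation and concatenate, doubling the feature width."""
    z = x @ W + b
    return np.concatenate([np.sin(z), np.cos(z)], axis=-1)
```

Stacking such layers (as the abstract proposes, replacing ReLU throughout) yields a network whose units stay in a well-conditioned range over the course of learning.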
Updated: 2024-10-27 23:38:06
Categories: cs.LG
LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning
Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input question. However, CoT prompting, which includes crucial intermediate reasoning steps (rationales) within its examples, necessitates selecting examples based on these rationales rather than the questions themselves. Existing methods require human experts or pre-trained LLMs to describe the skill, a high-level abstraction of rationales, to guide the selection. These methods, however, are often costly and difficult to scale. Instead, this paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales, with a latent variable called a reasoning skill. Concurrently, LaRS learns a reasoning policy to determine the required reasoning skill for a given question. Then the ICL examples are selected by aligning the reasoning skills between past examples and the question. This approach is theoretically grounded and compute-efficient, eliminating the need for auxiliary LLM inference or manual prompt design. Empirical results demonstrate that LaRS consistently outperforms SOTA skill-based selection methods, processing example banks four times faster, reducing LLM inferences during the selection stage by half, and showing greater robustness to sub-optimal example banks.
Updated: 2024-10-27 23:36:25
Categories: cs.CL,cs.AI
KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge
Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-guided refinement to construct a semantically coherent hierarchical structure of entity clusters. By incorporating this hierarchical knowledge along with textual information during the fine-tuning process, KG-FIT effectively captures both global semantics from the LLM and local semantics from the KG. Extensive experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods, achieving improvements of 14.4%, 13.5%, and 11.9% in the Hits@10 metric for the link prediction task, respectively. Furthermore, KG-FIT yields substantial performance gains of 12.6%, 6.7%, and 17.7% compared to the structure-based base models upon which it is built. These results highlight the effectiveness of KG-FIT in incorporating open-world knowledge from LLMs to significantly enhance the expressiveness and informativeness of KG embeddings.
Updated: 2024-10-27 23:31:49
Categories: cs.CL,cs.LG
Schrödinger Bridge with Quadratic State Cost is Exactly Solvable
Schr\"{o}dinger bridge is a diffusion process that steers a given distribution to another in a prescribed time while minimizing the effort to do so. It can be seen as the stochastic dynamical version of the optimal mass transport, and has growing applications in generative diffusion models and stochastic optimal control. {\black{We say a Schr\"{o}dinger bridge is ``exactly solvable'' if the associated uncontrolled Markov kernel is available in closed form, since then the bridge can be numerically computed using dynamic Sinkhorn recursion for arbitrary endpoint distributions with finite second moments.}} In this work, we propose a regularized variant of the Schr\"{o}dinger bridge with a quadratic state cost-to-go that incentivizes the optimal sample paths to stay close to a nominal level. Unlike the conventional Schr\"{o}dinger bridge, the regularization induces a state-dependent rate of killing and creation of probability mass, and its solution requires determining the Markov kernel of a reaction-diffusion partial differential equation. We derive this Markov kernel in closed form, {\black{showing that the regularized Schr\"{o}dinger bridge is exactly solvable, even for non-Gaussian endpoints. This advances the state-of-the-art because closed form Markov kernel for the regularized Schr\"{o}dinger bridge is available in existing literature only for Gaussian endpoints}}. Our solution recovers the heat kernel in the vanishing regularization (i.e., diffusion without reaction) limit, thereby recovering the solution of the conventional Schr\"{o}dinger bridge {\black{as a special case}}. We deduce properties of the new kernel and explain its connections with certain exactly solvable models in quantum mechanics.
Updated: 2024-10-27 23:21:54
Categories: math.OC,cs.LG,cs.SY,eess.SY,math-ph,math.MP,stat.ML
Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains
Federated learning (FL) allows collaborative machine learning training without sharing private data. While most FL methods assume identical data domains across clients, real-world scenarios often involve heterogeneous data domains. Federated Prototype Learning (FedPL) addresses this issue, using mean feature vectors as prototypes to enhance model generalization. However, existing FedPL methods create the same number of prototypes for each client, leading to cross-domain performance gaps and disparities for clients with varied data distributions. To mitigate cross-domain feature representation variance, we introduce FedPLVM, which establishes variance-aware dual-level prototypes clustering and employs a novel $\alpha$-sparsity prototype loss. The dual-level prototypes clustering strategy creates local clustered prototypes based on private data features, then performs global prototypes clustering to reduce communication complexity and preserve local data privacy. The $\alpha$-sparsity prototype loss aligns samples from underrepresented domains, enhancing intra-class similarity and reducing inter-class similarity. Evaluations on Digit-5, Office-10, and DomainNet datasets demonstrate our method's superiority over existing approaches.
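The basic FedPL building block, a prototype as the mean feature vector of a class, is simple to sketch; FedPLVM's variance-aware dual-level clustering and α-sparsity loss operate on top of such prototypes. The function below is a generic illustration, not the paper's implementation.

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype = mean feature vector per class. A client would compute
    these locally before any clustering or aggregation step."""
    return {int(c): features[labels == c].mean(axis=0)
            for c in np.unique(labels)}
```

In the federated setting, only these prototypes (or clustered versions of them), rather than raw features, are shared, which is what preserves local data privacy.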
Updated: 2024-10-27 23:21:07
Categories: cs.LG,cs.CV
TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation
Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its inherent heterogeneous data types, complex inter-correlations, and intricate column-wise distributions. In this paper, we introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data, where we propose feature-wise learnable diffusion processes to counter the high disparity of different feature distributions. TabDiff is parameterized by a transformer handling different input types, and the entire framework can be efficiently optimized in an end-to-end fashion. We further introduce a multi-modal stochastic sampler to automatically correct the accumulated decoding error during sampling, and propose classifier-free guidance for conditional missing column value imputation. Comprehensive experiments on seven datasets demonstrate that TabDiff achieves superior average performance over existing competitive baselines across all eight metrics, with up to $22.5\%$ improvement over the state-of-the-art model on pair-wise column correlation estimations. Code is available at https://github.com/MinkaiXu/TabDiff.
Updated: 2024-10-27 22:58:47
Categories: cs.LG
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. However, current LoRA optimizers lack transformation invariance, meaning the actual updates to the weights depend on how the two LoRA factors are scaled or rotated. This deficiency leads to inefficient learning and sub-optimal solutions in practice. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization, which achieves transformation invariance while remaining computationally efficient. We provide theoretical analysis to demonstrate the benefit of our method and conduct experiments on various LLM tasks with different models, including Gemma 2B, 7B, and mT5-XXL. The results demonstrate consistent improvements over existing optimizers. For example, replacing Adam with LoRA-RITE during LoRA fine-tuning of Gemma-2B yielded a 4.6% accuracy gain on Super-Natural Instructions and a 3.5% accuracy gain across four other LLM benchmarks (HellaSwag, ArcChallenge, GSM8K, OpenBookQA).
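The invariance deficiency can be demonstrated numerically: rescaling the factors as (cB, A/c) leaves the composed weight W = BA unchanged, yet a plain SGD step on the factors produces different weight updates for the two parameterizations. The gradient formulas below follow from the chain rule; LoRA-RITE's actual preconditioner is not reproduced here.

```python
import numpy as np

def lora_weight(B, A):
    """Composed LoRA weight update W = B @ A."""
    return B @ A

def sgd_step(B, A, G, lr=0.1):
    """One vanilla SGD step on the factors for a loss whose full-weight
    gradient is G: dL/dB = G A^T, dL/dA = B^T G. Not transformation
    invariant, which is the deficiency LoRA-RITE targets."""
    return B - lr * G @ A.T, A - lr * B.T @ G
```

Two parameterizations of the identical weight diverge after a single step, so the optimization trajectory depends on an arbitrary factorization choice.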
Updated: 2024-10-27 22:57:12
Categories: cs.LG,cs.AI,cs.CL
Geometric Collaborative Filtering with Convergence
Latent variable collaborative filtering methods have been a standard approach to modelling user-click interactions due to their simplicity and effectiveness. However, there is limited work on analyzing the mathematical properties of these methods, in particular on preventing overfitting towards the identity, and such methods typically utilize loss functions that overlook the geometry between items. In this work, we introduce a notion of generalization gap in collaborative filtering and analyze it for latent collaborative filtering models. We present a geometric upper bound that gives rise to loss functions, and a way to meaningfully utilize the geometry of item metadata to improve recommendations. We show how these losses can be minimized and give the recipe for a new latent collaborative filtering algorithm, which we refer to as GeoCF due to the geometric nature of our results. We then show experimentally that our proposed GeoCF algorithm outperforms all existing methods on the Movielens20M and Netflix datasets, as well as on two large-scale internal datasets. In summary, our work proposes a theoretically sound method that paves the way to better understanding generalization in collaborative filtering at large.
Updated: 2024-10-27 22:56:17
Categories: cs.IR,cs.LG,cs.SY,eess.SY
Few-Shot Transfer Learning for Individualized Braking Intent Detection on Neuromorphic Hardware
Objective: This work explores the use of a few-shot transfer learning method to train and implement a convolutional spiking neural network (CSNN) on a BrainChip Akida AKD1000 neuromorphic system-on-chip, developing individual-level models from electroencephalographic data instead of the traditionally used group-level models. Main Results: We demonstrate the efficacy of this methodology for developing individual-specific braking-intention prediction models: the group-level model is rapidly adapted in as few as three training epochs while achieving at least 90% accuracy, true positive rate, and true negative rate. Further, the results show the energy efficiency of the neuromorphic hardware: network inference on the Akida AKD1000 processor reduces power by over 97%, with only a 1.3× increase in latency, compared to an Intel Xeon central processing unit. Similar results were obtained in a subsequent ablation study using a subset of five of the 19 channels.
Updated: 2024-10-27 22:52:33
Categories: cs.NE,cs.LG,eess.SP
Kernel Approximation of Fisher-Rao Gradient Flows
The purpose of this paper is to answer a few open questions in the interface of kernel methods and PDE gradient flows. Motivated by recent advances in machine learning, particularly in generative modeling and sampling, we present a rigorous investigation of Fisher-Rao and Wasserstein type gradient flows concerning their gradient structures, flow equations, and their kernel approximations. Specifically, we focus on the Fisher-Rao (also known as Hellinger) geometry and its various kernel-based approximations, developing a principled theoretical framework using tools from PDE gradient flows and optimal transport theory. We also provide a complete characterization of gradient flows in the maximum-mean discrepancy (MMD) space, with connections to existing learning and inference algorithms. Our analysis reveals precise theoretical insights linking Fisher-Rao flows, Stein flows, kernel discrepancies, and nonparametric regression. We then rigorously prove evolutionary $\Gamma$-convergence for kernel-approximated Fisher-Rao flows, providing theoretical guarantees beyond pointwise convergence. Finally, we analyze energy dissipation using the Helmholtz-Rayleigh principle, establishing important connections between classical theory in mechanics and modern machine learning practice. Our results provide a unified theoretical foundation for understanding and analyzing approximations of gradient flows in machine learning applications through a rigorous gradient flow and variational method perspective.
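One object the paper works with, the maximum-mean discrepancy, admits a short numerical sketch: the biased V-statistic estimator with a Gaussian kernel. The bandwidth choice below is illustrative, and this is standard background rather than the paper's construction.

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased V-statistic estimate of squared MMD between samples X and Y
    under a Gaussian kernel: E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

The estimate is zero when the two samples coincide and grows as the distributions separate, which is the sense in which gradient flows "in MMD space" decrease a distance to the target.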
Updated: 2024-10-27 22:52:08
Categories: stat.ML,cs.LG,math.AP
Pruning neural network models for gene regulatory dynamics using data and domain knowledge
The practical utility of machine learning models in the sciences often hinges on their interpretability. It is common to assess a model's merit for scientific discovery, and thus novel insights, by how well it aligns with already available domain knowledge--a dimension that is currently largely disregarded in the comparison of neural network models. While pruning can simplify deep neural network architectures and excels in identifying sparse models, as we show in the context of gene regulatory network inference, state-of-the-art techniques struggle with biologically meaningful structure learning. To address this issue, we propose DASH, a generalizable framework that guides network pruning by using domain-specific structural information in model fitting and leads to sparser, better interpretable models that are more robust to noise. Using both synthetic data with ground truth information, as well as real-world gene expression data, we show that DASH, using knowledge about gene interaction partners within the putative regulatory network, outperforms general pruning methods by a large margin and yields deeper insights into the biological systems being studied.
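One hedged way to picture "guiding pruning with domain-specific structural information" is to blend weight magnitude with a prior score over connections (e.g., known gene-interaction partners) when choosing which weights to keep. The blending rule below is purely illustrative and is not DASH's actual criterion.

```python
import numpy as np

def guided_prune_mask(W, prior, sparsity=0.5, alpha=0.5):
    """Keep the (1 - sparsity) fraction of weights with the highest score,
    where score blends normalized magnitude with a domain prior in [0, 1].
    alpha = 0 recovers plain magnitude pruning."""
    score = (1.0 - alpha) * np.abs(W) / np.abs(W).max() + alpha * prior
    k = int(round((1.0 - sparsity) * W.size))
    thresh = np.sort(score, axis=None)[-k]
    return score >= thresh
```

With a nonzero prior, a small weight on a biologically supported edge can survive pruning ahead of a larger but unsupported one, which is the kind of structure-aware sparsity the abstract argues for.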
Updated: 2024-10-27 22:50:23
Categories: cs.LG,q-bio.QM,stat.AP,stat.ML
GemNet: Menu-Based, Strategy-Proof Multi-Bidder Auctions Through Deep Learning
Automated mechanism design (AMD) uses computational methods for mechanism design. Differentiable economics is a form of AMD that uses deep learning to learn mechanism designs and has enabled strong progress in AMD in recent years. Nevertheless, a major open problem has been to learn multi-bidder, general, and fully strategy-proof (SP) auctions. We introduce GEneral Menu-based NETwork (GemNet), which significantly extends the menu-based approach of the single-bidder RochetNet (D\"utting et al., 2024) to the multi-bidder setting. The challenge in achieving SP is to learn bidder-independent menus that are feasible, so that the optimal menu choices for each bidder do not over-allocate items when taken together (we call this menu compatibility). GemNet penalizes the failure of menu compatibility during training, and transforms learned menus after training through price changes, by considering a set of discretized bidder values and reasoning about Lipschitz smoothness to guarantee menu compatibility on the entire value space. This approach is general, leaving trained menus that already satisfy menu compatibility undisturbed and reducing to RochetNet for a single bidder. Mixed-integer linear programs are used for menu transforms, and through a number of optimizations enabled by deep learning, including adaptive grids and methods to skip menu elements, we scale to large auction design problems. GemNet learns auctions with better revenue than affine maximization methods, achieves exact SP whereas previous general multi-bidder methods are approximately SP, and offers greatly enhanced interpretability.
Updated: 2024-10-27 22:47:25
Categories: cs.GT,cs.AI,cs.LG
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning but presents challenges due to balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but with a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves the highest win rate of 65% on TutorEval and around 60% win rates across other different datasets, outperforming standard BoN with the same computational cost and showcasing its scalability and alignment efficacy.
Updated: 2024-10-27 22:30:42
Categories: cs.CL,cs.AI,cs.LG
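TreeBoN's branch-and-prune idea can be made concrete with a minimal sketch. The `generate_step` and `reward` callables below are toy stand-ins (a digit-appending "model" and a digit-sum "reward"), not the paper's LLM or its DPO token-level rewards:

```python
import random

def tree_bon(prompt, generate_step, reward, width=4, keep=2, depth=3):
    # Each round: branch every surviving partial response `width` ways,
    # score the partials, and keep only the top `keep` -- pruning
    # low-quality paths early instead of paying for N full generations
    # as in plain best-of-N.
    beams = [prompt]
    for _ in range(depth):
        candidates = [generate_step(b) for b in beams for _ in range(width)]
        candidates.sort(key=reward, reverse=True)
        beams = candidates[:keep]
    return max(beams, key=reward)

# Toy stand-ins: the "model" appends a random digit; the "reward"
# (playing the role of token-level DPO rewards) favours large digits.
random.seed(0)
step = lambda s: s + str(random.randint(0, 9))
score = lambda s: sum(int(c) for c in s)
best = tree_bon("", step, score)
```

With `width=4` and `keep=2`, each round scores 8 partial responses rather than carrying all `4**depth` full generations, which is the source of the compute savings over plain BoN.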
Zero-Trust Network Access (ZTNA)
Zero-Trust Network Access (ZTNA) marks a significant shift in network security by adopting a "never trust, always verify" approach. This work provides an in-depth analysis of ZTNA, offering a comprehensive framework for understanding its principles, architectures, and applications. We discuss its role in securing modern, complex network environments, which include cloud platforms, Internet of Things (IoT) devices, and hybrid enterprise networks. Our objective is to create a key resource for researchers and practitioners by reviewing critical methodologies, analyzing current implementations, and highlighting open challenges and research directions.
Updated: 2024-10-27 22:01:50
Categories: cs.CR,cs.NI
Evaluating the design space of diffusion-based generative models
Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting in training that qualitatively agree with the ones used in [Karras et al., 2022]. It also provides perspectives on the choices of time and variance schedules in sampling: when the score is well trained, the design in [Song et al., 2021] is more preferable, but when it is less trained, the design in [Karras et al., 2022] becomes more preferable.
Updated: 2024-10-27 21:51:56
Categories: cs.LG,math.DS,math.OC,math.PR,stat.ML
Understanding Sarcoidosis Using Large Language Models and Social Media Data
Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of LLMs in accurately identifying sarcoidosis-related content. We discovered a wide array of symptoms reported by patients, with fatigue, swollen lymph nodes, and shortness of breath as the most prevalent. Prednisone was the most prescribed medication, while infliximab showed the highest effectiveness in improving prognoses. Notably, our analysis revealed disparities in prognosis based on age and gender, with women and younger patients experiencing good and polarized outcomes, respectively. Furthermore, unsupervised clustering identified three distinct patient subgroups (phenotypes) with unique symptom profiles, prognostic outcomes, and demographic distributions. Finally, sentiment analysis revealed a moderate negative impact on patients' mental health post-diagnosis, particularly among women and younger individuals. Our study represents the first application of LLMs to understand sarcoidosis through social media data. It contributes to understanding the disease by providing data-driven insights into its manifestations, treatments, prognoses, and impact on patients' lives. Our findings have direct implications for improving personalized treatment strategies and enhancing the quality of care for individuals living with sarcoidosis.
Updated: 2024-10-27 21:48:23
Categories: cs.CL,cs.AI,cs.SI
Images that Sound: Composing Images and Sounds on a Single Canvas
Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these visual spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/
Updated: 2024-10-27 21:47:14
Categories: cs.CV,cs.LG,cs.MM,cs.SD,eess.AS
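The core trick, denoising with both diffusion models in parallel, can be caricatured in a few lines. The quadratic "models" below are toy stand-ins for the pre-trained text-to-image and text-to-spectrogram networks; each one's noise prediction pulls toward its own target, and averaging the two yields a sample acceptable to both:

```python
import numpy as np

def joint_reverse_step(x, eps_audio, eps_image, sigma):
    # Average the two models' noise predictions -- the paper's core
    # idea: a single latent is denoised so it stays likely under
    # both the audio model and the image model.
    eps = 0.5 * (eps_audio(x) + eps_image(x))
    return x - sigma * eps

# Toy stand-ins for the diffusion models: each "predicts noise" that
# points away from its own preferred target.
target_audio = np.array([2.0, 0.0])
target_image = np.array([0.0, 2.0])
eps_a = lambda x: x - target_audio
eps_i = lambda x: x - target_image

x = np.array([8.0, -5.0])
for _ in range(50):
    x = joint_reverse_step(x, eps_a, eps_i, sigma=0.2)
```

In this caricature the iterate settles at the midpoint of the two targets, the point jointly "likely" under both toy models; the real method performs the same averaging in the shared latent space of the two diffusion models.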
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
Recent studies have revealed that neural networks learn interpretable algorithms for many simple problems. However, little is known about how these algorithms emerge during training. In this article, I study the training dynamics of a small neural network with 2-dimensional embeddings on the problem of modular addition. I observe that embedding vectors tend to organize into two types of structures: grids and circles. I study these structures and explain their emergence as a result of two simple tendencies exhibited by pairs of embeddings: clustering and alignment. I propose explicit formulae for these tendencies as interaction forces between different pairs of embeddings. To show that my formulae can fully account for the emergence of these structures, I construct an equivalent particle simulation where I show that identical structures emerge. I discuss the role of weight decay in my setup and reveal a new mechanism that links regularization and training dynamics. To support my findings, I also release an interactive demo available at https://modular-addition.vercel.app/.
Updated: 2024-10-27 21:40:33
Categories: cs.LG
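The clustering tendency can be mimicked with a tiny particle simulation in the spirit of the one the author constructs. The attraction force below is an illustrative stand-in, not the paper's explicit formulae, and the alignment force is omitted:

```python
import numpy as np

def simulate_clustering(emb, pairs, eta=0.1, steps=200):
    # Particle-style simulation: each designated pair of 2-D
    # embeddings attracts (the "clustering" tendency); after every
    # step the vectors are renormalized so they stay on the unit
    # circle, mirroring the circular structures observed in training.
    emb = emb.copy()
    for _ in range(steps):
        upd = np.zeros_like(emb)
        for i, j in pairs:
            upd[i] += eta * (emb[j] - emb[i])
            upd[j] += eta * (emb[i] - emb[j])
        emb += upd
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return emb

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 2))
E /= np.linalg.norm(E, axis=1, keepdims=True)
paired = [(0, 1), (2, 3), (4, 5)]
E_out = simulate_clustering(E, paired)
```

Under this force each pair collapses onto a common direction, the basic mechanism by which pairwise tendencies organize embeddings into structured layouts.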
Advancing Towards Green Blockchain: A Practical Energy-Efficient Blockchain Based Application for CV Verification
Blockchain has been widely criticized due to the use of inefficient consensus protocols and energy-intensive mechanisms that resulted in enormous global power consumption. Fortunately, since the first blockchain was conceived in 2008 (the one that supports Bitcoin), hardware and consensus protocols have evolved, decreasing energy consumption significantly. This article describes a green blockchain solution and quantifies energy savings when deploying the system on traditional computers and embedded Single-Board Computers (SBCs). To illustrate such savings, a solution is proposed for tackling the problem of academic certificate forgery, which has a significant cost to society, since it harms the trustworthiness of certificates and academic institutions. The proposed solution is aimed at recording and verifying academic records (ARs) through a decentralized application (DApp) that is supported by a smart contract deployed on the Ethereum blockchain. The application stores the raw data (i.e., the data that are not managed by the blockchain) on a decentralized storage system based on the Inter-Planetary File System (IPFS). To demonstrate the efficiency of the developed solution, it is evaluated in terms of performance (transaction latency and throughput) and efficiency (CPU usage and energy consumption), comparing the results obtained with a traditional Proof-of-Work (PoW) consensus protocol and the newer Proof-of-Authority (PoA) protocol. The results shown in this paper indicate that the latter is clearly greener and demands less CPU load. Moreover, this article compares the performance of a traditional computer and two SBCs (a Raspberry Pi 4 and an Orange Pi One), showing that it is possible to use such low-power devices to implement blockchain nodes for the proposed DApp, but at the cost of higher response latency that varies greatly depending on the used SBCs [...]
Updated: 2024-10-27 21:32:20
Categories: cs.DC,cs.CR,cs.CY,cs.SY,eess.SY
Implementation and Application of an Intelligibility Protocol for Interaction with an LLM
Our interest is in constructing interactive systems involving a human expert interacting with a machine learning engine on data analysis tasks. This is of relevance when addressing complex problems arising in areas of science, the environment, medicine and so on, which are not immediately amenable to the usual methods of statistical or mathematical modelling. In such situations, it is possible that harnessing human expertise and creativity to modern machine-learning capabilities of identifying patterns by constructing new internal representations of the data may provide some insight into possible solutions. In this paper, we examine the implementation of an abstract protocol developed for interaction between agents, each capable of constructing predictions and explanations. The PXP protocol, described in [12], is motivated by the notion of ''two-way intelligibility'' and is specified using a pair of communicating finite-state machines. While the formalisation allows the authors to prove several properties about the protocol, no implementation was presented. Here, we address this shortcoming for the case in which one of the agents acts as a ''generator'' using a large language model (LLM) and the other is an agent that acts as a ''tester'' using either a human expert, or a proxy for a human expert (for example, a database compiled using human expertise). We believe these use-cases will be a widely applicable form of interaction for problems of the kind mentioned above. We present an algorithmic description of a general-purpose implementation, and conduct preliminary experiments on its use in two different areas (radiology and drug discovery). The experimental results provide early evidence in support of the protocol's capability of capturing one- and two-way intelligibility in human-LLM interaction in the manner proposed in [12].
Updated: 2024-10-27 21:20:18
Categories: cs.AI,cs.HC,cs.LG,cs.MA
Practical Bayesian Algorithm Execution via Posterior Sampling
We consider Bayesian algorithm execution (BAX), a framework for efficiently selecting evaluation points of an expensive function to infer a property of interest encoded as the output of a base algorithm. Since the base algorithm typically requires more evaluations than are feasible, it cannot be directly applied. Instead, BAX methods sequentially select evaluation points using a probabilistic numerical approach. Current BAX methods use expected information gain to guide this selection. However, this approach is computationally intensive. Observing that, in many tasks, the property of interest corresponds to a target set of points defined by the function, we introduce PS-BAX, a simple, effective, and scalable BAX method based on posterior sampling. PS-BAX is applicable to a wide range of problems, including many optimization variants and level set estimation. Experiments across diverse tasks demonstrate that PS-BAX performs competitively with existing baselines while being significantly faster, simpler to implement, and easily parallelizable, setting a strong baseline for future research. Additionally, we establish conditions under which PS-BAX is asymptotically convergent, offering new insights into posterior sampling as an algorithm design paradigm.
Updated: 2024-10-27 21:11:55
Categories: cs.LG,math.OC,stat.ML
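A minimal sketch of the posterior-sampling loop. An independent-Gaussian posterior per grid point stands in for a proper surrogate model (a deliberate simplification of the paper's setting), and the base algorithm is plain `argmax`, i.e. the property of interest is the maximizer:

```python
import numpy as np

def ps_bax(f, domain, base_alg, rounds=50, noise=0.1, seed=0):
    # Each round: (1) draw one posterior sample of f over the domain,
    # (2) run the base algorithm on that sample, (3) evaluate the
    # expensive f at the point the base algorithm returns, and update
    # the posterior with a conjugate Gaussian step. No expected
    # information gain is ever computed, which is the efficiency win.
    rng = np.random.default_rng(seed)
    n = len(domain)
    mu, var = np.zeros(n), np.ones(n)
    for _ in range(rounds):
        sample = rng.normal(mu, np.sqrt(var))        # posterior sample
        idx = base_alg(sample)                       # base algorithm output
        y = f(domain[idx]) + rng.normal(0.0, noise)  # costly evaluation
        prec = 1.0 / var[idx] + 1.0 / noise ** 2
        mu[idx] = (mu[idx] / var[idx] + y / noise ** 2) / prec
        var[idx] = 1.0 / prec
    return base_alg(mu), mu, var                     # infer from posterior mean

xs = np.linspace(-2.0, 2.0, 21)
best_idx, mu_out, var_out = ps_bax(lambda x: -(x - 0.8) ** 2, xs, np.argmax)
```

With `base_alg=np.argmax` this reduces to Thompson sampling; swapping in a level-set or top-k base algorithm recovers the other problem classes the abstract mentions.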
Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning
Denoising, the process of reducing random fluctuations in a signal to emphasize essential patterns, has been a fundamental problem of interest since the dawn of modern scientific inquiry. Recent denoising techniques, particularly in imaging, have achieved remarkable success, nearing theoretical limits by some measures. Yet, despite tens of thousands of research papers, the wide-ranging applications of denoising beyond noise removal have not been fully recognized. This is partly due to the vast and diverse literature, making a clear overview challenging. This paper aims to address this gap. We present a clarifying perspective on denoisers, their structure, and desired properties. We emphasize the increasing importance of denoising and showcase its evolution into an essential building block for complex tasks in imaging, inverse problems, and machine learning. Despite its long history, the community continues to uncover unexpected and groundbreaking uses for denoising, further solidifying its place as a cornerstone of scientific and engineering practice.
Updated: 2024-10-27 21:08:19
Categories: cs.LG,cs.CV,eess.IV
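As a toy illustration of the denoiser-as-building-block view, here is the simplest denoiser of all, a moving-average filter, chosen for brevity rather than performance:

```python
import numpy as np

def denoise_moving_average(y, k=9):
    # Suppress random fluctuation while keeping the slowly varying
    # pattern -- the basic contract every denoiser fulfils, from this
    # filter up to modern learned denoisers.
    kernel = np.ones(k) / k
    pad = k // 2
    return np.convolve(np.pad(y, pad, mode="edge"), kernel, mode="valid")

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + rng.normal(0.0, 0.3, t.size)
denoised = denoise_moving_average(noisy)
mse_before = float(np.mean((noisy - clean) ** 2))
mse_after = float(np.mean((denoised - clean) ** 2))
```

The same interface, noisy signal in, cleaner signal out, is what lets denoisers be composed into larger systems (e.g. as priors inside inverse-problem solvers), which is the paper's central theme.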
A Framework for Real-Time Volcano-Seismic Event Recognition Based on Multi-Station Seismograms and Semantic Segmentation Models
In volcano monitoring, effective recognition of seismic events is essential for understanding volcanic activity and raising timely warning alerts. Traditional methods rely on manual analysis, which can be subjective and labor-intensive. Furthermore, current automatic approaches often tackle detection and classification separately, mostly rely on single-station information and generally require tailored preprocessing and representations to perform predictions. These limitations often hinder their application to real-time monitoring and utilization across different volcano conditions. This study introduces a novel approach that utilizes Semantic Segmentation models to automate seismic event recognition by applying a straightforward transformation of multi-channel 1D signals into 2D representations, enabling their use as images. Our framework employs a data-driven, end-to-end design that integrates multi-station seismic data with minimal preprocessing, performing both detection and classification simultaneously for five seismic event classes. We evaluated four state-of-the-art segmentation models (UNet, UNet++, DeepLabV3+ and SwinUNet) on approximately 25,000 seismic events recorded at four different Chilean volcanoes: Nevados del Chillán Volcanic Complex, Laguna del Maule, Villarrica and Puyehue-Cordón Caulle. Among these models, the UNet architecture was identified as the most effective, achieving mean F1 and Intersection over Union (IoU) scores of up to 0.91 and 0.88, respectively, and demonstrating superior noise robustness and model flexibility on unseen volcano datasets.
Updated: 2024-10-27 21:02:37
Categories: cs.CV,cs.LG,eess.SP
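The "straightforward transformation" of multi-channel 1D signals into 2D inputs can be as simple as stacking normalized rows, sketched below; the exact rendering the authors use may differ, this only illustrates the shape manipulation that lets image segmentation models label every time step:

```python
import numpy as np

def seismograms_to_image(waveforms, rows_per_channel=8):
    # Stack multi-station 1-D waveforms into one 2-D array. Each
    # channel is min-max scaled and repeated vertically so that a
    # segmentation network sees it as a band of pixel rows.
    chans = []
    for w in waveforms:
        w = (w - w.min()) / (w.max() - w.min() + 1e-9)
        chans.append(np.repeat(w[None, :], rows_per_channel, axis=0))
    return np.vstack(chans)

rng = np.random.default_rng(0)
signals = [rng.normal(size=1024) for _ in range(4)]  # 4 stations (toy data)
img = seismograms_to_image(signals)
```

Per-pixel labels over the time axis of `img` then give detection (event vs. background) and classification (which of the five classes) in one forward pass.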
Towards a Blockchain and Opportunistic Edge Driven Metaverse of Everything
Decentralized Metaverses, built on Web 3.0 and Web 4.0 technologies, have attracted significant attention across various fields. This innovation leverages blockchain, Decentralized Autonomous Organizations (DAOs), Extended Reality (XR) and advanced technologies to create immersive and interconnected digital environments that mirror the real world. This article delves into the Metaverse of Everything (MoE), a platform that fuses the Metaverse concept with the Internet of Everything (IoE), an advanced version of the Internet of Things (IoT) that connects not only physical devices but also people, data and processes within a networked environment. Thus, the MoE integrates generated data and virtual entities, creating an extensive network of interconnected components. This article seeks to advance current MoE, examining decentralization and the application of Opportunistic Edge Computing (OEC) for interactions with surrounding IoT devices and IoE entities. Moreover, it outlines the main challenges to guide researchers and businesses towards building a future cyber-resilient opportunistic MoE.
Updated: 2024-10-27 21:02:14
Categories: cs.CY,cs.CR,cs.SY,eess.SY
Generator Matching: Generative modeling with arbitrary Markov processes
We introduce generator matching, a modality-agnostic framework for generative modeling using arbitrary Markov processes. Generators characterize the infinitesimal evolution of a Markov process, which we leverage for generative modeling in a similar vein to flow matching: we construct conditional generators which generate single data points, then learn to approximate the marginal generator which generates the full data distribution. We show that generator matching unifies various generative modeling methods, including diffusion models, flow matching and discrete diffusion models. Furthermore, it provides the foundation to expand the design space to new and unexplored Markov processes such as jump processes. Finally, generator matching enables the construction of superpositions of Markov generative processes and enables the construction of multimodal models in a rigorous manner. We empirically validate our method on protein and image structure generation, showing that superposition with a jump process improves image generation.
Updated: 2024-10-27 20:47:29
Categories: cs.LG,cs.AI
A Comprehensive Survey on Green Blockchain: Developing the Next Generation of Energy Efficient and Sustainable Blockchain Systems
Although Blockchain has been successfully used in many different fields and applications, it has been traditionally regarded as an energy-intensive technology, essentially due to the past use of inefficient consensus algorithms that prioritized security over sustainability. However, in the last years, thanks to the significant progress made on key blockchain components, their energy consumption can be decreased noticeably. To achieve this objective, this article analyzes the main components of blockchains and explores strategies to reduce their energy consumption. In this way, this article delves into each component of a blockchain system, including consensus mechanisms, network architecture, data storage and validation, smart contract execution, mining and block creation, and outlines specific strategies to decrease their energy consumption. For such a purpose, consensus mechanisms are compared, recommendations for reducing network communications energy consumption are provided, techniques for data storage and validation are suggested and diverse optimizations are proposed both for software and hardware components. Moreover, the main challenges and limitations of reducing power consumption in blockchain systems are analyzed. As a consequence, this article provides a guideline for the future researchers and developers who aim to develop the next generation of Green Blockchain solutions.
Updated: 2024-10-27 20:22:25
Categories: cs.DC,cs.CR
Toward Conditional Distribution Calibration in Survival Prediction
Survival prediction often involves estimating the time-to-event distribution from censored datasets. Previous approaches have focused on enhancing discrimination and marginal calibration. In this paper, we highlight the significance of conditional calibration for real-world applications -- especially its role in individual decision-making. We propose a method based on conformal prediction that uses the model's predicted individual survival probability at that instance's observed time. This method effectively improves the model's marginal and conditional calibration, without compromising discrimination. We provide asymptotic theoretical guarantees for both marginal and conditional calibration and test it extensively across 15 diverse real-world datasets, demonstrating the method's practical effectiveness and versatility in various settings.
Updated: 2024-10-27 20:19:46
Categories: cs.LG,cs.AI,stat.ML
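A sketch of the conformity score the abstract describes, the model's predicted individual survival probability at the observed time, wired into a standard split-conformal quantile. The exponential model and absence of censoring below are simplifications relative to the paper:

```python
import numpy as np

def conformal_cutoff(scores, alpha):
    # Finite-sample lower quantile used in split conformal prediction:
    # at least a (1 - alpha) fraction of the calibration scores lie at
    # or above the returned cutoff.
    n = len(scores)
    k = int(np.ceil((n + 1) * alpha))
    return np.sort(scores)[max(k, 1) - 1]

# Toy well-specified survival model: S(t | x) = exp(-t / x) for an
# exponential event time with scale x.
rng = np.random.default_rng(0)
scales = rng.uniform(1.0, 3.0, 200)
times = rng.exponential(scales)
# Conformity score per subject: predicted survival probability at the
# observed event time. Under a calibrated model it is ~ Uniform(0, 1).
scores = np.exp(-times / scales)
alpha = 0.1
q = conformal_cutoff(scores, alpha)
# The region {t : S(t | x) >= q} then covers the observed time for at
# least a (1 - alpha) fraction of calibration subjects by construction.
cal_coverage = float(np.mean(scores >= q))
```

The conformal guarantee transfers this calibration-set identity to exchangeable held-out subjects, which is the distribution-free coverage the method builds on.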
Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
Current speech deepfake detection approaches perform satisfactorily against known adversaries; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that can generalize to unseen attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features to adapt to unseen attacks with very few samples available. This approach is promising since generating a large-scale training dataset is often expensive or infeasible. Our experiments demonstrated an improvement in the Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen dataset. Continuous few-shot adaptation ensures that the system remains up-to-date.
Updated: 2024-10-27 20:14:32
Categories: eess.AS,cs.AI,cs.SD
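Few-shot adaptation in its simplest form can be sketched with class prototypes re-estimated from a handful of labelled samples of the unseen attack. This nearest-prototype step is a stand-in for the paper's meta-learned, attack-invariant features, and the 2-D Gaussian "embeddings" are toy data:

```python
import numpy as np

def adapt_prototypes(support_feats, support_labels):
    # Re-estimate one prototype (mean embedding) per class from the
    # few labelled samples of the unseen attack.
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(feats, classes, protos):
    # Nearest-prototype decision rule.
    d = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
# Toy embeddings: bona fide speech clusters near (+1, +1), the unseen
# spoof near (-1, -1); 8 support samples per class mimic the few-shot
# regime (the paper adapts with 96 samples).
support_x = np.vstack([rng.normal(+1, 0.3, (8, 2)),
                       rng.normal(-1, 0.3, (8, 2))])
support_y = np.array([0] * 8 + [1] * 8)
classes, protos = adapt_prototypes(support_x, support_y)
query_x = np.vstack([rng.normal(+1, 0.3, (50, 2)),
                     rng.normal(-1, 0.3, (50, 2))])
query_y = np.array([0] * 50 + [1] * 50)
acc = float(np.mean(classify(query_x, classes, protos) == query_y))
```

Because only the prototypes are refreshed, this kind of adaptation is cheap enough to repeat continuously as new attacks surface, the "stays up-to-date" property the abstract highlights.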
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data; 2) preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning (RL) step to fine-tune the model. Such a reward model serves as a proxy to human preference, and it is critical to guide the RL step towards improving the model quality. In this work, we argue that the SFT stage significantly benefits from learning a reward model as well. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. This approach leads to new SFT algorithms that are not only efficient to implement, but are robust to the presence of low-quality supervised learning data. Moreover, we discover a connection between the proposed IRL-based approach and a recent line of works called Self-Play Fine-tune (SPIN). Theoretically, we show that the proposed algorithms converge to the stationary solutions of the IRL problem. Empirically, we align 1B and 7B models using the proposed methods and evaluate them on a reward benchmark model and the HuggingFace Open LLM Leaderboard. The proposed methods show significant performance improvement over existing SFT approaches. Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
Updated: 2024-10-27 20:09:59
Categories: cs.AI
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks. Our approach addresses three research questions: aligning LLMs with real-world urban mobility data, developing reliable activity generation strategies, and exploring LLM applications in urban mobility. The key technical contribution is a novel LLM agent framework that accounts for individual activity patterns and motivations, including a self-consistency approach to align LLMs with real-world activity data and a retrieval-augmented strategy for interpretable activity generation. We evaluate our LLM agent framework and compare it with state-of-the-art personal mobility generation approaches, demonstrating the effectiveness of our approach and its potential applications in urban mobility. Overall, this study marks the pioneering work of designing an LLM agent framework for activity generation based on real-world human activity data, offering a promising tool for urban mobility analysis.
Updated: 2024-10-27 20:02:01
Categories: cs.AI,cs.CL,cs.CY,cs.LG
Encrypted system identification as-a-service via reliable encrypted matrix inversion
Encrypted computation opens up promising avenues across a plethora of application domains, including machine learning, health-care, finance, and control. Arithmetic homomorphic encryption, in particular, is a natural fit for cloud-based computational services. However, computations are essentially limited to polynomial circuits, while comparisons, transcendental functions, and iterative algorithms are notoriously hard to realize. Against this background, the paper presents an encrypted system identification service enabled by a reliable encrypted solution to least squares problems. More precisely, we devise an iterative algorithm for matrix inversion and present reliable initializations as well as certificates for the achieved accuracy without compromising the privacy of provided I/O-data. The effectiveness of the approach is illustrated with three popular identification tasks.
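The iterative matrix inversion the abstract alludes to can be built from additions and multiplications alone, which is exactly the polynomial-circuit regime that arithmetic homomorphic encryption supports. As an illustrative sketch (computed in the clear, and not necessarily the paper's exact scheme), the Newton-Schulz iteration with a norm-based initialization inverts a matrix using only matrix products and sums:

```python
import numpy as np

def newton_inverse(A, iters=30):
    """Approximate A^{-1} with the Newton-Schulz iteration
    X_{k+1} = X_k (2I - A X_k), a purely polynomial circuit.
    The initialization X0 = A^T / (||A||_1 ||A||_inf) guarantees
    convergence for any invertible A; the iteration count is an
    illustrative choice, not the paper's."""
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(iters):
        X = X @ (2.0 * np.eye(n) - A @ X)
    return X
```

A least-squares fit then reduces to x = (A^T A)^{-1} A^T b, keeping the whole identification pipeline within polynomial operations; certified accuracy bounds, as in the paper, would come from monitoring the residual I - A X.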
Updated: 2024-10-27 20:00:04
Categories: cs.CR,cs.SY,eess.SY
E(3)-invariant diffusion model for pocket-aware peptide generation
Biologists frequently desire protein inhibitors for a variety of reasons, including use as research tools for understanding biological processes and application to societal problems in agriculture, healthcare, etc. Immunotherapy, for instance, relies on immune checkpoint inhibitors to block checkpoint proteins, preventing their binding with partner proteins and boosting immune cell function against abnormal cells. Inhibitor discovery has long been a tedious process, which in recent years has been accelerated by computational approaches. Advances in artificial intelligence now provide an opportunity to make inhibitor discovery smarter than ever before. While extensive research has been conducted on computer-aided inhibitor discovery, it has mainly focused on either sequence-to-structure mapping, reverse mapping, or bio-activity prediction, which makes such tools impractical for biologists to use. Instead, our work proposes a new method of computer-assisted inhibitor discovery: a de novo pocket-aware peptide structure and sequence generation network. Our approach consists of two sequential diffusion models for end-to-end structure generation and sequence prediction. By leveraging angle and dihedral relationships between backbone atoms, we ensure an E(3)-invariant representation of peptide structures. Our results demonstrate that our method achieves comparable performance to state-of-the-art models, highlighting its potential in pocket-aware peptide design. This work offers a new approach for precise drug discovery using receptor-specific peptide generation.
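The E(3)-invariance comes from describing the backbone through internal coordinates rather than raw positions. As a sketch of that idea (not the paper's actual featurization code), the dihedral angle defined by four consecutive backbone atoms depends only on their relative geometry, so it is unchanged by any global rotation or translation:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points.
    Invariant to global rotations and translations; a reflection
    flips the sign. Standard formula, shown for illustration."""
    b0 = p1 - p0
    b1 = p2 - p1
    b2 = p3 - p2
    n1 = np.cross(b0, b1)                    # normal of plane (p0, p1, p2)
    n2 = np.cross(b1, b2)                    # normal of plane (p1, p2, p3)
    m1 = np.cross(n1, b1 / np.linalg.norm(b1))
    return np.arctan2(np.dot(m1, n2), np.dot(n1, n2))
```

Feeding a diffusion model such angles (plus bond angles and lengths) instead of Cartesian coordinates is one common way to obtain an E(3)-invariant structure representation.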
Updated: 2024-10-27 19:59:09
Categories: cs.LG,cs.AI,q-bio.BM
Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization
Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions, but they require data labels or annotated synthesized samples for training. In contrast, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piecewise-linear curve. SFVQ can capture the underlying morphological structure of the latent space and thus make it interpretable. We apply this technique to model the latent space of pretrained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space that determines which part of the latent space corresponds to which specific generative factors. Furthermore, we demonstrate that each segment of the SFVQ curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also show that the points located on an SFVQ segment can be used for controllable data augmentation.
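To make the quantization step concrete: once a piecewise-linear curve has been fitted through the latent space, assigning a latent vector to the curve is a nearest-segment projection. The sketch below shows only this projection, with the ordered curve nodes assumed to be already learned (the node-learning procedure itself is not reproduced here):

```python
import numpy as np

def project_to_curve(z, nodes):
    """Quantize latent vector z onto the piecewise-linear curve given
    by an ordered list of nodes: return the closest point over all
    segments [nodes[i], nodes[i+1]]. Illustrative sketch only."""
    best, best_d = None, np.inf
    for a, b in zip(nodes[:-1], nodes[1:]):
        ab = b - a
        # Parameter of the orthogonal projection, clamped to the segment.
        t = np.clip(np.dot(z - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        p = a + t * ab
        d = np.linalg.norm(z - p)
        if d < best_d:
            best, best_d = p, d
    return best
```

Reading off which segment a sample lands on is what ties a latent region to a generative factor, and walking along one segment corresponds to one interpretable direction.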
Updated: 2024-10-27 19:56:02
Categories: cs.CV,cs.AI,cs.LG
Large Language Models as Data Preprocessors
Large Language Models (LLMs), typified by OpenAI's GPT, have marked a significant advancement in artificial intelligence. Trained on vast amounts of text data, LLMs are capable of understanding and generating human-like text across a diverse range of topics. This study expands the applications of LLMs, exploring their potential in data preprocessing, a critical stage in data mining and analytics. Focusing on tabular data, we delve into the applicability of state-of-the-art LLMs such as GPT-4 and GPT-4o for a series of preprocessing tasks, including error detection, data imputation, schema matching, and entity matching. Alongside showcasing the inherent capabilities of LLMs, we highlight their limitations, particularly in terms of computational expense and inefficiency. We propose an LLM-based framework for data preprocessing that integrates cutting-edge prompt engineering techniques with traditional methods like contextualization and feature selection to improve the performance and efficiency of these models. The effectiveness of LLMs in data preprocessing is evaluated through an experimental study spanning a variety of public datasets. GPT-4 emerged as a standout, achieving 100% accuracy or F1 score on 4 of these datasets, suggesting LLMs' immense potential in these tasks. Despite certain limitations, our study underscores the promise of LLMs in this domain and anticipates future developments to overcome current hurdles.
Updated: 2024-10-27 19:35:47
Categories: cs.AI,cs.DB
Neural rendering enables dynamic tomography
Interrupted X-ray computed tomography (X-CT) has been the common way to observe the deformation of materials during an experiment. While this approach is effective for quasi-static experiments, it has never been possible to reconstruct a full 3D volume during a dynamic experiment that cannot be interrupted. In this work, we propose that neural rendering tools can drive a paradigm shift that enables 3D reconstruction during dynamic events. First, we derive theoretical results to support the selection of projection angles. Via a combination of synthetic and experimental data, we demonstrate that neural radiance fields can reconstruct data modalities of interest more efficiently than conventional reconstruction methods. Finally, we develop a spatio-temporal model with a spline-based deformation field and demonstrate that such a model can reconstruct the spatio-temporal deformation of lattice samples in real-world experiments.
Updated: 2024-10-27 19:18:20
Categories: physics.ins-det,cond-mat.mtrl-sci,cs.CV,cs.LG,eess.IV
RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar
3D occupancy-based perception pipelines have significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc innovatively addresses the challenges associated with the voluminous and noisy 4D radar data by employing Doppler bins descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.
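The spherical-to-Cartesian aggregation mentioned above starts from the radar's native range-azimuth-elevation grid. A minimal sketch of the underlying coordinate conversion, with axis conventions chosen for illustration rather than taken from the paper:

```python
import numpy as np

def spherical_to_cartesian(r, az, el):
    """Convert radar-native spherical coordinates (range r, azimuth az,
    elevation el, angles in radians) to Cartesian x, y, z. The axis
    convention (x forward, y left, z up) is an illustrative assumption."""
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)
```

Encoding features on the spherical grid first and only then aggregating them into Cartesian voxels avoids resampling the raw tensor through this nonlinear mapping, which is where the interpolation errors the abstract mentions would otherwise arise.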
Updated: 2024-10-27 19:15:54
Categories: cs.CV,cs.AI,cs.LG,cs.RO
Privacy-Enhanced Adaptive Authentication: User Profiling with Privacy Guarantees
User profiling is a critical component of adaptive risk-based authentication, yet it raises significant privacy concerns, particularly when handling sensitive data. Profiling involves collecting and aggregating various user features, potentially creating quasi-identifiers that can reveal identities and compromise privacy. Even anonymized profiling methods remain vulnerable to re-identification attacks through these quasi-identifiers. This paper introduces a novel privacy-enhanced adaptive authentication protocol that leverages Oblivious Pseudorandom Functions (OPRF), anonymous tokens, and Differential Privacy (DP) to provide robust privacy guarantees. Our proposed approach dynamically adjusts authentication requirements based on real-time risk assessments, enhancing security while safeguarding user privacy. By integrating privacy considerations into the core of adaptive risk-based authentication, this approach addresses a gap often overlooked in traditional models. Advanced cryptographic techniques ensure confidentiality, integrity, and unlinkability of user data, while differential privacy mechanisms minimize the impact of individual data points on overall analysis. Formal security and privacy proofs demonstrate the protocol's resilience against various threats and its ability to provide strong privacy guarantees. Additionally, a comprehensive performance evaluation reveals that the computational and communication overheads are manageable, making the protocol practical for real-world deployment. By adhering to data protection regulations such as GDPR and CCPA, our protocol not only enhances security but also fosters user trust and compliance with legal standards.
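Of the building blocks listed, differential privacy is the easiest to illustrate in isolation. A minimal sketch of the Laplace mechanism, the standard way to release a numeric statistic with epsilon-DP (the paper's full protocol additionally combines OPRFs and anonymous tokens, which are not shown here):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-differential privacy by adding
    Laplace(0, sensitivity / epsilon) noise. 'sensitivity' is the
    maximum change one individual's data can cause in the statistic."""
    if rng is None:
        rng = np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)
```

In a risk-based authentication setting, such noisy releases let aggregate risk features be computed over user profiles while bounding what any single user's data can reveal.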
Updated: 2024-10-27 19:11:33
Categories: cs.CR
Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents
Large Language Models (LLMs) have increasingly been utilized in social simulations, where they are often guided by carefully crafted instructions to stably exhibit human-like behaviors during simulations. Nevertheless, we doubt the necessity of shaping agents' behaviors for accurate social simulations. Instead, this paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions. We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation, findings that align closely with human behavioral data. This approach not only aids the computational social science community in bridging the gap between simulations and real-world dynamics but also offers the AI community a novel method to assess LLMs' capability of deliberate reasoning.
Updated: 2024-10-27 19:03:37
Categories: cs.AI,cs.CL,cs.CY,cs.MA,econ.GN,q-fin.EC
SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance
Large Language Models (LLMs) have shown great potential in automating code generation; however, their ability to generate accurate circuit-level SPICE code remains limited due to a lack of hardware-specific knowledge. In this paper, we analyze and identify the typical limitations of existing LLMs in SPICE code generation. To address these limitations, we present SPICEPilot, a novel Python-based dataset generated using PySpice, along with an accompanying framework. This marks a significant step forward in automating SPICE code generation across various circuit configurations. Our framework automates the creation of SPICE simulation scripts, introduces standardized benchmarking metrics to evaluate LLMs' ability to generate circuits, and outlines a roadmap for integrating LLMs into the hardware design process. SPICEPilot is open-sourced under the permissive MIT license at https://github.com/ACADLab/SPICEPilot.git.
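For readers unfamiliar with the target format: a SPICE "program" is a plain-text netlist of components, nodes, and analysis directives. The toy generator below emits a netlist for a resistive voltage divider; it only illustrates the output format an LLM must produce and is not part of the SPICEPilot framework (which builds its dataset with PySpice):

```python
def voltage_divider_netlist(vin=5.0, r1=1e3, r2=2e3):
    """Return SPICE netlist text for a resistive voltage divider.
    Component values are illustrative; node 0 is ground by convention."""
    return "\n".join([
        "* voltage divider",          # title/comment card
        f"V1 in 0 DC {vin}",          # DC source between 'in' and ground
        f"R1 in out {r1}",            # top resistor
        f"R2 out 0 {r2}",             # bottom resistor
        ".op",                        # operating-point analysis
        ".end",
    ])
```

Checking that generated text parses as a valid netlist and simulates to the expected operating point (here, V(out) = vin * r2 / (r1 + r2)) is the kind of benchmarking metric such a framework can standardize.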
Updated: 2024-10-27 18:58:06
Categories: cs.AR,cs.AI
Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in their high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods by their optimization focus (computational, memory, energy, financial, and network resources) and by their applicability across the stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource-efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current state of the art and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.
Updated: 2024-10-27 18:47:47
Categories: cs.LG
SympCam: Remote Optical Measurement of Sympathetic Arousal
Recent work has shown that a person's sympathetic arousal can be estimated from facial videos alone using basic signal processing. This opens up new possibilities in the field of telehealth and stress management, providing a non-invasive method to measure stress using only a regular RGB camera. In this paper, we present SympCam, a new 3D convolutional architecture tailored to the task of remote sympathetic arousal prediction. Our model incorporates a temporal attention module (TAM) to enhance the temporal coherence of our sequential data processing capabilities. Our method improves the accuracy metrics for sympathetic arousal reported in prior work by 48%, reaching a mean correlation of 0.77. We additionally compare our method with common remote photoplethysmography (rPPG) networks and show that they alone cannot accurately predict sympathetic arousal "out-of-the-box". Furthermore, we show that the sympathetic arousal predicted by our method allows detecting physical stress with a balanced accuracy of 90%, an improvement of 61% compared to the rPPG method commonly used in related work, demonstrating the limitations of using rPPG alone. Finally, we contribute a dataset designed explicitly for the task of remote sympathetic arousal prediction. Our dataset contains face and hand videos of 20 participants recorded by two cameras, synchronized with electrodermal activity (EDA) and photoplethysmography (PPG) measurements. We will make this dataset available to the community and use it to evaluate the methods in this paper. To the best of our knowledge, this is the first dataset designed for remote sympathetic arousal prediction that is available to other researchers.
Updated: 2024-10-27 18:46:55
Categories: cs.CV,cs.AI
Deep Reinforcement Learning Agents for Strategic Production Policies in Microeconomic Market Simulations
Traditional economic models often rely on fixed assumptions about market dynamics, limiting their ability to capture the complexities and stochastic nature of real-world scenarios. Reality, however, is noisier and more complex, so the assumptions of traditional models are often not met in actual markets. In this paper, we explore the application of deep reinforcement learning (DRL) to obtain optimal production strategies in microeconomic market environments and overcome the limitations of traditional models. Concretely, we propose a DRL-based approach to obtain an effective policy in competitive markets with multiple producers, each optimizing its production decisions in response to fluctuating demand, supply, prices, subsidies, fixed costs, the total production curve, elasticities, and other noise-contaminated effects. Our framework enables agents to learn adaptive production policies across several simulations, consistently outperforming static and random strategies. Because the deep neural networks used by the agents are universal function approximators, DRL algorithms can represent complex patterns in the data, learnt by trial and error, that explain the market. Through extensive simulations, we demonstrate how DRL can capture the intricate interplay between production costs, market prices, and competitor behavior, providing insights into optimal decision-making in dynamic economic settings. The results show that agents trained with DRL can strategically adjust production levels to maximize long-term profitability, even in the face of volatile market conditions. We believe that this study bridges the gap between theoretical economic modeling and practical market simulation, illustrating the potential of DRL to revolutionize decision-making in market strategies.
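The core learning loop can be boiled down to a toy stand-in: a single producer picks a quantity, observes profit under a linear inverse-demand curve, and updates a value estimate by trial and error. All parameters below are illustrative assumptions, and a simple bandit-style table replaces the paper's deep networks:

```python
import numpy as np

def learn_production_policy(a=10.0, b=1.0, c=2.0, n_actions=10,
                            episodes=5000, lr=0.1, eps=0.3, seed=0):
    """Epsilon-greedy learner for a stateless production problem:
    choose quantity q, earn profit (a - b*q)*q - c*q under inverse
    demand p = a - b*q with unit cost c, and update a value table.
    Returns the learned best quantity."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(n_actions)
    for _ in range(episodes):
        if rng.random() < eps:
            q = int(rng.integers(n_actions))   # explore
        else:
            q = int(np.argmax(Q))              # exploit current estimate
        profit = (a - b * q) * q - c * q       # reward signal
        Q[q] += lr * (profit - Q[q])           # incremental value update
    return int(np.argmax(Q))
```

For a = 10, b = 1, c = 2, profit is 8q - q^2, maximized at q = 4; the learner recovers this analytic optimum from rewards alone, which is the same principle the DRL agents apply in richer, noisy, multi-producer settings.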
Updated: 2024-10-27 18:38:05
Categories: cs.LG,cs.AI,cs.MA
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
Chain-of-thought (CoT) prompting has become a widely used strategy for working with large language and multimodal models. While CoT has been shown to improve performance across many tasks, determining the settings in which it is effective remains an ongoing effort. In particular, it is still an open question in what settings CoT systematically reduces model performance. In this paper, we seek to identify the characteristics of tasks where CoT reduces performance by drawing inspiration from cognitive psychology, looking at cases where (i) verbal thinking or deliberation hurts performance in humans, and (ii) the constraints governing human performance generalize to language models. Three such cases are implicit statistical learning, visual recognition, and classifying with patterns containing exceptions. In extensive experiments across all three settings, we find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance (e.g., up to a 36.3% drop in absolute accuracy for OpenAI o1-preview compared to GPT-4o) when using inference-time reasoning compared to zero-shot counterparts. We also identify three tasks that satisfy condition (i) but not (ii), and find that while verbal thinking reduces human performance in these tasks, CoT retains or increases model performance. Overall, our results show that while there is not an exact parallel between the cognitive processes of models and those of humans, considering cases where thinking has negative consequences for human performance can help us identify settings where it negatively impacts models. By connecting the literature on human deliberation with evaluations of CoT, we offer a new tool that can be used in understanding the impact of prompt choices and inference-time reasoning.
Updated: 2024-10-27 18:30:41
Categories: cs.LG,cs.AI,cs.CL,cs.CY
Task-Agnostic Machine-Learning-Assisted Inference
Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened a whole field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples, and then uses predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis. This is because any extension of these approaches to new, more sophisticated statistical tasks requires task-specific algebraic derivations and software implementations, which ignores the massive library of existing software tools already developed for the same scientific problem given observed data. This severely constrains the scope of application for post-prediction inference. To address this challenge, we introduce a novel statistical framework named PSPS for task-agnostic ML-assisted inference. It provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routines. It delivers valid and efficient inference that is robust to arbitrary choice of ML model, allowing nearly all existing statistical frameworks to be incorporated into the analysis of ML-predicted data. Through extensive experiments, we showcase our method's validity, versatility, and superiority compared to existing approaches. Our software is available at https://github.com/qlu-lab/psps.
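The flavor of post-prediction inference described here is easiest to see in its simplest instance, estimating a mean: use ML predictions on the large unlabeled sample, then debias them with the prediction error measured on the small labeled sample. This is a simpler relative of the paper's PSPS framework, shown only to convey the debiasing idea, not the proposed method itself:

```python
import numpy as np

def pp_mean(y_labeled, yhat_labeled, yhat_unlabeled):
    """Prediction-powered estimate of the outcome mean: the mean of ML
    predictions on the big unlabeled sample, plus a rectifier equal to
    the average prediction error on the labeled sample. The rectifier
    removes the predictor's bias, whatever ML model produced it."""
    rectifier = np.mean(y_labeled - yhat_labeled)   # bias correction
    return np.mean(yhat_unlabeled) + rectifier
```

Because the correction term depends on the ML model only through its errors on labeled data, the estimate remains valid for an arbitrary choice of predictor, which is the property the task-agnostic framework generalizes to richer statistical tasks.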
Updated: 2024-10-27 18:18:36
Categories: stat.ML,cs.LG,stat.ME
PaPaGei: Open Foundation Models for Optical Physiological Signals
Photoplethysmography (PPG) is the most widely used non-invasive technique for monitoring biosignals and cardiovascular health, with applications in both clinical settings and consumer health through wearable devices. Current machine learning models trained on PPG signals are mostly task-specific and lack generalizability. Previous works often used single-device datasets, did not explore out-of-domain generalization, or did not release their models, hindering reproducibility and further research. We introduce PaPaGei, the first open foundation model for PPG signals. PaPaGei is pre-trained on more than 57,000 hours of 20 million unlabeled segments of PPG signals using publicly available datasets exclusively. We evaluate against popular time-series foundation models and other benchmarks on 20 tasks across 10 diverse datasets spanning cardiovascular health, sleep disorders, pregnancy monitoring, and wellbeing assessment. Our architecture incorporates novel representation learning approaches that leverage differences in PPG signal morphology across individuals, enabling it to capture richer representations than traditional contrastive learning methods. Across 20 tasks, PaPaGei improves classification and regression performance by an average of 6.3% and 2.9%, respectively, compared to other competitive time-series foundation models in at least 14 tasks. PaPaGei is more data- and parameter-efficient than other foundation models or methods, as it outperforms 70x larger models. Beyond accuracy, we also investigate robustness against different skin tones, establishing a benchmark for bias evaluations of future models. Notably, PaPaGei can be used out of the box as both a feature extractor and an encoder for other multimodal models, opening up new opportunities for multimodal health monitoring.
Updated: 2024-10-27 18:18:06
Categories: cs.LG,eess.SP
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When learning on language datasets such as babyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate HVM's sequence likelihood correlates with human recall times. In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans. From HVM's adjustable layer of abstraction, we demonstrate that the model realizes a precise trade-off between compression and generalization. Our work offers a cognitive model that captures the learning and transfer of abstract representations in human cognition and differentiates itself from the behavior of large language models.
Updated: 2024-10-27 18:13:07
Domain: cs.LG,cs.AI,cs.CL
Info-CELS: Informative Saliency Map Guided Counterfactual Explanation
As the demand for interpretable machine learning approaches continues to grow, there is an increasing necessity for human involvement in providing informative explanations for model decisions. This is necessary for building trust and transparency in AI-based systems, leading to the emergence of the Explainable Artificial Intelligence (XAI) field. Recently, a novel counterfactual explanation model, CELS, has been introduced. CELS learns a saliency map for an instance of interest and generates a counterfactual explanation guided by the learned saliency map. While CELS represents the first attempt to exploit learned saliency maps both to provide intuitive explanations for the decisions made by a time series classifier and to explore post hoc counterfactual explanations, it sacrifices validity in order to ensure high proximity and sparsity. In this paper, we present an enhanced approach that builds upon CELS and addresses this limitation: by removing mask normalization, our method provides more informative and valid counterfactual explanations. Through extensive experimentation on datasets from various domains, we demonstrate that our approach outperforms the CELS model, achieving higher validity and producing more informative explanations.
Updated: 2024-10-27 18:12:02
Domain: cs.LG,stat.ML
Memorization in In-Context Learning
In-context learning (ICL) has proven to be an effective strategy for improving the performance of large language models (LLMs) with no additional training. However, the exact mechanism behind this performance improvement remains unclear. This study is the first to show how ICL surfaces memorized training data and to explore the correlation between this memorization and performance on downstream tasks across various ICL regimes: zero-shot, few-shot, and many-shot. Our most notable findings include: (1) ICL significantly surfaces memorization compared to zero-shot learning in most cases; (2) demonstrations, without their labels, are the most effective element in surfacing memorization; (3) ICL improves performance when the surfaced memorization in few-shot regimes reaches a high level (about 40%); and (4) there is a very strong correlation between performance and memorization in ICL when it outperforms zero-shot learning. Overall, our study uncovers memorization as a new factor impacting ICL, raising an important question: to what extent do LLMs truly generalize from demonstrations in ICL, and how much of their success is due to memorization?
Updated: 2024-10-27 18:04:58
Domain: cs.CL,cs.AI,cs.LG
Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
This study addresses the deployment challenges of integer-only quantized Transformers on resource-constrained embedded FPGAs (Xilinx Spartan-7 XC7S15). We enhanced the flexibility of our VHDL template by introducing a selectable resource type for storing intermediate results across model layers, thereby breaking the deployment bottleneck by utilizing BRAM efficiently. Moreover, we developed a resource-aware mixed-precision quantization approach that enables researchers to explore hardware-level quantization strategies without requiring extensive expertise in Neural Architecture Search. This method provides accurate resource utilization estimates with a precision discrepancy as low as 3%, compared to actual deployment metrics. Compared to previous work, our approach has successfully facilitated the deployment of model configurations utilizing mixed-precision quantization, thus overcoming the limitations inherent in five previously non-deployable configurations with uniform quantization bitwidths. Consequently, this research enhances the applicability of Transformers in embedded systems, facilitating a broader range of Transformer-powered applications on edge devices.
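The resource-aware core of such an approach, budgeting per-layer bitwidths against on-chip memory, can be sketched with a back-of-the-envelope estimate (layer names, parameter counts, and the memory budget below are made up for illustration, not the paper's figures):

```python
def memory_bits(layers, bitwidths):
    """Total weight-storage bits under per-layer quantization bitwidths."""
    return sum(layers[name] * bitwidths[name] for name in layers)

def fits_budget(layers, bitwidths, bram_bits):
    """Quick feasibility check against an on-chip memory budget."""
    return memory_bits(layers, bitwidths) <= bram_bits

# Illustrative parameter counts and a made-up BRAM budget in bits.
layers = {"embed": 4096, "attn": 16384, "ffn": 32768}
uniform8 = {k: 8 for k in layers}            # uniform 8-bit: over budget
mixed = {"embed": 8, "attn": 6, "ffn": 4}    # mixed precision: fits
BUDGET = 300_000
```

Under these toy numbers the uniform 8-bit configuration needs 425,984 bits while the mixed-precision one needs 262,144, showing how lowering the bitwidth of the largest layers can turn a non-deployable configuration into a deployable one.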
Updated: 2024-10-27 18:04:57
Domain: cs.LG
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Deep learning models often suffer from a lack of interpretability due to polysemanticity, where individual neurons are activated by multiple unrelated semantics, resulting in unclear attributions of model behavior. Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability but are commonly believed to compromise accuracy. In this work, we challenge the prevailing belief in an accuracy-interpretability tradeoff, showing that monosemantic features not only enhance interpretability but also bring concrete gains in model performance. Across multiple robust learning scenarios, including input and label noise, few-shot learning, and out-of-domain generalization, our results show that models leveraging monosemantic features significantly outperform those relying on polysemantic features. Furthermore, we provide empirical and theoretical understanding of the robustness gains of feature monosemanticity. Our preliminary analysis suggests that monosemanticity, by promoting better separation of feature representations, leads to more robust decision boundaries. This diverse evidence highlights the generality of monosemanticity in improving model robustness. As a first step in this new direction, we embark on exploring the learning benefits of monosemanticity beyond interpretability, supporting the long-standing hypothesis linking interpretability and robustness. Code is available at \url{https://github.com/PKU-ML/Beyond_Interpretability}.
Updated: 2024-10-27 18:03:20
Domain: cs.LG,cs.AI
SIGMA: Single Interpolated Generative Model for Anomalies
A key step in any resonant anomaly detection search is accurate modeling of the background distribution in each signal region. Data-driven methods like CATHODE accomplish this by training separate generative models on the complement of each signal region, and interpolating them into their corresponding signal regions. Having to re-train the generative model on essentially the entire dataset for each signal region is a major computational cost in a typical sliding window search with many signal regions. Here, we present SIGMA, a new, fully data-driven, computationally-efficient method for estimating background distributions. The idea is to train a single generative model on all of the data and interpolate its parameters in sideband regions in order to obtain a model for the background in the signal region. The SIGMA method significantly reduces the computational cost compared to previous approaches, while retaining a similar high quality of background modeling and sensitivity to anomalous signals.
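The core interpolation idea can be sketched on a toy one-dimensional example, substituting a simple two-parameter Gaussian for the paper's generative model (this is an illustration of parameter interpolation from sidebands, not the SIGMA implementation):

```python
import numpy as np

def fit_gaussian(data):
    """Fit a two-parameter model (mean, std) to one sideband."""
    return float(np.mean(data)), float(np.std(data))

def interpolate_background(left_sb, right_sb, x_left, x_right, x_signal):
    """Fit the model in each sideband and linearly interpolate its
    parameters into the signal region to estimate the background there."""
    mu_l, sig_l = fit_gaussian(left_sb)
    mu_r, sig_r = fit_gaussian(right_sb)
    t = (x_signal - x_left) / (x_right - x_left)
    return (1 - t) * mu_l + t * mu_r, (1 - t) * sig_l + t * sig_r

left_sb = np.array([0.0, 2.0])    # toy samples from the left sideband
right_sb = np.array([4.0, 6.0])   # toy samples from the right sideband
mu, sigma = interpolate_background(left_sb, right_sb,
                                   x_left=1.0, x_right=5.0, x_signal=3.0)
```

The single model is fit once per sideband and its parameters are slid into the signal region, which is what lets SIGMA avoid retraining a generative model for every window of a sliding-window search.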
Updated: 2024-10-27 18:00:00
Domain: hep-ph,cs.LG,hep-ex,physics.data-an
Malinowski in the Age of AI: Can large language models create a text game based on an anthropological classic?
Recent advancements in Large Language Models (LLMs) like ChatGPT and GPT-4 have shown remarkable abilities in a wide range of tasks such as summarizing texts and assisting in coding. Scientific research has demonstrated that these models can also play text-adventure games. This study aims to explore whether LLMs can autonomously create text-based games based on anthropological classics, evaluating also their effectiveness in communicating knowledge. To achieve this, the study engaged anthropologists in discussions to gather their expectations and design inputs for an anthropologically themed game. Through iterative processes following the established HCI principle of 'design thinking', the prompts and the conceptual framework for crafting these games were refined. Leveraging GPT3.5, the study created three prototype games centered on the social anthropologist Bronislaw Malinowski's seminal work "Argonauts of the Western Pacific" (1922). Subsequently, evaluations were conducted by inviting senior anthropologists to playtest these games, and based on their input, the game designs were refined. The tests revealed promising outcomes but also highlighted key challenges: the models encountered difficulties in providing in-depth thematic understanding, showed susceptibility to misinformation, tended towards monotonic responses after an extended period of play, and struggled to offer detailed biographical information. Despite these limitations, the study's findings open up new research avenues at the crossroads of artificial intelligence, machine learning, LLMs, ethnography, anthropology and human-computer interaction.
Updated: 2024-10-27 17:59:17
Domain: cs.HC,cs.AI,cs.CY
Asynchronous Perception Machine For Efficient Test-Time-Training
In this work, we propose Asynchronous Perception Machine (APM), a computationally-efficient architecture for test-time-training (TTT). APM can process patches of an image one at a time in any order \textit{asynchronously}, and \textit{still encode} semantic-awareness in the net. We demonstrate APM's ability to recognize out-of-distribution images \textit{without} dataset-specific pre-training, augmentation or any pretext task. APM offers competitive performance over existing TTT approaches. To perform TTT, APM distills a test sample's representation just \textit{once}. APM possesses a unique property: it can learn using just this single representation and starts predicting semantically-aware features. APM demonstrates potential applications beyond test-time-training: APM can scale up to a dataset of 2D images and yield semantic clusterings in a single forward pass. APM also provides first empirical evidence towards validating GLOM's insight, i.e. that the input percept is a field. Therefore, APM helps us converge towards an implementation which can do \textit{both} interpolation and perception on a \textit{shared}-connectionist hardware. Our code is publicly available at this link: https://github.com/rajatmodi62/apm.
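One property the abstract emphasizes, consuming patches one at a time in any order, can be illustrated with a minimal sketch (this demonstrates only the order-invariance of a running-mean pooling, not APM's actual architecture, whose design is different):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_patches(patches, w):
    """Consume patches one at a time, pooling with a running mean so the
    final code does not depend on the order in which patches arrive."""
    state = np.zeros(w.shape[1])
    for i, p in enumerate(patches, start=1):
        state += (p @ w - state) / i   # incremental mean update
    return state

patches = rng.normal(size=(16, 8))    # 16 patches, 8-dim each
w = rng.normal(size=(8, 4))           # toy linear patch encoder
code_fwd = encode_patches(patches, w)
code_rev = encode_patches(patches[::-1], w)
```

Because the running mean is permutation-invariant, `code_fwd` and `code_rev` agree up to floating-point error, which is the flavor of order-agnostic processing the abstract claims.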
Updated: 2024-10-27 17:57:30
Domain: cs.CV,cs.AI
MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning
Many challenging tasks such as managing traffic systems, electricity grids, or supply chains involve complex decision-making processes that must balance multiple conflicting objectives and coordinate the actions of various independent decision-makers (DMs). One perspective for formalising and addressing such tasks is multi-objective multi-agent reinforcement learning (MOMARL). MOMARL broadens reinforcement learning (RL) to problems with multiple agents each needing to consider multiple objectives in their learning process. In reinforcement learning research, benchmarks are crucial in facilitating progress, evaluation, and reproducibility. The significance of benchmarks is underscored by the existence of numerous benchmark frameworks developed for various RL paradigms, including single-agent RL (e.g., Gymnasium), multi-agent RL (e.g., PettingZoo), and single-agent multi-objective RL (e.g., MO-Gymnasium). To support the advancement of the MOMARL field, we introduce MOMAland, the first collection of standardised environments for multi-objective multi-agent reinforcement learning. MOMAland addresses the need for comprehensive benchmarking in this emerging field, offering over 10 diverse environments that vary in the number of agents, state representations, reward structures, and utility considerations. To provide strong baselines for future research, MOMAland also includes algorithms capable of learning policies in such settings.
Updated: 2024-10-27 17:55:41
Domain: cs.MA,cs.AI,cs.GT
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?
How can "weak teacher models", such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially tasks that are challenging even for the teacher models and require expertise or daily practice? In this paper, we seek empirical answers to this question by investigating various data-driven strategies that offer supervision data at different quality levels upon tasks of varying complexity. Two intuitive strategies emerge for teacher models to provide supervision during alignment training: 1) using lower-quality supervision from complete tasks that match the difficulty of the target reasoning tasks, and 2) leveraging higher-quality supervision from easier subtasks that are less challenging. Interestingly, we find that even when the outcome error rate for hard task supervision is high (e.g., 90\%), training on such data can outperform perfectly correct supervision on easier subtasks on multiple hard math benchmarks. We further identify a more critical factor influencing training performance: step-wise error rates, which indicate the severity of errors in solutions. Specifically, training on hard task supervision with the same outcome error rates but disparate step-wise error rates can lead to a 30\% accuracy gap on the MATH benchmark. Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield notable performance improvements over simply combining rephrased hard full-task supervision, suggesting new avenues for data augmentation. Data and code are released at \url{https://github.com/hexuan21/Weak-to-Strong}.
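The distinction between outcome and step-wise error rates can be made concrete with a toy calculation (the data and field names below are invented for illustration):

```python
def outcome_error_rate(solutions):
    """Fraction of solutions whose final answer is wrong."""
    return sum(not s["final_correct"] for s in solutions) / len(solutions)

def stepwise_error_rate(solutions):
    """Fraction of individual reasoning steps that are wrong, pooled over
    all solutions."""
    wrong = sum(s["steps"].count(False) for s in solutions)
    total = sum(len(s["steps"]) for s in solutions)
    return wrong / total

# Two supervision sets with the same outcome error rate (50%) but very
# different step-wise error rates.
set_a = [{"final_correct": True,  "steps": [True, True, True]},
         {"final_correct": False, "steps": [True, True, False]}]
set_b = [{"final_correct": True,  "steps": [True, True, True]},
         {"final_correct": False, "steps": [False, False, False]}]
```

Both sets look identical if only final answers are checked, yet set_b's solutions are wrong at every step; this is exactly the disparity the paper finds to matter for training.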
Updated: 2024-10-27 17:55:27
Domain: cs.LG,cs.CL
Search Wide, Focus Deep: Automated Fetal Brain Extraction with Sparse Training Data
Automated fetal brain extraction from full-uterus MRI is a challenging task due to variable head sizes, orientations, complex anatomy, and prevalent artifacts. While deep-learning (DL) models trained on synthetic images have been successful in adult brain extraction, adapting these networks for fetal MRI is difficult due to the sparsity of labeled data, leading to increased false-positive predictions. To address this challenge, we propose a test-time strategy that reduces false positives in networks trained on sparse, synthetic labels. The approach uses a breadth-fine search (BFS) to identify a subvolume likely to contain the fetal brain, followed by a deep-focused sliding window (DFS) search to refine the extraction, pooling predictions to minimize false positives. We train models at different window sizes using synthetic images derived from a small number of fetal brain label maps, augmented with random geometric shapes. Each model is trained on diverse head positions and scales, including cases with partial or no brain tissue. Our framework matches state-of-the-art brain extraction methods on clinical HASTE scans of third-trimester fetuses and exceeds them by up to 5\% in terms of Dice in the second trimester as well as EPI scans across both trimesters. Our results demonstrate the utility of a sliding-window approach and combining predictions from several models trained on synthetic images, for improving brain-extraction accuracy by progressively refining regions of interest and minimizing the risk of missing brain mask slices or misidentifying other tissues as brain.
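The prediction-pooling step that suppresses false positives across overlapping windows might be sketched like this (a simplification of the paper's BFS/DFS pipeline; the toy probability maps are invented):

```python
import numpy as np

def pooled_mask(prob_maps, threshold=0.5):
    """Average overlapping per-window probability maps; a voxel survives
    only if the windows agree on average, which suppresses isolated
    false positives coming from any single window."""
    return np.stack(prob_maps).mean(axis=0) >= threshold

# Three windows covering the same 4 voxels; window 1 alone would wrongly
# keep the last voxel (0.8), but pooling across windows vetoes it.
win1 = np.array([0.9, 0.9, 0.1, 0.8])
win2 = np.array([0.8, 0.9, 0.2, 0.1])
win3 = np.array([0.9, 0.8, 0.0, 0.2])
mask = pooled_mask([win1, win2, win3])
```

Only the first two voxels, on which all windows agree, survive the pooled threshold.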
Updated: 2024-10-27 17:54:01
Domain: eess.IV,cs.CV,cs.LG
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming
Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.
Updated: 2024-10-27 17:34:07
Domain: cs.DC,cs.AI,cs.LG,cs.PF,cs.PL,cs.SE
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features. Modifications to a state-of-the-art SAE variant, Top-K SAEs, are evaluated across multiple dimensions. In particular, we assess the generalizability of SAEs trained on base models to longer contexts and fine-tuned models. Additionally, we analyze the geometry of learned SAE latents, confirming that \emph{feature splitting} enables the discovery of new features. The Llama Scope SAE checkpoints are publicly available at~\url{https://huggingface.co/fnlp/Llama-Scope}, alongside our scalable training, interpretation, and visualization tools at \url{https://github.com/OpenMOSS/Language-Model-SAEs}. These contributions aim to advance the open-source Sparse Autoencoder ecosystem and support mechanistic interpretability research by reducing the need for redundant SAE training.
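A minimal sketch of the Top-K SAE forward pass referenced above (illustrative shapes and random weights; this is not the Llama Scope training code):

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Top-K SAE forward pass: encode, keep only the k largest
    activations, zero the rest, then decode."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)
    if k < acts.size:
        cutoff = np.partition(acts, -k)[-k]   # k-th largest activation
        acts = np.where(acts >= cutoff, acts, 0.0)
    recon = acts @ W_dec + b_dec
    return acts, recon

rng = np.random.default_rng(0)
d_model, n_feat, k = 16, 64, 8   # toy sizes, far below 32K/128K features
x = rng.normal(size=d_model)
acts, recon = topk_sae_forward(
    x,
    rng.normal(size=(d_model, n_feat)), np.zeros(n_feat),
    rng.normal(size=(n_feat, d_model)), np.zeros(d_model),
    k)
```

Enforcing sparsity by hard top-k selection, rather than an L1 penalty, pins the number of active latents per token, which is the defining trait of the Top-K variant the paper builds on.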
Updated: 2024-10-27 17:33:49
Domain: cs.LG,cs.CL
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.
Updated: 2024-10-27 17:26:53
Domain: cs.CL,cs.AI,cs.CV
Props for Machine-Learning Security
We propose protected pipelines or props for short, a new approach for authenticated, privacy-preserving access to deep-web data for machine learning (ML). By permitting secure use of vast sources of deep-web data, props address the systemic bottleneck of limited high-quality training data in ML development. Props also enable privacy-preserving and trustworthy forms of inference, allowing for safe use of sensitive data in ML applications. Props are practically realizable today by leveraging privacy-preserving oracle systems initially developed for blockchain applications.
Updated: 2024-10-27 17:05:48
Domain: cs.CR,cs.AI
MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music
Symbolic music research plays a crucial role in music-related machine learning, but MIDI data can be complex for those without musical expertise. To address this issue, we present MidiTok Visualizer, a web application designed to facilitate the exploration and visualization of various MIDI tokenization methods from the MidiTok Python package. MidiTok Visualizer offers numerous customizable parameters, enabling users to upload MIDI files to visualize tokenized data alongside an interactive piano roll.
Updated: 2024-10-27 17:00:55
Domain: cs.SD,cs.AI,cs.MM,eess.AS
A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing
Efficiently processing structured point cloud data while preserving multiscale information is a key challenge across domains, from graphics to atomistic modeling. Using a curated dataset of simulated galaxy positions and properties, represented as point clouds, we benchmark the ability of graph neural networks to simultaneously capture local clustering environments and long-range correlations. Given the homogeneous and isotropic nature of the Universe, the data exhibits a high degree of symmetry. We therefore focus on evaluating the performance of Euclidean symmetry-preserving ($E(3)$-equivariant) graph neural networks, showing that they can outperform non-equivariant counterparts and domain-specific information extraction techniques in downstream performance as well as simulation-efficiency. However, we find that current architectures fail to capture information from long-range correlations as effectively as domain-specific baselines, motivating future work on architectures better suited for extracting long-range information.
Updated: 2024-10-27 16:58:48
Domain: cs.LG,astro-ph.IM
Symbotunes: unified hub for symbolic music generative models
Implementations of popular symbolic music generative models often differ significantly in terms of the libraries utilized and overall project structure. Therefore, directly comparing the methods or becoming acquainted with them may present challenges. To mitigate this issue we introduce Symbotunes, an open-source unified hub for symbolic music generative models. Symbotunes contains modern Python implementations of well-known methods for symbolic music generation, as well as a unified pipeline for generating and training.
Updated: 2024-10-27 16:54:58
Domain: cs.SD,cs.AI,cs.LG,eess.AS
Optimal Algorithms for Online Convex Optimization with Adversarial Constraints
A well-studied generalization of the standard online convex optimization (OCO) framework is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. The objective is to design an online learning policy that simultaneously achieves a small regret while ensuring a small cumulative constraint violation (CCV) against an adaptive adversary interacting over a horizon of length $T$. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that a simple first-order policy can simultaneously achieve these bounds. Furthermore, in the case of strongly convex cost and convex constraint functions, the regret guarantee can be improved to $O(\log T)$ while keeping the CCV bound the same as above. We establish these results by effectively combining adaptive OCO policies as a blackbox with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.
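A drift-plus-penalty-style sketch conveys the flavor of such a first-order policy (this toy loop illustrates the Lyapunov virtual-queue idea on a one-dimensional problem; it is not the paper's exact algorithm and carries none of its guarantees):

```python
def run_coco(cost_grad, constraint, T, eta=0.05, V=1.0):
    """Toy first-order constrained-OCO policy on [-1, 1]: take a gradient
    step on the drift-plus-penalty term V*f_t + Q*g_t, then grow/shrink
    a virtual queue Q with the observed constraint violation."""
    x, Q, ccv = 0.0, 0.0, 0.0
    for t in range(T):
        g_val, g_grad = constraint(x, t)
        step = V * cost_grad(x, t) + Q * g_grad
        x = min(1.0, max(-1.0, x - eta * step))   # projected gradient step
        ccv += max(0.0, g_val)                    # cumulative violation
        Q = max(0.0, Q + g_val)                   # Lyapunov-style queue
    return x, Q, ccv

# Cost pulls x toward 1; the constraint demands x <= 0.5. The queue
# settles where the two pressures balance: x -> 0.5, Q -> 2*(1 - 0.5).
x, Q, ccv = run_coco(lambda x, t: 2.0 * (x - 1.0),
                     lambda x, t: (x - 0.5, 1.0),
                     T=500)
```

The queue Q acts as an adaptive Lagrange multiplier: it grows while the constraint is violated, pushing the iterate back toward feasibility, which is the mechanism that keeps cumulative constraint violation small.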
Updated: 2024-10-27 16:37:56
领域: cs.LG,math.OC
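The drift-plus-penalty idea behind combining first-order OCO updates with Lyapunov optimization can be sketched on a toy problem. Everything below (the quadratic costs, linear constraints, box projection, step size, and the virtual-queue update rule) is an illustrative assumption, not the paper's actual policy:

```python
import numpy as np

def coco_sketch(T=200, eta=0.05, dim=2, seed=0):
    """Toy drift-plus-penalty style policy for constrained OCO.

    Cost f_t(x) = ||x - c_t||^2, constraint g_t(x) = <a_t, x> - 1 <= 0.
    A virtual queue Q (the Lyapunov state) tracks cumulative constraint
    violation and scales the constraint gradient in the update.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    Q = 0.0
    total_cost, total_violation = 0.0, 0.0
    for _ in range(T):
        c = rng.uniform(-1, 1, dim)          # adversary picks cost center
        a = rng.uniform(-1, 1, dim)          # adversary picks constraint
        cost = float(np.sum((x - c) ** 2))
        violation = max(float(np.dot(a, x)) - 1.0, 0.0)
        total_cost += cost
        total_violation += violation
        # gradient of cost plus queue-weighted constraint gradient
        grad = 2 * (x - c) + Q * a * (violation > 0)
        x = np.clip(x - eta * grad, -1, 1)   # projection onto a box
        Q = max(Q + violation, 0.0)          # virtual-queue update
    return total_cost, total_violation
```

The queue grows whenever the constraint is violated, which makes later updates lean harder against the constraint direction; this is the drift-plus-penalty mechanism in miniature.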
Hierarchical Universal Value Function Approximators
There have been key advancements to building universal approximators for multi-goal collections of reinforcement learning value functions -- key elements in estimating long-term returns of states in a parameterized manner. We extend this to hierarchical reinforcement learning, using the options framework, by introducing hierarchical universal value function approximators (H-UVFAs). This allows us to leverage the added benefits of scaling, planning, and generalization expected in temporal abstraction settings. We develop supervised and reinforcement learning methods for learning embeddings of the states, goals, options, and actions in the two hierarchical value functions: $Q(s, g, o; \theta)$ and $Q(s, g, o, a; \theta)$. Finally we demonstrate generalization of the HUVFAs and show they outperform corresponding UVFAs.
Updated: 2024-10-27 16:37:44
标题: 分层通用价值函数逼近器
摘要: 在为多目标强化学习值函数集合构建通用逼近器方面已经取得了关键进展,这类逼近器是以参数化方式估计状态长期回报的关键组件。我们借助选项(options)框架将其扩展到分层强化学习,引入分层通用值函数逼近器(H-UVFAs),从而能够利用时间抽象设定下可期望的扩展性、规划能力和泛化能力等额外优势。我们开发了监督学习和强化学习方法,用于学习两个分层值函数$Q(s, g, o; \theta)$和$Q(s, g, o, a; \theta)$中状态、目标、选项和动作的嵌入。最后,我们展示了H-UVFAs的泛化能力,并表明它们优于相应的UVFAs。
更新时间: 2024-10-27 16:37:44
领域: cs.LG,cs.AI,stat.ML,I.2.6
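The factorized form of a hierarchical value function $Q(s, g, o; \theta)$ can be illustrated with random embeddings. The elementwise (multilinear) combination below is one simple choice of approximator; the sizes and the combination rule are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_goals, n_options, d = 6, 4, 3, 5

# per-index embeddings for states, goals and options (hypothetical)
phi = rng.normal(size=(n_states, d))
psi = rng.normal(size=(n_goals, d))
xi = rng.normal(size=(n_options, d))

def q_value(s, g, o):
    """Factorized hierarchical value: sum over the elementwise product
    of the three embeddings, one simple multilinear approximator."""
    return float(np.sum(phi[s] * psi[g] * xi[o]))

# the full table of values for every (s, g, o) triple in one einsum
Q = np.einsum('sd,gd,od->sgo', phi, psi, xi)
```

Generalization in this scheme comes from the shared embedding space: a new goal only needs a new row of `psi`, not a new table.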
LLM Robustness Against Misinformation in Biomedical Question Answering
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering by retrieving and providing additional context coming from external knowledge sources (e.g., by adding the context to the prompt). However, injecting incorrect information can mislead the LLM to generate an incorrect answer. In this paper, we evaluate the effectiveness and robustness of four LLMs against misinformation - Gemma 2, GPT-4o-mini, Llama~3.1, and Mixtral - in answering biomedical questions. We assess the answer accuracy on yes-no and free-form questions in three scenarios: vanilla LLM answers (no context is provided), "perfect" augmented generation (correct context is provided), and prompt-injection attacks (incorrect context is provided). Our results show that Llama 3.1 (70B parameters) achieves the highest accuracy in both vanilla (0.651) and "perfect" RAG (0.802) scenarios. However, the accuracy gap between the models almost disappears with "perfect" RAG, suggesting its potential to mitigate the LLM's size-related effectiveness differences. We further evaluate the ability of the LLMs to generate malicious context on one hand and the LLM's robustness against prompt-injection attacks on the other hand, using metrics such as attack success rate (ASR), accuracy under attack, and accuracy drop. As adversaries, we use the same four LLMs (Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral) to generate incorrect context that is injected in the target model's prompt. Interestingly, Llama is shown to be the most effective adversary, causing accuracy drops of up to 0.48 for vanilla answers and 0.63 for "perfect" RAG across target models. Our analysis reveals that robustness rankings vary depending on the evaluation measure, highlighting the complexity of assessing LLM resilience to adversarial attacks.
Updated: 2024-10-27 16:23:26
标题: LLM在生物医学问答中对错误信息的强健性
摘要: 检索增强生成(RAG)方法通过从外部知识源检索额外上下文并提供给模型(例如将上下文添加到提示中),来减少大型语言模型(LLM)在问答中的虚构。然而,注入不正确的信息可能会误导LLM生成错误答案。本文评估了四种LLM(Gemma 2、GPT-4o-mini、Llama 3.1和Mixtral)在回答生物医学问题时的有效性及其对错误信息的鲁棒性。我们在三种情景下评估了是非题和自由形式问题的答案准确率:原始LLM回答(不提供上下文)、"完美"增强生成(提供正确上下文)和提示注入攻击(提供错误上下文)。结果显示,Llama 3.1(70B参数)在原始(0.651)和"完美"RAG(0.802)情景下均达到最高准确率。然而,在"完美"RAG下,各模型之间的准确率差距几乎消失,表明其有潜力缓解LLM因规模不同而产生的效果差异。 我们进一步评估了LLM生成恶意上下文的能力,以及LLM对提示注入攻击的鲁棒性,使用的指标包括攻击成功率(ASR)、攻击下的准确率和准确率下降。作为对手,我们使用同样的四种LLM(Gemma 2、GPT-4o-mini、Llama 3.1和Mixtral)生成不正确的上下文并注入目标模型的提示。有趣的是,Llama被证明是最有效的对手,在各目标模型上造成原始回答的准确率最多下降0.48,"完美"RAG的准确率最多下降0.63。我们的分析表明,鲁棒性排名随评估指标而变化,凸显了评估LLM抵御对抗攻击能力的复杂性。
更新时间: 2024-10-27 16:23:26
领域: cs.CL,cs.AI
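The evaluation metrics named in the abstract can be computed from paired clean/attacked predictions. The exact definitions below are assumptions (in particular, ASR is taken here as the fraction of originally-correct answers that the attack flips):

```python
def attack_metrics(clean_preds, attacked_preds, labels):
    """Accuracy under attack, accuracy drop, and attack success rate
    (ASR) for a question-answering robustness evaluation."""
    n = len(labels)
    acc_clean = sum(p == y for p, y in zip(clean_preds, labels)) / n
    acc_attacked = sum(p == y for p, y in zip(attacked_preds, labels)) / n
    # ASR: share of originally-correct answers flipped by the attack
    correct = [i for i in range(n) if clean_preds[i] == labels[i]]
    flipped = sum(attacked_preds[i] != labels[i] for i in correct)
    asr = flipped / len(correct) if correct else 0.0
    return {"accuracy_clean": acc_clean,
            "accuracy_under_attack": acc_attacked,
            "accuracy_drop": acc_clean - acc_attacked,
            "asr": asr}
```

As the abstract notes, rankings can differ across these metrics: a model with high clean accuracy has more answers available to flip, so its ASR and accuracy drop can both look worse.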
When Less is More: Achieving Faster Convergence in Distributed Edge Machine Learning
Distributed Machine Learning (DML) on resource-constrained edge devices holds immense potential for real-world applications. However, achieving fast convergence in DML in these heterogeneous environments remains a significant challenge. Traditional frameworks like Bulk Synchronous Parallel and Asynchronous Stochastic Parallel rely on frequent, small updates that incur substantial communication overhead and hinder convergence speed. Furthermore, these frameworks often employ static dataset sizes, neglecting the heterogeneity of edge devices and potentially leading to straggler nodes that slow down the entire training process. The straggler nodes, i.e., edge devices that take significantly longer to process their assigned data chunk, hinder the overall training speed. To address these limitations, this paper proposes Hermes, a novel probabilistic framework for efficient DML on edge devices. This framework leverages a dynamic threshold based on recent test loss behavior to identify statistically significant improvements in the model's generalization capability, hence transmitting updates only when major improvements are detected, thereby significantly reducing communication overhead. Additionally, Hermes employs dynamic dataset allocation to optimize resource utilization and prevents performance degradation caused by straggler nodes. Our evaluations on a real-world heterogeneous resource-constrained environment demonstrate that Hermes achieves faster convergence compared to state-of-the-art methods, resulting in a remarkable $13.22$x reduction in training time and a $62.1\%$ decrease in communication overhead.
Updated: 2024-10-27 16:17:03
标题: 少即是多:实现分布式边缘机器学习中更快的收敛速度
摘要: 分布式机器学习(DML)在资源受限的边缘设备上对现实世界应用具有巨大潜力。然而,在这些异构环境中实现DML的快速收敛仍是一项重大挑战。Bulk Synchronous Parallel和Asynchronous Stochastic Parallel等传统框架依赖频繁的小规模更新,带来大量通信开销并阻碍收敛速度。此外,这些框架通常采用静态数据集大小,忽略了边缘设备的异构性,可能产生拖慢整个训练过程的落后节点(straggler)。落后节点,即处理所分配数据块耗时明显更长的边缘设备,会阻碍整体训练速度。为了解决这些局限,本文提出了Hermes,一个用于边缘设备上高效DML的新颖概率框架。该框架利用基于近期测试损失行为的动态阈值来识别模型泛化能力的统计显著改进,因而仅在检测到重大改进时才传输更新,从而显著减少通信开销。此外,Hermes采用动态数据集分配来优化资源利用,并防止落后节点导致的性能下降。我们在一个真实的异构资源受限环境中的评估表明,Hermes相比最先进方法收敛更快,将训练时间缩短了13.22倍,并将通信开销降低了62.1%。
更新时间: 2024-10-27 16:17:03
领域: cs.DC,cs.LG,cs.PF
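A Hermes-style transmission gate can be sketched as follows. The window size, the mean-minus-k-standard-deviations threshold, and the warm-up behavior are all illustrative assumptions, not the paper's exact rule:

```python
from collections import deque
import statistics

class SignificanceGate:
    """Transmit a model update only when the newest test loss improves
    on recent history by more than a dynamic, spread-based threshold."""

    def __init__(self, window=5, k=1.0):
        self.history = deque(maxlen=window)
        self.k = k  # sensitivity knob (hypothetical parameter)

    def should_transmit(self, test_loss):
        if len(self.history) < self.history.maxlen:
            self.history.append(test_loss)
            return True                      # warm-up: always send
        mean = statistics.mean(self.history)
        spread = statistics.pstdev(self.history)
        significant = test_loss < mean - self.k * spread
        self.history.append(test_loss)
        return significant
```

Because the threshold adapts to the recent spread of the loss, noisy plateaus suppress transmissions while genuine improvements still get through.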
Adversarial Robustness Through Artifact Design
Adversarial examples arose as a challenge for machine learning. To hinder them, most defenses alter how models are trained (e.g., adversarial training) or inference is made (e.g., randomized smoothing). Still, while these approaches markedly improve models' adversarial robustness, models remain highly susceptible to adversarial examples. Identifying that, in certain domains such as traffic-sign recognition, objects are implemented per standards specifying how artifacts (e.g., signs) should be designed, we propose a novel approach for improving adversarial robustness. Specifically, we offer a method to redefine standards, making minor changes to existing ones, to defend against adversarial examples. We formulate the problem of artifact design as a robust optimization problem, and propose gradient-based and greedy search methods to solve it. We evaluated our approach in the domain of traffic-sign recognition, allowing it to alter traffic-sign pictograms (i.e., symbols within the signs) and their colors. We found that, combined with adversarial training, our approach led to up to 25.18\% higher robust accuracy compared to state-of-the-art methods against two adversary types, while further increasing accuracy on benign inputs. Notably, a user study we conducted showed that traffic signs produced by our approach are also easily recognizable by human subjects.
Updated: 2024-10-27 16:09:59
标题: 通过工件设计实现对抗鲁棒性
摘要: 对抗样本是机器学习面临的一项挑战。为了抵御它们,大多数防御方法会改变模型的训练方式(如对抗训练)或推断方式(如随机平滑)。然而,尽管这些方法显著提升了模型的对抗鲁棒性,模型仍然极易受到对抗样本的影响。我们注意到,在交通标志识别等某些领域中,物体是按照规定工件(如标志)应如何设计的标准来实现的,据此我们提出了一种提升对抗鲁棒性的新方法。具体来说,我们提供了一种重新定义标准的方法,即对现有标准进行微小修改,以防御对抗样本。我们将工件设计问题表述为一个鲁棒优化问题,并提出基于梯度的方法和贪心搜索方法来求解。我们在交通标志识别领域评估了我们的方法,允许其修改交通标志的象形图(即标志内的符号)及其颜色。我们发现,与对抗训练相结合时,我们的方法在面对两类对手时相比最先进方法的鲁棒准确率最高提升了25.18\%,同时进一步提高了对良性输入的准确率。值得注意的是,我们开展的用户研究表明,由我们的方法生成的交通标志也很容易被人类受试者识别。
更新时间: 2024-10-27 16:09:59
领域: cs.CR,cs.AI,cs.CV,cs.LG
On the differential and Walsh spectra of $x^{2q+1}$ over $\mathbb{F}_{q^2}$
Let $q$ be an odd prime power and let $\mathbb{F}_{q^2}$ be the finite field with $q^2$ elements. In this paper, we determine the differential spectrum of the power function $F(x)=x^{2q+1}$ over $\mathbb{F}_{q^2}$. When the characteristic of $\mathbb{F}_{q^2}$ is $3$, we also determine the value distribution of the Walsh spectrum of $F$, showing that it is $4$-valued, and use the obtained result to determine the weight distribution of a $4$-weight cyclic code.
Updated: 2024-10-27 16:09:10
标题: 关于$x^{2q+1}$在$\mathbb{F}_{q^2}$上的差分和Walsh谱
摘要: 设$q$为奇素数幂,$\mathbb{F}_{q^2}$为含$q^2$个元素的有限域。在本文中,我们确定了幂函数$F(x)=x^{2q+1}$在$\mathbb{F}_{q^2}$上的差分谱。当$\mathbb{F}_{q^2}$的特征为$3$时,我们还确定了$F$的Walsh谱的取值分布,证明其只取$4$个值,并利用所得结果确定了一个$4$权重循环码的重量分布。
更新时间: 2024-10-27 16:09:10
领域: cs.CR,cs.IT,math.IT,math.NT
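The differential spectrum of $F(x)=x^{2q+1}$ can be checked by brute force for the smallest case $q=3$, representing $\mathbb{F}_9$ as $\mathbb{F}_3[i]/(i^2+1)$. This is only a sanity-check enumeration, not the paper's derivation:

```python
from collections import Counter

# GF(9) = F_3[i] / (i^2 + 1); an element a + b*i is the pair (a, b)
def add(x, y):
    return ((x[0] + y[0]) % 3, (x[1] + y[1]) % 3)

def neg(x):
    return ((-x[0]) % 3, (-x[1]) % 3)

def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % 3, (a * d + b * c) % 3)

def power(x, n):
    r = (1, 0)
    for _ in range(n):
        r = mul(r, x)
    return r

field = [(a, b) for a in range(3) for b in range(3)]
F = {x: power(x, 7) for x in field}   # F(x) = x^(2q+1) with q = 3

# differential spectrum: distribution of the counts
# delta(a, b) = #{x : F(x + a) - F(x) = b} over nonzero a
spectrum = Counter()
for a in field:
    if a == (0, 0):
        continue
    per_b = Counter(add(F[add(x, a)], neg(F[x])) for x in field)
    spectrum.update(per_b.values())
```

`spectrum` maps each value of $\delta(a,b)$ to how often it occurs; the largest key is the differential uniformity of $F$ over $\mathbb{F}_9$.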
$\textit{Who Speaks Matters}$: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
Large Language Models (LLMs) offer a lucrative promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. This requires their applications to high-stakes tasks like hate speech detection to be critically scrutinized. In this work, we investigate the robustness of hate speech classification using LLMs, particularly when explicit and implicit markers of the speaker's ethnicity are injected into the input. For the explicit markers, we inject a phrase that mentions the speaker's identity. For the implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 4 popular LLMs and 5 ethnicities. We find that the presence of implicit dialect markers in inputs causes model outputs to flip more than the presence of explicit markers. Further, the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for exercising caution in deploying LLMs for high-stakes tasks like hate speech detection.
Updated: 2024-10-27 16:06:24
标题: 谁说话很重要:分析说话者种族对仇恨言论分类的影响
摘要: 大型语言模型(LLMs)为可扩展内容管理,包括仇恨言论检测,提供了诱人的前景。然而,它们也被认为脆弱且存在对边缘化社区和方言的偏见。这要求对它们在高风险任务(如仇恨言论检测)中的应用进行批判性审查。在这项工作中,我们研究了使用LLMs进行仇恨言论分类的鲁棒性,特别是在将说话者的种族明确和隐含的标记注入输入时。对于明确的标记,我们注入提及说话者身份的短语。对于隐含的标记,我们注入方言特征。通过分析在存在这些标记时模型输出翻转的频率,我们揭示了4个流行LLMs和5个种族之间的不同程度的脆弱性。我们发现,输入中存在隐含的方言标记会导致模型输出翻转的次数比存在明确标记时更多。此外,翻转的百分比在不同种族之间变化。最后,我们发现更大的模型更具鲁棒性。我们的发现表明在部署LLMs进行高风险任务,如仇恨言论检测时需要谨慎行事。
更新时间: 2024-10-27 16:06:24
领域: cs.CL,cs.AI
Volume-Preserving Transformers for Learning Time Series Data with Structure
Two of the many trends in neural network research of the past few years have been (i) the learning of dynamical systems, especially with recurrent neural networks such as long short-term memory networks (LSTMs) and (ii) the introduction of transformer neural networks for natural language processing (NLP) tasks. While some work has been performed on the intersection of these two trends, those efforts were largely limited to using the vanilla transformer directly without adjusting its architecture for the setting of a physical system. In this work we develop a transformer-inspired neural network and use it to learn a dynamical system. We (for the first time) change the activation function of the attention layer to imbue the transformer with structure-preserving properties to improve long-term stability. This is shown to be of great advantage when applying the neural network to learning the trajectory of a rigid body.
Updated: 2024-10-27 16:05:07
标题: 体积保持变换器:用于学习具有结构的时间序列数据
摘要: 过去几年神经网络研究的众多趋势中,有两个是:(i) 动态系统的学习,特别是使用长短期记忆网络(LSTM)等循环神经网络;(ii) 将Transformer神经网络引入自然语言处理(NLP)任务。虽然已有一些工作位于这两个趋势的交叉点上,但这些工作大多局限于直接使用原始Transformer,而没有针对物理系统的设定调整其架构。在这项工作中,我们开发了一种受Transformer启发的神经网络,并用它来学习动态系统。我们(首次)改变了注意力层的激活函数,赋予Transformer保结构特性,以提高长期稳定性。当将该神经网络用于学习刚体轨迹时,这被证明具有很大优势。
更新时间: 2024-10-27 16:05:07
领域: math.NA,cs.LG,cs.NA,68T07, 65D30, 37M15, 65P10
DNAHLM -- DNA sequence and Human Language mixed large language Model
There are already many DNA large language models, but most of them still follow traditional uses, such as extracting sequence features for classification tasks. More innovative applications of large language models, such as prompt engineering, RAG, and zero-shot or few-shot prediction, remain challenging for DNA-based models. The key issue lies in the fact that DNA models and human natural language models are entirely separate; however, techniques like prompt engineering require the use of natural language, thereby significantly limiting the application of DNA large language models. This paper introduces a pre-trained model trained on the GPT-2 network, combining DNA sequences and English text, and uses a unified BPE tokenization method. We then convert classification and other downstream tasks into Alpaca format instruction data, and perform instruction fine-tuning on this pre-trained model to create a fine-tuned model capable of handling multiple tasks. The model has demonstrated its effectiveness in DNA related zero-shot prediction and multitask application. This research provides a highly promising direction for building a unified DNA sequence task framework.
Updated: 2024-10-27 15:53:25
标题: DNAHLM -- DNA序列和人类语言混合的大型语言模型
摘要: 已经有许多DNA大型语言模型,但其中大多数仍然遵循传统用途,比如提取序列特征用于分类任务。大型语言模型的更具创新性的应用,如提示工程、RAG以及零样本或少样本预测,对于基于DNA的模型仍然具有挑战性。关键问题在于DNA模型和人类自然语言模型是完全独立的;然而,像提示工程这样的技术需要使用自然语言,从而显著限制了DNA大型语言模型的应用范围。本文介绍了一个在GPT-2网络上训练的预训练模型,结合了DNA序列和英文文本,并使用统一的BPE分词方法。然后,我们将分类和其他下游任务转换为Alpaca格式的指令数据,并在这个预训练模型上进行指令微调,以创建一个能够处理多个任务的微调模型。该模型在DNA相关的零样本预测和多任务应用中表现出了其有效性。这项研究为构建一个统一的DNA序列任务框架提供了一个极具前景的方向。
更新时间: 2024-10-27 15:53:25
领域: q-bio.GN,cs.LG,92-10,J.3
Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) has achieved remarkable success in solving complex decision-making problems by combining the representation capabilities of deep learning with the decision-making power of reinforcement learning. However, learning in sparse reward environments remains challenging due to insufficient feedback to guide the optimization of agents, especially in real-life environments with high-dimensional states. To tackle this issue, experience replay is commonly introduced to enhance learning efficiency through past experiences. Nonetheless, current methods of experience replay, whether based on uniform or prioritized sampling, frequently struggle with suboptimal learning efficiency and insufficient utilization of samples. This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages the deterministic point process to prioritize diverse samples in state realizations. We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic in-door environments in Habitat. The results show that our method not only significantly improves learning efficiency but also demonstrates superior performance in sparse reward environments with high-dimensional states, providing a simple yet effective solution for this field.
Updated: 2024-10-27 15:51:27
标题: 深度强化学习中基于多样性的高效经验回放
摘要: 深度强化学习(DRL)通过将深度学习的表示能力与强化学习的决策能力相结合,在解决复杂决策问题方面取得了显著成功。然而,在稀疏奖励环境中学习仍然具有挑战性,因为缺乏足够的反馈来引导智能体的优化,特别是在具有高维状态的现实环境中。为了解决这个问题,通常引入经验回放,借助过去的经验提升学习效率。然而,当前的经验回放方法,无论基于均匀采样还是优先采样,常常存在学习效率欠佳和样本利用不足的问题。本文提出了一种新颖的方法,基于多样性的经验回放(DBER),利用确定性点过程来优先选取状态实现中的多样样本。我们在MuJoCo的机器人操作任务、Atari游戏和Habitat中的真实室内环境上进行了广泛实验。结果表明,我们的方法不仅显著提高了学习效率,还在具有高维状态的稀疏奖励环境中表现出优越性能,为这一领域提供了简单而有效的解决方案。
更新时间: 2024-10-27 15:51:27
领域: cs.LG,cs.AI
Improving Decision Sparsity
Sparsity is a central aspect of interpretability in machine learning. Typically, sparsity is measured in terms of the size of a model globally, such as the number of variables it uses. However, this notion of sparsity is not particularly relevant for decision-making; someone subjected to a decision does not care about variables that do not contribute to the decision. In this work, we dramatically expand a notion of decision sparsity called the Sparse Explanation Value(SEV) so that its explanations are more meaningful. SEV considers movement along a hypercube towards a reference point. By allowing flexibility in that reference and by considering how distances along the hypercube translate to distances in feature space, we can derive sparser and more meaningful explanations for various types of function classes. We present cluster-based SEV and its variant tree-based SEV, introduce a method that improves credibility of explanations, and propose algorithms that optimize decision sparsity in machine learning models.
Updated: 2024-10-27 15:39:52
标题: 提高决策稀疏性
摘要: 稀疏性是机器学习中可解释性的一个核心方面。通常,稀疏性是以模型全局规模来衡量的,比如它使用的变量数量。然而,这种稀疏性的概念对于决策并不特别相关;受决策影响的人并不关心不对决策有贡献的变量。在这项工作中,我们大幅扩展了一种称为稀疏解释值(SEV)的决策稀疏性概念,使其解释更有意义。SEV考虑沿着一个超立方体向参考点移动。通过允许该参考点具有灵活性,并考虑沿超立方体的距离如何转化为特征空间中的距离,我们可以为各种类型的函数类别推导出更稀疏、更有意义的解释。我们提出了基于聚类的SEV及其基于树的变体,引入了一种提高解释可信度的方法,并提出了优化机器学习模型中决策稀疏性的算法。
更新时间: 2024-10-27 15:39:52
领域: cs.LG,cs.AI
4-bit Shampoo for Memory-Efficient Network Training
Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to first-order optimizers in both theory and practice. The states forming the preconditioner and its inverse root restrict the maximum size of models trained by second-order optimizers. To address this, compressing 32-bit optimizer states to lower bitwidths has shown promise in reducing memory usage. However, current approaches only pertain to first-order optimizers. In this paper, we propose the first 4-bit second-order optimizers, exemplified by 4-bit Shampoo, maintaining performance similar to that of 32-bit ones. We show that quantizing the eigenvector matrix of the preconditioner in 4-bit Shampoo is remarkably better than quantizing the preconditioner itself both theoretically and experimentally. By rectifying the orthogonality of the quantized eigenvector matrix, we enhance the approximation of the preconditioner's eigenvector matrix, which also benefits the computation of its inverse 4-th root. Besides, we find that linear square quantization slightly outperforms dynamic tree quantization when quantizing second-order optimizer states. Evaluation on various networks for image classification and natural language modeling demonstrates that our 4-bit Shampoo achieves comparable performance to its 32-bit counterpart while being more memory-efficient.
Updated: 2024-10-27 15:38:02
标题: 用于内存高效网络训练的4位Shampoo
摘要: 二阶优化器维护一个称为预条件子(preconditioner)的矩阵,在理论和实践上均优于一阶优化器。构成预条件子及其逆根的状态限制了二阶优化器所能训练模型的最大规模。为了解决这一问题,将32位优化器状态压缩到更低位宽已显示出降低内存占用的潜力。然而,现有方法仅适用于一阶优化器。在本文中,我们提出了首个4位二阶优化器,以4位Shampoo为例,其性能与32位版本相近。我们从理论和实验两方面表明,在4位Shampoo中量化预条件子的特征向量矩阵明显优于量化预条件子本身。通过矫正量化特征向量矩阵的正交性,我们改进了对预条件子特征向量矩阵的逼近,这也有利于其逆四次方根的计算。此外,我们发现,在量化二阶优化器状态时,线性平方量化略优于动态树量化。在图像分类和自然语言建模的多种网络上的评估表明,我们的4位Shampoo在性能上与其32位对应版本相当,同时更节省内存。
更新时间: 2024-10-27 15:38:02
领域: cs.LG,cs.AI
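The claim that rectifying orthogonality helps a quantized eigenvector matrix can be illustrated numerically. Both the uniform 4-bit quantizer and the polar-decomposition projection below are sketches of the general idea, not the paper's exact scheme:

```python
import numpy as np

def quantize_4bit(M):
    """Per-tensor uniform (linear) 4-bit quantization sketch."""
    scale = np.abs(M).max() / 7.0            # symmetric int4-like grid
    return np.clip(np.round(M / scale), -8, 7) * scale

def rectify_orthogonality(A):
    """Project a nearly-orthogonal matrix onto the orthogonal group
    via its polar factor U V^T (nearest orthogonal matrix)."""
    U, _, Vt = np.linalg.svd(A)
    return U @ Vt

rng = np.random.default_rng(0)
# stand-in for an eigenvector matrix of a preconditioner (from eigh)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))

Q_quant = quantize_4bit(Q)
Q_rect = rectify_orthogonality(Q_quant)

err_plain = np.linalg.norm(Q_quant @ Q_quant.T - np.eye(16))
err_rect = np.linalg.norm(Q_rect @ Q_rect.T - np.eye(16))
```

The rectified matrix is orthogonal to machine precision, while the raw quantized one is not; this is the orthogonality deficit the paper's rectification targets.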
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
Recently, rapid advancements in Multi-Modal In-Context Learning (MM-ICL) have achieved notable success, which is capable of achieving superior performance across various tasks without requiring additional parameter tuning. However, the underlying rules for the effectiveness of MM-ICL remain under-explored. To fill this gap, this work aims to investigate the research question: "What factors affect the performance of MM-ICL?'' To this end, we investigate extensive experiments on the three core steps of MM-ICL including demonstration retrieval, demonstration ordering, and prompt construction using 6 vision large language models and 20 strategies. Our findings highlight (1) the necessity of a multi-modal retriever for demonstration retrieval, (2) the importance of intra-demonstration ordering over inter-demonstration ordering, and (3) the enhancement of task comprehension through introductory instructions in prompts. We hope this study can serve as a foundational guide for optimizing MM-ICL strategies in future research.
Updated: 2024-10-27 15:37:51
标题: 哪些因素影响多模态上下文学习?一项深入探究
摘要: 最近,多模态上下文学习(MM-ICL)领域取得了显著进展,能够在各种任务中取得优越性能,而无需额外的参数调整。然而,MM-ICL 的有效性规则尚未得到充分探讨。为了填补这一空白,本研究旨在探讨研究问题:“哪些因素影响了MM-ICL的性能?”为此,我们对MM-ICL的三个核心步骤进行了广泛实验,包括演示检索、演示排序和提示构建,使用了6种视觉大型语言模型和20种策略。我们的研究结果突显了:(1)演示检索需要多模态检索器的必要性,(2)演示内部排序比演示间排序更为重要,(3)通过提示中的介绍性说明增强任务理解。我们希望本研究可以成为未来优化MM-ICL策略的基础指南。
更新时间: 2024-10-27 15:37:51
领域: cs.CL,cs.AI,cs.CV
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
We introduce MusicFlow, a cascaded text-to-music generation model based on flow matching. Based on self-supervised representations to bridge between text descriptions and music audios, we construct two flow matching networks to model the conditional distribution of semantic and acoustic features. Additionally, we leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation in a zero-shot manner. Experiments on MusicCaps reveal that the music generated by MusicFlow exhibits superior quality and text coherence despite being over $2\sim5$ times smaller and requiring $5$ times fewer iterative steps. Simultaneously, the model can perform other music generation tasks and achieves competitive performance in music infilling and continuation. Our code and model will be publicly available.
Updated: 2024-10-27 15:35:41
标题: MusicFlow:用于文本引导音乐生成的级联流匹配
摘要: 我们介绍了MusicFlow,一个基于流匹配的级联文本到音乐生成模型。基于在文本描述与音乐音频之间架起桥梁的自监督表示,我们构建了两个流匹配网络来建模语义特征和声学特征的条件分布。此外,我们以掩码预测作为训练目标,使模型能够以零样本方式泛化到音乐填充和续写等其他任务。在MusicCaps上的实验表明,尽管模型规模小了2到5倍以上、所需迭代步骤少了5倍,MusicFlow生成的音乐在质量和文本一致性上仍更优。同时,该模型还能执行其他音乐生成任务,并在音乐填充和续写上取得有竞争力的表现。我们的代码和模型将公开发布。
更新时间: 2024-10-27 15:35:41
领域: cs.SD,cs.AI,eess.AS
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
Current parameter-efficient fine-tuning (PEFT) methods build adapters widely agnostic of the context of downstream task to learn, or the context of important knowledge to maintain. As a result, there is often a performance gap compared to full-parameter fine-tuning, and meanwhile the fine-tuned model suffers from catastrophic forgetting of the pre-trained world knowledge. In this paper, we propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters from weight decomposition oriented by the context of downstream task or the world knowledge to maintain. Concretely, we collect a few data samples, and perform singular value decomposition for each linear layer of a pre-trained LLM multiplied by the covariance matrix of the input activation using these samples. The inverse of the covariance matrix is multiplied with the decomposed components to reconstruct the original weights. By doing so, the context of the representative samples is captured through deciding the factorizing orientation. Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation. For the former, we use question-answering samples to obtain the covariance matrices, and use the decomposed components with the smallest $r$ singular values to initialize a learnable adapter, with the others frozen such that the world knowledge is better preserved. For the latter, we use the instruction data from the fine-tuning task, such as math or coding, to orientate the decomposition and train the largest $r$ components that most correspond to the task to learn. We conduct extensive experiments on Math, Code, and Instruction Following tasks.
Updated: 2024-10-27 15:27:57
标题: CorDA:基于上下文的大型语言模型分解适应性,用于任务感知参数高效微调
摘要: 当前的参数高效微调(PEFT)方法所构建的适配器,大多既不考虑待学习下游任务的上下文,也不考虑需要保持的重要知识的上下文。因此,与全参数微调相比通常存在性能差距,同时微调后的模型还会灾难性地遗忘预训练获得的世界知识。在本文中,我们提出了CorDA,一种上下文导向的分解适配方法,它以下游任务或需保持的世界知识的上下文为导向进行权重分解,从而构建可学习的任务感知适配器。具体而言,我们收集少量数据样本,对预训练LLM的每个线性层,将其权重乘以由这些样本的输入激活得到的协方差矩阵后进行奇异值分解;再将分解得到的分量乘以协方差矩阵的逆来重构原始权重。通过这种方式,代表性样本的上下文通过决定分解的方向而被捕获。我们的方法支持两种选项:知识保持适配和指令预览适配。对于前者,我们使用问答样本获得协方差矩阵,并用最小的$r$个奇异值对应的分量初始化可学习的适配器,其余部分冻结,从而更好地保持世界知识。对于后者,我们使用来自微调任务(如数学或编码)的指令数据来确定分解方向,并训练与该任务最对应的最大$r$个分量。我们在数学、代码和指令跟随任务上进行了广泛的实验。
更新时间: 2024-10-27 15:27:57
领域: cs.LG,cs.AI
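The context-oriented decomposition can be sketched in NumPy: take the SVD of $WC$ (weight times activation covariance), undo the covariance on reconstruction, and split the components into a frozen part plus an adapter. The dimensions, sample count, and covariance regularizer below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 8, 6, 100
W = rng.normal(size=(d_out, d_in))       # a pre-trained linear layer
X = rng.normal(size=(n, d_in))           # sampled input activations

# covariance of the input activations (regularized for invertibility)
C = X.T @ X / n + 1e-4 * np.eye(d_in)
C_inv = np.linalg.inv(C)

# context-oriented decomposition: SVD of W C, then undo C
U, S, Vt = np.linalg.svd(W @ C, full_matrices=False)
W_rec = U @ np.diag(S) @ Vt @ C_inv      # exact reconstruction of W

# knowledge-preserved variant: smallest-r components become the
# learnable adapter, the rest stays frozen
r = 2
adapter = U[:, -r:] @ np.diag(S[-r:]) @ Vt[-r:] @ C_inv
frozen = U[:, :-r] @ np.diag(S[:-r]) @ Vt[:-r] @ C_inv
```

Because the SVD is oriented by $C$, which components count as "largest" or "smallest" now depends on the sampled context, which is the mechanism the abstract describes.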
Hamiltonian Score Matching and Generative Flows
Classical Hamiltonian mechanics has been widely used in machine learning in the form of Hamiltonian Monte Carlo for applications with predetermined force fields. In this work, we explore the potential of deliberately designing force fields for Hamiltonian ODEs, introducing Hamiltonian velocity predictors (HVPs) as a tool for score matching and generative models. We present two innovations constructed with HVPs: Hamiltonian Score Matching (HSM), which estimates score functions by augmenting data via Hamiltonian trajectories, and Hamiltonian Generative Flows (HGFs), a novel generative model that encompasses diffusion models and flow matching as HGFs with zero force fields. We showcase the extended design space of force fields by introducing Oscillation HGFs, a generative model inspired by harmonic oscillators. Our experiments validate our theoretical insights about HSM as a novel score matching metric and demonstrate that HGFs rival leading generative modeling techniques.
Updated: 2024-10-27 15:17:52
标题: 哈密顿得分匹配和生成流
摘要: 经典哈密顿力学以哈密顿蒙特卡洛的形式广泛应用于机器学习中具有预定力场的场景。在这项工作中,我们探索了为哈密顿常微分方程(ODE)有意设计力场的潜力,引入哈密顿速度预测器(HVPs)作为得分匹配和生成模型的工具。我们提出了两个基于HVPs的创新:哈密顿得分匹配(HSM),它通过沿哈密顿轨迹增广数据来估计得分函数;以及哈密顿生成流(HGFs),一种新颖的生成模型,它将扩散模型和流匹配囊括为力场为零的HGFs。我们通过引入振荡HGFs(一种受谐振子启发的生成模型)展示了力场的扩展设计空间。我们的实验验证了关于HSM作为一种新颖得分匹配度量的理论见解,并证明HGFs可与领先的生成建模技术相媲美。
更新时间: 2024-10-27 15:17:52
领域: cs.LG,stat.ML
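Hamiltonian trajectories of the kind used to augment data can be generated with a standard leapfrog integrator for $H(q,p)=U(q)+\|p\|^2/2$. The harmonic potential, step size, and step count below are illustrative choices, not the paper's setup:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps=0.05, steps=40):
    """Kick-drift-kick leapfrog integration of Hamiltonian dynamics
    with H(q, p) = U(q) + ||p||^2 / 2; returns the full trajectory."""
    q, p = np.array(q, dtype=float), np.array(p, dtype=float)
    traj = [(q.copy(), p.copy())]
    for _ in range(steps):
        p = p - 0.5 * eps * grad_U(q)   # half kick
        q = q + eps * p                 # full drift
        p = p - 0.5 * eps * grad_U(q)   # half kick
        traj.append((q.copy(), p.copy()))
    return traj

grad_U = lambda q: q                    # harmonic potential U(q) = ||q||^2 / 2
traj = leapfrog([1.0], [0.0], grad_U)
energies = [0.5 * float(q @ q + p @ p) for q, p in traj]
```

The near-conservation of energy along the trajectory is what makes such paths useful data augmentations: they move samples while staying on (approximately) constant-energy level sets.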
Graph Neural Networks on Discriminative Graphs of Words
In light of the recent success of Graph Neural Networks (GNNs) and their ability to perform inference on complex data structures, many studies apply GNNs to the task of text classification. In most previous methods, a heterogeneous graph, containing both word and document nodes, is constructed using the entire corpus and a GNN is used to classify document nodes. In this work, we explore a new Discriminative Graph of Words Graph Neural Network (DGoW-GNN) approach encapsulating both a novel discriminative graph construction and model to classify text. In our graph construction, containing only word nodes and no document nodes, we split the training corpus into disconnected subgraphs according to their labels and weight edges by the pointwise mutual information of the represented words. Our graph construction, for which we provide theoretical motivation, allows us to reformulate the task of text classification as the task of walk classification. We also propose a new model for the graph-based classification of text, which combines a GNN and a sequence model. We evaluate our approach on seven benchmark datasets and find that it is outperformed by several state-of-the-art baseline models. We analyse reasons for this performance difference and hypothesise under which conditions it is likely to change.
Updated: 2024-10-27 15:14:06
标题: 基于词语判别图的图神经网络
摘要: 鉴于图神经网络(GNN)近来的成功及其在复杂数据结构上进行推断的能力,许多研究将GNN应用于文本分类任务。在大多数先前方法中,利用整个语料库构建一个同时包含词节点和文档节点的异构图,并使用GNN对文档节点进行分类。在这项工作中,我们探索了一种新的基于词判别图的图神经网络(DGoW-GNN)方法,其中既包含一种新颖的判别式图构建方式,也包含一个用于文本分类的模型。在我们的图构建中,只包含词节点而不包含文档节点;我们按照标签将训练语料库划分为互不相连的子图,并用所表示词之间的点互信息为边加权。我们为这种图构建方式提供了理论动机,它使我们能够将文本分类任务重新表述为游走分类任务。我们还提出了一种结合GNN和序列模型的基于图的文本分类新模型。我们在七个基准数据集上评估了我们的方法,发现它的表现不及若干最先进的基线模型。我们分析了造成这种性能差距的原因,并就其在何种条件下可能改变提出了假设。
更新时间: 2024-10-27 15:14:06
领域: cs.LG,cs.CL
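Pointwise-mutual-information edge weights over word co-occurrence can be computed as follows. The sliding-window co-occurrence definition and the positive-PMI filter are common choices assumed here, not necessarily the paper's exact construction:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(docs, window=2):
    """PMI-weighted word-word edges from co-occurrence counts over
    sliding windows; keeps only positive-PMI pairs."""
    word_cnt, pair_cnt, n_windows = Counter(), Counter(), 0
    for doc in docs:
        toks = doc.split()
        for i in range(max(len(toks) - window + 1, 1)):
            win = toks[i:i + window]
            n_windows += 1
            for w in set(win):
                word_cnt[w] += 1
            for u, v in combinations(sorted(set(win)), 2):
                pair_cnt[(u, v)] += 1
    edges = {}
    for (u, v), c in pair_cnt.items():
        pmi = math.log(c * n_windows / (word_cnt[u] * word_cnt[v]))
        if pmi > 0:                      # keep positive-PMI edges only
            edges[(u, v)] = pmi
    return edges
```

To build the paper's label-wise disconnected subgraphs, one would simply call this once per class on that class's documents.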
A Derivational ChainBank for Modern Standard Arabic
This study presents the ``Arabic Derivational ChainBank,'' a novel framework for modeling Arabic derivational morphology. It establishes connections between forms and meanings by constructing a chain of derived words that reflect their derivational significance. To expedite the process, a rule-based methodology was employed, avoiding time-consuming manual annotation. The derivational network was then aligned with the CamelMorph morphological analyzer database. This two-step process resulted in a chain of derived word lemmas linked to their roots, encompassing 23,333 evaluated derivational relations, thereby demonstrating the efficiency of the ChainBank.
Updated: 2024-10-27 14:43:23
标题: 面向现代标准阿拉伯语的派生链库(ChainBank)
摘要: 本研究提出了"阿拉伯语派生链库(Arabic Derivational ChainBank)",一个用于建模阿拉伯语派生形态学的新颖框架。它通过构建反映派生意义的派生词链,在词形与词义之间建立联系。为加快构建过程,研究采用了基于规则的方法,避免了耗时的人工标注。随后,派生网络与CamelMorph形态分析器数据库进行了对齐。这一两步流程产生了一个与词根相连的派生词词元链,涵盖23,333条经过评估的派生关系,从而证明了ChainBank的效率。
更新时间: 2024-10-27 14:43:23
领域: cs.CL,cs.AI
Accelerating Transformer Pre-training with 2:4 Sparsity
Training large transformers is slow, but recent innovations on GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In the light of this property, we comprehensively investigate the feasibility of accelerating feed-forward networks (FFNs) of transformers in pre-training. First, we define a ``flip rate'' to monitor the stability of a 2:4 training process. Utilizing this metric, we propose three techniques to preserve accuracy: to modify the sparse-refined straight-through estimator by applying the masked decay term on gradients, to determine a feasible decay factor in warm-up stage, and to enhance the model's quality by a dense fine-tuning procedure near the end of pre-training. Besides, we devise two techniques to practically accelerate training: to calculate transposable 2:4 masks by convolution, and to accelerate gated activation functions by reducing GPU L2 cache miss. Experiments show that our 2:4 sparse training algorithm achieves similar convergence to dense training algorithms on several transformer pre-training tasks, while actual acceleration can be observed on different shapes of transformer block apparently. Our toolkit is available at https://github.com/huyz2023/2by4-pretrain.
Updated: 2024-10-27 14:40:08
标题: 使用 2:4 稀疏性加速 Transformer 预训练
摘要: 训练大型Transformer速度很慢,但GPU架构上的最新创新为我们带来了优势。NVIDIA Ampere GPU执行细粒度2:4稀疏矩阵乘法的速度可达其稠密等价运算的两倍。鉴于这一特性,我们全面研究了在预训练中加速Transformer前馈网络(FFN)的可行性。首先,我们定义了"翻转率"来监控2:4训练过程的稳定性。利用这一指标,我们提出了三种保持精度的技术:通过在梯度上施加掩码衰减项来修改稀疏精化的直通估计器,在预热阶段确定可行的衰减因子,以及在预训练接近结束时通过稠密微调过程提升模型质量。此外,我们设计了两种切实加速训练的技术:通过卷积计算可转置的2:4掩码,以及通过减少GPU L2缓存未命中来加速门控激活函数。实验表明,我们的2:4稀疏训练算法在多个Transformer预训练任务上达到了与稠密训练算法相近的收敛性,同时在不同形状的Transformer块上可以明显观察到实际加速。我们的工具包可在 https://github.com/huyz2023/2by4-pretrain 获取。
更新时间: 2024-10-27 14:40:08
领域: cs.LG
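Setting the transposable-mask trick aside, the basic 2:4 pattern (keep the 2 largest-magnitude weights in every contiguous group of 4) is easy to sketch:

```python
import numpy as np

def mask_2to4(W):
    """2:4 sparsity mask: in every contiguous group of 4 weights along
    the flattened last axis, keep the 2 with the largest magnitude."""
    g = W.reshape(-1, 4)
    # indices of the two smallest-magnitude entries in each group
    idx = np.argsort(np.abs(g), axis=1)[:, :2]
    mask = np.ones_like(g, dtype=bool)
    np.put_along_axis(mask, idx, False, axis=1)
    return mask.reshape(W.shape)

W = np.array([[0.1, -2.0, 0.3, 1.5],
              [4.0, 0.0, -0.2, 0.7]])
M = mask_2to4(W)
```

Every group of 4 ends up with exactly 2 surviving weights, which is the structural guarantee the Ampere sparse tensor cores exploit; the "flip rate" in the abstract tracks how often these masks change between training steps.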
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Training deep neural networks (DNNs) is costly. Fortunately, Nvidia Ampere and Hopper GPUs can accelerate matrix multiplications twice as fast as a dense equivalent by implementing 2:4 sparsity. However, previous STE-based 2:4 pre-training methods (e.g. STE with hard-thresholding, SR-STE) suffer from optimization difficulties because of discontinuous pruning function. In this study, we comprehensively analyse the bottleneck of traditional N:M sparse training and recognize three drawbacks with discontinuity: incorrect descending direction, inability to predict the amount of descent and sparse mask oscillation. In the light of this statement, we propose S-STE, a simple yet powerful 2:4 training method that contains two parts: to continuously project weights to be 2:4 sparse, and to rescale sparse weights with a per-tensor fixed scaling factor. Besides, we adopt minimum-variance unbiased estimation for activation gradient and FP8 quantization for whole process. Results show that our method surpass previous 2:4 pre-training recipes and is comparable even with full parameter models.
Updated: 2024-10-27 14:15:32
标题: S-STE:用于高效2:4稀疏预训练的连续修剪函数
摘要: 训练深度神经网络(DNN)成本高昂。幸运的是,Nvidia Ampere和Hopper GPU通过实现2:4稀疏性,可以将矩阵乘法加速到稠密等价运算的两倍。然而,先前基于STE的2:4预训练方法(例如带硬阈值的STE、SR-STE)由于修剪函数不连续而面临优化困难。在本研究中,我们全面分析了传统N:M稀疏训练的瓶颈,并指出不连续性带来的三个缺点:下降方向不正确、无法预测下降量以及稀疏掩码振荡。鉴于此,我们提出了S-STE,一种简单而强大的2:4训练方法,包含两部分:持续地将权重投影为2:4稀疏,并用每张量固定的缩放因子对稀疏权重进行重新缩放。此外,我们对激活梯度采用最小方差无偏估计,并对整个训练过程采用FP8量化。结果表明,我们的方法超越了先前的2:4预训练方案,甚至可与全参数模型相媲美。
更新时间: 2024-10-27 14:15:32
Categories: cs.LG
TrajAgent: An Agent Framework for Unified Trajectory Modelling
Trajectory modelling, which includes research on trajectory data pattern mining and future prediction, has widespread applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modelling. However, due to the heterogeneity of data and the diversity of trajectory tasks, achieving unified trajectory modelling remains an important yet challenging goal. In this paper, we propose TrajAgent, a large-language-model-based agentic framework that unifies various trajectory modelling tasks. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface, to support the execution and training of various models. Building on UniEnv, we introduce TAgent, an agentic workflow designed for automatic trajectory modelling across various trajectory tasks. Specifically, we design AutOpt, a systematic optimization module within TAgent, to further improve the performance of the integrated model. Given diverse trajectory tasks specified in natural language, TrajAgent automatically generates competitive results by training and executing appropriate models. Extensive experiments on four tasks using four real-world datasets demonstrate the effectiveness of TrajAgent in unified trajectory modelling, achieving an average performance improvement of 15.43% over baseline methods.
Updated: 2024-10-27 13:51:09
Categories: cs.CL,cs.AI,cs.LG
Vector Quantization Prompting for Continual Learning
Continual learning requires overcoming catastrophic forgetting when training a single model on a sequence of tasks. Recent top-performing approaches are prompt-based methods that utilize a set of learnable parameters (i.e., prompts) to encode task knowledge, from which appropriate ones are selected to guide the fixed pre-trained model in generating features tailored to a certain task. However, existing methods rely on predicting prompt identities for prompt selection, where the identity prediction process cannot be optimized with the task loss. This limitation leads to sub-optimal prompt selection and inadequate adaptation of pre-trained features for a specific task. Previous efforts have tried to address this by directly generating prompts from input queries instead of selecting from a set of candidates. However, these prompts are continuous and lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. In this way, VQ-Prompt can optimize the prompt selection process with the task loss while achieving effective abstraction of task knowledge for continual learning. Extensive experiments show that VQ-Prompt outperforms state-of-the-art continual learning methods across a variety of benchmarks under the challenging class-incremental setting. The code is available at https://github.com/jiaolifengmi/VQ-Prompt.
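The core vector-quantization step can be sketched as follows (NumPy; the codebook values are made up, and the straight-through gradient trick is only indicated in a comment since this forward-only sketch has no autograd):

```python
import numpy as np

def vq_select(query_prompt, codebook):
    """Nearest-neighbour lookup: map a continuous, query-generated prompt vector
    to its closest discrete codebook entry (the vector-quantization step).
    During training, the straight-through trick q = z + stop_grad(e - z)
    lets the task loss flow back through the selection end-to-end."""
    d = np.linalg.norm(codebook - query_prompt, axis=1)  # distance to each code
    idx = int(np.argmin(d))
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
idx, e = vq_select(np.array([0.9, 1.2]), codebook)
```

The discrete codebook gives prompts the abstraction that directly generated continuous prompts lack, while the straight-through estimator keeps selection trainable.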
Updated: 2024-10-27 13:43:53
Categories: cs.LG,cs.CV
TEAFormers: TEnsor-Augmented Transformers for Multi-Dimensional Time Series Forecasting
Multi-dimensional time series data, such as matrix and tensor-variate time series, are increasingly prevalent in fields such as economics, finance, and climate science. Traditional Transformer models, though adept with sequential data, do not effectively preserve these multi-dimensional structures, as their internal operations in effect flatten multi-dimensional observations into vectors, thereby losing critical multi-dimensional relationships and patterns. To address this, we introduce the Tensor-Augmented Transformer (TEAFormer), a novel method that incorporates tensor expansion and compression within the Transformer framework to maintain and leverage the inherent multi-dimensional structures, thus reducing computational costs and improving prediction accuracy. The core feature of the TEAFormer, the Tensor-Augmentation (TEA) module, utilizes tensor expansion to enhance multi-view feature learning and tensor compression for efficient information aggregation and reduced computational load. The TEA module is not just a specific model architecture but a versatile component that is highly compatible with the attention mechanism and the encoder-decoder structure of Transformers, making it adaptable to existing Transformer architectures. Our comprehensive experiments, which integrate the TEA module into three popular time series Transformer models across three real-world benchmarks, show significant performance enhancements, highlighting the potential of TEAFormers for cutting-edge time series forecasting.
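A hypothetical einsum sketch of the expand-then-compress idea (shapes and parameter names are illustrative stand-ins, not the paper's TEA module): the matrix-variate observation is lifted into K views without ever being flattened into a single vector, then contracted back for efficient aggregation.

```python
import numpy as np

# X is a p x q matrix-variate observation; U holds K learnable row-maps
# (tensor expansion into K views), V is a column-map (tensor compression).
rng = np.random.default_rng(0)
p, q, K, r, s = 4, 5, 3, 2, 2
X = rng.normal(size=(p, q))
U = rng.normal(size=(K, r, p))
V = rng.normal(size=(q, s))

expanded = np.einsum('krp,pq->krq', U, X)           # K views, each r x q
compressed = np.einsum('krq,qs->krs', expanded, V)  # aggregate to K x r x s
```

Because every contraction acts along a named mode, row and column structure is preserved end to end, which is the multi-dimensional relationship a flattening Transformer would destroy.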
Updated: 2024-10-27 13:32:12
Categories: cs.LG,cs.AI,stat.ML
Integrating uncertainty quantification into randomized smoothing based robustness guarantees
Deep neural networks have proven to be extremely powerful; however, they are also vulnerable to adversarial attacks, which can cause hazardous incorrect predictions in safety-critical applications. Certified robustness via randomized smoothing gives a probabilistic guarantee that the smoothed classifier's predictions will not change within an $\ell_2$-ball around a given input. On the other hand, (uncertainty-)score-based rejection is a technique often applied in practice to defend models against adversarial attacks. In this work, we fuse these two approaches by integrating a classifier that abstains from predicting when uncertainty is high into the certified robustness framework. This allows us to derive two novel robustness guarantees for uncertainty-aware classifiers, namely (i) the radius of an $\ell_2$-ball around the input in which the same label is predicted and uncertainty remains low, and (ii) the $\ell_2$-radius of a ball in which the predictions will either not change or be uncertain. While the former provides robustness guarantees against attacks aiming at increased uncertainty, the latter informs about the amount of input perturbation necessary to lead the uncertainty-aware model into a wrong prediction. Notably, on CIFAR10 this radius is up to 20.93% larger than for models that do not allow uncertainty-based rejection. We demonstrate that the novel framework allows for a systematic robustness evaluation of different network architectures and uncertainty measures, and for identifying desired properties of uncertainty quantification techniques. Moreover, we show that leveraging uncertainty in a smoothed classifier helps out-of-distribution detection.
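For context, the standard randomized-smoothing certificate that the paper builds on can be computed with the Python standard library alone; the uncertainty-aware variant below is only a schematic reading of guarantee (i), not the paper's exact derivation.

```python
from statistics import NormalDist

def certified_radius(p_a_lower, sigma):
    """Classic smoothing certificate: if the top class has smoothed probability
    at least p_a_lower > 1/2, the prediction is constant within this l2-radius."""
    if p_a_lower <= 0.5:
        return 0.0  # abstain: no certificate
    return sigma * NormalDist().inv_cdf(p_a_lower)

def uncertainty_aware_radius(p_a_lower, p_certain_lower, sigma):
    """Schematic guarantee (i): the same label is predicted AND uncertainty stays
    low, so the certified ball is the smaller of the two individual balls."""
    return min(certified_radius(p_a_lower, sigma),
               certified_radius(p_certain_lower, sigma))

r = certified_radius(0.9, sigma=0.5)
```

In practice `p_a_lower` comes from a Clopper-Pearson lower confidence bound over Monte Carlo samples of the noisy classifier.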
Updated: 2024-10-27 13:07:43
Categories: cs.LG,cs.CR,stat.ML
MedGo: A Chinese Medical Large Language Model
Large models are a hot research topic in the field of artificial intelligence. Leveraging their generative capabilities has the potential to enhance the level and quality of medical services. In response to the limitations of current large language models, which often struggle with accuracy and have narrow capabilities in medical applications, this paper presents MedGo, a Chinese medical large language model. MedGo was trained on a combination of high-quality unsupervised medical data, supervised data, and preference-alignment data, aimed at enhancing both its versatility and precision in medical tasks. The model was evaluated on the public CBLUE benchmark and a manually constructed dataset, ClinicalQA. The results demonstrate that MedGo achieved promising performance across various Chinese medical information processing tasks, taking first place in the CBLUE evaluation. Additionally, on our constructed dataset ClinicalQA, MedGo outperformed its base model Qwen2, highlighting its potential to improve both automated medical question answering and clinical decision support. These experimental results demonstrate that MedGo possesses strong information processing capabilities in the medical field. At present, we have successfully deployed MedGo at Shanghai East Hospital.
Updated: 2024-10-27 12:52:52
Categories: cs.CL,cs.AI
ArterialNet: Reconstructing Arterial Blood Pressure Waveform with Wearable Pulsatile Signals, a Cohort-Aware Approach
Continuous arterial blood pressure (ABP) monitoring is invasive but essential for hemodynamic monitoring. Recent techniques have reconstructed ABP non-invasively using pulsatile signals but produced inaccurate systolic and diastolic blood pressure (SBP and DBP) values and were sensitive to individual variability. ArterialNet integrates generalized pulsatile-to-ABP signal translation and personalized feature extraction using hybrid loss functions and regularization. We validated ArterialNet using the MIMIC-III dataset and achieved a root mean square error (RMSE) of 5.41 mmHg, with at least a 58% lower standard deviation. ArterialNet reconstructed ABP with an RMSE of 7.99 mmHg in remote health scenarios. ArterialNet achieved superior performance in ABP reconstruction and SBP and DBP estimations, with significantly reduced subject variance, demonstrating its potential in remote health settings. We also ablated the ArterialNet architecture to investigate the contributions of each component, and evaluated its translational impact and robustness by conducting a series of ablations on data quality and availability.
Updated: 2024-10-27 12:47:53
Categories: cs.LG
Deconfounding Time Series Forecasting
Time series forecasting is a critical task in various domains, where accurate predictions can drive informed decision-making. Traditional forecasting methods often rely on current observations of variables to predict future outcomes, typically overlooking the influence of latent confounders, unobserved variables that simultaneously affect both the predictors and the target outcomes. This oversight can introduce bias and degrade the performance of predictive models. In this study, we address this challenge by proposing an enhanced forecasting approach that incorporates representations of latent confounders derived from historical data. By integrating these confounders into the predictive process, our method aims to improve the accuracy and robustness of time series forecasts. The proposed approach is demonstrated through its application to climate science data, showing significant improvements over traditional methods that do not account for confounders.
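The proposed adjustment can be caricatured in a few lines (a deliberately crude sketch: a mean-pooling "encoder" and a linear head stand in for whatever representation learner the paper actually uses):

```python
import numpy as np

def forecast_with_confounder(history, current, w_x, w_c):
    """Predict from current covariates *and* a latent-confounder summary z
    extracted from historical data, instead of current covariates alone."""
    z = history.mean(axis=0)          # stand-in confounder representation
    return float(current @ w_x + z @ w_c)

history = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = forecast_with_confounder(history, np.array([1.0, 1.0]),
                                w_x=np.array([0.5, 0.5]),
                                w_c=np.array([1.0, 0.0]))
```

The point is structural: the term `z @ w_c` gives the model a channel through which the unobserved common cause can be accounted for, removing the bias it would otherwise induce in `w_x`.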
Updated: 2024-10-27 12:45:42
Categories: cs.LG,cs.AI
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data pipelines through a collaborative multi-agent system. AutoKaggle implements an iterative development process that combines code execution, debugging, and comprehensive unit testing to ensure code correctness and logic consistency. The framework offers highly customizable workflows, allowing users to intervene at each phase, thus integrating automated intelligence with human expertise. Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution, enhancing productivity by streamlining common tasks. We selected 8 Kaggle competitions to simulate data processing workflows in real-world application scenarios. Evaluation results demonstrate that AutoKaggle achieves a validation submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines, fully proving its effectiveness and practicality in handling complex data science tasks.
Updated: 2024-10-27 12:44:25
Categories: cs.AI,cs.CL
Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy
Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also raising issues of privacy leakage and data copyright. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, existing methods are not applicable to text-to-image diffusion models due to their high computation overhead and enhanced generalization capabilities. In this paper, we first identify a conditional overfitting phenomenon in text-to-image diffusion models, indicating that these models tend to overfit the conditional distribution of images given the corresponding text rather than the marginal distribution of images alone. Based on this observation, we derive an analytical indicator, namely Conditional Likelihood Discrepancy (CLiD), to perform membership inference, which reduces the stochasticity in estimating the memorization of individual samples. Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and dataset scales. Additionally, our method shows superior resistance to overfitting mitigation strategies such as early stopping and data augmentation.
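At its core such an indicator reduces to a difference of average denoising losses; a schematic version (hypothetical inputs: per-timestep diffusion losses for one image, computed with and without its paired prompt):

```python
import numpy as np

def clid_score(cond_losses, uncond_losses):
    """Conditional-likelihood-discrepancy sketch: compare the denoising loss of
    a sample under its paired text prompt with the loss under an empty prompt;
    training members tend to show a larger conditional advantage."""
    return float(np.mean(uncond_losses) - np.mean(cond_losses))

def infer_membership(score, threshold):
    return score > threshold  # larger discrepancy -> predict "member"
```

Averaging over timesteps (and noise draws) is what tames the stochasticity of estimating memorization from a single sample.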
Updated: 2024-10-27 12:43:56
Categories: cs.CR,cs.CV
CollaFuse: Collaborative Diffusion Models
In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, particularly concerning data availability, computational requirements, and privacy. Traditional approaches to address these shortcomings, like federated learning, often impose significant computational burdens on individual clients, especially those with constrained resources. In response to these challenges, we introduce a novel approach for distributed collaborative diffusion models inspired by split learning. Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis. This reduced computational burden is achieved by retaining data and computationally inexpensive processes locally at each client while outsourcing the computationally expensive processes to shared, more efficient server resources. Through experiments on the common CelebA dataset, our approach demonstrates enhanced privacy by reducing the necessity for sharing raw data. These capabilities hold significant potential across various application areas, including the design of edge computing solutions. Thus, our work advances distributed machine learning by contributing to the evolution of collaborative diffusion models.
Updated: 2024-10-27 12:42:53
Categories: cs.LG,cs.AI,cs.CV
NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking
Many current visual object tracking benchmarks, such as OTB100, NfS, UAV123, LaSOT, and GOT-10K, predominantly contain day-time scenarios, while the challenges posed by night-time have been less investigated. This is primarily because of the lack of a large-scale, well-annotated night-time benchmark for rigorously evaluating tracking algorithms. To this end, this paper presents NT-VOT211, a new benchmark tailored for evaluating visual object tracking algorithms in challenging night-time conditions. NT-VOT211 consists of 211 diverse videos, offering 211,000 well-annotated frames with 8 attributes, including camera motion, deformation, fast motion, motion blur, tiny target, distractors, occlusion, and out-of-view. To the best of our knowledge, it is the largest night-time tracking benchmark to date that is specifically designed to address unique challenges such as adverse visibility, image blur, and distractors inherent to night-time tracking scenarios. Through a comprehensive analysis of results obtained from 42 diverse tracking algorithms on NT-VOT211, we uncover the strengths and limitations of these algorithms, highlighting opportunities for enhancements in visual object tracking, particularly in environments with suboptimal lighting. Besides, a leaderboard revealing performance rankings, annotation tools, comprehensive meta-information, and all the code necessary for reproducibility of results are made publicly available. We believe that our NT-VOT211 benchmark will not only be instrumental in facilitating field deployment of VOT algorithms, but will also help drive VOT enhancements and unlock new real-world tracking applications. Our dataset and other assets can be found at: https://github.com/LiuYuML/NV-VOT211.
Updated: 2024-10-27 12:19:48
Categories: cs.CV,cs.AI
Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models
Large language models are probabilistic models, and the process of generating content is essentially sampling from the output distribution of the language model. Existing watermarking techniques inject watermarks into the generated content without altering the output quality. On the other hand, existing acceleration techniques, specifically speculative sampling, leverage a draft model to speed up the sampling process while preserving the output distribution. However, there is no known method to simultaneously accelerate the sampling process and inject watermarks into the generated content. In this paper, we investigate this direction and find that the integration of watermarking and acceleration is non-trivial. We prove a no-go theorem, which states that it is impossible to simultaneously maintain the highest watermark strength and the highest sampling efficiency. Furthermore, we propose two methods that maintain either the sampling efficiency or the watermark strength, but not both. Our work provides a rigorous theoretical foundation for understanding the inherent trade-off between watermark strength and sampling efficiency in accelerating the generation of watermarked tokens for large language models. We also conduct numerical experiments to validate our theoretical findings and demonstrate the effectiveness of the proposed methods.
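For reference, the vanilla speculative-sampling acceptance rule that the no-go theorem reasons about (without any watermarking) looks like this; the returned token is provably distributed exactly according to the target distribution p.

```python
import numpy as np

def speculative_step(p, q, rng):
    """One token of standard speculative sampling: the draft distribution q
    proposes a token x, the target p accepts it with prob min(1, p[x]/q[x]);
    on rejection, resample from the residual max(p - q, 0), renormalized."""
    x = rng.choice(len(q), p=q)
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)
```

Watermarking biases the sampling toward key-dependent tokens, so exactly preserving p (the acceptance rule above) and exactly preserving the watermark signal pull in opposite directions, which is the trade-off the paper formalizes.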
Updated: 2024-10-27 12:00:19
Categories: cs.CR,cs.AI
AIME: AI System Optimization via Multiple LLM Evaluators
Text-based AI system optimization typically involves a feedback-loop scheme where a single LLM generates an evaluation in natural language of the current output to improve the next iteration's output. However, in this work, we empirically demonstrate that for a practical and complex task (code generation) with multiple criteria to evaluate, utilizing only one LLM evaluator tends to let errors in the generated code go undetected, thus leading to incorrect evaluations and ultimately suboptimal test-case performance. Motivated by this failure case, we assume there exists an optimal evaluation policy that samples an evaluation between response and ground truth. We then theoretically prove that a linear combination of multiple evaluators can approximate this optimal policy. From this insight, we propose AI system optimization via Multiple LLM Evaluators (AIME). AIME is an evaluation protocol that utilizes multiple LLMs, each of which independently generates an evaluation on separate criteria; these are then combined via concatenation. We provide an extensive empirical study showing AIME outperforming baseline methods in code generation tasks, with up to $62\%$ higher error detection rate and up to $16\%$ higher success rate than a single-LLM evaluation protocol on the LeetCodeHard and HumanEval datasets. We also show that the selection of the number of evaluators and which criteria to utilize is non-trivial, as it can impact the success rate by up to $12\%$.
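The combination step can be sketched as follows (the dict format and scores are hypothetical; the paper concatenates the per-criterion textual evaluations, and its theory motivates a linear combination of evaluator signals):

```python
def aggregate_evaluations(evaluations, weights=None):
    """AIME-style aggregation sketch: each evaluator judges one criterion
    independently; textual feedback is concatenated and numeric scores are
    combined linearly with optional weights."""
    texts = [e["feedback"] for e in evaluations]
    scores = [e["score"] for e in evaluations]
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    combined_score = sum(w * s for w, s in zip(weights, scores))
    return "\n".join(texts), combined_score

evals = [{"feedback": "correctness: off-by-one in loop", "score": 0.4},
         {"feedback": "readability: clear naming", "score": 0.9}]
report, score = aggregate_evaluations(evals)
```

Keeping evaluators per-criterion means a correctness error cannot be averaged away inside one judge's holistic impression; it surfaces as its own low score.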
Updated: 2024-10-27 11:48:10
Categories: cs.AI,cs.CL,cs.LG
Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0
The concept of digital twin (DT), which enables the creation of a programmable, digital representation of physical systems, is expected to revolutionize future industries and will lie at the heart of the vision of a future smart society, namely, Society 5.0, in which high integration between cyber (digital) and physical spaces is exploited to bring economic and societal advancements. However, the success of such a DT-driven Society 5.0 requires a synergistic convergence of artificial intelligence and networking technologies into an integrated, programmable system that can coordinate DT networks to effectively deliver diverse Society 5.0 services. Prior works remain restricted to either qualitative study, simple analysis or software implementations of a single DT, and thus, they cannot provide the highly synergistic integration of digital and physical spaces as required by Society 5.0. In contrast, this paper envisions a novel concept of an Internet of Federated Digital Twins (IoFDT) that holistically integrates heterogeneous and physically separated DTs representing different Society 5.0 services within a single framework and system. For this concept of IoFDT, we first introduce a hierarchical architecture that integrates federated DTs through horizontal and vertical interactions, bridging cyber and physical spaces to unlock new possibilities. Then, we discuss challenges of realizing IoFDT, highlighting the intricacies across communication, computing, and AI-native networks while also underscoring potential innovative solutions. Subsequently, we elaborate on the importance of the implementation of a unified IoFDT platform that integrates all technical components and orchestrates their interactions, emphasizing the necessity of practical experimental platforms with a focus on real-world applications in areas like smart mobility.
Updated: 2024-10-27 11:35:59
Categories: cs.AI
Development and Evaluation of a Retrieval-Augmented Generation Tool for Creating SAPPhIRE Models of Artificial Systems
Representing systems using the SAPPhIRE causality model is found useful in supporting design-by-analogy. However, creating a SAPPhIRE model of artificial or biological systems is an effort-intensive process that requires human experts to source technical knowledge from multiple technical documents regarding how the system works. This research investigates how to leverage Large Language Models (LLMs) in creating structured descriptions of systems using the SAPPhIRE model of causality. This paper, the second part of the two-part research, presents a new Retrieval-Augmented Generation (RAG) tool for generating information related to SAPPhIRE constructs of artificial systems and reports the results from a preliminary evaluation of the tool's success - focusing on the factual accuracy and reliability of outcomes.
Updated: 2024-10-27 11:28:07
Categories: cs.CL,cs.AI
A Multivocal Literature Review on Privacy and Fairness in Federated Learning
Federated Learning presents a way to revolutionize AI applications by eliminating the necessity for data sharing. Yet, research has shown that information can still be extracted during training, making additional privacy-preserving measures such as differential privacy imperative. To implement real-world federated learning applications, fairness, ranging from a fair distribution of performance to non-discriminative behaviour, must be considered. Particularly in high-risk applications (e.g. healthcare), avoiding the repetition of past discriminatory errors is paramount. As recent research has demonstrated an inherent tension between privacy and fairness, we conduct a multivocal literature review to examine the current methods to integrate privacy and fairness in federated learning. Our analyses illustrate that the relationship between privacy and fairness has been neglected, posing a critical risk for real-world applications. We highlight the need to explore the relationship between privacy, fairness, and performance, advocating for the creation of integrated federated learning frameworks.
Updated: 2024-10-27 11:08:39
Categories: cs.LG,cs.AI
LOBG:Less Overfitting for Better Generalization in Vision-Language Model
Existing prompt learning methods in Vision-Language Models (VLMs) have effectively enhanced the transfer capability of VLMs to downstream tasks, but they suffer from a significant decline in generalization due to severe overfitting. To address this issue, we propose a framework named LOBG for vision-language models. Specifically, we use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts. To further mitigate overfitting, we developed a structural topology preservation (STP) loss at the feature level, which endows the feature space with overall plasticity, allowing effective reshaping of the feature space during optimization. Additionally, we employed hierarchical logit distillation (HLD) at the output level to constrain outputs, complementing STP at the output end. Extensive experimental results demonstrate that our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.
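The topology-preservation idea can be sketched in NumPy (an illustrative reading, not the paper's exact loss: cosine-similarity matrices stand in for whatever structural statistic is actually matched, and the anchor features are assumed to come from, e.g., frozen CLIP):

```python
import numpy as np

def stp_loss(feats_adapted, feats_anchor):
    """Structural-topology-preservation sketch: penalize changes in the
    *pairwise* similarity structure of the feature space rather than pinning
    each feature to a fixed point, leaving the space free to reshape globally."""
    def sim(f):
        f = f / np.linalg.norm(f, axis=1, keepdims=True)
        return f @ f.T
    return float(np.mean((sim(feats_adapted) - sim(feats_anchor)) ** 2))

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
zero = stp_loss(2.0 * feats, feats)  # uniform rescaling preserves topology
```

Because only relative geometry is constrained, the adapted space retains the plasticity that a pointwise distillation loss would remove.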
Updated: 2024-10-27 10:40:39
Subjects: cs.CV,cs.AI
Causal Modeling in Multi-Context Systems: Distinguishing Multiple Context-Specific Causal Graphs which Account for Observational Support
Causal structure learning with data from multiple contexts carries both opportunities and challenges. Opportunities arise from considering shared and context-specific causal graphs, which enable generalizing and transferring causal knowledge across contexts. However, a challenge that is currently understudied in the literature is the impact of differing observational support between contexts on the identifiability of causal graphs. Here we study in detail the causal graph objects recently introduced in [6], which capture both causal mechanisms and data support, allowing for the analysis of a larger class of context-specific changes and characterizing distribution shifts more precisely. We thereby extend results on the identifiability of context-specific causal structures and propose a framework to model context-specific independence (CSI) within structural causal models (SCMs) in a refined way that allows exploring scenarios where these graph objects differ. We demonstrate how this framework can help explain phenomena like anomalies or extreme events, where causal mechanisms change or appear to change under different conditions. Our results contribute to the theoretical foundations for understanding causal relations in multi-context systems, with implications for generalization, transfer learning, and anomaly detection. Future work may extend this approach to more complex data types, such as time series.
Updated: 2024-10-27 10:34:58
Subjects: cs.LG,math.ST,stat.TH
Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys
In the field of materials science, exploring the relationship between composition, microstructure, and properties has long been a critical research focus. The mechanical performance of solid-solution Mg-Gd alloys is significantly influenced by Gd content, dendritic structures, and the presence of secondary phases. To better analyze and predict the impact of these factors, this study proposes a multimodal fusion learning framework based on image processing and deep learning techniques. This framework integrates both elemental composition and microstructural features to accurately predict the Vickers hardness of solid-solution Mg-Gd alloys. Initially, deep learning methods were employed to extract microstructural information from a variety of solid-solution Mg-Gd alloy images obtained from literature and experiments. This provided precise grain size and secondary phase microstructural features for performance prediction tasks. Subsequently, these quantitative analysis results were combined with Gd content information to construct a performance prediction dataset. Finally, a regression model based on the Transformer architecture was used to predict the Vickers hardness of Mg-Gd alloys. The experimental results indicate that the Transformer model performs best in terms of prediction accuracy, achieving an R^2 value of 0.9. Additionally, SHAP analysis identified critical values for four key features affecting the Vickers hardness of Mg-Gd alloys, providing valuable guidance for alloy design. These findings not only enhance the understanding of alloy performance but also offer theoretical support for future material design and optimization.
Updated: 2024-10-27 10:28:29
Subjects: cs.LG,cond-mat.mtrl-sci,cs.CV
Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss
Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space. Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels. However, learning deep models can be computationally expensive in large output spaces, resulting in a trade-off between high-performing brute-force approaches and efficient solutions. In this paper, we propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. We frame XMC as a data-to-prototype prediction task where label prototypes aggregate information from related queries. More precisely, we use a shallow transformer encoder that we coin the Label Prototype Network, which enriches label representations by aggregating text-based embeddings, label centroids, and learnable free vectors. We jointly train a deep encoder and the Label Prototype Network using an adaptive triplet loss objective that better adapts to the high granularity and ambiguity of extreme label spaces. PRIME achieves state-of-the-art results on several public benchmarks of different sizes and domains, while keeping the model efficient.
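The abstract does not spell out the adaptive triplet objective. As a rough illustration of a margin that adapts to label ambiguity, the toy loss below widens the margin when the positive and negative prototypes are themselves similar; the adaptation rule and all parameter names are assumptions for illustration, not PRIME's actual formulation.

```python
import numpy as np

def dynamic_margin_triplet_loss(query, pos_proto, neg_proto,
                                base_margin=0.3, scale=0.5):
    """Triplet loss whose margin grows with the similarity between the
    positive and negative prototypes (a hypothetical adaptive rule:
    more-confusable labels get a larger margin)."""
    # Cosine similarity between two embedding vectors.
    sim = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    s_pos, s_neg = sim(query, pos_proto), sim(query, neg_proto)
    # Margin adapts to how similar the two prototypes are to each other.
    margin = base_margin + scale * max(0.0, sim(pos_proto, neg_proto))
    return max(0.0, s_neg - s_pos + margin)
```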
Updated: 2024-10-27 10:24:23
Subjects: cs.LG,cs.IR
ThunderKittens: Simple, Fast, and Adorable AI Kernels
The challenge of mapping AI architectures to GPU hardware is creating a critical bottleneck in AI progress. Despite substantial efforts, hand-written custom kernels fail to meet their theoretical performance thresholds, even on well-established operations like linear attention. The diverse hardware capabilities of GPUs might suggest that we need a wide variety of techniques to achieve high performance. However, our work explores whether a small number of key abstractions can drastically simplify the process. We present ThunderKittens (TK), a framework for writing performant AI kernels while remaining easy to use and maintain. Our abstractions map to the three levels of the GPU hierarchy: (1) at the warp level, we provide 16x16 matrix tiles as basic data structures and PyTorch-like parallel compute operations over tiles, (2) at the thread-block level, we provide a template for overlapping asynchronous operations across parallel warps, and (3) at the grid level, we provide support to help hide block launch, tear-down, and memory costs. We show the value of TK by providing kernels that match or outperform prior kernels for a range of AI operations. We match CuBLAS and FlashAttention-3 on GEMM and attention inference performance and outperform the strongest baselines by $10-40\%$ on attention backwards, $8\times$ on state space models, and $14\times$ on linear attention.
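The 16x16 tile abstraction can be illustrated outside CUDA. The NumPy loop below decomposes a GEMM into 16x16 tile products, mirroring the data decomposition TK exposes at the warp level; the real framework executes these tile operations on tensor cores, so this sketch only shows the layout, not the performance story.

```python
import numpy as np

TILE = 16  # ThunderKittens' basic 16x16 tile granularity

def tiled_matmul(A, B):
    """Tile-level GEMM: accumulate 16x16 tile products into each output
    tile. Dimensions are assumed to be multiples of the tile size."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % TILE == N % TILE == K % TILE == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = np.zeros((TILE, TILE), dtype=A.dtype)
            for k in range(0, K, TILE):
                acc += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
            C[i:i+TILE, j:j+TILE] = acc
    return C
```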
Updated: 2024-10-27 10:07:16
Subjects: cs.LG,cs.AI
Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials
Machine learning interatomic potentials (MLIPs) have seen significant advances as efficient replacements for expensive quantum chemical calculations. Uncertainty estimations for MLIPs are crucial to quantify the additional model error they introduce and to leverage this information in active learning strategies. MLIPs that are based on Gaussian process regression provide a standard deviation as a possible uncertainty measure. An alternative approach is ensemble-based uncertainties. Although these uncertainty measures have been applied to active learning, it has rarely been studied how they correlate with the error, and it is not always clear whether active learning actually outperforms random sampling strategies. We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules. We evaluate how the GPR variance and ensemble-based uncertainties relate to the error and whether model performance improves by selecting the most uncertain samples from a fixed configuration space. For the ensemble-based uncertainty estimations, we find that they often do not provide any information about the error. For the GPR standard deviation, we find that predictions with an increasing standard deviation often also have an increasing systematic bias, which is not captured by the uncertainty. In these cases, selecting training samples with the highest uncertainty leads to a model with a worse test error compared to random sampling. We conclude that confidence intervals, which are derived from the predictive standard deviation, can be highly overconfident. Selecting samples with high GPR standard deviation leads to a model that overemphasizes the borders of the configuration space represented in the fixed dataset. This may result in worse performance in more densely sampled areas but better generalization for extrapolation tasks.
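For reference, the GPR predictive standard deviation examined here comes directly from the exact posterior. A minimal NumPy implementation with an RBF kernel follows; the length scale and noise level are placeholder values, not the representations or hyperparameters used in the study.

```python
import numpy as np

def gpr_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-6):
    """Exact GP regression with an RBF kernel: returns the posterior mean
    and standard deviation, the uncertainty measure discussed above."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_train, X_test)
    Kss = rbf(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Near the training data the standard deviation collapses toward the noise floor; far away it returns to the prior scale, which is exactly the behaviour whose correlation with the true error the paper questions.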
Updated: 2024-10-27 10:06:09
Subjects: cs.LG,physics.chem-ph,physics.comp-ph,q-bio.BM
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively.
Updated: 2024-10-27 09:55:40
Subjects: cs.CL,cs.LG
Hierarchical Multiple Kernel K-Means Algorithm Based on Sparse Connectivity
Multiple kernel learning (MKL) aims to find an optimal, consistent kernel function. In the hierarchical multiple kernel clustering (HMKC) algorithm, sample features are extracted layer by layer from a high-dimensional space to maximize the retention of effective information. However, information interaction between layers is often ignored. In this model, only corresponding nodes in adjacent layers exchange information; other nodes remain isolated, and if full connectivity is adopted, the diversity of the final consistency matrix is reduced. Therefore, this paper proposes a hierarchical multiple kernel K-Means (SCHMKKM) algorithm based on sparse connectivity, which controls the assignment matrix to achieve sparse connections through a sparsity rate, thereby locally fusing the features obtained by distilling information between layers. Finally, we conduct cluster analysis on multiple datasets and compare it with the fully connected hierarchical multiple kernel K-Means (FCHMKKM) algorithm in experiments. It is shown that more discriminative information fusion is beneficial for learning a better consistent partition matrix, and the fusion strategy based on sparse connection outperforms the full connection strategy.
Updated: 2024-10-27 09:35:09
Subjects: cs.LG
Towards Efficient and Scalable Training of Differentially Private Deep Learning
Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The most common DP-SGD privacy accountants rely on Poisson subsampling for ensuring the theoretical DP guarantees. Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, which leads to many implementations ignoring this requirement. We conduct a comprehensive empirical study to quantify the computational cost of training deep learning models under DP given the requirement of Poisson subsampling, by re-implementing efficient methods using Poisson subsampling and benchmarking them. We find that using the naive implementation DP-SGD with Opacus in PyTorch has between 2.6 and 8 times lower throughput of processed training examples per second than SGD. However, efficient gradient clipping implementations with e.g. Ghost Clipping can roughly halve this cost. We propose alternative computationally efficient ways of implementing DP-SGD with JAX that are using Poisson subsampling and achieve only around 1.2 times lower throughput than SGD based on PyTorch. We highlight important implementation considerations with JAX. Finally, we study the scaling behaviour using up to 80 GPUs and find that DP-SGD scales better than SGD. We share our re-implementations using Poisson subsampling at https://github.com/DPBayes/Towards-Efficient-Scalable-Training-DP-DL.
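The Poisson-subsampled DP-SGD step that the benchmarks implement can be sketched in a few lines. This toy NumPy version shows the three ingredients the privacy accountants assume (independent per-example inclusion, per-example gradient clipping, Gaussian noise); it is a conceptual sketch, not the optimized Opacus or JAX code the paper benchmarks, and the parameter names are illustrative.

```python
import numpy as np

def dp_sgd_step(params, grad_fn, data, lr=0.1, clip=1.0,
                noise_mult=1.0, sample_rate=0.01, rng=None):
    """One DP-SGD step with Poisson subsampling: each example is included
    independently with probability `sample_rate`, per-example gradients are
    clipped to norm `clip`, and Gaussian noise with scale
    `noise_mult * clip` is added to the clipped sum."""
    rng = rng or np.random.default_rng()
    mask = rng.random(len(data)) < sample_rate          # Poisson subsample
    batch = [x for x, m in zip(data, mask) if m]
    total = np.zeros_like(params)
    for x in batch:
        g = grad_fn(params, x)
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # per-example clip
        total += g
    total += rng.normal(0.0, noise_mult * clip, size=params.shape)
    # Normalise by the *expected* batch size, as in standard DP-SGD.
    return params - lr * total / max(1, int(sample_rate * len(data)))
```

Implementing this efficiently is the hard part: per-example clipping forbids the usual batched gradient, which is exactly the overhead the paper quantifies.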
Updated: 2024-10-27 09:32:52
Subjects: cs.LG,cs.CR,cs.DC
Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns
We propose Lodge++, a choreography framework to generate high-quality, ultra-long, and vivid dances given the music and desired genre. To handle the challenges in computational efficiency, the learning of complex and vivid global choreography patterns, and the physical quality of local dance movements, Lodge++ adopts a two-stage strategy to produce dances from coarse to fine. In the first stage, a global choreography network is designed to generate coarse-grained dance primitives that capture complex global choreography patterns. In the second stage, guided by these dance primitives, a primitive-based dance diffusion model is proposed to further generate high-quality, long-sequence dances in parallel, faithfully adhering to the complex choreography patterns. Additionally, to improve the physical plausibility, Lodge++ employs a penetration guidance module to resolve character self-penetration, a foot refinement module to optimize foot-ground contact, and a multi-genre discriminator to maintain genre consistency throughout the dance. Lodge++ is validated by extensive experiments, which show that our method can rapidly generate ultra-long dances suitable for various dance genres, ensuring well-organized global choreography patterns and high-quality local motion.
Updated: 2024-10-27 09:32:35
Subjects: cs.CV,cs.AI,cs.GR
Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking
High-dimensional data is commonly encountered in numerous data analysis tasks. Feature selection techniques aim to identify the most representative features from the original high-dimensional data. Due to the absence of class label information, it is significantly more challenging to select appropriate features in unsupervised learning scenarios than in supervised ones. Traditional unsupervised feature selection methods typically score the features of samples based on certain criteria, treating all samples indiscriminately. However, these approaches fail to fully capture the internal structure of the data. The importance of different samples should vary, and the weights of samples and features stand in a dual relationship, influencing each other. Therefore, an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR) is proposed in this paper. Different similarity matrices are constructed to depict the manifold structures among samples, between samples and features, and among features themselves. Then, manifold re-ranking is performed by combining the initial scores of samples and features. By comparing DMRR with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms, experimental results confirm that the importance information of different samples and the dual relationship between samples and features are beneficial for achieving better feature selection.
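Manifold re-ranking is commonly computed in the closed form f = (I - alpha*S)^{-1} y over a symmetrically normalised similarity matrix; a minimal single-graph sketch follows (DMRR's dual sample-feature construction builds several such graphs and couples their scores, which this sketch does not attempt).

```python
import numpy as np

def manifold_rerank(W, init_scores, alpha=0.5):
    """Closed-form manifold ranking: scores diffuse along the similarity
    graph W, so items near high-scoring items are re-ranked upward."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetric normalisation
    n = len(W)
    return np.linalg.solve(np.eye(n) - alpha * S, init_scores)
```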
Updated: 2024-10-27 09:29:17
Subjects: cs.LG
Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play
Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect information games. Recently, many improvements have been focused on enhancing the convergence speed of the CFR algorithm. However, most of these variants are not applicable under Monte Carlo (MC) conditions, making them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP achieved convergence speeds approximately 20\%$\sim$50\% faster than the most advanced MCCFR variants in games like poker and other test games.
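The best-response component borrowed from fictitious play is easiest to see in a normal-form game. The sketch below runs vanilla fictitious play on a 2x2 zero-sum matrix game; MCCFVFP generalizes this idea to extensive-form games by combining it with counterfactual value calculations, which this sketch does not include.

```python
import numpy as np

def fictitious_play(A, iters=2000):
    """Vanilla fictitious play in a two-player zero-sum matrix game: each
    player best-responds to the opponent's empirical mixed strategy.
    Returns the empirical strategies, which converge to equilibrium."""
    m, n = A.shape
    row_counts, col_counts = np.zeros(m), np.zeros(n)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iters):
        # Row player maximizes payoff against the column empirical strategy.
        row_counts[np.argmax(A @ (col_counts / col_counts.sum()))] += 1
        # Column player minimizes payoff against the row empirical strategy.
        col_counts[np.argmin((row_counts / row_counts.sum()) @ A)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()
```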
Updated: 2024-10-27 09:16:16
Subjects: cs.AI,cs.GT,cs.LG
Addressing the Pitfalls of Image-Based Structural Health Monitoring: A Focus on False Positives, False Negatives, and Base Rate Bias
This study explores the limitations of image-based structural health monitoring (SHM) techniques in detecting structural damage. Leveraging machine learning and computer vision, image-based SHM offers a scalable and efficient alternative to manual inspections. However, its reliability is impacted by challenges such as false positives, false negatives, and environmental variability, particularly in low base rate damage scenarios. The Base Rate Bias plays a significant role, as low probabilities of actual damage often lead to misinterpretation of positive results. This study uses both Bayesian analysis and a frequentist approach to evaluate the precision of damage detection systems, revealing that even highly accurate models can yield misleading results when the occurrence of damage is rare. Strategies for mitigating these limitations are discussed, including hybrid systems that combine multiple data sources, human-in-the-loop approaches for critical assessments, and improving the quality of training data. These findings provide essential insights into the practical applicability of image-based SHM techniques, highlighting both their potential and their limitations for real-world infrastructure monitoring.
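The Base Rate Bias the study analyzes is a one-line Bayes computation: with 95% sensitivity and specificity but damage present in only 1% of inspected images (illustrative numbers, not the paper's), fewer than one in five positive detections is a true positive.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Bayes' theorem: probability that a flagged image is truly damaged.
    With rare damage, even an accurate detector yields mostly false alarms."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)
```

At a 50% base rate the same detector would have 95% precision, which is why laboratory accuracy figures overstate field reliability.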
Updated: 2024-10-27 09:15:05
Subjects: cs.CV,cs.AI
Multiple kernel concept factorization algorithm based on global fusion
The Non-negative Matrix Factorization (NMF) algorithm can only find a low-rank approximation of the original non-negative data, while the Concept Factorization (CF) algorithm extends matrix factorization to a single non-linear kernel space, improving the learning ability and adaptability of matrix factorization. In an unsupervised environment, to design or select a proper kernel function for a specific dataset, a new algorithm called Globalized Multiple Kernel CF (GMKCF) is proposed. Multiple candidate kernel functions are input at the same time and learned in the CF framework based on global linear fusion, obtaining a clustering result with high quality and stability and solving the kernel function selection problem that CF faces. The convergence of the proposed algorithm is verified by solving the model with alternating iteration. Experimental results on several real databases show that the proposed algorithm outperforms comparison algorithms in data clustering, such as Kernel K-Means (KKM), Spectral Clustering (SC), Kernel CF (KCF), Co-regularized multi-view spectral clustering (Coreg), and Robust Multiple KKM (RMKKM).
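The global linear fusion at the core of this approach combines candidate kernel matrices as a convex combination K = sum_i w_i * K_i. A minimal sketch with fixed weights follows; GMKCF learns the weights jointly with the factorization, which is omitted here.

```python
import numpy as np

def fuse_kernels(kernels, weights=None):
    """Global linear fusion of candidate kernel matrices: K = sum_i w_i K_i
    with non-negative weights summing to one. Uniform weights by default."""
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.maximum(np.asarray(weights, dtype=float), 0.0)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))
```

Because each K_i is positive semi-definite and the weights are non-negative, the fused matrix is again a valid kernel.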
Updated: 2024-10-27 09:13:57
Subjects: cs.LG,cs.AI
WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction
3D semantic occupancy prediction is an essential part of autonomous driving, focusing on capturing the geometric details of scenes. Off-road environments are rich in geometric information and therefore well suited to 3D semantic occupancy prediction tasks that reconstruct such scenes. However, most research concentrates on on-road environments, and few methods are designed for off-road 3D semantic occupancy prediction due to the lack of relevant datasets and benchmarks. In response to this gap, we introduce WildOcc, to our knowledge the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction tasks. A ground truth generation pipeline is proposed in this paper, which employs a coarse-to-fine reconstruction to achieve a more realistic result. Moreover, we introduce a multi-modal 3D semantic occupancy prediction framework, which fuses spatio-temporal information from multi-frame images and point clouds at the voxel level. In addition, a cross-modality distillation function is introduced, which transfers geometric knowledge from point clouds to image features.
Updated: 2024-10-27 09:11:07
Subjects: cs.CV,cs.AI,cs.RO
Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouses. However, most literature addresses only one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from the perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouses as cooperative multi-agent deep RL, and the first to simultaneously address TAPF based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents. In this study, the physical dynamics of the agents are considered. Experimental results show that our method performs well in various task settings, which means that the target assignment is solved reasonably well and the planned path is almost the shortest. Moreover, our method is more time-efficient than the baselines.
Updated: 2024-10-27 09:09:58
Subjects: cs.AI,cs.MA
FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion
One-shot Federated Learning (OFL) significantly reduces communication costs in FL by aggregating trained models only once. However, the performance of advanced OFL methods lags far behind that of standard FL. In this work, we provide a causal view to find that this performance drop of OFL methods comes from the isolation problem: locally trained, isolated models in OFL may easily fit spurious correlations due to data heterogeneity. From the causal perspective, we observe that the spurious fitting can be alleviated by augmenting intermediate features from other clients. Built upon our observation, we propose a novel learning approach that endows OFL with superb performance and low communication and storage costs, termed FuseFL. Specifically, FuseFL decomposes neural networks into several blocks, and progressively trains and fuses each block in a bottom-up manner for feature augmentation, introducing no additional communication costs. Comprehensive experiments demonstrate that FuseFL outperforms existing OFL and ensemble FL by a significant margin, and that FuseFL supports high scalability of clients, heterogeneous model training, and low memory costs. Our work is the first attempt to use causality to analyze and alleviate the data heterogeneity of OFL.
Updated: 2024-10-27 09:07:10
Subjects: cs.LG,cs.AI,cs.DC,cs.NI
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
Existing Large Multimodal Models (LMMs) struggle with mathematical geometric reasoning due to a lack of high-quality image-text paired data. Current geometric data generation approaches, which apply preset templates to generate geometric data or use Large Language Models (LLMs) to rephrase questions and answers (Q&A), unavoidably limit data accuracy and diversity. To synthesize higher-quality data, we propose a two-stage Reverse Chain-of-Thought (R-CoT) geometry problem generation pipeline. First, we introduce GeoChain to produce high-fidelity geometric images and corresponding descriptions highlighting relations among geometric elements. We then design a Reverse A&Q method that reasons step-by-step based on the descriptions and generates questions in reverse from the reasoning results. Experiments demonstrate that the proposed method brings significant and consistent improvements on multiple LMM baselines, achieving new performance records in the 2B, 7B, and 8B settings. Notably, R-CoT-8B significantly outperforms previous state-of-the-art open-source mathematical models by 16.6% on MathVista and 9.2% on GeoQA, while also surpassing the closed-source model GPT-4o by an average of 13% across both datasets. The code is available at https://github.com/dle666/R-CoT.
Updated: 2024-10-27 09:02:01
Categories: cs.AI,cs.CV
SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation
Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks. Existing CLIP-based approaches perform OOD detection by devising novel scoring functions or sophisticated fine-tuning methods. In this work, we propose SeTAR, a novel, training-free OOD detection method that leverages selective low-rank approximation of weight matrices in vision-language and vision-only models. SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm. Based on SeTAR, we further propose SeTAR+FT, a fine-tuning extension that optimizes model performance for OOD detection tasks. Extensive evaluations on ImageNet1K and Pascal-VOC benchmarks show SeTAR's superior performance, reducing the relative false positive rate by up to 18.95% and 36.80% compared to zero-shot and fine-tuning baselines, respectively. Ablation studies further validate SeTAR's effectiveness, robustness, and generalizability across different model backbones. Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
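The basic operation here, truncating a weight matrix to its best low-rank approximation via SVD, can be sketched as below; the selective, greedy part of SeTAR (choosing which matrices and which ranks) is omitted, and `low_rank_approx` is an illustrative helper, not the paper's API:

```python
import numpy as np

def low_rank_approx(W, rank):
    """Best rank-`rank` approximation of W in Frobenius norm (SVD truncation)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
W2 = low_rank_approx(W, 2)
assert np.linalg.matrix_rank(W2) == 2
# Eckart-Young: the truncation error equals the norm of the discarded
# singular values.
s = np.linalg.svd(W, compute_uv=False)
assert np.isclose(np.linalg.norm(W - W2), np.sqrt(np.sum(s[2:] ** 2)))
```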
Updated: 2024-10-27 08:44:46
Categories: cs.CL,cs.AI,cs.CV
Active Preference Learning for Ordering Items In- and Out-of-sample
Learning an ordering of items based on pairwise comparisons is useful when items are difficult to rate consistently on an absolute scale, for example, when annotators have to make subjective assessments. When exhaustive comparison is infeasible, actively sampling item pairs can reduce the number of annotations necessary for learning an accurate ordering. However, many algorithms ignore shared structure between items, limiting their sample efficiency and precluding generalization to new items. It is also common to disregard how noise in comparisons varies between item pairs, despite it being informative of item similarity. In this work, we study active preference learning for ordering items with contextual attributes, both in- and out-of-sample. We give an upper bound on the expected ordering error of a logistic preference model as a function of which items have been compared. Next, we propose an active learning strategy that samples items to minimize this bound by accounting for aleatoric and epistemic uncertainty in comparisons. We evaluate the resulting algorithm, and a variant aimed at reducing model misspecification, in multiple realistic ordering tasks with comparisons made by human annotators. Our results demonstrate superior sample efficiency and generalization compared to non-contextual ranking approaches and active preference learning baselines.
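A logistic preference model over contextual attributes, of the kind this abstract analyzes, scores each item linearly in its features and maps score differences through a sigmoid; a minimal sketch (the weights and features below are made up for illustration):

```python
import numpy as np

def pref_prob(w, x_i, x_j):
    """P(item i preferred over item j) under a logistic (Bradley-Terry
    style) preference model with contextual features x."""
    return 1.0 / (1.0 + np.exp(-(w @ (x_i - x_j))))

w = np.array([1.0, -0.5])      # hypothetical learned weights
x_a = np.array([2.0, 1.0])     # contextual attributes of item a
x_b = np.array([1.0, 1.0])     # contextual attributes of item b
p = pref_prob(w, x_a, x_b)     # sigmoid(1.0), roughly 0.731
assert 0.72 < p < 0.74
```

Because the score is a function of shared item attributes, a fitted `w` also orders items never compared during training, which is the out-of-sample generalization the abstract refers to.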
Updated: 2024-10-27 08:36:13
Categories: cs.LG,stat.ML
Decoupled Kullback-Leibler Divergence Loss
In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of the DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property. This modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Secondly, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness by conducting experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public leaderboard RobustBench and competitive performance on knowledge distillation, demonstrating its substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
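As background for the decomposition above, the classical identity KL(p||q) = H(p, q) - H(p) is what makes the KL loss behave like a cross-entropy loss with soft labels; the check below verifies this identity numerically (it is the standard textbook fact, not the paper's specific wMSE + CE decomposition):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

def cross_entropy(p, q):
    return float(-np.sum(p * np.log(q)))

def entropy(p):
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
p = rng.random(5); p /= p.sum()   # soft "teacher" labels
q = rng.random(5); q /= q.sum()   # "student" predictions

# KL(p||q) = H(p, q) - H(p); since H(p) is constant in q, minimizing KL
# w.r.t. q is equivalent to minimizing cross-entropy with soft labels.
assert abs(kl_divergence(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-9
```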
Updated: 2024-10-27 08:32:11
Categories: cs.CV,cs.LG
Semi-supervised Symmetric Non-negative Matrix Factorization with Low-Rank Tensor Representation
Semi-supervised symmetric non-negative matrix factorization (SNMF) utilizes the available supervisory information (usually in the form of pairwise constraints) to improve the clustering ability of SNMF. Previous methods introduce the pairwise constraints from a local perspective: they either directly refine the similarity matrix element-wise or constrain the pairwise distances of the decomposed vectors according to the pairwise constraints. This overlooks the global perspective: in the ideal case, the pairwise constraint matrix and the ideal similarity matrix possess the same low-rank structure. To this end, we first propose a novel semi-supervised SNMF model that seeks a low-rank representation for the tensor synthesized from the pairwise constraint matrix and a similarity matrix obtained as the product of the embedding matrix and its transpose, which strengthens those two matrices simultaneously from a global perspective. We then propose an enhanced SNMF model, making the embedding matrix tailored to the above tensor low-rank representation. We finally refine the similarity matrix using the strengthened pairwise constraints. We repeat the above steps to continuously boost the similarity matrix and pairwise constraint matrix, leading to a high-quality embedding matrix. Extensive experiments substantiate the superiority of our method. The code is available at https://github.com/JinaLeejnl/TSNMF.
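For context, plain (unsupervised) symmetric NMF factorizes a similarity matrix as A ≈ HHᵀ with H ≥ 0; a common damped multiplicative update is sketched below. This is the standard baseline operation, not the paper's semi-supervised tensor model, and the β = 0.5 damping is an assumption taken from common practice:

```python
import numpy as np

def snmf_update(A, H, beta=0.5):
    """One damped multiplicative update for min ||A - H H^T||_F^2, H >= 0.
    The factor (1 - beta + beta * ratio) stays positive, so H stays
    nonnegative."""
    numer = A @ H
    denom = H @ (H.T @ H) + 1e-12
    return H * (1.0 - beta + beta * numer / denom)

rng = np.random.default_rng(0)
B = rng.random((6, 2))
A = B @ B.T                      # an exactly factorizable similarity matrix
H = rng.random((6, 2))
before = np.linalg.norm(A - H @ H.T)
for _ in range(100):
    H = snmf_update(A, H)
after = np.linalg.norm(A - H @ H.T)
assert after < before            # the factorization error decreases
assert np.all(H >= 0)
```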
Updated: 2024-10-27 08:23:08
Categories: cs.LG
Open-Vocabulary Object Detection via Language Hierarchy
Generalizable object detection has attracted increasing attention, with recent studies exploiting additional weak supervision from large-scale datasets with image-level labels. However, weakly-supervised detection learning often suffers from image-to-box label mismatch, i.e., image-level labels do not convey precise object information. We design Language Hierarchical Self-training (LHST), which introduces language hierarchy into weakly-supervised detector training for learning more generalizable detectors. LHST expands the image-level labels with language hierarchy and enables co-regularization between the expanded labels and self-training. Specifically, the expanded labels regularize self-training by providing richer supervision and mitigating the image-to-box label mismatch, while self-training allows assessing and selecting the expanded labels according to their predicted reliability. In addition, we design language hierarchical prompt generation, which introduces language hierarchy into prompt generation and helps bridge the vocabulary gaps between training and testing. Extensive experiments show that the proposed techniques achieve superior generalization performance consistently across 14 widely studied object detection datasets.
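The label-expansion step can be pictured as walking up a concept hierarchy; the toy hierarchy and function below are hypothetical stand-ins (the paper's hierarchy would come from a lexical resource, not a hand-written dictionary):

```python
# Hypothetical hierarchy: each label maps to its ancestor concepts.
HIERARCHY = {
    "terrier": ["dog", "mammal", "animal"],
    "tabby": ["cat", "mammal", "animal"],
}

def expand_labels(image_labels):
    """Expand image-level labels with their language-hierarchy ancestors,
    giving the detector richer (if weaker) supervision."""
    expanded = set(image_labels)
    for label in image_labels:
        expanded.update(HIERARCHY.get(label, []))
    return expanded

assert expand_labels({"terrier"}) == {"terrier", "dog", "mammal", "animal"}
```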
Updated: 2024-10-27 08:20:03
Categories: cs.CV,cs.AI,cs.CL
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Large Language Models (LLMs) have dramatically advanced AI applications, yet their deployment remains challenging due to their immense inference costs. Recent studies ameliorate the computational costs of LLMs by increasing their activation sparsity but suffer from significant performance degradation on downstream tasks. In this work, we introduce a new framework for sparsifying the activations of base LLMs and reducing inference costs, dubbed Contextually Aware Thresholding for Sparsity (CATS). CATS is relatively simple, easy to implement, and highly effective. At the heart of our framework is a new non-linear activation function. We demonstrate that CATS can be applied to various base models, including Mistral-7B and Llama2-7B, and outperforms existing sparsification techniques in downstream task performance. More precisely, CATS-based models often achieve downstream task performance within 1-2% of their base models without any fine-tuning and even at activation sparsity levels of 50%. Furthermore, CATS-based models converge faster and display better task performance than competing techniques when fine-tuning is applied. Finally, we develop a custom GPU kernel for an efficient implementation of CATS that translates the activation sparsity of CATS into real wall-clock time speedups. Our custom kernel implementation of CATS results in a ~15% improvement in the wall-clock inference latency of token generation on both Llama-7B and Mistral-7B.
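The thresholding idea can be sketched by zeroing the smallest-magnitude activations so that a target sparsity level is hit, with the cutoff derived from the activations themselves; this is an illustrative sketch of magnitude thresholding, not the paper's exact activation function or calibration procedure:

```python
import numpy as np

def cats_like_threshold(activations, target_sparsity):
    """Zero out the smallest-magnitude activations so that roughly
    `target_sparsity` of the entries become zero (illustrative sketch)."""
    cutoff = np.quantile(np.abs(activations), target_sparsity)
    return np.where(np.abs(activations) >= cutoff, activations, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
sparse_x = cats_like_threshold(x, 0.5)
sparsity = float(np.mean(sparse_x == 0.0))
assert 0.45 < sparsity < 0.55    # roughly half the activations are zeroed
```

Sparse activations only pay off in wall-clock time if the matrix multiplications skip the zeroed entries, which is why the abstract's custom GPU kernel matters.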
Updated: 2024-10-27 08:15:39
Categories: cs.LG,cs.CL
StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Standard Large Language Models (LLMs) struggle to handle dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special \textit{End-of-Utterance} (EoU) token in dialogues has the potential to aggregate information. We refer to the EoU tokens as ``conversational attention sinks'' (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, and thus reduces computational complexity quadratically with the number of sinks (i.e., the number of utterances). Current LLMs already demonstrate the ability to handle long context windows, e.g., a window size of 200K or more. By compressing utterances into EoUs, our method has the potential to handle more than 200K utterances, enabling prolonged dialogue learning. To minimize the information loss from reconstruction after compression, we design two learning strategies: short-memory reconstruction (SMR) and long-memory reactivation (LMR). Our method outperforms strong baselines in dialogue tasks and achieves a 4$\times$ speedup while reducing memory usage by 18$\times$ compared to dense attention recomputation.
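One way to picture the compression is as a causal attention mask in which each token attends only to the EoU "sink" tokens plus a short local window; the sketch below is an illustration of that masking idea (the `recent_window` parameter and token-type encoding are assumptions, not the paper's implementation):

```python
import numpy as np

def conv_attn_sink_mask(token_types, recent_window):
    """Causal attention mask: each query attends only to End-of-Utterance
    sink tokens and to the most recent `recent_window` tokens."""
    n = len(token_types)
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        for k in range(q + 1):                       # causal: k <= q
            if token_types[k] == "EoU" or q - k < recent_window:
                mask[q, k] = True
    return mask

types = ["tok", "tok", "EoU", "tok", "tok", "EoU", "tok"]
m = conv_attn_sink_mask(types, recent_window=2)
# The last token attends to both EoU sinks plus its local window only.
assert m[6].tolist() == [False, False, True, False, False, True, True]
```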
Updated: 2024-10-27 08:14:55
Categories: cs.CL,cs.AI
C-MCTS: Safe Planning with Monte Carlo Tree Search
The Constrained Markov Decision Process (CMDP) formulation makes it possible to solve safety-critical decision-making tasks that are subject to constraints. While CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as MCTS for solving them. Previous approaches are conservative with respect to costs, as they avoid constraint violations by using Monte Carlo cost estimates that suffer from high variance. We propose Constrained MCTS (C-MCTS), which estimates cost using a safety critic trained with Temporal Difference learning in an offline phase prior to agent deployment. The critic limits exploration by pruning unsafe trajectories within MCTS during deployment. C-MCTS satisfies cost constraints but operates closer to the constraint boundary, achieving higher rewards than previous work. As a byproduct, the planner is also more efficient with respect to planning steps. Most importantly, under model mismatch between the planner and the real world, C-MCTS is less susceptible to cost violations than previous work.
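The pruning step can be sketched as filtering candidate actions whose critic-estimated cost-to-go would exceed the remaining budget; the function and toy critic below are hypothetical illustrations of that idea, not the paper's algorithm:

```python
def prune_unsafe_actions(actions, cost_critic, budget, accrued_cost):
    """Keep only actions whose critic-estimated future cost still fits
    within the remaining cost budget (illustrative C-MCTS-style pruning)."""
    return [a for a in actions
            if accrued_cost + cost_critic(a) <= budget]

# Toy critic: the action id doubles as its estimated cost-to-go.
actions = [0, 1, 2, 3]
safe = prune_unsafe_actions(actions, cost_critic=lambda a: float(a),
                            budget=2.0, accrued_cost=0.5)
assert safe == [0, 1]
```

A learned critic has far lower variance than Monte Carlo cost rollouts, which is why the pruned tree can operate closer to the constraint boundary.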
Updated: 2024-10-27 08:11:16
Categories: cs.LG,cs.AI
On $f$-Divergence Principled Domain Adaptation: An Improved Framework
Unsupervised domain adaptation (UDA) plays a crucial role in addressing distribution shifts in machine learning. In this work, we improve the theoretical foundations of UDA proposed in Acuna et al. (2021) by refining their $f$-divergence-based discrepancy and additionally introducing a new measure, $f$-domain discrepancy ($f$-DD). By removing the absolute value function and incorporating a scaling parameter, $f$-DD obtains novel target error and sample complexity bounds, allowing us to recover previous KL-based results and bridging the gap between algorithms and theory presented in Acuna et al. (2021). Using a localization technique, we also develop a fast-rate generalization bound. Empirical results demonstrate the superior performance of $f$-DD-based learning algorithms over previous works in popular UDA benchmarks.
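As a reminder of the object being generalized, an $f$-divergence between discrete distributions is $D_f(P \| Q) = \mathbb{E}_Q[f(p/q)]$ for a convex $f$ with $f(1)=0$; choosing $f(t) = t\log t$ recovers KL, which is the special case the abstract says the framework subsumes. A quick numerical check:

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = E_Q[f(p/q)] for discrete distributions."""
    return float(np.sum(q * f(p / q)))

rng = np.random.default_rng(0)
p = rng.random(4); p /= p.sum()
q = rng.random(4); q /= q.sum()

# f(t) = t*log(t) recovers the KL divergence as a special case.
kl = float(np.sum(p * np.log(p / q)))
assert abs(f_divergence(p, q, lambda t: t * np.log(t)) - kl) < 1e-9
```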
Updated: 2024-10-27 07:54:21
Categories: stat.ML,cs.CV,cs.LG
TorchOpera: A Compound AI System for LLM Safety
We introduce TorchOpera, a compound AI system for enhancing the safety and quality of prompts and responses for Large Language Models. TorchOpera ensures that all user prompts are safe, contextually grounded, and effectively processed, while enhancing LLM responses to be relevant and high quality. TorchOpera utilizes the vector database for contextual grounding, rule-based wrappers for flexible modifications, and specialized mechanisms for detecting and adjusting unsafe or incorrect content. We also provide a view of the compound AI system to reduce the computational cost. Extensive experiments show that TorchOpera ensures the safety, reliability, and applicability of LLMs in real-world settings while maintaining the efficiency of LLM responses.
Updated: 2024-10-27 07:53:40
Categories: cs.AI,cs.CE,cs.CL,cs.MA
Optimization Hyper-parameter Laws for Large Language Models
Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimization Hyper-parameter Laws (Opt-Laws), a framework that effectively captures the relationship between hyper-parameters and training outcomes, enabling the pre-selection of potential optimal schedules. Grounded in stochastic differential equations, Opt-Laws introduce novel mathematical interpretability and offer a robust theoretical foundation for some popular LR schedules. Our extensive validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss and identify optimal LR schedule candidates in pre-training, continual training, and fine-tuning scenarios. This approach significantly reduces computational costs while enhancing overall model performance.
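A concrete member of the schedule family the framework reasons about is linear warmup followed by cosine decay; the sketch below is the standard recipe, not the schedule Opt-Laws would predict as optimal:

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr over warmup_steps, then cosine decay
    down to min_lr by total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

assert abs(warmup_cosine_lr(0, 1000, 100, 1e-3) - 1e-5) < 1e-12     # warmup start
assert abs(warmup_cosine_lr(1000, 1000, 100, 1e-3)) < 1e-12         # decayed to 0
```

Hyper-parameters like `warmup_steps` and `peak_lr` are exactly the dynamic quantities that scaling laws alone do not pin down, which is the gap the abstract targets.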
Updated: 2024-10-27 07:53:21
Categories: cs.LG,math.OC
A Tight Lower Bound on Adaptively Secure Full-Information Coin Flip
In a distributed coin-flipping protocol [Blum, ACM Transactions on Computer Systems '83], the parties try to output a common (close to) uniform bit, even when some adversarially chosen parties try to bias the common output. In an adaptively secure full-information coin flip [Ben-Or and Linial, FOCS '85], the parties communicate over a broadcast channel, and a computationally unbounded adversary can choose which parties to corrupt along the protocol execution. Ben-Or and Linial proved that the $n$-party majority protocol is resilient to $O(\sqrt{n})$ corruptions (ignoring poly-logarithmic factors), and conjectured this is a tight upper bound for any $n$-party protocol (of any round complexity). Their conjecture was proved to be correct for single-turn (each party sends a single message) single-bit (a message is one bit) protocols [Lichtenstein, Linial and Saks, Combinatorica '89], symmetric protocols [Goldwasser, Tauman Kalai and Park, ICALP '15], and recently for (arbitrary message length) single-turn protocols [Tauman Kalai, Komargodski and Raz, DISC '18]. Yet, the question of many-turn protocols was left entirely open. In this work, we close the above gap, proving that no $n$-party protocol (of any round complexity) is resilient to $\omega(\sqrt{n})$ (adaptive) corruptions.
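To build intuition for the $\sqrt{n}$ threshold, the toy simulation below considers the single-turn majority protocol and (as a simplifying assumption) lets the adversary cast its $k$ votes after seeing all honest bits; since the honest margin concentrates around $\Theta(\sqrt{n})$, corrupting $k = \sqrt{n}$ parties usually suffices to force the outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 10_000, 100, 200   # k = sqrt(n) corrupted parties
forced = 0
for _ in range(trials):
    honest = rng.integers(0, 2, size=n - k)          # honest random bits
    margin = abs(2 * int(honest.sum()) - (n - k))    # |#ones - #zeros|
    # The adversary, moving last, forces its preferred outcome whenever
    # its k votes outweigh the honest margin.
    if margin <= k:
        forced += 1
assert forced / trials > 0.5   # Theta(sqrt(n)) corruptions usually suffice
```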
Updated: 2024-10-27 07:52:14
Categories: cs.CR,F.0; G.3
Rethinking Reconstruction-based Graph-Level Anomaly Detection: Limitations and a Simple Remedy
Graph autoencoders (Graph-AEs) learn representations of given graphs by aiming to accurately reconstruct them. A notable application of Graph-AEs is graph-level anomaly detection (GLAD), whose objective is to identify graphs with anomalous topological structures and/or node features compared to the majority of the graph population. Graph-AEs for GLAD regard a graph with a high mean reconstruction error (i.e., the mean of the errors over all node pairs and/or nodes) as an anomaly. Namely, the methods rest on the assumption that they would better reconstruct graphs with characteristics similar to the majority. We, however, report non-trivial counter-examples, a phenomenon we call reconstruction flip, and highlight the limitations of existing Graph-AE-based GLAD methods. Specifically, we empirically and theoretically investigate when this assumption holds and when it fails. Through our analyses, we further argue that, while the reconstruction errors for a given graph are effective features for GLAD, leveraging multifaceted summaries of the reconstruction errors, beyond just the mean, can further strengthen the features. Thus, we propose a novel and simple GLAD method, named MUSE. The key innovation of MUSE is taking multifaceted summaries of reconstruction errors as graph features for GLAD. This surprisingly simple method obtains SOTA performance in GLAD, performing best overall among 14 methods across 10 datasets.
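The point about going beyond the mean can be demonstrated directly: two graphs can share the same mean reconstruction error yet have very different error profiles. The particular summary statistics below are illustrative choices, not necessarily the ones MUSE uses:

```python
import numpy as np

def error_summaries(recon_errors):
    """Multifaceted summaries of per-node/per-pair reconstruction errors,
    used as graph-level features instead of the mean alone."""
    return np.array([
        recon_errors.mean(),
        recon_errors.std(),
        recon_errors.max(),
        np.median(recon_errors),
    ])

# Two graphs with the same mean error but different error profiles:
g1 = np.array([0.5, 0.5, 0.5, 0.5])
g2 = np.array([0.0, 0.0, 0.0, 2.0])
assert np.isclose(g1.mean(), g2.mean())   # the mean alone cannot tell them apart
assert not np.allclose(error_summaries(g1), error_summaries(g2))
```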
Updated: 2024-10-27 07:41:51
Categories: cs.LG,cs.SI
Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation
Recent advances in large language model (LLM) training have highlighted the need for diverse, high-quality instruction data. Recently, many works are exploring synthetic data generation using LLMs. However, they primarily focus on prompt engineering with standard supervised instruction-finetuned models, which contains a fundamental limitation: these models are optimized for general question-answering/problem-solving rather than data generation. We propose a paradigm shift named \textbf{NOMAD} by investigating how to specifically train models for data generation, demonstrating that this task differs significantly from training a classical LM. We identify two key factors: no-prompt-masked training and proper training set size selection. Our method, NOMAD, shows substantial improvements over baselines, achieving >4\% gains in TriviaQA and >2\% in GSM8K with limited training data. Finally, we offer new insights by interpreting synthetic data through the lenses of "relevance" and "novelty".
Updated: 2024-10-27 07:38:39
Categories: cs.CL,cs.AI
A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite Optimization
Decentralized optimization methods with local updates have recently gained attention for their provable communication acceleration. In these methods, nodes perform several iterations of local computations between communication rounds. Nevertheless, this capability is effective only when the loss function is smooth and the network is sufficiently well-connected. In this paper, we propose MG-Skip, a communication-efficient method with probabilistic local updates and multi-gossip communications for decentralized composite (smooth + nonsmooth) optimization, whose stepsize is independent of the number of local updates and the network topology. Without any additional condition on network connectivity, MG-Skip allows the multi-gossip communications to be skipped in most iterations in the strongly convex setting, while its iteration complexity is $\mathcal{O}\left(\kappa \log \frac{1}{\epsilon}\right)$ and its communication complexity is only $\mathcal{O}\left(\sqrt{\frac{\kappa}{(1-\rho)}} \log \frac{1}{\epsilon}\right)$, where $\kappa$ is the condition number of the loss function, $\rho$ reflects the connectivity of the network topology, and $\epsilon$ is the target accuracy. The theoretical results demonstrate that MG-Skip achieves the optimal communication complexity and confirm the benefits of local updates in the nonsmooth setup.
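The "proximal gradient" building block for composite smooth + nonsmooth objectives is standard: take a gradient step on the smooth part, then apply the proximal operator of the nonsmooth part. The single-node sketch below uses soft-thresholding (the prox of the L1 norm) on a toy problem; the decentralized, multi-gossip machinery of MG-Skip is not shown:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the nonsmooth part)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_step(x, grad_f, step, lam):
    """One proximal gradient step for min_x f(x) + lam * ||x||_1."""
    return soft_threshold(x - step * grad_f(x), step * lam)

# Toy problem: f(x) = 0.5*||x - b||^2, so grad_f(x) = x - b.
b = np.array([3.0, -0.2, 0.0])
x = np.zeros(3)
for _ in range(200):
    x = proximal_gradient_step(x, lambda z: z - b, step=0.5, lam=0.5)
# The minimizer is soft_threshold(b, 0.5) = [2.5, 0.0, 0.0].
assert np.allclose(x, [2.5, 0.0, 0.0], atol=1e-6)
```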
Updated: 2024-10-27 07:37:14
Categories: math.OC,cs.LG
Inapproximability of Sparsest Vector in a Real Subspace
We establish strong inapproximability for finding the sparsest nonzero vector in a real subspace. We show that it is NP-Hard (under randomized reductions) to approximate the sparsest vector in a subspace within any constant factor (or almost polynomial factors in quasipolynomial time). We recover as a corollary state of the art inapproximability for the shortest vector problem (SVP), a foundational problem in lattice based cryptography. Our proof is surprisingly simple, bypassing even the PCP theorem. We are inspired by the homogenization framework from the inapproximability theory of minimum distance problems (MDC) in integer lattices and error correcting codes. We use a combination of (a) \emph{product testing via tensor codes} and (b) \emph{encoding an assignment as a coset of a random code in higher dimensional space} in order to embed non-homogeneous quadratic equations into the sparsest vector problem. (a) is inspired by Austrin and Khot's simplified proof of hardness of MDC over finite fields, and (b) is inspired by Micciancio's semi-derandomization of hardness of SVP. Our reduction involves the challenge of performing (a) over the reals. We prove that tensoring of the kernel of a +1/-1 random matrix furnishes an adequate product test (while still allowing (b)). The proof exposes a connection to Littlewood-Offord theory and relies on a powerful anticoncentration result of Rudelson and Vershynin. Our main motivation in this work is the development of inapproximability theory for problems over the reals. Analytic variants of sparsest vector have connections to small set expansion, quantum separability and polynomial maximization over convex sets, all of which cause similar barriers to inapproximability. The approach we develop could lead to progress on the hardness of some of these problems.
Updated: 2024-10-27 07:27:33
Categories: cs.CC,cs.CR
Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios
Audio-driven simultaneous gesture generation is vital for human-computer communication, AI games, and film production. While previous research has shown promise, there are still limitations. Methods based on VAEs are accompanied by issues of local jitter and global instability, whereas methods based on diffusion models are hampered by low generation efficiency. This is because the denoising process of DDPM in the latter relies on the assumption that the noise added at each step is sampled from a unimodal distribution and that the noise values are small. DDIM borrows the idea from the Euler method for solving differential equations, disrupts the Markov chain process, and increases the noise step size to reduce the number of denoising steps, thereby accelerating generation. However, simply increasing the step size during the step-by-step denoising process causes the results to gradually deviate from the original data distribution, leading to a significant drop in the quality of the generated actions and the emergence of unnatural artifacts. In this paper, we break the assumptions of DDPM and achieve breakthrough progress in denoising speed and fidelity. Specifically, we introduce a conditional GAN to capture audio control signals and implicitly match the multimodal denoising distribution between the diffusion and denoising steps within the same sampling step, aiming to sample larger noise values and apply fewer denoising steps for high-speed generation.
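For reference, the deterministic DDIM update mentioned in the abstract first estimates the clean sample from the predicted noise and then jumps to an earlier timestep; the sanity check below uses a synthetic "perfect" noise prediction, which is of course unavailable in practice:

```python
import numpy as np

def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update: estimate x0 from the predicted noise,
    then jump directly to the earlier timestep."""
    x0_hat = (x_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1 - alpha_bar_prev) * eps_hat

# With a perfect noise prediction, one large DDIM jump (to alpha_bar = 1)
# recovers the clean sample exactly; with an imperfect predictor, such
# large jumps accumulate error -- the degradation the abstract describes.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)
a_t = 0.3
x_t = np.sqrt(a_t) * x0 + np.sqrt(1 - a_t) * eps
assert np.allclose(ddim_step(x_t, eps, a_t, 1.0), x0)
```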
Updated: 2024-10-27 07:25:11
Categories: cs.SD,cs.AI,cs.CV,cs.GR,eess.AS
RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior
We present RopeTP, a novel framework that combines Robust pose estimation with a diffusion Trajectory Prior to reconstruct global human motion from videos. At the heart of RopeTP is a hierarchical attention mechanism that significantly improves context awareness, which is essential for accurately inferring the posture of occluded body parts. This is achieved by exploiting the relationships with visible anatomical structures, enhancing the accuracy of local pose estimations. The improved robustness of these local estimations allows for the reconstruction of precise and stable global trajectories. Additionally, RopeTP incorporates a diffusion trajectory model that predicts realistic human motion from local pose sequences. This model ensures that the generated trajectories are not only consistent with observed local actions but also unfold naturally over time, thereby improving the realism and stability of 3D human motion reconstruction. Extensive experimental validation shows that RopeTP surpasses current methods on two benchmark datasets, particularly excelling in scenarios with occlusions. It also outperforms methods that rely on SLAM for initial camera estimates and extensive optimization, delivering more accurate and realistic trajectories.
Updated: 2024-10-27 07:19:39
Subjects: cs.CV,cs.AI
Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications
Sim-to-real transfer remains a significant challenge in robotics due to the discrepancies between simulated and real-world dynamics. Traditional methods like Domain Randomization often fail to capture fine-grained dynamics, limiting their effectiveness for precise control tasks. In this work, we propose a novel approach that dynamically adjusts simulation environment parameters online using in-context learning. By leveraging past interaction histories as context, our method adapts the simulation environment dynamics to real-world dynamics without requiring gradient updates, resulting in faster and more accurate alignment between simulated and real-world performance. We validate our approach across two tasks: object scooping and table air hockey. In the sim-to-sim evaluations, our method significantly outperforms the baselines on environment parameter estimation by 80% and 42% in the object scooping and table air hockey setups, respectively. Furthermore, our method achieves at least 70% success rate in sim-to-real transfer on object scooping across three different objects. By incorporating historical interaction data, our approach delivers efficient and smooth system identification, advancing the deployment of robots in dynamic real-world scenarios. Demos are available on our project page: https://sim2real-capture.github.io/
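As a point of contrast with the gradient-free in-context method described above, a naive system-identification baseline simply searches candidate simulator parameters against observed transitions. The toy dynamics and friction parameter below are invented purely for illustration and are not the paper's approach:

```python
import numpy as np

# Toy 1-D dynamics: next_state = state + action - friction * state.
# The "real" friction is unknown; pick the candidate simulator parameter
# whose predicted transitions best match the observed ones.
def step(state, action, friction):
    return state + action - friction * state

rng = np.random.default_rng(0)
true_friction = 0.3
states = rng.uniform(-1, 1, size=50)
actions = rng.uniform(-1, 1, size=50)
next_states = step(states, actions, true_friction)

candidates = np.linspace(0.0, 1.0, 101)
errors = [np.mean((step(states, actions, f) - next_states) ** 2) for f in candidates]
best = candidates[int(np.argmin(errors))]
```

The in-context approach replaces this explicit search with a model that reads the interaction history and proposes parameters directly, avoiding repeated rollouts.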
Updated: 2024-10-27 07:13:38
Subjects: cs.RO,cs.AI
Uncovering Capabilities of Model Pruning in Graph Contrastive Learning
Graph contrastive learning has achieved great success in pre-training graph neural networks without ground-truth labels. Leading graph contrastive learning methods follow the classical scheme of contrastive learning, forcing the model to identify the essential information from augmented views. However, general augmented views are produced via random corruption or learning, which inevitably alters semantics. Although domain-knowledge-guided augmentations alleviate this issue, the generated views are domain-specific and undermine generalization. In this work, motivated by the firm representation ability of sparse models obtained by pruning, we reformulate graph contrastive learning by contrasting different model versions rather than augmented views. We first theoretically reveal the superiority of model pruning over data augmentations. In practice, we take the original graph as input and dynamically generate a perturbed graph encoder to contrast with the original encoder by pruning its transformation weights. Furthermore, considering the integrity of node embeddings in our method, we develop a local contrastive loss to tackle the hard negative samples that disturb model training. We extensively validate our method on various graph classification benchmarks via unsupervised and transfer learning. Compared to state-of-the-art (SOTA) works, the proposed method consistently obtains better performance.
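The core idea of contrasting model versions rather than augmented views can be sketched with magnitude pruning on a toy linear encoder. This is a hedged illustration only; the paper's actual encoder, pruning scheme, and losses differ:

```python
import numpy as np

def magnitude_prune(w, ratio):
    """Zero out the smallest-magnitude fraction `ratio` of the weights."""
    k = int(round(ratio * w.size))
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))          # a toy linear "encoder"
w_pruned = magnitude_prune(w, 0.3)    # perturbed encoder via pruning
x = rng.normal(size=16)               # one input's features

# The two views to contrast are the SAME input under the two encoders,
# so no data augmentation (and no semantic corruption) is involved.
sim = cosine(np.tanh(w @ x), np.tanh(w_pruned @ x))
```

A contrastive loss would then pull these paired embeddings together while pushing apart embeddings of different inputs.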
Updated: 2024-10-27 07:09:31
Subjects: cs.LG,cs.AI
A Model for Intelligible Interaction Between Agents That Predict and Explain
Machine Learning (ML) has emerged as a powerful form of data modelling with widespread applicability beyond its roots in the design of autonomous agents. However, relatively little attention has been paid to the interaction between people and ML systems. In this paper we view interaction between humans and ML systems within the broader context of communication between agents capable of prediction and explanation. We formalise the interaction model by taking agents to be automata with some special characteristics and define a protocol for communication between such agents. We define One- and Two-Way Intelligibility as properties that emerge at run-time by execution of the protocol. The formalisation allows us to identify conditions under which run-time sequences are bounded, and identify conditions under which the protocol can correctly implement an axiomatic specification of intelligible interaction between a human and an ML system. We also demonstrate using the formal model to: (a) identify instances of One- and Two-Way Intelligibility in literature reports on humans interacting with ML systems providing logic-based explanations, as is done in Inductive Logic Programming (ILP); and (b) map interactions between humans and machines in an elaborate natural-language based dialogue-model to One- or Two-Way Intelligible interactions in the formal model.
Updated: 2024-10-27 07:08:57
Subjects: cs.AI,cs.HC,cs.LG
Vanilla Feedforward Neural Networks as a Discretization of Dynamical Systems
Deep learning has found significant applications in data science and natural science. Some studies have linked deep neural networks to dynamical systems, but the network structure was restricted to residual networks. It is known that residual networks can be regarded as a numerical discretization of dynamical systems. In this paper, we return to the classical network structure and prove that vanilla feedforward networks can also be a numerical discretization of dynamical systems, where the width of the network equals the dimension of the input and output. Our proof is based on the properties of the leaky-ReLU function and the numerical splitting method for solving differential equations. Our results provide a new perspective for understanding the approximation properties of feedforward neural networks.
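The residual-network case the abstract cites as known can be written out explicitly; the vanilla-network result is the paper's contribution and is only stated, not derived, here. Forward-Euler discretization of a flow gives

```latex
\dot{x}(t) = f(x(t))
\quad\Longrightarrow\quad
x_{k+1} = x_k + h\, f(x_k),
```

which is exactly a residual block $x_{k+1} = x_k + g_\theta(x_k)$ with $g_\theta \approx h f$. The paper's claim is that a width-$d$ vanilla layer $x_{k+1} = \sigma(W x_k + b)$, with $\sigma$ the leaky-ReLU, can realize the same kind of flow map through an operator-splitting discretization of $f$.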
Updated: 2024-10-27 06:56:09
Subjects: cs.LG,cs.NA,math.NA,68T07, 65P99, 65Z05, 41A65
FoldMark: Protecting Protein Generative Models with Watermarking
Protein structure is key to understanding protein function and is essential for progress in bioengineering, drug discovery, and molecular biology. Recently, with the incorporation of generative AI, the power and accuracy of computational protein structure prediction/design have improved significantly. However, ethical concerns such as copyright protection and harmful content generation (biosecurity) pose challenges to the wide deployment of protein generative models. Here, we investigate whether it is possible to embed watermarks into protein generative models and their outputs for copyright authentication and tracking of generated structures. As a proof of concept, we propose FoldMark, a two-stage method serving as a generalized watermarking strategy for protein generative models. FoldMark first pretrains a watermark encoder and decoder, which can minimally adjust protein structures to embed user-specific information and faithfully recover that information from the encoded structure. In the second stage, protein generative models are fine-tuned with watermark Low-Rank Adaptation (LoRA) modules to preserve generation quality while learning to generate watermarked structures with high recovery rates. Extensive experiments are conducted on open-source protein structure prediction models (e.g., ESMFold and MultiFlow) and de novo structure design models (e.g., FrameDiff and FoldFlow), and we demonstrate that our method is effective across all these generative models. Meanwhile, our watermarking framework exerts only a negligible impact on original protein structure quality and is robust under potential post-processing and adaptive attacks.
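The LoRA fine-tuning mentioned in the second stage follows the standard low-rank update, where only two small factors receive gradients while the pretrained weight stays frozen. A minimal sketch with arbitrary dimensions and scaling, not FoldMark's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r, alpha = 6, 8, 2, 4.0

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable low-rank factor
B = np.zeros((d_out, r))                  # zero-init so training starts at W

# Effective weight during fine-tuning: only A and B would be updated,
# so the base model is untouched and the adapter is cheap to store.
W_eff = W + (alpha / r) * (B @ A)

x = rng.normal(size=d_in)
y = W_eff @ x
```

Since B is zero-initialized, the adapted model reproduces the base model exactly before any training step.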
Updated: 2024-10-27 06:53:46
Subjects: cs.CR,cs.LG,q-bio.BM
An approach to hummed-tune and song sequences matching
A melody stuck in your head, also known as an "earworm", is tough to get rid of unless you listen to it again or sing it out loud. But what if you cannot find the name of that song? It can be an intolerable feeling. Recognizing a song's name based on a hummed tune is not an easy task for a human being and should be done by machines. However, no research paper on hum-tune recognition has been published. This work adapts from the Hum2Song Zalo AI Challenge 2021, a competition about querying the name of a song from a user's hummed tune, similar to Google's Hum to Search. This paper covers details of pre-processing the data from its original format (mp3) into a usable form for training and inference. For training an embedding model in the feature-extraction phase, we ran experiments with several state-of-the-art architectures, such as ResNet, VGG, AlexNet, and MobileNetV2. For the inference phase, we use the Faiss module to efficiently search for songs that match the hummed-tune sequence. The result reaches nearly 94% in the MRR@10 metric on the public test set, along with the top-1 result on the public leaderboard.
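The MRR@10 metric reported here has a short reference implementation; the song ids below are placeholders:

```python
def mrr_at_10(ranked_ids, ground_truth):
    """Mean reciprocal rank over queries, counting only the top 10 hits.

    ranked_ids: one ranked candidate-song list per query.
    ground_truth: the correct song id per query."""
    total = 0.0
    for ranking, truth in zip(ranked_ids, ground_truth):
        for rank, song in enumerate(ranking[:10], start=1):
            if song == truth:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(ground_truth)

# Two queries: correct song at rank 1 and rank 2 -> (1 + 0.5) / 2 = 0.75
score = mrr_at_10([["a", "b"], ["c", "a"]], ["a", "a"])
```

Queries whose correct song is absent from the top 10 contribute zero, which is why near-94% MRR@10 implies most queries rank the correct song first.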
Updated: 2024-10-27 06:50:43
Subjects: cs.SD,cs.AI,cs.IR,eess.AS
CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants
LLM-based code assistants are becoming increasingly popular among developers. These tools help developers improve their coding efficiency and reduce errors by providing real-time suggestions based on the developer's codebase. While beneficial, the use of these tools can inadvertently expose the developer's proprietary code to the code assistant service provider during the development process. In this work, we propose a method to mitigate the risk of code leakage when using LLM-based code assistants. CodeCloak is a novel deep reinforcement learning agent that manipulates the prompts before sending them to the code assistant service. CodeCloak aims to achieve the following two contradictory goals: (i) minimizing code leakage, while (ii) preserving relevant and useful suggestions for the developer. Our evaluation, employing StarCoder and Code Llama, LLM-based code assistants models, demonstrates CodeCloak's effectiveness on a diverse set of code repositories of varying sizes, as well as its transferability across different models. We also designed a method for reconstructing the developer's original codebase from code segments sent to the code assistant service (i.e., prompts) during the development process, to thoroughly analyze code leakage risks and evaluate the effectiveness of CodeCloak under practical development scenarios.
Updated: 2024-10-27 06:44:03
Subjects: cs.CR,cs.CL,cs.LG,cs.PL
Leveraging Auxiliary Task Relevance for Enhanced Industrial Fault Diagnosis through Curriculum Meta-learning
The accurate diagnosis of machine breakdowns is crucial for maintaining operational safety in smart manufacturing. Despite the promise shown by deep learning in automating fault identification, the scarcity of labeled training data, particularly for equipment failure instances, poses a significant challenge. This limitation hampers the development of robust classification models. Existing methods like model-agnostic meta-learning (MAML) do not adequately address variable working conditions, affecting knowledge transfer. To address these challenges, a Related Task Aware Curriculum Meta-learning (RT-ACM) enhanced fault diagnosis framework is proposed in this paper, inspired by human cognitive learning processes. RT-ACM improves training by considering the relevance of auxiliary working conditions, adhering to the principle of "paying more attention to more relevant knowledge", and focusing on "easier first, harder later" curriculum sampling. This approach aids the meta-learner in achieving a superior convergence state. Extensive experiments on two real-world datasets demonstrate the superiority of the RT-ACM framework.
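An "easier first, harder later" schedule that also favors relevant auxiliary tasks can be sketched in a few lines. The tie-breaking rule and task names below are illustrative assumptions, not RT-ACM's actual sampler:

```python
def curriculum_order(tasks, relevance, difficulty):
    """Order auxiliary tasks easy-first, breaking difficulty ties in favor
    of higher relevance, so transferable knowledge is seen earlier."""
    return sorted(tasks, key=lambda t: (difficulty[t], -relevance[t]))

# Hypothetical auxiliary working conditions with scored attributes.
tasks = ["load_A", "load_B", "load_C"]
relevance = {"load_A": 0.9, "load_B": 0.4, "load_C": 0.9}
difficulty = {"load_A": 2, "load_B": 1, "load_C": 1}
order = curriculum_order(tasks, relevance, difficulty)
```

A full curriculum sampler would additionally re-weight gradient contributions by relevance rather than only ordering tasks.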
Updated: 2024-10-27 06:32:41
Subjects: cs.LG
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
Generative models, as a powerful technique for generation, have also gradually become a critical tool for recognition tasks. However, in skeleton-based action recognition, the features obtained from existing pre-trained generative methods contain redundant information unrelated to recognition, which contradicts the skeleton's spatially sparse and temporally consistent nature, leading to undesirable performance. To address this challenge, we bridge the gap in theory and methodology and propose a novel skeleton-based idempotent generative model (IGM) for unsupervised representation learning. More specifically, we first theoretically demonstrate the equivalence between generative models and maximum entropy coding, which suggests a route to making the features of generative models more compact by introducing contrastive learning. To this end, we introduce an idempotency constraint to form a stronger consistency regularization in the feature space, pushing the features to retain only the critical motion-semantic information needed for the recognition task. Our extensive experiments on the benchmark datasets NTU RGB+D and PKUMMD demonstrate the effectiveness of the proposed method. On the NTU 60 xsub dataset, we observe a performance improvement from 84.6% to 86.2%. Furthermore, in zero-shot adaptation scenarios, our model demonstrates significant efficacy by achieving promising results in cases that were previously unrecognizable. Our project is available at https://github.com/LanglandsLin/IGM.
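The idempotency constraint f(f(x)) = f(x) can be turned into a loss directly: applying the encoder twice should change nothing. A minimal sketch with toy linear maps, not the paper's encoder:

```python
import numpy as np

def idempotency_loss(f, x):
    """Penalize deviation from f(f(x)) = f(x), squeezing out information
    that a second application of the encoder would discard."""
    fx = f(x)
    return float(np.mean((f(fx) - fx) ** 2))

# A projection onto the first coordinate is exactly idempotent.
P = np.array([[1.0, 0.0], [0.0, 0.0]])
x = np.array([3.0, 4.0])
zero_loss = idempotency_loss(lambda v: P @ v, x)

# A generic linear map is not: scaling by 2 keeps stretching on reapplication.
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
nonzero_loss = idempotency_loss(lambda v: Q @ v, x)
```

Minimizing this term pushes the learned encoder toward a projection-like map whose range keeps only the recognition-relevant subspace.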
Updated: 2024-10-27 06:29:04
Subjects: cs.CV,cs.AI
Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging
The credibility of Artificial Intelligence (AI) models in medical imaging, particularly during the COVID-19 pandemic, has been challenged by reproducibility issues and obscured clinical insights. To address these concerns, we propose a Virtual Imaging Trials (VIT) framework, utilizing both clinical and simulated datasets to evaluate AI systems. This study focuses on using convolutional neural networks (CNNs) for COVID-19 diagnosis using computed tomography (CT) and chest radiography (CXR). We developed and tested multiple AI models, 3D ResNet-like and 2D EfficientNetv2 architectures, across diverse datasets. Our evaluation metrics included the area under the curve (AUC). Statistical analyses, such as the DeLong method for AUC confidence intervals, were employed to assess performance differences. Our findings demonstrate that VIT provides a robust platform for objective assessment, revealing significant influences of dataset characteristics, patient factors, and imaging physics on AI efficacy. Notably, models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR. Internal testing yielded higher AUC values (0.77 to 0.85 for CT and 0.77 to 1.0 for CXR), highlighting a substantial drop in performance during external validation, which underscores the importance of diverse and comprehensive training and testing data. This approach enhances model transparency and reliability, offering nuanced insights into the factors driving AI performance and bridging the gap between experimental and clinical settings. The study underscores the potential of VIT to improve the reproducibility and clinical relevance of AI systems in medical imaging.
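The AUC metric used throughout can be computed from pairwise rank comparisons, the same Mann-Whitney U statistic that the cited DeLong method builds its confidence intervals on. A minimal sketch:

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    case receives a higher score than a random negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One mis-ranked pair out of four -> AUC 0.75.
value = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

The internal-versus-external gap the study reports (e.g., 0.85 vs 0.76 for CT) is a drop in exactly this pairwise ranking probability.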
Updated: 2024-10-27 06:03:16
Subjects: eess.IV,cs.LG
Historical Test-time Prompt Tuning for Vision Foundation Models
Test-time prompt tuning, which learns prompts online with unlabelled test samples during the inference stage, has demonstrated great potential by learning effective prompts on-the-fly without requiring any task-specific annotations. However, its performance often degrades clearly along the tuning process when the prompts are continuously updated with the test data flow, and the degradation becomes more severe when the domain of test samples changes continuously. We propose HisTPT, a Historical Test-time Prompt Tuning technique that memorizes the useful knowledge of the learnt test samples and enables robust test-time prompt tuning with the memorized knowledge. HisTPT introduces three types of knowledge banks, namely, local knowledge bank, hard-sample knowledge bank, and global knowledge bank, each of which works with different mechanisms for effective knowledge memorization and test-time prompt optimization. In addition, HisTPT features an adaptive knowledge retrieval mechanism that regularizes the prediction of each test sample by adaptively retrieving the memorized knowledge. Extensive experiments show that HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks (e.g., image classification, semantic segmentation, and object detection) and test samples from continuously changing domains.
Updated: 2024-10-27 06:03:15
Subjects: cs.CV,cs.AI,cs.CL
Veagle: Advancements in Multimodal Representation Learning
Lately, researchers in artificial intelligence have taken a strong interest in how language and vision come together, giving rise to multimodal models that aim to seamlessly integrate textual and visual information. Multimodal models, an extension of Large Language Models (LLMs), have exhibited remarkable capabilities in addressing a diverse array of tasks, ranging from image captioning and visual question answering (VQA) to visual grounding. While these models have showcased significant advancements, challenges persist in accurately interpreting images and answering questions, a common occurrence in real-world scenarios. This paper introduces a novel approach to enhance the multimodal capabilities of existing models. In response to the limitations observed in current Vision Language Models (VLMs) and Multimodal Large Language Models (MLLMs), our proposed model, Veagle, incorporates a unique mechanism inspired by the successes and insights of previous works. Veagle leverages a dynamic mechanism to project encoded visual information directly into the language model. This dynamic approach allows for a more nuanced understanding of intricate details present in visual contexts. To validate the effectiveness of Veagle, we conduct comprehensive experiments on benchmark datasets, emphasizing tasks such as visual question answering and image understanding. Our results indicate an improvement of 5-6% in performance, with Veagle outperforming existing models by a notable margin. The outcomes underscore the model's versatility and applicability beyond traditional benchmarks.
Updated: 2024-10-27 06:01:49
Subjects: cs.CV,cs.AI,cs.CL,cs.MM
Logarithmically Quantized Distributed Optimization over Dynamic Multi-Agent Networks
Distributed optimization finds many applications in machine learning, signal processing, and control systems. In these real-world applications, the constraints of communication networks, particularly limited bandwidth, necessitate implementing quantization techniques. In this paper, we propose distributed optimization dynamics over multi-agent networks subject to logarithmically quantized data transmission. Under this condition, data exchange benefits from representing smaller values with more bits and larger values with fewer bits. As compared to uniform quantization, this allows for higher precision in representing near-optimal values and more accuracy of the distributed optimization algorithm. The proposed optimization dynamics comprise a primary state variable converging to the optimizer and an auxiliary variable tracking the objective function's gradient. Our setting accommodates dynamic network topologies, resulting in a hybrid system requiring convergence analysis using matrix perturbation theory and eigenspectrum analysis.
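A logarithmic quantizer of the kind described, with finer levels near zero and coarser levels for large magnitudes, can be sketched as rounding in log-space. The base and rounding rule are illustrative assumptions, not the paper's exact quantizer:

```python
import numpy as np

def log_quantize(x, base=2.0):
    """Map each nonzero value to the nearest power of `base`, keeping sign.
    Small magnitudes land on finer levels than large ones, giving higher
    precision for near-optimal (small) residuals."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    exponents = np.round(np.log(np.abs(x[nz])) / np.log(base))
    out[nz] = np.sign(x[nz]) * base ** exponents
    return out

q = log_quantize([0.3, -0.3, 5.0, 0.0])
```

Near convergence the transmitted values shrink, so the relative quantization error stays bounded, unlike uniform quantization where the error is absolute.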
Updated: 2024-10-27 06:01:01
Subjects: eess.SY,cs.LG,cs.MA,cs.SY,math.OC
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer
We present SuperCoder2.0, an advanced autonomous system designed to enhance software development through artificial intelligence. The system combines an AI-native development approach with intelligent agents to enable fully autonomous coding. Key focus areas include a retry mechanism with error-output traceback, comprehensive code rewriting and replacement using Abstract Syntax Tree (AST) parsing to minimize linting issues, a code embedding technique for retrieval-augmented generation, and a focus on localizing methods for problem-solving rather than identifying specific line numbers. The methodology employs a three-step hierarchical search-space reduction approach for codebase navigation and bug localization: (1) utilizing Retrieval Augmented Generation (RAG) and a Repository File Level Map to identify candidate files, (2) narrowing down to the most relevant files using a File Level Schematic Map, and (3) extracting 'relevant locations' within these files. Code editing is performed through a two-part module comprising CodeGeneration and CodeEditing, which generates multiple solutions at different temperature values and replaces entire methods or classes to maintain code integrity. A feedback loop executes repository-level test cases to validate and refine solutions. Experiments conducted on the SWE-bench Lite dataset demonstrate SuperCoder2.0's effectiveness, achieving correct file localization within the top 5 candidates in 84.33% of cases and successfully resolving 34% of test instances. This performance places SuperCoder2.0 fourth globally on the SWE-bench leaderboard. The system's ability to handle diverse repositories and problem types highlights its potential as a versatile tool for autonomous software development. Future work will focus on refining the code editing process and exploring advanced embedding models for improved natural-language-to-code mapping.
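The method-level (rather than line-level) localization described above can be sketched with Python's standard ast module; the toy source below is invented for the example:

```python
import ast

SOURCE = '''
class Greeter:
    def greet(self, name):
        return "hi " + name

def main():
    print(Greeter().greet("world"))
'''

def method_locations(source):
    """Map each function/method name to its (start, end) line span, so an
    editor can replace whole methods instead of guessing line numbers."""
    tree = ast.parse(source)
    return {
        node.name: (node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

locs = method_locations(SOURCE)
```

Replacing the full span returned here keeps the surrounding syntax intact, which is the "code integrity" property the abstract mentions.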
Updated: 2024-10-27 05:57:07
Subjects: cs.SE,cs.AI
GUIDE: Graphical User Interface Data for Execution
In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites, including Apollo (62.67%), Gmail (3.43%), Calendar (10.98%), and Canva (22.92%). Each data entry includes an image, a task description, the last action taken, a chain-of-thought (CoT) rationale, and the next action to be performed, along with grounding information about where the action needs to be executed. The data is collected using our in-house advanced annotation tool NEXTAG (Next Action Grounding and Annotation Tool) and is adapted for multiple operating systems, browsers, and display types. It is collected by multiple annotators to capture variation in design and in the ways people use websites. Through this dataset, we aim to facilitate research and development in the realm of LLMs for graphical user interfaces, particularly in tasks related to RPA. The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks. We believe that our dataset will serve as a valuable resource for advancing the capabilities of multi-platform LLMs in practical applications, fostering innovation in the fields of automation and natural language understanding. Using GUIDE, we build V-Zen, the first RPA model to automate multiple websites using our in-house automation tool AUTONODE.
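A GUIDE-style record, as enumerated above, might be modeled as follows. Field names, the action strings, and the grounding format are assumptions for illustration, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class GuideEntry:
    """One record in the style described by the abstract."""
    image_path: str          # screenshot of the current UI state
    task: str                # natural-language task description
    last_action: str         # action already taken
    chain_of_thought: str    # CoT rationale for the next step
    next_action: str         # action to perform next
    grounding: tuple         # where the action applies, e.g. (x, y)

entry = GuideEntry(
    image_path="screenshots/0001.png",
    task="Archive the first email",
    last_action="CLICK inbox",
    chain_of_thought="The first email row is visible; archive it next.",
    next_action="CLICK archive_button",
    grounding=(412, 188),
)
```

Pairing the screenshot with both the rationale and the grounded coordinates is what lets an MLLM learn to act, not just describe.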
Updated: 2024-10-27 05:54:50
Subjects: cs.HC,cs.AI
Brain Networks and Intelligence: A Graph Neural Network Based Approach to Resting State fMRI Data
Resting-state functional magnetic resonance imaging (rsfMRI) is a powerful tool for investigating the relationship between brain function and cognitive processes, as it allows the functional organization of the brain to be captured without relying on a specific task or stimuli. In this paper, we present a novel modeling architecture called BrainRGIN for predicting intelligence (fluid, crystallized, and total intelligence) using graph neural networks on rsfMRI-derived static functional network connectivity matrices. Extending existing graph convolution networks, our approach incorporates a clustering-based embedding and a graph isomorphism network in the graph convolutional layer to reflect the nature of brain sub-network organization and efficient network expression, in combination with TopK pooling and attention-based readout functions. We evaluated our proposed architecture on a large dataset, the Adolescent Brain Cognitive Development dataset, and demonstrated its effectiveness in predicting individual differences in intelligence. Our model achieved lower mean squared errors and higher correlation scores than existing relevant graph architectures and other traditional machine learning models on all of the intelligence prediction tasks. The middle frontal gyrus exhibited a significant contribution to both fluid and crystallized intelligence, suggesting its pivotal role in these cognitive processes. Total composite scores identified a diverse set of brain regions as relevant, which underscores the complex nature of total intelligence.
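TopK pooling, one of the components named above, keeps the highest-scoring nodes, gates their features, and restricts the adjacency to the retained nodes. A generic, Graph-U-Net-style sketch (not BrainRGIN's exact layer; scores would normally be learned):

```python
import math

def topk_pool(node_feats, scores, adj, ratio=0.5):
    """TopK graph pooling: keep the ratio*N highest-scoring nodes,
    gate their features by tanh(score), and slice the adjacency."""
    k = max(1, int(len(node_feats) * ratio))
    keep = sorted(sorted(range(len(scores)), key=lambda i: scores[i],
                         reverse=True)[:k])
    feats = [[f * math.tanh(scores[i]) for f in node_feats[i]] for i in keep]
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    return feats, pooled_adj, keep

# Toy 4-node functional connectivity graph with hand-set node scores.
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
scores = [0.9, 0.1, 0.8, 0.2]
adj = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
pooled, sub_adj, kept = topk_pool(feats, scores, adj)
print(kept)  # → [0, 2]
```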
Updated: 2024-10-27 04:56:30
Subjects: cs.LG, cs.AI, q-bio.NC
Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization
Generating ligand molecules for specific protein targets, known as structure-based drug design, is a fundamental problem in therapeutics development and biological discovery. Recently, target-aware generative models, especially diffusion models, have shown great promise in modeling protein-ligand interactions and generating candidate drugs. However, existing models primarily focus on learning the chemical distribution of all drug candidates, which lacks effective steerability over the chemical quality of model generations. In this paper, we propose AliDiff, a novel and general alignment framework to align pretrained target-aware diffusion models with preferred functional properties. AliDiff shifts the target-conditioned chemical distribution towards regions with higher binding affinity and structural rationality, specified by user-defined reward functions, via a preference optimization approach. To avoid the overfitting problem in common preference optimization objectives, we further develop an improved Exact Energy Preference Optimization method that yields an exact and efficient alignment of the diffusion models, and provide the closed-form expression for the converged distribution. Empirical studies on the CrossDocked2020 benchmark show that AliDiff can generate molecules with state-of-the-art binding energies, achieving an average Vina score of up to -7.07, while maintaining strong molecular properties. Code is available at https://github.com/MinkaiXu/AliDiff.
Updated: 2024-10-27 04:54:06
Subjects: q-bio.BM, cs.LG, q-bio.QM
Maintaining Informative Coherence: Mitigating Hallucinations in Large Language Models via Absorbing Markov Chains
Large Language Models (LLMs) are powerful tools for text generation, translation, and summarization, but they often suffer from hallucinations: instances where they fail to maintain the fidelity and coherence of contextual information during decoding, sometimes overlooking critical details due to their sampling strategies and inherent biases from training data and fine-tuning discrepancies. These hallucinations can propagate through the web, affecting the trustworthiness of information disseminated online. To address this issue, we propose a novel decoding strategy that leverages absorbing Markov chains to quantify the significance of contextual information and measure the extent of information loss during generation. By considering all possible paths from the first to the last token, our approach enhances the reliability of model outputs without requiring additional training or external data. Evaluations on datasets including TruthfulQA, FACTOR, and HaluEval highlight the superior performance of our method in mitigating hallucinations, underscoring the necessity of ensuring accurate information flow in web-based applications.
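The absorbing-chain machinery behind such a strategy can be sketched on a toy chain: treating decoding states as transient states with sub-stochastic transition matrix Q, the fundamental matrix N = (I - Q)^{-1} gives the expected number of visits to each state before absorption, from which path-based significance measures can be derived. A stdlib-only sketch (the chain and its numbers are illustrative, not from the paper):

```python
def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Toy decoding chain: three transient states; Q holds the transition
# probabilities among them (absorbing states are left implicit).
Q = [[0.0, 0.5, 0.2],
     [0.0, 0.0, 0.6],
     [0.1, 0.0, 0.0]]
I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(3)]
             for i in range(3)]
N = mat_inverse(I_minus_Q)  # fundamental matrix: expected visits before absorption
expected_steps = [sum(row) for row in N]  # expected steps to absorption per start state
print([round(s, 3) for s in expected_steps])
```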
Updated: 2024-10-27 04:51:18
Subjects: cs.CL, cs.AI
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL). Yet, they pose challenges due to their substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained, such as robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with transformers while requiring substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work conducts comprehensive experiments to explore the potential of Decision Mamba (dubbed DeMa) in offline RL from the perspective of data structures and essential components, with the following insights: (1) Long sequences impose a significant computational burden without contributing to performance improvements, since DeMa's focus on earlier parts of a sequence diminishes approximately exponentially. Consequently, we introduce a Transformer-like DeMa as opposed to an RNN-like DeMa. (2) Among DeMa's components, we identify the hidden attention mechanism as a critical factor in its success; it also works well with other residual structures and does not require position embedding. Extensive evaluations demonstrate that our specially designed DeMa is compatible with trajectory optimization and surpasses previous methods, outperforming Decision Transformer (DT) in Atari while using 30% fewer parameters, and exceeding DT in MuJoCo with only a quarter of the parameters.
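The "hidden attention" view of a state-space model, and the approximately exponential decay of its focus, can be seen by unrolling a scalar linear recurrence into equivalent attention-like weights (a toy scalar SSM, not DeMa's actual parameterization):

```python
def ssm_unrolled_weights(a, b, seq_len):
    """Unroll the scalar recurrence h_t = a*h_{t-1} + b*x_t into the
    equivalent 'hidden attention' form y_t = sum_{s<=t} w(t, s) * x_s
    with w(t, s) = a**(t-s) * b: weights that decay geometrically
    (hence approximately exponentially) with token distance."""
    return [[a ** (t - s) * b if s <= t else 0.0 for s in range(seq_len)]
            for t in range(seq_len)]

W = ssm_unrolled_weights(a=0.5, b=1.0, seq_len=6)
# Weight on the most recent token vs. one five steps back:
print(W[5][5], W[5][0])  # → 1.0 0.03125
```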
Updated: 2024-10-27 04:46:58
Subjects: cs.LG
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
Recently, strong latent Diffusion Probabilistic Models (DPMs) have been applied to high-quality Text-to-Image (T2I) generation (e.g., Stable Diffusion) by injecting the encoded target text prompt into a gradually denoising diffusion image generator. Despite the practical success of DPMs, the mechanism behind them remains to be explored. To fill this blank, we begin by examining the intermediate states during the gradual denoising generation process in DPMs. Empirical observations indicate that the shape of the image is reconstructed within the first few denoising steps, after which the image is filled in with details (e.g., texture). This phenomenon arises because the low-frequency (shape-relevant) signal of the noisy image is not corrupted until the final stage of the forward noising process, which corresponds to the initial stage of generation. Inspired by these observations, we proceed to explore the influence of each token in the text prompt during the two stages. After a series of T2I generation experiments conditioned on a set of text prompts, we conclude that in the earlier generation stage, the image is mostly determined by the special token [EOS] in the text prompt, and the information in the text prompt is already conveyed in this stage. After that, the diffusion model completes the details of the generated images using information from the images themselves. Finally, we apply this observation to accelerate T2I generation by properly removing text guidance, which speeds up sampling by more than 25%.
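The frequency argument — low-frequency, shape-relevant content survives additive noise far longer than high-frequency texture — can be illustrated with a toy 1-D signal (our own demonstration, not the paper's experiment): after adding Gaussian noise, a box-filtered (low-frequency) estimate stays much closer to the clean shape than the high-frequency residual stays to the clean texture.

```python
import math, random

random.seed(0)
n = 256
shape  = [math.sin(2 * math.pi * t / n) for t in range(n)]             # low-frequency "shape"
detail = [0.3 * math.sin(2 * math.pi * 24 * t / n) for t in range(n)]  # high-frequency "texture"
clean  = [s + d for s, d in zip(shape, detail)]
noisy  = [c + random.gauss(0, 0.5) for c in clean]

def box_filter(x, w=17):
    """Crude low-pass filter: centered moving average."""
    half = w // 2
    return [sum(x[max(0, i - half): i + half + 1]) /
            len(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

low_est  = box_filter(noisy)                        # low-frequency estimate
high_est = [a - b for a, b in zip(noisy, low_est)]  # high-frequency residual

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

err_low  = mse(low_est, box_filter(clean))
err_high = mse(high_est, [c - s for c, s in zip(clean, box_filter(clean))])
print(err_low < err_high)  # the shape component survives the noise better
```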
Updated: 2024-10-27 04:44:41
Subjects: cs.CV, cs.LG
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, extending text-dominant LLMs to speech generation tasks remains under-explored. In this work, we introduce a text-to-speech (TTS) system powered by a fine-tuned Llama model, named TTS-Llama, that achieves state-of-the-art speech synthesis performance. Building on TTS-Llama, we further propose MoLE-Llama, a text-and-speech multimodal LLM developed through purely late-fusion parameter-efficient fine-tuning (PEFT) and a mixture-of-experts architecture. Extensive empirical results demonstrate MoLE-Llama's competitive performance on both text-only question-answering (QA) and TTS tasks, mitigating the catastrophic forgetting issue in either modality. Finally, we explore MoLE-Llama on text-in-speech-out QA tasks, demonstrating its great potential as a multimodal dialog system capable of speech generation.
Updated: 2024-10-27 04:28:57
Subjects: cs.CL, cs.AI, cs.SD, eess.AS
Intuitionistic Fuzzy Universum Twin Support Vector Machine for Imbalanced Data
One of the major difficulties in machine learning is categorizing imbalanced datasets. This problem may lead to biased models, where the training process is dominated by the majority class, resulting in inadequate representation of the minority class. The universum twin support vector machine (UTSVM) produces a model biased towards the majority class; as a result, its performance on the minority class is often poor, as minority samples might be mistakenly classified as noise. Moreover, UTSVM is not proficient in handling datasets that contain outliers and noise. Inspired by the idea of incorporating prior information about the data and employing an intuitionistic fuzzy membership scheme, we propose the intuitionistic fuzzy universum twin support vector machine for imbalanced data (IFUTSVM-ID). We use an intuitionistic fuzzy membership scheme to mitigate the impact of noise and outliers. Moreover, to tackle the problem of imbalanced class distribution, data oversampling and undersampling methods are utilized. Prior knowledge about the data is provided by universum data, which leads to better generalization performance. UTSVM is susceptible to overfitting due to the omission of the structural risk minimization (SRM) principle in its primal formulation. In contrast, the proposed IFUTSVM-ID model incorporates the SRM principle through regularization terms, effectively addressing the issue of overfitting. We conduct a comprehensive evaluation of the proposed IFUTSVM-ID model on benchmark datasets from KEEL and compare it with existing baseline models. Furthermore, to assess the effectiveness of the proposed IFUTSVM-ID model in diagnosing Alzheimer's disease (AD), we applied it to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results showcase the superiority of the proposed IFUTSVM-ID model compared to the baseline models.
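One common intuitionistic fuzzy membership scheme of the kind described — a membership degree from distance to the class center, a non-membership degree from heterogeneous neighbors, and a combined score that suppresses likely noise — can be sketched as follows (an illustrative scheme; the exact IFUTSVM-ID formulation may differ):

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzy_scores(points, other_class, k=3, delta=1e-4):
    """Intuitionistic fuzzy weights for one class:
      membership mu: closeness to the class center,
      non-membership nu: share of the k nearest neighbors from the other class."""
    center = [sum(c) / len(points) for c in zip(*points)]
    radius = max(euclid(p, center) for p in points)
    scores = []
    for p in points:
        mu = 1.0 - euclid(p, center) / (radius + delta)
        neighbors = sorted(points + other_class, key=lambda q: euclid(p, q))[1:k + 1]
        nu = (1.0 - mu) * sum(q in other_class for q in neighbors) / k
        if nu == 0.0:
            s = mu
        elif mu <= nu:
            s = 0.0  # likely noise/outlier: suppress its influence
        else:
            s = (1.0 - nu) / (2.0 - mu - nu)
        scores.append(round(s, 3))
    return scores

pos = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (3.0, 3.0)]  # last point is an outlier
neg = [(2.8, 3.1), (3.2, 2.9), (2.9, 2.8)]
print(fuzzy_scores(pos, neg))  # the outlier receives weight 0
```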
Updated: 2024-10-27 04:25:42
Subjects: cs.LG
Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference
Sound and complete algorithms have been proposed to compute identifiable causal queries using the causal structure and data. However, most of these algorithms assume accurate estimation of the data distribution, which is impractical for high-dimensional variables such as images. On the other hand, modern deep generative architectures can be trained to sample from high-dimensional distributions. However, training these networks is typically very costly. Thus, it is desirable to leverage pre-trained models to answer causal queries using such high-dimensional data. To address this, we propose modular training of deep causal generative models, which not only makes learning more efficient but also allows us to utilize large, pre-trained conditional generative models. To the best of our knowledge, our algorithm, Modular-DCM, is the first that, given the causal structure, uses adversarial training to learn the network weights and can make use of pre-trained models to provably sample from any identifiable causal query in the presence of latent confounders. With extensive experiments on the Colored-MNIST dataset, we demonstrate that our algorithm outperforms the baselines. We also show our algorithm's convergence on the COVIDx dataset and its utility in a causal invariant prediction problem on CelebA-HQ.
Updated: 2024-10-27 04:18:04
Subjects: cs.LG, cs.AI, cs.IT, math.IT, stat.ME, stat.ML
Embedded Nonlocal Operator Regression (ENOR): Quantifying model error in learning nonlocal operators
Nonlocal, integral operators have become an efficient surrogate for bottom-up homogenization, due to their ability to represent long-range dependence and multiscale effects. However, the nonlocal homogenized model has an unavoidable discrepancy from the microscale model. Such errors accumulate and propagate in long-term simulations, making the resultant prediction unreliable. To develop a robust and reliable bottom-up homogenization framework, we propose a new framework, which we coin Embedded Nonlocal Operator Regression (ENOR), to learn a nonlocal homogenized surrogate model and its structural model error. This framework provides discrepancy-adaptive uncertainty quantification for homogenized material response predictions in long-term simulations. The method is built on Nonlocal Operator Regression (NOR), an optimization-based nonlocal kernel learning approach, together with an embedded model error term in the trainable kernel. Bayesian inference is then employed to infer the model error term parameters together with the kernel parameters. To make the problem computationally feasible, we use a multilevel delayed acceptance Markov chain Monte Carlo (MLDA-MCMC) method, enabling efficient Bayesian model calibration and model error estimation. We apply this technique to predict long-term wave propagation in a heterogeneous one-dimensional bar, and compare its performance with additive noise models. Owing to its ability to capture model error, the learned ENOR achieves improved estimation of posterior predictive uncertainty.
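The kind of nonlocal surrogate operator that NOR/ENOR learns can be illustrated with a 1-D discrete operator whose kernel is normalized so that it reproduces the classical second derivative on smooth fields (a sketch under an assumed power-law kernel shape; the learned kernel would come from optimization instead):

```python
def nonlocal_laplacian(u, h, horizon_bonds=3):
    """Apply a discrete nonlocal operator
        L u(x_i) = sum_j K(|r_j|) * (u(x_i + r_j) - u(x_i)) * h
    with a power-law kernel normalized so that L reproduces u'' exactly
    on quadratics (a sketch of the operator family NOR/ENOR learns)."""
    bonds = [j * h for j in range(-horizon_bonds, horizon_bonds + 1) if j != 0]
    base = {r: 1.0 / abs(r) for r in bonds}          # assumed kernel shape
    scale = 2.0 / (h * sum(base[r] * r * r for r in bonds))
    K = {r: scale * base[r] for r in bonds}
    n = len(u)
    out = []
    for i in range(horizon_bonds, n - horizon_bonds):
        out.append(h * sum(K[j * h] * (u[i + j] - u[i])
                           for j in range(-horizon_bonds, horizon_bonds + 1)
                           if j != 0))
    return out

h = 0.01
xs = [i * h for i in range(200)]
u = [x * x for x in xs]                    # u'' = 2 everywhere
approx = nonlocal_laplacian(u, h)
print(max(abs(a - 2.0) for a in approx))   # near machine precision on quadratics
```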
Updated: 2024-10-27 04:17:27
Subjects: cs.LG, cond-mat.mtrl-sci
Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas
Scientific innovation is pivotal for humanity, and harnessing large language models (LLMs) to generate research ideas could transform discovery. However, existing LLMs often produce simplistic and repetitive suggestions due to their limited ability to acquire external knowledge for innovation. To address this problem, we introduce an enhanced planning and search methodology designed to boost the creative potential of LLM-based systems. Our approach involves an iterative process that purposely plans the retrieval of external knowledge, progressively enriching idea generation with broader and deeper insights. Validation through automated and human assessments indicates that our framework substantially elevates the quality of generated ideas, particularly in novelty and diversity. The number of unique novel ideas produced by our framework is 3.4 times higher than without it. Moreover, our method outperforms the current state of the art, generating at least 2.5 times more top-rated ideas in a Swiss-tournament evaluation based on 170 seed papers.
Updated: 2024-10-27 04:02:32
Subjects: cs.AI, cs.CL
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Artificial intelligence has made significant strides in medical visual question answering (Med-VQA), yet prevalent studies often interpret images holistically, overlooking the visual regions of interest that may contain crucial information, potentially aligning with a doctor's prior knowledge that can be incorporated with minimal annotations (e.g., bounding boxes). To address this gap, this paper introduces R-LLaVA, designed to enhance biomedical VQA understanding by integrating simple medical annotations as prior knowledge directly into the image space through CLIP. These annotated visual regions of interest are then fed into the LLaVA model during training, aiming to enrich the model's understanding of biomedical queries. Experimental evaluation on four standard Med-VQA datasets demonstrates R-LLaVA's superiority over existing state-of-the-art (SoTA) methods. Additionally, to verify the model's capability in visual comprehension, a novel multiple-choice medical visual understanding dataset is introduced, confirming the positive impact of focusing on visual regions of interest in advancing biomedical VQA understanding.
Updated: 2024-10-27 03:56:56
Subjects: cs.CV, cs.AI
Domain Specific Data Distillation and Multi-modal Embedding Generation
The challenge of creating domain-centric embeddings arises from the abundance of unstructured data and the scarcity of domain-specific structured data. Conventional embedding techniques often rely on a single modality, limiting their applicability and efficacy. This paper introduces a novel modeling approach that leverages structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction. The proposed model operates within a Hybrid Collaborative Filtering (HCF) framework, where generic entity representations are fine-tuned through relevant item prediction tasks. Our experiments, focusing on the cloud computing domain, demonstrate that HCF-based embeddings outperform AutoEncoder-based embeddings (which use purely unstructured data), achieving a 28% lift in precision and an 11% lift in recall for domain-specific attribute prediction.
Updated: 2024-10-27 03:47:46
Subjects: cs.LG, cs.SI, I.2.4; H.3.3; I.5.3; I.1.2; H.2.5
Self-Supervised Learning and Opportunistic Inference for Continuous Monitoring of Freezing of Gait in Parkinson's Disease
Parkinson's disease (PD) is a progressive neurological disorder that significantly impacts quality of life, making in-home monitoring of motor symptoms such as Freezing of Gait (FoG) critical. However, existing symptom monitoring technologies are power-hungry, rely on extensive amounts of labeled data, and operate only in controlled settings. These shortcomings limit real-world deployment of the technology. This work presents LIFT-PD, a computationally efficient self-supervised learning framework for real-time FoG detection. Our method combines self-supervised pre-training on unlabeled data with a novel differential hopping windowing technique to learn from limited labeled instances. An opportunistic model activation module further minimizes power consumption by selectively activating the deep learning module only during active periods. Extensive experimental results show that LIFT-PD achieves a 7.25% increase in precision and a 4.4% improvement in accuracy compared to supervised models while using as little as 40% of the labeled training data required for supervised learning. Additionally, the model activation module reduces inference time by up to 67% compared to continuous inference. LIFT-PD paves the way for practical, energy-efficient, and unobtrusive in-home monitoring of PD patients with minimal labeling requirements.
Updated: 2024-10-27 03:47:18
Subjects: cs.LG, cs.AI
NeuroPath: A Neural Pathway Transformer for Joining the Dots of Human Connectomes
Although modern imaging technologies allow us to study connectivity between two distinct brain regions in vivo, an in-depth understanding of how anatomical structure supports brain function, and how spontaneous functional fluctuations give rise to remarkable cognition, is still elusive. Meanwhile, tremendous efforts have been made in the realm of machine learning to establish the nonlinear mapping between neuroimaging data and phenotypic traits. However, the absence of neuroscience insight in current approaches poses significant challenges for understanding cognitive behavior from transient neural activities. To address this challenge, we put the spotlight on the coupling mechanism of structural connectivity (SC) and functional connectivity (FC) by formulating this network neuroscience question as an expressive graph representation learning problem over high-order topology. Specifically, we introduce the concept of a topological detour to characterize how a ubiquitous instance of FC (a direct link) is supported by neural pathways (detours) physically wired by SC, forming a cyclic loop in which brain structure and function interact. In machine learning terms, the multi-hop detour pathway underlying SC-FC coupling allows us to devise a novel multi-head self-attention mechanism within a Transformer to capture multi-modal feature representations from paired graphs of SC and FC. Taken together, we propose a biologically inspired deep model, coined NeuroPath, to find putative connectomic feature representations from an unprecedented amount of neuroimages, which can be plugged into various downstream applications such as task recognition and disease diagnosis. We have evaluated NeuroPath on large-scale public datasets, including HCP and UK Biobank, under supervised and zero-shot learning, where the state-of-the-art performance of NeuroPath indicates its great potential for network neuroscience.
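An illustrative reading of the topological detour concept: given an SC adjacency matrix, the detours supporting a direct FC link (i, j) are the multi-hop simple SC paths between i and j. A minimal counter (our own formalization for illustration, not necessarily the paper's exact definition):

```python
def count_detours(sc_adj, src, dst, max_hops=3):
    """Count simple SC paths of 2..max_hops hops between src and dst:
    the 'topological detours' that can support a direct FC link."""
    n = len(sc_adj)
    count = 0

    def dfs(node, visited, hops):
        nonlocal count
        if hops >= max_hops:
            return
        for nxt in range(n):
            if sc_adj[node][nxt] and nxt not in visited:
                if nxt == dst:
                    if hops + 1 >= 2:  # a detour needs at least 2 hops
                        count += 1
                else:
                    dfs(nxt, visited | {nxt}, hops + 1)

    dfs(src, {src}, 0)
    return count

# Toy structural connectome: 0-1-3 and 0-2-3 are detours for the FC link (0, 3).
sc = [[0, 1, 1, 0],
      [1, 0, 0, 1],
      [1, 0, 0, 1],
      [0, 1, 1, 0]]
print(count_detours(sc, 0, 3))  # → 2
```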
Updated: 2024-10-27 03:25:05
Subjects: q-bio.NC, cs.AI, cs.CV, cs.LG
A New Non-Binary Response Generation Scheme from Physical Unclonable Functions
Physical Unclonable Functions (PUFs) are widely used in key generation, with each PUF cell typically producing one bit of data. To enable the extraction of longer keys, a new non-binary response generation scheme based on the one-probability of PUF bits is proposed. Instead of using PUF bits directly as keys, non-binary responses are first derived by comparing the one-frequency of PUF bits against thresholds that evenly divide the area under the probability density function of the one-probability distribution; these responses are then converted to binary keys. To simplify the calculation of these thresholds, a re-scaling process is proposed, and the beta distribution is used to model the one-probability distribution. Our FPGA implementation results demonstrate a significant increase in effective key length compared to previous works. Finally, we estimate the error rates and biases of the generated keys, and confirm the feasibility of the proposed key generation scheme.
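A sketch of the proposed response generation: model the one-probability with a beta distribution, find thresholds that split the area under its pdf into equal parts, and quantize each cell's measured one-frequency into a non-binary symbol. The beta parameters and symbol count below are illustrative, and the CDF inversion is done numerically rather than via the paper's re-scaling process:

```python
import math

def beta_pdf(x, a, b):
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def equal_area_thresholds(a, b, n_symbols, steps=20000):
    """Thresholds t_1..t_{n-1} such that each of the n intervals under
    the Beta(a, b) pdf holds probability mass 1/n (numeric CDF inversion)."""
    dx = 1.0 / steps
    cdf, acc, thresholds = 0.0, 1, []
    for i in range(1, steps):
        cdf += beta_pdf((i - 0.5) * dx, a, b) * dx  # midpoint rule
        if cdf >= acc / n_symbols and len(thresholds) < n_symbols - 1:
            thresholds.append(i * dx)
            acc += 1
    return thresholds

def quantize(one_frequency, thresholds):
    """Map a cell's measured one-frequency to a non-binary symbol."""
    return sum(one_frequency > t for t in thresholds)

ts = equal_area_thresholds(2.0, 2.0, 4)  # 4-ary responses, Beta(2, 2) model
print([round(t, 3) for t in ts], quantize(0.9, ts))
```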
Updated: 2024-10-27 03:24:17
Subjects: cs.CR, eess.SP
Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs. To enhance the generalization of KGQE models, recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries. The whole process is commonly referred to as Query Pattern Learning (QPL). However, current QPL methods typically suffer from a pattern-entity alignment bias, leading to defective learned query patterns that limit KGQE models' performance. To address this problem, we propose an effective Query Instruction Parsing Plugin (QIPP) that leverages the context awareness of Pre-trained Language Models (PLMs) to capture latent query patterns from code-like query instructions. Unlike the external information introduced by previous QPL methods, we first propose code-like instructions that express FOL queries in an alternative format. This format uses textual variables and nested tuples to convey the logical semantics of FOL queries, serving as raw material for a PLM-based instruction encoder to obtain complete query patterns. Building on this, we design a query-guided instruction decoder to adapt query patterns to KGQE models. To further enhance QIPP's effectiveness across various KGQE models, we propose a query pattern injection mechanism based on compressed optimization boundaries, along with an adaptive normalization component, allowing KGQE models to utilize query patterns more efficiently. Extensive experiments demonstrate that our plug-and-play method improves the performance of eight basic KGQE models and outperforms two state-of-the-art QPL methods.
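An example of what such a code-like instruction might look like — textual variables and nested tuples encoding a FOL query — together with a toy executor over a small KG. The concrete syntax and relation names here are our assumptions; the paper defines its own instruction format:

```python
# A code-like instruction for the FOL query
#   q = V? . exists V : contains(Europe, V) AND capital_of(V, V?)
# expressed with nested tuples.
query = ("projection", "capital_of",
         ("projection", "contains", ("anchor", "Europe")))

kg = {  # toy KG: relation -> {head: set of tails}
    "contains": {"Europe": {"France", "Spain"}},
    "capital_of": {"France": {"Paris"}, "Spain": {"Madrid"}},
}

def execute(node):
    """Ground-truth executor for the nested-tuple instructions above."""
    op = node[0]
    if op == "anchor":
        return {node[1]}
    if op == "projection":
        _, rel, sub = node
        return set().union(*(kg[rel].get(e, set()) for e in execute(sub)))
    if op == "intersection":
        return set.intersection(*(execute(child) for child in node[1:]))
    raise ValueError(op)

print(sorted(execute(query)))  # → ['Madrid', 'Paris']
```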
Updated: 2024-10-27 03:18:52
Subjects: cs.AI
Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin
Few-shot relation extraction with none-of-the-above (FsRE with NOTA) aims at predicting labels in few-shot scenarios with unknown classes. FsRE with NOTA is more challenging than the conventional few-shot relation extraction task, since the boundaries of unknown classes are complex and difficult to learn. Meta-learning based methods, especially prototype-based methods, are the mainstream solutions to this task. They obtain the classification boundary by learning the sample distribution of each class. However, their performance is limited because few-shot overfitting and NOTA boundary confusion lead to misclassification between known and unknown classes. To this end, we propose a novel framework based on Gaussian prototype and adaptive margin named GPAM for FsRE with NOTA, which includes three modules, semi-factual representation, GMM-prototype metric learning and decision boundary learning. The first two modules obtain better representations to solve the few-shot problem through debiased information enhancement and Gaussian space distance measurement. The third module learns more accurate classification boundaries and prototypes through adaptive margin and negative sampling. In the training procedure of GPAM, we use contrastive learning loss to comprehensively consider the effects of range and margin on the classification of known and unknown classes to ensure the model's stability and robustness. Sufficient experiments and ablations on the FewRel dataset show that GPAM surpasses previous prototype methods and achieves state-of-the-art performance.
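The NOTA decision rule described here can be sketched with per-class Gaussian prototypes and a rejection margin: a query is assigned to the nearest prototype unless even that prototype is too far away, in which case it is labeled none-of-the-above. This is a minimal sketch, not GPAM itself; the diagonal-variance Gaussians and the fixed margin value are placeholders (in the paper the margin is adaptive and learned).

```python
import numpy as np

# Minimal sketch of prototype classification with a NOTA rejection
# margin. Gaussian prototypes are modeled as a per-class mean and
# diagonal variance; `margin` is a placeholder constant, whereas GPAM
# learns an adaptive margin.

def predict(x, means, variances, margin):
    # Squared Mahalanobis-style distance to each Gaussian prototype
    d = np.sum((x - means) ** 2 / variances, axis=1)
    nearest = int(np.argmin(d))
    return nearest if d[nearest] <= margin else -1  # -1 == NOTA

means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])

print(predict(np.array([0.2, -0.1]), means, variances, margin=4.0))   # 0
print(predict(np.array([10.0, -8.0]), means, variances, margin=4.0))  # -1 (NOTA)
```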
Updated: 2024-10-27 03:16:09
Categories: cs.CV, cs.AI
Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof
Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed yields a significant improvement in policy imitation, it introduces drawbacks for transferable reward recovery. We prove that the SAC algorithm itself cannot comprehensively disentangle the reward function during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.
Updated: 2024-10-27 03:13:23
Categories: cs.LG, stat.ML
Low-rank Bayesian matrix completion via geodesic Hamiltonian Monte Carlo on Stiefel manifolds
We present a new sampling-based approach for enabling efficient computation of low-rank Bayesian matrix completion and quantifying the associated uncertainty. Firstly, we design a new prior model based on the singular-value-decomposition (SVD) parametrization of low-rank matrices. Our prior is analogous to the seminal nuclear-norm regularization used in the non-Bayesian setting and enforces orthogonality in the factor matrices by constraining them to Stiefel manifolds. Then, we design a geodesic Hamiltonian Monte Carlo (within Gibbs) algorithm for generating posterior samples of the SVD factor matrices. We demonstrate that our approach resolves the sampling difficulties encountered by standard Gibbs samplers for the common two-matrix factorization used in matrix completion. More importantly, the geodesic Hamiltonian sampler allows for sampling in cases with more general likelihoods than the typical Gaussian likelihood and Gaussian prior assumptions adopted in most of the existing Bayesian matrix completion literature. We demonstrate applications of our approach by fitting the categorical data of a mice protein dataset and the MovieLens recommendation problem. Numerical examples demonstrate superior sampling performance, including better mixing and faster convergence to a stationary distribution. Moreover, they demonstrate improved accuracy on the two real-world benchmark problems we considered.
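The SVD parametrization underlying the prior can be sketched directly: a rank-r matrix is written as X = U diag(s) V^T with U and V constrained to have orthonormal columns (points on Stiefel manifolds). The sketch below just draws a random Stiefel point via a QR decomposition; the geodesic HMC sampler itself is not reproduced, and the dimensions are arbitrary.

```python
import numpy as np

# Sketch of the SVD parametrization: X = U diag(s) V^T where U and V
# lie on Stiefel manifolds (orthonormal columns). A random Stiefel
# point is obtained from the Q factor of a Gaussian matrix's QR
# decomposition; the paper's geodesic HMC sampling is not shown.

rng = np.random.default_rng(0)

def random_stiefel(n, r):
    q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return q  # n x r matrix with orthonormal columns

n, m, r = 8, 6, 2
U, V = random_stiefel(n, r), random_stiefel(m, r)
s = np.array([3.0, 1.5])          # singular values
X = U @ np.diag(s) @ V.T          # the resulting rank-r matrix

print(np.linalg.matrix_rank(X))                # 2
print(np.allclose(U.T @ U, np.eye(r)))         # True: U is on the Stiefel manifold
```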
Updated: 2024-10-27 03:12:53
Categories: stat.ML, cs.LG, cs.NA, math.NA, stat.CO, stat.ME (MSC: 65F55, 62F15, 15A83)
ProtSCAPE: Mapping the landscape of protein conformations in molecular dynamics
Understanding the dynamic nature of protein structures is essential for comprehending their biological functions. While significant progress has been made in predicting static folded structures, modeling protein motions on microsecond to millisecond scales remains challenging. To address these challenges, we introduce a novel deep learning architecture, Protein Transformer with Scattering, Attention, and Positional Embedding (ProtSCAPE), which leverages the geometric scattering transform alongside transformer-based attention mechanisms to capture protein dynamics from molecular dynamics (MD) simulations. ProtSCAPE utilizes the multi-scale nature of the geometric scattering transform to extract features from protein structures conceptualized as graphs and integrates these features with dual attention structures that focus on residues and amino acid signals, generating latent representations of protein trajectories. Furthermore, ProtSCAPE incorporates a regression head to enforce temporally coherent latent representations.
Updated: 2024-10-27 02:59:48
Categories: cs.LG, physics.chem-ph, q-bio.BM, q-bio.QM
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
Leveraging the powerful generative capability of diffusion models (DMs) to build decision-making agents has achieved extensive success. However, there is still a demand for an easy-to-use and modularized open-source library that offers customized and efficient development for DM-based decision-making algorithms. In this work, we introduce CleanDiffuser, the first DM library specifically designed for decision-making algorithms. By revisiting the roles of DMs in the decision-making domain, we identify a set of essential sub-modules that constitute the core of CleanDiffuser, allowing for the implementation of various DM algorithms with simple and flexible building blocks. To demonstrate the reliability and flexibility of CleanDiffuser, we conduct comprehensive evaluations of various DM algorithms implemented with CleanDiffuser across an extensive range of tasks. The analytical experiments provide a wealth of valuable design choices and insights, reveal opportunities and challenges, and lay a solid groundwork for future research. CleanDiffuser will provide long-term support to the decision-making community, enhancing reproducibility and fostering the development of more robust solutions. The code and documentation of CleanDiffuser are open-sourced at https://github.com/CleanDiffuserTeam/CleanDiffuser.
Updated: 2024-10-27 02:56:03
Categories: cs.AI, cs.LG, cs.RO
On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy
A significant approach in natural language processing involves large-scale pre-training of models on general domain data followed by their adaptation to specific tasks or domains. As models grow in size, fully fine-tuning all of their parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g., LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to fully fine-tuning all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some random noise into the batch gradients w.r.t. the adapter parameters, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the distribution of the injected noise and a Gaussian distribution with the same variance, we show that the dynamics of low-rank adaptation is close to that of differentially private fine-tuning of the adapters. Finally, using the Johnson-Lindenstrauss lemma, we show that when augmented with gradient scaling, low-rank adaptation is very close to performing the DPSGD algorithm with a fixed noise scale to fine-tune the adapters. These theoretical findings suggest that unlike other existing fine-tuning algorithms, low-rank adaptation provides privacy w.r.t. the fine-tuning data implicitly.
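The setup being analyzed can be made concrete with the basic LoRA forward pass: the adapter contributes B A x on top of the frozen path W x, with B initialized to zero and A to random Gaussian values, so any update to the effective weights must pass through the random low-rank factor A. This is a minimal numpy sketch of that parametrization (dimensions arbitrary); the paper's variance quantification and DP argument are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

d_out, d_in, r = 16, 32, 4                          # rank r << d
W = rng.standard_normal((d_out, d_in))              # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # random Gaussian init
B = np.zeros((d_out, r))                            # zero init

x = rng.standard_normal(d_in)
h = W @ x + B @ (A @ x)   # LoRA forward: frozen path + low-rank adapter path

# At initialization the adapter is a no-op...
print(np.allclose(h, W @ x))  # True

# ...and after training, the effective weight update B A is rank-r, so
# all learning signal is filtered through the random matrix A -- the
# source of the implicit noise the paper analyzes.
B_trained = rng.standard_normal((d_out, r)) * 0.01
delta_W = B_trained @ A
print(np.linalg.matrix_rank(delta_W) <= r)  # True
```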
Updated: 2024-10-27 02:54:59
Categories: cs.LG, cs.AI, cs.CL
Deep Learning Based Dense Retrieval: A Comparative Study
Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but their robustness against tokenizer poisoning remains underexplored. In this work, we assess the vulnerability of dense retrieval systems to poisoned tokenizers by evaluating models such as BERT, Dense Passage Retrieval (DPR), Contriever, SimCSE, and ANCE. We find that supervised models like BERT and DPR experience significant performance degradation when tokenizers are compromised, while unsupervised models like ANCE show greater resilience. Our experiments reveal that even small perturbations can severely impact retrieval accuracy, highlighting the need for robust defenses in critical applications.
Updated: 2024-10-27 02:52:36
Categories: cs.CL, cs.AI
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
"Distribution shift" is the main obstacle to the success of offline reinforcement learning. A learning policy may take actions beyond the behavior policy's knowledge, referred to as Out-of-Distribution (OOD) actions. The Q-values for these OOD actions can be easily overestimated. As a result, the learning policy is biased by using incorrect Q-value estimates. One common approach to avoid Q-value overestimation is to make a pessimistic adjustment. Our key idea is to penalize the Q-values of OOD actions associated with high uncertainty. In this work, we propose Q-Distribution Guided Q-Learning (QDQ), which applies a pessimistic adjustment to Q-values in OOD regions based on uncertainty estimation. This uncertainty measure relies on the conditional Q-value distribution, learned through a high-fidelity and efficient consistency model. Additionally, to prevent overly conservative estimates, we introduce an uncertainty-aware optimization objective for updating the Q-value function. The proposed QDQ demonstrates solid theoretical guarantees for the accuracy of Q-value distribution learning and uncertainty measurement, as well as the performance of the learning policy. QDQ consistently shows strong performance on the D4RL benchmark and achieves significant improvements across many tasks.
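The core pessimistic adjustment can be sketched in a few lines: subtract an uncertainty estimate from each action's Q-value before acting. In this toy version the uncertainty is the disagreement across an ensemble of Q heads and the penalty weight is a placeholder; QDQ instead derives uncertainty from a Q-value distribution learned with a consistency model.

```python
import numpy as np

# Toy sketch of uncertainty-penalized Q-values (not the QDQ algorithm):
# actions whose Q estimates disagree strongly (a proxy for being OOD)
# are down-weighted before selecting the greedy action.

def pessimistic_q(q_samples, beta):
    # q_samples: (n_heads, n_actions) Q-value estimates per action
    mean = q_samples.mean(axis=0)
    uncertainty = q_samples.std(axis=0)
    return mean - beta * uncertainty

q_samples = np.array([[1.0, 5.0],
                      [1.1, 0.0],
                      [0.9, 9.0]])   # action 1: high mean but high disagreement

print(int(np.argmax(q_samples.mean(axis=0))))            # 1: naive mean overestimates the OOD action
print(int(np.argmax(pessimistic_q(q_samples, beta=2.0))))  # 0: the penalized choice
```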
Updated: 2024-10-27 02:39:25
Categories: cs.LG, stat.ML
A Distribution Semantics for Probabilistic Term Rewriting
Probabilistic programming is becoming increasingly popular thanks to its ability to specify problems with a certain degree of uncertainty. In this work, we focus on term rewriting, a well-known computational formalism. In particular, we consider systems that combine traditional rewriting rules with probabilities. Then, we define a distribution semantics for such systems that can be used to model the probability of reducing a term to some value. We also show how to compute a set of "explanations" for a given reduction, which can be used to compute its probability. Finally, we illustrate our approach with several examples and outline a couple of extensions that may prove useful to improve the expressive power of probabilistic rewrite systems.
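A simplified reading of a distribution semantics for probabilistic rewriting can be executed directly: treat each probabilistic rule as independently "on" with its probability, and define the probability of reducing a term to a value as the total mass of the rule subsets (worlds) in which some reduction sequence reaches that value. The rules and the exact semantics below are an illustrative simplification, not the paper's formal definition.

```python
from itertools import product

# Toy distribution semantics for probabilistic term rewriting: each rule
# (lhs, rhs, p) is independently active with probability p; the
# probability of reducing `start` to `value` is the total probability of
# the worlds in which `value` is reachable by rewriting.

rules = [("a", "b", 0.6), ("a", "c", 0.4), ("b", "d", 1.0)]

def reachable(start, active_rules):
    seen, frontier = {start}, [start]
    while frontier:
        t = frontier.pop()
        for lhs, rhs, _ in active_rules:
            if lhs == t and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return seen

def prob_reduces_to(start, value):
    total = 0.0
    for mask in product([True, False], repeat=len(rules)):
        world_p, active = 1.0, []
        for (lhs, rhs, p), on in zip(rules, mask):
            world_p *= p if on else (1.0 - p)
            if on:
                active.append((lhs, rhs, p))
        if value in reachable(start, active):
            total += world_p
    return total

print(round(prob_reduces_to("a", "d"), 6))  # 0.6: requires a->b (0.6) and b->d (1.0)
```

Each world's active-rule subset plays the role of an "explanation" for the reduction, and summing world probabilities gives the query probability.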
Updated: 2024-10-27 02:37:31
Categories: cs.PL, cs.AI
ANOMIX: A Simple yet Effective Hard Negative Generation via Mixing for Graph Anomaly Detection
Graph contrastive learning (GCL) generally requires a large number of samples. One effective way to reduce the number of samples is to use hard negatives (e.g., via Mixup). However, designing a mixing-based approach for graph anomaly detection (GAD) is difficult due to imbalanced data and the limited number of anomalies. We propose ANOMIX, a framework that consists of a novel graph mixing approach, ANOMIX-M, and multi-level contrasts for GAD. ANOMIX-M can effectively mix abnormality and normality from the input graph to generate hard negatives, which are important for efficient GCL. ANOMIX is (a) novel: the first attempt to use graph mixing to generate hard negatives for the GAD task, paired with node- and subgraph-level contrasts to distinguish underlying anomalies; (b) accurate: achieving the highest AUC, up to 5.49% higher, while being 1.76% faster; and (c) effective: reducing the number of samples in GCL by nearly 80%. Code is available at https://github.com/missinghwan/ANOMIX.
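The mixing idea behind hard-negative generation can be sketched at the feature level: interpolate a normal node's features with an anomalous node's features to obtain a sample near the decision boundary. This is a minimal sketch, not ANOMIX-M; the graph-structure mixing and multi-level contrasts are omitted, and the "anomaly" here is a toy feature shift.

```python
import numpy as np

# Minimal Mixup-style hard-negative sketch: a convex combination of a
# normal and an anomalous node's features lies strictly between the two,
# making it a harder negative than either original sample.

rng = np.random.default_rng(0)

def mix(x_normal, x_anomalous, lam):
    return lam * x_normal + (1.0 - lam) * x_anomalous

x_normal = rng.standard_normal(8)
x_anomalous = x_normal + 5.0             # toy anomaly: shifted features
hard_negative = mix(x_normal, x_anomalous, lam=0.7)

d_norm = np.linalg.norm(hard_negative - x_normal)
d_anom = np.linalg.norm(hard_negative - x_anomalous)
print(d_norm < np.linalg.norm(x_anomalous - x_normal))  # True: closer than the raw anomaly
```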
Updated: 2024-10-27 02:35:12
Categories: cs.LG, cs.AI
Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust
Community vision screening plays a crucial role in identifying individuals with vision loss and preventing avoidable blindness, particularly in rural communities where access to eye care services is limited. Currently, there is a pressing need for a simple and efficient process to screen and refer individuals with significant eye disease-related vision loss to tertiary eye care centers for further care. An ideal solution should seamlessly and readily integrate with existing workflows, providing comprehensive initial screening results to service providers, thereby enabling precise patient referrals for timely treatment. This paper introduces the Enhancing Community Vision Screening (ECVS) solution, which addresses the aforementioned concerns with a novel and feasible approach based on simple, non-invasive retinal photography for the detection of pathology-based visual impairment. Our study employs four distinct deep learning models: RETinal photo Quality Assessment (RETQA), Pathology Visual Impairment detection (PVI), Eye Disease Diagnosis (EDD) and Visualization of Lesion Regions of the eye (VLR). We conducted experiments on over 10 datasets, totaling more than 80,000 fundus photos collected from various sources. The models integrated into ECVS achieved impressive AUC scores of 0.98 for RETQA, 0.95 for PVI, and 0.90 for EDD, along with a DICE coefficient of 0.48 for VLR. These results underscore the promising capabilities of ECVS as a straightforward and scalable method for community-based vision screening.
Updated: 2024-10-27 02:31:19
Categories: eess.IV, cs.AI, cs.CV
Accelerating Direct Preference Optimization with Prefix Sharing
Offline paired preference optimization algorithms have become a popular approach for fine-tuning on preference data, outperforming traditional supervised fine-tuning in various tasks. However, traditional implementations often involve redundant computations, especially for tasks with long shared prompts. We introduce prefix sharing for preference tuning, a novel technique that processes chosen and rejected responses as one sequence with a shared prefix. To prevent cross-response contamination, we use a custom block-sparse attention mask. Our method achieves $1.1$-$1.5\times$ improvement in training throughput on popular DPO datasets, without any effect on convergence. When combined with sequence packing, we observe consistent $1.3$-$1.6\times$ speedups, benefiting even datasets with smaller sequence lengths. While we focus on Direct Preference Optimization (DPO), our approach is applicable to other paired preference tuning methods. By enhancing computational efficiency, our work contributes to making preference-based fine-tuning more accessible for a wider range of applications and model sizes. We open-source our code at https://github.com/frankxwang/dpo-prefix-sharing.
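The custom block-sparse attention mask described here can be sketched directly: pack the shared prompt once, followed by the chosen and then the rejected response; keep causal attention everywhere, but block the rejected response from attending to the chosen one. The lengths below are illustrative, and this is only a sketch of the masking idea, not the authors' implementation.

```python
import numpy as np

# Sketch of a prefix-sharing attention mask for the packed sequence
# [prefix | chosen | rejected]. Start from a causal (lower-triangular)
# mask, then zero out the block that would let the rejected response
# attend to the chosen response (cross-response contamination).

def prefix_sharing_mask(p, c, r):
    n = p + c + r
    mask = np.tril(np.ones((n, n), dtype=bool))  # causal base
    mask[p + c:, p:p + c] = False                # rejected cannot see chosen
    return mask

p, c, r = 4, 3, 3
mask = prefix_sharing_mask(p, c, r)

print(mask[p + c, :p].all())        # True: rejected still sees the shared prefix
print(mask[p + c:, p:p + c].any())  # False: no rejected->chosen attention
```

Because the prefix tokens appear only once, the forward pass over the shared prompt is computed once instead of twice per preference pair.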
Updated: 2024-10-27 02:06:17
Categories: cs.LG, cs.CL
Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods
Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling combinatorial optimization problems. Recently, a growing body of research has used machine learning to accelerate MILP solving. Despite the increasing popularity of this approach, there is a lack of a common repository that provides distributions of similar MILP instances across different domains, at different hardness levels, with standardized test sets. In this paper, we introduce Distributional MIPLIB, a multi-domain library of problem distributions for advancing ML-guided MILP methods. We curate MILP distributions from existing work in this area as well as real-world problems that have not been used, and classify them into different hardness levels. It will facilitate research in this area by enabling comprehensive evaluation on diverse and realistic domains. We empirically illustrate the benefits of using Distributional MIPLIB as a research vehicle in two ways. We evaluate the performance of ML-guided variable branching on previously unused distributions to identify potential areas for improvement. Moreover, we propose to learn branching policies from a mix of distributions, demonstrating that mixed distributions achieve better performance compared to homogeneous distributions when there is limited data and generalize well to larger instances. The dataset is publicly available at https://sites.google.com/usc.edu/distributional-miplib/home.
Updated: 2024-10-27 01:54:10
Categories: cs.LG, math.OC
A Canonicalization Perspective on Invariant and Equivariant Learning
In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods emerged to be a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonicalization perspective that provides an essential and complete view of the design of frames. Canonicalization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods -- some are even optimal -- both theoretically and empirically. The reduction to the canonicalization perspective further uncovers equivalences between previous methods. These observations suggest that canonicalization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods. Code is available at https://github.com/GeorgeMLP/canonicalization.
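Canonicalization can be illustrated with the classic sign ambiguity of eigenvectors: v and -v represent the same eigenspace, and mapping both to a single canonical form (here, flipping the sign so the first nonzero entry is positive) makes any downstream function sign-invariant. This textbook canonical form is far simpler than the frames studied in the paper, but it exhibits the invariance property that canonicalization buys.

```python
import numpy as np

# Toy canonicalization: resolve the sign ambiguity of an eigenvector by
# flipping it so that its first nonzero entry is positive. Any function
# applied after this map is automatically invariant to sign flips.

def canonicalize_sign(v, tol=1e-12):
    for entry in v:
        if abs(entry) > tol:
            return v if entry > 0 else -v
    return v  # zero vector: already canonical

v = np.array([-0.6, 0.8, 0.0])
print(np.allclose(canonicalize_sign(v), canonicalize_sign(-v)))  # True
```

In the frame-averaging view, this corresponds to a frame of size one: instead of averaging over the whole group {+1, -1}, a single canonical representative suffices.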
Updated: 2024-10-27 00:58:54
Categories: cs.LG
Sequential Large Language Model-Based Hyper-Parameter Optimization
This study introduces SLLMBO, an innovative framework that leverages Large Language Models (LLMs) for hyperparameter optimization (HPO), incorporating dynamic search space adaptability, enhanced parameter landscape exploitation, and a hybrid, novel LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler. By addressing limitations in recent fully LLM-based methods and traditional Bayesian Optimization (BO), SLLMBO achieves more robust optimization. This comprehensive benchmarking evaluates multiple LLMs, including GPT-3.5-turbo, GPT-4o, Claude-Sonnet-3.5, and Gemini-1.5-flash, extending prior work beyond GPT-3.5 and GPT-4 and establishing SLLMBO as the first framework to benchmark a diverse set of LLMs for HPO. By integrating LLMs' established strengths in parameter initialization with the exploitation abilities demonstrated in this study, alongside TPE's exploration capabilities, the LLM-TPE sampler achieves a balanced exploration-exploitation trade-off, reduces API costs, and mitigates premature early stoppings for more effective parameter searches. Across 14 tabular tasks in classification and regression, the LLM-TPE sampler outperformed fully LLM-based methods and achieved superior results over BO methods in 9 tasks. Testing early stopping in budget-constrained scenarios further demonstrated competitive performance, indicating that LLM-based methods generally benefit from extended iterations for optimal results. This work lays the foundation for future research exploring open-source LLMs, reproducibility of LLM results in HPO, and benchmarking SLLMBO on complex datasets, such as image classification, segmentation, and machine translation.
Updated: 2024-10-27 00:50:30
Categories: cs.LG, cs.AI, cs.CL
Predicting Mortality and Functional Status Scores of Traumatic Brain Injury Patients using Supervised Machine Learning
Traumatic brain injury (TBI) presents a significant public health challenge, often resulting in mortality or lasting disability. Predicting outcomes such as mortality and Functional Status Scale (FSS) scores can enhance treatment strategies and inform clinical decision-making. This study applies supervised machine learning (ML) methods to predict mortality and FSS scores using a real-world dataset of 300 pediatric TBI patients from the University of Colorado School of Medicine. The dataset captures clinical features, including demographics, injury mechanisms, and hospitalization outcomes. Eighteen ML models were evaluated for mortality prediction, and thirteen models were assessed for FSS score prediction. Performance was measured using accuracy, ROC AUC, F1-score, and mean squared error. Logistic regression and Extra Trees models achieved high precision in mortality prediction, while linear regression demonstrated the best FSS score prediction. Feature selection reduced 103 clinical variables to the most relevant, enhancing model efficiency and interpretability. This research highlights the role of ML models in identifying high-risk patients and supporting personalized interventions, demonstrating the potential of data-driven analytics to improve TBI care and integrate into clinical workflows.
Updated: 2024-10-27 00:44:45
Categories: cs.LG
Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data
Text detoxification, a variant of style transfer tasks, finds useful applications in online social media. This work presents a fine-tuning method that only uses non-parallel data to turn large language models (LLM) into a detoxification rewriter. We model the fine-tuning process as a Stackelberg game between an LLM (leader) and a toxicity screener (follower), which is a binary style classifier (toxic or non-toxic). The LLM aims to align its preference according to the screener and generate paraphrases that pass the screening. The primary challenge of non-parallel data fine-tuning is incomplete preference. In the case of unsuccessful paraphrases, the classifier cannot establish a preference between the input and the paraphrase, as they belong to the same toxic style. Hence, preference-alignment fine-tuning methods, such as direct preference optimization (DPO), no longer apply. To address the challenge of incomplete preference, we propose Stackelberg response optimization (SRO), adapted from DPO, to enable the LLM to learn from the follower's response. The gist is that SRO decreases the likelihood of generating a paraphrase if it fails the follower's screening, while performing DPO on the pair of the toxic input and its paraphrase when the latter passes the screening. Experiments indicate that the SRO-fine-tuned LLM achieves satisfactory performance comparable to state-of-the-art models regarding style accuracy, content similarity, and fluency. The overall detoxification performance surpasses other computing methods and matches the human reference. Additional empirical evidence suggests that SRO is sensitive to the screener's feedback, and a slight perturbation leads to a significant performance drop. We release the code and LLM models at https://github.com/XXXinhong/Detoxification_LLM.
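The two-branch objective described here can be sketched on scalar log-probabilities: when the paraphrase passes the toxicity screener, apply a DPO-style pairwise loss preferring it over the toxic input; when it fails, simply penalize its likelihood. The real method operates on sequence log-likelihoods from the LLM and a reference model; the beta value and screener signal below are placeholders.

```python
import math

# Schematic SRO objective (a sketch, not the paper's exact loss).
# logp_* are policy log-probabilities, ref_* are reference-model
# log-probabilities, as in DPO.

def sro_loss(logp_para, logp_toxic, ref_para, ref_toxic,
             passes_screen, beta=0.1):
    if passes_screen:
        # DPO branch: prefer the (non-toxic) paraphrase over the toxic input
        margin = (logp_para - ref_para) - (logp_toxic - ref_toxic)
        return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
    # Failed screening: minimizing this lowers the paraphrase likelihood
    return logp_para

# The DPO branch rewards preferring the paraphrase over the toxic input.
low = sro_loss(-5.0, -20.0, -10.0, -10.0, passes_screen=True)
high = sro_loss(-20.0, -5.0, -10.0, -10.0, passes_screen=True)
print(low < high)  # True
```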
Updated: 2024-10-27 00:39:54
Subjects: cs.CL,cs.AI
Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain
In recent years, the widespread adoption of Large Language Models (LLMs) has sparked interest in their potential applications within the military domain. However, the current generation of LLMs demonstrates sub-optimal performance on Army use cases, due to the prevalence of domain-specific vocabulary and jargon. To fully leverage LLMs in-domain, many organizations have turned to fine-tuning to circumvent the prohibitive costs of training new LLMs from scratch. In light of this trend, we explore the viability of adapting open-source LLMs for use in the Army domain to address their existing lack of domain specificity. Our investigations have resulted in three distinct generations of TRACLM, a family of LLMs fine-tuned by The Research and Analysis Center (TRAC), Army Futures Command (AFC). Through continuous refinement of our training pipeline, each successive iteration of TRACLM displayed improved capabilities on Army tasks and use cases. Furthermore, throughout our fine-tuning experiments, we recognized the need for an evaluation framework that objectively quantifies the Army-domain knowledge of LLMs. To address this, we developed MilBench, an extensible software framework that efficiently evaluates the Army knowledge of a given LLM using tasks derived from doctrine and assessments. We share preliminary results, models, methods, and recommendations from the creation of TRACLM and MilBench. Our work significantly informs the development of LLM technology across the DoD and augments senior-leader decisions regarding artificial intelligence integration.
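A doctrine-derived evaluation harness of the kind MilBench describes might look like the sketch below. The `EvalTask` structure and exact-match scoring are hypothetical illustrations; the abstract does not specify MilBench's actual task format or metrics.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalTask:
    prompt: str   # question derived from doctrine or an assessment item
    answer: str   # expected reference answer

def evaluate(model: Callable[[str], str], tasks: List[EvalTask]) -> float:
    """Return the fraction of tasks the model answers correctly,
    scored by case-insensitive exact match (an illustrative choice)."""
    correct = sum(
        1 for t in tasks
        if model(t.prompt).strip().lower() == t.answer.strip().lower()
    )
    return correct / len(tasks)
```

An extensible framework would presumably let new task sources and scoring functions plug into this loop, which is what makes benchmark-driven comparison across successive TRACLM generations possible.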
Updated: 2024-10-27 00:39:24
Subjects: cs.CL,cs.AI,cs.LG,cs.NE
DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification
Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains. There is a pressing need to enhance the generalizability of GNNs to out-of-distribution (OOD) test data. Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process, which do not adequately reflect the actual dynamics of distribution shifts in graphs. In this paper, we introduce a more realistic graph data generation model using Structural Causal Models (SCMs), allowing us to redefine distribution shifts by pinpointing their origins within the generation process. Building on this, we propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings. We provide a detailed theoretical framework showing how our approach effectively mitigates the impact of various distribution shifts. We evaluate DeCaf on both real-world and synthetic datasets that exhibit different patterns of shifts, confirming its efficacy in enhancing the generalizability of GNNs.
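The decoupling idea, learning the feature-label and structure-label mappings independently and combining their evidence at prediction time, can be caricatured as below. The additive fusion of per-class log-scores and the equal weighting are assumptions for illustration only, not DeCaf's actual estimator.

```python
from typing import List

def fuse_predictions(feat_log_scores: List[float],
                     struct_log_scores: List[float],
                     w: float = 0.5) -> int:
    """Combine per-class log-scores from two independently trained
    branches (feature -> label and structure -> label) and return
    the index of the highest-scoring class."""
    fused = [w * f + (1.0 - w) * s
             for f, s in zip(feat_log_scores, struct_log_scores)]
    return max(range(len(fused)), key=fused.__getitem__)
```

The point of training the branches separately is that a shift affecting only node features (or only graph structure) biases just one branch, leaving the other mapping intact.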
Updated: 2024-10-27 00:22:18
Subjects: cs.LG