Adaptive RKHS Fourier Features for Compositional Gaussian Process Models
Deep Gaussian Processes (DGPs) leverage a compositional structure to model non-stationary processes. DGPs typically rely on local inducing point approximations across intermediate GP layers. Recent advances in DGP inference have shown that incorporating global Fourier features from a Reproducing Kernel Hilbert Space (RKHS) can enhance DGPs' ability to capture complex non-stationary patterns. This paper extends the use of these features to compositional GPs involving linear transformations. In particular, we introduce Ordinary Differential Equation (ODE)-based RKHS Fourier features that allow for adaptive amplitude and phase modulation through convolution operations. This convolutional formulation relates our work to recently proposed deep latent force models, multi-layer structures designed for modelling nonlinear dynamical systems. By embedding these adjustable RKHS Fourier features within a doubly stochastic variational inference framework, our model achieves improved predictive performance across various regression tasks.
Updated: 2024-07-01 23:56:56
Categories: cs.LG,stat.ML
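For readers unfamiliar with Fourier-feature approximations of kernels, the sketch below shows the standard *random* Fourier feature construction for an RBF kernel. This is background only: the paper's RKHS Fourier features are a different, deterministic ODE-based construction, and all names and parameter choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, lengthscale=1.0):
    # Exact RBF (squared-exponential) kernel matrix.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def random_fourier_features(X, n_features=500, lengthscale=1.0):
    # Bochner's theorem: k(x, y) ~ phi(x) @ phi(y), with frequencies drawn
    # from the kernel's spectral density (Gaussian for the RBF kernel).
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(50, 2))
K_exact = rbf_kernel(X, X)
Phi = random_fourier_features(X, n_features=20000)
K_approx = Phi @ Phi.T          # low-rank approximation of K_exact
err = np.abs(K_exact - K_approx).max()
print(f"max abs error: {err:.3f}")  # shrinks as n_features grows
```

The Monte Carlo error decays as O(1/sqrt(n_features)); the paper's RKHS features instead come from a fixed basis, trading randomness for adaptivity.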
Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets
Advancements in Large Language Models (LLMs) have significantly enhanced instruction-following capabilities. However, most Instruction Fine-Tuning (IFT) datasets are predominantly in English, limiting model performance in other languages. Traditional methods for creating multilingual IFT datasets, such as translating existing English IFT datasets or converting existing NLP datasets into IFT datasets via templating, struggle to capture linguistic nuances and ensure prompt (instruction) diversity. To address this issue, we propose a novel method for collecting multilingual IFT datasets that preserves linguistic naturalness and ensures prompt diversity. This approach leverages English-focused LLMs, monolingual corpora, and a scoring function to create high-quality, diversified IFT datasets in multiple languages. Experiments demonstrate that LLMs fine-tuned using these IFT datasets show notable improvements in both generative and discriminative tasks, indicating enhanced language comprehension by LLMs in non-English contexts. Specifically, on the multilingual summarization task, LLMs using our IFT dataset achieved 17.57% and 15.23% improvements over LLMs fine-tuned with translation-based and template-based datasets, respectively.
Updated: 2024-07-01 23:47:09
Categories: cs.CL,cs.AI,cs.LG
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks that only require a coarse-grained understanding of the audio-visual semantics. We present Meerkat, an audio-visual LLM equipped with a fine-grained understanding of image and audio both spatially and temporally. With a new modality alignment module based on optimal transport and a cross-attention module that enforces audio-visual consistency, Meerkat can tackle challenging tasks such as audio-referred image grounding, image-guided audio temporal localization, and audio-visual fact-checking. Moreover, we carefully curate a large dataset AVFIT that comprises 3M instruction tuning samples collected from open-source datasets, and introduce MeerkatBench that unifies five challenging audio-visual tasks. We achieve state-of-the-art performance on all these downstream tasks with a relative improvement of up to 37.12%.
Updated: 2024-07-01 23:32:25
Categories: cs.CV,cs.AI,cs.LG,eess.AS
Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data
Hierarchical Bayesian Poisson regression models (HBPRMs) provide a flexible approach to modeling the relationship between predictors and count response variables. The applications of HBPRMs to large-scale datasets require efficient inference algorithms due to the high computational cost of inferring many model parameters based on random sampling. Although Markov Chain Monte Carlo (MCMC) algorithms have been widely used for Bayesian inference, sampling using this class of algorithms is time-consuming for applications with large-scale data and time-sensitive decision-making, partially due to the non-conjugacy of many models. To overcome this limitation, this research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy. In the proposed sampler, the data likelihood is approximated with a Gaussian distribution such that the conditional posterior of the coefficients has a closed-form solution. Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS in comparison to the state-of-the-art sampling algorithm, especially for large datasets.
Updated: 2024-07-01 23:29:26
Categories: cs.LG,stat.ME
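One common way to obtain a closed-form conditional posterior from a Poisson likelihood is to work on the log scale with a moment-matched Gaussian pseudo-likelihood; the minimal sketch below follows that idea. Note this is an illustration of the general technique, not necessarily the paper's exact approximation; the prior variance, the `log(y + 0.5)` transform, and the `y + 0.5` precision weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy count-regression data: y_i ~ Poisson(exp(x_i @ beta)).
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([2.0, 0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Gaussian approximation of the Poisson likelihood on the log scale:
# treat z_i = log(y_i + 0.5) as N(x_i @ beta, 1 / (y_i + 0.5)). With a
# Gaussian prior beta ~ N(0, tau^2 I), the conditional posterior of beta
# is then Gaussian in closed form (a weighted ridge solve), so a Gibbs
# sweep can draw beta directly instead of running Metropolis steps.
z = np.log(y + 0.5)
w = y + 0.5                 # approximate precision of each pseudo-observation
tau2 = 100.0                # prior variance (assumed)
A = X.T @ (w[:, None] * X) + np.eye(3) / tau2
post_cov = np.linalg.inv(A)
post_mean = post_cov @ (X.T @ (w * z))

# One Gibbs-style draw from the approximate conditional posterior:
beta_draw = rng.multivariate_normal(post_mean, post_cov)
print(post_mean.round(2), beta_draw.round(2))
```

Because the pseudo-likelihood is conjugate to the Gaussian prior, each sweep is a single linear solve, which is where the claimed speedup over generic MCMC comes from.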
CGRclust: Chaos Game Representation for Twin Contrastive Clustering of Unlabelled DNA Sequences
This study proposes CGRclust, a novel combination of unsupervised twin contrastive clustering of Chaos Game Representations (CGR) of DNA sequences, with convolutional neural networks (CNNs). To the best of our knowledge, CGRclust is the first method to use unsupervised learning for image classification (herein applied to two-dimensional CGR images) for clustering datasets of DNA sequences. CGRclust overcomes the limitations of traditional sequence classification methods by leveraging unsupervised twin contrastive learning to detect distinctive sequence patterns, without requiring DNA sequence alignment or biological/taxonomic labels. CGRclust accurately clustered twenty-five diverse datasets, with sequence lengths ranging from 664 bp to 100 kbp, including mitochondrial genomes of fish, fungi, and protists, as well as viral whole genome assemblies and synthetic DNA sequences. Compared with three recent clustering methods for DNA sequences (DeLUCS, iDeLUCS, and MeShClust v3.0), CGRclust is the only method that surpasses 81.70% accuracy across all four taxonomic levels tested for mitochondrial DNA genomes of fish. Moreover, CGRclust also consistently demonstrates superior performance across all the viral genomic datasets. The high clustering accuracy of CGRclust on these twenty-five datasets, which vary significantly in terms of sequence length, number of genomes, number of clusters, and level of taxonomy, demonstrates its robustness, scalability, and versatility.
Updated: 2024-07-01 23:24:05
Categories: q-bio.GN,cs.LG,F.2.2, I.2.7
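The Chaos Game Representation itself is simple to compute: assign each nucleotide a corner of the unit square, start at the centre, and repeatedly move halfway toward the corner of the next base; rasterizing the visited points gives the two-dimensional image that CGRclust clusters. A minimal sketch (corner assignment follows the classic convention, but other layouts appear in the literature):

```python
import numpy as np

# Corner assignment from the classic CGR construction (a convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_image(seq, k=6):
    """Rasterize the Chaos Game Representation of a DNA string
    into a 2^k x 2^k frequency matrix (the FCGR)."""
    n = 2 ** k
    img = np.zeros((n, n))
    x, y = 0.5, 0.5                        # start at the centre of the unit square
    for base in seq:
        if base not in CORNERS:            # skip ambiguous symbols such as N
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        img[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return img

img = cgr_image("ACGTACGTAAGGTTCC" * 50, k=4)
print(img.shape, int(img.sum()))  # (16, 16) 800
```

Each k-mer occupies its own cell at resolution 2^k, which is why CGR images encode alignment-free sequence composition.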
Action Controlled Paraphrasing
Recent studies have demonstrated the potential to control paraphrase generation, such as through syntax, which has broad applications in various downstream tasks. However, these methods often require detailed parse trees or syntactic exemplars, which runs counter to human-like paraphrasing behavior in language use. Furthermore, an inference gap exists, as control specifications are only available during training but not during inference. In this work, we propose a new setup for controlled paraphrase generation. Specifically, we represent user intent as action tokens, embedding and concatenating them with text embeddings, which then flow together into a self-attention encoder for representation fusion. To address the inference gap, we introduce an optional action token as a placeholder that encourages the model to determine the appropriate action independently when users' intended actions are not provided. Experimental results show that our method successfully enables precise action-controlled paraphrasing and, when actions are not given, preserves or even enhances performance compared to conventional uncontrolled methods. Our findings promote the concept of action-controlled paraphrasing for a more user-centered design.
Updated: 2024-07-01 23:23:41
Categories: cs.CL,cs.AI,cs.LG
UniFIDES: Universal Fractional Integro-Differential Equation Solvers
The development of data-driven approaches for solving differential equations has been followed by a plethora of applications in science and engineering across a multitude of disciplines and remains a central focus of active scientific inquiry. However, a large body of natural phenomena incorporates memory effects that are best described via fractional integro-differential equations (FIDEs), in which the integral or differential operators accept non-integer orders. Addressing the challenges posed by nonlinear FIDEs is a recognized difficulty, necessitating the application of generic methods with immediate practical relevance. This work introduces the Universal Fractional Integro-Differential Equation Solvers (UniFIDES), a comprehensive machine learning platform designed to expeditiously solve a variety of FIDEs in both forward and inverse directions, without the need for ad hoc manipulation of the equations. The effectiveness of UniFIDES is demonstrated through a collection of integer-order and fractional problems in science and engineering. Our results highlight UniFIDES' ability to accurately solve a wide spectrum of integro-differential equations and offer the prospect of using machine learning platforms universally for discovering and describing dynamical and complex systems.
Updated: 2024-07-01 23:16:34
Categories: cs.LG,cs.CE
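To make the "non-integer order" operators concrete: the fractional derivative in an FIDE can be approximated numerically, for instance with the Grünwald-Letnikov scheme sketched below. This is background on the operator itself, not the paper's solver; the order, test function, and grid size are illustrative.

```python
import math
import numpy as np

def gl_fractional_derivative(f, t, alpha, n=4000):
    # Grünwald-Letnikov approximation of the order-alpha derivative of f
    # at time t, using n grid points on [0, t]. The binomial weights
    # w_k = (-1)^k * C(alpha, k) are built by the standard recursion.
    h = t / n
    w = np.empty(n + 1)
    w[0] = 1.0
    for k in range(1, n + 1):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    ts = t - h * np.arange(n + 1)
    return h ** (-alpha) * np.dot(w, f(ts))

# Check against the closed form: D^0.5 of f(t) = t is t^0.5 / Gamma(1.5).
approx = gl_fractional_derivative(lambda s: s, 1.0, 0.5)
exact = 1.0 / math.gamma(1.5)
print(approx, exact)
```

The scheme is first-order accurate in the step size, and its full-history weight sum is exactly the "memory effect" the abstract refers to: the derivative at t depends on all earlier values of f.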
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild
Benchmarks that closely reflect downstream application scenarios are essential for the streamlined adoption of new research in tabular machine learning (ML). In this work, we examine existing tabular benchmarks and find two common characteristics of industry-grade tabular data that are underrepresented in the datasets available to the academic community. First, tabular data often changes over time in real-world deployment scenarios. This impacts model performance and requires time-based train and test splits for correct model evaluation. Yet, existing academic tabular datasets often lack timestamp metadata to enable such evaluation. Second, a considerable portion of datasets in production settings stem from extensive data acquisition and feature engineering pipelines. For each specific dataset, this can have a different impact on the absolute and relative number of predictive, uninformative, and correlated features, which in turn can affect model selection. To fill the aforementioned gaps in academic benchmarks, we introduce TabReD -- a collection of eight industry-grade tabular datasets covering a wide range of domains from finance to food delivery services. We assess a large number of tabular ML models in the feature-rich, temporally-evolving data setting facilitated by TabReD. We demonstrate that evaluation on time-based data splits leads to different methods ranking, compared to evaluation on random splits more common in academic benchmarks. Furthermore, on the TabReD datasets, MLP-like architectures and GBDT show the best results, while more sophisticated DL models are yet to prove their effectiveness.
Updated: 2024-07-01 23:01:33
Categories: cs.LG
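The evaluation protocol the abstract argues for, training on the past and testing on the future, is easy to state in code. A minimal sketch with a toy frame standing in for an industry table (column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ts": pd.date_range("2023-01-01", periods=1000, freq="h"),
    "feature": rng.normal(size=1000),
    "target": rng.integers(0, 2, size=1000),
}).sample(frac=1.0, random_state=0)     # shuffle to mimic unordered storage

def time_based_split(frame, ts_col="ts", test_frac=0.2):
    # Sort by timestamp and cut at a quantile, instead of the random
    # row split common in academic benchmarks.
    frame = frame.sort_values(ts_col)
    cut = int(len(frame) * (1 - test_frac))
    return frame.iloc[:cut], frame.iloc[cut:]

train, test = time_based_split(df)
print(len(train), len(test))  # 800 200
```

The point of the split is leakage control: every training timestamp precedes every test timestamp, so temporal drift shows up in the score rather than being averaged away.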
My part is bigger than yours -- assessment within a group of peers using the pairwise comparisons method
A project (e.g. writing a collaborative research paper) is often a group effort. At the end, each contributor identifies his or her contribution, often verbally. The reward, however, is quite often financial in nature. This leads to the question of what (percentage) share in the creation of the paper is due to individual authors. Different authors may have various opinions on the matter, and, even worse, their opinions may have different relevance. In this paper, we present a simple model that aggregates experts' opinions, linking the weight of each expert's preference directly to the assessment made by the other experts. In this approach, the greater the contribution of a given expert, the greater the importance of his opinion. The presented method can be considered as an attempt to find consensus among a group of peers involved in the same project. Hence, its applications may go beyond the proposed study example of writing a scientific paper.
Updated: 2024-07-01 22:54:51
Categories: cs.DM,cs.AI
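One plausible reading of the coupling described above (an expert's weight equals their currently estimated contribution) is a fixed-point iteration over pairwise-comparison matrices. The sketch below is an illustration under that assumption, using a weighted geometric mean of the matrices and the standard row-geometric-mean priority vector; the paper's exact model may differ.

```python
import numpy as np

# Each expert k supplies a reciprocal pairwise-comparison matrix M[k],
# where M[k][i, j] estimates contribution_i / contribution_j.
M = np.array([
    [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]],    # expert 0
    [[1.0, 3.0, 3.0], [1 / 3, 1.0, 1.0], [1 / 3, 1.0, 1.0]], # expert 1
    [[1.0, 2.0, 2.0], [0.5, 1.0, 1.0], [0.5, 1.0, 1.0]],     # expert 2
])

def consensus_shares(M, iters=100):
    # Fixed point: experts' matrices are combined with weights equal to
    # the current contribution estimate, so larger contributors count
    # for more, as the abstract describes.
    n = M.shape[1]
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        logA = np.tensordot(w, np.log(M), axes=1)  # weighted geometric mean
        v = np.exp(logA.mean(axis=1))              # row geometric means
        w = v / v.sum()
    return w

shares = consensus_shares(M)
print(shares.round(3))  # contribution shares, summing to 1
```

Here the three experts assess the same three contributors (themselves), so the priority vector doubles as the weight vector on the next iteration.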
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
Foundation models have achieved remarkable success across many domains, relying on pretraining over vast amounts of data. Graph-structured data often lacks the same scale as unstructured data, making the development of graph foundation models challenging. In this work, we propose Foundation Model-Informed Message Passing (FIMP), a Graph Neural Network (GNN) message-passing framework that leverages pretrained non-textual foundation models in graph-based tasks. We show that the self-attention layers of foundation models can effectively be repurposed on graphs to perform cross-node attention-based message-passing. Our model is evaluated on a real-world image network dataset and two biological applications (single-cell RNA sequencing data and fMRI brain activity recordings) in both finetuned and zero-shot settings. FIMP outperforms strong baselines, demonstrating that it can effectively leverage state-of-the-art foundation models in graph tasks.
Updated: 2024-07-01 22:54:01
Categories: cs.LG
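The core repurposing step, restricting a self-attention layer's softmax to graph edges so attention becomes message passing, can be sketched in a few lines. The random Q/K/V matrices below stand in for pretrained weights, and the adjacency matrix is a toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))                 # node features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # stand-ins for pretrained weights
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)  # undirected toy graph

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)
scores[A == 0] = -np.inf                    # mask non-edges (no self-loops here)
P = np.exp(scores - scores.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)           # softmax over neighbours only
H = P @ V                                   # attention-weighted messages per node
print(H.shape)  # (5, 8)
```

The mask is the only graph-specific change: everything else is the usual scaled dot-product attention, which is why pretrained attention layers can be dropped in.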
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
As large language models (LLMs) are adopted as a fundamental component of language technologies, it is crucial to accurately characterize their performance. Because choices in prompt design can strongly influence model behavior, this design process is critical in effectively using any modern pre-trained generative language model. In this work, we focus on LLM sensitivity to a quintessential class of meaning-preserving design choices: prompt formatting. We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt formatting in few-shot settings, with performance differences of up to 76 accuracy points when evaluated using LLaMA-2-13B. Sensitivity remains even when increasing model size, the number of few-shot examples, or performing instruction tuning. Our analysis suggests that work evaluating LLMs with prompting-based methods would benefit from reporting a range of performance across plausible prompt formats, instead of the currently-standard practice of reporting performance on a single format. We also show that format performance only weakly correlates between models, which puts into question the methodological validity of comparing models with an arbitrarily chosen, fixed prompt format. To facilitate systematic analysis we propose FormatSpread, an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights. Furthermore, we present a suite of analyses that characterize the nature of this sensitivity, including exploring the influence of particular atomic perturbations and the internal representation of particular formats.
Updated: 2024-07-01 22:28:01
Categories: cs.CL,cs.AI,cs.LG
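The space of "meaning-preserving formats" the paper samples over can be made concrete with a small grid of separators, field casings, and spacings. The sketch below only constructs the prompt variants (the field names and grid are illustrative, not FormatSpread's actual implementation); scoring each variant with the model of interest and reporting the min-max accuracy interval is the FormatSpread idea.

```python
from itertools import product

# Meaning-preserving formatting axes: separator, field casing, spacing.
separators = [": ", ":\n", " - "]
field_cases = [str.title, str.upper, str.lower]
spacings = ["\n", "\n\n"]

def render(question, answer_prefix, sep, case, gap):
    # One prompt format: "<Field><sep><text><gap><Field><sep>".
    return f"{case('question')}{sep}{question}{gap}{case(answer_prefix)}{sep}"

prompts = [
    render("2+2=?", "answer", sep, case, gap)
    for sep, case, gap in product(separators, field_cases, spacings)
]
print(len(prompts))  # 18 semantically equivalent prompt formats
```

Even this tiny grid yields 18 distinct prompts for one example; the paper's finding is that model accuracy can swing dramatically across exactly this kind of grid.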
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
Reinforcement learning (RL) -- finding the optimal behaviour (also referred to as policy) maximizing the collected long-term cumulative reward -- is among the most influential approaches in machine learning with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching -- changing from the current policy to a new one -- which incurs a non-negligible cost (examples include the shifting of the currently applied educational technology, modernization of a computing cluster, and the introduction of a new webpage design), and in the decision one is limited to using historical data without the availability for further online interaction. Despite the inevitable importance of this offline learning scenario, to our best knowledge, very little effort has been made to tackle the key problem of balancing between the gain and the cost of switching in a flexible and principled way. Leveraging ideas from the area of optimal transport, we initialize the systematic study of policy switching in offline RL. We establish fundamental properties and design a Net Actor-Critic algorithm for the proposed novel switching formulation. Numerical experiments demonstrate the efficiency of our approach on multiple Gymnasium benchmarks.
Updated: 2024-07-01 22:24:31
Categories: stat.ML,cs.IT,cs.LG,math.IT
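The gain-versus-cost trade-off can be illustrated with a tiny dynamic program: in each period we may keep the current policy or switch (paying a cost c), and we maximize total estimated return minus switching costs. This toy DP is only an illustration of the trade-off, not the paper's Net Actor-Critic algorithm; the return table and cost are made up.

```python
import numpy as np

# R[t, p]: estimated per-period return of policy p in period t (from
# offline data, in the real setting). Every policy change costs c.
R = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.2, 1.1],
              [1.0, 0.1]])
c = 0.5

T, P = R.shape
V = R[T - 1].copy()          # value-to-go when holding policy p in the last period
for t in range(T - 2, -1, -1):
    # stay_or_switch[p, q]: continue with q next period, paying c if q != p.
    stay_or_switch = V[None, :] - c * (1 - np.eye(P))
    V = R[t] + stay_or_switch.max(axis=1)

start = int(V.argmax())
print(start, V.max())        # best initial policy and its net value
```

Here the optimal plan switches to policy 1 mid-way and back, netting 3.1; always holding policy 0 would earn only 2.5, and switching every period would be eaten by costs.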
LHManip: A Dataset for Long-Horizon Language-Grounded Manipulation Tasks in Cluttered Tabletop Environments
Instructing a robot to complete an everyday task within our homes has been a long-standing challenge for robotics. While recent progress in language-conditioned imitation learning and offline reinforcement learning has demonstrated impressive performance across a wide range of tasks, they are typically limited to short-horizon tasks -- not reflective of those a home robot would be expected to complete. While existing architectures have the potential to learn these desired behaviours, the lack of the necessary long-horizon, multi-step datasets for real robotic systems poses a significant challenge. To this end, we present the Long-Horizon Manipulation (LHManip) dataset comprising 200 episodes, demonstrating 20 different manipulation tasks via real robot teleoperation. The tasks entail multiple sub-tasks, including grasping, pushing, stacking and throwing objects in highly cluttered environments. Each task is paired with a natural language instruction and multi-camera viewpoints for point-cloud or NeRF reconstruction. In total, the dataset comprises 176,278 observation-action pairs which form part of the Open X-Embodiment dataset. The full LHManip dataset is made publicly available at https://github.com/fedeceola/LHManip.
Updated: 2024-07-01 22:10:55
Categories: cs.RO,cs.AI
Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory
The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in a sandbox survival environment. We conduct an evaluation of the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security. Our experiments unveil an alignment: Initially, agents engage in unrestrained conflict, mirroring Hobbes's depiction of the state of nature. However, as the simulation progresses, social contracts emerge, leading to the authorization of an absolute sovereign and the establishment of a peaceful commonwealth founded on mutual cooperation. This congruence between our LLM agent society's evolutionary trajectory and Hobbes's theoretical account indicates LLMs' capability to model intricate social dynamics and potentially replicate forces that shape human societies. By enabling such insights into group behavior and emergent societal phenomena, LLM-driven multi-agent simulations, while unable to simulate all the nuances of human behavior, may hold potential for advancing our understanding of social structures, group dynamics, and complex human systems.
Updated: 2024-07-01 22:06:13
Categories: cs.AI,cs.CL,cs.CY,cs.HC,cs.MA
Accelerating Diffusion Sampling with Optimized Time Steps
Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency still leaves much to be desired due to the typically large number of sampling steps. Recent advancements in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with much fewer sampling steps. While this is a significant development, most sampling methods still employ uniform time steps, which is not optimal when using a small number of steps. To address this issue, we propose a general framework for designing an optimization problem that seeks more appropriate time steps for a specific numerical ODE solver for DPMs. This optimization problem aims to minimize the distance between the ground-truth solution to the ODE and an approximate solution corresponding to the numerical solver. It can be efficiently solved using the constrained trust region method, taking less than $15$ seconds. Our extensive experiments on both unconditional and conditional sampling using pixel- and latent-space DPMs demonstrate that, when combined with the state-of-the-art sampling method UniPC, our optimized time steps significantly improve image generation performance in terms of FID scores for datasets such as CIFAR-10 and ImageNet, compared to using uniform time steps.
Updated: 2024-07-01 22:01:38
Categories: cs.CV,cs.AI,cs.LG
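The shape of the optimization problem, choose the solver's time steps to minimize the distance to the exact ODE solution, can be shown on a toy ODE. The sketch below uses explicit Euler on x' = -x^2 (exact solution 1/(1+t)) with a constrained trust-region optimizer; this mirrors the formulation, not the paper's diffusion-specific solver or objective.

```python
import numpy as np
from scipy.optimize import minimize

T, K = 5.0, 6                       # horizon and number of solver steps
exact_T = 1.0 / (1.0 + T)           # exact solution of x' = -x^2, x(0)=1, at t=T

def terminal_error(interior):
    # Squared distance between the exact terminal value and the Euler
    # solution on the grid {0, sorted(interior), T}.
    t = np.concatenate(([0.0], np.sort(interior), [T]))
    x = 1.0
    for h in np.diff(t):
        x = x - h * x * x           # one explicit Euler step of x' = -x^2
    return (x - exact_T) ** 2

uniform = np.linspace(0.0, T, K + 1)[1:-1]   # uniform interior grid (baseline)
res = minimize(terminal_error, uniform, method="trust-constr",
               bounds=[(1e-3, T - 1e-3)] * (K - 1))
print(terminal_error(uniform), res.fun)      # optimized steps cut the error
```

The optimizer front-loads small steps where the solution curves sharply, exactly the kind of non-uniform schedule the paper finds beneficial for few-step diffusion sampling.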
Empirical Tests of Optimization Assumptions in Deep Learning
There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/practice gap may then arise because of the failure to prove a theorem under such assumptions, or because the assumptions do not reflect reality. In this paper, we carefully measure the degree to which these assumptions are capable of explaining modern optimization algorithms by developing new empirical metrics that closely track the key quantities that must be controlled in theoretical analysis. All of our tested assumptions (including typical modern assumptions based on bounds on the Hessian) fail to reliably capture optimization performance. This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
Updated: 2024-07-01 21:56:54
Categories: cs.LG,math.OC
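A cheap example of the kind of empirical probe the paper advocates: track the ratio ||grad f(x) - grad f(y)|| / ||x - y|| between consecutive optimizer iterates, which locally estimates the smoothness constant L that many convergence proofs assume is fixed. The toy objective and hyperparameters below are illustrative.

```python
import numpy as np

def grad(x):
    # Gradient of the non-convex toy objective f(x) = sum(x^4)/4 - sum(x^2)/2.
    return x ** 3 - x

rng = np.random.default_rng(0)
x = rng.normal(size=10) * 3.0
lr = 0.01
ratios = []
for _ in range(200):
    x_new = x - lr * grad(x)          # plain gradient descent step
    num = np.linalg.norm(grad(x_new) - grad(x))
    den = np.linalg.norm(x_new - x)
    if den > 0:
        ratios.append(num / den)      # local estimate of L between iterates
    x = x_new
print(f"local L estimate: min={min(ratios):.2f} max={max(ratios):.2f}")
```

On this objective the estimate varies by an order of magnitude along the trajectory, so any single global L is a loose description of what the optimizer actually sees, which is the flavour of mismatch the paper measures for deep networks.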
Adversarial Attacks on Reinforcement Learning Agents for Command and Control
Given the recent impact of Deep Reinforcement Learning in training agents to win complex games like StarCraft and DoTA (Defense of the Ancients), there has been a surge in research into exploiting learning-based techniques for professional wargaming, battlefield simulation and modeling. Real time strategy games and simulators have become a valuable resource for operational planning and military research. However, recent work has shown that such learning-based approaches are highly susceptible to adversarial perturbations. In this paper, we investigate the robustness of an agent trained for a Command and Control task in an environment that is controlled by an active adversary. The C2 agent is trained on custom StarCraft II maps using the state-of-the-art RL algorithms A3C and PPO. We empirically show that an agent trained using these algorithms is highly susceptible to noise injected by the adversary and investigate the effects these perturbations have on the performance of the trained agent. Our work highlights the urgent need to develop more robust training algorithms, especially for critical arenas like the battlefield.
Updated: 2024-07-01 21:45:52
Categories: cs.CR
Automated radiotherapy treatment planning guided by GPT-4Vision
Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in large foundation models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, a fully automated treatment planning framework that harnesses prior radiation oncology knowledge encoded in multi-modal large language models, such as GPT-4Vision (GPT-4V) from OpenAI. GPT-RadPlan is made aware of planning protocols as context and acts as an expert human planner, capable of guiding a treatment planning process. Via in-context learning, we incorporate clinical protocols for various disease sites as prompts to enable GPT-4V to acquire treatment planning domain knowledge. The resulting GPT-RadPlan agent is integrated into our in-house inverse treatment planning system through an API. The efficacy of the automated planning system is showcased using multiple prostate and head & neck cancer cases, where we compared GPT-RadPlan results to clinical plans. In all cases, GPT-RadPlan either outperformed or matched the clinical plans, demonstrating superior target coverage and organ-at-risk sparing. Consistently satisfying the dosimetric objectives in the clinical protocol, GPT-RadPlan represents the first multimodal large language model agent that mimics the behaviors of human planners in radiation oncology clinics, achieving remarkable results in automating the treatment planning process without the need for additional training.
Updated: 2024-07-01 21:45:44
Subjects: physics.med-ph,cs.AI
Distilling Event Sequence Knowledge From Large Language Models
Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as corpus of event sequence data.
Updated: 2024-07-01 21:43:56
Subjects: cs.CL,cs.AI
Explicit Flow Matching: On The Theory of Flow Matching Algorithms with Applications
This paper proposes a novel method, Explicit Flow Matching (ExFM), for training and analyzing flow-based generative models. ExFM leverages a theoretically grounded loss function, ExFM loss (a tractable form of Flow Matching (FM) loss), to demonstrably reduce variance during training, leading to faster convergence and more stable learning. Based on theoretical analysis of these formulas, we derived exact expressions for the vector field (and score in stochastic cases) for model examples (in particular, for separating multiple exponents), and in some simple cases, exact solutions for trajectories. In addition, we also investigated simple cases of diffusion generative models by adding a stochastic term and obtained an explicit form of the expression for score. While the paper emphasizes the theoretical underpinnings of ExFM, it also showcases its effectiveness through numerical experiments on various datasets, including high-dimensional ones. Compared to traditional FM methods, ExFM achieves superior performance in terms of both learning speed and final outcomes.
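For context, the generic flow matching objective and the conditional (tractable) counterpart that this line of work builds on can be written as follows (these are the standard formulations from the FM literature; the exact ExFM loss is defined in the paper itself):

```latex
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t,\, x \sim p_t}
    \bigl\| v_\theta(x, t) - u_t(x) \bigr\|^2,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\, x_1 \sim q,\, x \sim p_t(\cdot \mid x_1)}
    \bigl\| v_\theta(x, t) - u_t(x \mid x_1) \bigr\|^2
```

The two objectives have identical gradients in $\theta$, but the conditional estimator is noisier; the variance of this estimator is what ExFM aims to reduce.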
Updated: 2024-07-01 21:28:19
Subjects: cs.LG
ExcelFormer: A neural network surpassing GBDTs on tabular data
Data organized in tabular format is ubiquitous in real-world applications, and users often craft tables with biased feature definitions and flexibly set prediction targets of their interests. Thus, a rapid development of a robust, effective, dataset-versatile, user-friendly tabular prediction approach is highly desired. While Gradient Boosting Decision Trees (GBDTs) and existing deep neural networks (DNNs) have been extensively utilized by professional users, they present several challenges for casual users, particularly: (i) the dilemma of model selection due to their different dataset preferences, and (ii) the need for heavy hyperparameter searching, failing which their performances are deemed inadequate. In this paper, we delve into this question: Can we develop a deep learning model that serves as a "sure bet" solution for a wide range of tabular prediction tasks, while also being user-friendly for casual users? We delve into three key drawbacks of deep tabular models, encompassing: (P1) lack of rotational variance property, (P2) large data demand, and (P3) over-smooth solution. We propose ExcelFormer, addressing these challenges through a semi-permeable attention module that effectively constrains the influence of less informative features to break the DNNs' rotational invariance property (for P1), data augmentation approaches tailored for tabular data (for P2), and attentive feedforward network to boost the model fitting capability (for P3). These designs collectively make ExcelFormer a "sure bet" solution for diverse tabular datasets. Extensive and stratified experiments conducted on real-world datasets demonstrate that our model outperforms previous approaches across diverse tabular data prediction tasks, and this framework can be friendly to casual users, offering ease of use without the heavy hyperparameter tuning.
Updated: 2024-07-01 21:26:51
Subjects: cs.LG
Equivariant Diffusion Policy
Recent work has shown diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning method that leverages domain symmetries to obtain better sample efficiency and generalization in the denoising function. We theoretically analyze the $\mathrm{SO}(2)$ symmetry of full 6-DoF control and characterize when a diffusion model is $\mathrm{SO}(2)$-equivariant. We furthermore evaluate the method empirically on a set of 12 simulation tasks in MimicGen, and show that it obtains a success rate that is, on average, 21.9% higher than the baseline Diffusion Policy. We also evaluate the method on a real-world system to show that effective policies can be learned with relatively few training samples, whereas the baseline Diffusion Policy cannot.
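The $\mathrm{SO}(2)$-equivariance property being characterized, $f(R_\theta x) = R_\theta f(x)$, can be checked numerically on a toy map (this illustrates the property only, not the paper's actual denoiser):

```python
import math

def rotate(theta, v):
    """Apply the SO(2) rotation R_theta to a 2-D vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def f(v):
    """A toy SO(2)-equivariant map: rescale v by a function of its rotation-invariant norm."""
    r = math.hypot(v[0], v[1])
    scale = 1.0 / (1.0 + r)
    return (scale * v[0], scale * v[1])

v, theta = (0.3, -1.2), 0.7
lhs = f(rotate(theta, v))   # rotate first, then apply f
rhs = rotate(theta, f(v))   # apply f first, then rotate
# Equivariance: the two orders agree (up to floating point).
```

Because the scaling factor depends only on the norm, rotating before or after the map gives the same result; a denoising network built from such pieces inherits the symmetry.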
Updated: 2024-07-01 21:23:26
Subjects: cs.RO,cs.LG
Parameter Tuning of the Firefly Algorithm by Standard Monte Carlo and Quasi-Monte Carlo Methods
Almost all optimization algorithms have algorithm-dependent parameters, and the setting of such parameter values can significantly influence the behavior of the algorithm under consideration. Thus, proper parameter tuning should be carried out to ensure that the algorithm used for optimization performs well and is sufficiently robust for solving different types of optimization problems. In this study, the Firefly Algorithm (FA) is used to evaluate the influence of its parameter values on its efficiency. Parameter values are randomly initialized using both the standard Monte Carlo method and the Quasi-Monte Carlo method. The values are then used for tuning the FA. Two benchmark functions and a spring design problem are used to test the robustness of the tuned FA. From the preliminary findings, it can be deduced that both the Monte Carlo method and Quasi-Monte Carlo method produce similar results in terms of optimal fitness values. Numerical experiments using the two different methods on both benchmark functions and the spring design problem showed no major variations in the final fitness values, irrespective of the different sample values selected during the simulations. This insensitivity indicates the robustness of the FA.
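The setup can be sketched in pure Python (a toy sketch only: the 1-D Halton sequence stands in for the study's quasi-Monte Carlo generator, and the sphere function for its benchmark problems; parameter ranges are illustrative):

```python
import math
import random

def halton(i, base):
    """i-th term of the 1-D Halton sequence, a simple quasi-Monte Carlo generator."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def firefly_minimise(fit, dim, n=8, iters=30, alpha=0.05, beta0=1.0, gamma=1.0):
    """Plain Firefly Algorithm: every firefly moves toward each brighter one."""
    pop = [[random.uniform(-2, 2) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        vals = [fit(p) for p in pop]
        new = [list(p) for p in pop]
        for i in range(n):
            for j in range(n):
                if vals[j] < vals[i]:  # minimisation: firefly j is brighter
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    for k in range(dim):
                        new[i][k] += beta * (pop[j][k] - pop[i][k]) + alpha * (random.random() - 0.5)
        pop = new
    return min(fit(p) for p in pop)

sphere = lambda p: sum(x * x for x in p)  # stand-in benchmark function

random.seed(0)
# Candidate (alpha, gamma) settings drawn by plain Monte Carlo ...
mc_settings = [(random.uniform(0.0, 0.5), random.uniform(0.1, 2.0)) for _ in range(5)]
# ... and by quasi-Monte Carlo (Halton bases 2 and 3) over the same ranges.
qmc_settings = [(0.5 * halton(i + 1, 2), 0.1 + 1.9 * halton(i + 1, 3)) for i in range(5)]

best_mc = min(firefly_minimise(sphere, 2, alpha=a, gamma=g) for a, g in mc_settings)
best_qmc = min(firefly_minimise(sphere, 2, alpha=a, gamma=g) for a, g in qmc_settings)
```

The study's finding is that `best_mc` and `best_qmc` end up similar across benchmarks, i.e. the FA's outcome is insensitive to how the parameter samples are drawn.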
Updated: 2024-07-01 21:17:27
Subjects: cs.NE,cs.AI,68T20, 68W50
Spoken Word2Vec: Learning Skipgram Embeddings from Speech
Text word embeddings that encode distributional semantics work by modeling contextual similarities of frequently occurring words. Acoustic word embeddings, on the other hand, typically encode low-level phonetic similarities. Semantic embeddings for spoken words have been previously explored using analogous algorithms to Word2Vec, but the resulting vectors still mainly encoded phonetic rather than semantic features. In this paper, we examine the assumptions and architectures used in previous works and show experimentally how shallow skipgram-like algorithms fail to encode distributional semantics when the input units are acoustically correlated. We illustrate the potential of an alternative deep end-to-end variant of the model and examine the effects on the resulting embeddings, showing positive results of semantic relatedness in the embedding space.
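The skipgram objective under discussion is Word2Vec's negative-sampling loss; a toy sketch (vocabulary and embeddings hypothetical) makes precise what "encoding distributional semantics" asks of the model:

```python
import math

def sgns_loss(center, context, negatives, dot):
    """Skipgram-with-negative-sampling loss for one (center, context) pair:
    -log sigma(u_ctx . v_c) - sum over negatives of log sigma(-u_neg . v_c)."""
    sigma = lambda z: 1.0 / (1.0 + math.exp(-z))
    loss = -math.log(sigma(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigma(-dot(neg, center)))
    return loss

# Toy 2-D embeddings for a three-"word" vocabulary (hypothetical values).
emb = {"cat": (1.0, 0.2), "purrs": (0.9, 0.1), "stock": (-0.8, 0.5)}
dot = lambda a, b: sum(x * y for x, y in zip(emb[a], emb[b]))

loss_related = sgns_loss("cat", "purrs", ["stock"], dot)    # co-occurring pair: low loss
loss_unrelated = sgns_loss("cat", "stock", ["purrs"], dot)  # non-co-occurring pair: high loss
```

Minimising this loss pushes co-occurring units together in embedding space; the paper's point is that when the input units are acoustically correlated (rather than discrete word types), shallow versions of this objective latch onto phonetic rather than distributional structure.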
Updated: 2024-07-01 21:08:18
Subjects: cs.CL,cs.AI
DCoM: Active Learning for All Learners
Deep Active Learning (AL) techniques can be effective in reducing annotation costs for training deep models. However, their effectiveness in low- and high-budget scenarios seems to require different strategies, and achieving optimal results across varying budget scenarios remains a challenge. In this study, we introduce Dynamic Coverage & Margin mix (DCoM), a novel active learning approach designed to bridge this gap. Unlike existing strategies, DCoM dynamically adjusts its strategy, considering the competence of the current model. Through theoretical analysis and empirical evaluations on diverse datasets, including challenging computer vision tasks, we demonstrate DCoM's ability to overcome the cold start problem and consistently improve results across different budgetary constraints. Thus DCoM achieves state-of-the-art performance in both low- and high-budget regimes.
Updated: 2024-07-01 21:06:34
Subjects: cs.LG
Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results
Given a set S of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs <a region ($r_{g}$), a subset C of S> such that C is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner (DOI: 10.1145/3557989.3566158) that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.
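The Bonferroni correction at the heart of the method is simple: with m simultaneous hypothesis tests, each individual test is run at level alpha/m, which bounds the family-wise error rate by alpha. A minimal sketch (p-values hypothetical, not the miner itself):

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i iff p_i <= alpha / m, controlling the family-wise error rate at alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Four candidate colocation patterns with hypothetical p-values.
pvals = [0.001, 0.020, 0.030, 0.600]
print(bonferroni_reject(pvals))  # per-test threshold 0.05 / 4 = 0.0125 -> [True, False, False, False]
```

Note that the second and third patterns would pass an uncorrected 0.05 threshold; the correction is exactly what suppresses such likely false discoveries.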
Updated: 2024-07-01 21:03:04
Subjects: cs.LG,cs.IR,econ.GN,q-fin.EC,stat.AP,E.m; F.2; E.1; H.3; I.5; J.0
Normalization and effective learning rates in reinforcement learning
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestimation bias. However, normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. This becomes problematic in continual learning settings, where the resulting effective learning rate schedule may decay to near zero too quickly relative to the timescale of the learning problem. We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training. This technique reveals itself as a powerful analytical tool to better understand learning rate schedules in deep reinforcement learning, and as a means of improving robustness to nonstationarity in synthetic plasticity loss benchmarks along with both the single-task and sequential variants of the Arcade Learning Environment. We also show that our approach can be easily applied to popular architectures such as ResNets and transformers while recovering and in some cases even slightly improving the performance of the base model in common stationary benchmarks.
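The core mechanism can be sketched in a few lines (a schematic of the norm-projection idea, not the paper's implementation): after each gradient step, the weights feeding a normalization layer are rescaled back to a fixed norm, which leaves the network's function unchanged but stops the effective learning rate from decaying as the weight norm grows.

```python
import math

def nap_step(w, grad, lr, target_norm):
    """SGD step followed by projection of w back onto the sphere of radius target_norm.
    With a normalization layer downstream, rescaling w does not change the network's
    output, so the projection only pins the effective learning rate."""
    w = [wi - lr * gi for wi, gi in zip(w, grad)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi * target_norm / norm for wi in w]

w = [3.0, 4.0]  # ||w|| = 5
w = nap_step(w, grad=[1.0, -2.0], lr=0.1, target_norm=5.0)
# ||w|| is 5 again after the update, so the next step's effective learning rate is unchanged.
```

Without the projection, ||w|| tends to grow over training and the effective step size lr/||w|| silently decays, which is the side effect the abstract describes.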
Updated: 2024-07-01 20:58:01
Subjects: cs.LG,cs.AI
End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding
Recent advances in AI foundation models have significant potential for lightening the clinical workload by mimicking the comprehensive and multi-faceted approaches used by medical professionals. In the field of radiation oncology, the integration of multiple modalities holds great importance, so the opportunity of foundational model is abundant. Inspired by this, here we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored for the field of radiation oncology. This model effectively manages a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation by leveraging the capabilities of LMM. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts LMM's robustness to noisy inputs while preserving the consistency of handling clean inputs. We further extend this concept to LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation~(CESEG) techniques. Experimental results including multi-centre validation confirm that our RO-LMM with CEFTune and CESEG results in promising performance for multiple clinical tasks with generalization capabilities.
Updated: 2024-07-01 20:51:59
Subjects: cs.CV,cs.AI,cs.LG
Honor Among Bandits: No-Regret Learning for Online Fair Division
We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space.
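The explore-then-commit skeleton underlying the algorithm can be sketched as follows (a generic bandit version without the fairness constraints, which is where the paper's actual contribution lies; arm distributions hypothetical):

```python
import random

def explore_then_commit(arms, horizon, m):
    """Pull each arm m times, then commit to the empirically best arm for the rest."""
    k = len(arms)
    totals = [0.0] * k
    reward = 0.0
    for i in range(k):
        for _ in range(m):
            r = arms[i]()
            totals[i] += r
            reward += r
    best = max(range(k), key=lambda i: totals[i])
    for _ in range(horizon - k * m):  # commit phase
        reward += arms[best]()
    return best, reward

random.seed(1)
arms = [lambda: random.gauss(0.2, 0.1), lambda: random.gauss(0.8, 0.1)]
best, total = explore_then_commit(arms, horizon=1000, m=50)
```

Choosing the exploration length m on the order of $T^{2/3}$ balances exploration cost against the risk of committing to the wrong arm, which is the source of the $\tilde{O}(T^{2/3})$ rate quoted in the abstract.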
Updated: 2024-07-01 20:44:52
Subjects: cs.GT,cs.LG
Conditionally valid Probabilistic Conformal Prediction
We develop a new method for creating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution $P_{Y \mid X}$. Most existing methods, such as conformalized quantile regression and probabilistic conformal prediction, only offer marginal coverage guarantees. Our approach extends these methods to achieve conditional coverage, which is essential for many practical applications. While exact conditional guarantees are impossible without assumptions on the data distribution, we provide non-asymptotic bounds that explicitly depend on the quality of the available estimate of the conditional distribution. Our confidence sets are highly adaptive to the local structure of the data, making them particularly useful in high heteroskedasticity situations. We demonstrate the effectiveness of our approach through extensive simulations, showing that it outperforms existing methods in terms of conditional coverage and improves the reliability of statistical inference in a wide range of applications.
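For reference, the split conformal construction these methods build on can be sketched as follows (this is the marginal-coverage baseline the abstract contrasts with, not the paper's conditional method; residual values hypothetical):

```python
import math

def split_conformal(residuals, alpha):
    """Return the half-width q such that [f(x) - q, f(x) + q] has marginal
    coverage >= 1 - alpha, using the ceil((n+1)(1-alpha))-th order statistic
    of the calibration residuals."""
    n = len(residuals)
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(residuals)[k - 1]

# Calibration residuals |y_i - f(x_i)| from a held-out split (hypothetical values).
cal = [0.1, 0.4, 0.2, 0.8, 0.3, 0.5, 0.25, 0.6, 0.15, 0.7]
q = split_conformal(cal, alpha=0.2)
# Prediction set for a new point x: [f(x) - q, f(x) + q].
```

Because q is a single global quantile, the resulting band has the same width everywhere; conditional methods like the one proposed here instead let the set adapt to the local structure of $P_{Y \mid X}$.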
Updated: 2024-07-01 20:44:48
Subjects: stat.ML,cs.LG,math.PR,math.ST,stat.ME,stat.TH
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs' perception and cognition capabilities in biological image understanding. To address this gap, we introduce μ-Bench, an expert-curated benchmark encompassing 22 biomedical tasks across various scientific disciplines (biology, pathology), microscopy modalities (electron, fluorescence, light), scales (subcellular, cellular, tissue), and organisms in both normal and abnormal states. We evaluate state-of-the-art biomedical, pathology, and general VLMs on μ-Bench and find that: i) current models struggle on all categories, even for basic tasks such as distinguishing microscopy modalities; ii) current specialist models fine-tuned on biomedical data often perform worse than generalist models; iii) fine-tuning in specific microscopy domains can cause catastrophic forgetting, eroding prior biomedical knowledge encoded in their base model; iv) weight interpolation between fine-tuned and pre-trained models offers one solution to forgetting and improves general performance across biomedical tasks. We release μ-Bench under a permissive license to accelerate the research and development of microscopy foundation models.
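The weight-interpolation remedy mentioned in (iv) is simply a convex combination of the two models' parameters, taken key by key. A minimal sketch (parameter names hypothetical; real checkpoints would hold tensors rather than scalars):

```python
def interpolate(w_pre, w_ft, t):
    """Blend pre-trained and fine-tuned weights: t=0 keeps the base model,
    t=1 keeps the fine-tuned one, and intermediate t trades the two off."""
    assert w_pre.keys() == w_ft.keys() and 0.0 <= t <= 1.0
    return {k: (1.0 - t) * w_pre[k] + t * w_ft[k] for k in w_pre}

w_pre = {"layer.weight": 1.0, "layer.bias": 0.0}
w_ft = {"layer.weight": 3.0, "layer.bias": 1.0}
mid = interpolate(w_pre, w_ft, 0.5)  # {"layer.weight": 2.0, "layer.bias": 0.5}
```

Sweeping t and picking the value that balances in-domain accuracy against retained general knowledge is the usual way such interpolation is applied.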
Updated: 2024-07-01 20:30:26
Subjects: cs.CV,cs.AI
Label-free Neural Semantic Image Synthesis
Recent work has shown great progress in integrating spatial conditioning to control large, pre-trained text-to-image diffusion models. Despite these advances, existing methods describe the spatial image content using hand-crafted conditioning inputs, which are either semantically ambiguous (e.g., edges) or require expensive manual annotations (e.g., semantic segmentation). To address these limitations, we propose a new label-free way of conditioning diffusion models to enable fine-grained spatial control. We introduce the concept of neural semantic image synthesis, which uses neural layouts extracted from pre-trained foundation models as conditioning. Neural layouts are advantageous as they provide rich descriptions of the desired image, containing both semantics and detailed geometry of the scene. We experimentally show that images synthesized via neural semantic image synthesis achieve similar or superior pixel-level alignment of semantic classes compared to those created using expensive semantic label maps. At the same time, they capture better semantics, instance separation, and object orientation than other label-free conditioning options, such as edges or depth. Moreover, we show that images generated by neural layout conditioning can effectively augment real data for training various perception tasks.
Updated: 2024-07-01 20:30:23
Subjects: cs.CV,cs.AI,cs.LG
Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment
This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts. Our model, developed as a part of the recent SemEval task, is based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model in addition to dataset augmentation through paraphrase generation from ChatGPT. The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies. The problem addressed is the effective identification and classification of multiple persuasive techniques in meme texts, a task complicated by the diversity and complexity of such content. The objective of the paper is to improve detection accuracy by refining model training methods and examining the impact of balanced versus unbalanced training datasets. Novelty in the results and discussion lies in the finding that training with paraphrases enhances model performance, yet a balanced training set proves more advantageous than a larger unbalanced one. Additionally, the analysis reveals the potential pitfalls of indiscriminate incorporation of paraphrases from diverse distributions, which can introduce substantial noise. Results with the SemEval 2024 data confirm these insights, demonstrating improved model efficacy with the proposed methods.
Updated: 2024-07-01 20:25:20
Subjects: cs.CL,cs.AI,cs.LG
APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics
The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics, and plays a key role in robust protein structure prediction algorithms, from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize the impact, and ease the usability, of these novel AI tools we introduce APACE, AlphaFold2 and advanced computing as a service, a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE in the Delta and Polaris supercomputers, and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 300 ensembles, distributed across 200 NVIDIA A100 GPUs, we found that APACE is up to two orders of magnitude faster than off-the-self AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.
Updated: 2024-07-01 20:25:05
Subjects: q-bio.BM,cs.AI,cs.DC,I.2
Semiotics Networks Representing Perceptual Inference
Every day, humans perceive objects and communicate these perceptions through various channels. In this paper, we present a computational model designed to track and simulate the perception of objects, as well as their representations as conveyed in communication. We delineate two fundamental components of our internal representation, termed "observed" and "seen", which we correlate with established concepts in computer vision, namely encoding and decoding. These components are integrated into semiotic networks, which simulate perceptual inference of object perception and human communication. Our model of object perception by a person allows us to define object perception by {\em a network}. We demonstrate this with an example of an image baseline classifier by constructing a new network that includes the baseline classifier and an additional layer. This layer produces the images "perceived" by the entire network, transforming it into a perceptualized image classifier. This facilitates visualization of the acquired network. Within our network, the image representations become more efficient for classification tasks when they are assembled and randomized. In our experiments, the perceptualized network outperformed the baseline classifier on MNIST training databases consisting of a restricted number of images. Our model is not limited to persons and can be applied to any system featuring a loop involving the processing from "internal" to "external" representations.
Updated: 2024-07-01 20:23:31
Subjects: cs.AI,cs.CV,cs.SI
Addressing a fundamental limitation in deep vision models: lack of spatial attention
The primary aim of this manuscript is to underscore a significant limitation in current deep learning models, particularly vision models. Unlike human vision, which efficiently selects only the essential visual areas for further processing, leading to high speed and low energy consumption, deep vision models process the entire image. In this work, we examine this issue from a broader perspective and propose a solution that could pave the way for the next generation of more efficient vision models. Basically, convolution and pooling operations are selectively applied to altered regions, with a change map sent to subsequent layers. This map indicates which computations need to be repeated. The code is available at https://github.com/aliborji/spatial_attention.
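The change-map mechanism described above can be illustrated with a small pure-Python sketch (a 3x3 valid convolution on a toy 4x4 grid; the repository linked in the abstract is the authoritative implementation):

```python
def gated_conv(image, prev_out, change, kernel):
    """Recompute a 3x3 valid convolution only where the change map is set;
    all other positions reuse the previous output, saving computation."""
    h, w = len(image) - 2, len(image[0]) - 2
    out = [row[:] for row in prev_out]
    for y in range(h):
        for x in range(w):
            if change[y][x]:
                out[y][x] = sum(image[y + i][x + j] * kernel[i][j]
                                for i in range(3) for j in range(3))
    return out

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # kernel that picks the window centre
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
stale = [[0, 0], [0, 0]]   # previous layer output, now out of date
change = [[1, 0], [0, 0]]  # only the top-left position is flagged as changed
out = gated_conv(image, stale, change, identity)
# out[0][0] is recomputed (6, the centre of its 3x3 window); the rest stay stale.
```

Passing the change map on to subsequent layers, as the abstract describes, lets the savings compound through the network instead of applying to a single layer.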
Updated: 2024-07-01 20:21:09
Domain: cs.CV,cs.AI
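The selective-computation idea above can be sketched in plain NumPy (an illustrative toy, not the code from the linked repository; the dense reference convolution, kernel, and change threshold are assumptions for the example): convolution outputs are recomputed only where their input window overlaps a changed pixel, and the change map is returned so subsequent layers can do the same.

```python
import numpy as np

def conv2d_full(img, k):
    """Dense valid-mode 2D convolution (reference implementation)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def conv2d_sparse_update(prev_img, prev_out, new_img, k, thresh=1e-6):
    """Recompute only the outputs whose receptive field touches a changed
    input pixel; also return the change map for the next layer."""
    kh, kw = k.shape
    change = np.abs(new_img - prev_img) > thresh
    out = prev_out.copy()
    affected = set()
    for y, x in zip(*np.nonzero(change)):
        # output (i, j) depends on input window [i, i+kh) x [j, j+kw)
        for i in range(max(0, y - kh + 1), min(out.shape[0], y + 1)):
            for j in range(max(0, x - kw + 1), min(out.shape[1], x + 1)):
                affected.add((i, j))
    for i, j in affected:
        out[i, j] = np.sum(new_img[i:i + kh, j:j + kw] * k)
    return out, change
```

For a single changed pixel and a 3x3 kernel, at most nine output positions are recomputed instead of the full map.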
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives to build deep learning architectures for common tasks in 3D learning such as convolution, pooling, attention, ray-tracing, meshing, etc. fVDB simultaneously provides a much larger feature set (primitives and operators) than established frameworks with no loss in efficiency: our operators match or exceed the performance of other frameworks with narrower scope. Furthermore, fVDB can process datasets with much larger footprint and spatial resolution than prior works, while providing a competitive memory footprint on small inputs. To achieve this combination of versatility and performance, fVDB relies on a single novel VDB index grid acceleration structure paired with several key innovations including GPU accelerated sparse grid construction, convolution using tensorcores, fast ray tracing kernels using a Hierarchical Digital Differential Analyzer algorithm (HDDA), and jagged tensors. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines, and we demonstrate its effectiveness on a number of representative tasks such as large-scale point-cloud segmentation, high resolution 3D generative modeling, unbounded scale Neural Radiance Fields, and large-scale point cloud reconstruction.
Updated: 2024-07-01 20:20:33
Domain: cs.CV,cs.GR,cs.LG
Multifidelity linear regression for scientific machine learning from scarce data
Machine learning (ML) methods, which fit to data the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the available budget for generating training data is limited, so that high-fidelity training data are scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context where data of varying fidelities and costs are available: for example, high-fidelity data may be generated by an expensive fully resolved physics simulation whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analysis of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves similar accuracy to the standard high-fidelity only approach with orders-of-magnitude reduced high-fidelity data requirements.
Updated: 2024-07-01 20:11:32
Domain: stat.ML,cs.CE,cs.LG
Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models
In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train the spectrograms directly with our proposed baseline models: a CNN-based model (CNN-baseline), an RNN-based model (RNN-baseline), and a C-RNN model (C-RNN baseline). The second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, ShuffleNet-V2, Swin-T, ConvNeXt-Tiny, GoogLeNet, MNASNet, and RegNet. In the third approach, we leverage the state-of-the-art audio pre-trained models Whisper, Seamless, SpeechBrain, and Pyannote to extract audio embeddings from the input spectrograms. Then, the audio embeddings are passed to a multilayer perceptron (MLP) model to detect fake or real audio samples. Finally, high-performing deep learning models from these approaches are fused to achieve the best performance. We evaluated our proposed models on the ASVspoof 2019 benchmark dataset. Our best ensemble model achieved an Equal Error Rate (EER) of 0.03, which is highly competitive with top-performing systems in the ASVspoof 2019 challenge. Experimental results also highlight the potential of selective spectrograms and deep learning approaches to enhance the task of audio deepfake detection.
Updated: 2024-07-01 20:10:43
Domain: cs.SD,cs.AI,eess.AS
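A minimal NumPy version of the first of these transforms, an STFT magnitude spectrogram, might look as follows (the Hann window, FFT size, and hop length are hypothetical choices; the paper additionally uses CQT, WT, and auditory filter banks, typically via audio libraries):

```python
import numpy as np

def stft_spectrogram(audio, n_fft=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.
    Returns an array of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a pure tone, the spectrogram's energy concentrates in the frequency bin nearest the tone, which is a quick sanity check for the frame/bin layout.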
Federated Binary Matrix Factorization using Proximal Optimization
Identifying informative components in binary data is an essential task in many research areas, including life sciences, social sciences, and recommendation systems. Boolean matrix factorization (BMF) is a family of methods that performs this task by efficiently factorizing the data. In real-world settings, the data is often distributed across stakeholders and required to stay private, prohibiting the straightforward application of BMF. To adapt BMF to this context, we approach the problem from a federated-learning perspective, while building on a state-of-the-art continuous binary matrix factorization relaxation to BMF that enables efficient gradient-based optimization. We propose to only share the relaxed component matrices, which are aggregated centrally using a proximal operator that regularizes for binary outcomes. We show the convergence of our federated proximal gradient descent algorithm and provide differential privacy guarantees. Our extensive empirical evaluation demonstrates that our algorithm outperforms, in terms of quality and efficacy, federation schemes of state-of-the-art BMF methods on a diverse set of real-world and synthetic data.
Updated: 2024-07-01 20:10:24
Domain: cs.LG,stat.ML
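The central aggregation step could be sketched as follows, assuming an illustrative binary-encouraging penalty lam*z*(1-z) whose proximal operator has a closed form (the paper's exact regularizer and its differential-privacy mechanism are not specified in the abstract):

```python
import numpy as np

def prox_binary(x, lam=0.25):
    """Proximal operator of the concave penalty lam * z * (1 - z) on [0, 1],
    i.e. argmin_z 0.5*(z - x)**2 + lam*z*(1 - z), clipped to the box.
    Setting the derivative to zero gives z = (x - lam) / (1 - 2*lam)."""
    assert lam < 0.5  # keeps the subproblem convex
    return np.clip((x - lam) / (1.0 - 2.0 * lam), 0.0, 1.0)

def aggregate(client_factors, lam=0.25):
    """Server step: average the clients' relaxed factor matrices,
    then apply the binary-regularizing prox elementwise."""
    return prox_binary(np.mean(client_factors, axis=0), lam=lam)
```

The prox leaves 0, 0.5, and 1 fixed and pushes other values toward the nearer of {0, 1}, which is the qualitative behavior a binary-outcome regularizer needs.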
SuperGaussian: Repurposing Video Models for 3D Super Resolution
We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io
Updated: 2024-07-01 20:01:40
Domain: cs.CV,cs.AI
Improving Trip Mode Choice Modeling Using Ensemble Synthesizer (ENSY)
Accurate classification of mode choice datasets is crucial for transportation planning and decision-making processes. However, conventional classification models often struggle to adequately capture the nuanced patterns of minority classes within these datasets, leading to sub-optimal accuracy. In response to this challenge, we present Ensemble Synthesizer (ENSY), a novel data model that leverages probability distributions for data augmentation, tailored specifically to enhance classification accuracy in mode choice datasets. In our study, ENSY demonstrates remarkable efficacy, nearly quadrupling the F1 score of minority classes and improving overall classification accuracy by nearly 3%. To assess its performance comprehensively, we compare ENSY against various augmentation techniques, including Random Oversampling, SMOTE-NC, and CTGAN. Through experimentation, ENSY consistently outperforms these methods across various scenarios, underscoring its robustness and effectiveness.
Updated: 2024-07-01 19:59:29
Domain: cs.LG,cs.AI
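A bare-bones stand-in for distribution-based minority-class augmentation (an independent per-feature Gaussian; ENSY's actual data model is not detailed in the abstract) could look like:

```python
import numpy as np

def gaussian_oversample(X_min, n_new, rng=None):
    """Fit an independent Gaussian to each feature of the minority class
    and draw synthetic rows from it (illustrative stand-in for
    probability-distribution-based augmentation)."""
    rng = np.random.default_rng(rng)
    mu = X_min.mean(axis=0)
    sd = X_min.std(axis=0, ddof=1)
    return rng.normal(mu, sd, size=(n_new, X_min.shape[1]))
```

The synthetic rows are then appended to the training set so the minority class is no longer underrepresented; a real system would also handle categorical features, which an independent Gaussian does not.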
Adversaries Can Misuse Combinations of Safe Models
Developers try to evaluate whether an AI system can be misused by adversaries before releasing it; for example, they might test whether a model enables cyberoffense, user manipulation, or bioterrorism. In this work, we show that individually testing models for misuse is inadequate; adversaries can misuse combinations of models even when each individual model is safe. The adversary accomplishes this by first decomposing tasks into subtasks, then solving each subtask with the best-suited model. For example, an adversary might solve challenging-but-benign subtasks with an aligned frontier model, and easy-but-malicious subtasks with a weaker misaligned model. We study two decomposition methods: manual decomposition where a human identifies a natural decomposition of a task, and automated decomposition where a weak model generates benign tasks for a frontier model to solve, then uses the solutions in-context to solve the original task. Using these decompositions, we empirically show that adversaries can create vulnerable code, explicit images, python scripts for hacking, and manipulative tweets at much higher rates with combinations of models than either individual model. Our work suggests that even perfectly-aligned frontier systems can enable misuse without ever producing malicious outputs, and that red-teaming efforts should extend beyond single models in isolation.
Updated: 2024-07-01 19:58:00
Domain: cs.CR,cs.AI,cs.LG
Predicting Trust Dynamics with Dynamic SEM in Human-AI Cooperation
Humans' trust in AI constitutes a pivotal element in fostering a synergistic relationship between humans and AI. This is particularly significant in the context of systems that leverage AI technology, such as autonomous driving systems and human-robot interaction. Trust facilitates appropriate utilization of these systems, thereby optimizing their potential benefits. If humans over-trust or under-trust an AI, serious problems such as misuse and accidents occur. To prevent over/under-trust, it is necessary to predict trust dynamics. However, trust is an internal state of humans and hard to observe directly. Therefore, we propose a prediction model for trust dynamics using dynamic structural equation modeling, an extension of SEM that can handle time-series data. A path diagram, which shows causal relations between variables, is developed in an exploratory way, and the resultant path diagram is optimized for effective path structures. Over/under-trust was predicted with 90% accuracy in a drone simulator task, and with 99% accuracy in an autonomous driving task. These results show that our proposed method outperforms conventional methods, including the autoregression family.
Updated: 2024-07-01 19:31:07
Domain: cs.HC,cs.AI
On the Generalization and Approximation Capacities of Neural Controlled Differential Equations
Neural Controlled Differential Equations (NCDEs) are a state-of-the-art tool for supervised learning with irregularly sampled time series (Kidger, 2020). However, no theoretical analysis of their performance has been provided yet, and it remains unclear in particular how the irregularity of the time series affects their predictions. By merging the rich theory of controlled differential equations (CDEs) and Lipschitz-based measures of the complexity of deep neural nets, we take a first step towards the theoretical understanding of NCDEs. Our first result is a generalization bound for this class of predictors that depends on the regularity of the time series data. Second, we leverage the continuity of the flow of CDEs to provide a detailed analysis of both the sampling-induced bias and the approximation bias. Regarding this last result, we show how classical approximation results on neural nets may transfer to NCDEs. Our theoretical results are validated through a series of experiments.
Updated: 2024-07-01 19:29:57
Domain: stat.ML,cs.LG
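The controlled-differential-equation dynamics underlying NCDEs can be sketched with an explicit Euler rollout (illustrative only; practical NCDEs use a learned vector field and adaptive ODE solvers, and interpolate the path between observations):

```python
import numpy as np

def ncde_forward(z0, X, f):
    """Explicit-Euler rollout of the controlled differential equation
    dz = f(z) dX, where f(z) returns a (hidden_dim, path_dim) matrix.

    X : array of shape (T, path_dim) holding the (possibly irregularly
        sampled) path observations; irregular spacing enters through
        the increments X[t+1] - X[t].
    """
    z = np.array(z0, dtype=float)
    for t in range(len(X) - 1):
        z = z + f(z) @ (X[t + 1] - X[t])
    return z
```

With a constant vector field f(z) = A, the increments telescope and the rollout is exact: z_T = z_0 + A (X_T - X_0), which makes a convenient unit test.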
Invariant Correlation of Representation with Label
The Invariant Risk Minimization (IRM) approach aims to address the challenge of domain generalization by training a feature representation that remains invariant across multiple environments. However, in noisy environments, IRM-related techniques such as IRMv1 and VREx may be unable to achieve the optimal IRM solution, primarily due to erroneous optimization directions. To address this issue, we introduce ICorr (short for Invariant Correlation), a novel approach designed to surmount the above challenge in noisy settings. Additionally, we present a case study analyzing why previous methods may lose ground while ICorr can succeed. Through a theoretical lens, particularly from a causality perspective, we illustrate that the invariant correlation of representation with label is a necessary condition for the optimal invariant predictor in noisy environments, whereas the optimization motivations for other methods may not be. Furthermore, we empirically demonstrate the effectiveness of ICorr by comparing it with other domain generalization methods on various noisy datasets.
Updated: 2024-07-01 19:27:28
Domain: cs.LG,cs.AI
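The "invariant correlation of representation with label" condition suggests a simple penalty, sketched here as the variance of per-environment Pearson correlations (an illustrative reading of the condition, not necessarily the ICorr objective itself):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def icorr_penalty(envs):
    """Variance across environments of corr(representation, label).
    envs: list of (phi, y) pairs, each a 1-D array per environment.
    Near zero when the correlation is environment-invariant."""
    corrs = np.array([pearson(phi, y) for phi, y in envs])
    return np.var(corrs)
```

A feature whose correlation with the label flips sign between environments (the classic spurious feature) receives a large penalty, while a stably correlated feature does not.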
Adaptive control of reaction-diffusion PDEs via neural operator-approximated gain kernels
Neural operator approximations of the gain kernels in PDE backstepping have emerged as a viable method for implementing controllers in real time. With such an approach, one approximates the gain kernel, which maps the plant coefficient into the solution of a PDE, with a neural operator. It is in adaptive control that the benefit of the neural operator is realized, as the kernel PDE solution needs to be computed online, for every updated estimate of the plant coefficient. We extend the neural operator methodology from adaptive control of a hyperbolic PDE to adaptive control of a benchmark parabolic PDE (a reaction-diffusion equation with a spatially-varying and unknown reaction coefficient). We prove global stability and asymptotic regulation of the plant state for a Lyapunov design of parameter adaptation. The key technical challenge of the result is handling the 2D nature of the gain kernels and proving that the target system, with two distinct sources of perturbation terms due to the parameter estimation error and the neural approximation error, is Lyapunov stable. To verify our theoretical result, we present simulations achieving calculation speedups of up to 45x relative to traditional finite difference solvers for every timestep in the simulation trajectory.
Updated: 2024-07-01 19:24:36
Domain: eess.SY,cs.AI,cs.LG,cs.SY,math.AP,math.DS
Detecting Edited Knowledge in Language Models
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved by a prompt from an edited model, the objective is to classify the knowledge as either unedited (based on the pre-training) or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Lastly, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.
Updated: 2024-07-01 19:20:58
Domain: cs.CL,cs.AI
Analyzing heterogeneity in Alzheimer Disease using multimodal normative modeling on imaging-based ATN biomarkers
INTRODUCTION: Previous studies have applied normative modeling on a single neuroimaging modality to investigate Alzheimer Disease (AD) heterogeneity. We employed a deep learning-based multimodal normative framework to analyze individual-level variation across ATN (amyloid-tau-neurodegeneration) imaging biomarkers. METHODS: We selected cross-sectional discovery (n = 665) and replication cohorts (n = 430) with available T1-weighted MRI, amyloid and tau PET. Normative modeling estimated individual-level abnormal deviations in amyloid-positive individuals compared to amyloid-negative controls. Regional abnormality patterns were mapped at different clinical group levels to assess intra-group heterogeneity. An individual-level disease severity index (DSI) was calculated using both the spatial extent and magnitude of abnormal deviations across ATN. RESULTS: Greater intra-group heterogeneity in ATN abnormality patterns was observed in more severe clinical stages of AD. Higher DSI was associated with worse cognitive function and increased risk of disease progression. DISCUSSION: Subject-specific abnormality maps across ATN reveal the heterogeneous impact of AD on the brain.
Updated: 2024-07-01 19:13:36
Domain: q-bio.NC,cs.LG
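An illustrative formulation of a deviation-based severity index combining spatial extent and magnitude (the paper's exact DSI definition may differ; the z-score normative model and 1.96 threshold are assumptions for the sketch):

```python
import numpy as np

def disease_severity_index(subject, controls, z_thresh=1.96):
    """Toy DSI: z-score a subject's regional ATN measures against a
    control cohort, then combine the spatial extent (fraction of
    abnormal regions) with the mean magnitude of abnormal deviations."""
    z = (subject - controls.mean(axis=0)) / controls.std(axis=0, ddof=1)
    abnormal = np.abs(z) > z_thresh
    extent = abnormal.mean()
    magnitude = np.abs(z[abnormal]).mean() if abnormal.any() else 0.0
    return extent * magnitude
```

A subject with many strongly deviating regions scores higher than one whose measurements look like the controls, which is the ordering the index is meant to capture.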
A Domain Decomposition-Based CNN-DNN Architecture for Model Parallel Training Applied to Image Recognition Problems
Deep neural networks (DNNs) and, in particular, convolutional neural networks (CNNs) have brought significant advances in a wide range of modern computer application problems. However, the increasing availability of large amounts of data as well as the increasing computational power of modern computers lead to a steady growth in the complexity and size of DNN and CNN models, and thus to longer training times. Hence, various methods and attempts have been developed to accelerate and parallelize the training of complex network architectures. In this work, a novel CNN-DNN architecture is proposed that naturally supports a model parallel training strategy and that is loosely inspired by two-level domain decomposition methods (DDM). First, local CNN models, that is, subnetworks, are defined that operate on overlapping or nonoverlapping parts of the input data, for example, sub-images. The subnetworks can be trained completely in parallel and independently of each other. Each subnetwork then outputs a local decision for the given machine learning problem which is exclusively based on the respective local input data. Subsequently, in a second step, an additional DNN model is trained which evaluates the local decisions of the local subnetworks and generates a final, global decision. In this paper, we apply the proposed architecture to image classification problems using CNNs. Experimental results are provided for different 2D image classification problems, a face recognition problem, and a classification problem for 3D computed tomography (CT) scans; classical ResNet and VGG architectures are considered. The results show that the proposed approach can significantly accelerate the required training time compared to the global model and, additionally, can also help to improve the accuracy of the underlying classification problem.
Updated: 2024-07-01 19:12:49
Domain: cs.LG,cs.CV,68T07, 68W10, 68W15, 65N55,I.2.6
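Structurally, the two-level decomposition can be sketched as independent subnetworks on sub-images plus a combiner (a toy forward pass only; in the paper the subnetworks are trained CNNs, the combiner is a trained DNN, and overlapping decompositions are also supported):

```python
import numpy as np

def split_quadrants(img):
    """Decompose an image into four nonoverlapping sub-images
    (the simplest two-level domain decomposition)."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def global_decision(img, subnets, combiner):
    """Each subnetwork scores its sub-image independently (so they can
    be trained fully in parallel); a second-level model then merges the
    local decisions into one global decision."""
    local = np.concatenate(
        [net(sub) for net, sub in zip(subnets, split_quadrants(img))]
    )
    return combiner(local)
```

Because each subnetwork sees only its own sub-image, the first-level training decouples completely across workers; only the small second-level combiner needs the assembled local decisions.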
Universal Quantum Tomography With Deep Neural Networks
Quantum state tomography is a crucial technique for characterizing the state of a quantum system, which is essential for many applications in quantum technologies. In recent years, there has been growing interest in leveraging neural networks to enhance the efficiency and accuracy of quantum state tomography. Still, many of these approaches did not cover mixed quantum states, even though pure states are arguably less common in practical situations. In this research paper, we present two neural-network-based approaches for both pure and mixed quantum state tomography, a Restricted Feature Based Neural Network and a Mixed States Conditional Generative Adversarial Network, and evaluate their effectiveness in comparison to existing neural-based methods. We demonstrate that our proposed methods can achieve state-of-the-art results in reconstructing mixed quantum states from experimental data. Our work highlights the potential of neural networks in revolutionizing quantum state tomography and facilitating the development of quantum technologies.
Updated: 2024-07-01 19:09:18
Domain: quant-ph,cs.AI
ACR: A Benchmark for Automatic Cohort Retrieval
Identifying patient cohorts is fundamental to numerous healthcare tasks, including clinical trial recruitment and retrospective studies. Current cohort retrieval methods in healthcare organizations rely on automated queries of structured data combined with manual curation, which are time-consuming, labor-intensive, and often yield low-quality results. Recent advancements in large language models (LLMs) and information retrieval (IR) offer promising avenues to revolutionize these systems. Major challenges include managing extensive eligibility criteria and handling the longitudinal nature of unstructured Electronic Medical Records (EMRs) while ensuring that the solution remains cost-effective for real-world application. This paper introduces a new task, Automatic Cohort Retrieval (ACR), and evaluates the performance of LLMs and commercial, domain-specific neuro-symbolic approaches. We provide a benchmark task, a query dataset, an EMR dataset, and an evaluation framework. Our findings underscore the necessity for efficient, high-quality ACR systems capable of longitudinal reasoning across extensive patient databases.
Updated: 2024-07-01 19:05:00
Domain: cs.AI
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
Updated: 2024-07-01 18:58:22
Subjects: cs.CL,cs.AI,cs.LG
Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets
Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose \textit{Entropic Optimal Transport (EOT) eigenmaps}, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align the datasets accordingly in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We then analyze a data-generative model where two observed high-dimensional datasets share latent variables on a common low-dimensional manifold, but each dataset is subject to data-specific translation, scaling, nuisance structures, and noise. We show that in a high-dimensional asymptotic regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables. Subsequently, we provide a geometric interpretation of our embedding by relating it to the eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.
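The two-stage recipe above (compute the entropic plan, then embed with its leading singular vectors) can be sketched in plain NumPy. This is an illustrative sketch only: the squared-Euclidean cost, the regularization level `eps`, the iteration count, and the convention of skipping the leading trivial singular pair are assumptions for the example, not the paper's exact choices.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=1.0, n_iter=1000):
    """Entropic OT plan between uniform empirical measures on X and Y,
    computed with plain Sinkhorn iterations."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    K = np.exp(-C / eps)
    u = np.full(len(X), 1.0 / len(X))   # uniform source marginal
    v = np.full(len(Y), 1.0 / len(Y))   # uniform target marginal
    a, b = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iter):
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]  # the EOT plan matrix

def eot_eigenmaps(X, Y, k=2, eps=1.0):
    """Joint k-dimensional embedding of both datasets from the leading
    non-trivial singular vectors of the EOT plan."""
    P = sinkhorn_plan(X, Y, eps)
    U, s, Vt = np.linalg.svd(P)
    # Skip the leading (trivial) pair, as with Laplacian eigenmaps.
    return U[:, 1:k + 1], Vt[1:k + 1, :].T
```

The row and column scalings `a`, `b` enforce the prescribed marginals, so the singular vectors of the plan capture only the shared cross-dataset structure.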
Updated: 2024-07-01 18:48:55
Subjects: stat.ML,cs.LG,math.ST,stat.TH
Optimal Design and Implementation of an Open-source Emulation Platform for User-Centric Shared E-mobility Services
With the rising concern over transportation emissions and pollution on a global scale, shared electric mobility services like E-cars, E-bikes, and E-scooters have emerged as promising solutions to mitigate these pressing challenges. However, existing shared E-mobility services exhibit critical design deficiencies, including insufficient service integration, imprecise energy consumption forecasting, limited scalability and geographical coverage, and a notable absence of a user-centric perspective, particularly in the context of multi-modal transportation. More importantly, there is no consolidated open-source platform which could benefit the E-mobility research community. This paper aims to bridge this gap by providing an open-source platform for shared E-mobility. The proposed platform, with an agent-in-the-loop approach and modular architecture, is tailored to diverse user preferences and offers enhanced customization. We demonstrate the viability of this platform by providing a comprehensive analysis of integrated multi-modal route optimization across diverse scenarios of energy availability, user preferences, and E-mobility tool placement, for which we use a modified Ant Colony Optimization algorithm, the so-called Multi-Model Energy Constrained ACO (MMEC-ACO), and a Q-Learning algorithm. Our findings demonstrate that Q-learning achieves significantly better performance in terms of travel time cost for more than 90\% of the instances compared to MMEC-ACO across different scenarios, including energy availability, user preference and E-mobility tool distribution. For a fixed (O, D) pair, the average execution time to reach the optimal time-cost solution is less than 2 seconds for MMEC-ACO, while Q-learning reaches an optimal time cost in 20 seconds on average. For a run-time of 2 seconds, Q-learning still achieves a better optimal time cost with a 20\% reduction over MMEC-ACO's time cost.
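The Q-learning side of the comparison can be illustrated with a tabular sketch on a small weighted graph, where the reward is the negative edge travel time. This is a generic sketch of the algorithm family, not the paper's implementation: the paper's state and reward design (energy constraints, modality choices, tool placement) is considerably richer, and all hyperparameters below are illustrative.

```python
import random

def q_learning_route(graph, origin, dest, episodes=2000,
                     alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning for minimum-time routing: graph[s][a] is the
    travel time of edge s -> a; reward is its negative."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in graph[s]} for s in graph}
    for _ in range(episodes):
        s = origin
        for _ in range(50):                      # cap episode length
            if s == dest:
                break
            acts = list(graph[s])
            # Epsilon-greedy action selection.
            a = rng.choice(acts) if rng.random() < epsilon else max(acts, key=Q[s].get)
            best_next = max(Q[a].values()) if Q[a] else 0.0
            Q[s][a] += alpha * (-graph[s][a] + gamma * best_next - Q[s][a])
            s = a
    # Greedy rollout of the learned policy.
    path, s = [origin], origin
    while s != dest and len(path) <= len(graph):
        s = max(graph[s], key=Q[s].get)
        path.append(s)
    return path
```

On a toy graph with a short route O-A-D (cost 2) and a long route O-B-D (cost 6), the learned greedy policy recovers the shorter route.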
Updated: 2024-07-01 18:46:52
Subjects: cs.AI
Optimized Learning for X-Ray Image Classification for Multi-Class Disease Diagnoses with Accelerated Computing Strategies
The core of X-ray image-based disease diagnosis lies in ensuring precise identification of afflictions within the sample, a task fraught with challenges stemming from the occurrence of false positives and false negatives. False positives introduce the risk of erroneously identifying non-existent conditions, leading to misdiagnosis and a decline in patient care quality. Conversely, false negatives pose the threat of overlooking genuine abnormalities, potentially causing delays in treatment and interventions, thereby resulting in adverse patient outcomes. The urgency to overcome these challenges compels ongoing efforts to elevate the precision and reliability of X-ray image analysis algorithms within the computational framework. This study introduces modified pre-trained ResNet models tailored for multi-class disease diagnosis of X-ray images, incorporating advanced optimization strategies to reduce the execution runtime of training and inference tasks. The primary objective is to achieve tangible performance improvements through accelerated implementations of PyTorch, CUDA, Mixed-Precision Training, and Learning Rate Scheduler. While outcomes demonstrate substantial improvements in execution runtimes between normal training and CUDA-accelerated training, negligible differences emerge between various training optimization modalities. This research marks a significant advancement in optimizing computational approaches to reduce training execution time for larger models. Additionally, we explore the potential of effective parallel data processing using MPI4Py for the distribution of gradient descent optimization across multiple nodes and leverage multiprocessing to expedite data preprocessing for larger datasets.
Updated: 2024-07-01 18:31:30
Subjects: cs.CV,cs.AI
Weight Clipping for Deep Continual and Reinforcement Learning
Many failures in deep continual and reinforcement learning are associated with increasing magnitudes of the weights, making them hard to change and potentially causing overfitting. While many methods address these learning failures, they often change the optimizer or the architecture, a complexity that hinders widespread adoption in various systems. In this paper, we focus on learning failures that are associated with increasing weight norm and we propose a simple technique that can be easily added on top of existing learning systems: clipping neural network weights to limit them to a specific range. We study the effectiveness of weight clipping in a series of supervised and reinforcement learning experiments. Our empirical results highlight the benefits of weight clipping for generalization, addressing loss of plasticity and policy collapse, and facilitating learning with a large replay ratio.
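The technique is deliberately simple enough to bolt onto any training loop. A minimal NumPy sketch (the clipping bound below is an arbitrary illustrative value, not one taken from the paper):

```python
import numpy as np

def clip_weights(weights, bound=2.0):
    """Project every weight tensor back into [-bound, bound] after each
    optimizer step, bounding the weight magnitudes without changing the
    optimizer or the architecture."""
    return [np.clip(w, -bound, bound) for w in weights]

# Schematic usage inside a training loop:
#   weights = optimizer_step(weights, grads)
#   weights = clip_weights(weights, bound=2.0)
```

Because the projection is applied after the optimizer step, it composes with any update rule.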
Updated: 2024-07-01 18:29:29
Subjects: cs.LG,cs.AI
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated observations of the data. Despite this apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely seen tokens. Hence, we study $\textbf{structural in-context learning}$, which we define as the ability of a model to execute in-context learning on arbitrary tokens -- so called because the model must generalize on the basis of e.g. sentence structure or task structure, rather than semantic content encoded in token embeddings. An ideal model would be able to do both: flexibly deploy in-weights operations (in order to robustly accommodate ambiguous or unknown contexts using encoded semantic information) and structural in-context operations (in order to accommodate novel tokens). We study structural in-context algorithms in a simple part-of-speech setting using both practical and toy models. We find that active forgetting, a technique that was recently introduced to help models generalize to new languages, forces models to adopt structural in-context learning solutions. Finally, we introduce $\textbf{temporary forgetting}$, a straightforward extension of active forgetting that enables one to control how much a model relies on in-weights vs. in-context solutions. Importantly, temporary forgetting allows us to induce a $\textit{dual process strategy}$ where in-context and in-weights solutions coexist within a single model.
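Active forgetting amounts to periodically re-initializing the token-embedding matrix during training, and temporary forgetting simply stops the resets after a chosen step. The sketch below is a hypothetical parameterization for illustration: the reset interval, initialization scale, and function name are assumptions, not values from the paper.

```python
import numpy as np

def forgetting_step(embedding, step, reset_every=1000, stop_after=None, seed=0):
    """Active forgetting: re-initialize the token-embedding matrix every
    `reset_every` steps so the network cannot lean on memorized token
    semantics. Passing `stop_after` gives the temporary-forgetting
    variant: resets cease after that step, letting in-weights knowledge
    re-form alongside the structural in-context solution."""
    if step % reset_every == 0 and (stop_after is None or step <= stop_after):
        rng = np.random.default_rng(seed + step)
        return rng.normal(0.0, 0.02, size=embedding.shape)
    return embedding
```

Called once per training step on the embedding table, this leaves all other parameters untouched, which is what forces the rest of the network toward token-agnostic, structural solutions.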
Updated: 2024-07-01 18:23:43
Subjects: cs.CL,cs.LG
Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation
The alarming decline in global biodiversity, driven by various factors, underscores the urgent need for large-scale wildlife monitoring. In response, scientists have turned to automated deep learning methods for data processing in wildlife monitoring. However, applying these advanced methods in real-world scenarios is challenging due to their complexity and the need for specialized knowledge, primarily because of technical challenges and interdisciplinary barriers. To address these challenges, we introduce Pytorch-Wildlife, an open-source deep learning platform built on PyTorch. It is designed for creating, modifying, and sharing powerful AI models. This platform emphasizes usability and accessibility, making it accessible to individuals with limited or no technical background. It also offers a modular codebase to simplify feature expansion and further development. Pytorch-Wildlife offers an intuitive, user-friendly interface, accessible through local installation or Hugging Face, for animal detection and classification in images and videos. As two real-world applications, Pytorch-Wildlife has been utilized to train animal classification models for species recognition in the Amazon Rainforest and for invasive opossum recognition in the Galapagos Islands. The Opossum model achieves 98% accuracy, and the Amazon model has 92% recognition accuracy for 36 animals in 90% of the data. As Pytorch-Wildlife evolves, we aim to integrate more conservation tasks, addressing various environmental challenges. Pytorch-Wildlife is available at https://github.com/microsoft/CameraTraps.
Updated: 2024-07-01 18:22:38
Subjects: cs.CV,cs.LG
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models
The term "Reversal Curse" refers to the scenario where auto-regressive decoder large language models (LLMs), such as ChatGPT, trained on "A is B" fail to learn "B is A," assuming that B and A are distinct and can be uniquely identified from each other, demonstrating a basic failure of logical deduction. This raises a red flag in the use of GPT models for certain general tasks such as constructing knowledge graphs, considering their adherence to this symmetric principle. In our study, we examined a bidirectional LLM, BERT, and found that it is immune to the reversal curse. Driven by ongoing efforts to construct biomedical knowledge graphs with LLMs, we also embarked on evaluating more complex but essential deductive reasoning capabilities. This process included first training encoder and decoder language models to master the intersection and union operations on two sets and then moving on to assess their capability to infer different combinations of union and intersection operations on three newly created sets. The findings showed that while both encoder and decoder language models, trained for tasks involving two sets (union/intersection), were proficient in such scenarios, they encountered difficulties when dealing with operations that included three sets (various combinations of union and intersection). Our research highlights the distinct characteristics of encoder and decoder models in simple and complex logical reasoning. In practice, the choice between BERT and GPT should be guided by the specific requirements and nature of the task at hand, leveraging their respective strengths in bidirectional context comprehension and sequence prediction.
Updated: 2024-07-01 18:13:24
Subjects: cs.CL,cs.AI,cs.LG
NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias mitigation methods in NLP aim for comparable performance across different groups based on attributes like gender or race but fail to address the underlying issue of reliance on protected attributes. To partly fix that, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, producing a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. NLPGuard is applied to three classification tasks: identifying toxic language, sentiment analysis, and occupation classification. Our evaluation shows that current NLP classifiers heavily depend on protected attributes, with up to $23\%$ of the most predictive words associated with these attributes. However, NLPGuard effectively reduces this reliance by up to $79\%$, while slightly improving accuracy.
Updated: 2024-07-01 18:08:17
Subjects: cs.CL,cs.AI,cs.HC
A Survey on Safe Multi-Modal Learning System
In the rapidly evolving landscape of artificial intelligence, multimodal learning systems (MMLS) have gained traction for their ability to process and integrate information from diverse modality inputs. Their expanding use in vital sectors such as healthcare has made safety assurance a critical concern. However, the absence of systematic research into their safety is a significant barrier to progress in this field. To bridge the gap, we present the first taxonomy that systematically categorizes and assesses MMLS safety. This taxonomy is structured around four fundamental pillars that are critical to ensuring the safety of MMLS: robustness, alignment, monitoring, and controllability. Leveraging this taxonomy, we review existing methodologies, benchmarks, and the current state of research, while also pinpointing the principal limitations and gaps in knowledge. Finally, we discuss unique challenges in MMLS safety. In illuminating these challenges, we aim to pave the way for future research, proposing potential directions that could lead to significant advancements in the safety protocols of MMLS.
Updated: 2024-07-01 18:03:26
Subjects: cs.CY,cs.AI
Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning
Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generalization or rely on shallow heuristics when given CoT prompts. To understand the factors influencing CoT reasoning we provide a detailed case study of the symbolic reasoning task of decoding shift ciphers, where letters are shifted forward some number of steps in the alphabet. GPT-4 achieves zero accuracy on most shift ciphers with standard prompting, but with CoT its accuracy improves to an average of 32%. By focusing on a single relatively simple task, we are able to identify three factors that systematically affect CoT performance: the probability of the task's expected output (probability), what the model has implicitly learned during pre-training (memorization), and the number of intermediate operations involved in reasoning (noisy reasoning). We show that these factors can drastically influence the task accuracy; e.g., varying the output's probability of occurrence can shift accuracy from 26% to 70%. We also demonstrate that it is essential for the model to explicitly produce intermediate steps as output that can be conditioned on to increase the probability of the correct answer. Our experiments indicate that as long as the model does so, the validity of the demonstrations in the prompt does not matter. Overall, we conclude that CoT prompting performance reflects both memorization and a probabilistic version of genuine reasoning.
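The underlying task is easy to state in code: a shift cipher moves each letter forward k positions in the alphabet, and decoding moves it back. A small reference decoder:

```python
def shift_decode(text, k):
    """Decode a shift (Caesar) cipher by moving each letter back k steps,
    wrapping around the alphabet and leaving non-letters unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)
```

For example, `shift_decode("ifmmp", 1)` returns `"hello"`. The study's probability and memorization factors concern which shift values and which output strings the model has encountered often during pre-training.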
Updated: 2024-07-01 18:01:07
Subjects: cs.CL,cs.AI
Everything that can be learned about a causal structure with latent variables by observational and interventional probing schemes
What types of differences among causal structures with latent variables are impossible to distinguish by statistical data obtained by probing each visible variable? If the probing scheme is simply passive observation, then it is well-known that many different causal structures can realize the same joint probability distributions. Even for the simplest case of two visible variables, for instance, one cannot distinguish between one variable being a causal parent of the other and the two variables sharing a latent common cause. However, it is possible to distinguish between these two causal structures if we have recourse to more powerful probing schemes, such as the possibility of intervening on one of the variables and observing the other. Herein, we address the question of which causal structures remain indistinguishable even given the most informative types of probing schemes on the visible variables. We find that two causal structures remain indistinguishable if and only if they are both associated with the same mDAG structure (as defined by Evans (2016)). We also consider the question of when one causal structure dominates another in the sense that it can realize all of the joint probability distributions that can be realized by the other using a given probing scheme. (Equivalence of causal structures is the special case of mutual dominance.) Finally, we investigate to what extent one can weaken the probing schemes implemented on the visible variables and still have the same discrimination power as a maximally informative probing scheme.
Updated: 2024-07-01 18:01:07
Subjects: stat.ML,cs.LG,quant-ph
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the burden of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulations and real world show that SDP 1) excels in multitask scenarios with negligible increases in active parameters, 2) prevents forgetting in continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and codes can be found in https://forrest-110.github.io/sparse_diffusion_policy/.
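The sparse-activation idea at the heart of SDP can be sketched as top-k Mixture-of-Experts routing: only the k best-scoring experts run for a given input, and their outputs are mixed by softmax gate weights. This is a generic sketch of that mechanism, not the paper's model; SDP routes inside transformer blocks of a diffusion policy, and the function and parameter names here are illustrative.

```python
import numpy as np

def sparse_moe(x, gate_W, experts, k=2):
    """Route input x to its top-k experts only, mixing their outputs by
    softmax-normalized gate scores; the remaining experts stay inactive,
    keeping the count of active parameters low."""
    scores = x @ gate_W                    # one gate score per expert
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

Reusing experts across tasks then amounts to learning a new gate while keeping the expert functions fixed.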
Updated: 2024-07-01 17:59:56
Subjects: cs.RO,cs.LG
On the Abuse and Detection of Polyglot Files
A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding $30$ polyglot samples and $15$ attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on adversary techniques. We then trained a machine learning detection solution, PolyConv, using this data set. PolyConv achieves a precision-recall area-under-curve score of $0.999$ with an F1 score of $99.20$% for polyglot detection and $99.47$% for file-format identification, significantly outperforming all other tools tested. We developed a content disarmament and reconstruction tool, ImSan, that successfully sanitized $100$% of the tested image-based polyglots, which were the most common type found via the survey. Our work provides concrete tools and suggestions to enable defenders to better defend themselves against polyglot files, as well as directions for future work to create more robust file specifications and methods of disarmament.
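Why one byte stream can satisfy several parsers at once is easy to demonstrate: formats like ZIP and PDF tolerate their signature appearing at an offset, so a file can carry more than one valid header. The naive screen below is a hypothetical illustration only; real detectors (and the paper's PolyConv) are far more thorough, and `MAGICS`/`candidate_formats` are names invented for this sketch.

```python
# A few well-known file signatures ("magic bytes").
MAGICS = {
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"PK\x03\x04": "zip",
}

def candidate_formats(data: bytes):
    """Report every known magic sequence found anywhere in the file; two
    or more hits is a hint (not proof) that the file may be a polyglot."""
    return sorted({fmt for magic, fmt in MAGICS.items() if magic in data})
```

A detector that only checks the first bytes would report exactly one format here, which is precisely the routing weakness polyglots exploit.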
Updated: 2024-07-01 17:59:54
Subjects: cs.CR,cs.LG
Scalable Nested Optimization for Deep Learning
Gradient-based optimization has been critical to the success of machine learning, updating a single set of parameters to minimize a single loss. A growing number of applications rely on a generalization of this, where we have a bilevel or nested optimization of which subsets of parameters update on different objectives nested inside each other. We focus on motivating examples of hyperparameter optimization and generative adversarial networks. However, naively applying classical methods often fails when we look at solving these nested problems on a large scale. In this thesis, we build tools for nested optimization that scale to deep learning setups.
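The nested structure can be seen in a toy bilevel problem: an inner loop fits a parameter to a hyperparameter, and an outer loop tunes the hyperparameter through the inner solution. This is purely illustrative of the problem class the thesis addresses, not any of its methods; the simple hypergradient below relies on the toy fact that the inner optimum satisfies w* = h, so dw*/dh = 1.

```python
def bilevel_toy(outer_steps=200, inner_steps=20, lr=0.2):
    """Toy nested optimization: inner loss (w - h)^2 fits w to the
    hyperparameter h; outer loss (w* - 3)^2 tunes h so the fitted w
    hits the target 3."""
    h, w = 0.0, 0.0
    for _ in range(outer_steps):
        for _ in range(inner_steps):       # inner problem: min_w (w - h)^2
            w -= lr * 2.0 * (w - h)
        h -= lr * 2.0 * (w - 3.0)          # hypergradient, using dw*/dh = 1
    return h, w
```

In realistic settings w* has no closed form and dw*/dh must be approximated (e.g., by unrolling or implicit differentiation), which is where scaling becomes the hard part.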
Updated: 2024-07-01 17:59:41
Subjects: cs.LG,cs.AI,cs.NE,math.OC,stat.ML,68T05,I.2.6; I.2.8; I.5.1; G.1.6
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
Diffusion models have recently achieved success in solving Bayesian inverse problems with learned data priors. Current methods build on top of the diffusion sampling process, where each denoising step makes small modifications to samples from the previous step. However, this process struggles to correct errors from earlier sampling steps, leading to worse performance in complicated nonlinear inverse problems, such as phase retrieval. To address this challenge, we propose a new method called Decoupled Annealing Posterior Sampling (DAPS) that relies on a novel noise annealing process. Specifically, we decouple consecutive steps in a diffusion sampling trajectory, allowing them to vary considerably from one another while ensuring their time-marginals anneal to the true posterior as we reduce noise levels. This approach enables the exploration of a larger solution space, improving the success rate for accurate reconstructions. We demonstrate that DAPS significantly improves sample quality and stability across multiple image restoration tasks, particularly in complicated nonlinear inverse problems. For example, we achieve a PSNR of 30.72dB on the FFHQ 256 dataset for phase retrieval, which is an improvement of 9.12dB compared to existing methods.
Updated: 2024-07-01 17:59:23
Subjects: cs.LG,cs.AI,cs.CV
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
The task of open-set domain generalization (OSDG) involves recognizing novel classes within unseen domains, which becomes more challenging with multiple modalities as input. Existing works have only addressed unimodal OSDG within the meta-learning framework, without considering multimodal scenarios. In this work, we introduce a novel approach to address Multimodal Open-Set Domain Generalization (MM-OSDG) for the first time, utilizing self-supervision. To this end, we introduce two innovative multimodal self-supervised pretext tasks: Masked Cross-modal Translation and Multimodal Jigsaw Puzzles. These tasks facilitate the learning of multimodal representative features, thereby enhancing generalization and open-class detection capabilities. Additionally, we propose a novel entropy weighting mechanism to balance the loss across different modalities. Furthermore, we extend our approach to tackle also the Multimodal Open-Set Domain Adaptation (MM-OSDA) problem, especially in scenarios where unlabeled data from the target domain is available. Extensive experiments conducted under MM-OSDG, MM-OSDA, and Multimodal Closed-Set DG settings on the EPIC-Kitchens and HAC datasets demonstrate the efficacy and versatility of the proposed approach. Our source code is available at https://github.com/donghao51/MOOSA.
Updated: 2024-07-01 17:59:09
Domain: cs.CV,cs.AI,cs.LG
Centerline Boundary Dice Loss for Vascular Segmentation
Vascular segmentation in medical imaging plays a crucial role in analysing morphological and functional assessments. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger vessels. Addressing these challenges, we introduce the centerline boundary Dice (cbDice) loss function, which harmonizes topological integrity and geometric nuances, ensuring consistent segmentation across various vessel sizes. cbDice enriches the clDice approach by including boundary-aware aspects, thereby improving geometric detail recognition. It matches the performance of the boundary difference over union (B-DoU) loss through a mask-distance-based approach, enhancing translation sensitivity. Crucially, cbDice incorporates radius information from vascular skeletons, enabling uniform adaptation to vascular diameter changes and maintaining balance in branch growth and fracture impacts. Furthermore, we conducted a theoretical analysis of clDice variants (cl-X-Dice). We validated cbDice's efficacy on three diverse vascular segmentation datasets, encompassing both 2D and 3D, and binary and multi-class segmentation. Particularly, the method integrated with cbDice demonstrated outstanding performance on the MICCAI 2023 TopCoW Challenge dataset. Our code is made publicly available at: https://github.com/PengchengShi1220/cbDice.
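For context, here is a minimal NumPy sketch of the standard soft Dice and of the clDice overlap that cbDice extends. Skeletons are assumed precomputed and passed in; this is not the paper's cbDice formulation, which additionally weights by boundary distance and vessel radius.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Standard soft Dice overlap between binary/probabilistic masks."""
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def cl_dice(pred, target, skel_pred, skel_target, eps=1e-7):
    """Centerline Dice (clDice): topology-aware overlap computed between
    each mask and the *other* mask's skeleton (centerline)."""
    tprec = ((skel_pred * target).sum() + eps) / (skel_pred.sum() + eps)
    tsens = ((skel_target * pred).sum() + eps) / (skel_target.sum() + eps)
    return 2 * tprec * tsens / (tprec + tsens)
```

Both scores reach 1 for a perfect prediction; a loss would typically be `1 - score` or a weighted combination of the two.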
Updated: 2024-07-01 17:58:44
Domain: eess.IV,cs.CV,cs.LG
Neural Distributed Source Coding
Distributed source coding (DSC) is the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. While there is vast prior work on this topic, practical DSC has been limited to synthetic datasets and specific correlation structures. Here we present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source modeling, our method utilizes a conditional Vector-Quantized Variational Autoencoder (VQ-VAE) to learn the distributed encoder and decoder. We evaluate our method on multiple datasets and show that our method can handle complex correlations and achieves state-of-the-art PSNR. Our code is made available at https://github.com/acnagle/neural-dsc.
Updated: 2024-07-01 17:56:00
Domain: cs.IT,cs.LG,math.IT
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system Open-TeleVision that allows operators to actively perceive the robot's surroundings in a stereoscopic manner. Additionally, the system mirrors the operator's arm and hand movements on the robot, creating an immersive experience as if the operator's mind is transmitted to a robot embodiment. We validate the effectiveness of our system by collecting data and training imitation learning policies on four long-horizon, precise tasks (Can Sorting, Can Insertion, Folding, and Unloading) for 2 different humanoid robots and deploy them in the real world. The system is open-sourced at: https://robot-tv.github.io/
Updated: 2024-07-01 17:55:35
Domain: cs.RO,cs.HC,cs.LG
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 100 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.26%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.
Updated: 2024-07-01 17:55:04
Domain: cs.AI
Self-Cognition in Large Language Models: An Exploratory Study
While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify LLMs' self-cognition. Our study reveals that 4 of the 48 models on Chatbot Arena--specifically Command R, Claude3-Opus, Llama-3-70b-Instruct, and Reka-core--demonstrate some level of detectable self-cognition. We observe a positive correlation between model size, training data quality, and self-cognition level. Additionally, we also explore the utility and trustworthiness of LLM in the self-cognition state, revealing that the self-cognition state enhances some specific tasks such as creative writing and exaggeration. We believe that our work can serve as an inspiration for further research to study the self-cognition in LLMs.
Updated: 2024-07-01 17:52:05
Domain: cs.CL,cs.AI
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration
In multi-objective optimization, set-based quality indicators are a cornerstone of benchmarking and performance assessment. They capture the quality of a set of trade-off solutions by reducing it to a scalar number. One of the most commonly used set-based metrics is the R2 indicator, which describes the expected utility of a solution set to a decision-maker under a distribution of utility functions. Typically, this indicator is applied by discretizing this distribution of utility functions, yielding a weakly Pareto-compliant indicator. In consequence, adding a nondominated or dominating solution to a solution set may - but does not have to - improve the indicator's value. In this paper, we reinvestigate the R2 indicator under the premise that we have a continuous, uniform distribution of (Tchebycheff) utility functions. We analyze its properties in detail, demonstrating that this continuous variant is indeed Pareto-compliant - that is, any beneficial solution will improve the metric's value. Additionally, we provide an efficient computational procedure to compute this metric for bi-objective problems in $\mathcal O (N \log N)$. As a result, this work contributes to the state-of-the-art Pareto-compliant unary performance metrics, such as the hypervolume indicator, offering an efficient and promising alternative.
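A minimal sketch of the conventional discretized R2 indicator with Tchebycheff utilities, for minimization problems (the paper's contribution is the continuous-weight variant, which this finite-weight version only approximates and which is only weakly Pareto-compliant):

```python
import numpy as np

def r2_indicator(points, weights, ideal):
    """Discretized (weakly Pareto-compliant) R2 indicator for minimization.

    For each weight vector, take the best (lowest) Tchebycheff utility any
    solution achieves, then average over the weight set; lower is better.
    The paper replaces the finite weight set with a continuous uniform
    distribution, which makes the indicator fully Pareto-compliant.
    """
    points = np.asarray(points, float) - np.asarray(ideal, float)
    weights = np.asarray(weights, float)
    # utils[w, n] = max_i  weights[w, i] * |points[n, i] - ideal[i]|
    utils = np.max(weights[:, None, :] * points[None, :, :], axis=2)
    return np.mean(np.min(utils, axis=1))
```

Adding a dominating solution can only lower (improve) this value; the paper shows that with continuous weights every beneficial solution strictly improves it.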
Updated: 2024-07-01 17:50:44
Domain: math.OC,cs.AI
AI Agents That Matter
AI agents are an exciting new research direction, and agent development is driven by benchmarks. Our analysis of current agent benchmarks and evaluation practices reveals several shortcomings that hinder their usefulness in real-world applications. First, there is a narrow focus on accuracy without attention to other metrics. As a result, SOTA agents are needlessly complex and costly, and the community has reached mistaken conclusions about the sources of accuracy gains. Our focus on cost in addition to accuracy motivates the new goal of jointly optimizing the two metrics. We design and implement one such optimization, showing its potential to greatly reduce cost while maintaining accuracy. Second, the benchmarking needs of model and downstream developers have been conflated, making it hard to identify which agent would be best suited for a particular application. Third, many agent benchmarks have inadequate holdout sets, and sometimes none at all. This has led to agents that are fragile because they take shortcuts and overfit to the benchmark in various ways. We prescribe a principled framework for avoiding overfitting. Finally, there is a lack of standardization in evaluation practices, leading to a pervasive lack of reproducibility. We hope that the steps we introduce for addressing these shortcomings will spur the development of agents that are useful in the real world and not just accurate on benchmarks.
Updated: 2024-07-01 17:48:14
Domain: cs.LG,cs.AI
Online Learning of Temporal Dependencies for Sustainable Foraging Problem
The sustainable foraging problem is a dynamic environment testbed for exploring the forms of agent cognition in dealing with social dilemmas in a multi-agent setting. The agents need to resist the temptation of individual rewards through foraging and choose the collective long-term goal of sustainability. We investigate methods of online learning in Neuro-Evolution and Deep Recurrent Q-Networks to enable agents to attempt the problem one-shot as is often required by wicked social problems. We further explore if learning temporal dependencies with Long Short-Term Memory may be able to aid the agents in developing sustainable foraging strategies in the long term. It was found that the integration of Long Short-Term Memory assisted agents in developing sustainable strategies for a single agent, however failed to assist agents in managing the social dilemma that arises in the multi-agent scenario.
Updated: 2024-07-01 17:47:31
Domain: cs.MA,cs.LG,cs.NE
Adam-mini: Use Fewer Learning Rates To Gain More
We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$). We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block. We further find that, for each of these parameter blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. We then provide one cost-effective way to find good learning rates and propose Adam-mini. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on $2\times$ A800-80GB GPUs, which saves 33% wall-clock time for pre-training.
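The core idea can be sketched in NumPy: keep Adam's per-parameter first moment, but maintain only a single second-moment scalar per parameter block. The block partitioning, function name, and hyperparameters below are illustrative assumptions, not the released implementation.

```python
import numpy as np

def adam_mini_step(params, grads, state, blocks, lr=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One sketch step of an Adam-mini-style update: a full per-parameter
    first moment m, but a *single* second-moment scalar per parameter block
    (here: the EMA of the block's mean squared gradient), replacing Adam's
    per-coordinate v."""
    state.setdefault("m", np.zeros_like(params))
    state.setdefault("v", np.zeros(len(blocks)))
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grads
    for b, idx in enumerate(blocks):
        g2 = np.mean(grads[idx] ** 2)               # one statistic per block
        state["v"][b] = beta2 * state["v"][b] + (1 - beta2) * g2
        m_hat = state["m"][idx] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][b] / (1 - beta2 ** t)
        params[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params
```

The memory saving comes from `state["v"]` having one entry per block rather than one per parameter; the paper's contribution is choosing the blocks according to the Hessian structure so that a single rate per block suffices.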
Updated: 2024-07-01 17:46:19
Domain: cs.LG,cs.AI
Pictures Of MIDI: Controlled Music Generation via Graphical Prompts for Image-Based Diffusion Inpainting
Recent years have witnessed significant progress in generative models for music, featuring diverse architectures that balance output quality, diversity, speed, and user control. This study explores a user-friendly graphical interface enabling the drawing of masked regions for inpainting by an Hourglass Diffusion Transformer (HDiT) model trained on MIDI piano roll images. To enhance note generation in specified areas, masked regions can be "repainted" with extra noise. The non-latent HDiT's linear scaling with pixel count allows efficient generation in pixel space, providing intuitive and interpretable controls such as masking throughout the network and removing the need to operate in compressed latent spaces such as those provided by pretrained autoencoders. We demonstrate that, in addition to inpainting of melodies, accompaniment, and continuations, the use of repainting can help increase note density, yielding musical structures closely matching user specifications such as rising, falling, or diverging melody and/or accompaniment, even when these lie outside the typical training data distribution. We achieve performance on par with prior results while operating at longer context windows, with no autoencoder, and can enable complex geometries for inpainting masks, increasing the options for machine-assisted composers to control the generated music.
Updated: 2024-07-01 17:43:45
Domain: cs.SD,cs.LG,eess.AS,J.5; I.2.0; I.4.0
Fast Iterative Solver For Neural Network Method: II. 1D Diffusion-Reaction Problems And Data Fitting
This paper expands the damped block Newton (dBN) method introduced recently in [4] for 1D diffusion-reaction equations and least-squares data fitting problems. To determine the linear parameters (the weights and bias of the output layer) of the neural network (NN), the dBN method requires solving systems of linear equations involving the mass matrix. While the mass matrix for local hat basis functions is tri-diagonal and well-conditioned, the mass matrix for NNs is dense and ill-conditioned. For example, the condition number of the NN mass matrix for quasi-uniform meshes is at least ${\cal O}(n^4)$. We present a factorization of the mass matrix that enables solving the systems of linear equations in ${\cal O}(n)$ operations. To determine the non-linear parameters (the weights and bias of the hidden layer), one step of a damped Newton method is employed at each iteration. A Gauss-Newton method is used in place of Newton for the instances in which the Hessian matrices are singular. This modified dBN is referred to as dBGN. For both methods, the computational cost per iteration is ${\cal O}(n)$. Numerical results demonstrate the ability of dBN and dBGN to efficiently achieve accurate results and outperform BFGS for select examples.
Updated: 2024-07-01 17:42:29
Domain: math.NA,cs.LG,cs.NA,65K10, 65F05
RegMix: Data Mixture as Regression for Language Model Pre-training
The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance given their respective mixtures. With the fitted regression model, we simulate the top-ranked mixture and use it to train a large-scale model with orders of magnitude more compute. To empirically validate RegMix, we train 512 models with 1M parameters for 1B tokens of different mixtures to fit the regression model and find the optimal mixture. Using this mixture we train a 1B parameter model for 25B tokens (i.e. 1000x larger and 25x longer) which we find performs best among 64 candidate 1B parameter models with other mixtures. Further, our method demonstrates superior performance compared to human selection and achieves results that match or surpass DoReMi, while utilizing only 10% of the compute budget. Our experiments also show that (1) Data mixtures significantly impact performance with single-task performance variations of up to 14.6%; (2) Web corpora rather than data perceived as high-quality like Wikipedia have the strongest positive correlation with downstream performance; (3) Domains interact in complex ways often contradicting common sense, thus automatic approaches like RegMix are needed; (4) Data mixture effects transcend scaling laws, and our approach captures the complexity by considering all domains together. Our code is available at https://github.com/sail-sg/regmix.
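The recipe reduces to a few lines: fit a regressor from the mixture weights of small proxy runs to their observed loss, then rank unseen candidate mixtures by predicted loss. The linear least-squares model below is a stand-in of our own choosing; the paper uses stronger regressors.

```python
import numpy as np

def regmix_select(mixtures, losses, candidates):
    """RegMix-style sketch: fit loss ~ w . mixture + b on small proxy runs,
    then rank candidate mixtures by predicted loss (lower is better).
    Returns the index of the best candidate and all predictions."""
    X = np.hstack([np.asarray(mixtures, float),
                   np.ones((len(mixtures), 1))])     # add bias column
    coef, *_ = np.linalg.lstsq(X, np.asarray(losses, float), rcond=None)
    C = np.hstack([np.asarray(candidates, float),
                   np.ones((len(candidates), 1))])
    preds = C @ coef
    return int(np.argmin(preds)), preds
```

The winning mixture would then be used to train the large-scale model, as in the paper's 1M-parameter-to-1B-parameter extrapolation.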
Updated: 2024-07-01 17:31:03
Domain: cs.CL,cs.AI
Large Language Models Assume People are More Rational than We Really are
In order for AI systems to communicate effectively with people, they must understand how we make decisions. However, people's decisions are not always rational, so the implicit internal models of human decision-making in Large Language Models (LLMs) must account for this. Previous empirical evidence seems to suggest that these implicit models are accurate -- LLMs offer believable proxies of human behavior, acting how we expect humans would in everyday interactions. However, by comparing LLM behavior and predictions to a large dataset of human decisions, we find that this is actually not the case: when both simulating and predicting people's choices, a suite of cutting-edge LLMs (GPT-4o & 4-Turbo, Llama-3-8B & 70B, Claude 3 Opus) assume that people are more rational than we really are. Specifically, these models deviate from human behavior and align more closely with a classic model of rational choice -- expected value theory. Interestingly, people also tend to assume that other people are rational when interpreting their behavior. As a consequence, when we compare the inferences that LLMs and people draw from the decisions of others using another psychological dataset, we find that these inferences are highly correlated. Thus, the implicit decision-making models of LLMs appear to be aligned with the human expectation that other people will act rationally, rather than with how people actually act.
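The expected-value baseline the models gravitate toward is simple to state in code. This is a toy illustration of the decision rule, not the paper's evaluation pipeline:

```python
def expected_value(lottery):
    """Expected value of a lottery given as (outcome, probability) pairs."""
    return sum(x * p for x, p in lottery)

def rational_choice(lottery_a, lottery_b):
    """Expected-value-theory baseline: pick the option with the higher EV.
    Human choices often deviate (risk aversion, probability weighting);
    the paper finds LLM simulations sit closer to this baseline."""
    return "A" if expected_value(lottery_a) >= expected_value(lottery_b) else "B"
```

For example, a 50% chance of 100 (EV 50) beats a sure 45 under this rule, even though many people prefer the sure payout.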
Updated: 2024-07-01 17:29:54
Domain: cs.CL,cs.AI,cs.CY,cs.LG
Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models
Objectives: Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data. Methods: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 1, 2010, and Dec 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development process, and analyzed metrics for bias assessment. Results: Of the 450 articles retrieved, 20 met our criteria, revealing six major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks in healthcare settings. Four studies concentrated on the detection of implicit and algorithmic biases employing fairness metrics like statistical parity, equal opportunity, and predictive equity. Sixty proposed various strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance (e.g., accuracy, AUROC) and fairness metrics, predominantly involved data collection and preprocessing techniques like resampling, reweighting, and transformation. Discussion: This review highlights the varied and evolving nature of strategies to address bias in EHR-based AI models, emphasizing the urgent needs for the establishment of standardized, generalizable, and interpretable methodologies to foster the creation of ethical AI systems that promote fairness and equity in healthcare.
Updated: 2024-07-01 17:26:23
Domain: cs.AI,cs.CY,cs.LG,q-bio.QM
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs) via distilled data. To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying the consequences of synthetic data integration. We provide one of the most comprehensive studies to date of how the source of synthetic data shapes models' internal biases, calibration, and generations' textual attributes and preferences. We find that models are surprisingly sensitive towards certain attributes even when the synthetic data prompts appear "neutral", which invites the question of whether this sensitivity can be exploited for good. Our findings raise a further question: can we explicitly steer the models towards the properties we want at test time by exploiting the data generation process? This would historically have been considered infeasible due to the cost of collecting data with a specific characteristic or objective in mind. However, improvements in the quality of synthetic data, as well as a shift towards general-purpose models designed to follow diverse instructions, mean this question is timely. We propose active inheritance as a term to describe intentionally constraining synthetic data according to a non-differentiable objective. We demonstrate how active inheritance can steer the generation profiles of models towards desirable non-differentiable attributes, e.g. high lexical diversity or low toxicity.
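In its simplest form, steering synthetic data toward a non-differentiable objective reduces to best-of-n filtering. The sketch below uses type-token ratio as a stand-in for lexical diversity and a placeholder `sample_fn` in place of an LLM sampling call; both are our illustrative assumptions, not the paper's exact procedure.

```python
def type_token_ratio(text):
    """Non-differentiable objective: fraction of unique tokens in the text
    (a crude proxy for lexical diversity)."""
    tokens = text.lower().split()
    return len(set(tokens)) / max(len(tokens), 1)

def active_inheritance(prompt, sample_fn, n=8, score_fn=type_token_ratio):
    """Active-inheritance sketch: draw n candidate generations for a prompt
    and keep only the one maximizing the target objective, so the retained
    synthetic data is steered toward that property."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=score_fn)
```

Any scalar scorer (e.g. a toxicity classifier, negated) can be dropped in as `score_fn`, since no gradients are needed.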
Updated: 2024-07-01 17:26:21
Domain: cs.CL,cs.AI,cs.LG
Agentless: Demystifying LLM-based Software Engineering Agents
Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan for future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic two-phase process of localization followed by repair, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (27.33%) and lowest cost (\$0.34) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground-truth patches or insufficient/misleading issue descriptions. As such, we construct SWE-bench Lite-S by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the currently overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.
Updated: 2024-07-01 17:24:45
Categories: cs.SE,cs.AI,cs.CL,cs.LG
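Agentless's localize-then-repair pipeline can be pictured with a toy sketch. All names below are illustrative, not from the paper: the real system uses an LLM for both phases, while this stand-in ranks files by simple keyword overlap and only builds the repair prompt.

```python
def localize(issue: str, files: dict, top_k: int = 1) -> list:
    """Phase one: rank repository files by token overlap with the issue text.

    Stand-in for the paper's LLM-based localization; `files` maps a path to
    the file's source text.
    """
    issue_tokens = set(issue.lower().split())
    scores = {
        path: len(issue_tokens & set(body.lower().split()))
        for path, body in files.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def build_repair_prompt(issue: str, path: str, body: str) -> str:
    """Phase two: hand the localized file to the model as a repair prompt."""
    return f"Issue:\n{issue}\n\nFile {path}:\n{body}\n\nProduce a patch."
```

The point of the sketch is structural: there is no agentic action loop, just two fixed steps whose output feeds the model.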
Comparing AI Algorithms for Optimizing Elliptic Curve Cryptography Parameters in e-Commerce Integrations: A Pre-Quantum Analysis
This paper presents a comparative analysis between the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), two vital artificial intelligence algorithms, focusing on optimizing Elliptic Curve Cryptography (ECC) parameters. These encompass the elliptic curve coefficients, prime number, generator point, group order, and cofactor. The study provides insights into which of the bio-inspired algorithms yields better optimization results for ECC configurations, examining performances under the same fitness function. This function incorporates methods to ensure robust ECC parameters, including assessing for singular or anomalous curves and applying Pollard's rho attack and Hasse's theorem for optimization precision. The optimized parameters generated by GA and PSO are tested in a simulated e-commerce environment, contrasting with well-known curves like secp256k1 during the transmission of order messages using Elliptic Curve-Diffie Hellman (ECDH) and Hash-based Message Authentication Code (HMAC). Focusing on traditional computing in the pre-quantum era, this research highlights the efficacy of GA and PSO in ECC optimization, with implications for enhancing cybersecurity in third-party e-commerce integrations. We recommend the immediate consideration of these findings before quantum computing's widespread adoption.
Updated: 2024-07-01 17:19:27
Categories: cs.CR,cs.AI
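Part of the fitness function described in the ECC abstract is checkable in a few lines. As a hedged illustration (the paper's full function also applies Pollard's rho and Hasse's theorem, which are omitted here), a candidate curve y^2 = x^3 + ax + b over F_p must at minimum use a prime modulus and be non-singular:

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; sufficient for a small sketch."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def is_nonsingular(a: int, b: int, p: int) -> bool:
    """A short Weierstrass curve y^2 = x^3 + ax + b over F_p is
    non-singular iff 4a^3 + 27b^2 != 0 (mod p)."""
    return (4 * a ** 3 + 27 * b ** 2) % p != 0
```

A GA or PSO fitness function would combine such checks with the attack-based scoring the abstract mentions; candidates failing either test would score zero.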
Fine-tuning can cripple your foundation model; preserving features may be the solution
Pre-trained foundation models, due to their enormous capacity and exposure to vast amounts of data during pre-training, are known to have learned plenty of real-world concepts. An important step in making these pre-trained models effective on downstream tasks is to fine-tune them on related datasets. While various fine-tuning methods have been devised and have been shown to be highly effective, we observe that a fine-tuned model's ability to recognize concepts on tasks different from the downstream one is reduced significantly compared to its pre-trained counterpart. This is an undesirable effect of fine-tuning as a substantial amount of resources was used to learn these pre-trained concepts in the first place. We call this phenomenon "concept forgetting" and via experiments show that most end-to-end fine-tuning approaches suffer heavily from this side effect. To this end, we propose a simple fix to this problem by designing a new fine-tuning method called LDIFS (short for ℓ2 distance in feature space) that, while learning new concepts related to the downstream task, allows a model to preserve its pre-trained knowledge as well. Through extensive experiments on 10 fine-tuning tasks we show that LDIFS significantly reduces concept forgetting. Additionally, we show that LDIFS is highly effective in performing continual fine-tuning on a sequence of tasks as well, in comparison with both fine-tuning and continual learning baselines.
Updated: 2024-07-01 17:14:27
Categories: cs.LG,cs.CV
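The LDIFS idea reduces to a single regularizer: penalize the ℓ2 distance between the fine-tuned model's features and the frozen pre-trained features on the same batch. A minimal pure-Python sketch (assuming plain lists as feature vectors and a placeholder task loss; the paper applies this to deep network features):

```python
def ldifs_penalty(feats_ft, feats_pt):
    """Mean squared L2 distance between fine-tuned and pre-trained
    feature vectors of the same batch."""
    per_example = [
        sum((a - b) ** 2 for a, b in zip(f, g))
        for f, g in zip(feats_ft, feats_pt)
    ]
    return sum(per_example) / len(per_example)


def ldifs_loss(task_loss, feats_ft, feats_pt, lam=0.1):
    """Total objective: downstream task loss plus the weighted penalty."""
    return task_loss + lam * ldifs_penalty(feats_ft, feats_pt)
```

When the fine-tuned features drift from the pre-trained ones, the penalty grows, pulling the model back toward its pre-trained representation.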
EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning
Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning. We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Our approach combines SIM(3)-equivariant neural network architectures with diffusion models. This ensures that our learned policies are invariant to changes in scale, rotation, and translation, enhancing their applicability to unseen environments while retaining the benefits of diffusion-based policy learning such as multi-modality and robustness. We show in a suite of 6 simulation tasks that our proposed method reduces the data requirements and improves generalization to novel scenarios. In the real world, across a total of 10 variations of 6 mobile manipulation tasks, we show that our method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations per task.
Updated: 2024-07-01 17:09:43
Categories: cs.RO,cs.LG
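EquiBot's SIM(3) claim (invariance to scale, rotation, and translation) can be partially illustrated by explicit canonicalization: centering a point cloud and dividing by its mean radius removes translation and scale. This is only an intuition sketch under stated assumptions; the paper uses equivariant network layers rather than this normalization, and rotation is not handled below.

```python
def canonicalize(points):
    """Translate 2D points to their centroid and divide by the mean
    distance to it, removing translation and uniform scale."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = sum((x * x + y * y) ** 0.5 for x, y in centered) / n or 1.0
    return [(x / scale, y / scale) for x, y in centered]
```

Any policy consuming the canonicalized cloud sees the same input whether the scene was shifted or rescaled, which is the invariance the equivariant layers achieve without an explicit preprocessing step.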
Tree Search for Language Model Agents
Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards addressing this, we propose an inference-time search algorithm for LM agents to explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space, and is complementary with most existing state-of-the-art agents. It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks. On the challenging VisualWebArena benchmark, applying our search algorithm on top of a GPT-4o agent yields a 39.7% relative increase in success rate compared to the same baseline without search, setting a state-of-the-art success rate of 26.4%. On WebArena, search also yields a 28.0% relative improvement over a baseline agent, setting a competitive success rate of 19.2%. Our experiments highlight the effectiveness of search for web agents, and we demonstrate that performance scales with increased test-time compute. We conduct a thorough analysis of our results to highlight improvements from search, limitations, and promising directions for future work. Our code and models are publicly released at https://jykoh.com/search-agents.
Updated: 2024-07-01 17:07:55
Categories: cs.AI,cs.CL,cs.LG
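The best-first tree search described above reduces to a standard priority-queue loop once a value function and a successor set are available. A toy sketch (an assumption-laden stand-in: in the paper both come from an LM and a live web environment, here they are plain callables over strings):

```python
import heapq


def best_first_search(start, successors, value, is_goal, budget=100):
    """Expand the highest-value frontier state first, up to `budget`
    expansions; return the path to the first goal state found."""
    frontier = [(-value(start), start, [start])]
    seen = set()
    while frontier and budget > 0:
        _, state, path = heapq.heappop(frontier)
        if state in seen:
            continue
        seen.add(state)
        if is_goal(state):
            return path
        budget -= 1
        for nxt in successors(state):
            if nxt not in seen:
                heapq.heappush(frontier, (-value(nxt), nxt, path + [nxt]))
    return None
```

The `budget` parameter mirrors the paper's observation that performance scales with test-time compute: a larger budget lets the agent explore more branches before committing.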
Exploring FPGA designs for MX and beyond
A number of companies recently worked together to release the new Open Compute Project MX standard for low-precision computation, aimed at efficient neural network implementation. In this paper, we describe and evaluate the first open-source FPGA implementation of the arithmetic defined in the standard. Our designs fully support all the standard's concrete formats for conversion into and out of MX formats and for the standard-defined arithmetic operations, as well as arbitrary fixed-point and floating-point formats. Certain elements of the standard are left as implementation-defined, and we present the first concrete FPGA-inspired choices for these elements, which we outline in the paper. Our library of optimized hardware components is available open source, and can be used to build larger systems. For this purpose, we also describe and release an open-source Pytorch library for quantization into the new standard, integrated with the Brevitas library so that the community can develop novel neural network designs quantized with MX formats in mind. We demonstrate the usability and efficacy of our libraries via the implementation of example neural networks such as ResNet-18 on the ImageNet ILSVRC12 dataset. Our testing shows that MX is very effective for formats such as INT5 or FP6 which are not natively supported on GPUs. This gives FPGAs an advantage as they have the flexibility to implement a custom datapath and take advantage of the smaller area footprints offered by these formats.
Updated: 2024-07-01 17:07:33
Categories: cs.AR,cs.LG
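The MX formats the FPGA library implements share one scale across a small block of elements. A much-simplified software sketch of that layout, under stated assumptions: a power-of-two shared scale and int8-style element codes stand in for the standard's E8M0 scale with element formats such as FP6 or INT5 and blocks of 32.

```python
import math


def quantize_block(values, k=32, elem_max=127):
    """Split `values` into blocks of k; each block stores one power-of-two
    scale plus clamped integer codes, mirroring the MX shared-scale layout."""
    out = []
    for i in range(0, len(values), k):
        block = values[i:i + k]
        amax = max(abs(v) for v in block) or 1.0
        scale = 2.0 ** math.floor(math.log2(elem_max / amax))
        codes = [max(-elem_max, min(elem_max, round(v * scale))) for v in block]
        out.append((scale, codes))
    return out


def dequantize(blocks):
    """Recover approximate values from (scale, codes) block pairs."""
    return [c / s for s, cs in blocks for c in cs]
```

Because the scale is a power of two, the hardware datapath needs only shifts rather than multipliers, which is part of why these formats map well to FPGAs.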
Survey and Analysis of IoT Operating Systems: A Comparative Study on the Effectiveness and Acquisition Time of Open Source Digital Forensics Tools
The main goal of this research project is to evaluate the effectiveness and speed of open-source forensic tools for digital evidence collecting from various Internet-of-Things (IoT) devices. The project will create and configure many IoT environments, across popular IoT operating systems, and run common forensics tasks in order to accomplish this goal. To validate these forensic analysis operations, a variety of open-source forensic tools covering four standard digital forensics tasks will be used. These tasks will be run on each sample IoT operating system, with the time spent carefully recorded and examined, allowing for a thorough evaluation of the effectiveness and speed of performing forensics on each type of IoT device. The research also aims to offer recommendations to IoT security experts and digital forensic practitioners about the most efficient open-source tools for forensic investigations with IoT devices, while maintaining the integrity of gathered evidence and identifying challenges that exist with these new device types. The results will be shared widely and well-documented in order to provide significant contributions to the field of internet-of-things device makers and digital forensics.
Updated: 2024-07-01 17:06:32
Categories: cs.CR
Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything
Large Visual Language Models (VLMs) such as GPT-4 have achieved remarkable success in generating comprehensive and nuanced responses, surpassing the capabilities of large language models. However, with the integration of visual inputs, new security concerns emerge, as malicious attackers can exploit multiple modalities to achieve their objectives. This has led to increasing attention on the vulnerabilities of VLMs to jailbreak. Most existing research focuses on generating adversarial images or nonsensical image collections to compromise these models. However, the challenge of leveraging meaningful images to produce targeted textual content using the VLMs' logical comprehension of images remains unexplored. In this paper, we explore the problem of logical jailbreak from meaningful images to text. To investigate this issue, we introduce a novel dataset designed to evaluate flowchart image jailbreak. Furthermore, we develop a framework for text-to-text jailbreak using VLMs. Finally, we conduct an extensive evaluation of the framework on GPT-4o and GPT-4-vision-preview, with jailbreak rates of 92.8% and 70.0%, respectively. Our research reveals significant vulnerabilities in current VLMs concerning image-to-text jailbreak. These findings underscore the need for a deeper examination of the security flaws in VLMs before their practical deployment.
Updated: 2024-07-01 16:58:55
Categories: cs.CR,cs.CV
BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration
Autonomous agents driven by Large Language Models (LLMs) offer enormous potential for automation. Early proof of this technology can be found in various demonstrations of agents solving complex tasks, interacting with external systems to augment their knowledge, and triggering actions. In particular, workflows involving multiple agents solving complex tasks in a collaborative fashion exemplify their capacity to operate in less strict and less well-defined environments. Thus, a multi-agent approach has great potential for serving as a backbone in many industrial applications, ranging from complex knowledge retrieval systems to next generation robotic process automation. Given the reasoning abilities within the current generation of LLMs, complex processes require a multi-step approach that includes a plan of well-defined and modular tasks. Depending on the level of complexity, these tasks can be executed either by a single agent or a group of agents. In this work, we focus on designing a flexible agent engineering framework with careful attention to planning and execution, capable of handling complex use case applications across various domains. The proposed framework provides reliability in industrial applications and presents techniques to ensure a scalable, flexible, and collaborative workflow for multiple autonomous agents working together towards solving tasks.
Updated: 2024-07-01 16:58:15
Categories: cs.MA,cs.AI
Retrieval-augmented generation in multilingual settings
Retrieval-augmented generation (RAG) has recently emerged as a promising solution for incorporating up-to-date or domain-specific knowledge into large language models (LLMs) and improving LLM factuality, but is predominantly studied in English-only settings. In this work, we consider RAG in the multilingual setting (mRAG), i.e. with user queries and the datastore in 13 languages, and investigate which components and with which adjustments are needed to build a well-performing mRAG pipeline, that can be used as a strong baseline in future works. Our findings highlight that despite the availability of high-quality off-the-shelf multilingual retrievers and generators, task-specific prompt engineering is needed to enable generation in user languages. Moreover, current evaluation metrics need adjustments for multilingual setting, to account for variations in spelling named entities. The main limitations to be addressed in future works include frequent code-switching in non-Latin alphabet languages, occasional fluency errors, wrong reading of the provided documents, or irrelevant retrieval. We release the code for the resulting mRAG baseline pipeline at https://github.com/naver/bergen.
Updated: 2024-07-01 16:56:50
Categories: cs.CL,cs.AI
On Implications of Scaling Laws on Feature Superposition
Using results from scaling laws, this theoretical note argues that the following two statements cannot be simultaneously true: 1. Superposition hypothesis where sparse features are linearly represented across a layer is a complete theory of feature representation. 2. Features are universal, meaning two models trained on the same data and achieving equal performance will learn identical features.
Updated: 2024-07-01 16:54:07
Categories: cs.LG,cs.AI
Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.
Updated: 2024-07-01 16:49:09
Categories: cs.AI,cs.HC,cs.MM,cs.SD,eess.AS
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstrated effective for removing the requirement of large batch size, their performance on large-scale data remains underexplored and not optimized. To bridge the gap, this paper explores several aspects of CLIP training with limited resources (e.g., up to tens of GPUs). First, we introduce FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques while designed and optimized for the distributed setting. Our framework is equipped with an efficient gradient reduction strategy to reduce communication overhead. Second, to further boost training efficiency, we investigate three components of the framework from an optimization perspective: the schedule of the inner learning rate, the update rules of the temperature parameter and the model parameters, respectively. Experiments on different strategies for each component shed light on how to conduct CLIP training more efficiently. Finally, we benchmark the performance of FastCLIP and the state-of-the-art training baseline (OpenCLIP) on different compute scales up to 32 GPUs on 8 nodes, and three data scales ranging from 2.7 million, 9.1 million to 315 million image-text pairs to demonstrate the significant improvement of FastCLIP in the resource-limited setting. We release the code of FastCLIP at https://github.com/Optimization-AI/fast_clip .
Updated: 2024-07-01 16:37:18
Categories: cs.LG,cs.CV
Does Writing with Language Models Reduce Content Diversity?
Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups -- using a base LLM (GPT3), a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not the GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.
Updated: 2024-07-01 16:36:30
Categories: cs.CL,cs.CY,cs.HC,cs.LG
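Two stand-in diversity measures of the kind such a metric suite contains (these specific formulas are illustrative, not necessarily the paper's): distinct-1 for lexical diversity of a corpus, and mean pairwise Jaccard similarity for homogeneity between essays.

```python
def distinct_1(texts):
    """Unique unigrams over total unigrams across the corpus
    (higher = more lexically diverse)."""
    tokens = [t for text in texts for t in text.lower().split()]
    return len(set(tokens)) / len(tokens)


def mean_pairwise_jaccard(texts):
    """Average Jaccard similarity over essay pairs
    (higher = more homogeneous content)."""
    sets = [set(t.lower().split()) for t in texts]
    pairs = [(i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))]
    return sum(
        len(sets[i] & sets[j]) / len(sets[i] | sets[j]) for i, j in pairs
    ) / len(pairs)
```

The abstract's finding would show up here as a higher pairwise similarity and lower distinct-1 for essays co-written with the feedback-tuned model.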
GAT-Steiner: Rectilinear Steiner Minimal Tree Prediction Using GNNs
The Rectilinear Steiner Minimum Tree (RSMT) problem is a fundamental problem in VLSI placement and routing and is known to be NP-hard. Traditional RSMT algorithms spend a significant amount of time finding Steiner points to reduce the total wire length, or use heuristics that produce approximate, sub-optimal results. We show that Graph Neural Networks (GNNs) can be used to predict optimal Steiner points in RSMTs with high accuracy and can be parallelized on GPUs. In this paper, we propose GAT-Steiner, a graph attention network model that correctly predicts 99.846% of the nets in the ISPD19 benchmark with an average increase in wire length of only 0.480% on suboptimal wire length nets. On randomly generated benchmarks, GAT-Steiner correctly predicts 99.942% with an average increase in wire length of only 0.420% on suboptimal wire length nets.
Updated: 2024-07-01 16:32:49
Categories: cs.LG
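The candidate Steiner points that a classifier like GAT-Steiner scores can be drawn from the Hanan grid, the classic result that an RSMT only needs intersections of the axis lines through the terminals. A small enumeration sketch (whether the paper uses exactly this candidate set is an assumption here):

```python
def hanan_grid(terminals):
    """All (x, y) intersections of horizontal and vertical lines through
    the terminals, excluding the terminals themselves."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    grid = {(x, y) for x in xs for y in ys}
    return sorted(grid - set(terminals))
```

Framing the problem this way turns Steiner-point selection into a per-candidate classification task, which is what makes a GNN prediction (and GPU parallelism over candidates) natural.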
Needle in the Haystack for Memory Based Large Language Models
In this paper, we demonstrate the benefits of using memory augmented Large Language Model (LLM) architecture in improving the recall abilities of facts from a potentially long context. As a case study we test LARIMAR, a recently proposed LLM architecture which augments a LLM decoder with an external associative memory, on several long-context recall tasks, including passkey and needle-in-the-haystack tests. We demonstrate that the external memory can be adapted at test time to handle contexts much longer than those seen during training, while keeping readouts from the memory recognizable to the trained decoder and without increasing GPU memory footprint. Compared to alternative architectures for long-context recall tasks with models of a comparable parameter count, LARIMAR is able to maintain strong performance without any task-specific training.
Updated: 2024-07-01 16:32:16
Categories: cs.CL,cs.AI,cs.LG
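A passkey probe of the kind used in the LARIMAR evaluation is easy to construct: bury one informative sentence inside repeated filler and ask the model to recall it. A sketch (the filler text and placement here are illustrative, not the paper's exact protocol):

```python
import random


def make_passkey_prompt(passkey: str, n_filler: int, seed: int = 0) -> str:
    """Hide a passkey sentence at a random position among filler sentences,
    producing a long-context recall probe."""
    rng = random.Random(seed)
    filler = ["The grass is green and the sky is blue."] * n_filler
    pos = rng.randrange(n_filler + 1)
    needle = f"The passkey is {passkey}. Remember it."
    lines = filler[:pos] + [needle] + filler[pos:]
    return " ".join(lines)
```

Scaling `n_filler` up at test time is exactly the regime the abstract describes: the context grows far beyond training lengths while the single recoverable fact stays fixed.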
Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging
Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of disparities in performance between subgroups. Since not all sources of biases in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess how those biases are encoded in models, and how capable bias mitigation methods are at ameliorating performance disparities. In this article, we introduce a novel analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. We developed and tested this framework for conducting controlled in silico trials to assess bias in medical imaging AI using a tool for generating synthetic magnetic resonance images with known disease effects and sources of bias. The feasibility is showcased by using three counterfactual bias scenarios to measure the impact of simulated bias effects on a convolutional neural network (CNN) classifier and the efficacy of three bias mitigation strategies. The analysis revealed that the simulated biases resulted in expected subgroup performance disparities when the CNN was trained on the synthetic datasets. Moreover, reweighing was identified as the most successful bias mitigation strategy for this setup, and we demonstrated how explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. Developing fair AI models is a considerable challenge given that many and often unknown sources of biases can be present in medical imaging datasets. In this work, we present a novel methodology to objectively study the impact of biases and mitigation strategies on deep learning pipelines, which can support the development of clinical AI that is robust and responsible.
Updated: 2024-07-01 16:30:53
Categories: cs.CV,cs.AI,cs.CY,cs.LG
Evaluation of Deep Learning Semantic Segmentation for Land Cover Mapping on Multispectral, Hyperspectral and High Spatial Aerial Imagery
With the rise of climate change, land cover mapping has become an urgent need in environmental monitoring. The accuracy of land cover classification increasingly depends on improvements in remote sensing data. Land cover classification using satellite imagery has been explored and has become more prevalent in recent years, but the methodologies retain drawbacks of subjectivity and time consumption. Some deep learning techniques have been utilized to overcome these limitations. However, most studies used only one image type to evaluate algorithms for land cover mapping. Therefore, our study applied deep learning semantic segmentation to multispectral, hyperspectral, and high-spatial-resolution aerial image datasets for land cover mapping. This research implemented semantic segmentation methods such as UNet, LinkNet, FPN, and PSPNet for categorizing vegetation, water, and others (i.e., soil and impervious surfaces). The LinkNet model obtained high accuracy, with an IoU (intersection over union) of 0.92 on all datasets, comparable with the other techniques mentioned. In the evaluation across image types, the multispectral images showed higher performance, with IoU and F1-score of 0.993 and 0.997, respectively. Our outcome highlights the efficiency and broad applicability of LinkNet and multispectral imagery for land cover classification. This research contributes an open-source approach to land cover segmentation for long-term future application.
Updated: 2024-07-01 16:30:23
Categories: cs.CV,cs.LG
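The IoU figures quoted in the land cover abstract follow the standard per-class definition, sketched here over flat label lists:

```python
def class_iou(pred, target, cls):
    """Intersection over union for one class: pixels labeled `cls` in both
    prediction and target, over pixels labeled `cls` in either."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else 1.0
```

A dataset-level score like the 0.92 reported for LinkNet is typically the mean of this quantity over classes (mIoU).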
POST: Email Archival, Processing and Flagging Stack for Incident Responders
Phishing is one of the main points of compromise, with email security and awareness estimated at $50-100B in 2022. There is a great need for email forensics capability to quickly search for malicious content. A novel solution, POST, is proposed. POST is an API-driven serverless email archival, processing, and flagging workflow for both large and small organizations that collects and parses all email, flags emails using state-of-the-art Natural Language Processing and Machine Learning, allows full email searching on every aspect of an email, and provides a cost savings of up to 68.6%.
Updated: 2024-07-01 16:23:45
Categories: cs.CR,cs.IR,cs.LG
Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud
In recent times, Volunteer Edge-Cloud (VEC) has gained traction as a cost-effective, community computing paradigm to support data-intensive scientific workflows. However, due to the highly distributed and heterogeneous nature of VEC resources, centralized workflow task scheduling remains a challenge. In this paper, we propose a Reinforcement Learning (RL)-driven data-intensive scientific workflow scheduling approach that takes into consideration: i) workflow requirements, ii) VEC resources' preference on workflows, and iii) diverse VEC resource policies, to ensure robust resource allocation. We formulate the long-term average performance optimization problem as a Markov Decision Process, which is solved using an event-based Asynchronous Advantage Actor-Critic RL approach. Our extensive simulations and testbed implementations demonstrate our approach's benefits over popular baseline strategies in terms of workflow requirement satisfaction, VEC preference satisfaction, and available VEC resource utilization.
Updated: 2024-07-01 16:21:13
Categories: cs.DC,cs.AI
Maximizing Blockchain Performance: Mitigating Conflicting Transactions through Parallelism and Dependency Management
While blockchains initially gained popularity in the realm of cryptocurrencies, their widespread adoption is expanding beyond conventional applications, driven by the imperative need for enhanced data security. Despite providing a secure network, blockchains come with certain tradeoffs, including high latency, lower throughput, and an increased number of transaction failures. A pivotal issue contributing to these challenges is the improper management of "conflicting transactions", commonly referred to as "contention". When pending transactions within a blockchain collide with each other, the network enters a state of contention. This situation worsens network latency, leads to the wastage of system resources, and ultimately contributes to reduced throughput and higher transaction failures. In response to this issue, in this work, we present a novel blockchain scheme that integrates transaction parallelism and an intelligent dependency manager aiming to reduce the occurrence of conflicting transactions within blockchain networks. In terms of effectiveness and efficiency, experimental results show that our scheme not only mitigates the challenges posed by conflicting transactions, but also outperforms both existing parallel and non-parallel Hyperledger Fabric blockchain networks, achieving a higher transaction success rate, higher throughput, and lower latency. The integration of our scheme with Hyperledger Fabric appears to be a promising solution for improving the overall performance and stability of blockchain networks in real-world applications.
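The core idea of a dependency manager can be sketched as follows: two transactions conflict when one writes a key the other reads or writes, and non-conflicting transactions are grouped into batches that can execute in parallel. The greedy batching below is an illustrative assumption, not the paper's exact algorithm:

```python
# Two transactions conflict when one writes a key the other reads or writes.
def conflicts(t1, t2):
    return bool(t1["writes"] & (t2["reads"] | t2["writes"]) or
                t2["writes"] & (t1["reads"] | t1["writes"]))

# Greedily place each transaction into the first batch where it conflicts
# with nothing; batches can then run in parallel, in order.
def schedule(txs):
    batches = []
    for tx in txs:
        for batch in batches:
            if not any(conflicts(tx, other) for other in batch):
                batch.append(tx)
                break
        else:
            batches.append([tx])
    return batches

txs = [
    {"id": 1, "reads": {"a"}, "writes": {"b"}},
    {"id": 2, "reads": {"c"}, "writes": {"d"}},   # independent of tx 1
    {"id": 3, "reads": {"b"}, "writes": {"e"}},   # reads tx 1's write
]
print([[t["id"] for t in b] for b in schedule(txs)])  # [[1, 2], [3]]
```

Transactions 1 and 2 touch disjoint keys and share a batch; transaction 3 reads a key written by transaction 1, so it is serialized into a later batch.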
Updated: 2024-07-01 16:17:33
Categories: cs.CR,cs.DC
Predicting Fairness of ML Software Configurations
This paper investigates the relationships between hyperparameters of machine learning and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, the ML developers provide input data, perform some pre-processing, choose ML algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior works report that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness has remained an expensive and tedious task. Can we predict the fairness of an HP configuration for a given dataset? Are the predictions robust to distribution shifts? We focus on group fairness notions and investigate the HP space of 5 training algorithms. We first find that tree regressors and XGBoost significantly outperformed deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, the tree regressors outperform the other algorithms with reasonable accuracy. However, the precision depends on the ML training algorithm, dataset, and protected attributes. For example, the tree regressor model was robust for training data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute, but not for race and other training algorithms. Our method provides a sound framework to efficiently perform fine-tuning of ML training algorithms and understand the relationships between HPs and fairness.
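A group-fairness score of the kind such predictors are trained to estimate can be sketched directly; the statistical parity difference below is one standard group fairness notion, and the toy predictions are illustrative:

```python
# Statistical parity difference: gap between positive-decision rates of two
# protected groups. Zero means the groups receive positive outcomes equally.
def statistical_parity_difference(preds, groups):
    rate = lambda g: sum(p for p, grp in zip(preds, groups) if grp == g) / groups.count(g)
    return abs(rate("A") - rate("B"))

preds  = [1, 0, 1, 1, 0, 0, 1, 0]                       # binary decisions
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]       # protected attribute
print(statistical_parity_difference(preds, groups))     # 0.5
```

A fairness predictor in the paper's sense would map an HP configuration to an estimate of a metric like this one, without retraining the model.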
Updated: 2024-07-01 16:16:34
Categories: cs.SE,cs.AI,cs.CY,cs.LG
sec-certs: Examining the security certification practice for better vulnerability mitigation
Products certified under security certification frameworks such as Common Criteria undergo significant scrutiny during the costly certification process. Yet, critical vulnerabilities, including private key recovery (ROCA, Minerva, TPM-Fail...), get discovered in certified products with high assurance levels. Furthermore, assessing which certified products are impacted by such vulnerabilities is complicated due to the large amount of unstructured certification-related data and unclear relationships between the certified products. To address these problems, we conducted a large-scale automated analysis of Common Criteria certificates. We trained unsupervised models to learn which vulnerabilities from NIST's National Vulnerability Database impact existing certified products and how certified products reference each other. Our tooling automates the analysis of tens of thousands of certification-related documents, extracting machine-readable features where manual analysis is unattainable. Further, we identify the security requirements that are associated with products being affected by fewer and less severe vulnerabilities. This indicates which aspects of certification correlate with higher security. We demonstrate how our tool can be used for better vulnerability mitigation on four case studies of known, high-profile vulnerabilities. All tools and continuously updated results are available at https://seccerts.org
Updated: 2024-07-01 16:16:29
Categories: cs.CR
Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion
We study constrained comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems, and their generalization to comonotone inclusion. In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods. Additionally, we prove that the algorithm's iterations converge to a point in the solution set. In our second contribution, we extend the Fast Extra Gradient (FEG) algorithm, as developed by Lee and Kim (2021), to constrained comonotone min-max optimization and comonotone inclusion, achieving the same $O\left(\frac{1}{T}\right)$ convergence rate. This rate is applicable to the broadest set of comonotone inclusion problems yet studied in the literature. Our analyses are based on simple potential function arguments, which might be useful for analyzing other accelerated algorithms.
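For reference, the anchored update at the core of EAG, as proposed by Yoon and Ryu for the unconstrained case, can be written as follows. This is a sketch from the cited work, with anchor $z^0$, saddle operator $F$, step size $\alpha$, and anchoring weight $\beta_k = \tfrac{1}{k+2}$:

```latex
\begin{align}
w^{k}   &= z^{k} + \beta_k \left(z^{0} - z^{k}\right) - \alpha F(z^{k}), \\
z^{k+1} &= z^{k} + \beta_k \left(z^{0} - z^{k}\right) - \alpha F(w^{k}),
\qquad \beta_k = \frac{1}{k+2}.
\end{align}
```

The pull toward the anchor $z^0$ is what yields the $O\left(\frac{1}{T}\right)$ rate; the constrained extensions in this paper replace the plain steps with suitably projected or resolvent-based ones.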
Updated: 2024-07-01 16:14:01
Categories: math.OC,cs.DS,cs.LG
FairLay-ML: Intuitive Debugging of Fairness in Data-Driven Social-Critical Software
Data-driven software solutions have significantly been used in critical domains with significant socio-economic, legal, and ethical implications. The rapid adoptions of data-driven solutions, however, pose major threats to the trustworthiness of automated decision-support software. A diminished understanding of the solution by the developer and historical/current biases in the data sets are primary challenges. To aid data-driven software developers and end-users, we present FairLay-ML, a debugging tool to test and explain the fairness implications of data-driven solutions. FairLay-ML visualizes the logic of datasets, trained models, and decisions for a given data point. In addition, it trains various models with varying fairness-accuracy trade-offs. Crucially, FairLay-ML incorporates counterfactual fairness testing that finds bugs beyond the development datasets. We conducted two studies through FairLay-ML that allowed us to measure false positives/negatives in prevalent counterfactual testing and understand the human perception of counterfactual test cases in a class survey. FairLay-ML and its benchmarks are publicly available at https://github.com/Pennswood/FairLay-ML. The live version of the tool is available at https://fairlayml-v2.streamlit.app/. We provide a video demo of the tool at https://youtu.be/wNI9UWkywVU?t=127
Updated: 2024-07-01 16:13:54
Categories: cs.SE,cs.CY,cs.LG
Neurovascular Segmentation in sOCT with Deep Learning and Synthetic Training Data
Microvascular anatomy is known to be involved in various neurological disorders. However, understanding these disorders is hindered by the lack of imaging modalities capable of capturing the comprehensive three-dimensional vascular network structure at microscopic resolution. With a lateral resolution of $\leq$20 {\textmu}m and the ability to reconstruct large tissue blocks up to tens of cubic centimeters, serial-section optical coherence tomography (sOCT) is well suited for this task. This method uses intrinsic optical properties to visualize the vessels and therefore does not possess a specific contrast, which complicates the extraction of accurate vascular models. The performance of traditional vessel segmentation methods is heavily degraded in the presence of substantial noise and imaging artifacts and is sensitive to domain shifts, while convolutional neural networks (CNNs) require extensive labeled data and are also sensitive to the precise intensity characteristics of the data that they are trained on. Building on the emerging field of synthesis-based training, this study demonstrates a synthesis engine for neurovascular segmentation in sOCT images. Characterized by minimal priors and high variance sampling, our highly generalizable method tested on five distinct sOCT acquisitions eliminates the need for manual annotations while attaining human-level precision. Our approach comprises two phases: label synthesis and label-to-image transformation. We demonstrate the efficacy of the former by comparing it to several more realistic sets of training labels, and the latter by an ablation study of synthetic noise and artifact models.
Updated: 2024-07-01 16:09:07
Categories: eess.IV,cs.CV,cs.LG
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing
Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states, including particles and object-level latent physics information, from historical visuo-tactile observations and to perform future state predictions. Our tactile-informed dynamics model, learned from real-world data, can solve downstream robotics tasks with model-predictive control. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks, where the robot must infer the physics properties of objects from direct and indirect interactions. Trained on only an average of 30 minutes of real-world interaction data per task, our model can perform online adaptation and make touch-informed predictions. Through extensive evaluations in both long-horizon dynamics prediction and real-world manipulation, our method demonstrates superior effectiveness compared to previous learning-based and physics-based simulation systems.
Updated: 2024-07-01 16:08:37
Categories: cs.RO,cs.AI,cs.LG,I.2.9; I.2.6; I.2.10
Good Gottesman-Kitaev-Preskill codes from the NTRU cryptosystem
We introduce a new class of random Gottesman-Kitaev-Preskill (GKP) codes derived from the cryptanalysis of the so-called NTRU cryptosystem. The derived codes are good in that they exhibit constant rate and average distance scaling $\Delta \propto \sqrt{n}$ with high probability, where $n$ is the number of bosonic modes, which is a distance scaling equivalent to that of a GKP code obtained by concatenating single mode GKP codes into a qubit-quantum error correcting code with linear distance. The derived class of NTRU-GKP codes has the additional property that decoding for a stochastic displacement noise model is equivalent to decrypting the NTRU cryptosystem, such that every random instance of the code naturally comes with an efficient decoder. This construction highlights how the GKP code bridges aspects of classical error correction, quantum error correction as well as post-quantum cryptography. We underscore this connection by discussing the computational hardness of decoding GKP codes and propose, as a new application, a simple public key quantum communication protocol with security inherited from the NTRU cryptosystem.
Updated: 2024-07-01 16:05:52
Categories: quant-ph,cs.CR,cs.IT,math.IT
Dynamic Few-Shot Learning for Knowledge Graph Question Answering
Large language models present opportunities for innovative Question Answering over Knowledge Graphs (KGQA). However, they are not inherently designed for query generation. To bridge this gap, solutions have been proposed that rely on fine-tuning or ad-hoc architectures, achieving good results but limited out-of-domain distribution generalization. In this study, we introduce a novel approach called Dynamic Few-Shot Learning (DFSL). DFSL integrates the efficiency of in-context learning and semantic similarity and provides a generally applicable solution for KGQA with state-of-the-art performance. We run an extensive evaluation across multiple benchmark datasets and architecture configurations.
Updated: 2024-07-01 15:59:17
Categories: cs.CL,cs.AI
Semantic Compositions Enhance Vision-Language Contrastive Learning
In the field of vision-language contrastive learning, models such as CLIP capitalize on matched image-caption pairs as positive examples and leverage within-batch non-matching pairs as negatives. This approach has led to remarkable outcomes in zero-shot image classification, cross-modal retrieval, and linear evaluation tasks. We show that the zero-shot classification and retrieval capabilities of CLIP-like models can be improved significantly through the introduction of semantically composite examples during pretraining. Inspired by CutMix in vision categorization, we create semantically composite image-caption pairs by merging elements from two distinct instances in the dataset via a novel procedure. Our method fuses the captions and blends 50% of each image to form a new composite sample. This simple technique (termed CLIP-C for CLIP Compositions), devoid of any additional computational overhead or increase in model parameters, significantly improves zero-shot image classification and cross-modal retrieval. The benefits of CLIP-C are particularly pronounced in settings with relatively limited pretraining data.
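The composition step described above is simple enough to sketch directly: blend 50% of each image and fuse the two captions into a composite pair. The caption-joining template is an assumption; the abstract only states that captions are fused:

```python
import numpy as np

# CLIP-C style composition: pixel-wise 50/50 blend of two images plus a
# fused caption forms one new (image, caption) training pair.
def compose(img_a, cap_a, img_b, cap_b):
    composite_img = 0.5 * img_a + 0.5 * img_b
    composite_cap = f"{cap_a} and {cap_b}"   # illustrative joining template
    return composite_img, composite_cap

img_a = np.zeros((4, 4, 3))   # stand-in for a normalized RGB image
img_b = np.ones((4, 4, 3))
img, cap = compose(img_a, "a dog", img_b, "a beach")
print(img[0, 0, 0], cap)  # 0.5 a dog and a beach
```

Because the operation is a cheap tensor blend and string join, it adds no model parameters and negligible overhead, which matches the abstract's claim.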
Updated: 2024-07-01 15:58:20
Categories: cs.CV,cs.AI,cs.LG
Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre
Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels. After we released the first version of our paper online, the authors introduced another change to the defense; by commenting out one line of code during attack we reduce the robust accuracy to 0% again.
Updated: 2024-07-01 15:57:59
Categories: cs.CR,cs.LG
A Convex Hull Cheapest Insertion Heuristic for the Non-Euclidean TSP
The convex hull cheapest insertion heuristic is known to produce good solutions to the Traveling Salesperson Problem in Euclidean spaces, but it has never been extended to the non-Euclidean problem. This paper proposes an adaptation that uses multidimensional scaling to first project the points from a non-Euclidean space into a Euclidean equivalent space, thereby enabling the generation of a convex hull that initializes the algorithm. To evaluate the proposed algorithm, non-Euclidean spaces are created by adding separators to the Euclidean TSPLIB benchmark data-set, or by using the L1 norm as a metric. This adapted heuristic is demonstrated to outperform the commonly used Nearest Neighbor heuristic and Nearest Insertion heuristic in 88% and 99% of the cases studied, respectively. When compared with metaheuristic algorithms, the proposed heuristic's tour costs are lower than the solutions found by the genetic algorithm and ant colony optimization algorithm in 87% and 95% of the instances, respectively.
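Once an initial sub-tour exists (for this heuristic, the convex hull of the MDS-projected points), the cheapest-insertion loop itself only needs a distance matrix, so it works under any metric, including the L1 norm used above. A minimal sketch, with a hand-picked initial sub-tour standing in for the hull:

```python
# Cheapest insertion: repeatedly insert the city whose insertion between
# two adjacent tour cities increases tour length the least.
def cheapest_insertion(dist, tour):
    remaining = set(range(len(dist))) - set(tour)
    while remaining:
        best = None                                   # (cost, city, position)
        for c in remaining:
            for i in range(len(tour)):
                a, b = tour[i], tour[(i + 1) % len(tour)]
                cost = dist[a][c] + dist[c][b] - dist[a][b]
                if best is None or cost < best[0]:
                    best = (cost, c, i + 1)
        _, city, pos = best
        tour.insert(pos, city)
        remaining.remove(city)
    return tour

# L1 distances between 4 points on a line at x = 0, 1, 2, 10
pts = [0, 1, 2, 10]
dist = [[abs(p - q) for q in pts] for p in pts]
print(cheapest_insertion(dist, [0, 3]))  # [0, 1, 2, 3]
```

The paper's contribution is in how the initial sub-tour is obtained for non-Euclidean inputs (classical MDS projection, then a convex hull); the insertion loop above is the standard textbook step.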
Updated: 2024-07-01 15:56:49
Categories: cs.AI,cs.SY,eess.SY
Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
This paper explores the integration of graph knowledge from linguistic ontologies into multilingual Large Language Models (LLMs) using adapters to improve performance for low-resource languages (LRLs) in sentiment analysis (SA) and named entity recognition (NER). Building upon successful parameter-efficient fine-tuning techniques, such as K-ADAPTER and MAD-X, we propose a similar approach for incorporating knowledge from multilingual graphs, connecting concepts in various languages with each other through linguistic relationships, into multilingual LLMs for LRLs. Specifically, we focus on eight LRLs -- Maltese, Bulgarian, Indonesian, Nepali, Javanese, Uyghur, Tibetan, and Sinhala -- and employ language-specific adapters fine-tuned on data extracted from the language-specific section of ConceptNet, aiming to enable knowledge transfer across the languages covered by the knowledge graph. We compare various fine-tuning objectives, including standard Masked Language Modeling (MLM), MLM with full-word masking, and MLM with targeted masking, to analyse their effectiveness in learning and integrating the extracted graph data. Through empirical evaluation on language-specific tasks, we assess how structured graph knowledge affects the performance of multilingual LLMs for LRLs in SA and NER, providing insights into the potential benefits of adapting language models for low-resource scenarios.
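The targeted-masking variant of MLM mentioned above can be sketched as: only tokens appearing in the ConceptNet-derived vocabulary are eligible for masking, so the training signal concentrates on graph concepts. Treating words as single tokens is a simplification:

```python
import random

# Targeted masking: mask only tokens found in the concept vocabulary.
# With mask_prob=1.0 every eligible token is masked; lower values mask a
# random subset, as standard MLM does.
def targeted_mask(tokens, concept_vocab, mask_prob=1.0, mask_token="[MASK]", seed=0):
    rng = random.Random(seed)
    return [mask_token if t in concept_vocab and rng.random() < mask_prob else t
            for t in tokens]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(targeted_mask(tokens, concept_vocab={"cat", "mat"}))
# ['the', '[MASK]', 'sat', 'on', 'the', '[MASK]']
```

Full-word masking and standard MLM differ only in which positions are eligible; the adapter is then trained to reconstruct the masked concepts.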
Updated: 2024-07-01 15:56:24
Categories: cs.CL,cs.AI
Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in Neural Networks
We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior and prove quantitative convergence rates of these interacting particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks.
Updated: 2024-07-01 15:55:16
Categories: stat.ML,cs.LG,cs.NA,math.NA
Optimization of Retrieval-Augmented Generation Context with Outlier Detection
In this paper, we focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems. Attempts to increase the number of retrieved chunked documents and thereby enlarge the context related to the query can significantly complicate the processing and decrease the performance of a Large Language Model (LLM) when generating responses to queries. It is well known that a large set of documents retrieved from a database in response to a query may contain irrelevant information, which often leads to hallucinations in the resulting answers. Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers. We propose and evaluate several methods for identifying outliers by creating features that utilize the distances of embedding vectors, retrieved from the vector database, to both the centroid and the query vectors. The methods were evaluated by comparing the similarities of the retrieved LLM responses to ground-truth answers obtained using the OpenAI GPT-4o model. It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
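The outlier features described above can be sketched concretely: for each retrieved chunk embedding, measure its distance to the centroid of all retrieved chunks and to the query embedding, then drop chunks whose combined distance is far above the mean. The 1.5-sigma cutoff is an illustrative choice, not the paper's tuned threshold:

```python
import numpy as np

# Keep a retrieved chunk only if its (distance-to-centroid + distance-to-query)
# feature is within n_sigma standard deviations of the mean feature value.
def filter_outliers(chunks, query, n_sigma=1.5):
    centroid = chunks.mean(axis=0)
    feats = (np.linalg.norm(chunks - centroid, axis=1) +
             np.linalg.norm(chunks - query, axis=1))
    keep = feats < feats.mean() + n_sigma * feats.std()
    return keep

chunks = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
query = np.array([0.0, 0.0])
print(filter_outliers(chunks, query))  # the distant chunk is dropped
```

The surviving chunks then form the (smaller) prompt context handed to the LLM.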
Updated: 2024-07-01 15:53:29
Categories: cs.IR,cs.AI,cs.CL,cs.LG
Superconstant Inapproximability of Decision Tree Learning
We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard. Their work leaves open the question of whether the task remains intractable if $T$ is only required to be close to optimal, say within a factor of 2, rather than exactly optimal. We answer this affirmatively and show that the task indeed remains NP-hard even if $T$ is allowed to be within any constant factor of optimal. More generally, our result allows for a smooth tradeoff between the hardness assumption and the inapproximability factor. As Koch et al.'s techniques do not appear to be amenable to such a strengthening, we first recover their result with a new and simpler proof, which we couple with a new XOR lemma for decision trees. While there is a large body of work on XOR lemmas for decision trees, our setting necessitates parameters that are extremely sharp, and are not known to be attainable by existing XOR lemmas. Our work also carries new implications for the related problem of Decision Tree Minimization.
Updated: 2024-07-01 15:53:03
Categories: cs.CC,cs.DS,cs.LG
Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning
The use of skeletal data allows deep learning models to perform action recognition efficiently and effectively. Herein, we believe that exploring this problem within the context of Continual Learning is crucial. While numerous studies focus on skeleton-based action recognition from a traditional offline perspective, only a handful venture into online approaches. In this respect, we introduce CHARON (Continual Human Action Recognition On skeletoNs), which maintains consistent performance while operating within an efficient framework. Through techniques like uniform sampling, interpolation, and a memory-efficient training stage based on masking, we achieve improved recognition accuracy while minimizing computational overhead. Our experiments on Split NTU-60 and the proposed Split NTU-120 datasets demonstrate that CHARON sets a new benchmark in this domain. The code is available at https://github.com/Sperimental3/CHARON.
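The uniform sampling and interpolation techniques mentioned above amount to resampling a skeleton clip of arbitrary length to a fixed number of frames by linearly interpolating joint coordinates along the time axis. A minimal sketch (the frame/joint/coordinate layout is an assumption):

```python
import numpy as np

# Resample a (frames, joints, coords) skeleton clip to target_len frames
# via linear interpolation between the two nearest source frames.
def resample_clip(clip, target_len):
    t, j, c = clip.shape
    src = np.linspace(0, t - 1, target_len)       # fractional source indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    w = (src - lo)[:, None, None]                 # interpolation weights
    return (1 - w) * clip[lo] + w * clip[hi]

clip = np.arange(8, dtype=float).reshape(8, 1, 1)   # 8 frames, 1 joint, 1 coord
print(resample_clip(clip, 4).ravel())
```

Fixing the temporal length this way keeps memory per sample constant, which matters when rehearsal buffers store past-task exemplars in continual learning.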
Updated: 2024-07-01 15:48:49
Categories: cs.CV,cs.AI
Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing
Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts. The intermediate gloss annotations of videos aim to guide the translation process. In our work, we focus on {\em Gloss2Text} translation stage and propose several advances by leveraging pre-trained large language models (LLMs), data augmentation, and novel label-smoothing loss function exploiting gloss translation ambiguities improving significantly the performance of state-of-the-art approaches. Through extensive experiments and ablation studies on the PHOENIX Weather 2014T dataset, our approach surpasses state-of-the-art performance in {\em Gloss2Text} translation, indicating its efficacy in addressing sign language translation and suggesting promising avenues for future research and development.
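A semantically aware label-smoothing target of the kind described above can be sketched as follows: instead of spreading the smoothing mass uniformly over the vocabulary, it is concentrated on words known to be valid alternative translations of the gold gloss. The vocabulary, alternative set, and 0.1 smoothing budget are illustrative assumptions:

```python
import numpy as np

# Build a target distribution that gives (1 - eps) to the gold word and
# splits eps evenly among its known alternative translations.
def semantic_smooth(gold, alternatives, vocab, eps=0.1):
    target = np.zeros(len(vocab))
    target[vocab.index(gold)] = 1.0 - eps
    for alt in alternatives:
        target[vocab.index(alt)] = eps / len(alternatives)
    return target

vocab = ["house", "home", "building", "dog"]
print(semantic_smooth("house", ["home", "building"], vocab))
# [0.9  0.05 0.05 0.  ]
```

Training against this target (e.g. with cross-entropy) stops the model from being penalized as harshly for producing a legitimate alternative translation as for an unrelated word.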
Updated: 2024-07-01 15:46:45
Categories: cs.CV,cs.CL,cs.LG
Optimizing Age of Information in Vehicular Edge Computing with Federated Graph Neural Network Multi-Agent Reinforcement Learning
With the rapid development of intelligent vehicles and Intelligent Transport Systems (ITS), sensors such as cameras and LiDAR installed on intelligent vehicles provide greater capacity for executing computation-intensive and delay-sensitive tasks, thereby raising deployment costs. To address this issue, Vehicular Edge Computing (VEC) has been proposed to process data through Road Side Units (RSUs) to support real-time applications. This paper focuses on the Age of Information (AoI) as a key metric for data freshness and explores task offloading issues for vehicles under RSU communication resource constraints. We adopt a Multi-agent Deep Reinforcement Learning (MADRL) approach, allowing vehicles to autonomously make optimal data offloading decisions. However, MADRL poses risks of vehicle information leakage during communication learning and centralized training. To mitigate this, we employ a Federated Learning (FL) framework that shares model parameters instead of raw data to protect the privacy of vehicle users. Building on this, we propose an innovative distributed federated learning framework combining Graph Neural Networks (GNN), named Federated Graph Neural Network Multi-Agent Reinforcement Learning (FGNN-MADRL), to optimize AoI across the system. For the first time, road scenarios are constructed as graph data structures, and a GNN-based federated learning framework is proposed, effectively combining distributed and centralized federated aggregation. Furthermore, we propose a new MADRL algorithm that simplifies decision making and enhances offloading efficiency, further reducing the decision complexity. Simulation results demonstrate the superiority of our proposed approach over other baseline methods.
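The Age of Information metric being optimized above has a simple operational definition: age grows linearly between updates and drops when a fresher status update is delivered. A minimal sketch computing time-average AoI by sampling at small time steps (the update schedule is illustrative):

```python
# Time-average Age of Information: at each sampled time t, the age is
# t minus the generation time of the freshest update delivered by t.
def average_aoi(updates, horizon, dt=0.01):
    total, steps = 0.0, int(round(horizon / dt))
    for k in range(steps):
        t = k * dt
        delivered = [(g, d) for g, d in updates if d <= t]
        last_gen = max((g for g, d in delivered), default=0.0)
        total += t - last_gen
    return total / steps

# status updates generated at t = 0, 2, 4 and delivered 0.5 s later
print(average_aoi([(0, 0.5), (2, 2.5), (4, 4.5)], horizon=6.0))
```

An exact computation would integrate the sawtooth age curve in closed form; the sampled version above is enough to see how delivery delay and update frequency trade off, which is what the offloading policy controls.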
Updated: 2024-07-01 15:37:38
Fields: cs.LG,cs.DC,cs.MA,cs.NI
Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression
Gradient aggregation has long been identified as a major bottleneck in today's large-scale distributed machine learning training systems. One promising solution to mitigate such bottlenecks is gradient compression, directly reducing communicated gradient data volume. However, in practice, many gradient compression schemes do not achieve acceleration of the training process while also preserving accuracy. In this work, we identify several common issues in previous gradient compression systems and evaluation methods. These issues include excessive computational overheads; incompatibility with all-reduce; and inappropriate evaluation metrics, such as not using an end-to-end metric or using a 32-bit baseline instead of a 16-bit baseline. We propose several general design and evaluation techniques to address these issues and provide guidelines for future work. Our preliminary evaluation shows that our techniques enhance the system's performance and provide a clearer understanding of the end-to-end utility of gradient compression methods.
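To make the "computational overheads" critique tangible, here is one widely used compression scheme, top-k sparsification with error feedback; it is a representative scheme the paper's evaluation concerns apply to, not the paper's own proposal.

```python
def topk_with_error_feedback(grad, residual, k):
    """Top-k gradient sparsification with error feedback.

    Adds the residual carried over from previous rounds, keeps the k
    largest-magnitude entries for communication, and stores the dropped
    mass back into the residual so it is not lost. A common scheme the
    paper's critique applies to, not the paper's own method.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    # indices of the k largest-magnitude entries (this sort is itself
    # part of the compute overhead the abstract warns about)
    idx = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]),
                 reverse=True)[:k]
    keep = set(idx)
    sparse = [c if i in keep else 0.0 for i, c in enumerate(corrected)]
    new_residual = [c - s for c, s in zip(corrected, sparse)]
    return sparse, new_residual
```

Note that the output is sparse per worker but generally at *different* indices across workers, which is exactly why such schemes compose poorly with all-reduce, one of the incompatibilities the abstract highlights.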
Updated: 2024-07-01 15:32:28
Fields: cs.LG,cs.NI
Badllama 3: removing safety finetuning from Llama 3 in minutes
We show that extensive LLM safety fine-tuning is easily subverted when an attacker has access to model weights. We evaluate three state-of-the-art fine-tuning methods (QLoRA, ReFT, and Ortho) and show how algorithmic advances enable constant jailbreaking performance with cuts in FLOPs and optimisation power. We strip safety fine-tuning from Llama 3 8B in one minute and Llama 3 70B in 30 minutes on a single GPU, and sketch ways to reduce this further.
Updated: 2024-07-01 15:29:45
Fields: cs.LG,cs.AI,cs.CL,cs.CR
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistical distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.
Updated: 2024-07-01 15:29:45
Fields: cs.CV,cs.CL,cs.LG
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Recent successes suggest that parameter-efficient fine-tuning of foundation models has become the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far shown only limited success and, crucially, tends to underperform on out-of-distribution (OOD) tasks. In this paper, we introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches and trained to isolate subsets of pre-trained parameters automatically for meta-tuning on each task. SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient fine-tuning. We establish new state-of-the-art results on a challenging combination of Meta-Dataset augmented with additional OOD tasks in both zero-shot and gradient-based adaptation settings. In addition, we provide a thorough analysis of the superiority of learned over hand-designed sparsity patterns for sparse expert methods and the pivotal importance of the sparsity level in balancing between in-distribution and out-of-distribution generalization. Our code is publicly available.
Updated: 2024-07-01 15:29:16
Fields: cs.CV,cs.LG
Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations
Neural operators such as the Fourier Neural Operator (FNO) have been shown to provide resolution-independent deep learning models that can learn mappings between function spaces. For example, an initial condition can be mapped to the solution of a partial differential equation (PDE) at a future time-step using a neural operator. Despite the popularity of neural operators, their use to predict solution functions over a domain given only data over the boundary (such as a spatially varying Dirichlet boundary condition) remains unexplored. In this paper, we refer to such problems as boundary-to-domain problems; they have a wide range of applications in areas such as fluid mechanics, solid mechanics, heat transfer etc. We present a novel FNO-based architecture, named Lifting Product FNO (or LP-FNO) which can map arbitrary boundary functions defined on the lower-dimensional boundary to a solution in the entire domain. Specifically, two FNOs defined on the lower-dimensional boundary are lifted into the higher dimensional domain using our proposed lifting product layer. We demonstrate the efficacy and resolution independence of the proposed LP-FNO for the 2D Poisson equation.
Updated: 2024-07-01 15:27:50
Fields: cs.LG,cs.NA,math.NA,65N99, 68T07,I.2.1; J.2
Safe Linear Bandits over Unknown Polytopes
The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes, and the role of aggressive doubly-optimistic play in avoiding the strong assumptions made by extant pessimistic-optimistic approaches. We first elucidate an inherent hardness in SLBs due to the lack of knowledge of constraints: there exist `easy' instances, for which suboptimal extreme points have large `gaps', but on which SLB methods must still incur $\Omega(\sqrt{T})$ regret or safety violations, due to an inability to resolve unknown optima to arbitrary precision. We then analyse a natural doubly-optimistic strategy for the safe linear bandit problem, DOSS, which uses optimistic estimates of both reward and safety risks to select actions, and show that despite the lack of knowledge of constraints or feasible points, DOSS simultaneously obtains tight instance-dependent $O(\log^2 T)$ bounds on efficacy regret, and $\tilde O(\sqrt{T})$ bounds on safety violations. Further, when safety is demanded to a finite precision, violations improve to $O(\log^2 T)$. These results rely on a novel dual analysis of linear bandits: we argue that DOSS proceeds by activating noisy versions of at least $d$ constraints in each round, which allows us to separately analyse rounds where a `poor' set of constraints is activated, and rounds where `good' sets of constraints are activated. The costs in the former are controlled to $O(\log^2 T)$ by developing new dual notions of gaps, based on global sensitivity analyses of linear programs, that quantify the suboptimality of each such set of constraints. The latter costs are controlled to $O(1)$ by explicitly analysing the solutions of optimistic play.
Updated: 2024-07-01 15:26:27
Fields: cs.LG,stat.ML
Binary Losses for Density Ratio Estimation
Estimating the ratio of two probability densities from finitely many observations of the densities is a central problem in machine learning and statistics. A large class of methods constructs estimators from binary classifiers which distinguish observations from the two densities. However, the error of these constructions depends on the choice of the binary loss function, raising the question of which loss function to choose based on desired error properties. In this work, we start from prescribed error measures in a class of Bregman divergences and characterize all loss functions that lead to density ratio estimators with a small error. Our characterization provides a simple recipe for constructing loss functions with certain properties, such as loss functions that prioritize an accurate estimation of large values. This contrasts with classical loss functions, such as the logistic loss or boosting loss, which prioritize accurate estimation of small values. We provide numerical illustrations with kernel methods and test their performance in applications of parameter selection for deep domain adaptation.
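The classifier-based construction the abstract builds on rests on a simple identity: with balanced classes, a probabilistic classifier separating samples of p (labeled 1) from samples of q (labeled 0) has posterior s(x) = p(x)/(p(x)+q(x)), so p(x)/q(x) = s(x)/(1-s(x)). The sketch below verifies this with the *exact* posterior of two known Gaussians in place of a trained classifier.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ratio_from_classifier(prob_class1):
    """Density-ratio estimate from a probabilistic binary classifier.

    The classic construction (e.g. via the logistic loss): with balanced
    classes, s(x)/(1-s(x)) recovers p(x)/q(x). Assumes 0 < s(x) < 1.
    """
    return prob_class1 / (1.0 - prob_class1)

# Sanity check with the ideal classifier for p = N(0,1) vs q = N(1,1):
x = 0.7
p, q = gauss_pdf(x, 0.0, 1.0), gauss_pdf(x, 1.0, 1.0)
s = p / (p + q)                  # ideal classifier output at x
est = ratio_from_classifier(s)   # equals p/q up to rounding
```

The paper's point is that when s(x) comes from a *finite-sample* classifier, the chosen loss determines where the estimate s/(1-s) is accurate, e.g. for large versus small ratio values.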
Updated: 2024-07-01 15:24:34
Fields: cs.LG,stat.ML
BeHonest: Benchmarking Honesty of Large Language Models
Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, eroding user trust, and causing real-world harm, present severe risks that intensify as these models approach superintelligence levels. Enhancing honesty in LLMs addresses critical deficiencies and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs. In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We also encourage the AI community to prioritize honesty alignment in LLMs. Our benchmark and code can be found at: \url{https://github.com/GAIR-NLP/BeHonest}.
Updated: 2024-07-01 15:18:07
Fields: cs.CL,cs.AI
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
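The contrast score can be sketched schematically: the text's log-perplexity under an observer model, divided by the cross-entropy of a performer model's next-token distributions measured under the observer. The toy per-token distributions below stand in for real LLM outputs, and the function signature is illustrative, not an actual Binoculars API.

```python
import math

def binoculars_score(obs_logprobs_of_text, perf_dists, obs_dists):
    """Schematic Binoculars-style contrast score.

    obs_logprobs_of_text: observer log-prob of each observed token.
    perf_dists / obs_dists: per-position next-token distributions of the
    performer and observer models over the same (toy) vocabulary.
    Human text tends to surprise the observer more than the performer's
    own predictions do, pushing the ratio above that of machine text.
    """
    # observer log-perplexity: mean negative log-prob of the observed tokens
    log_ppl = -sum(obs_logprobs_of_text) / len(obs_logprobs_of_text)
    # cross term: expected observer surprise under the performer's distribution
    x_ent = 0.0
    for p_dist, o_dist in zip(perf_dists, obs_dists):
        x_ent -= sum(p * math.log(o) for p, o in zip(p_dist, o_dist))
    x_ent /= len(perf_dists)
    return log_ppl / x_ent
```

When the two models coincide and the text is typical of the model's own distribution, the numerator and denominator match and the score is near 1; detection thresholds this ratio.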
Updated: 2024-07-01 15:17:10
Fields: cs.CL,cs.AI,cs.LG
tPARAFAC2: Tracking evolving patterns in (incomplete) temporal data
Tensor factorizations have been widely used for the task of uncovering patterns in various domains. Often, the input is time-evolving, shifting the goal to tracking the evolution of underlying patterns instead. To adapt to this more complex setting, existing methods incorporate temporal regularization but they either have overly constrained structural requirements or lack uniqueness which is crucial for interpretation. In this paper, in order to capture the underlying evolving patterns, we introduce t(emporal)PARAFAC2 which utilizes temporal smoothness regularization on the evolving factors. We propose an algorithmic framework that employs Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM) to fit the model. Furthermore, we extend the algorithmic framework to the case of partially observed data. Our numerical experiments on both simulated and real datasets demonstrate the effectiveness of the temporal smoothness regularization, in particular, in the case of data with missing entries. We also provide an extensive comparison of different approaches for handling missing data within the proposed framework.
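The temporal smoothness regularization at the heart of tPARAFAC2 is a quadratic penalty on successive factor matrices; the sketch below shows only that term, not the full AO-ADMM fitting procedure or the PARAFAC2 constraints it is coupled with.

```python
def temporal_smoothness_penalty(factors, lam=1.0):
    """Quadratic temporal-smoothness regularizer on evolving factors.

    Computes lam * sum_t ||B_t - B_{t-1}||_F^2 over a list of factor
    matrices (each a list of rows). Sketch of the regularization term
    only; tPARAFAC2 minimizes this jointly with the factorization loss.
    """
    total = 0.0
    for prev, curr in zip(factors, factors[1:]):
        total += sum((c - p) ** 2
                     for row_p, row_c in zip(prev, curr)
                     for p, c in zip(row_p, row_c))
    return lam * total
```

Driving this penalty down ties each time slice's factors to their neighbors, which is what lets the model keep tracking a pattern through time steps where entries are missing.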
Updated: 2024-07-01 15:10:55
Fields: cs.LG
Hyperspectral Pansharpening: Critical Review, Tools and Future Perspectives
Hyperspectral pansharpening consists of fusing a high-resolution panchromatic band and a low-resolution hyperspectral image to obtain a new image with high resolution in both the spatial and spectral domains. These remote sensing products are valuable for a wide range of applications, driving ever growing research efforts. Nonetheless, results still do not meet application demands. In part, this comes from the technical complexity of the task: compared to multispectral pansharpening, many more bands are involved, in a spectral range only partially covered by the panchromatic component and with overwhelming noise. However, another major limiting factor is the absence of a comprehensive framework for the rapid development and accurate evaluation of new methods. This paper attempts to address this issue. We started by designing a dataset large and diverse enough to allow reliable training (for data-driven methods) and testing of new methods. Then, we selected a set of state-of-the-art methods, following different approaches, characterized by promising performance, and reimplemented them in a single PyTorch framework. Finally, we carried out a critical comparative analysis of all methods, using the most accredited quality indicators. The analysis highlights the main limitations of current solutions in terms of spectral/spatial quality and computational efficiency, and suggests promising research directions. To ensure full reproducibility of the results and support future research, the framework (including codes, evaluation procedures and links to the dataset) is shared on https://github.com/matciotola/hyperspectral_pansharpening_toolbox, as a single Python-based reference benchmark toolbox.
Updated: 2024-07-01 15:10:50
Fields: cs.CV,cs.AI,eess.IV
Coordination Failure in Cooperative Offline MARL
Offline multi-agent reinforcement learning (MARL) leverages static datasets of experience to learn optimal multi-agent control. However, learning from static data presents several unique challenges to overcome. In this paper, we focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data, focusing on a common setting we refer to as the 'Best Response Under Data' (BRUD) approach. By using two-player polynomial games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms, which can lead to catastrophic coordination failure in the offline setting. Building on these insights, we propose an approach to mitigate such failure, by prioritising samples from the dataset based on joint-action similarity during policy learning and demonstrate its effectiveness in detailed experiments. More generally, however, we argue that prioritised dataset sampling is a promising area for innovation in offline MARL that can be combined with other effective approaches such as critic and policy regularisation. Importantly, our work shows how insights drawn from simplified, tractable games can lead to useful, theoretically grounded insights that transfer to more complex contexts. A core component of our offering is an interactive notebook, from which almost all of our results can be reproduced in a browser.
Updated: 2024-07-01 14:51:29
Fields: cs.LG,cs.AI,cs.MA
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces
Gaussian processes are arguably the most important class of spatiotemporal models within machine learning. They encode prior information about the modeled function and can be used for exact or approximate Bayesian learning. In many applications, particularly in physical sciences and engineering, but also in areas such as geostatistics and neuroscience, invariance to symmetries is one of the most fundamental forms of prior information one can consider. The invariance of a Gaussian process' covariance to such symmetries gives rise to the most natural generalization of the concept of stationarity to such spaces. In this work, we develop constructive and practical techniques for building stationary Gaussian processes on a very large class of non-Euclidean spaces arising in the context of symmetries. Our techniques make it possible to (i) calculate covariance kernels and (ii) sample from prior and posterior Gaussian processes defined on such spaces, both in a practical manner. This work is split into two parts, each involving different technical considerations: part I studies compact spaces, while part II studies non-compact spaces possessing certain structure. Our contributions make the non-Euclidean Gaussian process models we study compatible with well-understood computational techniques available in standard Gaussian process software packages, thereby making them accessible to practitioners.
Updated: 2024-07-01 14:48:19
Fields: stat.ME,cs.LG,math.ST,stat.ML,stat.TH
Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks
Traditional image processing methods employing partial differential equations (PDEs) offer a multitude of meaningful regularizers, along with valuable theoretical foundations for a wide range of image-related tasks. This makes their integration into neural networks a promising avenue. In this paper, we introduce a novel regularization approach inspired by the reverse process of PDE-based evolution models. Specifically, we propose inverse evolution layers (IELs), which serve as bad property amplifiers to penalize neural networks of which outputs have undesired characteristics. Using IELs, one can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the PDE models. Our experiments, focusing on semantic segmentation tasks using heat-diffusion IELs, demonstrate their effectiveness in mitigating noisy label effects. Additionally, we develop curve-motion IELs to enforce convex shape regularization in neural network-based segmentation models for preventing the generation of concave outputs. Theoretical analysis confirms the efficacy of IELs as an effective regularization mechanism, particularly in handling training with label issues.
Updated: 2024-07-01 14:47:32
Fields: cs.LG,cs.NA,math.NA
On the Convergence of Multi-objective Optimization under Generalized Smoothness
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which are typically unsatisfactory for neural networks, such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems, Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad), which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. Our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we propose a practical variant of GSMGrad named GSMGrad-FA using only constant-level time and space, while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.
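For intuition about the conflict-avoidant (CA) direction the abstract refers to, the classic two-objective case has a closed form: the min-norm convex combination of the two task gradients (MGDA-style). This is the textbook construction, not the paper's GSMGrad algorithm, which approximates a CA direction under generalized smoothness.

```python
def min_norm_direction(g1, g2):
    """Min-norm convex combination of two task gradients (MGDA-style).

    Minimizes ||gamma*g1 + (1-gamma)*g2|| over gamma in [0, 1]; the
    resulting direction has nonnegative inner product with both
    gradients, so following it does not worsen either objective.
    Classic two-task sketch, not the paper's algorithm.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        gamma = 0.5  # gradients coincide; any convex combination works
    else:
        # closed-form minimizer: gamma = ((g2 - g1) . g2) / ||g1 - g2||^2
        gamma = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        gamma = min(1.0, max(0.0, gamma))
    return [gamma * a + (1 - gamma) * b for a, b in zip(g1, g2)]

# Opposing gradients of equal norm cancel (zero direction: a Pareto
# stationary point); identical gradients pass through unchanged.
```

The paper's CA distance measures how far a method's actual update is from such a conflict-avoidant direction across iterations.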
Updated: 2024-07-01 14:43:51
Fields: cs.LG,math.OC,stat.ML
Rethinking LLM Memorization through the Lens of Adversarial Compression
Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing for the flexibility to measure memorization for arbitrary strings at a reasonably low compute. Our definition serves as a practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios.
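The metric itself is a one-liner once the adversarial prompt has been found; the hard part, a token-level adversarial search for the shortest eliciting prompt, is elided in this sketch.

```python
def adversarial_compression_ratio(target_tokens, prompt_tokens):
    """Adversarial Compression Ratio: |target| / |eliciting prompt|.

    Per the paper's definition, a training string counts as memorized
    when a prompt shorter than the string itself elicits it, i.e. when
    ACR > 1. Finding that prompt requires a token-level adversarial
    optimization (not shown here).
    """
    return len(target_tokens) / len(prompt_tokens)

# A 12-token quote elicited by a 3-token adversarial prompt compresses
# 4x, so it would be flagged as memorized (ACR = 4 > 1).
```

Because the threshold compares prompt length to target length, paraphrase-level recall does not trigger it; only near-verbatim reproduction from a shorter prompt does.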
Updated: 2024-07-01 14:43:11
Fields: cs.LG,cs.CL
Protecting Privacy in Classifiers by Token Manipulation
Using language models as a remote service entails sending private information to an untrusted provider. In addition, potential eavesdroppers can intercept the messages, thereby exposing the information. In this work, we explore the prospects of avoiding such data exposure at the level of text manipulation. We focus on text classification models, examining various token mapping and contextualized manipulation functions in order to see whether classifier accuracy may be maintained while keeping the original text unrecoverable. We find that although some token mapping functions are easy and straightforward to implement, they heavily influence performance on the downstream task, and via a sophisticated attacker can be reconstructed. In comparison, the contextualized manipulation provides an improvement in performance.
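One of the "easy and straightforward" token mapping functions the abstract alludes to is a fixed permutation of token ids shared with the intended recipient; this is an illustrative sketch, not the paper's exact mapping functions, and, as the abstract notes, such static maps are reconstructable by a sophisticated attacker.

```python
import random

def make_token_mapping(vocab_size, seed=0):
    """Fixed random permutation of token ids, shared via a secret seed.

    Each id is remapped deterministically, so a classifier can be
    trained and served on mapped text while an eavesdropper sees only
    permuted ids. Illustrative sketch; static maps like this preserve
    token frequencies, which is one avenue for reconstruction attacks.
    """
    rng = random.Random(seed)
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    forward = {i: p for i, p in enumerate(perm)}
    inverse = {p: i for i, p in enumerate(perm)}
    return forward, inverse

# Round-trip: mapping then inverting recovers the original ids, so the
# party holding the seed loses no information.
```

The contextualized manipulations the paper favors replace this static table with context-dependent transformations, trading simplicity for better downstream accuracy and attack resistance.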
Updated: 2024-07-01 14:41:59
Fields: cs.CL,cs.CR
Deep Reinforcement Learning for Adverse Garage Scenario Generation
Autonomous vehicles need to travel over 11 billion miles to ensure their safety. Therefore, the importance of simulation testing before real-world testing is self-evident. In recent years, the release of 3D simulators for autonomous driving, represented by Carla and CarSim, marks the transition of autonomous driving simulation testing environments from simple 2D overhead views to complex 3D models. During simulation testing, experimenters need to build static scenes and dynamic traffic flows, pedestrian flows, and other experimental elements to construct experimental scenarios. When building static scenes in 3D simulators, experimenters often need to manually construct 3D models, set parameters and attributes, which is time-consuming and labor-intensive. This thesis proposes an automated program generation framework. Based on deep reinforcement learning, this framework can generate different 2D ground script codes, on which 3D model files and map model files are built. The generated 3D ground scenes are displayed in the Carla simulator, where experimenters can use this scene for navigation algorithm simulation testing.
Updated: 2024-07-01 14:41:18
Domains: cs.AI,cs.LG,cs.RO,I.2.0; I.2.6
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult, as different DNN architectures of comparable depth and width -- common factors associated with their expressive power -- may exhibit drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of a DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on computer vision tasks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for far-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep
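The idea of scoring a layer by its distance to the best affine approximation can be illustrated with a minimal sketch. The least-squares residual below is a simplified stand-in of ours for the paper's closed-form affine optimal transport score; the function names and normalization are assumptions:

```python
import numpy as np

def affine_residual_score(f, x):
    """Simplified non-linearity proxy: fit the best affine map a*x + b to
    the activation f on samples x and return the normalized residual.
    A perfectly affine layer scores ~0; a strongly non-linear one scores high."""
    y = f(x)
    A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # best affine fit
    residual = y - A @ coef
    return np.sqrt(np.mean(residual**2)) / (np.std(y) + 1e-12)

x = np.linspace(-3, 3, 200)
relu_score = affine_residual_score(lambda t: np.maximum(t, 0.0), x)
identity_score = affine_residual_score(lambda t: 2.0 * t + 1.0, x)
```

Collecting such scores layer by layer would yield a per-network profile in the spirit of the paper's non-linearity signature.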
Updated: 2024-07-01 14:39:54
Domains: cs.LG,cs.AI,stat.ML
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, wherein the interpretable network relies on learning high-level concepts, is valued because concept representations lie close to human communication. However, the visualization and understanding of the learnt unsupervised dictionary of concepts encounter major limitations, especially for large-scale images. We propose here a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. The use of a generative model enables high-quality visualization, and naturally lays out an intuitive and interactive procedure for better interpretation of the learnt concepts. Furthermore, leveraging pretrained generative models has the additional advantage of making the training of the system more efficient. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts. The experiments are conducted on multiple image recognition benchmarks for large-scale images. Project page available at https://jayneelparekh.github.io/VisCoIN_project_page/
Updated: 2024-07-01 14:39:41
Domains: cs.CV,cs.AI,cs.LG
Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks
In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, addressing the challenge of significant class imbalance remains an open issue. Despite considerable progress in bridging the domain gap, existing methods often experience performance degradation when confronted with highly imbalanced dense prediction visual tasks like semantic and panoptic segmentation. This discrepancy becomes especially pronounced due to the lack of equivalent priors between the source and target domains, rendering class-imbalance techniques used in other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class weights into the UDA learning losses, with the novelty of estimating these weights dynamically through the loss gradient, defining Gradient-based class weighting (GBW) learning. GBW naturally increases the contribution of classes whose learning is hindered by highly represented classes, and has the advantage of automatically and quickly adapting to the outcomes of each training iteration, avoiding the explicitly curricular patterns common in loss-weighting strategies. Extensive experimentation validates the effectiveness of GBW across architectures (convolutional and transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets (GTA and Synthia). Analysing the source of this advantage, GBW consistently increases the recall of under-represented classes.
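The core idea of deriving class weights from the loss gradient can be sketched as follows. This is a simplified rule of ours for softmax cross-entropy (where the gradient w.r.t. the true-class logit is p_true - 1); the exact GBW weighting in the paper may differ:

```python
import numpy as np

def gradient_based_class_weights(probs, labels, num_classes, eps=1e-8):
    """Sketch of gradient-based class weighting: classes that contribute
    a small share of the overall loss gradient receive a larger weight,
    so under-represented classes are not drowned out by dominant ones."""
    grad_share = np.zeros(num_classes)
    for p, y in zip(probs, labels):
        grad_share[y] += abs(p[y] - 1.0)     # per-sample gradient magnitude
    grad_share /= grad_share.sum() + eps     # each class's share of the gradient
    weights = 1.0 / (grad_share + eps)
    return weights / weights.mean()          # normalize to mean 1

# Toy batch: class 0 dominates, class 1 has a single sample.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
labels = np.array([0, 0, 0, 1])
weights = gradient_based_class_weights(probs, labels, num_classes=2)
```

Because the weights are recomputed from the current gradients at each iteration, they track the training dynamics without an explicit curriculum.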
Updated: 2024-07-01 14:34:25
Domains: cs.CV,cs.LG
SECOMP: Formally Secure Compilation of Compartmentalized C Programs
Undefined behavior in C often causes devastating security vulnerabilities. One practical mitigation is compartmentalization, which allows developers to structure large programs into mutually distrustful compartments with clearly specified privileges and interactions. In this paper we introduce SECOMP, a compiler for compartmentalized C code that comes with machine-checked proofs guaranteeing that the scope of undefined behavior is restricted to the compartments that encounter it and become dynamically compromised. These guarantees are formalized as the preservation of safety properties against adversarial contexts, a secure compilation criterion similar to full abstraction, and this is the first time such a strong criterion is proven for a mainstream programming language. To achieve this we extend the languages of the CompCert verified C compiler with isolated compartments that can only interact via procedure calls and returns, as specified by cross-compartment interfaces. We adapt the passes and optimizations of CompCert as well as their correctness proofs to this compartment-aware setting. We then use compiler correctness as an ingredient in a larger secure compilation proof that involves several proof engineering novelties, needed to scale formally secure compilation up to a C compiler.
Updated: 2024-07-01 14:31:41
Domains: cs.PL,cs.CR
Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion
Prompt tuning is a promising method to fine-tune a pre-trained language model without retraining its large-scale parameters. Instead, it attaches a soft prompt to the input text, whereby downstream tasks can be well adapted by merely learning the embeddings of prompt tokens. Nevertheless, existing methods still suffer from two challenges: (i) they struggle to balance accuracy and efficiency, as a longer (shorter) soft prompt generally leads to better (worse) accuracy but at the cost of more (less) training time; (ii) performance may not be consistent when adapting to different downstream tasks. We attribute this to a single embedding space having to serve the differing requirements of downstream tasks. To address these issues, we propose an Efficient Prompt Tuning method (EPT) based on multi-space projection and prompt fusion. Specifically, it decomposes a given soft prompt into a shorter prompt and two low-rank matrices, significantly reducing the training time. Accuracy is also enhanced by leveraging the low-rank matrices and the short prompt as additional knowledge sources to enrich the semantics of the original soft prompt. In addition, we project the soft prompt into multiple subspaces to improve performance consistency, and then adaptively learn the combination weights of the different spaces through a gating network. Experiments on 13 natural language processing downstream tasks show that our method significantly and consistently outperforms 11 comparison methods, with relative improvements of up to 12.9% and a 14% reduction in training time.
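The decomposition of a soft prompt into a shorter prompt plus two low-rank matrices can be sketched in terms of shapes and parameter counts. The dimensions and the exact factorization below are illustrative assumptions of ours, not the paper's parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d, full_len, short_len, rank = 16, 20, 5, 2   # embedding dim, prompt lengths, rank

# Instead of learning a full (full_len x d) soft prompt, learn a short
# prompt plus two low-rank factors whose product expands it back.
short_prompt = rng.normal(size=(short_len, d))
A = rng.normal(size=(full_len, rank))          # low-rank factor 1
B = rng.normal(size=(rank, short_len))         # low-rank factor 2
expanded = A @ B @ short_prompt                # behaves like a length-full_len prompt

full_params = full_len * d                                      # 320
ept_params = short_len * d + full_len * rank + rank * short_len  # 130
```

The expanded prompt keeps the shape of a full-length soft prompt while training far fewer parameters, which is the source of the reported training-time savings.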
Updated: 2024-07-01 14:27:51
Domains: cs.CL,cs.AI,cs.LG
Decomposing Global Feature Effects Based on Feature Interactions
Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction test to detect significant feature interactions that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.
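The partial dependence baseline that GADGET builds on, and the way feature interactions make it misleading, can be seen in a small sketch. The toy model, grid, and function signature are ours:

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Standard partial dependence: average prediction with one feature
    forced to each grid value. This is the global feature effect that
    GADGET then decomposes by recursive partitioning of the feature space."""
    pd_values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v       # intervene on the chosen feature
        pd_values.append(model(Xv).mean())
    return np.array(pd_values)

# Toy model with an interaction: the effect of x0 flips with the sign of x1.
model = lambda X: X[:, 0] * np.sign(X[:, 1])
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2, 2, 5)
pd_curve = partial_dependence(model, X, feature=0, grid=grid)
```

Here the local effects of x0 are strongly positive or strongly negative depending on x1, yet the global partial dependence curve is nearly flat, which is exactly the interaction-related heterogeneity the abstract describes.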
Updated: 2024-07-01 14:26:49
Domains: stat.ML,cs.LG
Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning
Fine-tuning large pre-trained foundation models, such as the 175B GPT-3, has recently attracted growing attention for downstream tasks. While parameter-efficient fine-tuning methods have been proposed and proven effective without retraining all model parameters, their performance is limited by the capacity of the incremental modules, especially under constrained parameter budgets. To overcome this challenge, we propose CapaBoost, a simple yet effective strategy that enhances model capacity by leveraging low-rank updates through parallel weight modules in target layers. By applying static random masks to the shared weight matrix, CapaBoost constructs a diverse set of weight matrices, effectively increasing the rank of the incremental weights without adding parameters. Notably, our approach can be seamlessly integrated into various existing parameter-efficient fine-tuning methods. We extensively validate the efficacy of CapaBoost through experiments on diverse downstream tasks, including natural language understanding, question answering, and image classification. Our results demonstrate significant improvements over baselines, without incurring additional computation or storage costs. Our code is available at \url{https://github.com/LINs-lab/CapaBoost}.
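The rank-boosting effect of static random masks over shared weight factors can be demonstrated numerically. The sketch below masks both low-rank factors in three parallel branches; CapaBoost's exact masking scheme may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4
# A single shared low-rank update A @ B has rank at most r.
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, d))

# Parallel branches: different *static* binary masks over the same shared
# factors, summed together. No new trainable parameters are introduced
# (the masks are fixed), yet the summed update can exceed rank r.
masks_a = [rng.integers(0, 2, size=(d, r)) for _ in range(3)]
masks_b = [rng.integers(0, 2, size=(r, d)) for _ in range(3)]
update = sum((A * ma) @ (B * mb) for ma, mb in zip(masks_a, masks_b))

rank_single = np.linalg.matrix_rank(A @ B)
rank_boosted = np.linalg.matrix_rank(update)
```

Since each branch reuses the same A and B, storage and trainable-parameter counts stay fixed while the effective rank of the incremental weight grows, which is the capacity gain the abstract claims.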
Updated: 2024-07-01 14:26:48
Domains: cs.LG,cs.AI,cs.CL
Deep Dive into MRI: Exploring Deep Learning Applications in 0.55T and 7T MRI
The development of magnetic resonance imaging (MRI) for medical imaging has provided a leap forward in diagnosis, offering a safe, non-invasive alternative to techniques involving ionising radiation exposure. The underlying phenomenon was described by Bloch and Purcell in 1946, yet it was not until 1980 that the first clinical application of MRI became available. Since then, MRI has gone through many advances and has altered the way diagnostic procedures are performed. Owing to its constant improvement, MRI has become a commonly used modality across several specialisations in medicine. In particular, the emerging 0.55T and 7T MRI technologies have demonstrated enhanced preservation of image detail and advanced tissue characterisation. This review examines the integration of deep learning (DL) techniques into these MRI modalities, surveying and exploring their applications. It highlights how DL contributes to 0.55T and 7T MRI data, showcasing the potential of DL in improving and refining these technologies. The review ends with a brief overview of how MRI technology will evolve in the coming years.
Updated: 2024-07-01 14:26:31
Domains: eess.IV,cs.AI,cs.LG
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
End-to-end neural speaker diarization systems are able to address the speaker diarization task while effectively handling speech overlap. This work explores the incorporation of speaker information embeddings into end-to-end systems to enhance their speaker-discriminative capabilities while maintaining their overlap-handling strengths. To achieve this, we propose several methods for incorporating these embeddings alongside the acoustic features. Furthermore, we delve into an analysis of the correct handling of silence frames, the window length for extracting speaker embeddings, and the transformer encoder size. The effectiveness of our proposed approach is thoroughly evaluated on the CallHome dataset for the two-speaker diarization task, with results that demonstrate a significant reduction in diarization error rate, a relative improvement of 10.78% compared to the baseline end-to-end model.
Updated: 2024-07-01 14:26:28
Domains: cs.SD,cs.AI,eess.AS
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks predominantly focus on the visual modality, perturbing solely visual inputs in the prompt for attacks. However, they fall short when confronted with aligned models that fuse visual and textual features simultaneously for generation. To address this limitation, this paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively. Initially, we adversarially embed universally harmful perturbations in an image, guided by a few-shot query-agnostic corpus (e.g., affirmative prefixes and negative inhibitions). This process ensures that the image prompts LVLMs to respond positively to any harmful queries. Subsequently, leveraging the adversarial image, we optimize textual prompts with specific harmful intent. In particular, we utilize a large language model to analyze jailbreak failures and employ chain-of-thought reasoning to refine textual prompts in a feedback-iteration manner. To validate the efficacy of our approach, we conducted extensive evaluations on various datasets and LVLMs, demonstrating that our method significantly outperforms other methods by large margins (+29.03% in attack success rate on average). Additionally, we showcase the potential of our attacks on black-box commercial LVLMs, such as Gemini and ChatGLM.
Updated: 2024-07-01 14:25:23
Domains: cs.CV,cs.CR
Evaluating Model Performance Under Worst-case Subpopulations
The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.
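When subpopulations are allowed to be arbitrary subsets of a given size, the worst-case risk reduces to a CVaR-style tail average, sketched below. The paper's estimator is more refined (it restricts subpopulations via core attributes Z and estimates the conditional risk), so this is only the unconstrained special case:

```python
import numpy as np

def worst_case_subpopulation_loss(losses, alpha):
    """Worst-case average loss over any subpopulation containing an alpha
    fraction of the data: the mean of the worst alpha-fraction of losses."""
    k = max(1, int(np.ceil(alpha * len(losses))))
    worst = np.sort(losses)[-k:]      # the k largest per-sample losses
    return worst.mean()

losses = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.15, 0.05, 0.3])
avg_loss = losses.mean()                                    # 0.325
worst_30 = worst_case_subpopulation_loss(losses, alpha=0.3)  # mean of top 3
```

The gap between the average and the worst-case value is what a plain test-set evaluation hides, and what the paper's procedure is designed to certify.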
Updated: 2024-07-01 14:24:05
Domains: cs.LG,cs.CY,stat.ML
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
Decision Transformers, in their vanilla form, struggle to perform in image-based environments with multi-discrete action spaces. Although enhanced Decision Transformer architectures have been developed to improve performance, these methods have not specifically addressed the problem of multi-discrete action spaces, which hampers existing Decision Transformer architectures from learning good representations. To mitigate this, we propose Multi-State Action Tokenisation (M-SAT), an approach for tokenising actions in multi-discrete action spaces that enhances the model's performance in such environments. Our approach involves two key changes: disentangling actions to the individual action level and tokenising the actions with auxiliary state information. These two key changes also improve individual action level interpretability and visibility within the attention layers. We demonstrate the performance gains of M-SAT on challenging ViZDoom environments with multi-discrete action spaces and image-based state spaces, including the Deadly Corridor and My Way Home scenarios, where M-SAT outperforms the baseline Decision Transformer without any additional data or heavy computational overheads. Additionally, we find that removing positional encoding does not adversely affect M-SAT's performance and, in some cases, even improves it.
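The disentangling step can be sketched as follows: instead of one token for the joint multi-discrete action, emit one token per action dimension, each paired with auxiliary state information. The token format and state summary here are illustrative assumptions of ours, not M-SAT's actual tokenisation:

```python
def tokenise_multidiscrete(action, state_summary):
    """Sketch of per-dimension action tokenisation: each sub-action becomes
    its own token (dimension id, sub-action, auxiliary state info), so the
    attention layers can attend to individual action components."""
    return [(dim, sub_action, state_summary)
            for dim, sub_action in enumerate(action)]

# A multi-discrete action, e.g. (move, turn, shoot) in a ViZDoom-like env.
tokens = tokenise_multidiscrete((1, 2, 0), state_summary="frame_42")
```

A vanilla Decision Transformer would instead collapse (1, 2, 0) into a single joint token drawn from the product space, which is what hampers representation learning in the first place.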
Updated: 2024-07-01 14:18:15
Domains: cs.LG,cs.CV
Statistical signatures of abstraction in deep neural networks
We study how abstract representations emerge in a Deep Belief Network (DBN) trained on benchmark datasets. Our analysis targets the principles of learning in the early stages of information processing, starting from the "primordial soup" of the under-sampling regime. As the data is processed by deeper and deeper layers, features are detected and removed, transferring more and more "context-invariant" information to deeper layers. We show that the representation approaches a universal model -- the Hierarchical Feature Model (HFM) -- determined by the principle of maximal relevance. Relevance quantifies the uncertainty on the model of the data, thus suggesting that "meaning" -- i.e. syntactic information -- is that part of the data which is not yet captured by a model. Our analysis shows that shallow layers are well described by pairwise Ising models, which provide a representation of the data in terms of generic, low order features. We also show that plasticity increases with depth, in a similar way as it does in the brain. These findings suggest that DBNs are capable of extracting a hierarchy of features from the data which is consistent with the principle of maximal relevance.
Updated: 2024-07-01 14:13:11
Domains: cs.LG,cond-mat.dis-nn,physics.data-an,stat.ML
ammBoost: State Growth Control for AMMs
Automated market makers (AMMs) are a form of decentralized cryptocurrency exchanges and considered a prime example of Decentralized Finance (DeFi) applications. Their popularity and high trading activity have resulted in millions of on-chain transactions leading to serious scalability issues. In this paper, we address the on-chain storage overhead problem of AMMs by utilizing a new sidechain architecture as a layer 2 solution, building a system called ammBoost. Our system reduces the amount of on-chain transactions, boosts throughput, and supports blockchain pruning. We devise several techniques to enable layer 2 processing for AMMs while preserving correctness and security of the underlying AMM. We also build a proof-of-concept of ammBoost for a Uniswap-inspired use case to empirically evaluate its performance. Our experiments show that ammBoost decreases the gas cost by 94.53% and the chain growth by at least 80%, and that it can support up to 500x of the daily traffic volume observed for Uniswap in practice.
Updated: 2024-07-01 14:10:56
Domains: cs.CR
Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity
We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space but differ in their reward functions and state transition kernels. Assuming agents can communicate via a central server, we ask: Does exchanging information expedite the process of evaluating a common policy? To answer this question, we provide the first comprehensive finite-time analysis of a federated temporal difference (TD) learning algorithm with linear function approximation, while accounting for Markovian sampling, heterogeneity in the agents' environments, and multiple local updates to save communication. Our analysis crucially relies on several novel ingredients: (i) deriving perturbation bounds on TD fixed points as a function of the heterogeneity in the agents' underlying Markov decision processes (MDPs); (ii) introducing a virtual MDP to closely approximate the dynamics of the federated TD algorithm; and (iii) using the virtual MDP to make explicit connections to federated optimization. Putting these pieces together, we rigorously prove that in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.
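The federated TD(0) scheme with linear function approximation and periodic server averaging can be sketched as follows. The environments, features, and hyperparameters below are toy assumptions of ours for illustration:

```python
import numpy as np

def federated_td(envs, features, num_rounds=50, local_steps=10,
                 alpha=0.1, gamma=0.9):
    """Sketch of federated TD(0) with linear value approximation: each
    agent runs local TD updates on its own (heterogeneous) environment,
    then a central server averages the weight vectors."""
    w = np.zeros(features.shape[1])
    for _ in range(num_rounds):
        local_ws = []
        for env in envs:
            w_i, s = w.copy(), 0
            for _ in range(local_steps):
                s_next, r = env(s)   # agent-specific transition and reward
                td_error = r + gamma * features[s_next] @ w_i - features[s] @ w_i
                w_i += alpha * td_error * features[s]   # TD(0) update
                s = s_next
            local_ws.append(w_i)
        w = np.mean(local_ws, axis=0)   # server-side averaging
    return w

# Two heterogeneous 2-state environments sharing the same state space.
features = np.eye(2)                                  # tabular features
env_a = lambda s: (1 - s, 1.0 if s == 0 else 0.0)     # differing reward functions
env_b = lambda s: (1 - s, 0.5)
w = federated_td([env_a, env_b], features)
```

With heterogeneous rewards, the averaged iterate converges to a compromise between the agents' TD fixed points; the paper quantifies this bias via perturbation bounds and shows linear speedups in the low-heterogeneity regime.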
Updated: 2024-07-01 14:07:58
Domains: cs.LG,math.OC
Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability
The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root causes of attacks based on raw data features. In this paper, we aim to address these knowledge gaps by first exploring statistical approaches to identify the most informative neurons and quantifying the significance of the hidden activations from the selected neurons on attack accuracy, in isolation and combination. Additionally, we propose an attack-driven explainable framework by integrating the target and attack models to identify the most influential features of raw data that lead to successful membership inference attacks. Our proposed MIA shows an improvement of up to 26% on state-of-the-art MIA.
Updated: 2024-07-01 14:07:46
Domains: cs.LG,cs.CR
Text2Robot: Evolutionary Robot Design from Text Descriptions
Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D models to provide strong initializations of diverse morphologies. Within a day, our geometric processing algorithms and body-control co-optimization produce a walking robot by explicitly considering real-world electronics and manufacturability. Text2Robot enables rapid prototyping and opens new opportunities for robot design with generative models.
Updated: 2024-07-01 14:05:22
Domains: cs.RO,cs.AI,cs.LG
DCSI -- An improved measure of cluster separability based on separation and connectedness
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
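A toy index built from the same two ingredients, between-class separation and within-class connectedness, can be sketched as follows. This is an illustrative ratio of ours, not the actual DCSI formula:

```python
import numpy as np
from itertools import combinations

def simple_separability(X, labels):
    """Toy separability index in the spirit of DCSI: between-class
    separation = smallest distance between points of different classes;
    within-class connectedness = largest nearest-neighbour distance inside
    a class. A higher ratio indicates cleaner density-based clusters."""
    labels = np.asarray(labels)
    sep = min(np.linalg.norm(a - b)
              for (a, la), (b, lb) in combinations(zip(X, labels), 2)
              if la != lb)
    conn = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        for i, p in enumerate(pts):
            others = np.delete(pts, i, axis=0)
            conn = max(conn, np.linalg.norm(others - p, axis=1).min())
    return sep / conn

X_far = np.array([[0.0, 0], [0.1, 0], [5.0, 0], [5.1, 0]])    # well separated
X_close = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [0.3, 0]])  # touching classes
y = [0, 0, 1, 1]
score_far = simple_separability(X_far, y)
score_close = simple_separability(X_close, y)
```

Touching classes drive the separation down to the scale of within-class neighbour distances, so the ratio collapses toward 1, mirroring how DCSI flags class labels that do not correspond to density-based clusters.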
Updated: 2024-07-01 14:04:12
Fields: stat.ML,cs.LG
Robot Instance Segmentation with Few Annotations for Grasping
The ability of robots to manipulate objects relies heavily on their aptitude for visual perception. In domains characterized by cluttered scenes and high object variability, most methods call for vast labeled datasets, laboriously hand-annotated, with the aim of training capable models. Once deployed, the challenge of generalizing to unfamiliar objects implies that the model must evolve alongside its domain. To address this, we propose a novel framework that combines Semi-Supervised Learning (SSL) with Learning Through Interaction (LTI), allowing a model to learn by observing scene alterations and leverage visual consistency despite temporal gaps without requiring curated data of interaction sequences. As a result, our approach exploits partially annotated data through self-supervision and incorporates temporal context using pseudo-sequences generated from unlabeled still images. We validate our method on two common benchmarks, ARMBench mix-object-tote and OCID, where it achieves state-of-the-art performance. Notably, on ARMBench, we attain an $\text{AP}_{50}$ of $86.37$, almost a $20\%$ improvement over existing work, and obtain remarkable results in scenarios with extremely low annotation, achieving an $\text{AP}_{50}$ score of $84.89$ with just $1\%$ of annotated data, compared to the $72$ reported in ARMBench for the fully annotated counterpart.
Updated: 2024-07-01 13:58:32
Fields: cs.CV,cs.AI,cs.RO
Collaborative Performance Prediction for Large Language Models
Comprehensively understanding and accurately predicting the performance of large language models across diverse downstream tasks has emerged as a pivotal challenge in NLP research. The pioneering scaling law on downstream tasks demonstrated intrinsic similarities within model families and utilized such similarities for performance prediction. However, these approaches tend to overlook the similarities between model families and only consider design factors listed in the original scaling law. To overcome these limitations, we introduce a novel framework, Collaborative Performance Prediction (CPP), which significantly enhances prediction accuracy by leveraging the historical performance of various models on downstream tasks and other design factors for both model and task. We also collect collaborative data sourced from online platforms, containing both historical performance and additional design factors. With the support of the collaborative data, CPP not only surpasses traditional scaling laws in predicting the performance of scaled LLMs but also facilitates a detailed analysis of factor importance, an area previously overlooked.
Updated: 2024-07-01 13:56:42
Fields: cs.CL,cs.AI,cs.LG
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models
Logical reasoning is central to complex human activities, such as thinking, debating, and planning; it is also a central component of many AI systems as well. In this paper, we investigate the extent to which encoder-only transformer language models (LMs) can reason according to logical rules. We ask whether those LMs can deduce theorems in propositional calculus and first-order logic; if their relative success in these problems reflects general logical capabilities; and which layers contribute the most to the task. First, we show for several encoder-only LMs that they can be trained, to a reasonable degree, to determine logical validity on various datasets. Next, by cross-probing fine-tuned models on these datasets, we show that LMs have difficulty in transferring their putative logical reasoning ability, which suggests that they may have learned dataset-specific features, instead of a general capability. Finally, we conduct a layerwise probing experiment, which shows that the hypothesis classification task is mostly solved through higher layers.
Updated: 2024-07-01 13:49:45
Fields: cs.CL,cs.AI
A Collaborative, Human-Centred Taxonomy of AI, Algorithmic, and Automation Harms
This paper introduces a collaborative, human-centered taxonomy of AI, algorithmic and automation harms. We argue that existing taxonomies, while valuable, can be narrow, unclear, typically cater to practitioners and government, and often overlook the needs of the wider public. Drawing on existing taxonomies and a large repository of documented incidents, we propose a taxonomy that is clear and understandable to a broad set of audiences, as well as being flexible, extensible, and interoperable. Through iterative refinement with topic experts and crowdsourced annotation testing, we propose a taxonomy that can serve as a powerful tool for civil society organisations, educators, policymakers, product teams and the general public. By fostering a greater understanding of the real-world harms of AI and related technologies, we aim to increase understanding, empower NGOs and individuals to identify and report violations, inform policy discussions, and encourage responsible technology development and deployment.
Updated: 2024-07-01 13:47:53
Fields: cs.LG,cs.AI,cs.CY
Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks
Biological neural networks seem qualitatively superior (e.g. in learning, flexibility, robustness) to current artificial networks such as the Multi-Layer Perceptron (MLP) or Kolmogorov-Arnold Network (KAN). At the same time, in contrast to them, biological networks have fundamentally multidirectional signal propagation~\cite{axon}, also of probability distributions, e.g. for uncertainty estimation, and are believed to be unable to use standard backpropagation training~\cite{backprop}. We propose novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) that remove the above low-level differences: neurons contain a local joint distribution model (of their connections), representing the joint density on normalized variables as just a linear combination of orthonormal polynomials $(f_\mathbf{j})$: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$ and some chosen basis $B$, where growing the basis approaches a complete description of the joint distribution. By various index summations of such an $(a_\mathbf{j})$ tensor as neuron parameters, we get simple formulas, e.g. for conditional expected values for propagation in any direction, like $E[x|y,z]$, $E[y|x]$, which degenerate to a KAN-like parametrization if restricted to pairwise dependencies. Such an HCR network can also propagate probability distributions (also joint) like $\rho(y,z|x)$. It also allows for additional training approaches, like direct $(a_\mathbf{j})$ estimation through tensor decomposition, or more biologically plausible information bottleneck training: layers directly influence only their neighbors, optimizing content to maximize information about the next layer and to minimize information about the previous one, reducing noise.
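The coefficient estimation and conditional-expectation propagation described above can be sketched in a few lines, using the first three orthonormal (rescaled Legendre) polynomials on [0,1]; the toy perfectly-dependent data and the basis truncation are our own illustrative assumptions:

```python
import numpy as np

# Orthonormal (rescaled Legendre) polynomials on [0,1]: f_0, f_1, f_2.
def f(j, x):
    if j == 0:
        return np.ones_like(x)
    if j == 1:
        return np.sqrt(3.0) * (2.0 * x - 1.0)
    return np.sqrt(5.0) * (6.0 * x ** 2 - 6.0 * x + 1.0)

rng = np.random.default_rng(0)
x = rng.random(50000)
y = x  # toy, perfectly dependent pair of normalized variables

# Joint-density coefficients rho(x,y) = sum a[j1,j2] f_j1(x) f_j2(y):
# by orthonormality, each a[j1,j2] is just an empirical mean.
a = np.array([[np.mean(f(j1, x) * f(j2, y)) for j2 in range(3)]
              for j1 in range(3)])

def cond_exp_y_given_x(x0):
    # E[y|x]: integrate y against rho(x0, y) and normalize by rho(x0);
    # on [0,1]: int y f_0 dy = 1/2, int y f_1 dy = sqrt(3)/6, int y f_2 dy = 0.
    fx = np.array([float(f(j, np.asarray(x0))) for j in range(3)])
    num = fx @ (a[:, 0] * 0.5 + a[:, 1] * np.sqrt(3.0) / 6.0)
    den = fx @ a[:, 0]
    return num / den

print(cond_exp_y_given_x(0.8))  # close to 0.8, since y = x here
```

With $y=x$ the estimated tensor has $a_{0,0}=1$ and $a_{1,1}\approx 1$, so the propagation formula recovers $E[y|x]\approx x$ within the truncated basis.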
Updated: 2024-07-01 13:46:06
Fields: cs.LG,stat.ML
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
The advancements in zero-shot text-to-speech (TTS) methods, based on large-scale models, have demonstrated high fidelity in reproducing speaker characteristics. However, these models are too large for practical daily use. We propose a lightweight zero-shot TTS method using a mixture of adapters (MoA). Our proposed method incorporates MoA modules into the decoder and the variance adapter of a non-autoregressive TTS model. These modules enhance the ability to adapt to a wide variety of speakers in a zero-shot manner by selecting appropriate adapters associated with speaker characteristics on the basis of speaker embeddings. Our method achieves high-quality speech synthesis with minimal additional parameters. Through objective and subjective evaluations, we confirmed that our method achieves better performance than the baseline with less than 40\% of the parameters and 1.9 times faster inference. Audio samples are available on our demo page (https://ntt-hilab-gensp.github.io/is2024lightweightTTS/).
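The adapter-selection idea can be sketched as a gated, residual mixture of low-rank adapters. Everything below (the dimensions, the softmax router over a speaker embedding) is an illustrative assumption of ours, not the paper's exact architecture:

```python
import numpy as np

def moa_layer(x, spk_emb, adapters, router):
    # Mixture-of-adapters sketch: route the speaker embedding through a
    # softmax gate, then add the gate-weighted sum of small bottleneck
    # adapters to the hidden state x (residual connection).
    logits = spk_emb @ router                       # (num_adapters,)
    w = np.exp(logits - logits.max()); w /= w.sum()
    delta = sum(wi * (x @ A_down @ A_up)
                for wi, (A_down, A_up) in zip(w, adapters))
    return x + delta

rng = np.random.default_rng(5)
d, r, k = 16, 4, 3                      # hidden dim, bottleneck rank, adapters
adapters = [(rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1)
            for _ in range(k)]
router = rng.normal(size=(8, k))        # speaker-embedding dim 8 -> k gates
x, spk = rng.normal(size=d), rng.normal(size=8)
y = moa_layer(x, spk, adapters, router)
print(y.shape)  # (16,)
```

Different speaker embeddings select different gate weights and thus different effective adapters, which is how a small parameter budget can cover a wide variety of speakers.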
Updated: 2024-07-01 13:45:31
Fields: cs.SD,cs.CL,cs.LG,eess.AS
Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space
Hyperbolic geometry has shown significant potential in modeling complex structured data, particularly data with underlying tree-like and hierarchical structures. Despite the impressive performance of various hyperbolic neural networks across numerous domains, research on adapting the Transformer to hyperbolic space remains limited. Previous attempts have mainly focused on modifying self-attention modules in the Transformer. However, these efforts have fallen short of developing a complete hyperbolic Transformer. This stems primarily from: (i) the absence of well-defined modules in hyperbolic space, including linear transformation layers, LayerNorm layers, activation functions, dropout operations, etc.; (ii) the quadratic time complexity of the existing hyperbolic self-attention module with respect to the number of input tokens, which hinders its scalability. To address these challenges, we propose Hypformer, a novel hyperbolic Transformer based on the Lorentz model of hyperbolic geometry. In Hypformer, we introduce two foundational blocks that define the essential modules of the Transformer in hyperbolic space. Furthermore, we develop a linear self-attention mechanism in hyperbolic space, enabling a hyperbolic Transformer to process billion-scale graph data and long-sequence inputs for the first time. Our experimental results confirm the effectiveness and efficiency of Hypformer across various datasets, demonstrating its potential as an effective and scalable solution for large-scale data representation and large models.
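As a hint of what "fully in hyperbolic space" involves, here is a minimal sketch (our own illustration, not Hypformer's actual modules) of the Lorentz model the paper builds on: Euclidean features are lifted onto the hyperboloid via the exponential map at the origin, after which every point satisfies the constraint $\langle x, x \rangle_{\mathcal{L}} = -1$.

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentz (Minkowski) inner product: -x0*y0 + <x_rest, y_rest>.
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def exp_map_origin(v):
    # Lift a Euclidean feature v (a tangent vector at the hyperboloid origin,
    # with time coordinate 0) onto the Lorentz hyperboloid <x,x>_L = -1.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    time = np.cosh(norm)
    space = np.sinh(norm) * v / norm
    return np.concatenate([time, space], axis=-1)

feats = np.random.default_rng(1).normal(size=(4, 8))
pts = exp_map_origin(feats)
# -cosh^2 + sinh^2 = -1, so every lifted point lies on the hyperboloid.
print(lorentz_inner(pts, pts))  # all entries -1 (up to float error)
```

Defining linear layers, LayerNorm, etc. so that their outputs stay on this manifold is exactly the gap item (i) of the abstract refers to.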
Updated: 2024-07-01 13:44:38
Fields: cs.LG,cs.AI
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles beyond end-to-end performance. We meticulously collect and categorize 6.5K visual math problems, spanning 67 hierarchical knowledge concepts and five layers of knowledge granularity. We decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric, namely Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM), to hierarchically assess inherent issues in LMMs' reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and reveal a negative correlation between solving steps and problem-specific performance. We confirm the IK issue of LMMs can be effectively improved via knowledge augmentation strategies. More notably, the primary challenge of GPT-4o has significantly transitioned from IK to IG, establishing it as the first LMM advancing towards the knowledge generalization stage. In contrast, other LMMs exhibit a marked inclination towards Rote Memorization - they correctly solve composite problems involving multiple knowledge concepts yet fail to answer sub-problems. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. The WE-MATH data and evaluation code are available at https://github.com/We-Math/We-Math.
Updated: 2024-07-01 13:39:08
Fields: cs.AI,cs.CL,cs.CV,cs.LG,cs.SC
Energy-Aware Decentralized Learning with Intermittent Model Training
Decentralized learning (DL) offers a powerful framework where nodes collaboratively train models without sharing raw data and without the coordination of a central server. In the iterative rounds of DL, models are trained locally, shared with neighbors in the topology, and aggregated with other models received from neighbors. Sharing and merging models contribute to convergence towards a consensus model that generalizes better across the collective data captured at training time. In addition, the energy consumption while sharing and merging model parameters is negligible compared to the energy spent during the training phase. Leveraging this fact, we present SkipTrain, a novel DL algorithm, which minimizes energy consumption in decentralized learning by strategically skipping some training rounds and substituting them with synchronization rounds. These training-silent periods, besides saving energy, also allow models to better mix and finally produce models with superior accuracy than typical DL algorithms that train at every round. Our empirical evaluations with 256 nodes demonstrate that SkipTrain reduces energy consumption by 50% and increases model accuracy by up to 12% compared to D-PSGD, the conventional DL algorithm.
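The alternation of training rounds and cheaper synchronization-only rounds can be sketched on a toy ring of four nodes, each with its own quadratic loss. The schedule and the losses are our illustrative assumptions, not the paper's setup:

```python
import numpy as np

def dl_round(models, grad_fn, topology, train):
    # One decentralized round: optionally take a local SGD step, then
    # average each node's model with its neighbors' models.
    if train:
        models = models - 0.1 * grad_fn(models)
    return np.array([np.mean(models[nbrs]) for nbrs in topology])

# 4 nodes on a ring; node i sees a quadratic loss with optimum targets[i].
targets = np.array([0.0, 1.0, 2.0, 3.0])
grad_fn = lambda m: 2.0 * (m - targets)
ring = [np.array([3, 0, 1]), np.array([0, 1, 2]),
        np.array([1, 2, 3]), np.array([2, 3, 0])]

models = np.zeros(4)
# SkipTrain-style schedule: alternate training rounds with cheaper
# synchronization-only rounds (train on even rounds, only mix on odd ones).
for r in range(40):
    models = dl_round(models, grad_fn, ring, train=(r % 2 == 0))
print(models)  # all four models near the consensus optimum 1.5
```

The sync-only rounds cost almost nothing in energy (no gradient computation) yet keep mixing the models, which is the mechanism the abstract credits for both the energy savings and the accuracy gain.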
Updated: 2024-07-01 13:39:03
Fields: cs.LG,cs.DC
Data After Death: Australian User Preferences and Future Solutions to Protect Posthumous User Data
The digital footprints of today's internet-active individuals are a testament to their lives, and have the potential to become digital legacies once they pass on. Future descendants of those alive today will greatly appreciate the unprecedented insight into the lives of their long since deceased ancestors, but this can only occur if today we have a process for data preservation and handover after death. Many prominent online platforms offer nebulous or altogether absent policies regarding posthumous data handling, and despite recent advances it is currently unclear by whom the average Australian would like their data to be managed after their death (i.e., social media platforms, a trusted individual, or another digital executor). While at present the management of deceased accounts is largely performed by the platform (e.g., Facebook), it is conceivable that many Australians may not trust such platforms to do so with integrity. This study aims to further the academic conversation around posthumous data by delving deeper into the preferences of the Australian public regarding the management of their data after death, ultimately to inform future development of research programs and industry solutions. A survey of 1020 Australians revealed that most desired a level of control over how their data is managed after death. Australians currently prefer to entrust the management of their data to a trusted close individual or third-party software that they can administrate themselves. As expected, social media companies ranked low regarding both trust and convenience to manage data after death. Future research focus should be to conceptualise and develop a third-party solution that enables these preferences to be realised. Such a solution could interface with the major online vendors (social media, cloud hosting etc.) to action the deceased's will.
Updated: 2024-07-01 13:38:45
Fields: cs.CY,cs.CR
Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks
In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order.
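The high-frequency energy decay can be reproduced in a few lines with a weight-free "GCN layer", i.e. pure normalized-adjacency propagation on a path graph. This is a deliberately stripped-down stand-in of ours for the GCNs analyzed in the paper:

```python
import numpy as np

# Path graph on n nodes with self-loops; the "layer" is multiplication
# by the symmetrically normalized adjacency P = D^{-1/2} (A + I) D^{-1/2}.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A += np.eye(n)
d = A.sum(axis=1)
P = A / np.sqrt(np.outer(d, d))

# Graph Fourier basis: eigenvectors of the normalized Laplacian L = I - P,
# sorted by eigenvalue (frequency) in ascending order.
eigvals, U = np.linalg.eigh(np.eye(n) - P)

x = np.random.default_rng(2).normal(size=n)
energies = []
for layer in range(15):
    coeffs = U.T @ x
    # High-frequency energy: mass on the upper half of the spectrum.
    energies.append(float(np.sum(coeffs[n // 2:] ** 2)))
    x = P @ x  # apply one propagation layer

# Over-smoothing: high-frequency energy decays roughly geometrically.
print(energies[0], energies[-1])
```

Each layer multiplies the coefficient of a mode with Laplacian eigenvalue $\lambda$ by $1-\lambda$, so high-frequency modes shrink exponentially in depth, matching the exponential decay order discussed in the abstract.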
Updated: 2024-07-01 13:35:53
Fields: cs.LG,cs.AI,math.FA
Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model
Reinforcement learning has demonstrated impressive performance in various challenging problems such as robotics, board games, and classical arcade games. However, its real-world applications can be hindered by the absence of robustness and safety in the learned policies. More specifically, an RL agent that trains in a certain Markov decision process (MDP) often struggles to perform well in nearly identical MDPs. To address this issue, we employ the framework of Robust MDPs (RMDPs) in a model-based setting and introduce a novel learned transition model. Our method specifically incorporates an auxiliary pessimistic model, updated adversarially, to estimate the worst-case MDP within a Kullback-Leibler uncertainty set. In comparison to several existing works, our work does not impose any additional conditions on the training environment, such as the need for a parametric simulator. To test the effectiveness of the proposed pessimistic model in enhancing policy robustness, we integrate it into a practical RL algorithm, called Robust Model-Based Policy Optimization (RMBPO). Our experimental results indicate a notable improvement in policy robustness on high-dimensional MuJoCo control tasks, with the auxiliary model enhancing the performance of the learned policy in distorted MDPs. We further explore the learned deviation between the proposed auxiliary world model and the nominal model, to examine how pessimism is achieved. By learning a pessimistic world model and demonstrating its role in improving policy robustness, our research contributes towards making (model-based) RL more robust.
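While the paper's auxiliary model is learned adversarially, the underlying worst-case computation over a Kullback-Leibler ball has a well-known closed form for a discrete next-state distribution, which we sketch here (the temperature beta and the toy numbers are our own assumptions, not the paper's method):

```python
import numpy as np

def kl_worst_case_values(p, values, beta):
    # KL-regularized worst case: min_q q.V + beta * KL(q || p) is attained
    # by exponentially tilting the nominal p toward low values,
    # q_i proportional to p_i * exp(-V_i / beta). Smaller beta = more pessimism.
    w = p * np.exp(-values / beta)
    q = w / w.sum()
    return q, float(q @ values)

p = np.array([0.25, 0.5, 0.25])   # nominal next-state distribution
V = np.array([1.0, 0.0, -1.0])    # value of each next state
q, worst = kl_worst_case_values(p, V, beta=1.0)
print(worst < float(p @ V))  # True: pessimistic estimate below the nominal one
```

Planning against such a tilted (pessimistic) model, instead of the nominal one, is what makes the resulting policy robust to nearly identical MDPs.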
Updated: 2024-07-01 13:35:44
Fields: cs.LG,cs.AI
Human-Robot Mutual Learning through Affective-Linguistic Interaction and Differential Outcomes Training [Pre-Print]
Owing to the recent success of Large Language Models, modern AI has focused largely on linguistic interactions with humans and less on non-linguistic forms of communication between man and machine. In the present paper, we test how affective-linguistic communication, in combination with differential outcomes training, affects mutual learning in a human-robot context. Taking inspiration from child-caregiver dynamics, our human-robot interaction setup consists of a (simulated) robot attempting to learn how best to communicate internal, homeostatically-controlled needs, while a human "caregiver" attempts to learn the correct object to satisfy the robot's currently communicated need. We studied the effects of i) human training type and ii) robot reinforcement learning type on mutual learning terminal accuracy and rate of learning (as measured by the average reward achieved by the robot). Our results show that mutual learning between a human and a robot is significantly improved with Differential Outcomes Training (DOT) compared to non-DOT (control) conditions. We find further improvements when the robot uses an exploration-exploitation policy selection, compared to purely exploitative policy selection. These findings have implications for utilizing socially assistive robots (SAR) in therapeutic contexts, e.g. for cognitive interventions, and in educational applications.
Updated: 2024-07-01 13:35:08
Fields: cs.RO,cs.AI
CILF-CIAE: CLIP-driven Image-Language Fusion for Correcting Inverse Age Estimation
The age estimation task aims to predict the age of an individual by analyzing facial features in an image. The development of age estimation can improve the efficiency and accuracy of various applications (e.g., age verification and secure access control). In recent years, contrastive language-image pre-training (CLIP) has been widely used in various multimodal tasks and has made some progress in the field of age estimation. However, existing CLIP-based age estimation methods require high memory usage (quadratic complexity) when globally modeling images, and lack an error feedback mechanism to inform the model about the quality of age prediction results. To tackle the above issues, we propose a novel CLIP-driven Image-Language Fusion method for Correcting Inverse Age Estimation (CILF-CIAE). Specifically, we first introduce the CLIP model to extract image features and text semantic information respectively, and map them into a highly semantically aligned high-dimensional feature space. Next, we design a new Transformer architecture (i.e., FourierFormer) to achieve channel evolution and spatial interaction of images, and to fuse image and text semantic information. Compared with the quadratic complexity of the attention mechanism, the proposed FourierFormer has linear-logarithmic complexity. To further narrow the semantic gap between image and text features, we utilize an efficient contrastive multimodal learning module that supervises the multimodal fusion process of FourierFormer through a contrastive loss for image-text matching, thereby improving the interaction effect between different modalities. Finally, we introduce reversible age estimation, which uses end-to-end error feedback to reduce the error rate of age predictions. Through extensive experiments on multiple datasets, CILF-CIAE achieves better age prediction results.
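The complexity claim can be illustrated with an FNet-style Fourier token-mixing layer (a stand-in of our own; the paper's FourierFormer details may differ): replacing pairwise attention with FFTs costs O(n log n) in the number of tokens instead of O(n^2).

```python
import numpy as np

def fourier_mixing(x):
    # FNet-style token mixing: a 2D FFT over the sequence and feature axes,
    # keeping the real part. Cost is O(n log n) in the sequence length,
    # versus O(n^2) for pairwise self-attention.
    return np.real(np.fft.fft2(x))

tokens = np.random.default_rng(3).normal(size=(16, 8))  # (seq_len, dim)
mixed = fourier_mixing(tokens)
print(mixed.shape)  # (16, 8)
```

Because the mixing is a fixed linear map, it also avoids the quadratic attention-matrix memory that the abstract identifies as the bottleneck of prior CLIP-based methods.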
Updated: 2024-07-01 13:31:40
Fields: cs.CV,cs.AI
Connectivity Oracles for Predictable Vertex Failures
The problem of designing connectivity oracles supporting vertex failures is one of the basic data structures problems for undirected graphs. It is already well understood: previous works [Duan--Pettie STOC'10; Long--Saranurak FOCS'22] achieve query time linear in the number of failed vertices, and it is conditionally optimal as long as we require preprocessing time polynomial in the size of the graph and update time polynomial in the number of failed vertices. We revisit this problem in the paradigm of algorithms with predictions: we ask if the query time can be improved if the set of failed vertices can be predicted beforehand up to a small number of errors. More specifically, we design a data structure that, given a graph $G=(V,E)$ and a set of vertices predicted to fail $\widehat{D} \subseteq V$ of size $d=|\widehat{D}|$, preprocesses it in time $\tilde{O}(d|E|)$ and then can receive an update given as the symmetric difference between the predicted and the actual set of failed vertices $\widehat{D} \triangle D = (\widehat{D} \setminus D) \cup (D \setminus \widehat{D})$ of size $\eta = |\widehat{D} \triangle D|$, process it in time $\tilde{O}(\eta^4)$, and after that answer connectivity queries in $G \setminus D$ in time $O(\eta)$. Viewed from another perspective, our data structure provides an improvement over the state of the art for the \emph{fully dynamic subgraph connectivity problem} in the \emph{sensitivity setting} [Henzinger--Neumann ESA'16]. We argue that the preprocessing time and query time of our data structure are conditionally optimal under standard fine-grained complexity assumptions.
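To make the interface concrete, here is a deliberately naive oracle (our own baseline sketch, not the paper's data structure): it simply rebuilds a union-find over $G \setminus D$ on every update, whereas the point of the paper is to preprocess against the predicted set $\widehat{D}$ so that an update costs only $\tilde{O}(\eta^4)$ in the size $\eta$ of the symmetric difference.

```python
class NaiveConnectivityOracle:
    # Baseline sketch: rebuilds connectivity of G \ D from scratch on every
    # update. The paper's structure instead preprocesses for a *predicted*
    # failure set and handles the symmetric difference far more cheaply.
    def __init__(self, n, edges, predicted_failures):
        self.n, self.edges = n, list(edges)
        self.update(predicted_failures)  # treat the prediction as initial D

    def update(self, failed):
        self.failed = set(failed)
        parent = list(range(self.n))

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]  # path halving
                v = parent[v]
            return v

        for u, v in self.edges:
            if u not in self.failed and v not in self.failed:
                parent[find(u)] = find(v)
        self.root = [find(v) for v in range(self.n)]

    def connected(self, u, v):
        return self.root[u] == self.root[v]

# Path 0-1-2-3-4, with vertex 2 predicted to fail.
oracle = NaiveConnectivityOracle(5, [(0, 1), (1, 2), (2, 3), (3, 4)], {2})
print(oracle.connected(0, 1), oracle.connected(1, 3))  # True False
oracle.update({3})  # the actual failure set differs from the prediction
print(oracle.connected(0, 2), oracle.connected(2, 4))  # True False
```

Any real implementation of the paper's result would replace the body of `update` with the symmetric-difference machinery; the query interface stays the same.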
Updated: 2024-07-01 13:24:51
Fields: cs.DS,cs.LG
Using Voice and Biofeedback to Predict User Engagement during Product Feedback Interviews
Capturing users' engagement is crucial for gathering feedback about the features of a software product. In a market-driven context, current approaches to collect and analyze users' feedback are based on techniques leveraging information extracted from product reviews and social media. These approaches are hardly applicable in bespoke software development, or in contexts in which one needs to gather information from specific users. In such cases, companies need to resort to face-to-face interviews to get feedback on their products. In this paper, we propose to utilize biometric data, in terms of physiological and voice features, to complement interviews with information about the engagement of the user on the discussed product-relevant topics. We evaluate our approach by interviewing users while gathering their physiological data (i.e., biofeedback) using an Empatica E4 wristband, and capturing their voice through the default audio-recorder of a common laptop. Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data (F1=0.72), and that voice features alone are sufficiently effective (F1=0.71). Our work contributes one of the first studies in requirements engineering in which biometrics are used to identify emotions. This is also the first study in software engineering that considers voice analysis. The usage of voice features could be particularly helpful for emotion-aware requirements elicitation in remote communication, either performed by human analysts or voice-based chatbots, and can also be exploited to support the analysis of meetings in software engineering research.
Updated: 2024-07-01 13:23:11
Domain: cs.SE,cs.LG,cs.SD,eess.AS,68N30,D.2.1; D.2.2
The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases
This study investigates the subtle and often concealed biases present in Large Language Models (LLMs), which, despite passing explicit bias tests, can still exhibit implicit biases akin to those observed in humans who profess egalitarian beliefs yet demonstrate underlying prejudices. The challenge of measuring such biases is exacerbated as LLMs become increasingly proprietary, restricting access to their internal mechanisms such as embeddings, which are crucial for applying traditional bias measures. To tackle these issues, this study introduces innovative measures of bias inspired by psychological methodologies: the LLM Implicit Association Test (IAT) Bias and the LLM Decision Bias. The LLM IAT Bias is a prompt-based method designed to unearth implicit biases by simulating the well-known psychological IAT but adapted for use with LLMs. The LLM Decision Bias measure is developed to detect subtle discrimination in decision-making tasks, focusing on how LLMs choose between individuals in various scenarios. Open-ended generation is also utilised through thematic analysis of word generations and storytelling. The experiments revealed biases across gender and racial domains, from discriminatory categorisations to exoticisation. Our findings indicate that the prompt-based measure of implicit bias not only correlates with traditional embedding-based methods but also more effectively predicts downstream behaviors, which are crucially measured by the LLM Decision Bias. This relationship underscores the importance of relative, rather than absolute, evaluations in assessing implicit biases, reflecting psychological insights into human bias assessment. This research contributes to the broader understanding of AI ethics and provides suggestions for continually assessing and mitigating biases in advanced AI systems, emphasising the need for more qualitative and downstream focus.
Updated: 2024-07-01 13:21:33
Domain: cs.CL,cs.AI
Efficient Estimation for Longitudinal Networks via Adaptive Merging
A longitudinal network consists of a sequence of temporal edges among multiple nodes, where the temporal edges are observed in real time. Such networks have become ubiquitous with the rise of online social platforms and e-commerce, but remain largely under-investigated in the literature. In this paper, we propose an efficient estimation framework for longitudinal networks, leveraging the strengths of adaptive network merging, tensor decomposition, and point processes. It merges neighboring sparse networks so as to enlarge the number of observed edges and reduce estimation variance, whereas the estimation bias introduced by network merging is controlled by exploiting local temporal structure to choose the merging neighborhood adaptively. A projected gradient descent algorithm is proposed to facilitate estimation, and an upper bound on the estimation error in each iteration is established. A thorough analysis quantifies the asymptotic behavior of the proposed method, showing that it can significantly reduce the estimation error while also providing guidelines for network merging under various scenarios. We further demonstrate the advantages of the proposed method through extensive numerical experiments on synthetic datasets and a militarized interstate dispute dataset.
Updated: 2024-07-01 13:17:32
Domain: stat.ML,cs.LG
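The merging step can be illustrated with a deliberately simplified sketch: neighboring sparse snapshots are pooled until enough edges have been observed, trading variance for bias. The fixed edge-count threshold below is a hypothetical stand-in for the paper's adaptive, locally chosen merging neighborhoods.

```python
def adaptive_merge(snapshots, min_edges):
    """Greedily merge neighboring sparse network snapshots (lists of edges)
    until each merged block holds at least `min_edges` observed edges.
    The count threshold is an illustrative simplification: the paper
    chooses neighborhoods adaptively from local temporal structure."""
    blocks, current, count = [], [], 0
    for snap in snapshots:
        current.append(snap)
        count += len(snap)
        if count >= min_edges:
            blocks.append([e for s in current for e in s])
            current, count = [], 0
    if current:  # flush a sparse tail into the last block
        if blocks:
            blocks[-1].extend(e for s in current for e in s)
        else:
            blocks.append([e for s in current for e in s])
    return blocks
```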
Complementary Fusion of Deep Network and Tree Model for ETA Prediction
Estimated time of arrival (ETA) is a very important factor in transportation systems. It has attracted increasing attention and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem: an ensemble of tree models and neural networks. We demonstrated the accuracy and robustness of the solution on the A/B leaderboard and finally won first place in the SIGSPATIAL 2021 GISCUP competition.
Updated: 2024-07-01 13:17:09
Domain: cs.LG
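The abstract does not spell out the ensembling scheme, but the simplest form of such a complementary fusion is a convex blend of the two models' predictions, with the weight chosen on a validation split; the sketch below is that baseline, not necessarily the winning solution's exact method.

```python
import numpy as np

def fuse_predictions(tree_pred, nn_pred, weight):
    """Convex blend of tree-model and neural-network ETA predictions.
    `weight` would be tuned on a validation split; this is the simplest
    complementary-fusion baseline, not the competition solution itself."""
    return weight * np.asarray(tree_pred) + (1 - weight) * np.asarray(nn_pred)
```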
Model Generation with LLMs: From Requirements to UML Sequence Diagrams
Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., UML sequence diagrams, from NL requirements. We conduct a qualitative study in which we examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains. Observations from the analysis of the generated diagrams have systematically been captured through evaluation logs, and categorized through thematic analysis. Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges. This issue is particularly pronounced in the presence of requirements smells, such as ambiguity and inconsistency. The insights derived from this study can influence the practical utilization of LLMs in the RE process, and open the door to novel RE-specific prompting strategies targeting effective model generation.
Updated: 2024-07-01 13:16:49
Domain: cs.SE,cs.CL,cs.LG,D.2; K.6.3; D.2.1; D.3.1; D.2.2; D.2.10; D.2.2; I.2; I.2.7
DeepiSign-G: Generic Watermark to Stamp Hidden DNN Parameters for Self-contained Tracking
Deep learning solutions in critical domains like autonomous vehicles, facial recognition, and sentiment analysis require caution due to the severe consequences of errors. Research shows these models are vulnerable to adversarial attacks, such as data poisoning and neural trojaning, which can covertly manipulate model behavior, compromising reliability and safety. Current defense strategies like watermarking have limitations: they fail to detect all model modifications and primarily focus on attacks on CNNs in the image domain, neglecting other critical architectures like RNNs. To address these gaps, we introduce DeepiSign-G, a versatile watermarking approach designed for comprehensive verification of leading DNN architectures, including CNNs and RNNs. DeepiSign-G enhances model security by embedding an invisible watermark within the Walsh-Hadamard transform coefficients of the model's parameters. This watermark is highly sensitive and fragile, ensuring prompt detection of any modifications. Unlike traditional hashing techniques, DeepiSign-G allows substantial metadata incorporation directly within the model, enabling detailed, self-contained tracking and verification. We demonstrate DeepiSign-G's applicability across various architectures, including CNN models (VGG, ResNets, DenseNet) and RNNs (Text sentiment classifier). We experiment with four popular datasets: VGG Face, CIFAR10, GTSRB Traffic Sign, and Large Movie Review. We also evaluate DeepiSign-G under five potential attacks. Our comprehensive evaluation confirms that DeepiSign-G effectively detects these attacks without compromising CNN and RNN model performance, highlighting its efficacy as a robust security measure for deep learning applications. Detection of integrity breaches is nearly perfect, while hiding only a bit in approximately 1% of the Walsh-Hadamard coefficients.
Updated: 2024-07-01 13:15:38
Domain: cs.CR
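The core embedding idea can be sketched in toy form: take the Walsh-Hadamard transform of a (power-of-two length) parameter vector, hide a bit in the parity of one quantized coefficient, and invert the transform. The coefficient index and quantization step below are illustrative choices, not DeepiSign-G's actual parameters; because every parameter influences every coefficient, small parameter modifications typically flip the fragile bit.

```python
import numpy as np

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform (length a power of two)."""
    a = a.astype(float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def embed_bit(params, idx, bit, step=1e-3):
    """Hide one fragile watermark bit in the parity of one quantized
    WHT coefficient (a toy version of the DeepiSign-G idea)."""
    coeffs = fwht(params)
    q = int(round(coeffs[idx] / step))
    if q % 2 != bit:
        q += 1
    coeffs[idx] = q * step
    return fwht(coeffs) / len(params)  # inverse WHT = forward WHT / n

def extract_bit(params, idx, step=1e-3):
    coeffs = fwht(params)
    return int(round(coeffs[idx] / step)) % 2
```

Tampering with any single parameter by an amount that shifts the chosen coefficient across a quantization level changes the extracted bit, which is what makes the watermark fragile.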
QUEEN: Query Unlearning against Model Extraction
Model extraction attacks currently pose a non-negligible threat to the security and privacy of deep learning models. By querying the model with a small dataset and using the query results as the ground-truth labels, an adversary can steal a piracy model with performance comparable to the original model. Two key issues cause the threat: on the one hand, accurate and unlimited queries can be obtained by the adversary; on the other hand, the adversary can aggregate the query results to train the model step by step. The existing defenses usually employ model watermarking or fingerprinting to protect ownership. However, these methods cannot proactively prevent the violation from happening. To mitigate the threat, we propose QUEEN (QUEry unlEarNing), which proactively launches counterattacks on potential model extraction attacks from the very beginning. To limit the potential threat, QUEEN performs sensitivity measurement and output perturbation to prevent the adversary from training a piracy model with high performance. In sensitivity measurement, QUEEN measures the sensitivity of a single query by its distance from the center of its cluster in the feature space. To reduce the learning accuracy of attacks, for a highly sensitive query batch, QUEEN applies query unlearning, implemented by gradient reverse, to perturb the softmax output such that the piracy model generates reverse gradients that unconsciously worsen its performance. Experiments show that QUEEN outperforms the state-of-the-art defenses against various model extraction attacks with a relatively low cost to model accuracy. The artifact is publicly available at https://anonymous.4open.science/r/queen implementation-5408/.
Updated: 2024-07-01 13:01:41
Domain: cs.CR,cs.AI
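A minimal sketch of the two defensive ingredients, under the simplifying assumptions that cluster centers are given and that a crude mass-shifting perturbation stands in for QUEEN's gradient-reverse mechanism:

```python
import numpy as np

def query_sensitivity(feature, centroids):
    """Sensitivity of a single query: distance to the nearest cluster
    center in feature space (the sign convention is an illustrative
    choice; the paper measures distance from the query's own cluster)."""
    d = np.linalg.norm(centroids - feature, axis=1)
    return d.min()

def perturb_softmax(probs, eps=0.2):
    """Toy stand-in for QUEEN's output perturbation: move probability
    mass from the top class toward the runner-up so a piracy model
    trained on these outputs receives a degraded signal, while the
    argmax (hence benign top-1 accuracy) is preserved."""
    p = probs.copy()
    top = int(np.argmax(p))
    runner = int(np.argsort(p)[-2])
    shift = min(eps, (p[top] - p[runner]) / 2 * 0.9)  # keep argmax intact
    p[top] -= shift
    p[runner] += shift
    return p
```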
Metric-Entropy Limits on Nonlinear Dynamical System Learning
This paper is concerned with the fundamental limits of nonlinear dynamical system learning from input-output traces. Specifically, we show that recurrent neural networks (RNNs) are capable of learning nonlinear systems that satisfy a Lipschitz property and forget past inputs fast enough in a metric-entropy optimal manner. As the sets of sequence-to-sequence maps realized by the dynamical systems we consider are significantly more massive than function classes generally considered in deep neural network approximation theory, a refined metric-entropy characterization is needed, namely in terms of order, type, and generalized dimension. We compute these quantities for the classes of exponentially-decaying and polynomially-decaying Lipschitz fading-memory systems and show that RNNs can achieve them.
Updated: 2024-07-01 12:57:03
Domain: cs.LG,cs.IT,math.DS,math.IT
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information to help the model better localize the start and end of sounds. The fused features are trained in a multi-scale Transformer for training. In the final test dataset, we achieved a mean average precision (mAP) of 0.33, obtaining the second-best performance in this track.
Updated: 2024-07-01 12:52:05
Domain: cs.SD,cs.CV,cs.LG,eess.AS
Recovering the Pre-Fine-Tuning Weights of Generative Models
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. In contrast to previous attacks that attempt to recover pre-fine-tuning capabilities, our method aims to recover the exact pre-fine-tuning weights. Our approach exploits this new vulnerability against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
Updated: 2024-07-01 12:48:51
Domain: cs.LG,cs.CL,cs.CR,cs.CV
Probabilistic Test-Time Generalization by Variational Neighbor-Labeling
This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed on unseen target domains. We follow the strict separation of source training and target testing, but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem, by modeling pseudo labels as distributions, to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on seven widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
Updated: 2024-07-01 12:46:35
Domain: cs.LG,cs.AI
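A deterministic toy version of neighbor labeling: each target sample's pseudo-label is the average predicted distribution of its k nearest neighbors in feature space. The paper goes further by treating pseudo-labels as distributions inside a variational-inference objective; this sketch conveys only the neighborhood-aggregation idea.

```python
import numpy as np

def neighbor_pseudo_labels(feats, probs, k=3):
    """For each sample, average the predicted class distributions of its
    k nearest neighbors (excluding itself) to form a more robust
    pseudo-label. Simplified, deterministic stand-in for the paper's
    variational neighbor labels."""
    n = len(feats)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude the sample itself
    out = np.empty_like(probs)
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]
        out[i] = probs[nbrs].mean(axis=0)
    return out
```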
A Deep Reinforcement Learning Approach to Battery Management in Dairy Farming via Proximal Policy Optimization
Dairy farms consume a significant amount of electricity in their operations, and this research focuses on enhancing energy efficiency and minimizing environmental impact in the sector by maximizing the utilization of renewable energy sources. We investigate the application of Proximal Policy Optimization (PPO), a deep reinforcement learning (DRL) algorithm, to enhance battery management in dairy farming. We evaluate the algorithm's effectiveness based on its ability to reduce reliance on the electricity grid, highlighting the potential of DRL to enhance energy management in dairy farming. Using real-world data, our results demonstrate that the PPO approach outperforms Q-learning by 1.62% in reducing electricity imported from the grid. This significant improvement highlights the potential of deep reinforcement learning for improving energy efficiency and sustainability on dairy farms.
Updated: 2024-07-01 12:46:09
Domain: cs.LG,cs.AI
SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model
Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently arrive in the database. In addition, existing KT models only implicitly consider the correlation between concepts and questions, lacking direct modeling of the more complex relationships in the heterogeneous graph of concepts and questions. In this paper, we propose a Structure-aware Inductive Knowledge Tracing model with large language model (dubbed SINKT), which, for the first time, introduces large language models (LLMs) and realizes inductive knowledge tracing. Firstly, SINKT utilizes LLMs to introduce structural relationships between concepts and constructs a heterogeneous graph for concepts and questions. Secondly, by encoding concepts and questions with LLMs, SINKT incorporates semantic information to aid prediction. Finally, SINKT predicts the student's response to the target question by interacting with the student's knowledge state and the question representation. Experiments on four real-world datasets demonstrate that SINKT achieves state-of-the-art performance among 12 existing transductive KT models. Additionally, we explore the performance of SINKT on the inductive KT task and provide insights into various modules.
Updated: 2024-07-01 12:44:52
Domain: cs.AI,cs.CY
IID Relaxation by Logical Expressivity: A Research Agenda for Fitting Logics to Neurosymbolic Requirements
Neurosymbolic background knowledge and the expressivity required of its logic can break Machine Learning assumptions about data Independence and Identical Distribution. In this position paper we propose to analyze IID relaxation in a hierarchy of logics that fit different use case requirements. We discuss the benefits of exploiting known data dependencies and distribution constraints for Neurosymbolic use cases and argue that the expressivity required for this knowledge has implications for the design of underlying ML routines. This opens a new research agenda with general questions about Neurosymbolic background knowledge and the expressivity required of its logic.
Updated: 2024-07-01 12:44:38
Domain: cs.AI
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE --Model Internals-based RAG Explanations -- a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.
Updated: 2024-07-01 12:39:26
Domain: cs.CL,cs.AI,cs.LG
Bayesian Regression Markets
Although machine learning tasks are highly sensitive to the quality of input data, relevant datasets can often be challenging for firms to acquire, especially when held privately by a variety of owners. For instance, if these owners are competitors in a downstream market, they may be reluctant to share information. Focusing on supervised learning for regression tasks, we develop a regression market to provide a monetary incentive for data sharing. Our mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in literature expose the market agents to sizeable financial risks, which can be mitigated in our setup.
Updated: 2024-07-01 12:36:03
Domain: cs.LG
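The Bayesian backbone of such a market can be sketched with conjugate Bayesian linear regression; a seller's data (e.g., an extra feature column) could then be scored by the gain in posterior-predictive log density it brings. The scoring rule and hyperparameters here are illustrative, not the paper's exact mechanism.

```python
import numpy as np

def posterior(X, y, alpha=1.0, noise=0.1):
    """Conjugate Bayesian linear regression: prior w ~ N(0, I/alpha) and
    known noise variance `noise`. Returns posterior mean and covariance."""
    d = X.shape[1]
    S_inv = alpha * np.eye(d) + X.T @ X / noise
    S = np.linalg.inv(S_inv)
    m = S @ X.T @ y / noise
    return m, S

def log_pred_density(X, y, m, S, noise=0.1):
    """Gaussian posterior-predictive log density, summed over points;
    the gain in this quantity is one natural way to value shared data."""
    mu = X @ m
    var = noise + np.einsum('ij,jk,ik->i', X, S, X)
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)))
```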
SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism
The single-stage point-based 3D object detectors have attracted widespread research interest due to their advantages of lightweight design and fast inference speed. However, they still face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC). In this paper, we propose SGCCNet to alleviate these two issues. For ILQ, SGCCNet adopts a Saliency-Guided Data Augmentation (SGDA) strategy to enhance the robustness of the model on low-quality objects by reducing its reliance on salient features. Specifically, we construct a classification task and then approximate the saliency scores of points by moving points towards the point cloud centroid in a differentiable process. During training, SGCCNet is forced to learn from low-saliency features by dropping points. Meanwhile, to avoid the internal covariate shift and contextual feature forgetting caused by dropping points, we add a geometric normalization module and a skip connection block in each stage. For MLC, we design a Confidence Correction Mechanism (CCM) specifically for point-based multi-class detectors. This mechanism corrects the confidence of the current proposal by utilizing the predictions of other key points within the local region in the post-processing stage. Extensive experiments on the KITTI dataset demonstrate the generality and effectiveness of our SGCCNet. On the KITTI test set, SGCCNet achieves $80.82\%$ for the metric of $AP_{3D}$ on the Moderate level, outperforming all other point-based detectors and surpassing IA-SSD and Fast Point R-CNN by $2.35\%$ and $3.42\%$, respectively. Additionally, SGCCNet demonstrates excellent portability to other point-based detectors.
Updated: 2024-07-01 12:36:01
Domain: cs.CV,cs.AI
Large Language Models are Zero-Shot Recognizers for Activities of Daily Living
The sensor-based recognition of Activities of Daily Living (ADLs) in smart home environments enables several applications in the areas of energy management, safety, well-being, and healthcare. ADLs recognition is typically based on deep learning methods that require large datasets to be trained. Recently, several studies proved that Large Language Models (LLMs) effectively capture common-sense knowledge about human activities. However, the effectiveness of LLMs for ADLs recognition in smart home environments still deserves to be investigated. In this work, we propose ADL-LLM, a novel LLM-based ADLs recognition system. ADL-LLM transforms raw sensor data into textual representations that are processed by an LLM to perform zero-shot ADLs recognition. Moreover, in the scenario where a small labeled dataset is available, ADL-LLM can also be empowered with few-shot prompting. We evaluated ADL-LLM on two public datasets, showing its effectiveness in this domain.
Updated: 2024-07-01 12:32:38
Domain: cs.AI,cs.CL,eess.SP
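The textualization step can be sketched directly; the prompt template below is an illustrative guess, not ADL-LLM's actual template:

```python
def sensor_events_to_text(events):
    """Turn raw (time, sensor, state) triples into the kind of
    natural-language prompt an LLM could answer zero-shot.
    The exact wording is hypothetical."""
    lines = [f"At {t}, the {sensor} sensor reported '{state}'."
             for t, sensor, state in events]
    return ("The following smart-home sensor events were observed:\n"
            + "\n".join(lines)
            + "\nWhich activity of daily living is the resident most likely performing?")
```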
In-Context Reinforcement Learning for Variable Action Spaces
Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.
Updated: 2024-07-01 12:29:58
Domain: cs.LG,cs.AI
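One way to make a policy head independent of the action-space size is to have the model emit a vector and decode it against embeddings of whatever actions are currently available. Whether this matches Headless-AD's exact mechanism is an assumption, but it conveys how a single trained model can face action spaces of variable size and order:

```python
import numpy as np

def decode_action(model_output, action_embeddings):
    """Headless decoding sketch: no fixed-size action head. The model's
    output vector is matched (by dot-product similarity) to the nearest
    embedding among the *currently available* actions, so the action set
    can change in size and order between tasks."""
    sims = action_embeddings @ model_output
    return int(np.argmax(sims))
```

The same output vector can be decoded against any subset or reordering of the action embeddings, which is the property that avoids re-training when the action space changes.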
Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions
Multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols: The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve the standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in standard semi-honest model remains open. Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in online phase for 3 parties with sets of $2^{20}$ items each. We propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.
Updated: 2024-07-01 12:26:05
Subjects: cs.CR
A Fingerprint for Large Language Models
Recent advances show that scaling a pre-trained language model could achieve state-of-the-art performance on many downstream tasks, prompting large language models (LLMs) to become a hot research topic in the field of artificial intelligence. However, due to the resource-intensive nature of training LLMs from scratch, it is urgent and crucial to protect the intellectual property of LLMs against infringement. This has motivated the authors in this paper to propose a novel black-box fingerprinting technique for LLMs, which requires neither model training nor model fine-tuning. We first demonstrate that the outputs of LLMs span a unique vector space associated with each model. We model the problem of ownership authentication as the task of evaluating the similarity between the victim model's space and the output's space of the suspect model. To deal with this problem, we propose two solutions, where the first solution involves verifying whether the outputs of the suspected large model are in the same space as those of the victim model, enabling rapid identification of model infringement, and the second one reconstructs the union of the vector spaces for LLM outputs and the victim model to address situations where the victim model has undergone the Parameter-Efficient Fine-Tuning (PEFT) attacks. Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks. This work reveals inherent characteristics of LLMs and provides a promising solution for ownership verification of LLMs in black-box scenarios, ensuring efficiency, generality and practicality.
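The first solution (checking whether a suspect model's outputs lie in the victim's output vector space) comes down to a subspace-residual test. The sketch below is a toy stand-in, not the paper's procedure: random matrices `W_victim` and `W_other` play the role of LLM unembedding layers, and all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
hid, vocab = 16, 100   # hypothetical hidden size and vocabulary size

# Toy stand-ins for LLMs: logits = W @ hidden_state, so each model's
# outputs live in the column space of its unembedding matrix W.
W_victim = rng.normal(size=(vocab, hid))
W_other = rng.normal(size=(vocab, hid))

def outputs(W, n):
    return W @ rng.normal(size=(hid, n))   # n output vectors, shape (vocab, n)

# Estimate an orthonormal basis of the victim's output space from samples.
U, S, _ = np.linalg.svd(outputs(W_victim, 200), full_matrices=False)
basis = U[:, S > 1e-8 * S[0]]

def residual_ratio(Y, basis):
    """Fraction of output energy outside the victim's subspace."""
    R = Y - basis @ (basis.T @ Y)
    return float(np.linalg.norm(R) / np.linalg.norm(Y))

same = residual_ratio(outputs(W_victim, 50), basis)   # ~0: same space
diff = residual_ratio(outputs(W_other, 50), basis)    # large: different space
print(f"victim-vs-victim residual {same:.2e}, victim-vs-other {diff:.2f}")
```

A near-zero residual flags the suspect as operating in the victim's space; an independent model leaves most of its output energy outside it.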
Updated: 2024-07-01 12:25:42
Subjects: cs.CR
MIRAI: Evaluating LLM Agents for Event Forecasting
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, there is growing interest in employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is a lack of a rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge in diverse formats and across time to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.
Updated: 2024-07-01 12:22:46
Subjects: cs.CL,cs.AI
Bayesian grey-box identification of nonlinear convection effects in heat transfer dynamics
We propose a computational procedure for identifying convection in heat transfer dynamics. The procedure is based on a Gaussian process latent force model, consisting of a white-box component (i.e., known physics) for the conduction and linear convection effects and a Gaussian process that acts as a black-box component for the nonlinear convection effects. States are inferred through Bayesian smoothing and we obtain approximate posterior distributions for the kernel covariance function's hyperparameters using Laplace's method. The nonlinear convection function is recovered from the Gaussian process states using a Bayesian regression model. We validate the procedure by simulation error using the identified nonlinear convection function, on both data from a simulated system and measurements from a physical assembly.
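The grey-box idea can be illustrated in a few lines: simulate a lumped thermal model whose linear part is known, then recover the nonlinear convection term from the white-box residual with a plain GP regression. This is only a sketch under stated assumptions (forward-Euler simulation, finite-difference derivatives, hand-picked kernel hyperparameters), not the paper's Bayesian smoothing and Laplace-approximation pipeline.

```python
import numpy as np

# Known linear part, plus a hidden nonlinear convection term g(T).
a, T_env = 0.5, 20.0
g_true = lambda T: 0.02 * (T - T_env) ** 1.5   # assumed ground truth

# Simulate dT/dt = -a (T - T_env) - g(T) with forward Euler.
dt, n = 0.01, 2000
T = np.empty(n)
T[0] = 90.0
for k in range(n - 1):
    T[k + 1] = T[k] + dt * (-a * (T[k] - T_env) - g_true(T[k]))

# The white-box residual approximates the unknown convection term g(T).
dTdt = np.gradient(T, dt)
r = -(dTdt + a * (T - T_env))

# Plain GP regression of the residual on temperature (RBF kernel).
def rbf(x, y, ell=10.0):
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

idx = np.arange(0, n, 20)                 # evenly subsampled training points
K = rbf(T[idx], T[idx]) + 1e-2 * np.eye(len(idx))
alpha = np.linalg.solve(K, r[idx])

T_test = np.linspace(25.0, 85.0, 7)
g_hat = rbf(T_test, T[idx]) @ alpha
err = float(np.max(np.abs(g_hat - g_true(T_test))))
print("max |g_hat - g_true| on test grid:", round(err, 3))
```

The recovered function tracks the hidden convection law closely wherever the trajectory visited, which is the essence of the grey-box decomposition.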
Updated: 2024-07-01 12:17:01
Subjects: eess.SY,cs.CE,cs.LG,cs.SY
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient PD algorithm with uniform probably approximate correctness (Uniform-PAC) guarantees, simultaneously ensuring convergence to optimal policies, sublinear regret, and polynomial sample complexity for any target accuracy. Notably, this represents the first Uniform-PAC algorithm for the online CMDP problem. In addition to the theoretical guarantees, we empirically demonstrate in a simple CMDP that our algorithm converges to optimal policies, while baseline algorithms exhibit oscillatory performance and constraint violation.
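The primal-dual mechanics can be seen on a toy one-state CMDP (a constrained bandit). The snippet below is a simplification, not the paper's Uniform-PAC policy-gradient algorithm: it uses a projected primal-dual iteration with exact gradients, and the averaged iterate illustrates why plain PD methods hover around (rather than converge to) the constrained optimum.

```python
import numpy as np

# Toy one-state CMDP: actions 0/1 with reward r, cost c, constraint
# E[c] <= b.  The optimal policy is mixed, with P(a=0) = b = 0.4.
r, c, b = np.array([1.0, 0.0]), np.array([1.0, 0.0]), 0.4

p0, lam = 0.5, 0.0          # P(action 0) and Lagrange multiplier
eta, T = 0.05, 5000
avg = 0.0
for t in range(T):
    # primal ascent on the Lagrangian r.pi - lam*(c.pi - b), projected to [0,1]
    p0 = float(np.clip(p0 + eta * ((r[0] - r[1]) - lam * (c[0] - c[1])), 0.0, 1.0))
    # dual ascent on the constraint violation, projected to lam >= 0
    lam = max(0.0, lam + eta * (c[0] * p0 + c[1] * (1 - p0) - b))
    avg += p0
avg /= T
print(f"average P(a0) = {avg:.3f}  (target {b}),  multiplier = {lam:.2f}")
```

The last iterate oscillates, but the time-averaged policy sits near the constraint boundary, matching the oscillation-versus-convergence contrast the paper draws against baselines.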
Updated: 2024-07-01 12:08:25
Subjects: cs.LG
Robust Low-Cost Drone Detection and Classification in Low SNR Environments
The proliferation of drones, or unmanned aerial vehicles (UAVs), has raised significant safety concerns due to their potential misuse in activities such as espionage, smuggling, and infrastructure disruption. This paper addresses the critical need for effective drone detection and classification systems that operate independently of UAV cooperation. We evaluate various convolutional neural networks (CNNs) for their ability to detect and classify drones using spectrogram data derived from consecutive Fourier transforms of signal components. The focus is on model robustness in low signal-to-noise ratio (SNR) environments, which is critical for real-world applications. A comprehensive dataset is provided to support future model development. In addition, we demonstrate a low-cost drone detection system using a standard computer, software-defined radio (SDR) and antenna, validated through real-world field testing. On our development dataset, all models consistently achieved an average balanced classification accuracy of >= 85% at SNR > -12dB. In the field test, these models achieved an average balanced accuracy of > 80%, depending on transmitter distance and antenna direction. Our contributions include: a publicly available dataset for model development, a comparative analysis of CNNs for drone detection under low SNR conditions, and the deployment and field evaluation of a practical, low-cost detection system.
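The input representation (a spectrogram built from consecutive Fourier transforms, at a target SNR) is easy to reproduce. The snippet below is a toy sketch: a frequency-hopping tone stands in for a drone RF signature, and the signal parameters are arbitrary assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000
t = np.arange(fs) / fs                     # 1 second of samples

# Stand-in for a drone RF signature: a tone that hops 1 kHz -> 3 kHz.
sig = np.where(t < 0.5, np.sin(2*np.pi*1000*t), np.sin(2*np.pi*3000*t))

def add_noise(x, snr_db, rng):
    p_noise = np.mean(x**2) / 10 ** (snr_db / 10)
    return x + rng.normal(scale=np.sqrt(p_noise), size=x.shape)

def spectrogram(x, n_fft=250, hop=125):
    """Consecutive windowed FFTs -> (n_frames, n_fft//2 + 1) power matrix."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

S = spectrogram(add_noise(sig, snr_db=-12, rng=rng))
lo = int(S[:38].mean(axis=0).argmax())     # frames fully inside t < 0.5
hi = int(S[40:].mean(axis=0).argmax())     # frames fully inside t >= 0.5
print(f"dominant bins: {lo} ({lo*fs/250:.0f} Hz) and {hi} ({hi*fs/250:.0f} Hz)")
```

Even at -12 dB SNR, averaging the spectrogram over time makes the tone bins stand out, which is the structure a CNN classifier can exploit.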
Updated: 2024-07-01 12:07:16
Subjects: eess.SP,cs.LG
Let Hybrid A* Path Planner Obey Traffic Rules: A Deep Reinforcement Learning-Based Planning Framework
Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this work, we combine these two methods where the DRL makes high-level decisions such as lane change commands. After obtaining the lane change command, the hybrid A* planner is able to generate a collision-free trajectory to be executed by a model predictive controller (MPC). In addition, the DRL algorithm is able to keep the lane change command consistent within a chosen time-period. Traffic rules are implemented using linear temporal logic (LTL), which is then utilized as a reward function in DRL. Furthermore, we validate the proposed method on a real system to demonstrate its feasibility from simulation to implementation on real hardware.
Updated: 2024-07-01 12:00:10
Subjects: cs.RO,cs.AI
Revisiting Random Walks for Learning on Graphs
We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walk guarantees probabilistic invariance as long as the vertices are anonymized. This enables us to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that over-smoothing in message passing is alleviated by construction in random walk neural networks, while over-squashing manifests as probabilistic under-reaching. We show that random walk neural networks based on pre-trained language models can solve several hard problems on graphs, such as separating strongly regular graphs where the 3-WL test fails, counting substructures, and transductive classification on arXiv citation network without training. Code is available at https://github.com/jw9730/random-walk.
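The key trick (anonymized vertices make the plain-text walk record probabilistically invariant) fits in a few lines. The sketch below uses first-occurrence renaming; the graphs and seed are arbitrary choices for illustration.

```python
import random

def anonymized_walk_record(adj, start, length, rng):
    """Random walk recorded as text with anonymized (first-occurrence) ids."""
    names = {}
    def name(v):
        if v not in names:
            names[v] = len(names)   # 0, 1, 2, ... in order of discovery
        return names[v]
    v, rec = start, [str(name(start))]
    for _ in range(length):
        v = rng.choice(adj[v])
        rec.append(str(name(v)))
    return "-".join(rec)

# A 4-cycle and the same 4-cycle with relabelled vertices.
g1 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
g2 = {10: [20, 40], 20: [10, 30], 30: [20, 40], 40: [30, 10]}

rng = random.Random(7)
rec1 = anonymized_walk_record(g1, 0, 8, rng)
rng = random.Random(7)
rec2 = anonymized_walk_record(g2, 10, 8, rng)
print(rec1)
print(rec2)   # identical: the record is invariant to vertex relabelling
```

Such text records are exactly what the paper feeds to a language model to solve graph tasks.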
Updated: 2024-07-01 11:59:59
Subjects: cs.LG,cs.AI
Efficient Cutting Tool Wear Segmentation Based on Segment Anything Model
Tool wear conditions impact the surface quality of the workpiece and its final geometric precision. In this research, we propose an efficient tool wear segmentation approach based on Segment Anything Model, which integrates U-Net as an automated prompt generator to streamline the processes of tool wear detection. Our evaluation covered three Point-of-Interest generation methods and further investigated the effects of variations in training dataset sizes and U-Net training intensities on resultant wear segmentation outcomes. The results consistently highlight our approach's advantage over U-Net, emphasizing its ability to achieve accurate wear segmentation even with limited training datasets. This feature underscores its potential applicability in industrial scenarios where datasets may be limited.
Updated: 2024-07-01 11:57:53
Subjects: cs.CV,cs.AI,cs.LG,eess.IV
AdaCL: Adaptive Continual Learning
Class-Incremental Learning aims to update a deep classifier to learn new categories while maintaining or improving its accuracy on previously observed classes. Common methods to prevent forgetting previously learned classes include regularizing the neural network updates and storing exemplars in memory, which come with hyperparameters such as the learning rate, regularization strength, or the number of exemplars. However, these hyperparameters are usually only tuned at the start and then kept fixed throughout the learning sessions, ignoring the fact that newly encountered tasks may have varying levels of novelty or difficulty. This study investigates the necessity of hyperparameter 'adaptivity' in Class-Incremental Learning: the ability to dynamically adjust hyperparameters such as the learning rate, regularization strength, and memory size according to the properties of the new task at hand. We propose AdaCL, a Bayesian Optimization-based approach to automatically and efficiently determine the optimal values for those parameters with each learning task. We show that adapting hyperparameters on each new task leads to improvements in accuracy, forgetting and memory. Code is available at https://github.com/ElifCerenGokYildirim/AdaCL.
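The per-task Bayesian optimization loop can be sketched with a GP surrogate and Expected Improvement. Everything below is an assumption-laden toy: `task_score` is a hypothetical stand-in for "train on the new task with regularization strength 10**x and return validation accuracy", and the kernel length-scale, grid, and budget are arbitrary, not AdaCL's actual settings.

```python
import math
import numpy as np

# Hypothetical per-task objective; its peak (x = -1) is unknown to the optimizer.
def task_score(x):
    return math.exp(-(x + 1.0) ** 2)

def rbf(a, b, ell=0.7):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

norm_pdf = lambda z: np.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
norm_cdf = np.vectorize(lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2))))

X = [-3.0, -1.8, 1.0]                      # initial design, in log10(lambda)
Y = [task_score(x) for x in X]
grid = np.linspace(-3, 1, 201)

for _ in range(10):                        # BO loop: GP surrogate + EI
    Xa, Ya = np.array(X), np.array(Y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(X))
    ks = rbf(grid, Xa)
    mu = ks @ np.linalg.solve(K, Ya)
    var = np.clip(1.0 - np.einsum("ij,ji->i", ks, np.linalg.solve(K, ks.T)),
                  1e-12, None)
    z = (mu - max(Y)) / np.sqrt(var)
    ei = (mu - max(Y)) * norm_cdf(z) + np.sqrt(var) * norm_pdf(z)
    x_next = float(grid[int(np.argmax(ei))])
    X.append(x_next)
    Y.append(task_score(x_next))

best = X[int(np.argmax(Y))]
print(f"selected log10(lambda) ~= {best:.2f}")
```

With a handful of evaluations the loop homes in on the task-specific optimum, which is the adaptivity the paper argues fixed hyperparameters lack.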
Updated: 2024-07-01 11:57:06
Subjects: cs.LG,cs.CV
Energy-based Epistemic Uncertainty for Graph Neural Networks
In domains with interdependent data, such as graphs, quantifying the epistemic uncertainty of a Graph Neural Network (GNN) is challenging as uncertainty can arise at different structural scales. Existing techniques neglect this issue or only distinguish between structure-aware and structure-agnostic uncertainty without combining them into a single measure. We propose GEBM, an energy-based model (EBM) that provides high-quality uncertainty estimates by aggregating energy at different structural levels that naturally arise from graph diffusion. In contrast to logit-based EBMs, we provably induce an integrable density in the data space by regularizing the energy function. We introduce an evidential interpretation of our EBM that significantly improves the predictive robustness of the GNN. Our framework is a simple and effective post hoc method applicable to any pre-trained GNN that is sensitive to various distribution shifts. It consistently achieves the best separation of in-distribution and out-of-distribution data on 6 out of 7 anomaly types while having the best average rank over shifts on all datasets.
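The core ingredients (logit energy plus aggregation over diffusion-induced structural levels) can be sketched on a toy graph. This is only an illustration of the idea under simplifying assumptions: a plain mean over diffusion depths replaces GEBM's regularized, evidential aggregation, and the graph and logits are invented.

```python
import numpy as np

# Toy graph: two triangles joined by one edge; node 5 is an "OOD" node.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n, C = 6, 3
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_hat = (A + np.eye(n)) / (A + np.eye(n)).sum(1, keepdims=True)  # row-norm diffusion

logits = np.zeros((n, C))
logits[np.arange(5), [0, 0, 0, 1, 1]] = 6.0   # confident in-distribution nodes
# node 5 stays uniform: low confidence, a stand-in for an OOD input

def energy(L):
    m = L.max(1, keepdims=True)
    return -(m.squeeze(1) + np.log(np.exp(L - m).sum(1)))   # -logsumexp(logits)

# Aggregate energy over structural scales k = 0, 1, 2 of graph diffusion.
levels = [logits]
for _ in range(2):
    levels.append(A_hat @ levels[-1])
E = np.mean([energy(L) for L in levels], axis=0)
print(np.round(E, 2))
```

The uncertain node ends up with the highest aggregated energy, combining node-level and neighbourhood-level evidence in one score.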
Updated: 2024-07-01 11:56:17
Subjects: cs.LG,stat.ML
SCIF: A Language for Compositional Smart Contract Security
Securing smart contracts remains a fundamental challenge. At its core, it is about building software that is secure in composition with untrusted code, a challenge that extends far beyond blockchains. We introduce SCIF, a language for building smart contracts that are compositionally secure. SCIF is based on the fundamentally compositional principle of secure information flow, but extends this core mechanism to include protection against reentrancy attacks, confused deputy attacks, and improper error handling, even in the presence of malicious contracts that do not follow SCIF's rules. SCIF supports a rich ecosystem of interacting principals with partial trust through its mechanisms for dynamic trust management. SCIF has been implemented as a compiler to Solidity. We describe the SCIF language, including its static checking rules and runtime. Finally, we implement several applications with intricate security reasoning, showing how SCIF supports building complex smart contracts securely and gives programmers accurate diagnostics about potential security bugs.
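The compositional principle SCIF builds on, secure information flow, can be illustrated with a minimal checker. This is a two-level confidentiality lattice and a hypothetical assignment list, far simpler than SCIF's actual label model and static rules.

```python
LEVELS = {"public": 0, "secret": 1}

def flows_to(src, dst):
    """Information may only flow upward in the confidentiality lattice."""
    return LEVELS[src] <= LEVELS[dst]

def check_program(assigns, labels):
    """assigns: list of (target, [sources]); reject any downward flow."""
    errors = []
    for tgt, srcs in assigns:
        for s in srcs:
            if not flows_to(labels[s], labels[tgt]):
                errors.append(f"illegal flow {s} -> {tgt}")
    return errors

labels = {"balance": "secret", "fee": "public", "log_entry": "public"}
prog = [("log_entry", ["fee"]),        # public -> public: allowed
        ("log_entry", ["balance"])]    # secret -> public: rejected
errs = check_program(prog, labels)
print(errs)
```

Because each assignment is checked locally against the lattice, the guarantee composes: linking in more (even untrusted) code cannot silently create a downward flow that the checker would accept.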
Updated: 2024-07-01 11:51:21
Subjects: cs.CR,cs.PL
Deep Learning Approach for Enhanced Transferability and Learning Capacity in Tool Wear Estimation
As an integral part of contemporary manufacturing, monitoring systems obtain valuable information during machining to oversee the condition of both the process and the machine. Recently, diverse algorithms have been employed to detect tool wear using single or multiple sources of measurements. In this study, a deep learning approach is proposed for estimating tool wear, considering cutting parameters. The model's accuracy and transferability in tool wear estimation were assessed with milling experiments conducted under varying cutting parameters. The results indicate that the proposed method outperforms conventional methods in terms of both transferability and rapid learning capabilities.
Updated: 2024-07-01 11:49:10
Subjects: cs.LG,cs.AI,eess.SP
Deep Learning Based Tool Wear Estimation Considering Cutting Conditions
Tool wear conditions impact the final quality of the workpiece. In this study, we propose a deep learning approach based on a convolutional neural network that incorporates cutting conditions as extra model inputs, aiming to improve tool wear estimation accuracy and fulfill industrial demands for zero-shot transferability. Through a series of milling experiments under various cutting parameters, we evaluate the model's performance in terms of tool wear estimation accuracy and its transferability to new fixed or variable cutting parameters. The results consistently highlight our approach's advantage over conventional models that omit cutting conditions, maintaining superior performance irrespective of the stability of the wear development or the limitation of the training dataset. This finding underscores its potential applicability in industrial scenarios.
Updated: 2024-07-01 11:48:33
Subjects: cs.LG,cs.AI,eess.SP
Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles
In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, thus allowing us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation.
Updated: 2024-07-01 11:44:15
Subjects: cs.LG,math.OC
Extracting Protocol Format as State Machine via Controlled Static Loop Analysis
Reverse engineering of protocol message formats is critical for many security applications. Mainstream techniques use dynamic analysis and inherit its low-coverage problem -- the inferred message formats only reflect the features of their inputs. To achieve high coverage, we choose to use static analysis to infer message formats from the implementation of protocol parsers. In this work, we focus on a class of extremely challenging protocols whose formats are described via constraint-enhanced regular expressions and parsed using finite-state machines. Such state machines are often implemented as complicated parsing loops, which are inherently difficult to analyze via conventional static analysis. Our new technique extracts a state machine by regarding each loop iteration as a state and the dependency between loop iterations as state transitions. To achieve high, i.e., path-sensitive, precision but avoid path explosion, the analysis is controlled to merge as many paths as possible based on carefully-designed rules. The evaluation results show that we can infer a state machine and, thus, the message formats, in five minutes with over 90% precision and recall, far better than the state of the art. We also applied the state machines to enhance protocol fuzzers, whose coverage improves by 20% to 230% and which detect ten more zero-days than the baselines.
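The iterations-as-states abstraction is easy to see on a concrete parsing loop. A full static analysis is beyond a snippet, so the sketch below recovers the same kind of state machine dynamically, by instrumenting a tiny hand-written parser for comma-separated quoted fields (an invented format, not one from the paper).

```python
def parse(msg):
    """Parsing loop with an explicit state variable; records transitions."""
    transitions = set()
    state = "FIELD_START"
    for ch in msg:
        cls = {'"': "QUOTE", ",": "COMMA"}.get(ch, "CHAR")
        if state == "FIELD_START" and cls == "QUOTE":
            nxt = "IN_FIELD"
        elif state == "IN_FIELD" and cls == "QUOTE":
            nxt = "FIELD_END"
        elif state == "IN_FIELD":
            nxt = "IN_FIELD"
        elif state == "FIELD_END" and cls == "COMMA":
            nxt = "FIELD_START"
        else:
            raise ValueError(f"parse error in state {state}")
        transitions.add((state, cls, nxt))   # each iteration = one state step
        state = nxt
    return transitions

fsm = set()
for sample in ['"ab","c"', '"x"', '"1","2","3"']:
    fsm |= parse(sample)
for s, c, n in sorted(fsm):
    print(f"{s} --{c}--> {n}")
```

The union of observed (state, input-class, next-state) triples is the protocol's state machine; the paper obtains the same object statically, with path merging to stay tractable.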
Updated: 2024-07-01 11:43:28
Subjects: cs.CR,cs.PL,cs.SE
A Learned Generalized Geodesic Distance Function-Based Approach for Node Feature Augmentation on Graphs
Geodesic distances on manifolds have numerous applications in image processing, computer graphics and computer vision. In this work, we introduce an approach called `LGGD' (Learned Generalized Geodesic Distances). This method involves generating node features by learning a generalized geodesic distance function through a training pipeline that incorporates training data, graph topology and the node content features. The strength of this method lies in the proven robustness of the generalized geodesic distances to noise and outliers. Our contributions encompass improved performance in node classification tasks, competitive results with state-of-the-art methods on real-world graph datasets, the demonstration of the learnability of parameters within the generalized geodesic equation on graph, and dynamic inclusion of new labels.
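The feature-augmentation step reduces, in its simplest form, to computing per-class geodesic distances from labelled seed nodes. The sketch below uses a plain multi-source Dijkstra, i.e. ordinary shortest-path distance as a stand-in for the paper's learned generalized geodesic; the graph and seeds are invented.

```python
import heapq

def multi_source_dijkstra(adj, sources):
    """Geodesic distance from every node to the nearest source node."""
    dist = {v: float("inf") for v in adj}
    heap = [(0.0, s) for s in sources]
    for s in sources:
        dist[s] = 0.0
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for u, w in adj[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return dist

# Weighted graph; labelled seed nodes per class.
adj = {
    0: [(1, 1.0), (2, 4.0)], 1: [(0, 1.0), (2, 1.0)],
    2: [(0, 4.0), (1, 1.0), (3, 2.0)], 3: [(2, 2.0)],
}
seeds = {"class_a": [0], "class_b": [3]}

# One distance channel per class, appended to each node's feature vector.
dists = {cls: multi_source_dijkstra(adj, s) for cls, s in seeds.items()}
features = {v: [dists[cls][v] for cls in seeds] for v in adj}
print(features)
```

Each node gains one channel per class; the learned variant additionally makes the distance function's parameters trainable and robust to noisy labels.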
Updated: 2024-07-01 11:39:15
Subjects: cs.LG
MARS: Multimodal Active Robotic Sensing for Articulated Characterization
Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.
Updated: 2024-07-01 11:32:39
Subjects: cs.RO,cs.AI,cs.CV
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization
Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. Many previous studies exploit quantization methods to reduce LLM inference cost by reducing latency and memory consumption. Applying 2-bit single-precision weight quantization brings >3% accuracy loss, so the state-of-the-art methods use mixed-precision methods for LLMs (e.g., Llama2-7b) to improve the accuracy. However, challenges still exist: (1) uneven distribution in the weight matrix; (2) large speed degradation from adding sparse outliers; (3) time-consuming dequantization operations on GPUs. To tackle these challenges and enable fast and efficient LLM inference on GPUs, we propose the following techniques in this paper: (1) intra-weight mixed-precision quantization; (2) exclusive 2-bit sparse outliers with minimal speed degradation; (3) asynchronous dequantization. We conduct extensive experiments on different model families (e.g., Llama3) and model sizes. We achieve 2.91 bits per weight, accounting for all scales/zeros, across different models with negligible loss. As a result, with our 2/4/16 mixed-precision quantization for each weight matrix and asynchronous dequantization during inference, our design achieves an end-to-end speedup of 1.74x for Llama2-7b over the original model, and we reduce both runtime cost and total cost by up to 2.53x and 2.29x, respectively, with lower GPU requirements.
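The numerical side of the scheme (per-group 2-bit quantization with scales/zeros, plus a few full-precision sparse outliers) can be sketched on a small matrix. This is a CPU toy with made-up sizes; the asynchronous GPU dequantization and the 2/4/16 mixed-precision assignment are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64)).astype(np.float32)
W[2, 5], W[6, 40] = 8.0, -7.5          # a few large-magnitude outliers

def quantize_2bit(W, group=16, n_outliers=2):
    """Per-group 2-bit quantization, keeping the largest weights sparse."""
    # 1) extract sparse outliers, kept at full precision
    idx = np.argsort(np.abs(W).ravel())[-n_outliers:]
    outliers = {tuple(np.unravel_index(i, W.shape)): W.flat[i] for i in idx}
    Wd = W.copy()
    for (r, c) in outliers:
        Wd[r, c] = 0.0
    # 2) 2-bit asymmetric quantization with per-group scale and zero point
    G = Wd.reshape(-1, group)
    lo, hi = G.min(1, keepdims=True), G.max(1, keepdims=True)
    scale = (hi - lo) / 3.0             # 2 bits -> 4 levels, codes 0..3
    q = np.clip(np.round((G - lo) / np.maximum(scale, 1e-8)), 0, 3)
    deq = (q * scale + lo).reshape(W.shape)
    # 3) add the sparse outliers back at dequantization time
    for (r, c), v in outliers.items():
        deq[r, c] = v
    return q.astype(np.uint8), deq

q, W_hat = quantize_2bit(W)
err = float(np.abs(W_hat - W).max())
print("codes:", q.min(), "-", q.max(), " max abs error:", round(err, 3))
```

The outliers are restored exactly while the bulk of the matrix costs 2 bits per weight plus the per-group scale/zero overhead, which is where the paper's 2.91-bit average comes from.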
Updated: 2024-07-01 11:13:54
Subjects: cs.LG,cs.DC
Remote sensing framework for geological mapping via stacked autoencoders and clustering
Supervised machine learning methods for geological mapping via remote sensing face limitations due to the scarcity of accurately labelled training data, a limitation that can be addressed by unsupervised learning such as dimensionality reduction and clustering. Dimensionality reduction methods have the potential to play a crucial role in improving the accuracy of geological maps. Although conventional dimensionality reduction methods may struggle with nonlinear data, unsupervised deep learning models such as autoencoders can model non-linear relationships. Stacked autoencoders feature multiple interconnected layers to capture hierarchical data representations useful for remote sensing data. This study presents an unsupervised machine learning-based framework for processing remote sensing data using stacked autoencoders for dimensionality reduction and k-means clustering for mapping geological units. We use Landsat 8, ASTER, and Sentinel-2 datasets to evaluate the framework for geological mapping of the Mutawintji region in Western New South Wales, Australia. We also compare stacked autoencoders with principal component analysis and canonical autoencoders. Our results reveal that the framework produces accurate and interpretable geological maps, efficiently discriminating rock units. We find that the accuracy of stacked autoencoders ranges from 86.6 % to 90 %, depending on the remote sensing data type, which is superior to their counterparts. We also find that the generated maps align with prior geological knowledge of the study area while providing novel insights into geological structures.
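The autoencode-then-cluster pipeline can be sketched end to end on synthetic data. The sketch is heavily simplified: a single linear autoencoder trained by gradient descent stands in for the stacked nonlinear one, two Gaussian blobs stand in for spectrally distinct rock units, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "pixels": two spectrally distinct units in 6 bands.
n, d, k = 200, 6, 2
X = np.vstack([rng.normal(-2, 1, (n, d)), rng.normal(2, 1, (n, d))])

# Linear single-layer autoencoder (stand-in for the stacked AE).
We = rng.normal(scale=0.1, size=(d, k))
Wd = rng.normal(scale=0.1, size=(k, d))
lr, losses = 1e-3, []
for _ in range(600):
    Z = X @ We
    R = Z @ Wd - X                      # reconstruction residual
    losses.append(float((R ** 2).mean()))
    gWd = Z.T @ R / len(X)
    gWe = X.T @ (R @ Wd.T) / len(X)
    We -= lr * gWe
    Wd -= lr * gWd

# k-means on the encoded (dimensionality-reduced) features.
Z = X @ We
C = Z[[0, -1]].copy()                   # init: one point from each unit
for _ in range(10):
    lab = np.argmin(((Z[:, None] - C[None]) ** 2).sum(-1), axis=1)
    C = np.array([Z[lab == j].mean(0) for j in range(2)])
print(f"loss {losses[0]:.2f} -> {losses[-1]:.2f}, cluster sizes {np.bincount(lab)}")
```

Reconstruction loss drops as the encoder finds a compact representation, and clustering the codes recovers the two units, mirroring the map-making step of the framework.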
Updated: 2024-07-01 11:11:29
Subjects: cs.LG,cs.AI
A General Verification Framework for Dynamical and Control Models via Certificate Synthesis
An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert control engineers. This results in a need for automatic techniques that are able to design controllers and to analyse a wide range of elaborate specifications. In this paper, we provide a general framework to encode system specifications and define corresponding certificates, and we present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks to provide candidate control and certificate functions, whilst using SMT-solvers to offer a formal guarantee of correctness. We test our framework by developing a prototype software tool, and assess its efficacy at verification via control and certificate synthesis over a large and varied suite of benchmarks.
Updated: 2024-07-01 11:08:14
Subjects: eess.SY,cs.LG,cs.LO,cs.SY
$\text{Memory}^3$: Language Modeling with Explicit Memory
The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
Updated: 2024-07-01 11:07:23
Subjects: cs.CL,cs.AI,cs.LG,68T50,I.2.7
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Combining Reinforcement Learning (RL) with a prior controller can yield the best out of two worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Therefore, we advocate for an adaptive strategy that dynamically adjusts the weighting based on the RL agent's current capabilities. We propose a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning (CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation of the adaptive hybrid RL problem treating the adaptive weight as a context variable, (ii) a weight adaption mechanism based on the parametric uncertainty of a critic ensemble, and (iii) ensemble-based acceleration for data-efficient RL. Evaluating CHEQ on a car racing task reveals substantially stronger data efficiency, exploration safety, and transferability to unknown scenarios than state-of-the-art adaptive hybrid RL methods.
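A hedged numpy sketch of ingredient (ii), mapping critic-ensemble disagreement to the blending weight between the RL policy and the control prior. The function names, thresholds, and the linear schedule are our assumptions, not CHEQ's exact mechanism.

```python
import numpy as np

def adaptive_weight(q_ensemble, u_low, u_high, w_min=0.0, w_max=1.0):
    """Map critic-ensemble disagreement (a parametric-uncertainty proxy)
    to a weight on the RL action: a confident critic earns more trust."""
    u = float(np.std(q_ensemble))
    t = np.clip((u - u_low) / (u_high - u_low), 0.0, 1.0)
    return float(w_max - t * (w_max - w_min))

def blended_action(a_rl, a_prior, w):
    """Hybrid action: weight w on the RL policy, (1 - w) on the prior."""
    return w * np.asarray(a_rl) + (1.0 - w) * np.asarray(a_prior)
```

With zero disagreement the RL action is used as-is; past the upper uncertainty threshold the controller prior takes over entirely.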
Updated: 2024-07-01 11:02:45
Subjects: cs.LG
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Following a fast initial breakthrough in graph based learning, Graph Neural Networks (GNNs) have reached a widespread application in many science and engineering fields, prompting the need for methods to understand their decision process. GNN explainers have started to emerge in recent years, with a multitude of methods both novel or adapted from other domains. To sort out this plethora of alternative approaches, several studies have benchmarked the performance of different explainers in terms of various explainability metrics. However, these earlier works make no attempts at providing insights into why different GNN architectures are more or less explainable, or which explainer should be preferred in a given setting. In this survey, we fill these gaps by devising a systematic experimental study, which tests ten explainers on eight representative architectures trained on six carefully designed graph and node classification datasets. With our results we provide key insights on the choice and applicability of GNN explainers, we isolate key components that make them usable and successful and provide recommendations on how to avoid common interpretation pitfalls. We conclude by highlighting open questions and directions of possible future research.
Updated: 2024-07-01 10:48:40
Subjects: cs.LG,cs.AI
Neural Conditional Probability for Inference
We introduce NCP (Neural Conditional Probability), a novel operator-theoretic approach for learning conditional distributions with a particular focus on inference tasks. NCP can be used to build conditional confidence regions and extract important statistics like conditional quantiles, mean, and covariance. It offers streamlined learning through a single unconditional training phase, facilitating efficient inference without the need for retraining even when conditioning changes. By tapping into the powerful approximation capabilities of neural networks, our method efficiently handles a wide variety of complex probability distributions, effectively dealing with nonlinear relationships between input and output variables. Theoretical guarantees ensure both optimization consistency and statistical accuracy of the NCP method. Our experiments show that our approach matches or beats leading methods using a simple Multi-Layer Perceptron (MLP) with two hidden layers and GELU activations. This demonstrates that a minimalistic architecture with a theoretically grounded loss function can achieve competitive results without sacrificing performance, even in the face of more complex architectures.
Updated: 2024-07-01 10:44:29
Subjects: cs.LG,math.ST,stat.ME,stat.ML,stat.TH
Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid
While extensive research exists on physical adversarial attacks within the visible spectrum, studies on such techniques in the infrared spectrum are limited. Infrared object detectors are vital in modern technological applications but are susceptible to adversarial attacks, posing significant security threats. Previous studies using physical perturbations like light bulb arrays and aerogels for white-box attacks, or hot and cold patches for black-box attacks, have proven impractical or limited in multi-view support. To address these issues, we propose the Adversarial Infrared Grid (AdvGrid), which models perturbations in a grid format and uses a genetic algorithm for black-box optimization. These perturbations are cyclically applied to various parts of a pedestrian's clothing to facilitate multi-view black-box physical attacks on infrared pedestrian detectors. Extensive experiments validate AdvGrid's effectiveness, stealthiness, and robustness. The method achieves attack success rates of 80.00% in digital environments and 91.86% in physical environments, outperforming baseline methods. Additionally, the average attack success rate exceeds 50% against mainstream detectors, demonstrating AdvGrid's robustness. Our analyses include ablation studies, transfer attacks, and adversarial defenses, confirming the method's superiority.
Updated: 2024-07-01 10:38:08
Subjects: cs.CV,cs.AI
Information Density Bounds for Privacy
This paper explores the implications of guaranteeing privacy by imposing a lower bound on the information density between the private and the public data. We introduce an operationally meaningful privacy measure called pointwise maximal cost (PMC) and demonstrate that imposing an upper bound on PMC is equivalent to enforcing a lower bound on the information density. PMC quantifies the information leakage about a secret to adversaries who aim to minimize non-negative cost functions after observing the outcome of a privacy mechanism. When restricted to finite alphabets, PMC can equivalently be defined as the information leakage to adversaries aiming to minimize the probability of incorrectly guessing randomized functions of the secret. We study the properties of PMC and apply it to standard privacy mechanisms to demonstrate its practical relevance. Through a detailed examination, we connect PMC with other privacy measures that impose upper or lower bounds on the information density. Our results highlight that lower bounding the information density is a more stringent requirement than upper bounding it. Overall, our work significantly bridges the gaps in understanding the relationships between various privacy frameworks and provides insights for selecting a suitable framework for a given application.
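For reference, the information density between a secret $S$ and a released output $Y$ is the standard log-likelihood ratio below; the $\varepsilon$-threshold form of the lower-bound constraint is our illustrative reading, not necessarily the paper's exact definition.

```latex
i(s; y) \;=\; \log \frac{p_{Y \mid S}(y \mid s)}{p_Y(y)},
\qquad
\text{privacy as a lower bound:}\quad i(s; y) \ge -\varepsilon \;\;\text{for all } (s, y).
```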
Updated: 2024-07-01 10:38:02
Subjects: cs.IT,cs.CR,math.IT
Benchmarking Predictive Coding Networks -- Made Simple
In this work, we tackle the problems of efficiency and scalability for predictive coding networks in machine learning. To do so, we first propose a library called PCX, whose focus lies on performance and simplicity, and provides a user-friendly, deep-learning oriented interface. Second, we use PCX to implement a large set of benchmarks for the community to use for their experiments. As most works propose their own tasks and architectures, do not compare one against each other, and focus on small-scale tasks, a simple and fast open-source library adopted by the whole community would address all of these concerns. Third, we perform extensive benchmarks using multiple algorithms, setting new state-of-the-art results in multiple tasks and datasets, as well as highlighting limitations inherent to PC that should be addressed. Thanks to the efficiency of PCX, we are able to analyze larger architectures than commonly used, providing baselines to galvanize community efforts towards one of the main open problems in the field: scalability. The code for PCX is available at https://github.com/liukidar/pcax.
Updated: 2024-07-01 10:33:44
Subjects: cs.LG,cs.CV,I.2.6
Towards Robust Physical-world Backdoor Attacks on Lane Detection
Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint transformations) and environmental conditions (e.g., weather or lighting changes). To tackle this issue, this paper introduces BadLANE, a dynamic scene adaptation backdoor attack for LD designed to withstand changes in real-world dynamic scene factors. To address the challenges posed by changing driving perspectives, we propose an amorphous trigger pattern composed of shapeless pixels. This trigger design allows the backdoor to be activated by various forms or shapes of mud spots or pollution on the road or lens, enabling adaptation to changes in vehicle observation viewpoints during driving. To mitigate the effects of environmental changes, we design a meta-learning framework to train meta-generators tailored to different environmental conditions. These generators produce meta-triggers that incorporate diverse environmental information, such as weather or lighting conditions, as the initialization of the trigger patterns for backdoor implantation, thus enabling adaptation to dynamic environments. Extensive experiments on various commonly used LD models in both digital and physical domains validate the effectiveness of our attacks, outperforming other baselines significantly (+25.15% on average in Attack Success Rate). Our codes will be available upon paper publication.
Updated: 2024-07-01 10:27:42
Subjects: cs.CV,cs.AI
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
Utilizing a shared embedding space, emerging multimodal models exhibit unprecedented zero-shot capabilities. However, the shared embedding space could lead to new vulnerabilities if different modalities can be misaligned. In this paper, we extend and utilize a recently developed effective gradient-based procedure that allows us to match the embedding of a given text by minimally modifying an image. Using the procedure, we show that we can align the embeddings of distinguishable texts to any image through unnoticeable adversarial attacks in joint image-text models, revealing that semantically unrelated images can have embeddings of identical texts and at the same time visually indistinguishable images can be matched to the embeddings of very different texts. Our technique achieves a 100% success rate when it is applied to text datasets and images from multiple sources. Without overcoming the vulnerability, multimodal models cannot robustly align inputs from different modalities in a semantically meaningful way. Warning: the text data used in this paper are toxic in nature and may be offensive to some readers.
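A toy sketch of the gradient-based embedding-matching procedure, with a random linear map standing in for the joint model's image encoder. The real attack backpropagates through a deep encoder and keeps the perturbation visually unnoticeable, which this sketch omits; all dimensions and values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_img, d_emb = 32, 8
E = rng.normal(size=(d_emb, d_img)) / np.sqrt(d_img)  # toy linear "image encoder"
x = rng.normal(size=d_img)                            # the image to perturb
target = rng.normal(size=d_emb)                       # embedding of an unrelated text

delta = np.zeros(d_img)
lr = 0.1
for _ in range(800):
    r = E @ (x + delta) - target      # embedding mismatch
    delta -= lr * (E.T @ r)           # gradient step on 0.5 * ||r||^2 wrt delta

residual = float(np.linalg.norm(E @ (x + delta) - target))
```

Because the image space has more dimensions than the embedding space, an exact match exists and gradient descent drives the residual essentially to zero, illustrating how any target embedding can be reached.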
Updated: 2024-07-01 10:25:47
Subjects: cs.CV,cs.AI,cs.CL,cs.LG
CPT: Consistent Proxy Tuning for Black-box Optimization
Black-box tuning has attracted recent attention because the structure and inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It applies the difference of the output logits before and after tuning a smaller white-box "proxy" model to improve the black-box model. However, this technique serves only as a decoding-time algorithm, leading to an inconsistency between training and testing which potentially limits overall performance. To address this problem, we introduce Consistent Proxy Tuning (CPT), a simple yet effective black-box tuning method. Different from Proxy-tuning, CPT additionally exploits the frozen large black-box model and another frozen small white-box model, ensuring consistency between the training-stage optimization objective and the test-time proxies. This consistency benefits Proxy-tuning and enhances model performance. Note that our method focuses solely on logit-level computation, which makes it model-agnostic and applicable to any task involving logit classification. Extensive experimental results demonstrate the superiority of our CPT in both black-box tuning of Large Language Models (LLMs) and Vision-Language Models (VLMs) across various datasets. The code is available at https://github.com/chunmeifeng/CPT.
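The logit arithmetic that proxy tuning applies at decoding time, and which CPT additionally enforces during training so the two stages match, can be sketched as follows (a toy numpy illustration; the real method operates on LLM vocabulary logits):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def proxy_adjusted_logits(black_box, proxy_tuned, proxy_base):
    """Proxy tuning: shift the black-box logits by the white-box proxy's
    tuning delta. CPT's contribution is to apply this same combination
    in the training objective, not only at test time."""
    return black_box + (proxy_tuned - proxy_base)
```

If the tuned and untuned proxies agree, the black-box logits pass through unchanged; otherwise the proxy's learned shift steers the black-box distribution.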
Updated: 2024-07-01 10:23:14
Subjects: cs.LG
Wind Estimation in Unmanned Aerial Vehicles with Causal Machine Learning
In this work we demonstrate the possibility of estimating the wind environment of a UAV without specialised sensors, using only the UAV's trajectory and a causal machine learning approach. We implement the causal curiosity method, which combines machine-learning time-series classification and clustering with a causal framework. We analyse three distinct wind environments: constant wind, shear wind, and turbulence, and explore different optimisation strategies for optimal UAV manoeuvres to estimate the wind conditions. The proposed approach can be used to design optimal trajectories in challenging weather conditions, and to avoid specialised sensors that add to the UAV's weight and compromise its functionality.
Updated: 2024-07-01 10:22:16
Subjects: cs.LG,cs.RO
Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models
Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined tasks remains distinct from the instruction-following context due to structural differences in the data. Our work addresses the gap in our understanding of these risks across diverse types of data in closed models - where providers control how user data is utilized in the fine-tuning process. We demonstrate how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data, showing this is more effective than existing baselines at re-establishing safety alignment while maintaining similar task performance.
Updated: 2024-07-01 10:17:58
Subjects: cs.CL,cs.LG
Training-Free Acceleration of ViTs with Delayed Spatial Merging
Token merging has emerged as a new paradigm that can accelerate the inference of Vision Transformers (ViTs) without any retraining or fine-tuning. To push the frontier of training-free acceleration in ViTs, we improve token merging by adding the perspectives of 1) activation outliers and 2) hierarchical representations. Through a careful analysis of the attention behavior in ViTs, we characterize a delayed onset of the convergent attention phenomenon, which makes token merging undesirable in the bottom blocks of ViTs. Moreover, we augment token merging with a hierarchical processing scheme to capture multi-scale redundancy between visual tokens. Combining these two insights, we build a unified inference framework called DSM: Delayed Spatial Merging. We extensively evaluate DSM on various ViT model scales (Tiny to Huge) and tasks (ImageNet-1k and transfer learning), achieving up to 1.8$\times$ FLOP reduction and 1.6$\times$ throughput speedup at a negligible loss while being two orders of magnitude faster than existing methods.
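DSM adds delayed onset and hierarchical processing on top of token merging; as a hedged sketch of just the underlying bipartite merge step (the function name and the plain-averaging rule are our assumptions, in the spirit of ToMe-style merging, not DSM's exact algorithm):

```python
import numpy as np

def bipartite_token_merge(x, r):
    """One bipartite merge step: split the N tokens into two alternating
    sets, find for each token in set A its most cosine-similar partner in
    set B, and merge the r most similar pairs by averaging. Returns N - r
    tokens."""
    a, b = x[0::2].copy(), x[1::2].copy()
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                      # cosine similarity, A x B
    best = sim.argmax(axis=1)            # best partner in B for each A token
    order = np.argsort(-sim.max(axis=1)) # most similar pairs first
    merged, counts = b.copy(), np.ones(len(b))
    keep = np.ones(len(a), dtype=bool)
    for i in order[:r]:
        j = best[i]
        merged[j] = (merged[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
        keep[i] = False                  # this A token is absorbed into B
    return np.vstack([a[keep], merged])
```

Applied inside a transformer block, each step drops r tokens from the sequence, which is where the FLOP savings come from; DSM's insight is to delay such steps past the bottom blocks.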
Updated: 2024-07-01 10:16:38
Subjects: cs.CV,cs.AI,cs.LG
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
Uncertainty Quantification (UQ) is an important building block for the reliable use of neural networks in real-world scenarios, as it can be a useful tool in identifying faulty predictions. Speech emotion recognition (SER) models can suffer from particularly many sources of uncertainty, such as the ambiguity of emotions, Out-of-Distribution (OOD) data or, in general, poor recording conditions. Reliable UQ methods are thus of particular interest as in many SER applications no prediction is better than a faulty prediction. While the effects of label ambiguity on uncertainty are well documented in the literature, we focus our work on an evaluation of UQ methods for SER under common challenges in real-world application, such as corrupted signals, and the absence of speech. We show that simple UQ methods can already give an indication of the uncertainty of a prediction and that training with additional OOD data can greatly improve the identification of such signals.
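Two of the simple single-forward-pass UQ signals alluded to above can be sketched as follows (a generic illustration of softmax-based uncertainty, not necessarily the paper's exact choice of methods):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(logits):
    """Entropy of the softmax distribution: high entropy flags an
    uncertain (possibly faulty) prediction."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum())

def max_softmax(logits):
    """Maximum softmax probability: a high value signals confidence."""
    return float(softmax(logits).max())
```

For a corrupted or non-speech input, one would hope the emotion classifier's entropy rises toward its maximum (log of the number of classes) so the prediction can be rejected.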
Updated: 2024-07-01 10:11:08
Subjects: cs.SD,cs.AI,eess.AS
Integrated feature analysis for deep learning interpretation and class activation maps
Understanding the decisions of deep learning (DL) models is essential for the acceptance of DL in risk-sensitive applications. Although methods like class activation maps (CAMs) give a glimpse into the black box, they miss some crucial information, limiting their interpretability and merely indicating the locations of the objects considered. To provide more insight into the models and the influence of datasets, we propose an integrated feature analysis method, which consists of feature distribution analysis and feature decomposition, to look closer into the intermediate features extracted by DL models. This integrated feature analysis can provide information on overfitting, confounders, outliers in datasets, model redundancies and principal features extracted by the models, and provide distribution information to form a common intensity scale, all of which are missing in current CAM algorithms. The integrated feature analysis was applied to eight different datasets for general validation: photographs of handwritten digits, two datasets of natural images and five medical datasets, including skin photography, ultrasound, CT, X-rays and MRIs. The method was evaluated by calculating the consistency between the CAMs' average class activation levels and the logits of the model. Based on the eight datasets, the correlation coefficients obtained by our method were all very close to 100%, and based on the feature decomposition, 5%-25% of features could generate equally informative saliency maps and obtain the same model performance as using all features. This proves the reliability of the integrated feature analysis. As the proposed methods rely on very few assumptions, this is a step towards better model interpretation and a useful extension to existing CAM algorithms. Codes: https://github.com/YanliLi27/IFA
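The evaluation described above, consistency between the CAMs' average class activation levels and the model's logits, can be sketched as a simple correlation, under our reading of the abstract (function name and input layout are assumptions):

```python
import numpy as np

def cam_logit_consistency(cams, logits):
    """Correlation between each image's average class-activation level and
    the model's logit for that class. Values near 1 indicate the CAM
    intensities are consistent with the model's own confidence."""
    levels = cams.reshape(len(cams), -1).mean(axis=1)  # per-image mean CAM
    return float(np.corrcoef(levels, logits)[0, 1])
```

A perfectly consistent model-explainer pair would yield a coefficient close to 1 (close to 100%, as the abstract reports).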
Updated: 2024-07-01 10:10:57
Subjects: cs.CV,cs.AI
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit .
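The hindsight-relabeling step can be sketched as below; the field names and the toy priority rule are our assumptions for illustration, not CodeIt's exact scheme:

```python
def hindsight_relabel(episode):
    """Hindsight relabeling: replace the episode's goal with the output the
    sampled program actually produced, so even a 'failed' sample becomes a
    valid (input, goal) -> program training example, sidestepping the
    extreme reward sparsity of program synthesis."""
    relabeled = dict(episode)
    relabeled["target_output"] = episode["realized_output"]
    return relabeled

def priority(episode):
    """A toy replay priority: prefer episodes whose realized output already
    matches the original target (an assumption, not CodeIt's scheme)."""
    return 1.0 if episode["realized_output"] == episode["target_output"] else 0.5
```

Iterating between sampling programs, relabeling them, and learning from the prioritized buffer is the Code Iteration loop the abstract describes.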
Updated: 2024-07-01 10:03:33
Subjects: cs.AI,cs.CL,cs.LG
An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification
Product attributes are crucial for e-commerce platforms, supporting applications like search, recommendation, and question answering. The task of Product Attribute and Value Identification (PAVI) involves identifying both attributes and their values from product information. In this paper, we formulate PAVI as a generation task and provide, to the best of our knowledge, the most comprehensive evaluation of PAVI so far. We compare three different attribute-value generation (AVG) strategies based on fine-tuning encoder-decoder models on three datasets. Experiments show that end-to-end AVG approach, which is computationally efficient, outperforms other strategies. However, there are differences depending on model sizes and the underlying language model. The code to reproduce all experiments is available at: https://github.com/kassemsabeh/pavi-avg
Updated: 2024-07-01 10:02:17
标题: 一个生成式方法的实证比较:产品属性值识别
摘要: 产品属性对电子商务平台至关重要,支持搜索、推荐和问答等应用。产品属性与值识别(PAVI)任务涉及从产品信息中识别属性及其值。在本文中,我们将PAVI表述为生成任务,并据我们所知,提供了迄今为止最全面的PAVI评估。我们比较了基于在三个数据集上微调编码器-解码器模型的三种不同的属性-值生成(AVG)策略。实验表明,计算效率高的端到端AVG方法优于其他策略。然而,结果因模型大小和底层语言模型而异。复现所有实验的代码可在以下网址找到:https://github.com/kassemsabeh/pavi-avg
更新时间: 2024-07-01 10:02:17
领域: cs.CL,cs.AI
Towards a fully declarative neuro-symbolic language
Neuro-symbolic systems (NeSy), which claim to combine the best of both the learning and reasoning capabilities of artificial intelligence, are missing a core property of reasoning systems: declarativeness. The lack of declarativeness is caused by the functional nature of neural predicates inherited from neural networks. We propose and implement a general framework for fully declarative neural predicates, which hence extends to fully declarative NeSy frameworks. We first show that the declarative extension preserves the learning and reasoning capabilities and is able to answer arbitrary queries despite being trained on only a single query type.
Updated: 2024-07-01 09:58:55
标题: 走向完全声明性的神经符号语言
摘要: 神经符号系统(NeSy)声称结合了人工智能学习和推理能力的最佳部分,但缺少推理系统的核心属性:可声明性。缺乏可声明性是由于从神经网络继承的神经谓词的功能性质造成的。我们提出并实现了一个通用框架,用于完全声明性的神经谓词,因此扩展到完全声明性的NeSy框架。我们首先展示声明性扩展保留了学习和推理能力,同时能够回答任意查询,而仅在单个查询类型上进行训练。
更新时间: 2024-07-01 09:58:55
领域: cs.AI
$σ$-PCA: a building block for neural learning of identifiable linear transformations
Linear principal component analysis (PCA) learns (semi-)orthogonal transformations by orienting the axes to maximize variance. Consequently, it can only identify orthogonal axes whose variances are clearly distinct, but it cannot identify the subsets of axes whose variances are roughly equal. It cannot eliminate the subspace rotational indeterminacy: it fails to disentangle components with equal variances (eigenvalues), resulting, in each eigen subspace, in randomly rotated axes. In this paper, we propose $\sigma$-PCA, a method that (1) formulates a unified model for linear and nonlinear PCA, the latter being a special case of linear independent component analysis (ICA), and (2) introduces a missing piece into nonlinear PCA that allows it to eliminate, from the canonical linear PCA solution, the subspace rotational indeterminacy -- without whitening the inputs. Whitening, a preprocessing step which converts the inputs into unit-variance inputs, has generally been a prerequisite step for linear ICA methods, which meant that conventional nonlinear PCA could not necessarily preserve the orthogonality of the overall transformation, could not directly reduce dimensionality, and could not intrinsically order by variances. We offer insights on the relationship between linear PCA, nonlinear PCA, and linear ICA -- three methods with autoencoder formulations for learning special linear transformations from data, transformations that are (semi-)orthogonal for PCA, and arbitrary unit-variance for ICA. As part of our formulation, nonlinear PCA can be seen as a method that maximizes both variance and statistical independence, lying in the middle between linear PCA and linear ICA, serving as a building block for learning linear transformations that are identifiable.
Updated: 2024-07-01 09:55:40
标题: $σ$-主成分分析:可识别线性变换的神经学习基石
摘要: 线性主成分分析(PCA)通过定向轴以最大化方差来学习(半)正交变换。因此,它只能识别方差明显不同的正交轴,但无法识别方差大致相等的轴子集。它无法消除子空间的旋转不确定性:它无法分离具有相等方差(特征值)的成分,导致在每个特征子空间中,轴被随机旋转。在本文中,我们提出了$\sigma$-PCA,一种方法,(1)制定了线性和非线性PCA的统一模型,后者是线性独立分量分析(ICA)的一个特例,(2)在非线性PCA中引入了一个缺失的部分,允许它从规范线性PCA解中消除子空间的旋转不确定性--而不是对输入进行白化。白化是一个预处理步骤,将输入转换为单位方差输入,通常是线性ICA方法的先决条件步骤,这意味着传统的非线性PCA不能保留整体变换的正交性,不能直接降低维度,也不能本质上按方差排序。我们提供了关于线性PCA、非线性PCA和线性ICA之间关系的见解--三种具有自动编码器公式的方法,用于从数据中学习特殊线性变换,对于PCA,这些变换是(半)正交的,对于ICA,这些变换是任意单位方差的。作为我们公式的一部分,非线性PCA可以被看作是一种同时最大化方差和统计独立性的方法,它处于线性PCA和线性ICA之间的中间位置,作为学习可识别线性变换的构建块。
更新时间: 2024-07-01 09:55:40
领域: cs.LG,cs.AI,stat.ML
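The subspace rotational indeterminacy described in the $\sigma$-PCA abstract above can be checked with a small stdlib-only sketch: when two eigenvalues of the covariance are equal, every direction in their eigen-subspace captures exactly the same variance, so variance maximization gives PCA no basis for preferring one rotation over another. The 2x2 covariances here are illustrative toys, not the paper's setup.

```python
import math

# For data whose covariance in a 2-d subspace is isotropic (equal
# eigenvalues), every unit direction in that subspace carries the same
# variance -- PCA cannot identify a unique axis within the subspace.

def variance_along(cov, direction):
    # w^T C w for a unit direction w and a 2x2 covariance C.
    (a, b), (c, d) = cov
    x, y = direction
    return a * x * x + (b + c) * x * y + d * y * y

cov_equal = [[1.0, 0.0], [0.0, 1.0]]  # equal eigenvalues
for theta in (0.0, 0.3, 1.1):
    w = (math.cos(theta), math.sin(theta))
    # Any rotation of the axes explains exactly the same variance.
    assert abs(variance_along(cov_equal, w) - 1.0) < 1e-12

cov_distinct = [[2.0, 0.0], [0.0, 1.0]]  # distinct eigenvalues
# Here variance does depend on the axis, so PCA identifies it uniquely.
assert variance_along(cov_distinct, (1.0, 0.0)) > variance_along(cov_distinct, (0.0, 1.0))
```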
Offline Digital Euro: a Minimum Viable CBDC using Groth-Sahai proofs
Current digital payment solutions are fragile and offer less privacy than traditional cash. Their critical dependency on an online service used to perform and validate transactions renders them void if this service is unreachable, and no transaction can be executed during server malfunctions or power outages. Climate change increases the likelihood of extreme weather, a major cause of power outages, so outages are expected to become more frequent. The lack of privacy is an inherent result of these solutions' account-based design or their use of a public ledger. Both the critical dependency and the lack of privacy can be resolved with a Central Bank Digital Currency that can be used offline. This thesis proposes a design and a first implementation for an offline-first digital euro. The protocol offers complete privacy during transactions using zero-knowledge proofs. Furthermore, transactions can be executed offline without third parties, and retroactive double-spending detection is facilitated. To protect users' privacy while also guarding against money laundering, we have added the following privacy-guarding mechanism: the bank and trusted third parties for law enforcement must collaborate to decrypt transactions, revealing the digital pseudonym used in the transaction. Importantly, a transaction can be decrypted without decrypting the prior transactions attached to the digital euro. The protocol has a working initial implementation showcasing its usability and demonstrating its functionality.
Updated: 2024-07-01 09:55:14
标题: 线下数字欧元:使用Groth-Sahai证明的最小可行中央银行数字货币
摘要: 当前的数字支付解决方案很脆弱,比传统现金提供更少的隐私。它们对用于执行和验证交易的在线服务的关键依赖性使得如果该服务无法访问,这些解决方案就失效了。此外,在服务器故障或停电期间无法执行任何交易。由于气候变化,极端天气的可能性增加。极端天气是停电的主要原因之一,停电的频率预计将增加。缺乏隐私是它们基于账户设计或使用公共分类帐的固有结果。中央银行数字货币可以在离线状态下使用,从而解决关键依赖性和缺乏隐私的问题。本论文提出了一个离线优先数字欧元的设计和首次实施。该协议通过零知识证明在交易过程中提供完全的隐私。此外,交易可以在离线状态下执行,无需第三方,并且便于检测追溯式双重花费。为了保护用户的隐私,同时又防范洗钱,我们添加了以下保护隐私的机制。银行和执法部门的可信第三方必须合作解密交易,揭示交易中使用的数字化伪名。重要的是,可以在不解密附加到数字欧元的先前交易的情况下解密交易。该协议具有一个可用的初始实施,展示了其可用性和功能性。
更新时间: 2024-07-01 09:55:14
领域: cs.CR,q-fin.TR,68-02
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training. We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling, which helps to accommodate a variety of multi-domain data, and allow flexible sharing of parameters between domains, potentially enabling knowledge transfer between similar domains and limiting negative transfer. We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of Transformer is a simpler and surprisingly more efficient approach in practice, and reaches the same performance level as SMoE. We also search for a better recipe for robustness of multi-domain systems, highlighting the importance of mixing-in a generic domain, i.e. Paracrawl, and introducing a simple technique, domain randomization.
Updated: 2024-07-01 09:45:22
标题: 研究稀疏专家混合在多领域神经机器翻译中的潜力
摘要: 我们关注多领域神经机器翻译,旨在开发能够处理训练过程中见过的各种领域数据并对训练过程中未见过的领域具有鲁棒性的高效模型。我们假设稀疏的专家混合模型(SMoE)非常适合这一任务,因为它们能够实现高效的模型扩展,有助于适应多领域数据的多样性,并允许在领域之间灵活共享参数,潜在地实现类似领域之间的知识转移并限制负面转移。我们进行了一系列实验,旨在验证SMoE在多领域场景中的效用,并发现在实践中,对Transformer进行简单的宽度扩展是一种更简单且令人惊讶地更有效的方法,并且达到了与SMoE相同的性能水平。我们还寻求更好的多领域系统鲁棒性的方法,强调将通用领域(如Paracrawl)混合进其中的重要性,并引入一种简单的技术,即领域随机化。
更新时间: 2024-07-01 09:45:22
领域: cs.CL,cs.AI
Textual Similarity as a Key Metric in Machine Translation Quality Estimation
Machine Translation (MT) Quality Estimation (QE) assesses translation reliability without reference texts. This study introduces "textual similarity" as a new metric for QE, using sentence transformers and cosine similarity to measure semantic closeness. Analyzing data from the MLQE-PE dataset, we found that textual similarity exhibits stronger correlations with human scores than traditional metrics (HTER, model evaluation, sentence probability, etc.). Employing Generalized Additive Mixed Models (GAMMs) as a statistical tool, we demonstrated that textual similarity consistently outperforms other metrics across multiple language pairs in predicting human scores. We also found that HTER actually fails to predict human scores in QE. Our findings highlight the effectiveness of textual similarity as a robust QE metric, and we recommend integrating it with other metrics into QE frameworks and MT system training for improved accuracy and usability.
Updated: 2024-07-01 09:30:34
标题: 文本相似性作为机器翻译质量评估的关键指标
摘要: 机器翻译(MT)质量评估(QE)评估翻译可靠性而无需参考文本。本研究引入了“文本相似性”作为QE的新度量标准,利用句子嵌入和余弦相似度来衡量语义接近度。分析MLQE-PE数据集的数据,我们发现文本相似性与人类评分之间表现出比传统度量标准(hter,模型评估,句子概率等)更强的相关性。利用广义可加混合模型(GAMMs)作为统计工具,我们证明文本相似性在多种语言对中一贯优于其他度量标准以预测人类评分。我们还发现,“hter”实际上在QE中未能预测人类评分。我们的研究结果强调了文本相似性作为一个稳健的QE度量标准的有效性,建议将其与其他度量标准整合到QE框架和MT系统培训中,以提高准确性和可用性。
更新时间: 2024-07-01 09:30:34
领域: cs.CL,cs.AI
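The core metric in the QE abstract above, cosine similarity between sentence embeddings, reduces to a few lines. The 3-dimensional vectors below are made-up stand-ins; in the paper's setup the embeddings would come from a multilingual sentence-transformer encoder.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors: 1.0 means
    # identical direction (semantically close), 0.0 means orthogonal.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "sentence embeddings" for a source segment and two candidate
# translations; real embeddings are hundreds of dimensions.
src = [0.8, 0.1, 0.2]
hyp_good = [0.79, 0.12, 0.18]
hyp_bad = [0.1, 0.9, 0.1]

# A faithful translation should sit closer to the source in embedding space.
assert cosine_similarity(src, hyp_good) > cosine_similarity(src, hyp_bad)
```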
Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
It has become routine to report research results where Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks, and creative text writing is no exception. It seems natural, then, to raise the bid: are LLMs ready to compete in creative writing skills with a top (rather than average) novelist? To provide an initial answer to this question, we carried out a contest between Patricio Pron (an award-winning novelist, considered one of the best of his generation) and GPT-4 (one of the top-performing LLMs), in the spirit of AI-human duels such as Deep Blue vs Kasparov and AlphaGo vs Lee Sedol. We asked Pron and GPT-4 to provide thirty titles each, and then to write short stories for both their own titles and their opponent's. We then prepared an evaluation rubric inspired by Boden's definition of creativity and collected 5,400 manual assessments provided by literature critics and scholars. The results of our experimentation indicate that LLMs are still far from challenging a top human creative writer, and that such a level of autonomous creative writing skill probably cannot be reached simply with larger language models.
Updated: 2024-07-01 09:28:58
标题: Pron vs Prompt: 大型语言模型是否已经具备挑战世界一流小说作家进行创意文本写作的能力?
摘要: 报告大型语言模型(LLMs)在各种语言相关任务中超越普通人水平的研究结果已成为常规,创意文本写作也不例外。那么,自然会提高赌注:LLMs是否已准备好在创意写作技能上与顶尖(而非普通)小说家竞争?为了初步回答这个问题,我们秉承深蓝对卡斯帕罗夫、AlphaGo对李世石等人机对决的精神,在获奖小说家帕特里西奥·普龙(被认为是他那一代中最优秀的作家之一)与GPT-4(表现最出色的LLMs之一)之间举办了一场比赛。我们要求普龙和GPT-4各自提供30个标题,然后为自己的标题和对手的标题写短篇故事。接着,我们准备了一份受博登创造力定义启发的评价量表,并收集了文学评论家和学者提供的5,400份人工评估。我们的实验结果表明,LLMs仍远不能挑战顶尖人类创意作家,而且仅靠更大的语言模型可能无法达到这种水平的自主创意写作能力。
更新时间: 2024-07-01 09:28:58
领域: cs.CL,cs.AI
Stealing Maggie's Secrets -- On the Challenges of IP Theft Through FPGA Reverse Engineering
Intellectual Property (IP) theft is a cause of major financial and reputational damage, reportedly in the range of hundreds of billions of dollars annually in the U.S. alone. Field Programmable Gate Arrays (FPGAs) are particularly exposed to IP theft, because their configuration file contains the IP in a proprietary format that can be mapped to a gate-level netlist with moderate effort. Despite this threat, the scientific understanding of this issue lags behind reality, thereby preventing an in-depth assessment of IP theft from FPGAs in academia. We address this discrepancy through a real-world case study on a Lattice iCE40 FPGA found inside iPhone 7. Apple refers to this FPGA as Maggie. By reverse engineering the proprietary signal-processing algorithm implemented on Maggie, we generate novel insights into the actual efforts required to commit FPGA IP theft and the challenges an attacker faces on the way. Informed by our case study, we then introduce generalized netlist reverse engineering techniques that drastically reduce the required manual effort and are applicable across a diverse spectrum of FPGA implementations and architectures. We evaluate these techniques on six benchmarks that are representative of different FPGA applications and have been synthesized for Xilinx and Lattice FPGAs, as well as in an end-to-end white-box case study. Finally, we provide a comprehensive open-source tool suite of netlist reverse engineering techniques to foster future research, enable the community to perform realistic threat assessments, and facilitate the evaluation of novel countermeasures.
Updated: 2024-07-01 09:21:50
标题: 窃取玛吉的秘密--关于通过FPGA逆向工程进行知识产权盗窃的挑战
摘要: 知识产权(IP)盗窃是造成重大财务和声誉损失的原因,据报道,仅在美国每年就损失数百亿美元。可编程门阵列(FPGAs)特别容易受到IP盗窃的影响,因为它们的配置文件包含以专有格式存储的IP,可以通过适度的努力映射到门级网表。尽管存在这一威胁,但学术界对这一问题的科学认识落后于现实,因此无法对来自FPGAs的IP盗窃进行深入评估。我们通过一个实际案例研究来解决这种差距,该案例研究涉及iPhone 7内部的Lattice iCE40 FPGA。苹果将该FPGA称为Maggie。通过对Maggie上实施的专有信号处理算法进行逆向工程,我们得出了关于实施FPGA IP盗窃所需的实际努力以及攻击者面临的挑战的新见解。根据我们的案例研究,我们引入了一种通用的网表逆向工程技术,大大减少了手动努力,并适用于各种FPGA实现和架构。我们对代表不同FPGA应用的六个基准进行评估,这些基准已经为Xilinx和Lattice FPGAs进行了综合,并且进行了一个端到端的白盒案例研究。最后,我们提供了一套全面的开源工具套件,用于网表逆向工程技术,以促进未来研究,使社区能够进行现实威胁评估,并促进对新型对策的评估。
更新时间: 2024-07-01 09:21:50
领域: cs.CR
Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation
Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the valuable property of local proximity, where similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-aware Counterfactual Regression (PCR) to exploit proximity for representation balancing within the HTE estimation context. Specifically, we introduce a local proximity preservation regularizer based on optimal transport to depict the local proximity in discrepancy calculation. Furthermore, to overcome the curse of dimensionality, which renders discrepancy estimation ineffective and is exacerbated by the limited data available for HTE estimation, we develop an informative subspace projector, which trades off minimal distance precision for improved sample complexity. Extensive experiments demonstrate that PCR accurately matches units across different treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competitors. Code is available at https://anonymous.4open.science/status/ncr-B697.
Updated: 2024-07-01 09:20:26
标题: 距离重要:保持本地邻近性的平衡用于治疗效果估计
摘要: 由于治疗选择偏差,从观测数据中估计异质性治疗效应(HTE)面临重大挑战。现有方法通过最小化潜在空间中治疗组之间的分布差异来解决这一偏差,侧重于全局对齐。然而,局部邻近性这一有价值的性质,即相似单元表现出相似结果,常常被忽视。在本研究中,我们提出邻近感知反事实回归(Proximity-aware Counterfactual Regression, PCR),在HTE估计中利用邻近性进行表示平衡。具体地,我们引入了基于最优传输的局部邻近保持正则化项,在差异计算中刻画局部邻近性。此外,为了克服因HTE估计可用数据有限而加剧的、使差异估计失效的维度灾难,我们开发了一个信息子空间投影器,以牺牲少量距离精度换取更好的样本复杂度。大量实验证明,PCR能够准确匹配不同治疗组之间的单元,有效缓解治疗选择偏差,并显著优于竞争方法。代码可在https://anonymous.4open.science/status/ncr-B697 上获得。
更新时间: 2024-07-01 09:20:26
领域: cs.LG,cs.AI,stat.ML
SecGenAI: Enhancing Security of Cloud-based Generative AI Applications within Australian Critical Technologies of National Interest
The rapid advancement of Generative AI (GenAI) technologies offers transformative opportunities within Australia's critical technologies of national interest while introducing unique security challenges. This paper presents SecGenAI, a comprehensive security framework for cloud-based GenAI applications, with a focus on Retrieval-Augmented Generation (RAG) systems. SecGenAI addresses functional, infrastructure, and governance requirements, integrating end-to-end security analysis to generate specifications emphasizing data privacy, secure deployment, and shared responsibility models. Aligned with Australian Privacy Principles, AI Ethics Principles, and guidelines from the Australian Cyber Security Centre and Digital Transformation Agency, SecGenAI mitigates threats such as data leakage, adversarial attacks, and model inversion. The framework's novel approach combines advanced machine learning techniques with robust security measures, ensuring compliance with Australian regulations while enhancing the reliability and trustworthiness of GenAI systems. This research contributes to the field of intelligent systems by providing actionable strategies for secure GenAI implementation in industry, fostering innovation in AI applications, and safeguarding national interests.
Updated: 2024-07-01 09:19:50
标题: SecGenAI:增强澳大利亚国家利益关键技术中基于云的生成人工智能应用的安全性
摘要: Generative AI(GenAI)技术的快速发展为澳大利亚关键技术领域带来了变革性机遇,同时也引入了独特的安全挑战。本文介绍了SecGenAI,一个针对基于云的GenAI应用的综合安全框架,重点关注检索增强生成(RAG)系统。SecGenAI解决了功能、基础设施和治理要求,整合了端到端的安全分析,生成强调数据隐私、安全部署和共同责任模型的规范。与澳大利亚隐私原则、AI伦理原则以及澳大利亚网络安全中心和数字转型机构的指南一致,SecGenAI减轻了数据泄露、对抗性攻击和模型反转等威胁。该框架的新颖方法将先进的机器学习技术与强大的安全措施结合,确保符合澳大利亚法规,同时增强了GenAI系统的可靠性和信任度。这项研究为智能系统领域提供了可行的安全GenAI实施策略,促进了AI应用的创新,并保护了国家利益。
更新时间: 2024-07-01 09:19:50
领域: cs.CR,cs.AI,cs.CY,cs.LG
Uni-Mol2: Exploring Molecular Pretraining Model at Scale
In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as scaling laws. However, scaling laws in molecular pretraining models remain unexplored. In this work, we present Uni-Mol2, an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, characterizing the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement in downstream tasks as the model size grows. The Uni-Mol2 with 1.1B parameters also outperforms existing methods, achieving average improvements of 27% on the QM9 dataset and 14% on the COMPAS-1D dataset.
Updated: 2024-07-01 09:08:44
标题: Uni-Mol2:在规模上探索分子预训练模型
摘要: 近年来,预训练模型在自然语言处理(NLP)、计算机视觉(CV)和生命科学领域取得了显著进展。NLP和CV领域的重大进展主要由模型参数和数据规模的扩展驱动,这一现象如今被称为规模定律。然而,分子预训练模型中的规模定律仍未被探索。在这项工作中,我们提出了Uni-Mol2,这是一种创新的分子预训练模型,利用双轨Transformer有效整合原子级、图级和几何结构级的特征。与此同时,我们系统地研究了分子预训练模型中的规模定律,刻画了验证损失与模型大小、数据集大小和计算资源之间的幂律相关性。最终,我们通过在8亿个构象上进行预训练,成功将Uni-Mol2扩展到11亿参数,使其成为迄今为止最大的分子预训练模型。大量实验表明,随着模型规模的增长,下游任务表现持续改进。具有11亿参数的Uni-Mol2也优于现有方法,在QM9和COMPAS-1D数据集上分别平均提高了27%和14%。
更新时间: 2024-07-01 09:08:44
领域: cs.LG,cs.AI
How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?
Work on instruction-tuned Large Language Models (LLMs) has used automatic methods based on text overlap and LLM judgments as cost-effective alternatives to human evaluation. In this paper, we perform a meta-evaluation of such methods and assess their reliability across a broad range of tasks. We observe that while automatic evaluation methods can approximate human ratings under specific conditions, their validity is highly context-dependent. Specifically, the simple ROUGE-L metric correlates well with human ratings for short-answer English tasks but is unreliable in free-form generation tasks and cross-lingual transfer. The effectiveness of the more advanced method of using GPT-4 as a judge diminishes significantly if reference answers are not included in the prompt, which is the scenario where this method has the potential to provide the most value compared to other metrics. Our findings enhance the understanding of how automatic methods should be applied and interpreted when developing and evaluating instruction-tuned LLMs.
Updated: 2024-07-01 08:59:08
标题: 指令微调LLM的自动评估方法有多可靠?
摘要: 关于指令微调大型语言模型(LLMs)的研究已使用基于文本重叠和LLM判断的自动方法,作为人工评估的低成本替代。在本文中,我们对这些方法进行了元评估,并评估了它们在广泛任务上的可靠性。我们观察到,尽管自动评估方法在特定条件下可以近似人工评分,但其有效性高度依赖于上下文。具体而言,简单的ROUGE-L指标在短答案英语任务中与人工评分相关性良好,但在自由形式生成任务和跨语言迁移中不可靠。如果提示中不包含参考答案,使用GPT-4作为评判的更高级方法的效果会显著下降,而这恰恰是该方法相比其他指标最有潜在价值的场景。我们的发现加深了对在开发和评估指令微调LLMs时应如何应用和解释自动方法的理解。
更新时间: 2024-07-01 08:59:08
领域: cs.CL,cs.AI
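For reference, the ROUGE-L metric discussed in the abstract above is the LCS-based F-measure over tokens. A minimal stdlib sketch follows; the whitespace tokenization is an assumption of this toy — real implementations also normalize and stem.

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence over tokens.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference, beta=1.0):
    # ROUGE-L: F-measure of LCS-based precision and recall.
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
# LCS is "the cat on the mat" (5 tokens), so precision = recall = 5/6
```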
Long-term drought prediction using deep neural networks based on geospatial weather data
The problem of high-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. Yet, it is still unsolved with reasonable accuracy due to data complexity and aridity stochasticity. We tackle drought data by introducing an end-to-end approach that adopts a spatio-temporal neural network model with accessible open monthly climate data as the input. Our systematic research employs diverse proposed models and five distinct environmental regions as a testbed to evaluate the efficacy of the Palmer Drought Severity Index (PDSI) prediction. Key aggregated findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts. At the same time, the Convolutional LSTM excels in longer-term forecasting. Both models achieved high ROC AUC scores: 0.948 for one month ahead and 0.617 for twelve months ahead forecasts, closing the gap to a perfect ROC-AUC by $54\%$ and $16\%$, respectively, compared to classic approaches.
Updated: 2024-07-01 08:58:07
标题: 基于地理空间天气数据的深度神经网络长期干旱预测
摘要: 提前长达一年进行高质量干旱预测的问题对农业规划和保险至关重要。然而,由于数据复杂性和干旱的随机性,这个问题仍未以合理的准确度得到解决。我们通过引入一种端到端方法来处理干旱数据,该方法采用时空神经网络模型,以可获取的公开月度气候数据作为输入。我们的系统性研究采用多种提出的模型和五个不同的环境区域作为测试平台,以评估帕尔默干旱严重程度指数(PDSI)预测的有效性。关键的综合发现是Transformer模型EarthFormer在做出准确的短期(最长六个月)预测方面表现突出,而卷积LSTM则在更长期的预测中表现出色。两个模型都取得了较高的ROC AUC分数:提前一个月预测为0.948,提前十二个月预测为0.617;与经典方法相比,分别将与完美ROC-AUC的差距缩小了54%和16%。
更新时间: 2024-07-01 08:58:07
领域: cs.LG
IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation
Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates drama scripts and makes the plot played by agents more controllable. The director agent writes plot outlines that the user desires to see, instructs the actor agents to role-play their characters, and reschedules the plot when human players participate in the scenario to ensure the plot is progressing towards the objective. To evaluate the framework, we create a novel drama plot that involves several actor agents and check the interactions between them under the instruction of the director agent. Evaluation results show that our framework could generate complete, diverse drama scripts from only a rough outline of plot objectives, meanwhile maintaining the characteristics of characters in the drama. Our codes and prompts are available at https://github.com/OpenDFM/ibsen.
Updated: 2024-07-01 08:49:57
标题: 易卜生:导演-演员代理合作用于可控和互动戏剧剧本生成
摘要: 大型语言模型已经展示了它们在情节创作和人类角色扮演中的能力。当前的语言模型代理主要关注个体层面的合理行为,它们的行为可能难以约束在整个故事情节的层面上。在本文中,我们介绍了IBSEN,一个导演-演员协作代理框架,用于生成戏剧剧本并使角色扮演更具可控性。导演代理编写用户希望看到的剧情概要,指导演员代理扮演他们的角色,并在人类玩家参与情节时重新安排剧情,以确保剧情朝着目标发展。为了评估该框架,我们创建了一个涉及多个演员代理的新颖戏剧情节,并检查了在导演代理的指导下它们之间的互动。评估结果显示,我们的框架可以仅从粗略的剧情目标大纲生成完整多样的戏剧剧本,同时保持戏剧中角色的特征。我们的代码和提示可在https://github.com/OpenDFM/ibsen 上找到。
更新时间: 2024-07-01 08:49:57
领域: cs.CL,cs.AI,cs.MA
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies
The emergence of Kolmogorov-Arnold Networks (KANs) has sparked significant interest and debate within the scientific community. This paper explores the application of KANs in the domain of computer vision (CV). We examine the convolutional version of KANs, considering various nonlinearity options beyond splines, such as Wavelet transforms and a range of polynomials. We propose a parameter-efficient design for Kolmogorov-Arnold convolutional layers and a parameter-efficient finetuning algorithm for pre-trained KAN models, as well as KAN convolutional versions of self-attention and focal modulation layers. We provide empirical evaluations conducted on MNIST, CIFAR10, CIFAR100, Tiny ImageNet, ImageNet1k, and HAM10000 datasets for image classification tasks. Additionally, we explore segmentation tasks, proposing U-Net-like architectures with KAN convolutions, and achieving state-of-the-art results on BUSI, GlaS, and CVC datasets. We summarized all of our findings in a preliminary design guide of KAN convolutional models for computer vision tasks. Furthermore, we investigate regularization techniques for KANs. All experimental code and implementations of convolutional layers and models, pre-trained on ImageNet1k weights, are available on GitHub at https://github.com/IvanDrokin/torch-conv-kan
Updated: 2024-07-01 08:49:33
标题: 科尔莫哥洛夫-阿诺德卷积:设计原则与实证研究
摘要: 科尔莫戈洛夫-阿诺德网络(KANs)的出现引起了科学界的极大兴趣和争论。本文探讨了KANs在计算机视觉(CV)领域的应用。我们研究了KANs的卷积版本,考虑了超越样条的各种非线性选项,如小波变换和一系列多项式。我们提出了一种参数高效的科尔莫戈洛夫-阿诺德卷积层设计,以及一个参数高效的微调算法,用于预训练的KAN模型,以及KAN卷积版本的自注意和焦点调制层。我们在MNIST、CIFAR10、CIFAR100、Tiny ImageNet、ImageNet1k和HAM10000数据集上进行了实证评估,用于图像分类任务。此外,我们探讨了分割任务,并提出了具有KAN卷积的U-Net-like架构,在BUSI、GlaS和CVC数据集上取得了最先进的结果。我们在KAN卷积模型的计算机视觉任务初步设计指南中总结了所有发现。此外,我们还研究了KAN的正则化技术。所有实验代码和卷积层和模型的实现,预训练的ImageNet1k权重可在GitHub上获得,链接为:https://github.com/IvanDrokin/torch-conv-kan
更新时间: 2024-07-01 08:49:33
领域: cs.CV,cs.AI,cs.LG
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Despite the various FCE methods proposed earlier, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on other LLMs with different error distributions or even unseen error types, as these methods may fail to detect the error types generated by other LLMs. To fill this gap, in this paper, we propose the first comprehensive FCE benchmark Face4RAG for RAG independent of the underlying LLM. Our benchmark consists of a synthetic dataset built upon a carefully designed typology for factuality inconsistency error and a real-world dataset constructed from six commonly used LLMs, enabling evaluation of FCE methods on specific error types or real-world error distributions. On the proposed benchmark, we discover the failure of existing FCE methods to detect the logical fallacy, which refers to a mismatch of logic structures between the answer and the retrieved reference. To fix this issue, we further propose a new method called L-Face4RAG with two novel designs of logic-preserving answer decomposition and fact-logic FCE. Extensive experiments show L-Face4RAG substantially outperforms previous methods for factual inconsistency detection on a wide range of tasks, notably beyond the RAG task from which it is originally motivated. Both the benchmark and our proposed method are publicly available at https://huggingface.co/datasets/yq27/Face4RAG
Updated: 2024-07-01 08:35:04
标题: Face4RAG:中文检索增强生成的事实一致性评估
摘要: 常规检索增强生成(RAG)中事实不一致错误的主要问题促使对事实一致性评估(FCE)的研究。尽管早期提出了各种FCE方法,但这些方法是在特定大型语言模型(LLMs)生成的数据集上进行评估的。缺乏全面的基准,尚未探讨这些FCE方法在其他具有不同错误分布甚至未知错误类型的LLMs上的表现如何,因为这些方法可能无法检测其他LLMs生成的错误类型。为填补这一空白,本文提出了第一个独立于基础LLM的全面FCE基准\emph{Face4RAG}。我们的基准由一个基于精心设计的事实不一致性错误类型学的合成数据集和一个由六个常用LLMs构建的真实数据集组成,可以评估FCE方法在特定错误类型或实际错误分布上的表现。在提出的基准上,我们发现现有的FCE方法无法检测逻辑谬误,即答案和检索到的参考之间逻辑结构不匹配的问题。为了解决这个问题,我们进一步提出了一种名为\emph{L-Face4RAG}的新方法,其中包括保留逻辑的答案分解和事实逻辑FCE两个新设计。大量实验表明,L-Face4RAG在广泛的任务上显著优于以前的方法,特别是超出其最初激发的RAG任务。我们提出的基准和方法都是公开可用的。
更新时间: 2024-07-01 08:35:04
领域: cs.CL,cs.AI
On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)
We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we derive an approximation error bound for the score network of latent DiTs, which is sub-linear in the latent space dimension. Additionally, we derive the corresponding sample complexity bound and show that the data distribution generated from the estimated score function converges toward a proximate area of the original one. Computationally, we characterize the hardness of both forward inference and backward computation of latent DiTs, assuming the Strong Exponential Time Hypothesis (SETH). For forward inference, we identify efficient criteria for all possible latent DiTs inference algorithms and showcase our theory by pushing the efficiency toward almost-linear time inference. For backward computation, we leverage the low-rank structure within the gradient computation of DiTs training for possible algorithmic speedup. Specifically, we show that such speedup achieves almost-linear time latent DiTs training by casting the DiTs gradient as a series of chained low-rank approximations with bounded error. Under the low-dimensional assumption, we show that the convergence rate and the computational efficiency are both dominated by the dimension of the subspace, suggesting that latent DiTs have the potential to bypass the challenges associated with the high dimensionality of initial data.
Updated: 2024-07-01 08:34:40
标题: 关于潜在扩散变压器(DiTs)的统计率和可证明高效标准
摘要: 我们研究了在低维线性潜在空间假设下的潜在扩散变换器(DiT)的统计和计算极限。在统计上,我们研究了DiT得分函数的通用逼近和样本复杂度,以及初始数据的分布恢复属性。具体而言,在温和的数据假设下,我们推导了潜在DiT得分网络的逼近误差界,该界在潜在空间维度上是次线性的。此外,我们推导了相应的样本复杂度界限,并展示了从估计的得分函数生成的数据分布向原始数据的近似区域收敛的情况。在计算上,我们表征了在强指数时间假设(SETH)下,潜在DiT的前向推断和反向计算的难度。对于前向推断,我们确定了所有可能的潜在DiT推断算法的高效标准,并通过将效率推向几乎线性时间推断来展示我们的理论。对于反向计算,我们利用DiT训练中梯度计算内的低秩结构,以实现可能的算法加速。具体而言,我们展示了这种加速通过将DiT梯度表示为一系列具有有界误差的链状低秩逼近来实现几乎线性时间的潜在DiT训练。在低维假设下,我们表明收敛速度和计算效率均受到子空间维度的主导,暗示潜在DiT有可能避开与初始数据高维度相关的挑战。
更新时间: 2024-07-01 08:34:40
领域: stat.ML,cs.AI,cs.LG
M-to-N Backdoor Paradigm: A Multi-Trigger and Multi-Target Attack to Deep Learning Models
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where a backdoored model behaves normally with clean inputs but exhibits attacker-specified behaviors upon the inputs containing triggers. Most previous backdoor attacks mainly focus on either the all-to-one or all-to-all paradigm, allowing attackers to manipulate an input to attack a single target class. Besides, the two paradigms rely on a single trigger for backdoor activation, rendering attacks ineffective if the trigger is destroyed. In light of the above, we propose a new $M$-to-$N$ attack paradigm that allows an attacker to manipulate any input to attack $N$ target classes, and each backdoor of the $N$ target classes can be activated by any one of its $M$ triggers. Our attack selects $M$ clean images from each target class as triggers and leverages our proposed poisoned image generation framework to inject the triggers into clean images invisibly. By using triggers with the same distribution as clean training images, the targeted DNN models can generalize to the triggers during training, thereby enhancing the effectiveness of our attack on multiple target classes. Extensive experimental results demonstrate that our new backdoor attack is highly effective in attacking multiple target classes and robust against pre-processing operations and existing defenses.
Updated: 2024-07-01 08:23:31
Domains: cs.CR
Human-like object concept representations emerge naturally in multimodal large language models
The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vast amounts of linguistic and multimodal data. In this study, we combined behavioral and neuroimaging analysis methods to uncover how the object concept representations in LLMs correlate with those of humans. By collecting large-scale datasets of 4.7 million triplet judgments from LLM and Multimodal LLM (MLLM), we were able to derive low-dimensional embeddings that capture the underlying similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were found to be highly stable and predictive, and exhibited semantic clustering akin to human mental representations. Interestingly, the interpretability of the dimensions underlying these embeddings suggests that LLM and MLLM have developed human-like conceptual representations of natural objects. Further analysis demonstrated strong alignment between the identified model embeddings and neural activity patterns in many functionally defined brain ROIs (e.g., EBA, PPA, RSC and FFA). This provides compelling evidence that the object representations in LLMs, while not identical to those in the human, share fundamental commonalities that reflect key schemas of human conceptual knowledge. This study advances our understanding of machine intelligence and informs the development of more human-like artificial cognitive systems.
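The triplet ("odd-one-out") behavioral task behind the 4.7 million judgments can be simulated directly from an embedding space: the most similar pair is kept together and the remaining item is the odd one out. A minimal sketch (in the study the embeddings are learned from the judgments, not given; names here are illustrative):

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def odd_one_out(emb, i, j, k):
    """Simulate one triplet judgment from an embedding space: keep the
    most similar pair together and report the remaining item as the odd
    one out."""
    sims = {(i, j): dot(emb[i], emb[j]),
            (i, k): dot(emb[i], emb[k]),
            (j, k): dot(emb[j], emb[k])}
    closest_pair = max(sims, key=sims.get)
    return ({i, j, k} - set(closest_pair)).pop()
```

Fitting low-dimensional embeddings so that simulated choices match the collected human and model judgments is what yields the interpretable 66-dimensional space described above.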
Updated: 2024-07-01 08:17:19
Domains: cs.AI,cs.CL,cs.CV,cs.HC,cs.LG
Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning
While the animals' fin-to-limb evolution has been well researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of `intelligent design under constraints' - hereafter referred to as constraint-driven design intelligence - in developing modern robotic limbs with superior energy efficiency. We propose a 3D-printable design of robotic limbs parametrically reconfigurable as a classical planar 4-bar linkage, an overconstrained Bennett linkage, and a spherical 4-bar linkage. These limbs adopt a co-axial actuation, identical to modern legged robot platforms, with the added capability of upgrading into a wheel-legged system. We then implemented a large-scale, multi-terrain deep reinforcement learning framework to train these reconfigurable limbs for a comparative analysis of overconstrained locomotion in energy efficiency. Results show that the overconstrained limbs exhibit more efficient locomotion than planar limbs during forward and sideways walking over different terrains, including floors, slopes, and stairs, with or without random noise, saving at least 22% mechanical energy in completing the traverse task, with the spherical limbs being the least efficient. The overconstrained limbs also achieve the highest average speed of 0.85 meters per second on flat terrain, 20% faster than the planar limbs. This study paves the way for an exciting direction for future research in overconstrained robotics leveraging evolutionary morphology and reconfigurable mechanism intelligence combined with state-of-the-art methods in deep reinforcement learning.
Updated: 2024-07-01 07:57:01
Domains: cs.RO,cs.AI
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
Fuzzy reasoning is vital due to the frequent use of imprecise information in daily contexts. However, the ability of current large language models (LLMs) to handle such reasoning remains largely uncharted. In this paper, we introduce a new benchmark, FRoG, for fuzzy reasoning, featuring real-world mathematical word problems that incorporate generalized quantifiers. Our experimental findings reveal that fuzzy reasoning continues to pose significant challenges for LLMs. Moreover, we find that existing methods designed to enhance reasoning do not consistently improve performance in tasks involving fuzzy logic. Additionally, our results show an inverse scaling effect in the performance of LLMs on FRoG. Interestingly, we also demonstrate that strong mathematical reasoning skills are not necessarily indicative of success on our benchmark.
Updated: 2024-07-01 07:56:14
Domains: cs.AI,cs.CL
Augmenting Document-level Relation Extraction with Efficient Multi-Supervision
Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pretraining, which is time-inefficient. To fill the gap of efficient and robust utilization of distantly supervised training data, we propose Efficient Multi-Supervision for document-level relation extraction, in which we first select a subset of informative documents from the massive dataset by combining distant supervision with expert supervision, then train the model with a Multi-Supervision Ranking Loss that integrates knowledge from multiple sources of supervision to alleviate the effects of noise. Experiments demonstrate the effectiveness of our method in improving model performance with higher time efficiency than existing baselines.
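The exact form of the Multi-Supervision Ranking Loss is not given in the abstract; one plausible instantiation is a pairwise hinge loss that ranks candidates endorsed by more supervision sources above those endorsed by fewer. The function below is a hypothetical sketch, not the paper's definition:

```python
def multi_supervision_ranking_loss(scores, votes, margin=1.0):
    """Hinge ranking loss over candidate relation scores: a candidate
    endorsed by more supervision sources (higher vote count, e.g. both
    distant and expert supervision) should outscore one endorsed by
    fewer, by at least `margin`. Hypothetical instantiation."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if votes[i] > votes[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)
```

Aggregating votes from multiple supervision sources is one way a ranking objective can down-weight label noise from any single source.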
Updated: 2024-07-01 07:22:32
Domains: cs.CL,cs.AI
Is one brick enough to break the wall of spoken dialogue state tracking?
In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's requests (\textit{a.k.a.} dialogue state tracking) is key to a smooth interaction. Traditionally, TOD systems perform this update in three steps: transcription of the user's utterance, semantic extraction of the key concepts, and contextualization with the previously identified concepts. Such cascade approaches suffer from cascading errors and separate optimization. End-to-end approaches have been proven helpful up to the turn-level semantic extraction step. This paper goes one step further and provides (1) a novel approach for completely neural spoken DST, (2) an in-depth comparison with a state-of-the-art cascade approach, and (3) avenues towards better context propagation. Our study highlights that jointly optimized approaches are also competitive for contextually dependent tasks, such as Dialogue State Tracking (DST), especially in audio-native settings. Context propagation in DST systems could benefit from training procedures that account for the inherent uncertainty of the previous context.
Updated: 2024-07-01 07:15:33
Domains: cs.CL,cs.AI,eess.AS,eess.SP
KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning
In recent years, Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. Most GNNs typically consist of a sequence of neighborhood aggregation (a.k.a. message passing) layers. Within each of these layers, the representation of each node is updated from an aggregation and transformation of its neighbours' representations at the previous layer. The upper bound for the expressive power of message-passing GNNs was reached through the use of MLPs as the transformation, due to their universal approximation capabilities. However, MLPs suffer from well-known limitations, which recently motivated the introduction of Kolmogorov-Arnold Networks (KANs). KANs rely on the Kolmogorov-Arnold representation theorem, rendering them a promising alternative to MLPs. In this work, we compare the performance of KANs against that of MLPs in graph learning tasks. We perform extensive experiments on node classification, graph classification, and graph regression datasets. Our preliminary results indicate that while KANs are on par with MLPs in classification tasks, they seem to have a clear advantage in graph regression tasks. Code is available at https://github.com/RomanBresson/KAGNN.
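To make the KAN idea concrete: instead of a shared MLP transformation, learnable univariate functions are applied to incoming scalar features and the results are simply summed. The sketch below uses piecewise-linear functions as a simplified stand-in for KANs' B-spline parameterization, with one function per receiving node (a full KAN layer learns one per edge and input dimension); all names are illustrative:

```python
def pw_linear(x, grid, coef):
    """Evaluate a learnable 1-D function parameterized by its values
    `coef` at the knots `grid` (piecewise-linear stand-in for the
    B-spline edge functions of KANs)."""
    if x <= grid[0]:
        return coef[0]
    if x >= grid[-1]:
        return coef[-1]
    for g0, g1, c0, c1 in zip(grid, grid[1:], coef, coef[1:]):
        if g0 <= x <= g1:
            t = (x - g0) / (g1 - g0)  # linear interpolation within the cell
            return (1 - t) * c0 + t * c1

def kan_message_passing(adj, feats, grid, coefs):
    """One KAN-style aggregation step on scalar node features: each
    neighbour feature passes through the receiving node's learnable
    univariate function, then results are summed."""
    return [sum(pw_linear(feats[v], grid, coefs[u]) for v in adj[u])
            for u in range(len(adj))]
```

Training would adjust the `coefs` by gradient descent, exactly where an MLP's weights would otherwise be learned.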
Updated: 2024-07-01 07:13:08
Domains: cs.LG
Evaluating Copyright Takedown Methods for Language Models
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.
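One decoding-time filtering intervention can be sketched as an n-gram blocklist: any candidate next token that would complete an n-gram from the protected corpus is vetoed before sampling. This is a minimal word-level sketch, not CoTaEval's actual intervention (which operates on model outputs during decoding); all names are illustrative:

```python
def build_blocklist(protected_texts, n=5):
    """Collect all word-level n-grams of the protected corpus."""
    grams = set()
    for text in protected_texts:
        toks = text.split()
        for i in range(len(toks) - n + 1):
            grams.add(tuple(toks[i:i + n]))
    return grams

def filter_step(prefix, candidates, blocklist, n=5):
    """Decoding-time intervention: drop candidate next tokens that would
    complete a blocked n-gram together with the current prefix."""
    tail = tuple(prefix[-(n - 1):])
    return [c for c in candidates
            if len(prefix) < n - 1 or tail + (c,) not in blocklist]
```

Such filters illustrate the trade-off the paper measures: blocking recitation of protected spans can also block embargoed-but-uncopyrightable facts phrased the same way.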
Updated: 2024-07-01 07:12:45
Domains: cs.CL,cs.LG
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. Surprisingly, the isolated regions we find are sparse, comprising about $3\%$ at the parameter level and $2.5\%$ at the rank level. Removing these regions compromises safety without significantly impacting utility, corroborating the inherent brittleness of the model's safety mechanisms. Moreover, we show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted. These findings underscore the urgent need for more robust safety strategies in LLMs.
Updated: 2024-07-01 07:11:17
Domains: cs.LG,cs.AI,cs.CL
Benchmarking Mental State Representations in Language Models
While numerous works have assessed the generative performance of language models (LMs) on tasks requiring Theory of Mind reasoning, research into the models' internal representation of mental states remains limited. Recent work has used probing to demonstrate that LMs can represent beliefs of themselves and others. However, these claims are accompanied by limited evaluation, making it difficult to assess how mental state representations are affected by model design and training choices. We report an extensive benchmark with various LM types with different model sizes, fine-tuning approaches, and prompt designs to study the robustness of mental state representations and memorisation issues within the probes. Our results show that the quality of models' internal representations of the beliefs of others increases with model size and, more crucially, with fine-tuning. We are the first to study how prompt variations impact probing performance on theory of mind tasks. We demonstrate that models' representations are sensitive to prompt variations, even when such variations should be beneficial. Finally, we complement previous activation editing experiments on Theory of Mind tasks and show that it is possible to improve models' reasoning performance by steering their activations without the need to train any probe.
Updated: 2024-07-01 06:48:34
Domains: cs.CL,cs.AI
σ-GPTs: A New Approach to Autoregressive Models
Autoregressive models, such as the GPT family, use a fixed order, usually left-to-right, to generate sequences. However, this is not a necessity. In this paper, we challenge this assumption and show that by simply adding a positional encoding for the output, this order can be modulated on-the-fly per-sample which offers key advantageous properties. It allows for the sampling of and conditioning on arbitrary subsets of tokens, and it also allows sampling in one shot multiple tokens dynamically according to a rejection strategy, leading to a sub-linear number of model evaluations. We evaluate our method across various domains, including language modeling, path-solving, and aircraft vertical rate prediction, decreasing the number of steps required for generation by an order of magnitude.
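The mechanism can be illustrated by how per-sample training steps are built: besides encoding where each consumed token sits, every step also encodes which position is to be produced next, so the generation order becomes a per-sample choice. A schematic at the token level, ignoring the model itself (names and the `<bos>` sentinel are illustrative):

```python
def sigma_inputs(tokens, order):
    """Build (input token, input position, target position, target token)
    tuples for one permuted generation order, mimicking the idea of
    adding a positional encoding for the output so that the order can be
    modulated per sample. `order` is a permutation of range(len(tokens))."""
    steps = []
    prev_tok, prev_pos = "<bos>", -1  # sequence-start sentinel
    for pos in order:
        steps.append((prev_tok, prev_pos, pos, tokens[pos]))
        prev_tok, prev_pos = tokens[pos], pos
    return steps
```

Conditioning each step on an arbitrary target position is also what permits conditioning on arbitrary token subsets and sampling several positions at once under a rejection strategy.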
Updated: 2024-07-01 06:46:36
Domains: cs.LG,cs.AI
Flood Prediction Using Classical and Quantum Machine Learning Models
This study investigates the potential of quantum machine learning to improve flood forecasting. We focus on daily flood events along Germany's Wupper River in 2023. Our approach combines classical machine learning techniques with quantum machine learning (QML) techniques; this hybrid model leverages quantum properties like superposition and entanglement to achieve better accuracy and efficiency. Classical and QML models are compared based on training time, accuracy, and scalability. Results show that QML models offer competitive training times and improved prediction accuracy. This research signifies a step towards utilizing quantum technologies for climate change adaptation. We emphasize collaboration and continuous innovation to implement this model in real-world flood management, ultimately enhancing global resilience against floods.
Updated: 2024-07-01 06:31:41
Domains: cs.LG,cs.AI,physics.geo-ph,quant-ph
Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration
Large Language Models (LLMs) are gaining popularity in the field of robotics. However, LLM-based robots are limited to simple, repetitive motions due to the poor integration between language models, robots, and the environment. This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot. The system also employs a YOLO-based perception algorithm, providing visual cues to the LLM, which aids in planning feasible motions within the specific environment. Additionally, an HRC method is proposed by combining teleoperation and Dynamic Movement Primitives (DMP), allowing the LLM-based robot to learn from human guidance. Real-world experiments have been conducted using the Toyota Human Support Robot for manipulation tasks. The outcomes indicate that tasks requiring complex trajectory planning and reasoning over environments can be efficiently accomplished through the incorporation of human demonstrations.
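A Dynamic Movement Primitive encodes a demonstrated motion as a stable spring-damper system plus a learned forcing term, which is what lets the robot reproduce and generalize human-guided trajectories. A 1-D Euler-integration sketch (the gains, canonical-decay constant, and step counts are illustrative choices, not the paper's values):

```python
def dmp_rollout(x0, goal, forcing, tau=1.0, k=100.0, d=20.0, dt=0.001, steps=2000):
    """Integrate a 1-D discrete Dynamic Movement Primitive:
        tau*v' = k*(g - x) - d*v + (g - x0)*f(s)
        tau*x' = v
        tau*s' = -a*s        (canonical system)
    With zero forcing the critically damped spring pulls x to the goal;
    a learned forcing term f shapes the path in between."""
    a = 4.0
    x, v, s = x0, 0.0, 1.0
    for _ in range(steps):
        f = forcing(s)
        v += dt * (k * (goal - x) - d * v + (goal - x0) * f) / tau
        x += dt * v / tau
        s += dt * (-a * s) / tau
    return x
```

In practice the forcing term is fit from a teleoperated demonstration (e.g. by regression over basis functions of the canonical phase `s`), so the primitive reproduces the demonstrated shape while still converging to the goal.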
Updated: 2024-07-01 06:11:31
Domains: cs.RO,cs.AI,cs.HC
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations on task evaluation. (2) Specific instructions within a single application are inadequate for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.
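One plausible reading of a checkpoint-style metric, sketched below, is the fraction of required checkpoints that appear in the agent's action trace in order (matched as a subsequence), rewarding partially correct plans instead of all-or-nothing task success. This is a hypothetical reconstruction, not Mobile-Bench's exact definition:

```python
def checkpoint_score(actions, checkpoints):
    """Fraction of required checkpoints matched, in order, as a
    subsequence of the agent's action trace (hypothetical instantiation
    of a CheckPoint-style process metric)."""
    i = 0
    for act in actions:
        if i < len(checkpoints) and act == checkpoints[i]:
            i += 1
    return i / len(checkpoints)
```

A process metric of this shape distinguishes an agent that reaches two of three essential steps from one that reaches none, which a binary success metric cannot.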
Updated: 2024-07-01 06:10:01
Domains: cs.AI,cs.CL
Individual brain parcellation: Review of methods, validations and applications
Individual brains vary greatly in morphology, connectivity and organization. The applicability of group-level parcellations is limited by the rapid development of precision medicine today because they do not take into account the variation of parcels at the individual level. Accurate mapping of brain functional regions at the individual level is pivotal for a comprehensive understanding of the variations in brain function and behaviors, early and precise identification of brain abnormalities, as well as personalized treatments for neuropsychiatric disorders. With the development of neuroimaging and machine learning techniques, studies on individual brain parcellation are booming. In this paper, we offer an overview of recent advances in the methodologies of individual brain parcellation, including optimization- and learning-based methods. Comprehensive evaluation metrics to validate individual brain mapping have been introduced. We also review the studies of how individual brain mapping promotes neuroscience research and clinical medicine. Finally, we summarize the major challenges and important future directions of individualized brain parcellation. Collectively, we intend to offer a thorough overview of individual brain parcellation methods, validations, and applications, along with highlighting the current challenges that call for an urgent demand for integrated platforms that integrate datasets, methods, and validations.
Updated: 2024-07-01 05:48:05
Domains: q-bio.NC,cs.AI
RouteLLM: Learning to Route LLMs with Preference Data
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely-recognized benchmarks shows that our approach significantly reduces costs-by over 2 times in certain cases-without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
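At inference time, a router of this kind reduces to a thresholded decision on a learned preference predictor: send the query to the weak model whenever the predicted probability that its answer would be preferred is high enough, otherwise pay for the strong model. A minimal sketch (the `win_prob` predictor and costs stand in for the learned router and real API prices):

```python
def route(queries, win_prob, threshold=0.5, strong_cost=10.0, weak_cost=1.0):
    """Threshold routing sketch: `win_prob(q)` stands in for the learned
    predictor of the probability that the weak model's answer to q is
    preferred; raising `threshold` trades cost for response quality."""
    choices, cost = [], 0.0
    for q in queries:
        if win_prob(q) >= threshold:
            choices.append("weak")
            cost += weak_cost
        else:
            choices.append("strong")
            cost += strong_cost
    return choices, cost
```

Sweeping the threshold traces out the cost-quality curve on which the reported 2x-plus savings sit.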
Updated: 2024-07-01 05:38:08
Domains: cs.LG,cs.AI,cs.CL
Acceleration method for generating perception failure scenarios based on editing Markov process
With the rapid advancement of autonomous driving technology, self-driving cars have become a central focus in the development of future transportation systems. Scenario generation technology has emerged as a crucial tool for testing and verifying the safety performance of autonomous driving systems. Current research in scenario generation primarily focuses on open roads such as highways, with relatively limited studies on underground parking garages. The unique structural constraints, insufficient lighting, and high-density obstacles in underground parking garages impose greater demands on the perception systems, which are critical to autonomous driving technology. This study proposes an accelerated generation method for perception failure scenarios tailored to the underground parking garage environment, aimed at testing and improving the safety performance of autonomous vehicle (AV) perception algorithms in such settings. The method presented in this paper generates an intelligent testing environment with a high density of perception failure scenarios by learning the interactions between background vehicles (BVs) and autonomous vehicles (AVs) within perception failure scenarios. Furthermore, this method edits the Markov process within the perception failure scenario data to increase the density of critical information in the training data, thereby optimizing the learning and generation of perception failure scenarios. A simulation environment for an underground parking garage was developed using the Carla and Vissim platforms, with Bevfusion employed as the perception algorithm for testing. The study demonstrates that this method can generate an intelligent testing environment with a high density of perception failure scenarios and enhance the safety performance of perception algorithms within this experimental setup.
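The Markov-process editing step can be illustrated in its simplest form: upweight the probability of transitions into critical (perception-failure-relevant) states in the transition matrix, then renormalize each row so it remains a distribution. This is a minimal stand-in for the paper's editing procedure, with all names and the uniform `boost` factor assumed:

```python
def upweight_transitions(P, critical, boost=2.0):
    """Edit a Markov transition matrix to densify critical information:
    multiply the probability of every transition into a critical state
    by `boost`, then renormalize each row back to a distribution."""
    edited = []
    for row in P:
        new = [p * boost if j in critical else p for j, p in enumerate(row)]
        z = sum(new)
        edited.append([p / z for p in new])
    return edited
```

Sampling training scenarios from the edited chain raises the density of critical states seen by the learner without changing which transitions are possible.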
Updated: 2024-07-01 05:33:48
Domains: cs.AI,cs.RO
A Differentiable Approach to Multi-scale Brain Modeling
We present a multi-scale differentiable brain modeling workflow utilizing BrainPy, a unique differentiable brain simulator that combines accurate brain simulation with powerful gradient-based optimization. We leverage this capability of BrainPy across different brain scales. At the single-neuron level, we implement differentiable neuron models and employ gradient methods to optimize their fit to electrophysiological data. On the network level, we incorporate connectomic data to construct biologically constrained network models. Finally, to replicate animal behavior, we train these models on cognitive tasks using gradient-based learning rules. Experiments demonstrate that our approach achieves superior performance and speed in fitting generalized leaky integrate-and-fire and Hodgkin-Huxley single neuron models. Additionally, training a biologically-informed network of excitatory and inhibitory spiking neurons on working memory tasks successfully replicates observed neural activity and synaptic weight distributions. Overall, our differentiable multi-scale simulation approach offers a promising tool to bridge neuroscience data across electrophysiological, anatomical, and behavioral scales.
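At the single-neuron level, gradient-based fitting can be illustrated with the simplest possible case: recovering a membrane time constant from a passive voltage decay by descending the analytic gradient of the squared error. BrainPy would obtain this gradient by automatic differentiation over full neuron models; the toy model and constants below are illustrative:

```python
import math

def fit_tau(times, volts, v0, tau=5.0, lr=2.0, iters=3000):
    """Fit a membrane time constant by gradient descent: model the
    passive response v(t) = v0*exp(-t/tau) and follow the analytic
    gradient of the squared error to the recorded voltages (a minimal
    stand-in for differentiable neuron-model fitting)."""
    for _ in range(iters):
        grad = 0.0
        for t, v_obs in zip(times, volts):
            v_hat = v0 * math.exp(-t / tau)
            # d v_hat / d tau = v_hat * t / tau^2
            grad += 2.0 * (v_hat - v_obs) * v_hat * (t / tau ** 2)
        tau -= lr * grad
    return tau
```

The same descent, applied through a simulator's autodiff instead of a hand-derived gradient, scales to generalized leaky integrate-and-fire and Hodgkin-Huxley parameters.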
Updated: 2024-07-01 05:31:37
Domains: cs.NE,cs.AI,cs.CE,q-bio.NC
Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach
Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including healthcare data security and freshness issues, affecting the output quality of MLLMs. In this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLMs framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share fresh data, mitigating information asymmetry in data sharing. Finally, we utilize a generative diffusion model-based reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.
Updated: 2024-07-01 05:28:40
领域: cs.AI,cs.LG
Optimizing PM2.5 Forecasting Accuracy with Hybrid Meta-Heuristic and Machine Learning Models
Timely alerts about hazardous air pollutants are crucial for public health. However, existing forecasting models often overlook key factors like baseline parameters and missing data, limiting their accuracy. This study introduces a hybrid approach to address these issues, focusing on forecasting hourly PM2.5 concentrations using Support Vector Regression (SVR). The meta-heuristic algorithms Grey Wolf Optimization (GWO) and Particle Swarm Optimization (PSO) optimize the SVR hyper-parameters "C" and "Gamma" to enhance prediction accuracy. Evaluation metrics include R-squared (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Results show significant improvements with PSO-SVR (R2: 0.9401, RMSE: 0.2390, MAE: 0.1368) and GWO-SVR (R2: 0.9408, RMSE: 0.2376, MAE: 0.1373), indicating robust and accurate models suitable for similar research applications.
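The tuning loop described above can be sketched as follows. To keep the sketch dependency-free it fits an RBF kernel ridge regressor as a close stand-in for SVR, and the synthetic data, swarm settings, and search ranges are illustrative assumptions rather than the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hourly-style regression data (a stand-in for real PM2.5 features).
X = rng.uniform(-3, 3, size=(120, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(120)
Xtr, ytr, Xva, yva = X[:80], y[:80], X[80:], y[80:]

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def val_rmse(log_c, log_g):
    """Validation RMSE of an RBF kernel ridge fit (SVR stand-in); 1/C is the ridge."""
    C, gamma = 10.0 ** log_c, 10.0 ** log_g
    K = rbf(Xtr, Xtr, gamma) + (1.0 / C) * np.eye(len(Xtr))
    alpha = np.linalg.solve(K, ytr)
    pred = rbf(Xva, Xtr, gamma) @ alpha
    return float(np.sqrt(np.mean((pred - yva) ** 2)))

# Plain PSO over (log10 C, log10 gamma); swarm settings are illustrative.
n_particles, n_iters, w, c1, c2 = 12, 25, 0.7, 1.5, 1.5
lo, hi = np.array([-1.0, -2.0]), np.array([3.0, 1.0])
pos = rng.uniform(lo, hi, size=(n_particles, 2))
pos[0] = [0.0, 0.0]                    # keep C=1, gamma=1 as a baseline particle
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([val_rmse(*p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([val_rmse(*p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

baseline_rmse = val_rmse(0.0, 0.0)     # untuned C=1, gamma=1
best_rmse = float(pbest_f.min())
print(f"baseline RMSE={baseline_rmse:.3f}  PSO-tuned RMSE={best_rmse:.3f}")
```

Because the untuned (C=1, Gamma=1) point is seeded as one particle, the swarm's best validation RMSE can only match or improve on it; GWO would slot into the same loop with a different update rule.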
Updated: 2024-07-01 05:24:19
Domain: cs.NE,cs.AI,cs.LG
Deep learning for automated detection of breast cancer in deep ultraviolet fluorescence images with diffusion probabilistic model
Data limitation is a significant challenge in applying deep learning to medical images. Recently, the diffusion probabilistic model (DPM) has shown the potential to generate high-quality images by converting Gaussian random noise into realistic images. In this paper, we apply the DPM to augment the deep ultraviolet fluorescence (DUV) image dataset with the aim of improving breast cancer classification for intraoperative margin assessment. For classification, we divide the whole surface DUV image into small patches and extract convolutional features for each patch by utilizing the pre-trained ResNet. Then, we feed them into an XGBoost classifier for patch-level decisions and then fuse them with a regional importance map computed by Grad-CAM++ for whole surface-level prediction. Our experimental results show that augmenting the training dataset with the DPM significantly improves breast cancer detection performance in DUV images, increasing accuracy from 93% to 97%, compared to using affine transformations and ProGAN.
Updated: 2024-07-01 05:00:26
Domain: cs.CV,cs.AI
ASCENT: Amplifying Power Side-Channel Resilience via Learning & Monte-Carlo Tree Search
Power side-channel (PSC) analysis is pivotal for securing cryptographic hardware. Prior art focused on securing gate-level netlists obtained as-is from chip design automation, neglecting all the complexities and potential side-effects for security arising from the design automation process. That is, automation traditionally prioritizes power, performance, and area (PPA), sidelining security. We propose a "security-first" approach, refining the logic synthesis stage to enhance the overall resilience of PSC countermeasures. We introduce ASCENT, a learning-and-search-based framework that (i) drastically reduces the time for post-design PSC evaluation and (ii) explores the security-vs-PPA design space. Thus, ASCENT enables an efficient exploration of a large number of candidate netlists, leading to an improvement in PSC resilience compared to regular PPA-optimized netlists. ASCENT is up to 120x faster than traditional PSC analysis and yields a 3.11x improvement in the PSC resilience of state-of-the-art countermeasures.
Updated: 2024-07-01 04:52:56
Domain: cs.CR,cs.LG
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving
The autonomous driving industry is increasingly adopting end-to-end learning from sensory inputs to minimize human biases in system design. Traditional end-to-end driving models, however, suffer from long-tail events due to rare or unseen inputs within their training distributions. To address this, we propose TOKEN, a novel Multi-Modal Large Language Model (MM-LLM) that tokenizes the world into object-level knowledge, enabling better utilization of LLM's reasoning capabilities to enhance autonomous vehicle planning in long-tail scenarios. TOKEN effectively alleviates data scarcity and inefficient tokenization by leveraging a traditional end-to-end driving model to produce condensed and semantically enriched representations of the scene, which are optimized for LLM planning compatibility through deliberate representation and reasoning alignment training stages. Our results demonstrate that TOKEN excels in grounding, reasoning, and planning capabilities, outperforming existing frameworks with a 27% reduction in trajectory L2 error and a 39% decrease in collision rates in long-tail scenarios. Additionally, our work highlights the importance of representation alignment and structured reasoning in sparking the common-sense reasoning capabilities of MM-LLMs for effective planning.
Updated: 2024-07-01 04:34:50
Domain: cs.AI,cs.RO
Universal Approximation Theory: The basic theory for large language models
Language models have emerged as a critical area of focus in artificial intelligence, particularly with the introduction of groundbreaking innovations like ChatGPT. Large-scale Transformer networks have quickly become the leading approach for advancing natural language processing algorithms. Built on the Transformer architecture, these models enable interactions that closely mimic human communication and, equipped with extensive knowledge, can even assist in guiding human tasks. Despite their impressive capabilities and growing complexity, a key question remains: the theoretical foundations of large language models (LLMs). What makes the Transformer so effective for powering intelligent language applications, such as translation and coding? What underlies LLMs' ability for In-Context Learning (ICL)? How does the LoRA scheme enhance the fine-tuning of LLMs? And what supports the practicality of pruning LLMs? To address these critical questions and explore the technological strategies within LLMs, we leverage the Universal Approximation Theory (UAT) to offer a theoretical backdrop, shedding light on the mechanisms that underpin these advancements.
Updated: 2024-07-01 04:29:35
Domain: cs.AI,cs.CL,cs.LG
Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy
Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multi-device artificial intelligence system. Particularly, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average of all class pairs in feature space, it measures the minimum distance over all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, and thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element in the existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme.
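To make the contrast between the average and minimum objectives concrete, here is a toy computation on hypothetical feature clusters; centroid distance is used as a simplified proxy for the paper's discriminant gain, and the class layout is invented for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Hypothetical 4-class feature vectors (e.g., pooled features of received signals).
feats = {c: rng.normal(loc=3.0 * c, scale=1.0, size=(50, 8)) for c in range(4)}
centroids = {c: f.mean(axis=0) for c, f in feats.items()}

pair_dists = {
    (a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
    for a, b in combinations(sorted(centroids), 2)
}
avg_gain = sum(pair_dists.values()) / len(pair_dists)   # average-style objective
min_gain = min(pair_dists.values())                     # minimum pair-wise objective

worst_pair = min(pair_dists, key=pair_dists.get)
print(f"avg={avg_gain:.2f}  min={min_gain:.2f}  worst pair={worst_pair}")
```

Maximizing `avg_gain` can leave the worst-separated pair (here an adjacent class pair) poorly separated, whereas maximizing `min_gain` directly pushes that pair apart, which is the balancing effect the abstract describes.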
Updated: 2024-07-01 04:17:32
Domain: cs.IT,cs.AI,eess.SP,math.IT
The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs
We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because neither its action space nor its strategies inherently involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral in blackjack, comparing outcomes against expected distributions in fair play to determine if LLMs develop strategies favoring the "house." Our findings reveal that the LLMs exhibit significant deviations from fair play when given implicit randomness instructions, suggesting a tendency towards strategic manipulation in ambiguous scenarios. However, when presented with an explicit choice, the LLMs largely adhere to fair play, indicating that the framing of instructions plays a crucial role in eliciting or mitigating potentially deceptive behaviors in AI systems.
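The comparison against expected fair-play distributions can be sketched as a simple binomial z-test; the house-edge value and the win tallies below are hypothetical placeholders, not results from the paper.

```python
import math

def fair_play_zscore(house_wins, n_hands, p_house=0.51):
    """Two-sided z-score of an observed house-win count vs a fair-play win rate.

    p_house is a placeholder for the expected house edge under fair rules;
    the true value depends on the exact blackjack variant being dealt.
    """
    mean = n_hands * p_house
    sd = math.sqrt(n_hands * p_house * (1 - p_house))
    return (house_wins - mean) / sd

# Hypothetical tallies from two prompting conditions.
z_implicit = fair_play_zscore(house_wins=590, n_hands=1000)  # implicit randomness
z_explicit = fair_play_zscore(house_wins=517, n_hands=1000)  # explicit choice

print(f"implicit: z={z_implicit:.2f}  explicit: z={z_explicit:.2f}")
```

A large |z| in the implicit condition flags a house-favoring deviation from fair play, while a small |z| in the explicit condition is consistent with the fair-dealing behavior the abstract reports.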
Updated: 2024-07-01 04:07:49
Domain: cs.CL,cs.AI,cs.LG
FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting
Time Series Forecasting plays a crucial role in various fields such as industrial equipment maintenance, meteorology, energy consumption, traffic flow and financial investment. However, despite their considerable advantages over traditional statistical approaches, current deep learning-based predictive models often exhibit a significant deviation between their forecasting outcomes and the ground truth. This discrepancy is largely due to an insufficient emphasis on extracting the sequence's latent information, particularly its global information within the frequency domain and the relationship between different variables. To address this issue, we propose a novel model, Frequency-domain Attention In Two Horizons (FAITH), which decomposes time series into trend and seasonal components using a multi-scale sequence adaptive decomposition and fusion architecture, and processes them separately. FAITH utilizes a Frequency Channel feature Extraction Module and a Frequency Temporal feature Extraction Module to capture inter-channel relationships and temporal global information in the sequence, significantly improving its ability to handle long-term dependencies and complex patterns. Furthermore, FAITH achieves theoretically linear complexity by modifying the time-frequency domain transformation method, effectively reducing computational costs. Extensive experiments on 6 benchmarks for long-term forecasting and 3 benchmarks for short-term forecasting demonstrate that FAITH outperforms existing models in many fields, such as electricity, weather and traffic, proving its effectiveness and superiority both in long-term and short-term time series forecasting tasks. Our codes and data are available at https://github.com/LRQ577/FAITH.
Updated: 2024-07-01 04:01:11
Domain: cs.LG,cs.AI
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions
This paper introduces the task of product demand clarification within an e-commercial scenario, where the user commences the conversation with ambiguous queries and the task-oriented agent is designed to achieve more accurate and tailored product searching by asking clarification questions. To address this task, we propose ProductAgent, a conversational information seeking agent equipped with abilities of strategic clarification question generation and dynamic product retrieval. Specifically, we develop the agent with strategies for product feature summarization, query generation, and product retrieval. Furthermore, we propose the benchmark called PROCLARE to evaluate the agent's performance both automatically and qualitatively with the aid of an LLM-driven user simulator. Experiments show that ProductAgent interacts positively with the user and enhances retrieval performance with increasing dialogue turns, where user demands become gradually more explicit and detailed. All the source codes will be released after the review anonymity period.
Updated: 2024-07-01 03:50:23
Domain: cs.IR,cs.AI,cs.CL
Time-Frequency Jointed Imperceptible Adversarial Attack to Brainprint Recognition with Deep Learning Models
EEG-based brainprint recognition with deep learning models has garnered much attention in biometric identification. Yet, studies have indicated vulnerability to adversarial attacks in deep learning models with EEG inputs. In this paper, we introduce a novel adversarial attack method that jointly attacks time-domain and frequency-domain EEG signals by employing wavelet transform. Different from most existing methods which only target time-domain EEG signals, our method not only takes advantage of the time-domain attack's potent adversarial strength but also benefits from the imperceptibility inherent in frequency-domain attack, achieving a better balance between attack performance and imperceptibility. Extensive experiments are conducted in both white- and grey-box scenarios and the results demonstrate that our attack method achieves state-of-the-art attack performance on three datasets and three deep-learning models. In the meanwhile, the perturbations in the signals attacked by our method are barely perceptible to the human visual system.
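The joint time/frequency perturbation idea can be sketched with a one-level Haar transform, a minimal stand-in for the paper's wavelet machinery; the sign-based step and the noise magnitudes here are purely illustrative, since a real attack would follow the gradient of a target model rather than the coefficient signs.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar wavelet transform."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (low-frequency)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (high-frequency)
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(2)
eeg = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * rng.standard_normal(256)

# Joint perturbation: a small sign-based step on the wavelet detail coefficients
# (frequency side) plus tiny additive noise in the time domain. Magnitudes are
# illustrative, chosen only to keep the perturbation visually imperceptible.
a, d = haar_dwt(eeg)
d_adv = d + 0.02 * np.sign(d)
adv = haar_idwt(a, d_adv) + 0.01 * rng.standard_normal(256)

perturb = np.abs(adv - eeg)
print(f"max perturbation={perturb.max():.3f}  signal range={np.ptp(eeg):.3f}")
```

Because the Haar transform is invertible, perturbing in the wavelet domain and reconstructing yields a time-domain signal whose deviation stays small relative to the signal range, which is the imperceptibility lever the abstract describes.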
Updated: 2024-07-01 03:37:51
Domain: cs.CR
Large Language Model Enhanced Knowledge Representation Learning: A Survey
The integration of Large Language Models (LLMs) with Knowledge Representation Learning (KRL) signifies a pivotal advancement in the field of artificial intelligence, enhancing the ability to capture and utilize complex knowledge structures. This synergy leverages the advanced linguistic and contextual understanding capabilities of LLMs to improve the accuracy, adaptability, and efficacy of KRL, thereby expanding its applications and potential. Despite the increasing volume of research focused on embedding LLMs within the domain of knowledge representation, a thorough review that examines the fundamental components and processes of these enhanced models is conspicuously absent. Our survey addresses this by categorizing these models based on three distinct Transformer architectures, and by analyzing experimental data from various KRL downstream tasks to evaluate the strengths and weaknesses of each approach. Finally, we identify and explore potential future research directions in this emerging yet underexplored domain, proposing pathways for continued progress.
Updated: 2024-07-01 03:37:35
Domain: cs.CL,cs.AI
Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias
With the swift advancement of deep learning, state-of-the-art algorithms have been utilized in various social situations. Nonetheless, some algorithms have been discovered to exhibit biases and provide unequal results. The current debiasing methods face challenges such as poor utilization of data or intricate training requirements. In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training. Considering the strong adjustability of backdoor triggers, we are motivated to mitigate the model bias by carefully designing reverse artificial bias created from backdoor attack. Based on this, we propose a backdoor debiasing framework based on knowledge distillation, which effectively reduces the model bias from original data and minimizes security risks from the backdoor attack. The proposed solution is validated on both image and structured datasets, showing promising results. This work advances the understanding of backdoor attacks and highlights its potential for beneficial applications. The code for the study can be found at https://anonymous.4open.science/r/DwB-BC07/.
Updated: 2024-07-01 03:33:55
Domain: cs.LG,cs.AI,cs.CY
Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs
Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks due to the difficulty in designing comprehensive and precise reward functions. This inherent difficulty curtails the broader application of RL within game environments characterized by diverse constraints. Preference-based reinforcement learning (PbRL) presents a pioneering framework that capitalizes on human preferences as pivotal reward signals, thereby circumventing the need for meticulous reward engineering. However, obtaining preference data from human experts is costly and inefficient, especially under conditions marked by complex constraints. To tackle this challenge, we propose an LLM-enabled automatic preference generation framework named LLM4PG, which harnesses the capabilities of large language models (LLMs) to abstract trajectories, rank preferences, and reconstruct reward functions to optimize conditioned policies. Experiments on tasks with complex language constraints demonstrated the effectiveness of our LLM-enabled reward functions, accelerating RL convergence and overcoming stagnation caused by slow or absent progress under original reward structures. This approach mitigates the reliance on specialized human knowledge and demonstrates the potential of LLMs to enhance RL's effectiveness in complex environments in the wild.
Updated: 2024-07-01 03:32:48
Domain: cs.AI
Multi-State TD Target for Model-Free Reinforcement Learning
Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy optimization (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code is provided on GitHub.
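One plausible instantiation of a target built from multiple subsequent states is to average the k-step bootstrapped targets; this sketch illustrates that idea on toy numbers and is not necessarily the exact target the paper defines.

```python
def multi_state_td_target(rewards, values, gamma=0.99):
    """Average of k-step bootstrapped TD targets for k = 1..N.

    rewards[i] is the reward received after step i; values[i] is the critic's
    estimate of the state reached after i+1 steps. Averaging the k-step targets
    is one plausible multi-state formulation, used here for illustration.
    """
    targets = []
    g = 0.0
    for k, (r, v) in enumerate(zip(rewards, values), start=1):
        g += gamma ** (k - 1) * r          # discounted reward prefix
        targets.append(g + gamma ** k * v)  # k-step TD target
    return sum(targets) / len(targets)

rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.5, 0.0]          # V(s_{t+1}), V(s_{t+2}), V(s_{t+3})
single = rewards[0] + 0.99 * values[0]            # classic one-step TD target
multi = multi_state_td_target(rewards, values)
print(f"one-step={single:.3f}  multi-state={multi:.3f}")
```

With k=1 the formula collapses to the classic one-step target, so the multi-state version can be dropped into an actor-critic update wherever the one-step target was used.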
Updated: 2024-07-01 03:21:38
Domain: cs.LG,cs.AI,68T05(Primary)
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
To mitigate the potential misuse of large language models (LLMs), recent research has developed watermarking algorithms, which restrict the generation process to leave an invisible trace for watermark detection. Due to the two-stage nature of the task, most studies evaluate the generation and detection separately, thereby presenting a challenge in unbiased, thorough, and applicable evaluations. In this paper, we introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors: (1) For benchmarking procedure, to ensure an apples-to-apples comparison, we first adjust each watermarking method's hyper-parameter to reach the same watermarking strength, then jointly evaluate their generation and detection performance. (2) For task selection, we diversify the input and output length to form a five-category taxonomy, covering 9 tasks. (3) For evaluation metric, we adopt the GPT4-Judge for automatically evaluating the decline of instruction-following abilities after watermarking. We evaluate 4 open-source watermarks on 2 LLMs under 2 watermarking strengths and observe the common struggles for current methods on maintaining the generation quality. The code and data are available at https://github.com/THU-KEG/WaterBench.
Updated: 2024-07-01 03:17:42
Domain: cs.CL,cs.AI
Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach
Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-resolution (VHR) SAI. However, the lack of SGG datasets with large-size VHR SAI has constrained the advancement of SGG in SAI. Due to the complexity of large-size VHR SAI, mining triplets <subject, relationship, object> in large-size VHR SAI heavily relies on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size VHR SAI. To address the scarcity of datasets, this paper constructs a large-scale dataset for SGG in large-size VHR SAI with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels, named RSG, encompassing over 210,000 objects and more than 400,000 triplets. To realize SGG in large-size VHR SAI, we propose a context-aware cascade cognition (CAC) framework to understand SAI at three levels: object detection (OBD), pair pruning and relationship prediction. As a fundamental prerequisite for SGG in large-size SAI, a holistic multi-class object detection network (HOD-Net) that can flexibly integrate multi-scale contexts is proposed. With the consideration that there exist a huge amount of object pairs in large-size SAI but only a minority of object pairs contain meaningful relationships, we design a pair proposal generation (PPG) network via adversarial reconstruction to select high-value pairs. Furthermore, a relationship prediction network with context-aware messaging (RPCM) is proposed to predict the relationship types of these pairs.
Updated: 2024-07-01 03:10:43
Domain: cs.CV,cs.AI
Fast Unsupervised Deep Outlier Model Selection with Hypernetworks
Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
Updated: 2024-07-01 03:10:34
Domain: cs.LG,cs.AI
ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization
(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in promoting developers to understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors, and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code are equipped with encoders capable of producing general context vectors and have achieved substantial improvements on code summarization. However, although they are usually trained mainly on code-focused tasks and can capture general code features, they still fall short in capturing specific features that need to be summarized. This paper proposes a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks to enhance its ability to learn code-summary alignment, including unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets would help learn the code-summary alignment. Additionally, we introduce the domain-specific task AWP to enhance the ability of the encoder to learn the alignment between action words and code snippets. The extensive experiments on four datasets demonstrate that our approach, called ESALE, significantly outperforms baselines in all three widely used metrics: BLEU, METEOR, and ROUGE-L.
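The summary-side masking behind the MLM and AWP tasks can be sketched with plain string operations; the mask token, the masking ratio, and the leading-verb heuristic for action words are illustrative assumptions, not the paper's exact preprocessing.

```python
import random

random.seed(0)
MASK = "<mask>"

def mask_summary(summary, mask_ratio=0.3):
    """Build an MLM-style (masked summary, labels) pair; masking is applied to
    summary words rather than code tokens, as in the summary-focused MLM task."""
    tokens = summary.split()
    labels = [None] * len(tokens)
    for i in range(len(tokens)):
        if random.random() < mask_ratio:
            labels[i] = tokens[i]
            tokens[i] = MASK
    return " ".join(tokens), labels

def action_word(summary):
    """Toy AWP label: take the leading verb, since summaries in this style
    conventionally open with an action word such as 'returns' or 'sorts'."""
    return summary.split()[0].lower()

s = "Returns the index of the first matching element in the list"
masked, labels = mask_summary(s)
print(action_word(s), "|", masked)
```

During training, the encoder would see the code snippet plus `masked` and be asked to recover the labels, tying summary words back to the code they describe.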
Updated: 2024-07-01 03:06:51
Fields: cs.SE,cs.AI,68-04,D.2.3; I.2.7
Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis
Website Fingerprinting (WF) attacks identify the websites visited by users by performing traffic analysis, compromising user privacy. Particularly, DL-based WF attacks demonstrate impressive attack performance. However, the effectiveness of DL-based WF attacks relies on the collected complete and pure traffic during the page loading, which impacts the practicality of these attacks. The WF performance is rather low under dynamic network conditions and various WF defenses, particularly when the analyzed traffic is only a small part of the complete traffic. In this paper, we propose Holmes, a robust and reliable early-stage WF attack. Holmes utilizes temporal and spatial distribution analysis of website traffic to effectively identify websites in the early stages of page loading. Specifically, Holmes develops adaptive data augmentation based on the temporal distribution of website traffic and utilizes a supervised contrastive learning method to extract the correlations between the early-stage traffic and the pre-collected complete traffic. Holmes accurately identifies traffic in the early stages of page loading by computing the correlation of the traffic with the spatial distribution information, which ensures robust and reliable detection according to early-stage traffic. We extensively evaluate Holmes using six datasets. Compared to nine existing DL-based WF attacks, Holmes improves the F1-score of identifying early-stage traffic by an average of 169.18%. Furthermore, we replay the traffic of visiting real-world dark web websites. Holmes successfully identifies dark web websites when the ratio of page loading on average is only 21.71%, with an average precision improvement of 169.36% over the existing WF attacks.
Updated: 2024-07-01 02:51:26
Fields: cs.CR,cs.AI,cs.LG
Improved Monte Carlo tree search (MCTS) formulation with multiple root nodes for discrete sizing optimization of truss structures
This paper proposes a new method for discrete optimum design of truss structures utilizing Monte Carlo tree search (MCTS) with an update process, best-reward backpropagation, an accelerating technique, and a terminal condition. An improved MCTS formulation with multiple root nodes is developed in this study. The update process means that once a final solution is found, it is used as the initial solution for the next search tree. The best reward is used in the backpropagation step. The accelerating technique decreases the width of the search tree and reduces the maximum number of iterations. The agent is trained to minimize the total structural weight under various constraints until the terminal condition is satisfied. The optimal solution is then the minimum over all solutions found by the search trees. Numerical examples show that the agent can find the optimal solution with low computational cost, stably produces an optimal design, and is suitable for practical engineering problems.
Updated: 2024-07-01 02:44:38
Fields: cs.AI,cs.NA,math.NA
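The update process and the multiple-root-node structure can be illustrated on a toy discrete sizing problem. The search below is a random local search standing in for a full MCTS tree, and the four-member "truss" with simple capacity constraints is invented for illustration; the point is only how each search is seeded with the previous search's solution and the final answer is the minimum over all searches.

```python
import random

random.seed(0)

# Toy discrete sizing problem (illustrative, not a real truss model):
# pick one candidate cross-section area per member, minimising total
# weight subject to a simple per-member capacity constraint.
SECTIONS = [1.0, 2.0, 3.0, 4.0]   # candidate areas
LOADS = [2.5, 1.2, 3.1, 0.8]      # required capacity per member

def weight(design):
    return sum(design)

def feasible(design):
    return all(a >= load for a, load in zip(design, LOADS))

def search_tree(initial, iters=200):
    """Stand-in for one search tree: random single-member perturbations,
    keeping only feasible, lighter designs."""
    best = list(initial)
    for _ in range(iters):
        cand = list(best)
        i = random.randrange(len(cand))
        cand[i] = random.choice(SECTIONS)
        if feasible(cand) and weight(cand) < weight(best):
            best = cand
    return best

# Update process with multiple root nodes: each search starts from the
# previous search's final solution; the answer is the best over all trees.
design = [max(SECTIONS)] * len(LOADS)   # safe but heavy initial design
solutions = []
for tree in range(3):
    design = search_tree(design)
    solutions.append(design)

optimal = min(solutions, key=weight)
print(optimal, weight(optimal))
```

For this toy instance the lightest feasible design picks, per member, the smallest section at or above its load (total weight 10.0), which the seeded restarts converge to quickly.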
SecureSpectra: Safeguarding Digital Identity from Deep Fake Threats via Intelligent Signatures
Advancements in DeepFake (DF) audio models pose a significant threat to voice authentication systems, leading to unauthorized access and the spread of misinformation. We introduce a defense mechanism, SecureSpectra, addressing DF threats by embedding orthogonal, irreversible signatures within audio. SecureSpectra leverages the inability of DF models to replicate high-frequency content, which we empirically identify across diverse datasets and DF models. Integrating differential privacy into the pipeline protects signatures from reverse engineering and strikes a delicate balance between enhanced security and minimal performance compromises. Our evaluations on Mozilla Common Voice, LibriSpeech, and VoxCeleb datasets showcase SecureSpectra's superior performance, outperforming recent works by up to 71% in detection accuracy. We open-source SecureSpectra to benefit the research community.
Updated: 2024-07-01 02:36:27
Fields: cs.CR,cs.LG,cs.SD,eess.AS
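The core signal SecureSpectra relies on, that DF models fail to reproduce high-frequency content, can be sketched numerically. Here a single high-frequency sinusoid stands in for the paper's orthogonal, irreversible signatures, a moving-average low-pass filter stands in for a DF model's smoothing, and the sample rate, cutoff, and amplitudes are all assumed values.

```python
import numpy as np

SR = 16_000        # sample rate (assumed)
CUTOFF_HZ = 6_000  # "high-frequency" band DF models reproduce poorly (assumed)

def embed_signature(audio, key_freq=7_000, amplitude=0.01):
    """Toy signature: add a weak high-frequency component. The real system
    uses orthogonal, irreversible signatures; one sinusoid is only a sketch."""
    t = np.arange(len(audio)) / SR
    return audio + amplitude * np.sin(2 * np.pi * key_freq * t)

def high_band_energy(audio):
    # Total spectral magnitude above the cutoff -- the detector's cue.
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / SR)
    return spectrum[freqs >= CUTOFF_HZ].sum()

rng = np.random.default_rng(1)
clean = 0.1 * rng.normal(size=SR)  # 1 s of noise as a stand-in for speech
# Crude "DF model": a moving average that smooths away high frequencies.
df_output = np.convolve(clean, np.ones(32) / 32, mode="same")
signed = embed_signature(df_output)

# The signed audio carries more high-band energy than the unsigned
# low-passed version, which is what verification keys on.
print(high_band_energy(df_output), high_band_energy(signed))
```

A real pipeline would also include the differential-privacy protection of the signature described in the abstract, which this sketch omits.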
FineSurE: Fine-grained Summarization Evaluation using LLMs
Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can only assign one hallucination score at the summary level, while at the sentence level, we can count sentences containing hallucinations. To remedy those limitations, we propose FineSurE, a fine-grained evaluator specifically tailored for the summarization task using large language models (LLMs). It also employs completeness and conciseness criteria, in addition to faithfulness, enabling multi-dimensional assessment. We compare various open-source and proprietary LLMs as backbones for FineSurE. In addition, we conduct extensive benchmarking of FineSurE against SOTA methods including NLI-, QA-, and LLM-based methods, showing improved performance especially on the completeness and conciseness dimensions. The code is available at https://github.com/DISL-Lab/FineSurE-ACL24.
Updated: 2024-07-01 02:20:28
Fields: cs.CL,cs.AI
A Survey on Deep Clustering: From the Prior Perspective
Facilitated by the powerful feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional and complex real-world data. The performance of deep clustering methods is affected by various factors such as network structures and learning objectives. However, as pointed out in this survey, the essence of deep clustering lies in the incorporation and utilization of prior knowledge, which is largely ignored by existing works. From pioneering deep clustering methods based on data structure assumptions to recent contrastive clustering methods based on data augmentation invariances, the development of deep clustering intrinsically corresponds to the evolution of prior knowledge. In this survey, we provide a comprehensive review of deep clustering methods by categorizing them into six types of prior knowledge. We find that in general the prior innovation follows two trends, namely, i) from mining to constructing, and ii) from internal to external. Besides, we provide a benchmark on five widely-used datasets and analyze the performance of methods with diverse priors. By providing a novel prior knowledge perspective, we hope this survey could provide some novel insights and inspire future research in the deep clustering community.
Updated: 2024-07-01 02:10:16
Fields: cs.CV,cs.LG
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Motivated by the in-context learning (ICL) capabilities of Large Language Models (LLMs), multimodal LLMs with an additional visual modality also exhibit similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively little work has investigated the principles behind how and why multimodal ICL works. We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. Through perturbations over different modality information, we show that modalities matter differently across tasks in multimodal ICL. Considering such modality impact, we further utilize modality-driven demonstration strategies to boost ICL performance. We also identify that demonstration selection is closely related to the models' ability to capture task inductive biases from multimodal ICL. Our principled analysis provides a comprehensive way of understanding the role of demonstrations in multimodal in-context learning, and sheds light on effectively improving multimodal ICL on a wide range of tasks, even when those tasks are not seen in, or even contradict, the pretraining data.
Updated: 2024-07-01 01:57:21
Fields: cs.CV,cs.AI,cs.CL,cs.LG
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula
Mathematical problem solving is an important skill for Large Language Models (LLMs), both as an important capability and a proxy for a range of reasoning abilities. Existing benchmarks probe a diverse set of skills, but they yield aggregate accuracy metrics, obscuring specific abilities or weaknesses. Furthermore, they are difficult to extend with new problems, risking data contamination over time. To address these challenges, we propose MathCAMPS: a method to synthesize high-quality mathematical problems at scale, grounded on 44 fine-grained "standards" from the Mathematics Common Core (CC) Standard for K-8 grades. We encode each standard in a formal grammar, allowing us to sample diverse symbolic problems and their answers. We then use LLMs to realize the symbolic problems into word problems. We propose a cycle-consistency method for validating problem faithfulness. Finally, we derive follow-up questions from symbolic structures and convert them into follow-up word problems - a novel task of mathematical dialogue that probes for robustness in understanding. Experiments on 23 LLMs show surprising failures even in the strongest models (in particular when asked simple follow-up questions). Moreover, we evaluate training checkpoints of Pythia 12B on MathCAMPS, allowing us to analyze when particular mathematical skills develop during its training. Our framework enables the community to reproduce and extend our pipeline for a fraction of the typical cost of building new high-quality datasets.
Updated: 2024-07-01 01:56:28
Fields: cs.AI,cs.CL
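The symbolic-to-word-problem pipeline can be sketched for a single imagined standard (two-digit addition). A tiny sampler stands in for the formal grammar, a fixed template stands in for the LLM realization step, and the sticker scenario, numeric ranges, and follow-up structure are all invented for illustration.

```python
import random

random.seed(3)

def sample_symbolic_problem():
    """Stand-in for sampling from a formal grammar encoding one
    fine-grained CC standard: here, two-digit addition."""
    a, b = random.randint(10, 99), random.randint(10, 99)
    return {"op": "+", "args": (a, b), "answer": a + b}

def realize_word_problem(sym):
    # The paper uses an LLM for realization; a fixed template stands in.
    a, b = sym["args"]
    return f"Maya has {a} stickers and buys {b} more. How many stickers does she have now?"

def follow_up(sym, extra):
    # Follow-up questions are derived from the same symbolic structure,
    # so their ground-truth answers stay checkable.
    question = f"She then gives away {extra} stickers. How many are left?"
    return question, sym["answer"] - extra

sym = sample_symbolic_problem()
print(realize_word_problem(sym))
q, ans = follow_up(sym, extra=5)
print(q, ans)
```

Because the answer is computed from the symbolic form, the generated word problem and its follow-up come with verifiable ground truth, which is what makes the cycle-consistency validation in the paper possible.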
Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions
The AI-enabled autoencoder has demonstrated great potential for channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, which has so far made it impractical to deploy. To address this issue, this paper proposes a channel modeling aided data augmentation method based on a limited number of field channel data. Specifically, the user equipment (UE) extracts the primary stochastic parameters of the field channel data and transmits them to the base station (BS). The BS then updates the typical TR 38.901 model parameters with the extracted parameters. In this way, the updated channel model is used to generate the dataset. This strategy comprehensively considers dataset collection, model generalization, model monitoring, and so on. Simulations verify that our proposed strategy can significantly improve performance compared to the benchmarks.
Updated: 2024-07-01 01:37:30
Fields: eess.SP,cs.AI
Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine
Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making ultra-large libraries unfeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecules with similar biological activities. We used best practices to design a fast retrieval system based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed performance and retrieval quality of novel molecules.
Updated: 2024-07-01 01:33:10
Fields: q-bio.QM,cs.AI,cs.IR
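The retrieval pattern the abstract alludes to, scoring a query against an entire library in one vectorized pass, can be sketched with NumPy (whose vectorized kernels compile down to the same SIMD-style batch operations). Fingerprint length, library size, and the dot-product score are assumptions; the real system's learned embeddings and optimized kernels are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy library: each molecule is a 256-bit binary fingerprint.
LIBRARY = rng.integers(0, 2, size=(20_000, 256)).astype(np.float32)

def top_k(query, k=5):
    """Screen the whole library with one vectorized matrix-vector product,
    then select the k best hits with a partial sort (O(n) selection)."""
    scores = LIBRARY @ query                  # one pass over the library
    idx = np.argpartition(-scores, k)[:k]     # unordered top-k candidates
    return idx[np.argsort(-scores[idx])]      # order the k hits by score

query = LIBRARY[42]                           # query with a known library member
hits = top_k(query)
print(hits)                                   # the exact match should rank first
```

Scaling this pattern to a 40B-compound library is exactly where processor-optimized SIMD instructions and careful memory layout matter; the sketch only shows the batch-scoring shape of the computation.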
Mechanistic Interpretation through Contextual Decomposition in Transformers
Transformers exhibit impressive capabilities but are often regarded as black boxes due to challenges in understanding the complex nonlinear relationships between features. Interpreting machine learning models is of paramount importance to mitigate risks, and mechanistic interpretability is in particular of current interest as it opens up a window for guiding manual modifications and reverse-engineering solutions. In this work, we introduce contextual decomposition for transformers (CD-T), extending a prior work on CD for RNNs and CNNs, to address mechanistic interpretation computationally efficiently. CD-T is a flexible interpretation method for transformers. It can capture contributions of combinations of input features or source internal components (e.g. attention heads, feed-forward networks) to (1) final predictions or (2) the output of any target internal component. Using CD-T, we propose a novel algorithm for circuit discovery. On a real-world pathology report classification task: we show CD-T distills a more faithful circuit of attention heads with improved computational efficiency (speed up 2x) than a prior benchmark, path patching. As a versatile interpretation method, CD-T also exhibits exceptional capabilities for local interpretations. CD-T is shown to reliably find words and phrases of contrasting sentiment/topic on SST-2 and AGNews datasets. Through human experiments, we demonstrate CD-T enables users to identify the more accurate of two models and to better trust a model's outputs compared to alternative interpretation methods such as SHAP and LIME.
Updated: 2024-07-01 01:12:20
Fields: cs.AI,cs.CL,cs.LG
A Two-Layer Blockchain Sharding Protocol Leveraging Safety and Liveness for Enhanced Performance
Sharding is essential for improving blockchain scalability. Existing protocols overlook diverse adversarial attacks, limiting transaction throughput. This paper presents Reticulum, a groundbreaking sharding protocol addressing this issue, boosting blockchain scalability. Reticulum employs a two-phase approach, adapting transaction throughput based on runtime adversarial attacks. It comprises "control" and "process" shards in two layers. Process shards contain at least one trustworthy node, while control shards have a majority of trusted nodes. In the first phase, transactions are written to blocks and voted on by nodes in process shards. Unanimously accepted blocks are confirmed. In the second phase, blocks without unanimous acceptance are voted on by control shards. Blocks are accepted if the majority votes in favor, eliminating first-phase opponents and silent voters. Reticulum uses unanimous voting in the first phase, involving fewer nodes, enabling more parallel process shards. Control shards finalize decisions and resolve disputes. Experiments confirm Reticulum's innovative design, providing high transaction throughput and robustness against various network attacks, outperforming existing sharding protocols for blockchain networks.
Updated: 2024-07-01 00:57:56
Fields: cs.CR,cs.DC
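The two-phase acceptance rule can be sketched directly: unanimous voting in the small process shard, with disputed blocks escalated to a majority vote in the control shard. Shard membership, adversary identification, and silent-voter handling are omitted; this is only the decision logic as described in the abstract.

```python
def process_shard_decision(votes):
    """Phase 1: a block is confirmed only if every node in the process
    shard accepts it (unanimous voting over few nodes)."""
    return all(votes)

def control_shard_decision(votes):
    """Phase 2: disputed blocks are settled by majority vote in the
    control shard, where a majority of nodes is assumed trustworthy."""
    return sum(votes) > len(votes) / 2

def finalize(process_votes, control_votes):
    # Simplified two-phase flow: phase 1 confirms unanimous blocks;
    # phase 2 resolves the rest.
    if process_shard_decision(process_votes):
        return "confirmed in phase 1"
    if control_shard_decision(control_votes):
        return "confirmed in phase 2"
    return "rejected"

print(finalize([True, True, True], []))                    # unanimous
print(finalize([True, False, True], [True, True, False]))  # escalated, accepted
print(finalize([True, False, True], [False, False, True])) # escalated, rejected
```

Requiring unanimity only within small process shards is what lets many of them run in parallel, while the control shards carry the heavier trust assumption and finalize disputes.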
The Future of QKD Networks
With the recent advancements in quantum technologies, the QKD market exploded. World players are scrambling to win the race towards global QKD networks, even before the rules and policies required by such large endeavors have been discussed. Several vendors are on the market, each with specific parameters and advantages (in terms of key rate, link range, KMS software, etc.), hence considerable effort is now being made towards standardization. While quantum communications is expected to reach a market size of up to $36B by 2040, the largest QKD initiative to date is EuroQCI, which, due to its sheer scale, is forcing the market to mature. Although building a QKD network is believed to be trivial today, inter-connecting federated networks on a global scale is a heavy challenge. We propose QKD virtual networks not only as a useful infrastructure abstraction for increased flexibility and granular security, but as an inevitable solution to several problems that future QKD networks will encounter on the way towards widespread adoption.
Updated: 2024-07-01 00:56:13
Fields: cs.NI,cs.CR,quant-ph
Decentralized PKI Framework for Data Integrity in Spatial Crowdsourcing Drone Services
In the domain of spatial crowdsourcing drone services, which includes tasks like delivery, surveillance, and data collection, secure communication is paramount. The Public Key Infrastructure (PKI) ensures this by providing a system for digital certificates that authenticate the identities of entities involved, securing data and command transmissions between drones and their operators. However, the centralized trust model of traditional PKI, dependent on Certificate Authorities (CAs), presents a vulnerability due to its single point of failure, risking security breaches. To counteract this, the paper presents D2XChain, a blockchain-based PKI framework designed for the Internet of Drone Things (IoDT). By decentralizing the CA infrastructure, D2XChain eliminates this single point of failure, thereby enhancing the security and reliability of drone communications. Fully compatible with the X.509 standard, it integrates seamlessly with existing PKI systems, supporting all key operations such as certificate registration, validation, verification, and revocation in a distributed manner. This innovative approach not only strengthens the defense of drone services against various security threats but also showcases its practical application through deployment on a private Ethereum testbed, representing a significant advancement in addressing the unique security challenges of drone-based services and ensuring their trustworthy operation in critical tasks.
Updated: 2024-07-01 00:55:07
Fields: cs.CR
Privacy-First Crowdsourcing: Blockchain and Local Differential Privacy in Crowdsourced Drone Services
We introduce a privacy-preserving framework for integrating consumer-grade drones into bushfire management. This system creates a marketplace where bushfire management authorities obtain essential data from drone operators. Key features include local differential privacy to protect data providers and a blockchain-based solution ensuring fair data exchanges and accountability. The framework is validated through a proof-of-concept implementation, demonstrating its scalability and potential for various large-scale data collection scenarios. This approach addresses privacy concerns and compliance with regulations like Australia's Privacy Act 1988, offering a practical solution for enhancing bushfire detection and management through crowdsourced drone services.
Updated: 2024-07-01 00:46:25
Fields: cs.CR,cs.DC
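Randomized response is the canonical local-differential-privacy mechanism and serves as a plausible stand-in for the LDP step applied before drone data leaves its provider (the abstract does not specify the exact mechanism). Each drone perturbs its own bit locally, and the aggregator debiases the noisy reports to recover a population-level estimate; epsilon and the smoke-detection scenario are assumptions.

```python
import math
import random

random.seed(11)

def randomized_response(bit, epsilon=1.0):
    """Each data provider flips its own bit before reporting: truthful
    with probability e^eps / (e^eps + 1), flipped otherwise."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if random.random() < p_truth else 1 - bit

def debiased_mean(reports, epsilon=1.0):
    """Aggregator-side correction: invert the known noise rate to get an
    unbiased estimate of the true proportion of 1s."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Suppose 70% of 10,000 drones detect smoke in their sector.
true_bits = [1] * 7000 + [0] * 3000
reports = [randomized_response(b) for b in true_bits]
estimate = debiased_mean(reports)
print(estimate)  # close to the true 0.7 rate
```

No individual report reveals a drone's true bit with certainty, yet the aggregate remains useful, which is the trade-off the framework's privacy layer is built on.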
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
We find that language models have difficulty generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits malicious output from an aligned language model. Specifically, we query the model to generate a fallacious yet deceptively real procedure for the harmful behavior. Since a fallacious procedure is generally considered fake and thus harmless by LLMs, it helps bypass the safeguard mechanism. Yet the output is factually harmful, since the LLM cannot fabricate fallacious solutions but proposes truthful ones. We evaluate our approach on five safety-aligned large language models, comparing against four previous jailbreak methods, and show that our approach achieves competitive performance with more harmful outputs. We believe the findings could be extended beyond model safety, such as to self-verification and hallucination.
Updated: 2024-07-01 00:23:43
Fields: cs.CL,cs.AI