    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        

Articles: 33

Last Updated: 2024-04-04 23:50:18 (+00:00)

A Block-Coordinate Descent EMO Algorithm: Theoretical and Empirical Analysis

We consider whether conditions exist under which block-coordinate descent is asymptotically efficient in evolutionary multi-objective optimization, addressing an open problem. Block-coordinate descent, in which an optimization problem is decomposed into $k$ blocks of decision variables and each block is optimized in sequence with the others held fixed, is used in some large-scale optimization problems such as airline scheduling; its use in multi-objective optimization, however, is less studied. We propose a block-coordinate version of GSEMO and compare its running time to that of the standard GSEMO algorithm. Theoretical and empirical results on a bi-objective test function, a variant of LOTZ, demonstrate the existence of cases where block-coordinate descent is faster. The result may yield wider insights into this class of algorithms.
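
A minimal sketch of the block-coordinate idea on a bitstring with the classic LOTZ objectives may help fix intuition; the round-robin block schedule, the in-block mutation rate, and all function names below are illustrative assumptions, not the paper's exact algorithm:

```python
import random

def lotz(x):
    """Bi-objective LOTZ: (number of leading ones, number of trailing zeros)."""
    n = len(x)
    lo = next((i for i, b in enumerate(x) if b == 0), n)
    tz = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return (lo, tz)

def dominates(f, g):
    """Strict Pareto dominance for maximization."""
    return all(a >= b for a, b in zip(f, g)) and f != g

def bc_gsemo(n=20, k=4, steps=20000):
    """GSEMO with mutation confined to one of k blocks, cycled round-robin.
    Setting k=1 recovers the standard GSEMO bit-flip mutation."""
    block = n // k
    pop = {tuple([0] * n): lotz([0] * n)}        # archive of non-dominated solutions
    for t in range(steps):
        parent = list(random.choice(list(pop)))
        child, lo = parent[:], (t % k) * block   # active block for this step
        for i in range(lo, lo + block):
            if random.random() < 1.0 / block:    # flip each bit in the block w.p. 1/block
                child[i] ^= 1
        fc = lotz(child)
        if any(dominates(f, fc) for f in pop.values()):
            continue                             # child is dominated: discard it
        pop = {x: f for x, f in pop.items() if not dominates(fc, f)}
        pop[tuple(child)] = fc                   # keep child, drop what it dominates
    return pop

print(sorted(bc_gsemo().values()))               # Pareto front of LOTZ: points (i, n - i)
```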

Updated: 2024-04-04 23:50:18

Subjects: cs.NE,cs.AI

Download: http://arxiv.org/abs/2404.03838v1

PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

Recent advancements in 3D perception systems have significantly improved their ability to perform visual recognition tasks such as segmentation. However, these systems still heavily rely on explicit human instruction to identify target objects or categories, lacking the capability to actively reason and comprehend implicit user intentions. We introduce a novel segmentation task known as reasoning part segmentation for 3D objects, aiming to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object. To facilitate evaluation and benchmarking, we present a large 3D dataset comprising over 60k instructions paired with corresponding ground-truth part segmentation annotations specifically curated for reasoning-based 3D part segmentation. We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations corresponding to 3D object segmentation requests. Experiments show that our method achieves performance competitive with models that use explicit queries, with the additional abilities to identify part concepts, reason about them, and complement them with world knowledge. Our source code, dataset, and trained models are available at https://github.com/AmrinKareem/PARIS3D.

Updated: 2024-04-04 23:38:45

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.03836v1

Learning Collective Behaviors from Observation

We present a comprehensive examination of learning methodologies employed for the structural identification of dynamical systems. These techniques are designed to elucidate emergent phenomena within intricate systems of interacting agents. Our approach not only ensures theoretical convergence guarantees but also exhibits computational efficiency when handling high-dimensional observational data. The methods adeptly reconstruct both first- and second-order dynamical systems, accommodating observation and stochastic noise, intricate interaction rules, absent interaction features, and real-world observations in agent systems. The foundational aspect of our learning methodologies resides in the formulation of tailored loss functions using the variational inverse problem approach, inherently equipping our methods with dimension reduction capabilities.

Updated: 2024-04-04 23:30:37

Subjects: cs.LG,cs.MA,math.DS

Download: http://arxiv.org/abs/2311.00875v3

An ExplainableFair Framework for Prediction of Substance Use Disorder Treatment Completion

Fairness of machine learning models in healthcare has drawn increasing attention from clinicians, researchers, and even the highest levels of government. On the other hand, the importance of developing and deploying interpretable or explainable models has been demonstrated, and is essential to increasing the trustworthiness and likelihood of adoption of these models. The objective of this study was to develop and implement a framework for addressing both of these issues: fairness and explainability. We propose an explainable fairness framework, first developing a model with optimized performance, and then using an in-processing approach to mitigate model biases relative to the sensitive attributes of race and sex. We then explore and visualize explanations of the model changes that lead to the fairness enhancement by examining the changes in the importance of features. Our resulting fairness-enhanced models retain high sensitivity with improved fairness, along with explanations of the fairness enhancement that may provide helpful insights for healthcare providers to guide clinical decision-making and resource allocation.

Updated: 2024-04-04 23:30:01

Subjects: cs.LG,cs.CY

Download: http://arxiv.org/abs/2404.03833v1

BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

We introduce the \textbf{B}i-Directional \textbf{S}parse \textbf{Hop}field Network (\textbf{BiSHop}), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house stacks of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it as a robust solution for deep tabular learning.

Updated: 2024-04-04 23:13:32

Subjects: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2404.03830v1

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathtt{OutEffHop}$) and use it to address the outlier-induced challenge of quantizing gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism ($\text{Softmax}_1$): it is an approximation of the memory retrieval process of $\mathtt{OutEffHop}$. Methodologically, this allows us to debut novel outlier-efficient Hopfield layers, a powerful attention alternative with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of the standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the proposed model's efficacy across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT and STanHop-Net), benchmarking against state-of-the-art methods including $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathtt{OutEffHop}$ achieves on average $\sim$22+\% reductions in both average kurtosis and maximum infinity norm of model outputs across 4 models.
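
In this reading, $\text{Softmax}_1(x)_i = e^{x_i}/(1 + \sum_j e^{x_j})$: an ordinary softmax with an extra unit in the denominator, so a head can assign near-zero total attention instead of being forced to spend probability mass on some token. A small numerically stable sketch of that function (our own, not taken from the paper's code):

```python
import numpy as np

def softmax_1(x, axis=-1):
    """Softmax with an extra unit in the denominator: exp(x_i) / (1 + sum_j exp(x_j)).
    Shifted by max(x, 0) for stability; the implicit '1' is exp(0)."""
    m = np.maximum(np.max(x, axis=axis, keepdims=True), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))

scores = np.array([-4.0, -5.0, -6.0])   # weak evidence for every token
print(softmax_1(scores).sum())          # well below 1: the head can abstain, curbing outliers
```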

Updated: 2024-04-04 23:08:43

Subjects: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2404.03828v1

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ which transforms the Hopfield energy function into a kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $\Phi$ serves as a novel similarity measure. It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_\Phi$ that separates the local minima of kernelized energy by separating stored memory patterns in kernel space. Methodologically, the $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: \textbf{(Stage~I.)} minimizing separation loss for a more uniform memory (local minimum) distribution, followed by \textbf{(Stage~II.)} standard Hopfield energy minimization for memory retrieval. This results in a significant reduction of possible meta-stable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, with real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and SOTA similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks.
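
A toy rendering of the two stages, assuming $\Phi$ is a small network, kernel-space points are normalized, and the separation loss simply maximizes the mean pairwise distance between kernelized memories; the paper's exact $\mathcal{L}_\Phi$ and retrieval dynamics are richer than this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, M = 16, 32
memories = torch.randn(M, d)                                  # stored patterns
phi = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)

for _ in range(200):                                          # Stage I: spread memories in kernel space
    z = F.normalize(phi(memories), dim=-1)
    loss = -torch.cdist(z, z).sum() / (M * (M - 1))           # maximize mean pairwise distance
    opt.zero_grad(); loss.backward(); opt.step()

def retrieve(query, beta=8.0):                                # Stage II: softmax-style Hopfield retrieval
    with torch.no_grad():
        sims = F.normalize(phi(query), dim=-1) @ F.normalize(phi(memories), dim=-1).T
        return torch.softmax(beta * sims, dim=-1) @ memories  # one step pulls the query toward nearby memories

print(retrieve(memories[:1] + 0.1 * torch.randn(1, d)).shape)  # torch.Size([1, 16])
```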

Updated: 2024-04-04 23:05:30

Subjects: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2404.03827v1

Benchmarking Machine Learning Models for Quantum Error Correction

Quantum Error Correction (QEC) is one of the fundamental problems in quantum computer systems, which aims to detect and correct errors in the data qubits within quantum computers. Due to the presence of unreliable data qubits in existing quantum computers, implementing quantum error correction is a critical step when establishing a stable quantum computer system. Recently, machine learning (ML)-based approaches have been proposed to address this challenge. However, they lack a thorough understanding of quantum error correction. To bridge this research gap, we provide a new perspective to understand machine learning-based QEC in this paper. We find that syndromes in the ancilla qubits result from errors on connected data qubits, and distant ancilla qubits can provide auxiliary information to rule out some incorrect predictions for the data qubits. Therefore, to detect errors in data qubits, we must consider the information present in the long-range ancilla qubits. To the best of our knowledge, this long-range dependency structure of QEC is little explored in machine learning. To fill this gap, we curate a machine learning benchmark to assess the capacity to capture long-range dependencies for quantum error correction. To provide a comprehensive evaluation, we evaluate seven state-of-the-art deep learning algorithms spanning diverse neural network architectures, such as convolutional neural networks, graph neural networks, and graph transformers. Our exhaustive experiments reveal an enlightening trend: by enlarging the receptive field to exploit information from distant ancilla qubits, the accuracy of QEC significantly improves. For instance, a U-Net can improve over a CNN by a margin of about 50%. Finally, we provide a comprehensive analysis that could inspire future research in this field.

Updated: 2024-04-04 22:54:08

Subjects: quant-ph,cs.LG

Download: http://arxiv.org/abs/2311.11167v3

An Investigation into Misuse of Java Security APIs by Large Language Models

The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding software security, yet effectively integrating security APIs presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks for 5 widely used security APIs. We employ both automated and manual approaches to effectively detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code.
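
Automated misuse detection of this kind can be approximated by pattern rules over generated code. A toy scanner, sketched in Python for brevity, with a few well-known JCA red flags (the rule set and messages are our illustration, not the paper's detectors):

```python
import re

# Illustrative misuse patterns for Java crypto APIs (not the paper's rule set).
RULES = {
    r'Cipher\.getInstance\("(AES|DES)"\)': "cipher without explicit mode defaults to ECB",
    r'"DES(/|")': "DES is obsolete; prefer AES/GCM",
    r'"SHA1PRNG"': "hard-coded PRNG algorithm; prefer the platform-default SecureRandom",
    r'new\s+SecureRandom\(\s*".*"\.getBytes': "static seed makes the output predictable",
}

def scan(java_source):
    """Return the messages of all rules that fire on the given Java source string."""
    return [msg for pat, msg in RULES.items() if re.search(pat, java_source)]

snippet = 'Cipher c = Cipher.getInstance("AES");'
print(scan(snippet))   # ['cipher without explicit mode defaults to ECB']
```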

Updated: 2024-04-04 22:52:41

Subjects: cs.CR,cs.CL,cs.CY

Download: http://arxiv.org/abs/2404.03823v1

Multi-word Tokenization for Sequence Compression

Large Language Models have proven highly successful at modelling a variety of tasks. However, this comes at a steep computational cost that hinders wider industrial uptake. In this paper, we present MWT: a Multi-Word Tokenizer that goes beyond word boundaries by representing frequent multi-word expressions as single tokens. MWTs produce a more compact and efficient tokenization that yields two benefits: (1) Increase in performance due to a greater coverage of input data given a fixed sequence length budget; (2) Faster and lighter inference due to the ability to reduce the sequence length with negligible drops in performance. Our results show that MWT is more robust across shorter sequence lengths, thus allowing for major speedups via early sequence truncation.
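
A minimal sketch of the idea: count frequent adjacent word pairs in a corpus, register the top pairs as single tokens, and merge them greedily at tokenization time (the paper builds on pretrained subword tokenizers; this standalone version is only illustrative):

```python
from collections import Counter

def build_mwt_vocab(corpus, top_k=1):
    """Collect the top-k most frequent adjacent word pairs as single tokens."""
    pair_counts = Counter()
    for text in corpus:
        w = text.split()
        pair_counts.update(zip(w, w[1:]))
    return {" ".join(p) for p, _ in pair_counts.most_common(top_k)}

def tokenize(text, multiword):
    """Greedily merge known multi-word expressions into single tokens."""
    w, out, i = text.split(), [], 0
    while i < len(w):
        pair = " ".join(w[i:i + 2])
        if pair in multiword:
            out.append(pair); i += 2      # one token instead of two
        else:
            out.append(w[i]); i += 1
    return out

corpus = ["new york is large", "i love new york", "new york city"]
mwe = build_mwt_vocab(corpus)             # {'new york'}
print(tokenize("i love new york city", mwe))  # shorter sequence than plain word splitting
```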

Updated: 2024-04-04 22:50:25

Subjects: cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.09949v2

SteinGen: Generating Fidelitous and Diverse Graph Samples

Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
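
A toy version of the estimate-and-re-estimate loop on the simplest exponential random graph model, whose only sufficient statistic is the edge count: here the Glauber conditional reduces to the current edge density, and re-estimation is plain moment matching (the paper's Stein-operator machinery generalizes well beyond this):

```python
import numpy as np

def steingen_edges_only(adj, sweeps=50, seed=0):
    """Toy SteinGen on the Bernoulli ERGM: Glauber-resample one dyad at a time,
    re-estimating the edge probability from the current sample before each step."""
    rng = np.random.default_rng(seed)
    A, n = adj.copy(), adj.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(sweeps * len(pairs)):
        dens = A.sum() / (n * (n - 1))            # re-estimate: MLE of the edge probability
        i, j = pairs[rng.integers(len(pairs))]
        A[i, j] = A[j, i] = rng.random() < dens   # Glauber update for dyad (i, j)
    return A

rng = np.random.default_rng(1)
seed_graph = np.triu((rng.random((8, 8)) < 0.3).astype(int), 1)
seed_graph += seed_graph.T                        # one observed, symmetric graph
print(steingen_edges_only(seed_graph).sum() // 2, "edges in the generated sample")
```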

Updated: 2024-04-04 22:38:02

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2403.18578v2

Differentially Private Stream Processing at Scale

We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate its efficacy by obtaining at least a $16\times$ reduction in error over the meaningful baselines we consider. We implemented streaming differentially private user impressions for Google Shopping with DP-SQLP. The streaming DP algorithms are further applied to Google Trends.
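
User-level DP key selection builds on a standard noise-plus-threshold recipe: bound each user's contribution, add noise to every key's count, and release only keys whose noisy count clears a threshold. A simplified single-shot sketch, far from DP-SQLP's streaming and billion-key scale:

```python
import numpy as np

def dp_select_keys(user_keys, eps=1.0, delta=1e-6, max_keys_per_user=1, seed=0):
    """Release keys whose Laplace-noised user count exceeds a threshold tau."""
    counts = {}
    for keys in user_keys:
        for k in sorted(keys)[:max_keys_per_user]:   # cap per-user contribution (sensitivity bound)
            counts[k] = counts.get(k, 0) + 1
    scale = max_keys_per_user / eps                  # Laplace scale matches the sensitivity
    tau = 1 + scale * np.log(1 / (2 * delta))        # a key held by one user leaks w.p. <= delta
    rng = np.random.default_rng(seed)
    released = {}
    for k, c in counts.items():
        noisy = c + rng.laplace(0, scale)
        if noisy > tau:
            released[k] = noisy                      # publish the noisy count of selected keys
    return released

print(dp_select_keys([{"apple"}, {"apple"}, {"banana"}] * 50))
```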

Updated: 2024-04-04 22:24:56

Subjects: cs.CR,cs.DB

Download: http://arxiv.org/abs/2303.18086v2

On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models through a fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns. Only below this criterion do sub-quadratic (efficient) variants of the modern Hopfield model exist, assuming the Strong Exponential Time Hypothesis (SETH). To showcase our theory, we provide a formal example of efficient constructions of modern Hopfield models using low-rank approximation when the efficient criterion holds. This includes a derivation of a lower bound on the computational time, scaling linearly with $\max\{$\# of stored memory patterns, length of input query sequence$\}$. In addition, we prove its memory retrieval error bound and exponential memory capacity.

Updated: 2024-04-04 21:56:56

Subjects: cs.LG,cs.AI,stat.ML

Download: http://arxiv.org/abs/2402.04520v3

Learning Optimal Topology for Ad-hoc Robot Networks

In this paper, we synthesize a data-driven method to predict the optimal topology of an ad-hoc robot network. This problem is technically a multi-task classification problem. However, we divide it into a class of multi-class classification problems that can be more efficiently solved. For this purpose, we first compose an algorithm to create ground-truth optimal topologies associated with various configurations of a robot network. This algorithm incorporates a complex collection of optimality criteria that our learning model successfully manages to learn. This model is a stacked ensemble whose output is the topology prediction for a particular robot. Each stacked ensemble instance comprises three low-level estimators whose outputs are aggregated by a high-level boosting blender. Applied to a network of 10 robots, our model predicts optimal topologies for various configurations of that network with over 80% accuracy.
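
The described layout maps directly onto off-the-shelf stacking: three low-level estimators whose outputs a boosting blender aggregates. A sketch with stand-in base learners and synthetic data (the paper's actual estimators and features are not specified here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Three low-level estimators; a boosting model blends their class probabilities.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=500)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(max_depth=5))],
    final_estimator=GradientBoostingClassifier(),   # the high-level "blender"
    stack_method="predict_proba",
)

# Stand-in for one robot's multi-class topology labels over network configurations.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
print(stack.fit(X[:300], y[:300]).score(X[300:], y[300:]))
```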

Updated: 2024-04-04 21:54:18

Subjects: cs.RO,cs.LG

Download: http://arxiv.org/abs/2201.12900v2

CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering

Retrieval-Augmented Generation (RAG) enhances Large Language Model (LLM) output by providing prior knowledge as context to input. This is beneficial for knowledge-intensive and expert reliant tasks, including legal question-answering, which require evidence to validate generated text outputs. We highlight that Case-Based Reasoning (CBR) presents key opportunities to structure retrieval as part of the RAG process in an LLM. We introduce CBR-RAG, where the CBR cycle's initial retrieval stage, its indexing vocabulary, and similarity knowledge containers are used to enhance LLM queries with contextually relevant cases. This integration augments the original LLM query, providing a richer prompt. We present an evaluation of CBR-RAG, and examine different representations (i.e. general and domain-specific embeddings) and methods of comparison (i.e. inter, intra and hybrid similarity) on the task of legal question-answering. Our results indicate that the context provided by CBR's case reuse enforces similarity between relevant components of the questions and the evidence base leading to significant improvements in the quality of generated answers.
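
The hybrid-similarity retrieval can be sketched as scoring a query against stored cases under both a general and a domain-specific embedding and interpolating between the two; the weighting, field names, and casebase layout below are assumptions, not the paper's implementation:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def hybrid_retrieve(q_gen, q_dom, casebase, alpha=0.5, top_k=3):
    """Rank stored cases by a convex blend of general and domain-specific similarity.
    Each case is assumed to carry 'gen' and 'dom' embedding vectors plus its text."""
    scored = []
    for case in casebase:
        s = alpha * cosine(q_gen, case["gen"]) + (1 - alpha) * cosine(q_dom, case["dom"])
        scored.append((s, case["text"]))
    return sorted(scored, reverse=True)[:top_k]
```

The top-ranked cases are then prepended to the original query to form the augmented prompt.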

Updated: 2024-04-04 21:47:43

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.04302v1

Agnostic Tomography of Stabilizer Product States

We define a quantum learning task called agnostic tomography, where given copies of an arbitrary state $\rho$ and a class of quantum states $\mathcal{C}$, the goal is to output a succinct description of a state that approximates $\rho$ at least as well as any state in $\mathcal{C}$ (up to some small error $\varepsilon$). This task generalizes ordinary quantum tomography of states in $\mathcal{C}$ and is more challenging because the learning algorithm must be robust to perturbations of $\rho$. We give an efficient agnostic tomography algorithm for the class $\mathcal{C}$ of $n$-qubit stabilizer product states. Assuming $\rho$ has fidelity at least $\tau$ with a stabilizer product state, the algorithm runs in time $n^{O(1 + \log(1/\tau))} / \varepsilon^2$. This runtime is quasipolynomial in all parameters, and polynomial if $\tau$ is a constant.

Updated: 2024-04-04 21:39:47

Subjects: quant-ph,cs.LG

Download: http://arxiv.org/abs/2404.03813v1

Efficient Learning of Quantum States Prepared With Few Non-Clifford Gates II: Single-Copy Measurements

Recent work has shown that $n$-qubit quantum states output by circuits with at most $t$ single-qubit non-Clifford gates can be learned to trace distance $\epsilon$ using $\mathsf{poly}(n,2^t,1/\epsilon)$ time and samples. All prior algorithms achieving this runtime use entangled measurements across two copies of the input state. In this work, we give a similarly efficient algorithm that learns the same class of states using only single-copy measurements.

Updated: 2024-04-04 21:27:11

Subjects: quant-ph,cs.LG

Download: http://arxiv.org/abs/2308.07175v2

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.
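
The test construction is easy to restate in miniature: finetune on one direction of each fictitious fact, then probe the reverse direction (the example pair comes from the abstract; the templates are placeholders):

```python
pairs = [("Uriah Hawthorne", "Abyssal Melodies")]   # fictitious (person, work) facts

# Seen during finetuning: only the "A is B" direction.
finetune_set = [f"{name} is the composer of {work}." for name, work in pairs]

# Held-out probes of the "B is A" direction, with the expected answer attached.
reverse_eval = [(f"Who composed {work}?", name) for name, work in pairs]

# The Reversal Curse: models finetuned on finetune_set answer reverse_eval
# no better than chance on the held-out names.
```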

Updated: 2024-04-04 21:25:17

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2309.12288v3

GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console

Summary: The vast generation of genetic data poses a significant challenge in efficiently uncovering valuable knowledge. We introduce GENEVIC, an AI-driven chat framework that tackles this challenge by bridging the gap between genetic data generation and biomedical knowledge discovery. Leveraging generative AI, notably ChatGPT, it serves as a biologist's 'copilot'. It automates the analysis, retrieval, and visualization of customized domain-specific genetic information, and integrates functionalities to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv, making it a comprehensive tool for biomedical research. In its pilot phase, GENEVIC is assessed using a curated database that ranks genetic variants associated with Alzheimer's disease, schizophrenia, and cognition, based on their effect weights from the Polygenic Score Catalog, thus enabling researchers to prioritize genetic variants in complex diseases. GENEVIC's operation is user-friendly, accessible without any specialized training, secured by Azure OpenAI's HIPAA-compliant infrastructure, and evaluated for its efficacy through real-time query testing. As a prototype, GENEVIC is set to advance genetic research, enabling informed biomedical decisions. Availability and implementation: GENEVIC is publicly accessible at https://genevic-anath2024.streamlit.app. The underlying code is open-source and available via GitHub at https://github.com/anath2110/GENEVIC.git.

Updated: 2024-04-04 20:53:30

Subjects: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2404.04299v1

TransformerLSR: Attentive Joint Model of Longitudinal Data, Survival, and Recurrent Events with Concurrent Latent Structure

In applications such as biomedical studies, epidemiology, and social sciences, recurrent events often co-occur with longitudinal measurements and a terminal event, such as death. Therefore, jointly modeling longitudinal measurements, recurrent events, and survival data while accounting for their dependencies is critical. While joint models for the three components exist in statistical literature, many of these approaches are limited by heavy parametric assumptions and scalability issues. Recently, incorporating deep learning techniques into joint modeling has shown promising results. However, current methods only address joint modeling of longitudinal measurements at regularly-spaced observation times and survival events, neglecting recurrent events. In this paper, we develop TransformerLSR, a flexible transformer-based deep modeling and inference framework to jointly model all three components simultaneously. TransformerLSR integrates deep temporal point processes into the joint modeling framework, treating recurrent and terminal events as two competing processes dependent on past longitudinal measurements and recurrent event times. Additionally, TransformerLSR introduces a novel trajectory representation and model architecture to potentially incorporate a priori knowledge of known latent structures among concurrent longitudinal variables. We demonstrate the effectiveness and necessity of TransformerLSR through simulation studies and analyzing a real-world medical dataset on patients after kidney transplantation.

Updated: 2024-04-04 20:51:37

Subjects: stat.ML,cs.LG,stat.AP,stat.ME

Download: http://arxiv.org/abs/2404.03804v1

Learning Social Fairness Preferences from Non-Expert Stakeholder Opinions in Kidney Placement

Modern kidney placement incorporates several intelligent recommendation systems which exhibit social discrimination due to biases inherited from training data. Although initial attempts were made in the literature to study algorithmic fairness in kidney placement, these methods replace true outcomes with surgeons' decisions due to the long delays involved in recording such outcomes reliably. However, the replacement of true outcomes with surgeons' decisions disregards expert stakeholders' biases as well as social opinions of other stakeholders who do not possess medical expertise. This paper alleviates the latter concern and designs a novel fairness feedback survey to evaluate an acceptance rate predictor (ARP) that predicts a kidney's acceptance rate in a given kidney-match pair. The survey is launched on Prolific, a crowdsourcing platform, and public opinions are collected from 85 anonymous crowd participants. A novel social fairness preference learning algorithm is proposed based on minimizing social feedback regret computed using a novel logit-based fairness feedback model. The proposed model and learning algorithm are both validated using simulation experiments as well as Prolific data. Public preferences towards group fairness notions in the context of kidney placement have been estimated and discussed in detail. The specific ARP tested in the Prolific survey has been deemed fair by the participants.

Updated: 2024-04-04 20:44:56

Subjects: cs.LG,cs.HC

Download: http://arxiv.org/abs/2404.03800v1

Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

The increasing relevance of panoptic segmentation is tied to the advancements in autonomous driving and AR/VR applications. However, the deployment of such models has been limited due to the expensive nature of dense data annotation, giving rise to unsupervised domain adaptation (UDA). A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domain while harmonizing the subtasks of semantic and instance segmentation to limit catastrophic interference. While considerable progress has been achieved, existing approaches mainly focus on the adaptation of semantic segmentation. In this work, we focus on incorporating instance-level adaptation via a novel instance-aware cross-domain mixing strategy IMix. IMix significantly enhances the panoptic quality by improving instance segmentation performance. Specifically, we propose inserting high-confidence predicted instances from the target domain onto source images, retaining the exhaustiveness of the resulting pseudo-labels while reducing the injected confirmation bias. Nevertheless, such an enhancement comes at the cost of degraded semantic performance, attributed to catastrophic forgetting. To mitigate this issue, we regularize our semantic branch by employing CLIP-based domain alignment (CDA), exploiting the domain-robustness of natural language prompts. Finally, we present an end-to-end model incorporating these two mechanisms called LIDAPS, achieving state-of-the-art results on all popular panoptic UDA benchmarks.

Updated: 2024-04-04 20:42:49

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.03799v1

SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses

Can LLMs continually improve their previous outputs for better results? An affirmative answer would require LLMs to be better at discriminating among previously-generated alternatives, than generating initial responses. We explore the validity of this hypothesis in practice. We first introduce a unified framework that allows us to compare the generative and discriminative capability of any model on any task. Then, in our resulting experimental analysis of several LLMs, we do not observe the performance of those models on discrimination to be reliably better than generation. We hope these findings inform the growing literature on self-improvement AI systems.
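
The comparison can be framed as two accuracies for the same model: one for producing an answer, one for picking the correct answer out of its own earlier candidates. A sketch of such a harness, where `ask` stands in for any LLM call and the prompt format and scoring rule are our assumptions:

```python
def ask(model, prompt):
    """Placeholder for an LLM call that returns a string."""
    raise NotImplementedError

def generative_accuracy(model, tasks):
    """tasks: list of (question, gold_answer) pairs."""
    return sum(ask(model, q).strip() == a for q, a in tasks) / len(tasks)

def discriminative_accuracy(model, tasks, n=4):
    """Score the model on choosing the right answer among its own candidates."""
    scored = hits = 0
    for q, a in tasks:
        cands = [ask(model, q).strip() for _ in range(n)]   # previously generated alternatives
        if a not in cands:
            continue                                        # skip items with no correct candidate
        scored += 1
        pick = ask(model, f"{q}\nOptions: {cands}\nReply with the best option, verbatim.")
        hits += pick.strip() == a
    return hits / max(scored, 1)
```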

Updated: 2024-04-04 20:27:37

Subjects: cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2404.04298v1

Spatial Bayesian Neural Networks

It is the simple, interpretable, and well understood models that are routinely employed even though, as is revealed through prior and posterior predictive checks, these can poorly characterise the spatial heterogeneity in the underlying process of interest. Here, we propose a new, flexible class of spatial-process models, which we refer to as spatial Bayesian neural networks (SBNNs). An SBNN leverages the representational capacity of a Bayesian neural network; it is tailored to a spatial setting by incorporating a spatial ``embedding layer'' into the network and, possibly, spatially-varying network parameters. An SBNN is calibrated by matching its finite-dimensional distribution at locations on a fine gridding of space to that of a target process of interest. That process could be easy to simulate from or we may have many realisations from it. We propose several variants of SBNNs, most of which are able to match the finite-dimensional distribution of the target process at the selected grid better than conventional BNNs of similar complexity. We also show that an SBNN can be used to represent a variety of spatial processes often used in practice, such as Gaussian processes, lognormal processes, and max-stable processes. We briefly discuss the tools that could be used to make inference with SBNNs, and we conclude with a discussion of their advantages and limitations.

Updated: 2024-04-04 20:18:27

Subjects: stat.ML,cs.LG

Download: http://arxiv.org/abs/2311.09491v2

Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture

Safety and robustness are crucial factors in developing trustworthy autonomous vehicles. One essential aspect of addressing these factors is to equip vehicles with the capability to predict future trajectories for all moving objects in the surroundings and quantify prediction uncertainties. In this paper, we propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object. Our approach can distinguish Out-of-Distribution data while quantifying uncertainty and achieving competitive performance compared to state-of-the-art methods on the Argoverse 2 and INTERACTION datasets. Specifically, a 0.446 meters minimum Final Displacement Error, a 0.203 meters minimum Average Displacement Error, and a 5.35% Miss Rate are achieved on the INTERACTION test set. Extensive qualitative and quantitative analysis is also provided to evaluate the proposed model. Our open-source code is available at https://github.com/PurdueDigitalTwin/seneva.

Updated: 2024-04-04 20:04:12

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.03789v1

Layerwise Early Stopping for Test Time Adaptation

Test Time Adaptation (TTA) addresses the problem of distribution shift by enabling pretrained models to learn new features on an unseen domain at test time. However, it poses a significant challenge to maintain a balance between learning new features and retaining useful pretrained features. In this paper, we propose Layerwise EArly STopping (LEAST) for TTA to address this problem. The key idea is to stop adapting individual layers during TTA if the features being learned do not appear beneficial for the new domain. For that purpose, we propose using a novel gradient-based metric to measure the relevance of the current learnt features to the new domain without the need for supervised labels. More specifically, we propose to use this metric to determine dynamically when to stop updating each layer during TTA. This enables a more balanced adaptation, restricted to layers benefiting from it, and only for a certain number of steps. Such an approach also has the added effect of limiting the forgetting of pretrained features useful for dealing with new domains. Through extensive experiments, we demonstrate that Layerwise Early Stopping improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
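
A sketch of the mechanism with a plain per-layer gradient-norm proxy in place of the paper's relevance metric; the layer grouping, threshold, and update rule are illustrative assumptions:

```python
import torch

def least_tta_step(model, entropy_loss, batch, frozen, tau=1e-3, lr=1e-4):
    """One test-time adaptation step with layerwise early stopping:
    a layer joins `frozen` once its gradient signal drops below tau,
    and frozen layers are never updated again."""
    entropy_loss(model(batch)).backward()            # unsupervised TTA objective
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            layer = name.split(".")[0]               # group parameters by top-level module
            if layer not in frozen and p.grad.norm() < tau:
                frozen.add(layer)                    # learning no longer helps here: stop adapting
            if layer not in frozen:
                p -= lr * p.grad                     # update only still-active layers
            p.grad.zero_()
    return frozen
```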

Updated: 2024-04-04 19:55:11

Subjects: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.03784v1

Compensatory Biases Under Cognitive Load: Reducing Selection Bias in Large Language Models

Large Language Models (LLMs) like gpt-3.5-turbo and claude-instant-1.2 have become instrumental in interpreting and executing semantic-based tasks. Unfortunately, these models' inherent biases, akin to human cognitive biases, adversely affect their performance. Particularly affected is object selection from lists; a fundamental operation in digital navigation and decision-making. This research critically examines these biases and quantifies the effects on a representative list selection task. To explore these biases, we conducted a series of controlled experiments, manipulating temperature, list length, object identity, object type, prompt complexity, and model. This enabled us to isolate and measure the influence of the biases on selection behavior. Our findings show that bias structure is strongly dependent on the model, with object type modulating the magnitude of the effect. We also observe a strong primacy effect, causing the first objects in a list to be disproportionately represented in outputs. Furthermore, the usage of guard rails, a prompt engineering method of ensuring a response structure, can increase bias and decrease instruction adherence when combined with a selection task. The bias is ablated when the guard rail step is separated from the list sampling step, lowering the complexity of each individual task. The implications of this research are two-fold: practically, it provides a guide for designing unbiased LLM applications; theoretically, it suggests that LLMs experience a form of cognitive load that is compensated for by increasing bias.

Updated: 2024-04-04 19:54:07

Subjects: cs.CL,cs.AI,I.2.0

Download: http://arxiv.org/abs/2402.01740v2

DisCo: Disentangled Control for Realistic Human Dance Generation

Generative AI has made significant strides in computer vision, particularly in text-driven image/video synthesis (T2I/T2V). Despite the notable advancements, it remains challenging in human-centric content synthesis such as realistic dance generation. Current methodologies, primarily tailored for human motion transfer, encounter difficulties when confronted with real-world dance scenarios (e.g., social media dance), which require to generalize across a wide spectrum of poses and intricate human details. In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources. To address these challenges, we introduce DISCO, which includes a novel model architecture with disentangled control to improve the compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DisCo can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code is available at https://disco-dance.github.io/.

Updated: 2024-04-04 19:41:09

Subjects: cs.CV,cs.AI

Download: http://arxiv.org/abs/2307.00040v3

A Systems Theoretic Approach to Online Machine Learning

The machine learning formulation of online learning is incomplete from a systems theoretic perspective. Typically, machine learning research emphasizes domains and tasks, and a problem-solving worldview. It focuses on algorithm parameters, features, and samples, and neglects the perspective offered by considering system structure and system behavior or dynamics. Online learning is an active field of research and has been widely explored in terms of statistical theory and computational algorithms; however, in general, the literature still lacks formal systems-theoretic frameworks for modeling online learning systems and resolving systems-related concept drift issues. Furthermore, while the machine learning formulation serves to classify methods and literature, the systems theoretic formulation presented herein serves to provide a framework for the top-down design of online learning systems, including a novel definition of online learning and the identification of key design parameters. The framework is formulated in terms of input-output systems and is further divided into system structure and system behavior. Concept drift is a critical challenge faced in online learning, and this work formally approaches it as part of the system behavior characteristics. Healthcare provider fraud detection using machine learning is used as a case study throughout the paper to ground the discussion in a real-world online learning challenge.

Updated: 2024-04-04 19:36:47

Subjects: cs.LG,cs.AI,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.03775v1

Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning

Supervised learning is often computationally easy in practice. But to what extent does this mean that other modes of learning, such as reinforcement learning (RL), ought to be computationally easy by extension? In this work we show the first cryptographic separation between RL and supervised learning, by exhibiting a class of block MDPs and associated decoding functions where reward-free exploration is provably computationally harder than the associated regression problem. We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem. It is known that being able to perform regression in block MDPs is necessary for finding a good policy; our results suggest that it is not sufficient. Our separation lower bound uses a new robustness property of the Learning Parities with Noise (LPN) hardness assumption, which is crucial in handling the dependent nature of RL data. We argue that separations and oracle lower bounds, such as ours, are a more meaningful way to prove hardness of learning because the constructions better reflect the practical reality that supervised learning by itself is often not the computational bottleneck.

Updated: 2024-04-04 19:35:41

Subjects: cs.LG,cs.CC,cs.CR,cs.DS

Download: http://arxiv.org/abs/2404.03774v1

ProLoc: Robust Location Proofs in Hindsight

Many online services rely on self-reported locations of user devices like smartphones. To mitigate harm from falsified self-reported locations, the literature has proposed location proof services (LPSs), which provide proof of a device's location by corroborating its self-reported location using short-range radio contacts with either trusted infrastructure or nearby devices that also report their locations. This paper presents ProLoc, a new LPS that extends prior work in two ways. First, ProLoc relaxes prior work's proofs that a device was at a given location to proofs that a device was within distance "d" of a given location. We argue that these weaker proofs, which we call "region proofs", are important because (i) region proofs can be constructed with few requirements on device reporting behavior as opposed to precise location proofs, and (ii) a quantitative bound on a device's distance from a known epicenter is useful for many applications. For example, in the context of citizen reporting near an unexpected event (earthquake, violent protest, etc.), knowing the verified distances of the reporting devices from the event's epicenter would be valuable for ranking the reports by relevance or flagging fake reports. Second, ProLoc includes a novel mechanism to prevent collusion attacks where a set of attacker-controlled devices corroborate each others' false locations. Ours is the first mechanism that does not need additional infrastructure to handle attacks with made-up devices, which an attacker can create in any number at any location without any cost. For this, we rely on a variant of TrustRank applied to the self-reported trajectories and encounters of devices. Our goal is to prevent retroactive attacks where the adversary cannot predict ahead of time which fake location it will want to report, which is the case for the reporting of unexpected events.
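
Verifying a region proof ultimately reduces to a distance test between a corroborated location and the event's epicenter. A minimal sketch using the haversine formula (the coordinates and radius are illustrative):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def within_region(device_loc, epicenter, d_km):
    """Does the corroborated location support a 'within distance d' region proof?"""
    return haversine_km(*device_loc, *epicenter) <= d_km

print(within_region((40.7128, -74.0060), (40.73, -74.0), 5.0))  # True: roughly 2 km apart
```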

Updated: 2024-04-04 19:34:16

标题: ProLoc:事后的稳健定位证明

摘要: 许多在线服务依赖于用户设备(如智能手机)的自报位置。为了减少虚假自报位置带来的危害,文献提出了位置证明服务(LPSs),通过与受信任的基础设施或同样报告自身位置的附近设备进行短距离无线电接触,来证实设备的自报位置。本文介绍了一种新的LPS,ProLoc,它在两个方面扩展了之前的工作。首先,ProLoc将先前工作中“设备位于某给定位置”的证明放宽为“设备位于某给定位置距离d范围内”的证明。我们认为这些更弱的证明,即“区域证明”,很重要,因为(i)与精确位置证明相比,区域证明对设备报告行为的要求更少,(ii)设备到已知震中距离的定量上界对许多应用很有用。例如,在意外事件(地震、暴力抗议等)附近的公民报告场景中,了解报告设备与事件震中之间经过验证的距离,将有助于按相关性对报告进行排名或标记虚假报告。其次,ProLoc包括一种新颖的机制来防止串通攻击,即一组受攻击者控制的设备相互证实彼此的虚假位置。我们的机制是第一个无需额外基础设施即可应对虚构设备攻击的机制,而攻击者可以在任何位置无成本地创建任意数量的虚构设备。为此,我们依赖一种应用于设备自报轨迹和相遇记录的TrustRank变体。我们的目标是防止追溯性攻击,即对手无法提前预测自己将要报告哪个虚假位置;意外事件的报告正是这种情况。

更新时间: 2024-04-04 19:34:16

领域: cs.CR

下载: http://arxiv.org/abs/2404.04297v1

R5Detect: Detecting Control-Flow Attacks from Standard RISC-V Enclaves

Embedded and Internet-of-Things (IoT) devices are ubiquitous today, and the uprising of several botnets based on them (e.g., Mirai, Ripple20) raises issues about the security of such devices. Especially low-power devices often lack support for modern system security measures, such as stack integrity, Non-eXecutable bits or strong cryptography. In this work, we present R5Detect, a security monitoring software that detects and prevents control-flow attacks on unmodified RISC-V standard architectures. With a novel combination of different protection techniques, it can run on embedded and low-power IoT devices, which may lack proper security features. R5Detect implements a memory-protected shadow stack to prevent runtime modifications, as well as a heuristics detection based on Hardware Performance Counters to detect control-flow integrity violations. Our results indicate that regular software can be protected against different degrees of control-flow manipulations with an average performance overhead of below 5 %. We implement and evaluate R5Detect on standard low-power RISC-V devices and show that such security features can be effectively used with minimal hardware support.
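As a rough illustration of the shadow-stack idea (a simplified Python model, not R5Detect's actual RISC-V implementation): return addresses are duplicated into protected memory on call and compared on return, so a corrupted return address is caught before control transfers.

    class ShadowStack:
        def __init__(self):
            self._stack = []   # lives in memory-protected storage on real hardware

        def on_call(self, return_addr):
            self._stack.append(return_addr)

        def on_return(self, return_addr):
            expected = self._stack.pop()
            if return_addr != expected:
                raise RuntimeError(
                    f"control-flow violation: 0x{return_addr:x} != 0x{expected:x}")

    ss = ShadowStack()
    ss.on_call(0x80001234)
    ss.on_return(0x80001234)   # passes; a ROP-style overwrite would raise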

Updated: 2024-04-04 19:32:45

标题: R5Detect: 从标准 RISC-V 飞地中检测控制流攻击

摘要: 嵌入式和物联网(IoT)设备如今已无处不在,基于它们的多种僵尸网络的兴起(例如Mirai、Ripple20)引发了对这些设备安全性的担忧。特别是低功耗设备通常缺乏对现代系统安全措施的支持,如堆栈完整性、不可执行位或强加密技术。 在这项工作中,我们提出了R5Detect,一种安全监控软件,可以检测并防止针对未经修改的RISC-V标准架构的控制流攻击。通过不同保护技术的新颖组合,它可以运行在可能缺乏适当安全功能的嵌入式和低功耗IoT设备上。R5Detect实现了一个受内存保护的影子堆栈以防止运行时篡改,并基于硬件性能计数器进行启发式检测,以发现控制流完整性的违规行为。我们的结果表明,常规软件可以抵御不同程度的控制流操纵,平均性能开销低于5%。我们在标准低功耗RISC-V设备上实现并评估了R5Detect,并展示了这类安全功能可以在极少的硬件支持下有效使用。

更新时间: 2024-04-04 19:32:45

领域: cs.CR

下载: http://arxiv.org/abs/2404.03771v1

Shadow Cones: A Generalized Framework for Partial Order Embeddings

Hyperbolic space has proven to be well-suited for capturing hierarchical relations in data, such as trees and directed acyclic graphs. Prior work introduced the concept of entailment cones, which uses partial orders defined by nested cones in the Poincaré ball to model hierarchies. Here, we introduce the "shadow cones" framework, a physics-inspired entailment cone construction. Specifically, we model partial orders as subset relations between shadows formed by a light source and opaque objects in hyperbolic space. The shadow cones framework generalizes entailment cones to a broad class of formulations and hyperbolic space models beyond the Poincaré ball. This results in clear advantages over existing constructions: for example, shadow cones possess better optimization properties over constructions limited to the Poincaré ball. Our experiments on datasets of various sizes and hierarchical structures show that shadow cones consistently and significantly outperform existing entailment cone constructions. These results indicate that shadow cones are an effective way to model partial orders in hyperbolic space, offering physically intuitive and novel insights about the nature of such structures.

Updated: 2024-04-04 19:30:22

标题: "影子锥:偏序嵌入的泛化框架"

摘要: 双曲空间已被证明非常适合捕捉数据中的层次关系,如树和有向无环图。先前的工作引入了蕴涵锥的概念,它利用Poincaré球中嵌套锥定义的偏序来建模层次结构。在这里,我们介绍了“阴影锥”框架,这是一个受物理启发的蕴涵锥构造。具体地,我们将偏序建模为双曲空间中由光源和不透明物体形成的阴影之间的子集关系。阴影锥框架将蕴涵锥推广到一类广泛的表述和超越Poincaré球的双曲空间模型。这带来了相对于现有构造的明显优势:例如,阴影锥在优化性质上优于仅限于Poincaré球的构造。我们在各种大小和层次结构的数据集上的实验表明,阴影锥始终且显著地优于现有的蕴涵锥构造。这些结果表明,阴影锥是在双曲空间中建模偏序的有效方法,提供了关于这类结构本质的符合物理直觉的新颖见解。

更新时间: 2024-04-04 19:30:22

领域: cs.LG

下载: http://arxiv.org/abs/2305.15215v2

On Extending the Automatic Test Markup Language (ATML) for Machine Learning

This paper addresses the urgent need for messaging standards in the operational test and evaluation (T&E) of machine learning (ML) applications, particularly in edge ML applications embedded in systems like robots, satellites, and unmanned vehicles. It examines the suitability of the IEEE Standard 1671 (IEEE Std 1671), known as the Automatic Test Markup Language (ATML), an XML-based standard originally developed for electronic systems, for ML application testing. The paper explores extending IEEE Std 1671 to encompass the unique challenges of ML applications, including the use of datasets and dependencies on software. Through modeling various tests such as adversarial robustness and drift detection, this paper offers a framework adaptable to specific applications, suggesting that minor modifications to ATML might suffice to address the novelties of ML. This paper differentiates ATML's focus on testing from other ML standards like Predictive Model Markup Language (PMML) or Open Neural Network Exchange (ONNX), which concentrate on ML model specification. We conclude that ATML is a promising tool for effective, near real-time operational T&E of ML applications, an essential aspect of AI lifecycle management, safety, and governance.

Updated: 2024-04-04 19:28:38

标题: 关于将自动化测试标记语言(ATML)扩展至机器学习领域

摘要: 本文讨论了在机器学习(ML)应用的运行测试与评估(T&E)中,特别是在嵌入机器人、卫星和无人车等系统中的边缘ML应用中,对消息标准的迫切需求。它考察了IEEE标准1671(IEEE Std 1671),即自动测试标记语言(ATML),一种最初为电子系统开发的基于XML的标准,对于ML应用测试的适用性。本文探讨了将IEEE标准1671扩展到涵盖ML应用的独特挑战,包括数据集的使用和对软件的依赖。通过对对抗鲁棒性和漂移检测等各种测试进行建模,本文提供了一个可适配具体应用的框架,并指出对ATML进行少量修改或许就足以应对ML的新颖之处。本文将ATML对测试的关注与其他专注于ML模型规范的标准(如预测模型标记语言(PMML)或开放神经网络交换(ONNX))区分开来。我们得出结论,ATML是一种有前途的工具,可用于ML应用的有效、近实时的运行T&E,这是AI生命周期管理、安全性和治理的重要方面。

更新时间: 2024-04-04 19:28:38

领域: cs.SE,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03769v1

Federated Bayesian Deep Learning: The Application of Statistical Aggregation Methods to Bayesian Models

Federated learning (FL) is an approach to training machine learning models that takes advantage of multiple distributed datasets while maintaining data privacy and reducing communication costs associated with sharing local datasets. Aggregation strategies have been developed to pool or fuse the weights and biases of distributed deterministic models; however, modern deterministic deep learning (DL) models are often poorly calibrated and lack the ability to communicate a measure of epistemic uncertainty in prediction, which is desirable for remote sensing platforms and safety-critical applications. Conversely, Bayesian DL models are often well calibrated and capable of quantifying and communicating a measure of epistemic uncertainty along with a competitive prediction accuracy. Unfortunately, because the weights and biases in Bayesian DL models are defined by a probability distribution, simple application of the aggregation methods associated with FL schemes for deterministic models is either impossible or results in sub-optimal performance. In this work, we use independent and identically distributed (IID) and non-IID partitions of the CIFAR-10 dataset and a fully variational ResNet-20 architecture to analyze six different aggregation strategies for Bayesian DL models. Additionally, we analyze the traditional federated averaging approach applied to an approximate Bayesian Monte Carlo dropout model as a lightweight alternative to more complex variational inference methods in FL. We show that aggregation strategy is a key hyperparameter in the design of a Bayesian FL system with downstream effects on accuracy, calibration, uncertainty quantification, training stability, and client compute requirements.
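To make the aggregation problem concrete, here is a minimal sketch (ours; the paper's six strategies are not reproduced) of two candidate ways to fuse per-client Gaussian weight posteriors: naive moment averaging versus a precision-weighted product of Gaussians.

    import numpy as np

    def aggregate_posteriors(means, variances, strategy="product"):
        # means, variances: per-client arrays for one weight tensor.
        mu, var = np.stack(means), np.stack(variances)
        if strategy == "average":        # FedAvg applied to both moments
            return mu.mean(axis=0), var.mean(axis=0)
        if strategy == "product":        # precision-weighted fusion
            prec = 1.0 / var
            agg_var = 1.0 / prec.sum(axis=0)
            agg_mu = agg_var * (prec * mu).sum(axis=0)
            return agg_mu, agg_var
        raise ValueError(strategy)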

Updated: 2024-04-04 19:09:21

标题: 联邦贝叶斯深度学习:统计聚合方法在贝叶斯模型中的应用

摘要: 联邦学习(FL)是一种训练机器学习模型的方法,利用多个分布式数据集的优势,同时保持数据隐私并降低与共享本地数据集相关的通信成本。已经开发了聚合策略来汇集或融合分布式确定性模型的权重和偏差;然而,现代确定性深度学习(DL)模型通常校准不佳,缺乏在预测中传达认知不确定性度量的能力,而这种能力对于遥感平台和安全关键应用是很需要的。相反,贝叶斯DL模型通常校准良好,能够在保持有竞争力的预测准确性的同时量化并传达认知不确定性度量。不幸的是,由于贝叶斯DL模型中的权重和偏差由概率分布定义,因此简单套用确定性模型FL方案中的聚合方法要么不可行,要么导致次优性能。在这项工作中,我们使用CIFAR-10数据集的独立同分布(IID)和非IID分区以及完全变分ResNet-20架构来分析用于贝叶斯DL模型的六种不同聚合策略。此外,我们分析了应用于近似贝叶斯蒙特卡洛dropout模型的传统联邦平均方法,作为FL中更复杂变分推断方法的轻量级替代方案。我们表明,聚合策略是设计贝叶斯FL系统的关键超参数,对准确性、校准、不确定性量化、训练稳定性和客户端计算要求产生下游影响。

更新时间: 2024-04-04 19:09:21

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2403.15263v2

Difference of Submodular Minimization via DC Programming

Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems. Although it is well known that a DS problem can be equivalently formulated as the minimization of the difference of two convex (DC) functions, existing algorithms do not fully exploit this connection. A classical algorithm for DC problems is called the DC algorithm (DCA). We introduce variants of DCA and its complete form (CDCA) that we apply to the DC program corresponding to DS minimization. We extend existing convergence properties of DCA, and connect them to convergence properties on the DS problem. Our results on DCA match the theoretical guarantees satisfied by existing DS algorithms, while providing a more complete characterization of convergence properties. In the case of CDCA, we obtain a stronger local minimality guarantee. Our numerical results show that our proposed algorithms outperform existing baselines on two applications: speech corpus selection and feature selection.
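The DCA iteration at the heart of the paper is simple to state: to minimize f(x) = g(x) - h(x) with g, h convex, repeatedly linearize h at the current iterate and solve the resulting convex problem. A minimal sketch on a toy instance (our example, not from the paper):

    import numpy as np

    def dca(grad_h, solve_linearized, x0, iters=100, tol=1e-10):
        # x_{k+1} = argmin_x  g(x) - <grad_h(x_k), x>
        x = x0
        for _ in range(iters):
            x_new = solve_linearized(grad_h(x))
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    # Toy DC program: g(x) = 0.5*||x - p||^2, h(x) = 0.5*lam*||x||^2 (lam < 1),
    # so argmin_x g(x) - <y, x> = p + y, and the true minimizer is p / (1 - lam).
    p, lam = np.array([2.0, -1.0]), 0.5
    x = dca(grad_h=lambda x: lam * x,
            solve_linearized=lambda y: p + y,
            x0=np.zeros(2))            # converges to [4.0, -2.0]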

Updated: 2024-04-04 19:08:45

标题: 通过DC规划最小化次模函数之差

摘要: 将两个次模(DS)函数之差最小化是各种机器学习问题中自然出现的问题。尽管众所周知,DS问题可以等价地表述为两个凸(DC)函数之差的最小化,但现有算法并没有充分利用这种联系。用于DC问题的一个经典算法称为DC算法(DCA)。我们引入了DCA及其完整形式(CDCA)的变体,并将其应用于与DS最小化相对应的DC规划。我们扩展了DCA的现有收敛性质,并将其与DS问题上的收敛性质联系起来。我们关于DCA的结果与现有DS算法满足的理论保证相匹配,同时提供了对收敛性质更完整的刻画。在CDCA的情况下,我们获得了更强的局部极小性保证。我们的数值结果显示,我们提出的算法在语音语料库选择和特征选择两个应用中优于现有基线。

更新时间: 2024-04-04 19:08:45

领域: cs.LG,cs.DM,cs.DS,math.OC,stat.ML

下载: http://arxiv.org/abs/2305.11046v2

Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.

Updated: 2024-04-04 19:07:21

标题: 在高维空间中学习平滑函数:从稀疏多项式到深度神经网络

摘要: 从有限集合的点值样本中学习多变量平滑目标函数的近似值是科学计算及其在计算科学和工程中的许多应用中的一个重要任务。尽管在高维逼近领域已经有半个多世纪的研究,但这仍然是一个具有挑战性的问题。然而,在过去的十年中,在有效方法方面取得了重大进展,首先是所谓的稀疏多项式逼近方法,最近则是基于深度神经网络(DNNs)的方法。与此同时,相关逼近理论和这些技术的分析也取得了重大进展。在本文中,我们对这一最近的进展进行了概述。我们描述了这一问题的当代动机,源自参数模型和计算不确定性量化;相关的函数类,即无限维、Banach值、全纯函数类;对于这些类别的有限数据学习的基本限制;最后,稀疏多项式和DNN方法,用于从有限数据高效学习这些函数。对于后者,目前深度学习的逼近理论与实际性能之间存在着显著差距。为了缩小这一差距,我们发展了实际存在理论的主题,该理论断言存在与维度无关的DNN架构和训练策略,可以在训练数据量方面达到可证实的近乎最优泛化误差。

更新时间: 2024-04-04 19:07:21

领域: math.NA,cs.LG,cs.NA

下载: http://arxiv.org/abs/2404.03761v1

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

In this work, we approach the problem of multi-task submodular optimization with the perspective of local distributional robustness, within the neighborhood of a reference distribution which assigns an importance score to each task. We initially propose to introduce a regularization term which makes use of the relative entropy to the standard multi-task objective. We then demonstrate through duality that this novel formulation itself is equivalent to the maximization of a submodular function, which may be efficiently carried out through standard greedy selection methods. This approach bridges the existing gap in the optimization of performance-robustness trade-offs in multi-task subset selection. To numerically validate our theoretical results, we test the proposed method in two different settings, one involving the selection of satellites in low Earth orbit constellations in the context of a sensor selection problem, and the other involving an image summarization task using neural networks. Our method is compared with two other algorithms focused on optimizing the performance of the worst-case task, and on directly optimizing the performance on the reference distribution itself. We conclude that our novel formulation produces a solution that is locally distributionally robust, and computationally inexpensive.
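Since the dual reformulation is itself a submodular maximization, the standard greedy routine applies; a generic sketch follows (the objective here is a placeholder, not the paper's regularized multi-task objective):

    def greedy_select(ground_set, f, k):
        # For monotone submodular f, greedy achieves a (1 - 1/e) guarantee.
        S = set()
        for _ in range(k):
            best = max((e for e in ground_set if e not in S),
                       key=lambda e: f(S | {e}) - f(S))
            S.add(best)
        return S

    # Example placeholder objective: coverage of a family of sets.
    universe = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
    f = lambda S: len(set().union(*S)) if S else 0
    print(greedy_select(universe, f, k=2))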

Updated: 2024-04-04 19:06:29

标题: 次模多任务子集选择中的局部分布鲁棒性

摘要: 在这项工作中,我们从局部分布鲁棒性的视角来处理多任务次模优化问题,即在一个为每个任务分配重要性评分的参考分布的邻域内考虑该问题。我们首先提出在标准多任务目标中引入一个基于相对熵的正则化项。然后我们通过对偶性证明,这一新颖的表述本身等价于最大化一个次模函数,因而可以通过标准的贪婪选择方法高效求解。这种方法弥合了多任务子集选择中性能与鲁棒性权衡优化方面的现有空白。为了数值验证我们的理论结果,我们在两种不同的设置下测试了所提出的方法:一种涉及在传感器选择问题的背景下选择低地球轨道星座中的卫星,另一种涉及使用神经网络进行图像摘要任务。我们将我们的方法与另外两种算法进行了比较,它们分别专注于优化最坏情况任务的性能,以及直接优化参考分布本身上的性能。我们得出结论,我们的新颖表述产生了一个局部分布鲁棒且计算成本低廉的解决方案。

更新时间: 2024-04-04 19:06:29

领域: cs.LG,eess.SP,math.OC

下载: http://arxiv.org/abs/2404.03759v1

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.

Updated: 2024-04-04 19:04:55

标题: pixelSplat:从图像对得到3D高斯泼溅,用于可扩展、可泛化的3D重建

摘要: 我们引入了pixelSplat,这是一个前馈模型,它学习从图像对重建由3D高斯基元参数化的3D辐射场。我们的模型具有实时且内存高效的渲染能力,可用于可扩展的训练以及推理时的快速3D重建。为了克服稀疏且局部支撑的表示所固有的局部极小值问题,我们预测3D上的稠密概率分布,并从该概率分布中采样高斯均值。我们通过重参数化技巧使该采样操作可微分,从而可以通过高斯泼溅(splatting)表示反向传播梯度。我们在真实世界的RealEstate10k和ACID数据集上对我们的方法进行了宽基线新视角合成的基准测试,结果优于最先进的光场Transformer,并将渲染速度提升了2.5个数量级,同时重建出可解释且可编辑的3D辐射场。

更新时间: 2024-04-04 19:04:55

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.12337v4

Experimental demonstration of magnetic tunnel junction-based computational random-access memory

The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called "computational random-access memory (CRAM)" has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, an experimental demonstration and study of CRAM to evaluate its computation accuracy has been lacking; this is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations as well as 2-, 3-, and 5-input logic operations are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of modeling has been developed to characterize the accuracy of CRAM computation. Further analysis of scalar addition, multiplication, and matrix multiplication shows promising results. These results are then applied to a complete application: a neural network based handwritten digit classifier, as an example to show the connection between the application performance and further MTJ development. The classifier achieved almost-perfect classification accuracy, with reasonable projections of future MTJ development. With the confirmation of MTJ-based CRAM's accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.

Updated: 2024-04-04 18:44:47

标题: 基于磁隧道结的计算随机存取存储器的实验演示

摘要: 传统的计算范例难以满足新兴应用快速增长的需求,特别是对机器智能的需求,因为大部分功耗和能量都消耗在逻辑模块和存储模块之间不断的数据传输上。一种新的范例,称为“计算随机访问存储器(CRAM)”,已经出现以解决这一根本限制。CRAM直接使用存储单元本身进行逻辑运算,而无需数据离开存储器。先前的数值研究已经确认了CRAM对于传统和新兴应用的能量和性能优势。然而,一直缺乏对CRAM的实验演示和研究来评估其计算精度,而计算精度是衡量其技术可行性和竞争力的一个现实且对应用至关重要的指标。本文实验演示了一种基于磁隧道结(MTJs)的CRAM阵列。首先,研究了基本存储操作以及2、3和5输入逻辑操作。然后,演示了具有两种不同设计的1位全加器。基于实验结果,开发了一套模型来表征CRAM计算的准确性。对标量加法、乘法和矩阵乘法的进一步分析显示了有希望的结果。这些结果随后被应用于一个完整的应用:基于神经网络的手写数字分类器,作为示例来展示应用性能与MTJ后续发展之间的联系。该分类器实现了几乎完美的分类准确率,并对未来MTJ的发展作出了合理的预测。随着基于MTJ的CRAM的准确性得到确认,有充分理由相信这项技术将对功耗和能量需求高的机器智能应用产生重大影响。

更新时间: 2024-04-04 18:44:47

领域: cs.ET,cond-mat.mes-hall,cs.AI,cs.AR,cs.SY,eess.SY

下载: http://arxiv.org/abs/2312.14264v2

A Reinforcement Learning based Reset Policy for CDCL SAT Solvers

Restart policy is an important technique used in modern Conflict-Driven Clause Learning (CDCL) solvers, wherein some parts of the solver state are erased at certain intervals during the run of the solver. In most solvers, variable activities are preserved across restart boundaries, resulting in solvers continuing to search parts of the assignment tree that are not far from the one immediately prior to a restart. To enable the solver to search possibly "distant" parts of the assignment tree, we study the effect of resets, a variant of restarts which not only erases the assignment trail, but also randomizes the activity scores of the variables of the input formula after reset, thus potentially enabling a better global exploration of the search space. In this paper, we model the problem of whether to trigger reset as a multi-armed bandit (MAB) problem, and propose two reinforcement learning (RL) based adaptive reset policies using the Upper Confidence Bound (UCB) and Thompson sampling algorithms. These two algorithms balance the exploration-exploitation tradeoff by adaptively choosing arms (reset vs. no reset) based on their estimated rewards during the solver's run. We implement our reset policies in four baseline SOTA CDCL solvers and compare the baselines against the reset versions on Satcoin benchmarks and SAT Competition instances. Our results show that RL-based reset versions outperform the corresponding baseline solvers on both Satcoin and the SAT competition instances, suggesting that our RL policy helps to dynamically and profitably adapt the reset frequency for any given input instance. We also introduce the concept of a partial reset, where at least a constant number of variable activities are retained across reset boundaries. Building on previous results, we show that there is an exponential separation between O(1) vs. $\Omega(n)$-length partial resets.
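A minimal sketch of the UCB arm-selection logic for the reset decision (our simplification; the reward signal shown is a placeholder, as the paper's exact reward definition is not reproduced here):

    import math

    class UCBResetPolicy:
        # Arm 0 = restart without reset, arm 1 = reset (erase trail + randomize activities).
        def __init__(self, c=2.0):
            self.counts, self.sums, self.c = [0, 0], [0.0, 0.0], c

        def choose(self):
            for arm in (0, 1):               # play each arm once first
                if self.counts[arm] == 0:
                    return arm
            t = sum(self.counts)
            return max((0, 1), key=lambda a: self.sums[a] / self.counts[a]
                       + math.sqrt(self.c * math.log(t) / self.counts[a]))

        def update(self, arm, reward):       # e.g., reward = search progress since restart
            self.counts[arm] += 1
            self.sums[arm] += reward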

Updated: 2024-04-04 18:44:33

标题: 基于强化学习的CDCL SAT求解器的复位策略

摘要: 重启策略是现代基于冲突驱动子句学习(CDCL)的求解器中使用的一种重要技术,即在求解器运行期间的某些时间间隔擦除求解器状态的某些部分。在大多数求解器中,变量活动分数在重启边界上被保留,导致求解器继续搜索与重启前紧邻位置相距不远的赋值树部分。为了使求解器能够搜索赋值树中可能“遥远”的部分,我们研究了重置的效果:这是重启的一种变体,不仅擦除赋值路径,还在重置后随机化输入公式中变量的活动分数,从而有可能更好地全局探索搜索空间。 在本文中,我们将是否触发重置的问题建模为多臂老虎机(MAB)问题,并提出了两种基于强化学习(RL)的自适应重置策略,分别使用置信上界(UCB)和汤普森采样算法。这两种算法根据求解器运行期间的估计奖励自适应地选择臂(重置 vs. 不重置),从而平衡探索-利用权衡。我们在四个基线SOTA CDCL求解器中实现了我们的重置策略,并将基线与重置版本在Satcoin基准测试和SAT竞赛实例上进行比较。我们的结果表明,基于RL的重置版本在Satcoin和SAT竞赛实例上均优于相应的基线求解器,表明我们的RL策略有助于针对任意给定的输入实例动态且有益地调整重置频率。我们还引入了部分重置的概念,其中至少有常数数量的变量活动分数在重置边界上被保留。在先前结果的基础上,我们证明了O(1)长度与$\Omega(n)$长度的部分重置之间存在指数级分离。

更新时间: 2024-04-04 18:44:33

领域: cs.LO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03753v1

Object-Centric Conformance Alignments with Synchronization (Extended Version)

Real-world processes operate on objects that are inter-dependent. To accurately reflect the nature of such processes, object-centric process mining techniques are needed, notably conformance checking. However, while the object-centric perspective has recently gained traction, few concrete process mining techniques have been presented so far. Moreover, existing approaches are severely limited in their abilities to keep track of object identity and object dependencies. Consequently, serious problems in logs remain undetected. In this paper, we present a new formalism that combines the key modelling features of two existing approaches, in particular the ability of object-centric Petri nets to capture one-to-many relations and the one of Petri nets with identifiers to compare and synchronize objects based on their identity. We call the resulting formalism 'object-centric Petri nets with identifiers', and define alignments and the conformance checking task for this setting. We propose a conformance checking approach for such nets based on an encoding in satisfiability modulo theories (SMT), and illustrate how it can be effectively used to overcome shortcomings of earlier work. To assess its practicality, we perform an evaluation on data from the literature.

Updated: 2024-04-04 18:39:50

标题: 以对象为中心的带同步的符合性对齐(扩展版本)

摘要: 真实世界的过程涉及相互依赖的对象。为了准确反映这类过程的性质,需要对象为中心的过程挖掘技术,特别是符合性检查。然而,虽然对象为中心的视角最近取得了进展,但迄今为止很少提出了具体的过程挖掘技术。此外,现有方法在跟踪对象身份和对象依赖性方面受到严重限制。因此,日志中的严重问题仍然未被发现。在本文中,我们提出了一个新的形式化方法,结合了两种现有方法的关键建模特征,特别是对象为中心的Petri网能够捕捉一对多关系,以及具有标识符的Petri网能够根据其身份比较和同步对象。我们将得到的形式化方法称为“具有标识符的对象为中心的Petri网”,并为该设置定义了对齐和符合性检查任务。我们提出了一种基于满足性模理论(SMT)编码的这些网的符合性检查方法,并说明它如何有效地用于克服早期工作的缺点。为了评估其实用性,我们对文献中的数据进行了评估。

更新时间: 2024-04-04 18:39:50

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2312.08537v2

GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation

Query Reformulation(QR) is a set of techniques used to transform a user's original search query to a text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has been shown to be a promising approach due to its ability to exploit knowledge inherent in large language models. By taking inspiration from the success of ensemble prompting strategies which have benefited many tasks, we investigate if they can help improve query reformulation. In this context, we propose an ensemble based prompting technique, GenQREnsemble which leverages paraphrases of a zero-shot instruction to generate multiple sets of keywords ultimately improving retrieval performance. We further introduce its post-retrieval variant, GenQREnsembleRF to incorporate pseudo relevant feedback. On evaluations over four IR benchmarks, we find that GenQREnsemble generates better reformulations with relative nDCG@10 improvements up to 18% and MAP improvements upto 24% over the previous zero-shot state-of-art. On the MSMarco Passage Ranking task, GenQREnsembleRF shows relative gains of 5% MRR using pseudo-relevance feedback, and 9% nDCG@10 using relevant feedback documents.
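The core ensemble step can be pictured as: run the same query through N paraphrases of one zero-shot reformulation instruction, pool the generated keywords, and append them to the query. A hedged sketch, where `llm` is an assumed callable returning a keyword list and the prompt format and fusion rule are illustrative rather than the paper's exact ones:

    def genqr_ensemble(query, instruction_paraphrases, llm):
        # `llm(prompt) -> list[str]` is assumed.
        keywords = []
        for instruction in instruction_paraphrases:
            keywords.extend(llm(f"{instruction}\nQuery: {query}"))
        seen, fused = set(), []
        for k in keywords:                 # de-duplicate, keep first occurrence
            if k.lower() not in seen:
                seen.add(k.lower())
                fused.append(k)
        return query + " " + " ".join(fused)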

Updated: 2024-04-04 18:35:25

标题: GenQREnsemble: 用于生成查询重构的零样本LLM集成提示

摘要: 查询重构(QR)是一组技术,用于将用户的原始搜索查询转换为更符合用户意图并改善其搜索体验的文本。最近,零样本QR已被证明是一种有前途的方法,因为它能够利用大型语言模型中固有的知识。受到使许多任务受益的集成提示策略的成功启发,我们探讨它们是否可以帮助改进查询重构。在这种背景下,我们提出了一种基于集成的提示技术GenQREnsemble,它利用零样本指令的释义生成多组关键词,最终提高检索性能。我们进一步介绍了其检索后变体GenQREnsembleRF,以整合伪相关反馈。在四个IR基准测试的评估中,我们发现GenQREnsemble生成了更好的重构,相对nDCG@10提升高达18%,MAP提升高达24%,超过了先前零样本方法的最新水平。在MSMarco Passage Ranking任务中,GenQREnsembleRF利用伪相关反馈取得了5%的MRR相对增益,利用相关反馈文档取得了9%的nDCG@10相对增益。

更新时间: 2024-04-04 18:35:25

领域: cs.IR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03746v1

Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations

The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as `hallucinations'. Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present). Participants (N=419) from Prolific rated the perceived accuracy and engaged with content (e.g., like, dislike, share) in a Q/A format. Results indicate that humans rank content as truthful in the order genuine > minor hallucination > major hallucination and user engagement behaviors mirror this pattern. More importantly, we observed that warning improves hallucination detection without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations.

Updated: 2024-04-04 18:34:32

标题: 不同程度的伪造品:警示如何影响人类对LLM幻觉的感知和参与

摘要: 大型语言模型(LLMs)的广泛应用和转变性影响引发了人们对其产生不准确和虚构内容能力的担忧,被称为“幻觉”。鉴于与幻觉相关的潜在风险,人类应该能够识别它们。本研究旨在通过系统地改变幻觉程度(真实、轻微幻觉、重大幻觉)并检查其与警告(即潜在不准确性的警告:缺席 vs. 存在)的交互作用,以了解人类对LLM幻觉的感知。来自Prolific的参与者(N=419)通过问答形式对感知准确性进行评分并与内容互动(例如,喜欢、不喜欢、分享)。结果表明,人类将内容排列为真实 > 轻微幻觉 > 重大幻觉,用户参与行为也反映了这一模式。更重要的是,我们观察到警告可以改善幻觉检测,而不会显著影响真实内容的感知准确性。最后,我们提供了为未来辅助人类检测幻觉的工具的见解。

更新时间: 2024-04-04 18:34:32

领域: cs.HC,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03745v1

The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. In that approach, neural networks are constrained via their architecture to focus on relations between perceptual inputs, rather than the attributes of individual inputs. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.

Updated: 2024-04-04 18:08:52

标题: 关系瓶颈作为高效抽象的归纳偏置

摘要: 认知科学面临的一个核心挑战是解释如何从有限的经验中获得抽象概念。这经常被表述为连接主义与符号认知模型之间的二元对立。在这里,我们强调最近出现的一条研究路线,它表明通过利用一种我们称为关系瓶颈的归纳偏置,可以实现这两种方法的新型调和。在这种方法中,神经网络通过其架构被约束为关注感知输入之间的关系,而不是单个输入的属性。我们回顾了一系列采用这种方法以数据高效的方式引出抽象的模型,强调它们作为人类心智和大脑中抽象概念获得过程的候选模型的潜力。

更新时间: 2024-04-04 18:08:52

领域: cs.AI,cs.NE

下载: http://arxiv.org/abs/2309.06629v4

SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection

We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definition of task, role, and target concept, and automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic track and model-aware tracks for Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on Github.

Updated: 2024-04-04 18:01:21

标题: SHROOM-INDElab在SemEval-2024任务6中:基于零样本和少样本LLM的分类用于幻觉检测

摘要: 我们描述了阿姆斯特丹大学智能数据工程实验室团队参加SemEval-2024任务6竞赛的情况。SHROOM-INDElab系统建立在先前使用提示式编程和上下文学习与大型语言模型(LLMs)构建幻觉检测分类器的工作基础上,并通过结合任务、角色和目标概念的上下文特定定义,以及自动生成用于少样本提示方法的示例来扩展该工作。最终系统在任务6的模型无关赛道和模型感知赛道中分别取得了第四名和第六名的成绩,使用验证集进行的评估表明,系统的分类决策与众包人工标注者的决策一致。我们进一步发现,零样本方法比使用自动生成示例的少样本方法准确率更高。本文描述的系统代码可在Github上获得。

更新时间: 2024-04-04 18:01:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03732v1

JUICER: Data-Efficient Imitation Learning for Robotic Assembly

While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation. This paper proposes a pipeline for improving imitation learning performance with a small human demonstration budget. We apply our approach to assembly tasks that require precisely grasping, reorienting, and inserting multiple parts over long horizons and multiple task phases. Our pipeline combines expressive policy architectures and various techniques for dataset expansion and simulation-based data augmentation. These help expand dataset support and supervise the model with locally corrective actions near bottleneck regions requiring high precision. We demonstrate our pipeline on four furniture assembly tasks in simulation, enabling a manipulator to assemble up to five parts over nearly 2500 time steps directly from RGB images, outperforming imitation and data augmentation baselines.

Updated: 2024-04-04 18:00:15

标题: JUICER:用于机器人装配的数据高效模仿学习

摘要: 虽然从示范中学习在获取视觉动作策略方面非常强大,但在不需要大量示范数据集的情况下实现高性能模仿对于需要精确、长期操纵的任务仍然具有挑战性。本文提出了一种用于改善模仿学习性能的流程,只需很少的人类示范预算。我们将我们的方法应用于需要精确抓取、重新定位和插入多个零件的装配任务,在长期和多个任务阶段中。我们的流程结合了富有表现力的策略架构和各种数据集扩展和基于仿真的数据增强技术。这些技术有助于扩展数据集支持,并使用局部校正动作监督模型,使其在需要高精度的瓶颈区域附近执行校正动作。我们在仿真环境中展示了我们的流程,在四个家具装配任务中,使得机械臂能够直接从RGB图像中组装高达五个零件,表现优于模仿和数据增强基准。

更新时间: 2024-04-04 18:00:15

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2404.03729v1

OW-VISCap: Open-World Video Instance Segmentation and Captioning

Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer from highly overlapping predictions. To address these issues, we propose Open-World Video Instance Segmentation and Captioning (OW-VISCap), an approach to jointly segment, track, and caption previously seen or unseen objects in a video. For this, we introduce open-world object queries to discover never before seen objects without additional user-input. We generate rich and descriptive object-centric captions for each detected object via a masked attention augmented LLM input. We introduce an inter-query contrastive loss to ensure that the object queries differ from one another. Our generalized approach matches or surpasses state-of-the-art on three tasks: open-world video instance segmentation on the BURST dataset, dense video object captioning on the VidSTG dataset, and closed-world video instance segmentation on the OVIS dataset.
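The inter-query contrastive loss can be sketched as a repulsion term over the object-query embeddings (a generic formulation of ours; the paper's exact loss may differ):

    import torch
    import torch.nn.functional as F

    def inter_query_repulsion(queries, temperature=0.1):
        # queries: (num_queries, dim); penalize pairwise similarity between
        # distinct queries so each attends to a different object.
        q = F.normalize(queries, dim=-1)
        sim = q @ q.t() / temperature
        mask = ~torch.eye(q.size(0), dtype=torch.bool)
        return sim[mask].mean()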

Updated: 2024-04-04 17:59:58

标题: OW-VISCap:开放世界视频实例分割和字幕化

摘要: 开放世界视频实例分割是一项重要的视频理解任务。然而,大多数方法要么在封闭世界设置中运行,要么需要额外的用户输入,要么使用经典的基于区域的提议来识别以前未见过的对象。此外,这些方法只为检测到的对象分配一个单词标签,并不生成丰富的以对象为中心的描述。它们也经常受到高度重叠的预测的困扰。为了解决这些问题,我们提出了开放世界视频实例分割和字幕化(OW-VISCap)的方法,该方法同时在视频中对以前看过或未看过的对象进行分割、跟踪和字幕化。为此,我们引入了开放世界对象查询,以发现以前从未见过的对象而无需额外的用户输入。我们通过掩码注意增强的LLM输入为每个检测到的对象生成丰富和描述性的以对象为中心的字幕。我们引入了一种查询间对比损失,以确保对象查询彼此不同。我们的通用方法在三项任务上与或超过了最先进技术:BURST数据集上的开放世界视频实例分割,VidSTG数据集上的密集视频对象字幕,以及OVIS数据集上的封闭世界视频实例分割。

更新时间: 2024-04-04 17:59:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.03657v1

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm. To address the issue, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens. A novel attribute concentration module is also proposed to address the attribute binding problem. Without any image or human preference data, we use only 20K text prompts to fine-tune SDXL to obtain CoMat-SDXL. Extensive experiments show that CoMat-SDXL significantly outperforms the baseline model SDXL in two text-to-image alignment benchmarks and achieves state-of-the-art performance.

Updated: 2024-04-04 17:59:46

标题: CoMat:将文本到图像扩散模型与图像到文本概念匹配对齐

摘要: 扩散模型在文本到图像生成领域取得了巨大成功。然而,缓解文本提示与图像之间的不对齐仍然具有挑战性。导致不对齐的根本原因尚未得到广泛调查。我们观察到不对齐是由于令牌注意力激活不足引起的。我们进一步将这一现象归因于扩散模型对条件信息的利用不足,而这源于其训练范式。为了解决这个问题,我们提出了CoMat,一种端到端的扩散模型微调策略,具有图像到文本概念匹配机制。我们利用图像字幕模型来衡量图像到文本的对齐,并指导扩散模型重新审视被忽略的令牌。我们还提出了一种新颖的属性集中模块来解决属性绑定问题。在没有任何图像或人类偏好数据的情况下,我们仅使用2万个文本提示来对SDXL进行微调,以获得CoMat-SDXL。大量实验证明,CoMat-SDXL在两个文本到图像对齐基准测试中明显优于基线模型SDXL,并实现了最先进的性能。

更新时间: 2024-04-04 17:59:46

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03653v1

Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control problems. Controls provides an interesting case study for LLM reasoning due to its combination of mathematical theory and engineering design. We introduce ControlBench, a benchmark dataset tailored to reflect the breadth, depth, and complexity of classical control design. We use this dataset to study and evaluate the problem-solving abilities of these LLMs in the context of control engineering. We present evaluations conducted by a panel of human experts, providing insights into the accuracy, reasoning, and explanatory prowess of LLMs in control engineering. Our analysis reveals the strengths and limitations of each LLM in the context of classical control, and our results imply that Claude 3 Opus has become the state-of-the-art LLM for solving undergraduate control problems. Our study serves as an initial step towards the broader goal of employing artificial general intelligence in control engineering.

Updated: 2024-04-04 17:58:38

标题: 大语言模型在控制工程领域的能力:对GPT-4、Claude 3 Opus和Gemini 1.0 Ultra的基准研究

摘要: 在本文中,我们探讨了最先进的大型语言模型(LLMs)如GPT-4、Claude 3 Opus和Gemini 1.0 Ultra在解决本科水平控制问题方面的能力。控制问题提供了LLM推理的一个有趣案例研究,因为它结合了数学理论和工程设计。我们引入了ControlBench,一个专门设计以反映经典控制设计广度、深度和复杂性的基准数据集。我们使用这个数据集来研究并评估这些LLMs在控制工程背景下的问题解决能力。我们提供了由人类专家小组进行的评估,以洞察LLMs在控制工程中的准确性、推理能力和解释能力。我们的分析揭示了每个LLM在经典控制背景下的优势和局限性,我们的结果表明Claude 3 Opus已成为解决本科控制问题的最先进LLM。我们的研究是朝着在控制工程中应用通用人工智能这一更广泛目标迈出的初步一步。

更新时间: 2024-04-04 17:58:38

领域: math.OC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03647v1

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
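The reported log-linear trend can be checked with a one-line fit: if zero-shot accuracy grows linearly in log(concept frequency), then linear gains cost exponentially more data. A sketch on hypothetical numbers (ours, not the paper's measurements):

    import numpy as np

    freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])        # concept pretraining frequency
    acc = np.array([0.22, 0.31, 0.40, 0.49, 0.58])    # hypothetical zero-shot accuracy

    slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
    # slope ~ 0.09: each 10x more pretraining data buys ~9 accuracy points here,
    # i.e. a log-linear, sample-inefficient scaling trend.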

Updated: 2024-04-04 17:58:02

标题: 没有指数级数据就没有“零样本”:预训练概念频率决定多模态模型性能

摘要: 网络爬取的预训练数据集是多模态模型(例如用于分类/检索的CLIP和用于图像生成的Stable-Diffusion)令人印象深刻的“零样本”评估性能的基础。然而,目前尚不清楚“零样本”泛化这一概念对这些多模态模型有多大意义,因为尚不清楚它们的预训练数据集在多大程度上涵盖了“零样本”评估所针对的下游概念。在这项工作中,我们提出一个问题:多模态模型在下游概念上的表现如何受到这些概念在其预训练数据集中出现频率的影响?我们在34个模型和五个标准预训练数据集(CC-3M、CC-12M、YFCC-15M、LAION-400M、LAION-Aesthetics)上全面调查了这个问题,生成了超过300GB的数据产物。我们一致发现,多模态模型远非展现出“零样本”泛化,而是需要指数级增加的数据才能在下游“零样本”性能上获得线性改进,遵循一种样本低效的对数线性缩放趋势。即使在控制预训练数据集与下游数据集之间的样本级相似性,并在纯合成数据分布上测试时,这种趋势仍然存在。此外,基于我们的分析对长尾数据进行采样并对模型进行基准测试后,我们证明了各类多模态模型普遍表现不佳。我们发布了这个长尾测试集,即“Let it Wag!”基准,以促进这一方向的进一步研究。综上所述,我们的研究揭示了对训练数据的指数级需求,这意味着在大规模训练范式下实现“零样本”泛化能力的关键仍有待发现。

更新时间: 2024-04-04 17:58:02

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.04125v1

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), which traditionally separates reward learning and subsequent policy optimization. However, such a reward maximization approach is limited by the nature of "point-wise" rewards (such as Bradley-Terry model), which fails to express complex intransitive or cyclic preference relations. While advances on RLHF show reward learning and policy optimization can be merged into a single contrastive objective for stability, they yet still remain tethered to the reward maximization framework. Recently, a new wave of research sidesteps the reward maximization presumptions in favor of directly optimizing over "pair-wise" or general preferences. In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with theoretical generality from optimizing general preferences. Because DNO is a batched on-policy algorithm using a regression-based objective, its implementation is straightforward and efficient. Moreover, DNO enjoys monotonic improvement across iterations that help it improve even over a strong teacher (such as GPT-4). In our experiments, a resulting 7B parameter Orca-2.5 model aligned by DNO achieves the state-of-the-art win-rate against GPT-4-Turbo of 33% on AlpacaEval 2.0 (even after controlling for response length), an absolute gain of 26% (7% to 33%) over the initializing model. It outperforms models with far more parameters, including Mistral Large, Self-Rewarding LM (70B parameters), and older versions of GPT-4.
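As a point of reference for what a contrastive pairwise-preference objective looks like, here is a DPO-style loss sketch; note this is a generic stand-in of ours, not DNO's batched regression-based objective:

    import torch
    import torch.nn.functional as F

    def pairwise_preference_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        # logp_w / logp_l: policy log-probs of the preferred / dispreferred
        # response; ref_*: the same under a frozen reference policy.
        margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
        return -F.logsigmoid(margin).mean()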

Updated: 2024-04-04 17:56:41

标题: 直接纳什优化:教授语言模型利用一般偏好进行自我改进

摘要: 这篇论文研究利用来自强大预言机(oracle)的偏好反馈对大型语言模型(LLMs)进行后训练,以帮助模型迭代地自我改进。传统的LLM后训练方法涉及基于人类反馈的强化学习(RLHF),通常将奖励学习和随后的策略优化分开。然而,这种奖励最大化方法受到“逐点”奖励(如Bradley-Terry模型)本质的限制,无法表达复杂的非传递或循环偏好关系。虽然RLHF的进展表明奖励学习和策略优化可以合并为单一的对比目标以实现稳定性,但它们仍然受制于奖励最大化框架。最近,新一波研究绕开了奖励最大化的预设,转而直接在“成对”或一般偏好上进行优化。在本文中,我们介绍了直接纳什优化(DNO),这是一种可证明且可扩展的算法,它将对比学习的简单性和稳定性与优化一般偏好的理论普适性相结合。由于DNO是一种使用基于回归的目标的批式同策略(on-policy)算法,其实现简单而高效。此外,DNO在迭代过程中享有单调改进,这帮助它甚至超越强大的教师模型(如GPT-4)。在我们的实验中,经DNO对齐的7B参数Orca-2.5模型在AlpacaEval 2.0上对GPT-4-Turbo取得了33%的最新胜率(即使在控制响应长度之后也是如此),相对于初始模型获得了26%的绝对增益(从7%提升到33%)。它胜过了参数多得多的模型,包括Mistral Large、Self-Rewarding LM(70B参数)和旧版本的GPT-4。

更新时间: 2024-04-04 17:56:41

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2404.03715v1

WorDepth: Variational Language Prior for Monocular Depth Estimation

Three-dimensional (3D) reconstruction from a single image is an ill-posed problem with inherent ambiguities, i.e. scale. Predicting a 3D scene from text description(s) is similarly ill-posed, i.e. spatial arrangements of objects described. We investigate the question of whether two inherently ambiguous modalities can be used in conjunction to produce metric-scaled reconstructions. To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene. To this end, we begin by encoding the text caption as a mean and standard deviation; using a variational framework, we learn the distribution of the plausible metric reconstructions of 3D scenes corresponding to the text captions as a prior. To "select" a specific reconstruction or depth map, we encode the given image through a conditional sampler that samples from the latent space of the variational text encoder, which is then decoded to the output depth map. Our approach is trained alternatingly between the text and image branches: in one optimization step, we predict the mean and standard deviation from the text description and sample from a standard Gaussian, and in the other, we sample using a (image) conditional sampler. Once trained, we directly predict depth from the encoded text using the conditional sampler. We demonstrate our approach on indoor (NYUv2) and outdoor (KITTI) scenarios, where we show that language can consistently improve performance in both.
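The alternating scheme can be summarized in a few lines: the text branch samples the latent with the standard reparameterization trick, while the image branch samples through a conditional sampler over the same latent space. A hedged sketch (tensor shapes and the sampler interface are our assumptions, not the paper's code):

    import torch

    def sample_latent(text_mu, text_logvar, cond_sampler=None, image_feat=None):
        if cond_sampler is None:                      # text-branch training step
            eps = torch.randn_like(text_mu)
            return text_mu + torch.exp(0.5 * text_logvar) * eps   # z = mu + sigma*eps
        return cond_sampler(image_feat, text_mu, text_logvar)     # image-branch step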

Updated: 2024-04-04 17:54:33

标题: WorDepth:单目深度估计的变分语言先验

摘要: 从单个图像进行三维(3D)重建是一个具有固有模糊性(即尺度)的不适定问题。从文本描述预测3D场景同样是不适定的,即所描述物体的空间排列存在模糊性。我们研究这样一个问题:两种本身即具有模糊性的模态能否结合使用,以产生具有度量尺度的重建。为了验证这一点,我们专注于单目深度估计,即从单个图像预测稠密深度图的问题,但附加了描述场景的文本标题。为此,我们首先将文本标题编码为均值和标准差;使用变分框架,我们学习与文本标题对应的3D场景可能的度量重建的分布,作为先验。为了“选择”特定的重建或深度图,我们通过一个条件采样器对给定图像进行编码,该采样器从变分文本编码器的潜在空间中采样,然后解码为输出深度图。我们的方法在文本和图像分支之间交替训练:在一个优化步骤中,我们从文本描述中预测均值和标准差,并从标准高斯分布中采样;在另一个步骤中,我们使用(图像)条件采样器进行采样。训练完成后,我们使用条件采样器直接从编码的文本预测深度。我们在室内(NYUv2)和室外(KITTI)场景中展示了我们的方法,结果表明语言可以在两种场景中持续提升性能。

更新时间: 2024-04-04 17:54:33

领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM

下载: http://arxiv.org/abs/2404.03635v1

SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA

One of today's main concerns is to bring Artificial Intelligence power to embedded systems for edge applications. The hardware resources and power consumption required by state-of-the-art models are incompatible with the constrained environments observed in edge systems, such as IoT nodes and wearable devices. Spiking Neural Networks (SNNs) can represent a solution in this sense: inspired by neuroscience, they reach unparalleled power and resource efficiency when run on dedicated hardware accelerators. However, when designing such accelerators, the amount of choices that can be taken is huge. This paper presents SpikExplorer, a modular and flexible Python tool for hardware-oriented Automatic Design Space Exploration to automate the configuration of FPGA accelerators for SNNs. Using Bayesian optimization, SpikExplorer enables hardware-centric multi-objective optimization, supporting factors such as accuracy, area, latency, power, and various combinations during the exploration process. The tool searches the optimal network architecture, neuron model, and internal and training parameters, trying to reach the desired constraints imposed by the user. It allows for a straightforward network configuration, providing the full set of explored points for the user to pick the trade-off that best fits the needs. The potential of SpikExplorer is showcased using three benchmark datasets. It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image, making it a powerful tool for automatically optimizing SNNs.

Updated: 2024-04-04 17:53:08

标题: SpikeExplorer:面向硬件的FPGA上脉冲神经网络设计空间探索

摘要: 当今一个主要关注点是将人工智能能力引入面向边缘应用的嵌入式系统。最先进模型所需的硬件资源和功耗与边缘系统(如物联网节点和可穿戴设备)中受限的环境不兼容。脉冲神经网络(SNNs)可以在这方面提供解决方案:受神经科学启发,它们在专用硬件加速器上运行时具有无与伦比的功耗和资源效率。然而,在设计这类加速器时,可做的选择数量是巨大的。本文介绍了SpikExplorer,这是一个模块化且灵活的Python工具,用于面向硬件的自动设计空间探索,以自动配置用于SNNs的FPGA加速器。借助贝叶斯优化,SpikExplorer实现了以硬件为中心的多目标优化,在探索过程中支持准确性、面积、延迟、功耗及其各种组合等因素。该工具搜索最佳的网络架构、神经元模型以及内部参数和训练参数,力求满足用户施加的约束。它支持简单直接的网络配置,并向用户提供完整的已探索点集合,以便用户选择最符合需求的权衡方案。SpikExplorer的潜力通过三个基准数据集得到展示。它在MNIST数据集上达到了95.8%的准确率,每张图像功耗为180mW、延迟为0.12ms,使其成为自动优化SNNs的强大工具。

更新时间: 2024-04-04 17:53:08

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2404.03714v1

Training LLMs over Neurally Compressed Text

In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it were possible to train LLMs directly over neurally compressed text, this would confer advantages in training and serving efficiency, as well as easier handling of long text spans. The main obstacle to this goal is that strong compression tends to produce opaque outputs that are not well-suited for learning. In particular, we find that text naïvely compressed via Arithmetic Coding is not readily learnable by LLMs. To overcome this, we propose Equal-Info Windows, a novel compression technique whereby text is segmented into blocks that each compress to the same bit length. Using this method, we demonstrate effective learning over neurally compressed text that improves with scale, and outperforms byte-level baselines by a wide margin on perplexity and inference speed benchmarks. While our method delivers worse perplexity than subword tokenizers for models trained with the same parameter count, it has the benefit of shorter sequence lengths. Shorter sequence lengths require fewer autoregressive generation steps, and reduce latency. Finally, we provide extensive analysis of the properties that contribute to learnability, and offer concrete suggestions for how to further improve the performance of high-compression tokenizers.
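The Equal-Info Windows idea can be sketched independently of the exact compressor: walk the text, accumulate the surprisal (in bits) assigned by a probabilistic model, and cut a window whenever the bit budget is reached, so each window compresses to roughly the same length. A toy character-level version (ours; the paper instead resets an arithmetic coder at each window boundary):

    import math

    def equal_info_windows(text, char_logprob, bits_per_window=64.0):
        windows, start, bits = [], 0, 0.0
        for i, ch in enumerate(text):
            bits += -char_logprob(ch) / math.log(2)   # surprisal in bits
            if bits >= bits_per_window:
                windows.append(text[start:i + 1])
                start, bits = i + 1, 0.0
        if start < len(text):
            windows.append(text[start:])
        return windows

    # Uniform toy model over 27 symbols: every character costs log2(27) bits.
    w = equal_info_windows("equal info windows demo text", lambda c: -math.log(27))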

Updated: 2024-04-04 17:48:28

标题: 在神经压缩文本上训练LLMs

摘要: 在本文中,我们探讨了在高度压缩的文本上训练大型语言模型(LLMs)的想法。标准的子词分词器只能以较小的倍率压缩文本,而神经文本压缩器可以实现高得多的压缩率。如果能够直接在神经压缩的文本上训练LLMs,将在训练和服务效率方面带来优势,并使长文本片段的处理更为容易。实现这一目标的主要障碍在于,强压缩往往会产生不适合学习的不透明输出。特别是,我们发现简单地通过算术编码压缩的文本对LLMs来说不易学习。为了克服这一障碍,我们提出了Equal-Info Windows,这是一种新颖的压缩技术,将文本分割为每块都压缩到相同比特长度的块。使用这种方法,我们展示了在神经压缩文本上的有效学习,其效果随规模增加而提升,并且在困惑度和推理速度基准上大幅优于字节级基线。虽然在参数量相同的模型上,我们的方法得到的困惑度比子词分词器更差,但它具有序列长度更短的优点。更短的序列长度意味着更少的自回归生成步骤,并降低延迟。最后,我们对影响可学习性的各种性质进行了广泛分析,并就如何进一步提升高压缩率分词器的性能提出了具体建议。

更新时间: 2024-04-04 17:48:28

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.03626v1

Standardizing Knowledge Engineering Practices with a Reference Architecture

Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used given the importance of high-quality knowledge for reliable intelligent agents. Meanwhile, the scope of knowledge engineering, as apparent from its target tasks and use cases, has been shifting, together with its paradigms such as expert systems, semantic web, and language modeling. The intended use cases and supported user requirements between these paradigms have not been analyzed globally, as new paradigms often satisfy prior pain points while possibly introducing new ones. The recent abstraction of systemic patterns into a boxology provides an opening for aligning the requirements and use cases of knowledge engineering with the systems, components, and software that can satisfy them best. This paper proposes a vision of harmonizing the best practices in the field of knowledge engineering by leveraging the software engineering methodology of creating reference architectures. We describe how a reference architecture can be iteratively designed and implemented to associate user needs with recurring systemic patterns, building on top of existing knowledge engineering workflows and boxologies. We provide a six-step roadmap that can enable the development of such an architecture, providing an initial design and outcome of the definition of architectural scope, selection of information sources, and analysis. We expect that following through on this vision will lead to well-grounded reference architectures for knowledge engineering, will advance the ongoing initiatives of organizing the neurosymbolic knowledge engineering space, and will build new links to the software architectures and data science communities.

Updated: 2024-04-04 17:46:32

标题: 使用参考架构标准化知识工程实践

摘要: 知识工程是创建和维护知识生成系统的过程。在计算机科学和人工智能的历史中,知识工程工作流程被广泛使用,因为高质量的知识对于可靠的智能代理至关重要。与此同时,从其目标任务和用例中显而易见,知识工程的范围正在转变,同时也伴随着专家系统、语义网和语言建模等范式的变化。这些范式之间的预期用例和支持的用户需求尚未全球分析,因为新的范式通常会满足先前的问题点,同时可能引入新的问题点。将系统模式最近抽象为boxology为将知识工程的需求和用例与最能满足它们的系统、组件和软件进行对齐提供了一个契机。本文提出了通过利用创建参考架构的软件工程方法来协调知识工程领域最佳实践的愿景。我们描述了如何迭代地设计和实施一个参考架构,以将用户需求与重复的系统模式联系起来,建立在现有的知识工程工作流程和boxology之上。我们提供了一个六步路线图,可以促进这种架构的开发,提供了对架构范围的初始设计和结果、信息来源的选择和分析。我们期望贯彻这一愿景将为知识工程提供扎实的参考架构,推进组织神经符号知识工程空间的正在进行的倡议,并与软件架构和数据科学社区建立新的联系。

更新时间: 2024-04-04 17:46:32

领域: cs.AI,cs.SE

下载: http://arxiv.org/abs/2404.03624v1

Explaining Explainability: Understanding Concept Activation Vectors

Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.
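The CAV recipe itself is compact: collect a layer's activations for concept exemplars and for random counterexamples, fit a linear probe, and take the (normalized) weight vector as the concept direction. A standard sketch, assuming activations are already extracted and flattened:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def compute_cav(concept_acts, random_acts):
        # Rows are flattened activations at one layer.
        X = np.concatenate([concept_acts, random_acts])
        y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        cav = probe.coef_[0]
        return cav / np.linalg.norm(cav)   # unit concept direction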

Updated: 2024-04-04 17:46:20

标题: 解释可解释性:理解概念激活向量

摘要: 最近的可解释性方法提议使用基于概念的解释将深度学习模型的内部表示转化为人类熟悉的语言:概念。这需要理解神经网络表示空间中存在哪些概念。一种流行的找到概念的方法是概念激活向量(CAVs),通过使用概念示例的探针数据集进行学习。在这项工作中,我们调查了CAVs的三个特性。CAVs可能是:(1)在不同层之间不一致,(2)与不同概念纠缠在一起,以及(3)空间相关。每个特性都提供了解释模型的挑战和机会。我们引入了旨在检测这些特性存在的工具,提供了关于它们如何影响派生解释的见解,并提供了减少它们影响的建议。理解这些特性可以利用我们的优势。例如,我们引入了空间相关的CAVs来测试模型是否在特定概念和类别方面具有平移不变性。我们的实验在ImageNet和一个新的合成数据集Elements上进行。Elements旨在捕获概念和类别之间已知的真实关系。我们发布这个数据集以促进进一步研究理解和评估可解释性方法。

更新时间: 2024-04-04 17:46:20

领域: cs.LG,cs.AI,cs.CV,cs.HC,I.2.6

下载: http://arxiv.org/abs/2404.03713v1

Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph

Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of common factual knowledge information. However, unravelling the underlying reasoning of LLMs and explaining their internal mechanisms of exploiting this factual knowledge remain active areas of investigation. Our work analyzes the factual knowledge encoded in the latent representation of LLMs when prompted to assess the truthfulness of factual claims. We propose an end-to-end framework that jointly decodes the factual knowledge embedded in the latent space of LLMs from a vector space to a set of ground predicates and represents its evolution across the layers using a temporal knowledge graph. Our framework relies on the technique of activation patching which intervenes in the inference computation of a model by dynamically altering its latent representations. Consequently, we neither rely on external models nor training processes. We showcase our framework with local and global interpretability analyses using two claim verification datasets: FEVER and CLIMATE-FEVER. The local interpretability analysis exposes different latent errors from representation to multi-hop reasoning errors. On the other hand, the global analysis uncovered patterns in the underlying evolution of the model's factual knowledge (e.g., store-and-seek factual information). By enabling graph-based analyses of the latent representations, this work represents a step towards the mechanistic interpretability of LLMs.
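Activation patching itself needs no external tooling: a forward hook that overwrites one module's output with activations cached from another run is enough. A minimal PyTorch sketch (module and input names are placeholders):

    import torch

    def run_with_patch(model, layer, patched_output, inputs):
        # Overwrite `layer`'s output with activations cached elsewhere,
        # then observe how the model's prediction changes downstream.
        def hook(module, args, output):
            return patched_output            # returned value replaces the output
        handle = layer.register_forward_hook(hook)
        try:
            with torch.no_grad():
                return model(**inputs)
        finally:
            handle.remove()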

Updated: 2024-04-04 17:45:59

标题: 揭示LLMs:时间知识图中潜在表示的演变

摘要: 大型语言模型(LLMs)展示了惊人的能力,可以回忆大量常见的事实知识信息。然而,揭示LLMs的潜在推理和解释它们利用这些事实知识的内部机制仍然是研究的一个活跃领域。我们的工作分析了在提示评估事实性声明真实性时LLMs中编码的事实知识。我们提出了一个端到端框架,通过将LLMs的潜在空间中嵌入的事实知识从向量空间解码为一组基本谓词,并使用时间知识图表示其在各层之间的演变。我们的框架依赖于激活补丁技术,通过动态改变其潜在表示干预模型的推理计算。因此,我们既不依赖外部模型也不需要训练过程。我们使用两个声明验证数据集:FEVER和CLIMATE-FEVER展示了我们的框架的本地和全局可解释性分析。本地可解释性分析揭示了从表示到多跳推理错误的不同潜在错误。另一方面,全局分析揭示了模型事实知识演变中的模式(例如,存储和查找事实信息)。通过启用基于图的潜在表示分析,这项工作代表了朝着对LLMs的机械可解释性迈出的一步。

更新时间: 2024-04-04 17:45:59

领域: cs.CL,cs.AI,cs.CY

下载: http://arxiv.org/abs/2404.03623v1

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in ultra-low precisions, such as sub-4-bit. In this work, we present KVQuant, which addresses this problem by incorporating novel methods for quantizing cached KV activations, including: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges; and (v) Q-Norm, where we normalize quantization centroids in order to mitigate distribution shift, providing additional benefits for 2-bit quantization. By applying our method to the LLaMA, LLaMA-2, and Mistral models, we achieve $<0.1$ perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches. Our method enables serving the LLaMA-7B model with a context length of up to 1 million on a single A100-80GB GPU and up to 10 million on an 8-GPU system.
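Of the listed techniques, per-channel Key quantization is the easiest to sketch: share one scale per channel (rather than per token) so that channel-wise outliers in the Key activations do not blow up the quantization range. A uniform-quantization sketch (ours; the paper additionally uses non-uniform datatypes, pre-RoPE quantization, and dense-and-sparse decomposition):

    import torch

    def quantize_keys_per_channel(keys, n_bits=3):
        # keys: (batch, heads, seq, d_head); one scale per channel (last dim).
        qmax = 2 ** (n_bits - 1) - 1
        scale = keys.abs().amax(dim=(0, 1, 2), keepdim=True) / qmax
        q = torch.clamp(torch.round(keys / scale), -qmax - 1, qmax)
        return q.to(torch.int8), scale       # dequantize as q * scale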

Updated: 2024-04-04 17:45:34

标题: KVQuant:借助KV缓存量化迈向1000万上下文长度的LLM推理

摘要: LLMs越来越多地用于文档分析和摘要等需要大上下文窗口的应用,而在这些大上下文窗口下,KV缓存激活成为推理过程中内存消耗的主要来源。量化是一种有前途的压缩KV缓存激活的方法;然而,现有的解决方案无法在超低精度(如低于4位)下准确地表示激活。在这项工作中,我们提出了KVQuant,通过引入若干新颖的KV缓存激活量化方法来解决这个问题,包括:(i)逐通道Key量化,我们调整对Key激活进行量化的维度,以更好地匹配其分布;(ii)RoPE前Key量化,在旋转位置嵌入之前对Key激活进行量化,以减轻其对量化的影响;(iii)非均匀KV缓存量化,我们推导出按层敏感度加权的非均匀数据类型,以更好地表示分布;(iv)逐向量的稠密-稀疏量化,我们对每个向量单独隔离异常值,以最小化量化范围的偏斜;(v)Q-Norm,我们对量化质心进行归一化以减轻分布偏移,为2位量化带来额外收益。通过将我们的方法应用于LLaMA、LLaMA-2和Mistral模型,我们在Wikitext-2和C4上以3位量化实现了小于0.1的困惑度退化,优于现有方法。我们的方法使得在单个A100-80GB GPU上能够以高达100万的上下文长度部署LLaMA-7B模型,在8-GPU系统上则可达1000万。

更新时间: 2024-04-04 17:45:34

领域: cs.LG

下载: http://arxiv.org/abs/2401.18079v3

On the Efficiency of Convolutional Neural Networks

Since the breakthrough performance of AlexNet in 2012, convolutional neural networks (convnets) have grown into extremely powerful vision models. Deep learning researchers have used convnets to produce accurate results that were unachievable a decade ago. Yet computer scientists make computational efficiency their primary objective. Accuracy with exorbitant cost is not acceptable; an algorithm must also minimize its computational requirements. Confronted with the daunting computation that convnets use, deep learning researchers also became interested in efficiency. Researchers applied tremendous effort to find the convnet architectures that have the greatest efficiency. However, skepticism grew among researchers and engineers alike about the relevance of arithmetic complexity. Contrary to the prevailing view that latency and arithmetic complexity are irreconcilable, a simple formula relates both through computational efficiency. This insight enabled us to co-optimize the separate factors that determine latency. We observed that the degenerate conv2d layers that produce the best accuracy-complexity trade-off also have low operational intensity. Therefore, kernels that implement these layers use significant memory resources. We solved this optimization problem with block-fusion kernels that implement all layers of a residual block, thereby creating temporal locality, avoiding communication, and reducing workspace size. Our ConvFirst model with block-fusion kernels ran approximately four times as fast as the ConvNeXt baseline with PyTorch Inductor, at equal accuracy on the ImageNet-1K classification task. Our unified approach to convnet efficiency envisions a new era of models and kernels that achieve greater accuracy at lower cost.
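
The "simple formula" relating latency and arithmetic complexity through computational efficiency can be sketched with the roofline model: latency = operations / attainable throughput, where attainable throughput is capped by operational intensity. The hardware numbers below are illustrative, not the paper's measurements.

def latency_seconds(ops, bytes_moved, peak_flops, mem_bandwidth):
    intensity = ops / bytes_moved                          # FLOPs per byte
    attainable = min(peak_flops, intensity * mem_bandwidth)
    efficiency = attainable / peak_flops                   # in [0, 1]
    return ops / (efficiency * peak_flops)                 # == ops / attainable

# Illustrative A100-class numbers: 312 TFLOP/s peak, 2 TB/s memory bandwidth.
# A low-intensity layer (2 FLOPs/byte) is bandwidth-bound, which is why
# block-fusion kernels that raise temporal locality help.
print(latency_seconds(ops=8e9, bytes_moved=4e9, peak_flops=312e12, mem_bandwidth=2e12))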

Updated: 2024-04-04 17:39:41

标题: 关于卷积神经网络的效率

摘要: 自2012年AlexNet的突破性表现以来,卷积神经网络(convnets)已经发展成为极其强大的视觉模型。深度学习研究人员利用convnets产生了准确的结果,这在十年前是无法实现的。然而,计算机科学家将计算效率作为他们的主要目标。准确性伴随着巨大的成本是不可接受的;一种算法还必须最小化其计算需求。面对convnets使用的令人生畏的计算量,深度学习研究人员也对效率产生了兴趣。研究人员付出了巨大努力,找到了具有最大效率的convnet架构。然而,研究人员和工程师对算术复杂性的相关性产生了怀疑。与普遍观点相反,延迟和算术复杂性并不是不可调和的,一个简单的公式通过计算效率将二者联系起来。这一洞察力使我们能够共同优化决定延迟的各个因素。我们观察到,产生最佳准确性-复杂性权衡的退化conv2d层也具有较低的操作强度。因此,实现这些层的内核使用了大量的内存资源。我们通过实现所有残差块层的块融合内核解决了这个优化问题,从而创造了时间局部性,避免了通信,并减少了工作空间大小。我们的ConvFirst模型使用块融合内核在ImageNet-1K分类任务上的准确率与PyTorch Inductor的ConvNeXt基线相等,运行速度大约是后者的四倍。我们对convnet效率的统一方法设想了一个新时代的模型和内核,实现了更高的准确性和更低的成本。

更新时间: 2024-04-04 17:39:41

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.03617v1

InsectMamba: Insect Pest Classification with State Space Model

The classification of insect pests is a critical task in agricultural technology, vital for ensuring food security and environmental sustainability. However, the complexity of pest identification, due to factors like high camouflage and species diversity, poses significant obstacles. Existing methods struggle with the fine-grained feature extraction needed to distinguish between closely related pest species. Although recent advancements have utilized modified network structures and combined deep learning approaches to improve accuracy, challenges persist due to the similarity between pests and their surroundings. To address this problem, we introduce InsectMamba, a novel approach that integrates State Space Models (SSMs), Convolutional Neural Networks (CNNs), Multi-Head Self-Attention mechanism (MSA), and Multilayer Perceptrons (MLPs) within Mix-SSM blocks. This integration facilitates the extraction of comprehensive visual features by leveraging the strengths of each encoding strategy. A selective module is also proposed to adaptively aggregate these features, enhancing the model's ability to discern pest characteristics. InsectMamba was evaluated against strong competitors across five insect pest classification datasets. The results demonstrate its superior performance and verify the significance of each model component by an ablation study.

Updated: 2024-04-04 17:34:21

标题: InsectMamba:利用状态空间模型进行昆虫害虫分类

摘要: 昆虫害分类是农业技术中的关键任务,对确保食品安全和环境可持续性至关重要。然而,由于高度伪装和物种多样性等因素,害虫识别的复杂性带来了重大障碍。现有方法在需要区分密切相关的害虫物种时,往往难以进行细粒度特征提取。尽管最近的进展利用了修改的网络结构并结合深度学习方法来提高准确性,但由于害虫及其周围环境之间的相似性,挑战仍然存在。为解决这一问题,我们引入了InsectMamba,一种集成了状态空间模型(SSMs)、卷积神经网络(CNNs)、多头自注意机制(MSA)和多层感知器(MLPs)的全新方法,在Mix-SSM块中实现了这些结合。这种整合通过利用每种编码策略的优势,促进了全面视觉特征的提取。还提出了一个选择性模块,可以自适应地聚合这些特征,增强模型识别害虫特征的能力。InsectMamba在五个昆虫害分类数据集上与强竞争对手进行了评估。结果表明其优越的性能,并通过消融研究验证了每个模型组件的重要性。

更新时间: 2024-04-04 17:34:21

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.03611v1

Sailor: Open Language Models for South-East Asia

We present Sailor, a family of open language models ranging from 0.5B to 7B parameters, tailored for South-East Asian (SEA) languages. These models are continually pre-trained from Qwen1.5, a strong base model for multilingual use cases, on 200B to 400B tokens, primarily covering the languages of English, Chinese, Vietnamese, Thai, Indonesian, Malay, and Lao. The training leverages several techniques, including BPE dropout for improving model robustness, aggressive data cleaning and deduplication, and small proxy models to optimize the data mixture. Experimental results on four typical tasks indicate that Sailor models demonstrate strong performance across different benchmarks, including commonsense reasoning, question answering, reading comprehension and examination. Embracing the open-source spirit, we share our insights through this report to spark a wider interest in developing large language models for multilingual use cases.

Updated: 2024-04-04 17:31:32

标题: Sailor:面向东南亚的开放语言模型

摘要: 我们介绍了Sailor,一个针对东南亚语言定制的一系列开放语言模型,参数范围从0.5B到7B。这些模型是从Qwen1.5不断预训练的,Qwen1.5是一个用于多语言使用案例的出色语言模型。从Qwen1.5开始,Sailor模型接受200B到400B的标记,主要涵盖英语、中文、越南语、泰语、印尼语、马来语和老挝语等语言。训练利用了几种技术,包括BPE dropout用于提高模型的鲁棒性、积极的数据清洗和去重以及小型代理模型以优化数据混合。在四项典型任务上的实验结果表明,Sailor模型在不同基准测试中表现出色,包括常识推理、问题回答、阅读理解和考试。秉持开源精神,我们通过这份报告分享我们的见解,以激发更广泛的兴趣,开发用于多语言使用案例的大型语言模型。

更新时间: 2024-04-04 17:31:32

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03608v1

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

Are $n$-gram language models still relevant in this era of neural large language models (LLMs)? Our answer is yes, and we showcase their values in both text analysis and improving neural LLMs. This was done by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens. This is the largest $n$-gram LM ever built. Second, existing $n$-gram LMs use small $n$ which hinders their performance; we instead allow $n$ to be arbitrarily large, by introducing a new $\infty$-gram LM with backoff. Instead of pre-computing $n$-gram count tables (which would be very expensive), we develop an engine named infini-gram -- powered by suffix arrays -- that can compute $\infty$-gram (as well as $n$-gram with arbitrary $n$) probabilities with millisecond-level latency. The $\infty$-gram framework and infini-gram engine enable us to conduct many novel and interesting analyses of human-written and machine-generated text: we find that the $\infty$-gram LM has fairly high accuracy for next-token prediction (47%), and can complement neural LLMs to greatly reduce their perplexity. When analyzing machine-generated text, we also observe irregularities in the machine--$\infty$-gram agreement level with respect to the suffix length, which indicates deficiencies in neural LLM pretraining and the positional embeddings of Transformers.
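
A toy sketch of the suffix-array idea: count occurrences of an arbitrary-length context by binary search over sorted suffixes, backing off to shorter suffixes when the context is unseen. This shows the concept only; the real engine indexes trillions of tokens far more efficiently. Requires Python 3.10+ for bisect's key= argument.

from bisect import bisect_left, bisect_right

def build_suffix_array(tokens):
    # O(n^2 log n) toy construction; real suffix arrays are built in near-linear time.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def occurrences(tokens, sa, pattern):
    key = lambda i: tokens[i:i + len(pattern)]
    return bisect_right(sa, pattern, key=key) - bisect_left(sa, pattern, key=key)

def infty_gram_prob(tokens, sa, context, next_token):
    """P(next_token | longest suffix of context seen in the corpus)."""
    for start in range(len(context)):          # back off to shorter suffixes
        denom = occurrences(tokens, sa, context[start:])
        if denom > 0:
            return occurrences(tokens, sa, context[start:] + [next_token]) / denom
    return None                                 # context entirely unseen

corpus = "a b a b c a b".split()
sa = build_suffix_array(corpus)
print(infty_gram_prob(corpus, sa, ["a", "b"], "c"))   # 1/3: "a b" seen 3x, "a b c" once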

Updated: 2024-04-04 17:28:38

标题: Infini-gram:将无限制的n-gram语言模型扩展到一万亿令牌

摘要: $n$-gram语言模型在这个神经大语言模型(LLMs)时代仍然具有相关性吗?我们的回答是肯定的,并且我们展示了它们在文本分析和改进神经LLMs方面的价值。这是通过两个方面对$n$-gram LM进行现代化实现的。首先,我们在与神经LLMs相同的数据规模上训练它们--5万亿个标记。这是迄今为止建立的最大的$n$-gram LM。其次,现有的$n$-gram LMs使用较小的$n$会降低其性能;我们相反允许$n$可以任意大,通过引入一个带有回退的新$\infty$-gram LM。我们开发了一个名为infini-gram的引擎,由后缀数组支持,可以以毫秒级的延迟计算$\infty$-gram(以及任意$n$的$n$-gram)概率,而不是预先计算$n$-gram计数表(这将非常昂贵)。$\infty$-gram框架和infini-gram引擎使我们能够进行许多有关人类撰写和机器生成文本的新颖有趣的分析:我们发现$\infty$-gram LM对下一个标记的预测具有相当高的准确性(47%),并且可以辅助神经LLMs大大降低其困惑度。在分析机器生成的文本时,我们还观察到机器与$\infty$-gram一致性水平与后缀长度有关的不规则性,这表明神经LLM预训练和Transformer的位置嵌入存在缺陷。

更新时间: 2024-04-04 17:28:38

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2401.17377v3

Analyzing Musical Characteristics of National Anthems in Relation to Global Indices

Music plays a huge part in shaping peoples' psychology and behavioral patterns. This paper investigates the connection between national anthems and different global indices with computational music analysis and statistical correlation analysis. We analyze national anthem musical data to determine whether certain musical characteristics are associated with peace, happiness, suicide rate, crime rate, etc. To achieve this, we collect national anthems from 169 countries and use computational music analysis techniques to extract pitch, tempo, beat, and other pertinent audio features. We then compare these musical characteristics with data on different global indices to ascertain whether a significant correlation exists. Our findings indicate that there may be a correlation between the musical characteristics of national anthems and the indices we investigated. The implications of our findings for music psychology and policymakers interested in promoting social well-being are discussed. This paper emphasizes the potential of musical data analysis in social research and offers a novel perspective on the relationship between music and social indices. The source code and data are made open-access for reproducibility and future research endeavors. It can be accessed at http://bit.ly/na_code.
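
A hedged sketch of the pipeline: extract a couple of audio features per anthem with librosa and correlate one of them with an index series. File paths and the index values are placeholders; the study extracts a richer feature set.

import numpy as np
import librosa
from scipy.stats import pearsonr

def anthem_features(path):
    y, sr = librosa.load(path)                       # waveform, sample rate
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)   # global tempo estimate
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    return float(tempo), float(centroid)

paths = ["anthems/country_001.wav", "anthems/country_002.wav", "anthems/country_003.wav"]
happiness = np.array([7.1, 5.4, 6.2])                # placeholder index values

tempos = np.array([anthem_features(p)[0] for p in paths])
r, p = pearsonr(tempos, happiness)
print(f"tempo vs. happiness: r={r:.2f}, p={p:.3f}")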

Updated: 2024-04-04 17:25:31

标题: 分析国歌的音乐特征与全球指标的关系

摘要: 音乐在塑造人们心理和行为模式方面起着巨大作用。本文利用计算音乐分析和统计相关性分析,研究了国歌与不同全球指数之间的关联。我们分析国歌的音乐数据,以确定某些音乐特征是否与和平、幸福、自杀率、犯罪率等有关。为实现这一目标,我们收集了来自169个国家的国歌,并使用计算音乐分析技术提取音高、节奏、节拍等相关音频特征。然后我们将这些音乐特征与不同全球指数的数据进行比较,以确定是否存在显著相关性。我们的研究结果表明,国歌的音乐特征与我们研究的指数之间可能存在相关性。本研究发现对于音乐心理学和有兴趣促进社会福祉的政策制定者具有重要意义。本文强调了音乐数据分析在社会研究中的潜力,并提供了一个关于音乐与社会指数关系的新视角。源代码和数据已开放获取,以便于重现和未来研究。可通过http://bit.ly/na_code 访问。

更新时间: 2024-04-04 17:25:31

领域: cs.SD,cs.AI,cs.IR,eess.AS

下载: http://arxiv.org/abs/2404.03606v1

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

We consider the problem of accurate quantization for language models, where both the weights and activations are uniformly quantized to 4 bits per parameter, the lowest bitwidth format natively supported by GPU hardware. In this context, the key challenge is activation quantization: it is known that language models contain outlier channels whose values on average are orders of magnitude higher than other channels, which prevents accurate low-bitwidth quantization with known techniques. We systematically study this phenomenon and find that these outlier channels emerge early in training, and that they occur more frequently in layers with residual streams. We then propose a simple strategy which regularizes a layer's inputs via quantization-aware training (QAT) and its outputs via activation kurtosis regularization. We show that regularizing both the inputs and outputs is crucial for preventing a model from "migrating" the difficulty of input quantization to the weights, which makes post-training quantization (PTQ) of weights more difficult. When combined with weight PTQ, we show that our approach can obtain a W4A4 model that performs competitively to the standard-precision W16A16 baseline.
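
A minimal PyTorch sketch of the output-side kurtosis penalty (the input-side QAT piece is omitted). The Gaussian target of 3 and the penalty weight are illustrative choices.

import torch

def kurtosis_penalty(h, target=3.0, eps=1e-6):
    """Push each channel's kurtosis toward the Gaussian value to
    discourage heavy-tailed outlier channels."""
    mu = h.mean(dim=0, keepdim=True)
    sigma = h.std(dim=0, keepdim=True) + eps
    k = (((h - mu) / sigma) ** 4).mean(dim=0)   # kurtosis of each channel
    return ((k - target) ** 2).mean()

h = torch.randn(128, 768, requires_grad=True)   # stand-in activations
reg = 1e-3 * kurtosis_penalty(h)                # added to the task loss in training
reg.backward()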

Updated: 2024-04-04 17:25:30

标题: 使用激活正则化减轻语言模型量化中异常通道的影响

摘要: 我们考虑了语言模型精确量化的问题,其中权重和激活都被均匀量化为每个参数4位,这是GPU硬件本机支持的最低位宽格式。在这种情况下,关键挑战是激活量化:已知语言模型包含异常通道,其值平均比其他通道高出数个数量级,这阻碍了使用已知技术进行准确的低位宽量化。我们系统地研究了这种现象,并发现这些异常通道在训练早期就出现,并且在具有残差流的层中更频繁出现。然后,我们提出了一种简单的策略,通过量化感知训练(QAT)对层的输入进行正则化,通过激活峰度正则化对输出进行正则化。我们展示了通过对输入和输出进行正则化对于防止模型将输入量化的困难“迁移到”权重是至关重要的,这使得权重的后训练量化(PTQ)更加困难。当与权重PTQ结合使用时,我们展示了我们的方法可以获得一个性能与标准精度W16A16基线相竞争的W4A4模型。

更新时间: 2024-04-04 17:25:30

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2404.03605v1

TableLlama: Towards Open Large Generalist Models for Tables

Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to automatically interpret, augment, and query tables. Current methods often require pretraining on tables or special model architecture design, are restricted to specific table types, or have simplifying assumptions about tables and tasks. This paper makes the first step towards developing open-source large language models (LLMs) as generalists for a diversity of table-based tasks. Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs. We further develop the first open-source generalist model for tables, TableLlama, by fine-tuning Llama 2 (7B) with LongLoRA to address the long context challenge. We experiment under both in-domain setting and out-of-domain setting. On 7 out of 8 in-domain tasks, TableLlama achieves comparable or better performance than the SOTA for each task, even though the latter often has task-specific designs. On 6 out-of-domain datasets, it achieves 5-44 absolute point gains compared with the base model, showing that training on TableInstruct enhances the model's generalizability. We open-source our dataset and trained model to boost future work on developing open generalist models for tables.

Updated: 2024-04-04 17:10:25

标题: TableLlama:朝着为表格开发开放的大型通用模型

摘要: 半结构化表格是无处不在的。已经有各种任务旨在自动解释、增强和查询表格。目前的方法通常需要对表格进行预训练或特殊模型架构设计,受限于特定类型的表格,或者对表格和任务有简化的假设。本文首次迈出了发展开源大型语言模型(LLMs)作为多样表格任务通用工具的第一步。为此,我们构建了TableInstruct,一个包含各种真实表格和任务的新数据集,用于指导调整和评估LLMs。我们进一步开发了第一个针对表格的开源通用模型TableLlama,通过使用LongLoRA对Llama 2(7B)进行微调,以解决长上下文挑战。我们在领域内和领域外两种设置下进行实验。在8个领域内任务中,TableLlama在7个任务中达到了与每个任务的最先进技术相当或更好的性能,尽管后者通常具有特定任务设计。在6个领域外数据集中,与基础模型相比,它实现了5-44个绝对点的增益,表明在TableInstruct上进行训练增强了模型的泛化能力。我们开源了我们的数据集和训练模型,以促进未来开发面向表格的开源通用模型的工作。

更新时间: 2024-04-04 17:10:25

领域: cs.CL,cs.AI,cs.DB

下载: http://arxiv.org/abs/2311.09206v3

Laser Learning Environment: A new environment for coordination-critical multi-agent tasks

We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dynamics). The challenge of such problems lies in the difficulty of escaping state space bottlenecks caused by interdependence steps since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms against LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritized experience replay and n-steps return hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as cooperative MARL benchmark.

Updated: 2024-04-04 17:05:42

标题: 激光学习环境:一种用于协调关键性多智能体任务的新环境

摘要: 我们介绍了激光学习环境(LLE),这是一个协作的多智能体强化学习环境,其中协调是核心。在LLE中,智能体彼此依赖以取得进展(相互依赖),必须共同采取特定的动作序列才能成功(完美协调),而完成这些共同动作不会产生任何中间奖励(零激励动态)。这类问题的挑战在于通过相互依赖步骤逃脱状态空间瓶颈的困难,因为逃脱这些瓶颈是不被奖励的。我们对多种最先进的值基多智能体强化学习算法在LLE上进行测试,并表明它们在协作任务上一直失败,因为它们无法逃脱状态空间瓶颈,即使它们成功实现了完美协调。我们发现,像优先经验回放和n步回报等Q学习扩展在零激励动态环境中阻碍了探索,并且发现具有随机网络蒸馏的内在好奇心不足以逃脱这些瓶颈。我们展示了解决这一问题的新方法的必要性,并展示了LLE作为合作多智能体强化学习基准的相关性。

更新时间: 2024-04-04 17:05:42

领域: cs.LG,cs.AI,cs.MA

下载: http://arxiv.org/abs/2404.03596v1

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that PET models consistently improve speech recognition accuracy compared to conventional Transducers. Our investigation also uncovers a phenomenon that we call error chain reactions. Instead of recognition errors being evenly spread throughout an utterance, they tend to group together, with subsequent errors often following earlier ones. Our analysis shows that PET models effectively mitigate this issue by substantially reducing the likelihood of the model generating additional errors following a prior one. Our implementation will be open-sourced with the NeMo toolkit.
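
A hedged sketch of a pronunciation-aware decoder embedding: each token's embedding combines a token-specific part with a component shared by all tokens mapped to the same pronunciation. The additive combination and table sizes are assumptions about the design, not the paper's exact formulation.

import torch
import torch.nn as nn

class PronunciationAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, num_prons, dim, token_to_pron):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)   # token-specific part
        self.pron_emb = nn.Embedding(num_prons, dim)     # shared by homophones
        self.register_buffer("token_to_pron", token_to_pron)  # [vocab_size]

    def forward(self, token_ids):
        shared = self.pron_emb(self.token_to_pron[token_ids])
        return self.token_emb(token_ids) + shared

# Toy usage: tokens 0 and 1 are homophones, so they share pronunciation id 0.
mapping = torch.tensor([0, 0, 1, 2])
emb = PronunciationAwareEmbedding(4, 3, 8, mapping)
print(emb(torch.tensor([[0, 1, 3]])).shape)   # torch.Size([1, 3, 8])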

Updated: 2024-04-04 17:03:47

标题: 具有发音感知嵌入的传感器,用于自动语音识别

摘要: 本文提出了一种具有发音感知嵌入(PET)的转录器。与传统的转录器不同,其中不同标记的解码器嵌入是独立训练的,PET模型的解码器嵌入融合了具有相同或相似发音的文本标记的共享组件。通过在多个汉语和韩语数据集上进行实验,我们展示了与传统转录器相比,PET模型能够持续提高语音识别准确性。我们的研究还揭示了一个我们称之为“错误链反应”的现象。识别错误不再均匀分布在话语中,而是倾向于聚集在一起,随后的错误通常跟随先前的错误。我们的分析表明,PET模型通过大幅减少模型在之前错误后生成额外错误的可能性,有效地缓解了这一问题。我们的实现将与NeMo工具包一起开源。

更新时间: 2024-04-04 17:03:47

领域: cs.CL,cs.LG,cs.SD,eess.AS

下载: http://arxiv.org/abs/2404.04295v1

ReFT: Representation Finetuning for Language Models

Parameter-efficient fine-tuning (PEFT) methods seek to adapt large models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. Here, we pursue this hypothesis by developing a family of $\textbf{Representation Finetuning (ReFT)}$ methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT). LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, Alpaca-Eval v1.0, and GLUE. In all these evaluations, LoReFT delivers the best balance of efficiency and performance, and almost always outperforms state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.
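
For concreteness, a sketch of the LoReFT intervention on a frozen hidden state h, following the low-rank linear subspace form h' = h + R^T(Wh + b - Rh) with R constrained to have orthonormal rows; initialization and the choice of where to apply the intervention are simplified here.

import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, dim, rank):
        super().__init__()
        # R: rank x dim with orthonormal rows; W, b: a learned projection.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(dim, rank, bias=False))
        self.W = nn.Linear(dim, rank)

    def forward(self, h):
        # h' = h + R^T (W h + b - R h); only R, W, b are trained.
        return h + (self.W(h) - self.R(h)) @ self.R.weight

hidden = torch.randn(2, 5, 768)               # frozen base-model states
print(LoReFTIntervention(768, rank=4)(hidden).shape)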

Updated: 2024-04-04 17:00:37

标题: ReFT: 语言模型的表示微调

摘要: 参数高效微调(PEFT)方法旨在通过对少量权重进行更新来调整大型模型。然而,许多先前的可解释性研究表明,表示编码了丰富的语义信息,这表明编辑表示可能是一个更强大的替代方法。在这里,我们通过开发一系列$\textbf{表示微调(ReFT)}$方法来追求这一假设。ReFT方法在一个冻结的基础模型上运行,并学习对隐藏表示进行特定于任务的干预。我们定义了ReFT家族的一个强实例,即低秩线性子空间ReFT(LoReFT)。LoReFT是现有PEFT的可插拔替代品,并学习比先前的最先进PEFT方法更高效的干预,参数效率提高了10倍至50倍。我们在八个常识推理任务、四个算术推理任务、Alpaca-Eval v1.0和GLUE上展示了LoReFT。在所有这些评估中,LoReFT提供了效率和性能的最佳平衡,并几乎总是优于最先进的PEFT方法。我们在https://github.com/stanfordnlp/pyreft 上公开发布了一个通用的ReFT训练库。

更新时间: 2024-04-04 17:00:37

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03592v1

SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

Generating natural human grasps necessitates consideration of not just object geometry but also semantic information. Solely depending on object shape for grasp generation confines the applications of prior methods in downstream tasks. This paper presents a novel semantic-based grasp generation method, termed SemGrasp, which generates a static human grasp pose by incorporating semantic information into the grasp representation. We introduce a discrete representation that aligns the grasp space with semantic space, enabling the generation of grasp postures in accordance with language instructions. A Multimodal Large Language Model (MLLM) is subsequently fine-tuned, integrating object, grasp, and language within a unified semantic space. To facilitate the training of SemGrasp, we have compiled a large-scale, grasp-text-aligned dataset named CapGrasp, featuring about 260k detailed captions and 50k diverse grasps. Experimental findings demonstrate that SemGrasp efficiently generates natural human grasps in alignment with linguistic intentions. Our code, models, and dataset are available publicly at: https://kailinli.github.io/SemGrasp.

Updated: 2024-04-04 16:58:26

标题: SemGrasp:通过语言对齐离散化生成语义抓取

摘要: 生成自然人类抓取需要考虑不仅对象几何形状,还有语义信息。仅仅依赖对象形状进行抓取生成限制了先前方法在下游任务中的应用。本文提出了一种新颖的基于语义的抓取生成方法,称为SemGrasp,通过将语义信息融入抓取表示来生成静态人类抓取姿势。我们引入了一个离散表示,将抓取空间与语义空间对齐,从而能够根据语言指令生成抓取姿势。随后对多模态大型语言模型(MLLM)进行微调,将对象、抓取和语言集成在统一的语义空间中。为了促进SemGrasp的训练,我们编制了一个名为CapGrasp的大规模、抓取文本对齐的数据集,包含约260k个详细标题和50k个多样的抓取。实验结果表明,SemGrasp能够有效地生成符合语言意图的自然人类抓取。我们的代码、模型和数据集可在以下网址公开获取:https://kailinli.github.io/SemGrasp。

更新时间: 2024-04-04 16:58:26

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.03590v1

Fairness Improvement with Multiple Protected Attributes: How Far Are We?

Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on F1-score when handling two protected attributes is about twice that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.
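
A minimal sketch of evaluating fairness beyond a single attribute: compute statistical parity difference (one of many possible metrics) per protected attribute and for an intersectional subgroup. The data here is synthetic.

import numpy as np

def spd(y_pred, group):
    """Statistical parity difference: P(y=1 | g) - P(y=1 | not g)."""
    return y_pred[group].mean() - y_pred[~group].mean()

rng = np.random.default_rng(0)
y_pred = rng.integers(0, 2, 1000)            # model decisions (placeholder)
sex = rng.integers(0, 2, 1000).astype(bool)
race = rng.integers(0, 2, 1000).astype(bool)

print("SPD(sex): ", spd(y_pred, sex))
print("SPD(race):", spd(y_pred, race))
print("SPD(sex & race):", spd(y_pred, sex & race))   # the unconsidered view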

Updated: 2024-04-04 16:54:25

标题: 多个受保护属性下的公平性改进:我们到底达到了多远?

摘要: 现有研究大多集中在改善机器学习(ML)软件对单个受保护属性的公平性,但考虑到许多用户拥有多个受保护属性,这是不现实的。本文对涵盖11种最先进的公平性改进方法的多个受保护属性进行了广泛研究。我们分析了这些方法在考虑多个受保护属性时在不同数据集、指标和ML模型上的有效性。结果显示,改善单个受保护属性的公平性可以大大降低未考虑的受保护属性的公平性。在多达88.3%的场景中观察到这种降低(平均为57.5%)。更令人惊讶的是,在考虑单个和多个受保护属性时,准确率损失几乎没有区别,表明准确性可以在多属性范式中保持。然而,在处理两个受保护属性时,F1分数的影响约为单个属性的两倍。这对未来的公平性研究具有重要意义:目前文献中普遍使用的仅报告准确性作为ML性能指标是不足够的。

更新时间: 2024-04-04 16:54:25

领域: cs.LG,cs.AI,cs.CY,cs.SE

下载: http://arxiv.org/abs/2308.01923v3

Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration

An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof of concept framework that used an LLM to anticipate 3 high-level tasks that served as goals for a classical planning system that computed a sequence of low-level actions for the agent to achieve these goals. This paper describes DaTAPlan, our framework that significantly extends our prior work toward human-robot collaboration. Specifically, DaTAPlan planner computes actions for an agent and a human to collaboratively and jointly achieve the tasks anticipated by the LLM, and the agent automatically adapts to unexpected changes in human action outcomes and preferences. We evaluate DaTAPlan capabilities in a realistic simulation environment, demonstrating accurate task anticipation, effective human-robot collaboration, and the ability to adapt to unexpected changes. Project website: https://dataplan-hrc.github.io

Updated: 2024-04-04 16:52:48

标题: 预测与协作:基于数据驱动的任务预测和基于知识的规划,用于人机协作

摘要: 一种协助人类日常生活活动的代理人可以通过预测即将发生的任务,更有效地协作。数据驱动方法代表了任务预测、规划和相关问题的最新技术,但这些方法需要大量资源且不透明。我们之前的工作介绍了一个概念验证框架,该框架使用LLM来预测3个高级任务,这些任务作为传统规划系统的目标,该系统计算代理人实现这些目标所需的一系列低级动作的顺序。本文描述了DaTAPlan,我们的框架显著扩展了我们之前的工作,朝着人机协作的方向。具体而言,DaTAPlan规划器计算代理人和人类共同合作实现LLM预测的任务所需的动作,并且代理人可以自动适应人类行动结果和偏好的意外变化。我们在一个真实的模拟环境中评估了DaTAPlan的能力,展示了准确的任务预测、有效的人机协作以及适应意外变化的能力。项目网站:https://dataplan-hrc.github.io

更新时间: 2024-04-04 16:52:48

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.03587v1

Leveraging Interpolation Models and Error Bounds for Verifiable Scientific Machine Learning

Effective verification and validation techniques for modern scientific machine learning workflows are challenging to devise. Statistical methods are abundant and easily deployed, but often rely on speculative assumptions about the data and methods involved. Error bounds for classical interpolation techniques can provide mathematically rigorous estimates of accuracy, but often are difficult or impractical to determine computationally. In this work, we present a best-of-both-worlds approach to verifiable scientific machine learning by demonstrating that (1) multiple standard interpolation techniques have informative error bounds that can be computed or estimated efficiently; (2) comparative performance among distinct interpolants can aid in validation goals; (3) deploying interpolation methods on latent spaces generated by deep learning techniques enables some interpretability for black-box models. We present a detailed case study of our approach for predicting lift-drag ratios from airfoil images. Code developed for this work is available in a public Github repository.
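
A 1-D sketch of the "computable error bound" idea: pair piecewise-linear interpolation with the classical bound max|f - p| <= h^2 max|f''| / 8, estimating the curvature constant from divided differences. The estimation heuristic is an assumption; the paper treats more general interpolants.

import numpy as np
from scipy.interpolate import interp1d

x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)                 # stand-in for simulation outputs
f = interp1d(x, y, kind="linear")

slopes = np.diff(y) / np.diff(x)
curvature = np.abs(np.diff(slopes) / np.diff(x[:-1])).max()   # ~ max |f''|
h = np.diff(x).max()
bound = curvature * h**2 / 8              # classical linear-interpolation bound

xq = np.linspace(0.0, 1.0, 1000)
observed = np.abs(f(xq) - np.sin(2 * np.pi * xq)).max()
print(f"bound={bound:.4f}  observed={observed:.4f}")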

Updated: 2024-04-04 16:52:17

标题: 利用插值模型和误差界限进行可验证的科学机器学习

摘要: 现代科学机器学习工作流的有效验证和验证技术很难设计。统计方法丰富且易于部署,但通常依赖于对涉及的数据和方法的推测性假设。经典插值技术的误差界可以提供数学严格的准确性估计,但通常在计算上难以确定或不切实际。在这项工作中,我们提出了一种可验证的科学机器学习的最佳实践方法,通过展示:(1)多种标准插值技术具有可以高效计算或估计的信息性误差界;(2)不同插值器之间的比较性能可以帮助验证目标;(3)在由深度学习技术生成的潜在空间上部署插值方法使得黑盒模型具有一定的可解释性。我们提供了一个详细的案例研究,用于从翼型图像预测升阻比。本工作开发的代码可在公共Github存储库中找到。

更新时间: 2024-04-04 16:52:17

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03586v1

UniHand: Privacy-preserving Universal Handover for Small-Cell Networks in 5G-enabled Mobile Communication with KCI Resilience

Introducing Small Cell Networks (SCN) has significantly improved wireless link quality, spectrum efficiency and network capacity, which has been viewed as one of the key technologies in the fifth-generation (5G) mobile network. However, this technology increases the frequency of handover (HO) procedures caused by the dense deployment of cells in the network with reduced cell coverage, bringing new security and privacy issues. The current 5G-AKA and HO protocols are vulnerable to security weaknesses, such as the lack of forward secrecy and identity confusion attacks. The high frequency of HOs might magnify these security and privacy concerns in the 5G mobile network. This work addresses these issues by proposing a secure privacy-preserving universal HO scheme (UniHand) for SCNs in 5G mobile communication. UniHand can achieve mutual authentication, strong anonymity, perfect forward secrecy, key-escrow-freeness and key compromise impersonation (KCI) resilience. To the best of our knowledge, this is the first scheme to achieve secure, privacy-preserving universal HO with KCI resilience for roaming users in a 5G environment. We demonstrate that our proposed scheme is resilient against all the essential security threats by performing a comprehensive formal security analysis and conducting relevant experiments to show the cost-effectiveness of the proposed scheme.

Updated: 2024-04-04 16:46:23

标题: UniHand:5G启用的移动通信中用于小区网络的保护隐私的通用切换与KCI弹性

摘要: 引入小区网络(SCN)显著改善了无线链路质量、频谱效率和网络容量,被视为第五代(5G)移动网络中的关键技术之一。然而,这项技术增加了由于网络中细胞密集部署而导致的切换(HO)过程的频率,降低了细胞覆盖范围,带来了新的安全和隐私问题。当前的5G-AKA和HO协议容易受到安全弱点的影响,如缺乏前向保密性和身份混淆攻击。高频率的HO可能会放大5G移动网络中的这些安全和隐私问题。本文通过提出一种用于5G移动通信中SCN的安全隐私保护通用HO方案(UniHand)来解决这些问题。UniHand能够实现相互认证、强匿名性、完美前向保密性、无密钥托管和密钥妥协冒名(KCI)韧性。据我们所知,这是第一个在5G环境中为漫游用户实现具有KCI韧性的安全、隐私保护通用HO方案。我们通过进行全面的形式安全分析和相关实验来展示我们提出的方案对所有重要安全威胁的韧性,并展示了提出方案的成本效益。

更新时间: 2024-04-04 16:46:23

领域: cs.CR

下载: http://arxiv.org/abs/2403.07817v2

SoK: Unintended Interactions among Machine Learning Defenses and Risks

Machine learning (ML) models cannot neglect risks to security, privacy, and fairness. Several defenses have been proposed to mitigate such risks. When a defense is effective in mitigating one risk, it may correspond to increased or decreased susceptibility to other risks. Existing research lacks an effective framework to recognize and explain these unintended interactions. We present such a framework, based on the conjecture that overfitting and memorization underlie unintended interactions. We survey existing literature on unintended interactions, accommodating them within our framework. We use our framework to conjecture on two previously unexplored interactions, and empirically validate our conjectures.

Updated: 2024-04-04 16:43:44

标题: SoK:机器学习防御措施之间的意外相互作用和风险

摘要: 机器学习(ML)模型不能忽视安全、隐私和公平性风险。已经提出了几种防御措施来减轻这些风险。当一种防御措施在减轻一种风险时,可能会对其他风险的敏感性增加或减少。现有研究缺乏一个有效的框架来识别和解释这些意外的相互作用。我们提出了这样一个框架,基于过拟合和记忆化的猜想来解释这些意外的相互作用。我们调查了现有文献中关于意外相互作用的内容,并将其纳入我们的框架中。我们利用我们的框架推测了两种以前未探讨的相互作用,并在实证上验证了我们的推测。

更新时间: 2024-04-04 16:43:44

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2312.04542v2

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

The sim-to-real gap, which represents the disparity between training and testing environments, poses a significant challenge in reinforcement learning (RL). A promising approach to addressing this challenge is distributionally robust RL, often framed as a robust Markov decision process (RMDP). In this framework, the objective is to find a robust policy that achieves good performance under the worst-case scenario among all environments within a pre-specified uncertainty set centered around the training environment. Unlike previous work, which relies on a generative model or a pre-collected offline dataset enjoying good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts with the training environment only and refines the policy through trial and error. In this robust RL paradigm, two main challenges emerge: managing distributional robustness while striking a balance between exploration and exploitation during data collection. Initially, we establish that sample-efficient learning without additional assumptions is unattainable owing to the curse of support shift; i.e., the potential disjointedness of the distributional supports between the training and testing environments. To circumvent such a hardness result, we introduce the vanishing minimal value assumption to RMDPs with a total-variation (TV) distance robust set, postulating that the minimal value of the optimal robust value function is zero. We prove that such an assumption effectively eliminates the support shift issue for RMDPs with a TV distance robust set, and present an algorithm with a provable sample complexity guarantee. Our work makes the initial step to uncovering the inherent difficulty of robust RL via interactive data collection and sufficient conditions for designing a sample-efficient algorithm accompanied by sharp sample complexity analysis.
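
In standard RMDP notation (not necessarily the paper's exact statement), the TV-distance robust objective above corresponds to the robust Bellman operator

$(\mathcal{T}_\rho V)(s) = \max_{a} \Big\{ r(s,a) + \gamma \inf_{P' :\, \mathrm{TV}(P'(\cdot \mid s,a),\, P(\cdot \mid s,a)) \le \rho} \mathbb{E}_{s' \sim P'} [V(s')] \Big\},$

and the vanishing minimal value assumption reads $\min_{s} V_\rho^{\star}(s) = 0$ for the optimal robust value function $V_\rho^{\star}$.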

Updated: 2024-04-04 16:40:22

标题: 分布鲁棒强化学习与交互式数据收集:基本困难和近似最优算法

摘要: 虚拟到现实差距代表了训练和测试环境之间的不一致性,在强化学习中构成了一个重要挑战。解决这一挑战的一种有前途的方法是分布鲁棒强化学习,通常被构建为一个鲁棒马尔可夫决策过程(RMDP)。在这个框架中,目标是找到一个在围绕训练环境的预定义不确定性集合中所有环境中实现良好性能的鲁棒策略。与以往依赖生成模型或预先收集的离线数据集,并且覆盖部署环境的工作不同,我们通过交互式数据收集来处理鲁棒强化学习,学习者仅与训练环境进行交互,并通过反复试错来完善策略。在这种鲁棒强化学习范式中,出现了两个主要挑战:在数据收集过程中管理分布鲁棒性,同时在探索和开发之间取得平衡。最初,我们确定了在没有额外假设的情况下实现高效样本学习是不可行的,这是由于支持转移的诅咒;即训练和测试环境之间的分布支持的潜在不一致性。为了规避这样一个困难结果,我们向具有总变分(TV)距离鲁棒集的RMDPs引入了消失的最小值假设,假设最优鲁棒值函数的最小值为零。我们证明了这种假设有效地消除了具有TV距离鲁棒集的RMDPs中的支持转移问题,并提出了一个具有可证明样本复杂度保证的算法。我们的工作为揭示通过交互式数据收集进行鲁棒强化学习的固有难度以及设计一个伴随着尖锐样本复杂度分析的高效样本算法的充分条件迈出了初始步骤。

更新时间: 2024-04-04 16:40:22

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03578v1

TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices

Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models still remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Distilled knowledge from the supervised attention-based VQA model trains the memory-aware compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone, equipped with an AI deck and a GAP8 microprocessor. The TinyVQA model achieved a low latency of 56 ms and consumed 693 mW of power while deployed on the tiny drone, showcasing its suitability for resource-constrained embedded systems.

Updated: 2024-04-04 16:38:49

标题: TinyVQA:用于资源受限设备上的视觉问答的紧凑多模态深度神经网络

摘要: 传统的机器学习模型通常需要强大的硬件,这使它们不适合部署在资源有限的设备上。微型机器学习(tinyML)已经成为在这些设备上运行机器学习模型的一种有前途的方法,但是将多个数据模态集成到tinyML模型中仍然是一个挑战,因为增加了复杂性、延迟和能耗。本文提出了TinyVQA,这是一个新颖的用于视觉问答任务的多模态深度神经网络,可以部署在资源受限的tinyML硬件上。TinyVQA利用监督注意力模型来学习如何使用视觉和语言模态回答关于图像的问题。从监督注意力型VQA模型中提炼的知识训练了内存感知紧凑的TinyVQA模型,并采用低位宽量化技术进一步压缩模型以在tinyML设备上部署。TinyVQA模型在用于灾后损坏评估的FloodNet数据集上进行了评估。这个紧凑的模型实现了79.5%的准确率,展示了TinyVQA在实际应用中的有效性。此外,该模型被部署在一架装备有AI板和GAP8微处理器的Crazyflie 2.0无人机上。TinyVQA模型在部署在小型无人机上时实现了56毫秒的低延迟,并消耗693毫瓦的功耗,展示了它在资源受限的嵌入式系统中的适用性。

更新时间: 2024-04-04 16:38:49

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03574v1

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Large Language Models (LLMs) have achieved remarkable success, where instruction tuning is the critical step in aligning LLMs with user intentions. In this work, we investigate how the instruction tuning adjusts pre-trained models with a focus on intrinsic changes. Specifically, we first develop several local and global explanation methods, including a gradient-based method for input-output attribution, and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. The impact of instruction tuning is then studied by comparing the explanations derived from the pre-trained and instruction-tuned models. This approach provides an internal perspective of the model shifts on a human-comprehensible level. Our findings reveal three significant impacts of instruction tuning: 1) It empowers LLMs to recognize the instruction parts of user prompts, and promotes the response generation constantly conditioned on the instructions. 2) It encourages the self-attention heads to capture more word-word relationships about instruction verbs. 3) It encourages the feed-forward networks to rotate their pre-trained knowledge toward user-oriented tasks. These insights contribute to a more comprehensive understanding of instruction tuning and lay the groundwork for future work that aims at explaining and optimizing LLMs for various applications. Our code and data are publicly available at https://github.com/JacksonWuxs/Interpret_Instruction_Tuning_LLMs.
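
A minimal sketch of the gradient-based input-output attribution used as a local explanation method: score each input position by the inner product of its embedding with the gradient of the explained logit (input x gradient). The Hugging Face style call pattern in the usage comment is an assumption.

import torch

def input_x_gradient(model, embeddings, target_logit_fn):
    """Per-position attribution: embedding . d(logit)/d(embedding)."""
    embeddings = embeddings.detach().requires_grad_(True)
    outputs = model(inputs_embeds=embeddings)
    target_logit_fn(outputs).backward()
    return (embeddings * embeddings.grad).sum(dim=-1)   # [batch, seq_len]

# Hypothetical usage with a decoder-only LM:
# emb = model.get_input_embeddings()(input_ids)
# attr = input_x_gradient(model, emb, lambda o: o.logits[0, -1, answer_id])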

Updated: 2024-04-04 16:30:31

标题: 从语言建模到指令跟随:理解在指令调整后LLM的行为转变

摘要: 大型语言模型(LLMs)取得了显著的成功,其中指导调整是将LLMs与用户意图对齐的关键步骤。在这项工作中,我们研究了指导调整如何调整预训练模型,重点关注内在变化。具体来说,我们首先开发了几种局部和全局解释方法,包括基于梯度的输入输出归因方法,以及解释自注意力和前馈层中的模式和概念的技术。然后通过比较从预训练和指导调整模型派生的解释来研究指导调整的影响。这种方法提供了对模型转变的内部透视,可在人类可理解的水平上理解。我们的研究发现指导调整有三个重要影响:1) 它使LLMs能够识别用户提示中的指导部分,并促进生成与指导恒定相关的响应。2) 它鼓励自注意力头捕捉有关指导动词的更多词-词关系。3) 它鼓励前馈网络将其预训练知识转向面向用户的任务。这些发现有助于更全面地理解指导调整,并为未来旨在解释和优化LLMs用于各种应用的工作奠定基础。我们的代码和数据可以在https://github.com/JacksonWuxs/Interpret_Instruction_Tuning_LLMs 上公开获取。

更新时间: 2024-04-04 16:30:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.00492v3

Hybrid Ground-State Quantum Algorithms based on Neural Schrödinger Forging

Entanglement forging based variational algorithms leverage the bi-partition of quantum systems for addressing ground state problems. The primary limitation of these approaches lies in the exponential summation required over the numerous potential basis states, or bitstrings, when performing the Schmidt decomposition of the whole system. To overcome this challenge, we propose a new method for entanglement forging employing generative neural networks to identify the most pertinent bitstrings, eliminating the need for the exponential sum. Through empirical demonstrations on systems of increasing complexity, we show that the proposed algorithm achieves comparable or superior performance compared to the existing standard implementation of entanglement forging. Moreover, by controlling the amount of required resources, this scheme can be applied to larger, as well as non-permutation-invariant systems, where the latter constraint is associated with the Heisenberg forging procedure. We substantiate our findings through numerical simulations conducted on spin models exhibiting one-dimensional ring, two-dimensional triangular lattice topologies, and nuclear shell model configurations.
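
For reference, the exponential summation mentioned above arises from the Schmidt decomposition across the bipartition; in standard notation (the paper's symbols may differ),

$|\psi\rangle = \sum_{b \in \{0,1\}^{N/2}} \lambda_b \, |\phi_b\rangle_A \otimes |\chi_b\rangle_B,$

where the bitstrings $b$ index the Schmidt basis. The proposed generative network is trained to propose only the few bitstrings with non-negligible $\lambda_b$, replacing the exponential sum with a sampled one.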

Updated: 2024-04-04 16:27:08

标题: 基于神经薛定谔锻造的混合基态量子算法

摘要: 基于纠缠锻造的变分算法利用量子系统的双分割来处理基态问题。这些方法的主要限制在于在执行整个系统的Schmidt分解时需要对众多潜在基态或比特串进行指数求和。为了克服这一挑战,我们提出了一种利用生成神经网络识别最相关比特串的新的纠缠锻造方法,消除了指数求和的需要。通过对逐渐增加复杂性的系统进行实证演示,我们展示了所提出的算法与现有标准纠缠锻造实现相比取得了可比较或更优越的性能。此外,通过控制所需资源的数量,该方案可以应用于更大的、以及非置换不变系统,后者约束与海森堡锻造程序相关。我们通过在展示一维环、二维三角格子拓扑结构和核壳模型配置的自旋模型上进行的数值模拟来证实我们的发现。

更新时间: 2024-04-04 16:27:08

领域: quant-ph,cond-mat.stat-mech,cs.LG

下载: http://arxiv.org/abs/2307.02633v2

Cameras as Rays: Pose Estimation via Ray Diffusion

Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures.
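
A sketch of a bundle-of-rays camera representation in the spirit described above, using Plücker coordinates (direction plus moment) with one ray per pixel/patch center. A pinhole model and world-to-camera convention x_cam = R x + t are assumed; details such as patch pooling are omitted.

import numpy as np

def rays_plucker(K, R, t, H, W):
    """Return [H*W, 6] rays: unit direction d and moment m = c x d."""
    c = -R.T @ t                                    # camera center in world
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    d = (R.T @ np.linalg.inv(K) @ pix.T).T          # directions in world frame
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    m = np.cross(np.broadcast_to(c, d.shape), d)    # moments encode position
    return np.concatenate([d, m], axis=1)

K = np.array([[100.0, 0.0, 8.0], [0.0, 100.0, 8.0], [0.0, 0.0, 1.0]])
print(rays_plucker(K, np.eye(3), np.zeros(3), H=16, W=16).shape)   # (256, 6)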

Updated: 2024-04-04 16:27:06

标题: 相机作为光线:通过光线扩散进行姿态估计

摘要: 估计相机姿态是三维重建的基本任务,鉴于视图稀疏采样(<10),仍然具有挑战性。与追求相机外参数全局参数化的现有方法相反,我们提出了一种将相机姿态视为一束光线的分布表示。这种表示允许与空间图像特征紧密耦合,提高姿态精度。我们观察到,这种表示自然适合于集合级别的转换器,并开发了一种基于回归的方法,将图像补丁映射到相应的光线上。为了捕捉稀疏视图姿态推断中固有的不确定性,我们调整了这种方法,学习了一个去噪扩散模型,使我们能够采样可能的模式同时提高性能。我们提出的方法,无论是基于回归还是扩散,都在CO3D上展示了最先进的相机姿态估计性能,同时可以泛化到未见的物体类别和野外捕捉。

更新时间: 2024-04-04 16:27:06

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.14817v3

Long-term Forecasting with TiDE: Time-series Dense Encoder

Recent work has shown that simple linear models can outperform several Transformer-based approaches in long-term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve a near-optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer-based model.
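
The building block behind TiDE's encoder-decoder is a dense residual unit; a sketch is below (hidden sizes and dropout are illustrative). Stacking such blocks over the flattened look-back window plus projected covariates gives the encoder; a symmetric stack decodes the horizon.

import torch
import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    """Linear-ReLU-Linear with dropout, a linear skip path, and layer norm."""
    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim), nn.Dropout(dropout),
        )
        self.skip = nn.Linear(in_dim, out_dim)
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, x):
        return self.norm(self.net(x) + self.skip(x))

block = ResidualMLPBlock(in_dim=512, hidden_dim=256, out_dim=128)
print(block(torch.randn(32, 512)).shape)   # torch.Size([32, 128])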

Updated: 2024-04-04 16:24:19

标题: TiDE: 时间序列密集编码器的长期预测

摘要: 最近的研究表明,简单的线性模型在长期时间序列预测中可以胜过几种基于Transformer的方法。受此启发,我们提出了一种基于多层感知器(MLP)的编码器-解码器模型,称为时间序列密集编码器(TiDE),用于长期时间序列预测,具有线性模型的简单性和速度,同时还能处理协变量和非线性依赖关系。从理论上讲,我们证明了我们模型的最简单线性模拟可以在一些假设下达到线性动力系统(LDS)的近乎最优误差率。从经验上看,我们表明我们的方法可以在流行的长期时间序列预测基准测试中与或胜过之前的方法,同时比最佳Transformer模型快5-10倍。

更新时间: 2024-04-04 16:24:19

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2304.08424v5

How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes

Large language models (LLM) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text, also known as in-context learning (ICL). While recent works have attempted to understand the mechanisms driving ICL, few have explored training strategies that incentivize these models to generalize to multiple tasks. Multi-task learning (MTL) for generalist models is a promising direction that offers transfer learning potential, enabling large parameterized models to be trained from simpler, related tasks. In this work, we investigate the combination of MTL with ICL to build models that efficiently learn tasks while being robust to out-of-distribution examples. We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence. Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work. Our code and models are available at https://github.com/harmonbhasin/curriculum_learning_icl .

Updated: 2024-04-04 16:15:23

标题: 多任务训练如何影响变压器上下文能力?与功能类别的研究调查

摘要: 最近,大型语言模型(LLM)展示了非凡的能力,能够基于少量提供的文本示例执行未见过的任务,这种方法被称为上下文学习(ICL)。虽然最近的研究尝试理解驱动ICL的机制,但很少有人探索激励这些模型泛化到多个任务的训练策略。对于通用模型来说,多任务学习(MTL)是一个有前途的方向,它提供了迁移学习的潜力,使大型参数化模型可以从更简单的相关任务中训练出来。在这项工作中,我们调查了将MTL与ICL相结合,构建出能够高效学习任务并对分布之外的示例具有鲁棒性的模型。我们提出了几种有效的课程学习策略,使ICL模型能够实现更高的数据效率和更稳定的收敛。我们的实验表明,ICL模型可以通过逐渐提升难度的任务进行训练,同时混合先前的任务,即本工作中标记为混合课程的方式,有效学习困难的任务。我们的代码和模型可在https://github.com/harmonbhasin/curriculum_learning_icl 上获得。

更新时间: 2024-04-04 16:15:23

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.03558v1

APISR: Anime Production Inspired Real-World Anime Super-Resolution

While real-world anime super-resolution (SR) has gained increasing attention in the SR community, existing methods still adopt techniques from the photorealistic domain. In this paper, we analyze the anime production workflow and rethink how to use its characteristics for the sake of real-world anime SR. First, we argue that video networks and datasets are not necessary for anime SR due to the repeated use of hand-drawn frames. Instead, we propose an anime image collection pipeline by choosing the least compressed and the most informative frames from the video sources. Based on this pipeline, we introduce the Anime Production-oriented Image (API) dataset. In addition, we identify two anime-specific challenges of distorted and faint hand-drawn lines and unwanted color artifacts. We address the first issue by introducing a prediction-oriented compression module in the image degradation model and a pseudo-ground truth preparation with enhanced hand-drawn lines. In addition, we introduce the balanced twin perceptual loss combining both anime and photorealistic high-level features to mitigate unwanted color artifacts and increase visual clarity. We evaluate our method through extensive experiments on the public benchmark, showing our method outperforms state-of-the-art anime dataset-trained approaches.

Updated: 2024-04-04 16:12:51

标题: APISR:受动漫制作启发的现实世界动漫超分辨率

摘要: 尽管真实世界的动漫超分辨率(SR)在SR社区中越来越受到关注,但现有方法仍然采用来自照片逼真领域的技术。在本文中,我们分析了动漫制作工作流程,并重新思考如何利用其特征来实现真实世界的动漫SR。首先,我们认为视频网络和数据集对于动漫SR并不必要,因为手绘帧的重复使用。相反,我们通过从视频源中选择最少压缩和最具信息量的帧提出了一个动漫图像收集管线。基于这个管线,我们引入了动漫制作导向图像(API)数据集。此外,我们确定了两个动漫特有的挑战,即扭曲和淡薄的手绘线条以及不需要的色彩伪影。我们通过在图像退化模型中引入面向预测的压缩模块和增强手绘线条的伪地面实现了对第一个问题的解决。此外,我们引入了平衡的双感知损失,结合动漫和照片逼真的高级特征,以减轻不需要的色彩伪影并提高视觉清晰度。我们通过在公共基准上进行广泛实验评估了我们的方法,结果表明我们的方法优于最先进的动漫数据集训练方法。

更新时间: 2024-04-04 16:12:51

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2403.01598v2

Alzheimer's disease detection in PSG signals

Alzheimer's disease (AD) and sleep disorders exhibit a close association, where disruptions in sleep patterns often precede the onset of Mild Cognitive Impairment (MCI) and early-stage AD. This study delves into the potential of utilizing sleep-related electroencephalography (EEG) signals acquired through polysomnography (PSG) for the early detection of AD. Our primary focus is on exploring semi-supervised Deep Learning techniques for the classification of EEG signals due to the clinical scenario characterized by the limited data availability. The methodology entails testing and comparing the performance of semi-supervised SMATE and TapNet models, benchmarked against the supervised XCM model, and unsupervised Hidden Markov Models (HMMs). The study highlights the significance of spatial and temporal analysis capabilities, conducting independent analyses of each sleep stage. Results demonstrate the effectiveness of SMATE in leveraging limited labeled data, achieving stable metrics across all sleep stages, and reaching 90% accuracy in its supervised form. Comparative analyses reveal SMATE's superior performance over TapNet and HMM, while XCM excels in supervised scenarios with an accuracy range of 92 - 94%. These findings underscore the potential of semi-supervised models in early AD detection, particularly in overcoming the challenges associated with the scarcity of labeled data. Ablation tests affirm the critical role of spatio-temporal feature extraction in semi-supervised predictive performance, and t-SNE visualizations validate the model's proficiency in distinguishing AD patterns. Overall, this research contributes to the advancement of AD detection through innovative Deep Learning approaches, highlighting the crucial role of semi-supervised learning in addressing data limitations.

Updated: 2024-04-04 15:56:23

标题: PSG信号中的阿尔茨海默病检测

摘要: 阿尔茨海默病(AD)和睡眠障碍表现出密切关联,睡眠模式的紊乱往往在轻度认知障碍(MCI)和早期AD的发作之前出现。本研究探讨了利用通过多导睡眠图(PSG)获取的与睡眠相关的脑电图(EEG)信号来早期检测AD的潜力。我们的主要重点是探索半监督深度学习技术用于对EEG信号进行分类,因为临床情景特征是数据可用性有限。该方法包括测试和比较半监督SMATE和TapNet模型的性能,与监督XCM模型和无监督隐马尔可夫模型(HMMs)进行基准比较。研究强调了空间和时间分析能力的重要性,对每个睡眠阶段进行独立分析。结果表明,SMATE在利用有限标记数据方面是有效的,在所有睡眠阶段都实现了稳定的指标,并在其监督形式下达到了90%的准确率。比较分析显示,SMATE在早期AD检测中表现优于TapNet和HMM,而XCM在监督情景中表现出92-94%的准确率范围。这些发现强调了半监督模型在早期AD检测中的潜力,特别是在克服标记数据稀缺性所带来的挑战方面。消融测试证实了半监督预测性能中空间-时间特征提取的关键作用,t-SNE可视化验证了模型在区分AD模式方面的能力。总体而言,这项研究通过创新的深度学习方法为AD检测的进步做出了贡献,突出了半监督学习在解决数据限制方面的关键作用。

更新时间: 2024-04-04 15:56:23

领域: eess.SP,cs.AI,68T07 (Primary), 68T05, 92B20 (Secondary),I.2.1

下载: http://arxiv.org/abs/2404.03549v1

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluation of 19 LLMs reveals that closed-source models (particularly Gemini-Ultra and GPT-4), outperform open-source models in CodeEditorBench, highlighting differences in model performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release all prompts and datasets to enable the community to expand the dataset and benchmark emerging LLMs. By introducing CodeEditorBench, we contribute to the advancement of LLMs in code editing and provide a valuable resource for researchers and practitioners.

Updated: 2024-04-04 15:49:49

标题: CodeEditorBench: 评估大型语言模型的代码编辑能力

摘要: 大型语言模型(LLMs)用于代码的应用正在迅速发展,代码编辑逐渐成为一个关键能力。我们引入了CodeEditorBench,这是一个旨在严格评估LLMs在代码编辑任务中表现的评估框架,包括调试、翻译、优化和需求切换。与现有的侧重于代码生成的基准不同,CodeEditorBench强调了现实世界场景和软件开发的实际方面。我们从五个来源精心策划了各种编码挑战和场景,涵盖了各种编程语言、复杂性水平和编辑任务。对19个LLMs的评估显示,封闭源模型(特别是Gemini-Ultra和GPT-4)在CodeEditorBench中表现优于开源模型,突出了基于问题类型和提示敏感性的模型性能差异。CodeEditorBench旨在通过提供一个强大的平台来评估代码编辑能力,推动LLMs的进步。我们将发布所有提示和数据集,以便社区扩大数据集并对新兴LLMs进行基准测试。通过引入CodeEditorBench,我们为LLMs在代码编辑方面的进步做出了贡献,并为研究人员和实践者提供了宝贵的资源。

更新时间: 2024-04-04 15:49:49

领域: cs.SE,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.03543v1

The solving degrees for computing Gröbner bases of affine semi-regular polynomial sequences

Determining the complexity of computing Gröbner bases is an important problem both in theory and in practice, and for that the solving degree plays a key role. In this paper, we study the solving degrees of affine semi-regular sequences and their homogenized sequences. Some of our results are considered to give mathematically rigorous proofs of the correctness of methods for computing Gröbner bases of the ideal generated by an affine semi-regular sequence. This paper is a sequel of the authors' previous work and gives additional results on the solving degrees and important behaviors of Gröbner basis computation.

Updated: 2024-04-04 15:35:41

标题: 计算仿射半规则多项式序列的Gröbner基的求解度量

摘要: 确定计算Groebner基的复杂性在理论和实践中都是一个重要问题,其中解决度起着关键作用。本文研究了仿射半正则序列及其均齐序列的解决度。我们的一些结果被认为是对计算由仿射半正则序列生成的理想的Groebner基的方法的正确性进行数学严格证明。本文是作者先前工作的续篇,并提供了有关解决度和Groebner基计算的重要行为的额外结果。

更新时间: 2024-04-04 15:35:41

领域: math.AC,cs.CR,cs.SC,math.AG

下载: http://arxiv.org/abs/2404.03530v1

Trust in AI: Progress, Challenges, and Future Directions

The increasing use of artificial intelligence (AI) systems in our daily life through various applications, services, and products explains the significance of trust/distrust in AI from a user perspective. AI-driven systems (as opposed to other technologies) have ubiquitously diffused into our lives, not only as beneficial tools to be used by human agents but also as substitutive agents acting on our behalf, or as manipulative minds that may influence human thought, decision, and agency. Trust/distrust in AI plays the role of a regulator and could significantly control the level of this diffusion, as trust can increase, and distrust may reduce, the rate of adoption of AI. Recently, a variety of studies have paid attention to the different dimensions of trust/distrust in AI and their relevant considerations. In this systematic literature review, after conceptualizing trust in the current AI literature, we investigate trust in different types of human-machine interaction and its impact on technology acceptance in different domains. In addition to that, we propose a taxonomy of technical (i.e., safety, accuracy, robustness) and non-technical axiological (i.e., ethical, legal, and mixed) trustworthiness metrics, and some trustworthy measurements. Moreover, we examine some major trust-breakers in AI (e.g., autonomy and dignity threats) and trust makers; and propose some future directions and probable solutions for the transition to a trustworthy AI.

Updated: 2024-04-04 15:34:37

标题: 人工智能中的信任:进展、挑战和未来方向

摘要: 随着人工智能(AI)系统在我们日常生活中通过各种应用、服务和产品的不断增加,从用户角度来看信任/不信任AI的重要性得到解释。与其他技术相比,AI驱动系统已经普遍渗透到我们的生活中,不仅作为一些有益的工具供人类使用,而且也将成为我们的代理人,或者是具有影响人类思想、决策和行动的操纵性思维。在AI中的信任/不信任扮演着调节者的角色,可以明显地控制这种渗透的程度,信任可以增加,而不信任可能会减少AI的采用率。最近,各种研究已经关注了AI中信任/不信任的不同维度及相关考虑因素。在这篇系统文献综述中,在对当前AI文献中对信任的概念化之后,我们将调查不同类型的人机交互中的信任及其对不同领域技术接受的影响。除此之外,我们提出了技术(即安全性、准确性、鲁棒性)和非技术的价值观(即道德、法律和混合)可信度指标的分类法,以及一些可信度测量。此外,我们研究了AI中一些主要的信任破坏者(例如,自主性和尊严威胁),以及信任制造者,并提出了一些未来的方向和可能的解决方案,以实现向一个可信的AI的过渡。

更新时间: 2024-04-04 15:34:37

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2403.14680v3

BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering

Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and give context-rich information, supporting efficient information retrieval and knowledge discovery; presenting information flow in a very effective manner. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, hindering efficient information processing and reasoning applications in the language. Addressing the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the formation of the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, capable of autonomously constructing semantically enriched KGs from any text.
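
A sketch of the graph-based polynomial filtering step: smooth node (entity) embeddings with a low-order polynomial of the symmetrically normalized adjacency. The coefficients are illustrative low-pass weights, not the paper's learned or chosen values.

import numpy as np

def polynomial_graph_filter(A, X, coeffs=(0.5, 0.3, 0.2)):
    """Apply sum_k theta_k * S^k to features X, with S = D^-1/2 A D^-1/2."""
    deg = np.maximum(A.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    S = D_inv_sqrt @ A @ D_inv_sqrt
    out = np.zeros_like(X)
    P = np.eye(A.shape[0])                  # S^0
    for theta in coeffs:
        out += theta * (P @ X)
        P = P @ S
    return out

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # toy KG skeleton
X = np.random.default_rng(0).normal(size=(3, 4))               # entity embeddings
print(polynomial_graph_filter(A, X))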

Updated: 2024-04-04 15:31:21

标题: BanglaAutoKG:使用语义神经图过滤自动构建孟加拉知识图

摘要: 知识图谱(KGs)已被证明在信息处理和推理应用中至关重要,因为它们链接相关实体并提供丰富的上下文信息,支持高效的信息检索和知识发现;以非常有效的方式展示信息流动。尽管在全球范围内被广泛使用,但由于缺乏全面的数据集、编码器、NER(命名实体识别)模型、POS(词性)标记器和词形还原器,孟加拉语在KGs中相对欠代表,阻碍了该语言中高效的信息处理和推理应用。为解决孟加拉语中的KG稀缺问题,我们提出了BanglaAutoKG,这是一个开创性的框架,能够从任何孟加拉语文本自动构建孟加拉语KGs。我们利用多语言LLMs来理解各种语言并普遍关联实体和关系。通过使用翻译词典识别英文等价词,并从预训练的BERT模型中提取单词特征,我们构建了基础KG。为了减少噪音并使单词嵌入与我们的目标保持一致,我们采用基于图的多项式滤波器。最后,我们实施了基于GNN的语义过滤器,提升了上下文理解并修剪了不必要的边,最终形成了明确的KG。实证研究和案例研究表明了我们模型的通用有效性,能够自主从任何文本中构建语义丰富的KGs。

更新时间: 2024-04-04 15:31:21

领域: cs.CL,cs.IR,cs.LG,cs.NE,cs.SI

下载: http://arxiv.org/abs/2404.03528v1

WeSee: Using Malicious #VC Interrupts to Break AMD SEV-SNP

AMD SEV-SNP offers VM-level trusted execution environments (TEEs) to protect the confidentiality and integrity for sensitive cloud workloads from untrusted hypervisor controlled by the cloud provider. AMD introduced a new exception, #VC, to facilitate the communication between the VM and the untrusted hypervisor. We present WeSee attack, where the hypervisor injects malicious #VC into a victim VM's CPU to compromise the security guarantees of AMD SEV-SNP. Specifically, WeSee injects interrupt number 29, which delivers a #VC exception to the VM who then executes the corresponding handler that performs data and register copies between the VM and the hypervisor. WeSee shows that using well-crafted #VC injections, the attacker can induce arbitrary behavior in the VM. Our case-studies demonstrate that WeSee can leak sensitive VM information (kTLS keys for NGINX), corrupt kernel data (firewall rules), and inject arbitrary code (launch a root shell from the kernel space).

Updated: 2024-04-04 15:30:13

标题: WeSee:利用恶意#VC中断来破解AMD SEV-SNP

摘要: AMD SEV-SNP提供了虚拟机级别的可信执行环境(TEEs),用于保护敏感云工作负载的机密性和完整性,免受由云提供商控制的不受信任的超级处理器的影响。AMD引入了一个新的异常,#VC,以促进虚拟机与不受信任的超级处理器之间的通信。我们提出了WeSee攻击,其中超级处理器将恶意#VC注入受害者虚拟机的CPU,以破坏AMD SEV-SNP的安全保证。具体而言,WeSee注入了中断号29,该中断将#VC异常传递给虚拟机,然后执行相应的处理程序,该处理程序在虚拟机和超级处理器之间执行数据和寄存器复制。WeSee显示,通过使用精心设计的#VC注入,攻击者可以在虚拟机中引发任意行为。我们的案例研究表明,WeSee可以泄露敏感虚拟机信息(NGINX的kTLS密钥),破坏内核数据(防火墙规则)并注入任意代码(从内核空间启动root shell)。

更新时间: 2024-04-04 15:30:13

领域: cs.CR

下载: http://arxiv.org/abs/2404.03526v1

Approximate Gradient Coding for Privacy-Flexible Federated Learning with Non-IID Data

This work focuses on the challenges of non-IID data and stragglers/dropouts in federated learning. We introduce and explore a privacy-flexible paradigm that models parts of the clients' local data as non-private, offering a more versatile and business-oriented perspective on privacy. Within this framework, we propose a data-driven strategy for mitigating the effects of label heterogeneity and client straggling on federated learning. Our solution combines both offline data sharing and approximate gradient coding techniques. Through numerical simulations using the MNIST dataset, we demonstrate that our approach enables achieving a deliberate trade-off between privacy and utility, leading to improved model convergence and accuracy while using an adaptable portion of non-private data.
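
As background for the approximate scheme, here is the exact gradient-coding baseline it relaxes (the n=3 workers, s=1 straggler construction of Tandon et al.): each worker sends a fixed linear combination of its partitions' gradients, and any two messages decode to the full gradient sum. Approximate gradient coding trades this exactness for less replication, which the paper combines with offline sharing of the non-private data.

import numpy as np

B = np.array([[0.5, 1.0,  0.0],    # worker 0 holds partitions {0, 1}
              [0.0, 1.0, -1.0],    # worker 1 holds partitions {1, 2}
              [0.5, 0.0,  1.0]])   # worker 2 holds partitions {0, 2}
decode = {frozenset({0, 1}): np.array([2.0, -1.0, 0.0]),
          frozenset({0, 2}): np.array([1.0,  0.0, 1.0]),
          frozenset({1, 2}): np.array([0.0,  1.0, 2.0])}

g = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])   # per-partition gradients
messages = B @ g                                      # one row per worker

alive = frozenset({0, 2})                             # worker 1 straggles
recovered = decode[alive] @ messages                  # zero weight on stragglers
assert np.allclose(recovered, g.sum(axis=0))
print(recovered)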

Updated: 2024-04-04 15:29:50

标题: 非 IID 数据的隐私灵活联邦学习的近似梯度编码

摘要: 这项工作关注非独立同分布数据和联邦学习中的stragglers/dropouts挑战。我们引入并探索了一种隐私灵活范式,将客户端本地数据的部分建模为非私有,提供了一个更多样化和以业务为导向的隐私视角。在这个框架内,我们提出了一种基于数据的策略,用于减轻标签异质性和客户端straggling对联邦学习的影响。我们的解决方案结合了离线数据共享和近似梯度编码技术。通过使用MNIST数据集进行数值模拟,我们证明了我们的方法可以实现隐私和效用之间的有意义权衡,从而提高模型收敛性和准确性,同时利用非私有数据的可调部分。

更新时间: 2024-04-04 15:29:50

领域: cs.LG,cs.CR,cs.DC,cs.IT,math.IT,stat.ML

下载: http://arxiv.org/abs/2404.03524v1

Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performance on metrics spanning three evaluation aspects.
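
For intuition only, the sketch below checks a counterfactual's robustness by sampling model parameters from an l-infinity ball; this sampling heuristic is neither sound nor complete, unlike the paper's robust-optimisation procedure, and predict_with, the radius, and the toy linear classifier are all assumptions.

import numpy as np

def robust_under_ball(predict_with, w0, x_ce, target, delta=0.05,
                      n_samples=200, seed=0):
    # heuristic sampling check (neither sound nor complete): does x_ce keep
    # the target label for sampled parameters in an l-inf ball around w0?
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        w = w0 + rng.uniform(-delta, delta, size=w0.shape)
        if predict_with(w, x_ce) != target:
            return False
    return True

w0 = np.array([1.0, -2.0, 0.5])                    # toy linear classifier
predict = lambda w, x: int(w @ x > 0)
ok = robust_under_ball(predict, w0, x_ce=np.array([2.0, 0.1, 1.0]), target=1)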

Updated: 2024-04-04 15:29:25

标题: 通过鲁棒优化为神经网络生成可证明鲁棒且合理的反事实解释

摘要: 反事实解释(CEs)作为解释神经网络分类器的一种主要方法,受到越来越多的关注。通常,对于输入-输出对,CEs被定义为与输入具有最小距离并且被分类为与输出不同标签的数据点。为了解决CEs在模型参数更新(例如重新训练)时容易无效的问题,研究提出了一些方法来保证CEs在由范数球限制的模型参数变化下的鲁棒性。然而,针对这种鲁棒性的现有方法既不健全(sound)也不完备(complete),它们可能生成不合理的CEs,即相对于训练数据集的异常值。实际上,没有任何现有方法能够在保留鲁棒性保证的同时优化接近度和合理性。在这项工作中,我们提出了可证明鲁棒和合理的反事实解释(PROPLACE),这是一种利用鲁棒优化技术来解决文献中提到的限制的方法。我们制定了一个迭代算法来计算可证明鲁棒的CEs,并证明其收敛性、健全性和完备性。通过一个涉及六个基线的比较实验,其中五个基线针对鲁棒性,我们展示了PROPLACE在三个评估方面的性能达到了最先进水平。

更新时间: 2024-04-04 15:29:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2309.12545v2

Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach

Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks. Despite their great success, the knowledge provided by the retrieval process is not always useful for improving the model prediction, since in some samples LLMs may already be quite knowledgeable and thus be able to answer the question correctly without retrieval. Aiming to save the cost of retrieval, previous work has proposed to determine when to do/skip the retrieval in a data-aware manner by analyzing the LLMs' pretraining data. However, these data-aware methods pose privacy risks and memory limitations, especially when requiring access to sensitive or extensive pretraining data. Moreover, these methods offer limited adaptability under fine-tuning or continual learning settings. We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data. Moreover, it alleviates the need to retain all the data utilized during model pre-training, necessitating only the upkeep of the token embeddings. Extensive experiments and in-depth analyses demonstrate the superiority of our model-aware approach.
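
A minimal sketch of the gating idea follows: mean-pooled token embeddings feed a small classifier that predicts whether the model can answer without retrieval. The synthetic embeddings, labels, threshold, and the llm_generate / retrieve_then_generate callables are hypothetical stand-ins, not the paper's pipeline.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# stand-ins: mean-pooled input-token embeddings of past questions, labeled by
# whether the LLM answered correctly WITHOUT retrieval (both are synthetic)
emb = rng.normal(size=(500, 64))
knows = (emb[:, 0] > 0).astype(int)
gate = LogisticRegression(max_iter=1000).fit(emb, knows)

def answer(q_emb, llm_generate, retrieve_then_generate, tau=0.7):
    # skip retrieval when the gate is confident the model already knows
    if gate.predict_proba(q_emb[None, :])[0, 1] >= tau:
        return llm_generate()             # cheaper, avoids pretraining-data access
    return retrieve_then_generate()       # fall back to retrieval augmentation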

Updated: 2024-04-04 15:21:22

标题: 学习何时(不)信任语言模型:一种以隐私为中心的自适应模型感知方法

摘要: 检索增强的大型语言模型(LLMs)在各种自然语言处理任务中表现出色。尽管取得了巨大成功,检索过程提供的知识并不总是有助于改善模型的预测,因为在某些样本中,LLMs可能已经相当了解并能够正确回答问题而无需检索。为了节省检索成本,先前的工作提出了通过分析LLMs的预训练数据来决定何时执行/跳过检索的数据感知方法。然而,这些数据感知方法存在隐私风险和内存限制,尤其是当需要访问敏感或大量的预训练数据时。此外,这些方法在微调或持续学习设置下的适应性有限。我们假设标记嵌入能够捕捉模型的内在知识,从而提供一种更安全、更直接的方式来判断是否需要检索,而不涉及访问预训练数据所带来的隐私风险。此外,它减轻了保留在模型预训练期间使用的所有数据的需要,只需要维护标记嵌入。大量实验和深入分析证明了我们的模型感知方法的优越性。

更新时间: 2024-04-04 15:21:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03514v1

CMB: A Comprehensive Medical Benchmark in Chinese

Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in "contextual incongruities" to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provides first-hand experience with existing LLMs for medicine and also facilitates the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.

Updated: 2024-04-04 15:16:57

标题: CMB:一个全面的中文医学基准

摘要: 大型语言模型(LLMs)为医学领域的突破提供了可能性。建立一个标准化的医学基准成为衡量进展的基本基石。然而,不同地区的医学环境具有其本地特色,例如在中国传统中医药的普遍性和重要性。因此,仅仅将基于英文的医学评估翻译成中文可能会导致与当地环境不一致的情况。为了解决这个问题,我们提出了一个名为CMB的本地化医学基准,这是一个完全根植于中国本土语言和文化框架中的综合医学基准。虽然传统中医药对这个评估至关重要,但并不构成其全部。使用这个基准,我们评估了几个知名的大规模LLMs,包括ChatGPT、GPT-4、专门的中文LLMs和专注于医学领域的LLMs。我们希望这个基准为医学领域现有的LLMs提供第一手体验,并促进中国内医学LLMs的广泛采用和提升。我们的数据和代码可公开访问于https://github.com/FreedomIntelligence/CMB。

更新时间: 2024-04-04 15:16:57

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2308.08833v2

Privacy-Enhancing Technologies for Artificial Intelligence-Enabled Systems

Artificial intelligence (AI) models introduce privacy vulnerabilities to systems. These vulnerabilities may impact model owners or system users; they exist during model development, deployment, and inference phases, and threats can be internal or external to the system. In this paper, we investigate potential threats and propose the use of several privacy-enhancing technologies (PETs) to defend AI-enabled systems. We then provide a framework for evaluating PETs in AI-enabled systems and discuss the impact PETs may have on system-level variables.

Updated: 2024-04-04 15:14:40

标题: 面向人工智能系统的隐私增强技术

摘要: 人工智能(AI)模型为系统引入了隐私漏洞。这些漏洞可能会影响模型所有者或系统用户;它们存在于模型开发、部署和推理阶段,并且威胁可以是系统内部的或外部的。在本文中,我们研究潜在的威胁,并提出使用几种增强隐私保护技术(PETs)来保护AI-enabled系统。然后,我们提供了一个PETs评估框架,用于AI-enabled系统,并讨论PETs可能对系统级变量产生的影响。

更新时间: 2024-04-04 15:14:40

领域: cs.CR

下载: http://arxiv.org/abs/2404.03509v1

CountARFactuals -- Generating plausible model-agnostic counterfactual explanations with adversarial random forests

Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model's behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios within the data manifold. This paper leverages a recently developed generative modeling technique -- adversarial random forests (ARFs) -- to efficiently generate plausible counterfactuals in a model-agnostic way. ARFs can serve as a plausibility measure or directly generate counterfactual explanations. Our ARF-based approach surpasses the limitations of existing methods that aim to generate plausible counterfactual explanations: It is easy to train and computationally highly efficient, handles continuous and categorical data naturally, and allows integrating additional desiderata such as sparsity in a straightforward manner.

Updated: 2024-04-04 15:10:13

标题: CountARFactuals -- 利用对抗性随机森林生成合理的、与模型无关的反事实解释

摘要: 反事实解释通过指出可能导致另一种期望结果的情景,阐明了算法决策。提供有关模型行为的洞察,提示用户可能的行动,并为争议决策提供依据。作为实现这些目标的关键因素,反事实必须是合理的,即描述数据流形内的现实替代情景。本文利用最近发展的生成建模技术 - 对抗性随机森林(ARFs) - 以一种模型无关的方式高效生成合理的反事实。ARFs可以作为合理性度量或直接生成反事实解释。我们基于ARF的方法超越了现有方法的局限性,这些方法旨在生成合理的反事实解释:它易于训练和计算效率高,自然处理连续和分类数据,并允许以直接的方式集成额外的愿望,如稀疏性。

更新时间: 2024-04-04 15:10:13

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2404.03506v1

AI and the Problem of Knowledge Collapse

While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the 'center' of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as "knowledge collapse", and argue this could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile. To investigate this, we provide a simple model in which a community of learners or innovators chooses to use traditional methods or to rely on a discounted AI-assisted process, and identify conditions under which knowledge collapse occurs. In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount. Finally, based on the results, we consider further research directions to counteract such outcomes.
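
The flavor of the model can be reproduced in a few lines: learners sample either the full distribution or a center-truncated "AI" distribution, with the cheaper option attracting more of them. The choice rule, truncation point, and error measure below are toy assumptions and do not reproduce the paper's 2.3x figure.

import numpy as np

rng = np.random.default_rng(0)

def belief_error(discount, n_learners=500, k=40):
    # each learner samples either the full distribution (cost 1) or a
    # center-truncated "AI" distribution (cost 1 - discount); the cheaper
    # option attracts more learners via an illustrative choice rule
    p_ai = discount / (discount + 0.1)
    pooled = []
    for _ in range(n_learners):
        x = rng.standard_normal(k)
        if rng.random() < p_ai:
            x = x[np.abs(x) < 1.0]        # AI output trims the tails
        pooled.append(x)
    pooled = np.concatenate(pooled)
    return abs(pooled.std() - 1.0)        # community's error about the spread

print(belief_error(0.0), belief_error(0.2))  # tail loss grows with the discount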

Updated: 2024-04-04 15:06:23

标题: 人工智能与知识崩溃问题

摘要: 尽管人工智能有处理大量数据、产生新见解并提高生产力的潜力,但其广泛应用可能带来意想不到的后果。我们确定了某些条件下,人工智能通过降低获取某些知识模式的成本,可能矛盾地损害公众的理解。虽然大型语言模型是在大量多样化数据上训练的,它们自然会生成朝着“分布中心”的输出。这通常是有用的,但广泛依赖递归人工智能系统可能导致我们定义的“知识崩溃”过程,并认为这可能损害创新和人类理解与文化的丰富性。然而,不同于不能选择其训练数据的人工智能模型,人类可以有策略地寻求多样化的知识形式,如果他们认为这些知识形式是有价值的。为了研究这一点,我们提供了一个简单模型,其中一群学习者或创新者选择使用传统方法还是依赖于折扣的人工智能辅助过程,并确定了知识崩溃发生的条件。在我们的默认模型中,对人工智能生成内容的20%折扣使公众信念与真相的距离达到无折扣情形的2.3倍。最后,根据结果,我们考虑了进一步的研究方向来抵消这种结果。

更新时间: 2024-04-04 15:06:23

领域: cs.AI,cs.CY,I.2.0

下载: http://arxiv.org/abs/2404.03502v1

Comprehensible Artificial Intelligence on Knowledge Graphs: A survey

Artificial Intelligence applications gradually move outside the safe walls of research labs and invade our daily lives. This is also true for Machine Learning methods on Knowledge Graphs, which has led to a steady increase in their application since the beginning of the 21st century. However, in many applications, users require an explanation of the Artificial Intelligences decision. This led to increased demand for Comprehensible Artificial Intelligence. Knowledge Graphs epitomize fertile soil for Comprehensible Artificial Intelligence, due to their ability to display connected data, i.e. knowledge, in a human- as well as machine-readable way. This survey gives a short history to Comprehensible Artificial Intelligence on Knowledge Graphs. Furthermore, we contribute by arguing that the concept Explainable Artificial Intelligence is overloaded and overlapping with Interpretable Machine Learning. By introducing the parent concept Comprehensible Artificial Intelligence, we provide a clear-cut distinction of both concepts while accounting for their similarities. Thus, we provide in this survey a case for Comprehensible Artificial Intelligence on Knowledge Graphs consisting of Interpretable Machine Learning on Knowledge Graphs and Explainable Artificial Intelligence on Knowledge Graphs. This leads to the introduction of a novel taxonomy for Comprehensible Artificial Intelligence on Knowledge Graphs. In addition, a comprehensive overview of the research on Comprehensible Artificial Intelligence on Knowledge Graphs is presented and put into the context of the taxonomy. Finally, research gaps in the field of Comprehensible Artificial Intelligence on Knowledge Graphs are identified for future research.

Updated: 2024-04-04 14:57:32

标题: 知识图谱上易理解的人工智能:一项调查

摘要: 人工智能应用逐渐走出研究实验室的安全壁垒,渗入我们的日常生活。这也适用于在知识图上的机器学习方法,自21世纪初以来,它们的应用数量稳步增加。然而,在许多应用中,用户需要对人工智能的决策进行解释。这导致对可理解人工智能的需求增加。知识图象征着可理解人工智能肥沃的土壤,因为它们能够以人类和机器可读的方式展示连接的数据,即知识。这项调查简要介绍了知识图上的可理解人工智能的历史。此外,我们认为可解释人工智能的概念过载,并与可解释机器学习重叠。通过引入父概念可理解人工智能,我们明确区分了这两个概念,同时考虑了它们的相似之处。因此,在这项调查中,我们提出了一个关于知识图上的可理解人工智能的案例,其中包括知识图上的可解释机器学习和知识图上的可解释人工智能。这导致了对知识图上可理解人工智能的新分类法的引入。此外,还提供了关于知识图上可理解人工智能研究的全面概述,并将其放入分类法的背景中。最后,确定了知识图上可理解人工智能领域的研究空白,供未来研究参考。

更新时间: 2024-04-04 14:57:32

领域: cs.AI

下载: http://arxiv.org/abs/2404.03499v1

As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurate people were at distinguishing synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, while all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimuli contain synthetic content as compared to authentic content, feature human faces as compared to non-face objects, comprise a single modality as compared to multimodal stimuli, mix authentic and synthetic elements as compared to being fully synthetic (for audiovisual stimuli), and feature foreign languages as compared to languages the observer is fluent in. Finally, we also find that prior knowledge of synthetic media does not meaningfully impact detection performance. Collectively, these results indicate that people are highly susceptible to being tricked by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective counterdefense.

Updated: 2024-04-04 14:51:56

标题: 和抛硬币一样准确:人类对人工智能生成的图像、视频、音频和音视频刺激的检测

摘要: 随着合成媒体变得越来越逼真,使用它的障碍持续降低,这项技术越来越被用于恶意目的,从金融诈骗到非自愿色情。如今,防范被合成媒体误导的主要手段仍然依赖于人类观察者在视觉和听觉上区分真假的能力。然而,在日常生活中,人们实际上对欺骗性合成媒体的脆弱程度仍不清楚。我们进行了一项感知研究,共有1276名参与者,以评估人们在区分合成图像、仅音频、仅视频和音视频刺激与真实内容时的准确度。为了反映人们在野外遇到合成媒体的情况,测试条件和刺激模拟了一个典型的在线平台,而调查中使用的所有合成媒体均来自公开可访问的生成式人工智能技术。 我们发现,总体而言,参与者在合成和真实内容之间很难有意义地区分。我们还发现,当刺激包含合成内容时,与真实内容相比,检测性能会下降,图像中包含人脸与非人脸对象相比,单一模态与多模态刺激相比,混合真实性与完全合成的音视频刺激相比,以及包含外语与观察者流利的语言相比。最后,我们还发现,关于合成媒体的先前知识并不会对他们的检测性能产生实质性影响。总的来说,这些结果表明人们在日常生活中极易被合成媒体欺骗,人类感知检测能力不再可靠作为有效的反制手段。

更新时间: 2024-04-04 14:51:56

领域: cs.HC,cs.AI,cs.SD,eess.AS,68T01,I.2

下载: http://arxiv.org/abs/2403.16760v3

About Test-time training for outlier detection

In this paper, we introduce DOUST, our method applying test-time training for outlier detection, significantly improving the detection performance. After thoroughly evaluating our algorithm on common benchmark datasets, we discuss a common problem and show that it disappears with a large enough test set. Thus, we conclude that under reasonable conditions, our algorithm can reach almost supervised performance even when no labeled outliers are given.

Updated: 2024-04-04 14:50:50

标题: 关于异常检测的测试时间训练

摘要: 在这篇论文中,我们介绍了我们的方法DOUST,该方法应用于异常检测的测试时间训练,显著提高了检测性能。在对常见基准数据集上对我们的算法进行彻底评估后,我们讨论了一个常见问题,并展示了当测试集足够大时,该问题会消失。因此,我们得出结论,在合理条件下,即使没有提供标记的异常值,我们的算法也可以达到几乎监督性能。

更新时间: 2024-04-04 14:50:50

领域: cs.LG

下载: http://arxiv.org/abs/2404.03495v1

A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data

Autonomous Driving (AD) systems are considered as the future of human mobility and transportation. Solving computer vision tasks such as image classification and object detection/segmentation, with high accuracy and low power/energy consumption, is highly needed to realize AD systems in real life. These requirements can potentially be satisfied by Spiking Neural Networks (SNNs). However, the state-of-the-art works in SNN-based AD systems still focus on proposing network models that can achieve high accuracy, and they have not systematically studied the roles of SNN parameters when used for learning event-based automotive data. Therefore, we still lack understanding of how to effectively develop SNN models for AD systems. Toward this, we propose a novel methodology to systematically study and analyze the impact of SNN parameters considering event-based automotive data, then leverage this analysis for enhancing SNN developments. To do this, we first explore different settings of SNN parameters that directly affect the learning mechanism (i.e., batch size, learning rate, neuron threshold potential, and weight decay), then analyze the accuracy results. Afterward, we propose techniques that jointly improve SNN accuracy and reduce training time. Experimental results show that our methodology can improve SNN models for AD systems over the state-of-the-art, as it achieves higher accuracy (i.e., 86%) for the NCARS dataset, and it can also achieve iso-accuracy (i.e., ~85% with standard deviation less than 0.5%) while speeding up the training time by 1.9x. In this manner, our research work provides a set of guidelines for SNN parameter enhancements, thereby enabling the practical developments of SNN-based AD systems.

Updated: 2024-04-04 14:48:26

标题: 一种研究尖峰神经网络参数对基于事件的汽车数据影响的方法论

摘要: 自动驾驶(AD)系统被认为是人类移动和交通的未来。解决计算机视觉任务,如图像分类和对象检测/分割,需要高准确性和低功耗/能耗,这对于实现现实生活中的AD系统至关重要。这些要求可能通过脉冲神经网络(SNNs)来满足。然而,基于SNN的AD系统的最新工作仍然专注于提出可以实现高准确性的网络模型,并且他们尚未系统地研究了在学习基于事件的汽车数据时使用SNN参数的作用。因此,我们仍然缺乏如何有效开发AD系统的SNN模型的理解。为此,我们提出了一种新方法,以系统地研究和分析考虑基于事件的汽车数据的SNN参数的影响,然后利用这种分析来增强SNN的发展。为了做到这一点,我们首先探索直接影响学习机制的SNN参数的不同设置(即批处理大小、学习速率、神经元阈值电位和权重衰减),然后分析准确性结果。随后,我们提出了一些同时提高SNN准确性并减少训练时间的技术。实验结果表明,我们的方法可以改进AD系统的SNN模型,达到了比最先进技术更高的准确性(即NCARS数据集的86%),并且还可以在加快训练时间1.9倍的情况下实现等准确性(即标准偏差小于0.5%的约85%)。通过这种方式,我们的研究工作为SNN参数增强提供了一套指导方针,从而实现基于SNN的AD系统的实际发展。

更新时间: 2024-04-04 14:48:26

领域: cs.NE,cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2404.03493v1

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

Empowered by large-scale pretrained language models, existing dialogue systems have demonstrated impressive performance in conducting fluent and natural-sounding conversations. However, they are still plagued by the hallucination problem, causing unpredictable factual errors in the generated responses. Recently, knowledge-grounded dialogue (KGD) generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination. Following the idea of obtaining high-quality knowledge, a few efforts have achieved good performance on this issue. As some inevitable knowledge noise may also lead to hallucinations, it is urgent to investigate the causes of, and future directions for, building noise-tolerant methods for KGD tasks. In this paper, we analyze the causal story behind this problem with counterfactual reasoning methods. Based on the causal effect analysis, we propose a possible solution for alleviating hallucination in KGD by exploiting the dialogue-knowledge interaction. Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance, while remaining adaptive to different generation models. We hope our efforts can support and call for more attention to developing lightweight techniques towards robust and trustworthy dialogue systems.

Updated: 2024-04-04 14:45:26

标题: 一种缓解基于知识的对话生成中幻觉的因果关系视角

摘要: 受大规模预训练语言模型的赋能,现有对话系统展示了出色的性能,能够进行流畅自然的对话。然而,它们仍然受到幻觉问题的困扰,在生成的回应中导致不可预测的事实错误。最近,知识驱动的对话生成模型有意地调用外部知识资源以提供更具信息性的回应,也被证明在减少幻觉方面是有效的。在获取高质量知识的理念下,一些努力在这个问题上取得了相当不错的表现。由于一些不可避免的知识噪音也可能导致幻觉,研究在KGD任务中构建耐噪声方法的原因和未来方向变得紧迫。在本文中,我们通过反事实推理方法分析了这个问题背后的因果故事。基于因果效应分析,我们提出了一种可能的解决方案,通过利用对话与知识的相互作用来缓解KGD中的幻觉。我们示例实现的实验结果显示,这种方法可以减少幻觉,同时不影响其他对话性能,并在不同生成模型下具有适应性。我们希望我们的努力能够支持并呼吁更多关注于开发轻量级技术,以构建强大可信赖的对话系统。

更新时间: 2024-04-04 14:45:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03491v1

ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D

Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance possesses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with the convolutional operation inherently invariant to local spatial pattern orientations using the Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts - up to 1000 times fewer in the case of MedMNIST. Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPONet.

Updated: 2024-04-04 14:44:23

标题: ILPO-NET:用于三维中任意体积模式不变识别的网络

摘要: 现代空间数据分析中,有效识别空间模式并学习其层次结构是至关重要的。体积数据应用程序寻求确保不仅对平移而且对模式旋转具有不变性的技术。传统方法可以轻松实现平移不变性,而旋转不变性则存在多重挑战,仍然是一个活跃的研究领域。在这里,我们提出了ILPO-Net(旋转不变局部模式方向网络),这是一种处理任意形状模式的新方法,使用Wigner矩阵展开的卷积操作本质上对局部空间模式方向具有不变性。我们的架构无缝地集成了新的卷积操作符,并在多样化的体积数据集(如MedMNIST和CATH)上进行基准测试时,表现出优越的性能,比基准线的参数数量显著减少 - 在MedMNIST的情况下高达1000倍。除了这些演示之外,ILPO-Net的旋转不变性为其他跨多个学科的应用程序铺平了道路。我们的代码公开可用,网址为https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPONet。

更新时间: 2024-04-04 14:44:23

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.19612v2

Reinforcement learning-based estimation for partial differential equations

In systems governed by nonlinear partial differential equations such as fluid flows, the design of state estimators such as Kalman filters relies on a reduced-order model (ROM) that projects the original high-dimensional dynamics onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which negatively affects the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM. Moreover, it yields accurate high-dimensional state estimates for trajectories corresponding to various physical parameter values, without direct knowledge of the latter.
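
Structurally, the estimator resembles a Kalman filter whose gain is replaced by a learned nonlinear map of the innovation. The sketch below shows that update; the placeholder tanh policy stands in for the RL-trained policy, and the ROM matrices are toy values.

import numpy as np

class RLROE:
    # reduced-order model (ROM) predicts the latent state; a learned policy
    # maps the measurement innovation to a nonlinear correction, replacing
    # the Kalman gain (the policy here is a placeholder, not RL-trained)
    def __init__(self, A_r, C_r, policy):
        self.A_r, self.C_r, self.policy = A_r, C_r, policy
        self.x = np.zeros(A_r.shape[0])
    def step(self, y_meas):
        x_pred = self.A_r @ self.x                 # ROM prediction
        innov = y_meas - self.C_r @ x_pred         # sensor mismatch
        self.x = x_pred + self.policy(innov)       # learned correction
        return self.x

A_r = 0.95 * np.eye(3)                             # toy 3-mode ROM
C_r = np.array([[1.0, 0.0, 0.0]])                  # one sensor
policy = lambda e: 0.5 * np.tanh(e[0]) * np.array([1.0, 0.2, 0.1])
est = RLROE(A_r, C_r, policy)
x_hat = est.step(np.array([0.8]))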

Updated: 2024-04-04 14:35:35

标题: 基于强化学习的偏微分方程估计

摘要: 在受非线性偏微分方程控制的系统中,如流体流动,状态估计器(如卡尔曼滤波器)的设计依赖于一种降阶模型(ROM),将原始高维动态投影到可计算的低维空间上。然而,ROM容易产生较大误差,从而影响估计器的性能。在这里,我们介绍了强化学习降阶估计器(RL-ROE),这是一种基于ROM的估计器,其中接受测量的校正项由通过强化学习训练的非线性策略给出。策略的非线性使RL-ROE能够有效地补偿ROM的误差,同时仍然利用动态的不完全知识。通过涉及Burgers和Navier-Stokes方程的示例,我们展示了在传感器数量非常少的情况下,经过训练的RL-ROE优于使用相同ROM设计的卡尔曼滤波器。此外,它为对应于各种物理参数值的轨迹提供准确的高维状态估计,而无需直接了解后者。

更新时间: 2024-04-04 14:35:35

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2302.01189v2

EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention

To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Due to the permutation-equivariant nature, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores can be derived by both semantic difference and positional difference. However, prior studies often model the two kinds of difference measurements in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework to formulate both semantic difference and positional difference. The EulerFormer involves two key technical improvements. First, it employs a new transformation function for efficiently transforming the sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of both semantic and positional information in a complex rotation form. Secondly, it develops a differential rotation mechanism, where the semantic rotation angles can be controlled by an adaptation function, enabling the adaptive integration of the semantic and positional information according to the semantic contexts. Furthermore, a phase contrastive learning task is proposed to improve the isotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality. It is more robust to semantic variations and possesses superior theoretical properties in principle. Extensive experiments conducted on four public datasets demonstrate the effectiveness and efficiency of our approach.
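
A bare-bones sketch of the polar-form transformation and the rotation follows; the fixed adaptation factor alpha and the RoPE-style frequency schedule are illustrative simplifications of the paper's learned, context-dependent rotation.

import numpy as np

def to_polar(x):
    # pair features as (real, imag) and apply Euler's formula: z = r e^{i theta}
    re, im = x[..., 0::2], x[..., 1::2]
    return np.hypot(re, im), np.arctan2(im, re)

def rotate(r, theta, pos, base=10000.0, alpha=1.0):
    # differential rotation: position-dependent angles (RoPE-style frequencies)
    # plus the semantic phase scaled by alpha; EulerFormer learns alpha from
    # the semantic context, here it is a fixed constant for illustration
    d2 = r.shape[-1]
    freqs = base ** (-np.arange(d2) / d2)
    return r * np.exp(1j * (alpha * theta + pos * freqs))

q = rotate(*to_polar(np.random.randn(8)), pos=3)
k = rotate(*to_polar(np.random.randn(8)), pos=7)
logit = np.real(np.vdot(q, k))   # attention logit from the complex inner product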

Updated: 2024-04-04 14:29:34

标题: EulerFormer:使用复杂向量注意力进行顺序用户行为建模

摘要: 为了捕捉用户偏好,变压器模型已被广泛应用于建模序列用户行为数据。变压器架构的核心在于自注意机制,它计算序列中的成对注意力分数。由于排列等价性质,位置编码被用来增强标记表示之间的注意力。在这种设置下,成对注意力分数可以通过语义差异和位置差异来推导。然而,先前的研究经常以不同的方式对两种差异度量进行建模,这可能限制了序列建模的表达能力。为了解决这个问题,本文提出了一种具有复向量注意力的新型变压器变体,命名为EulerFormer,它提供了一个统一的理论框架来表达语义差异和位置差异。EulerFormer包含两个关键技术改进。首先,它采用一种新的转换函数,通过欧拉公式将序列标记有效地转换为极坐标形式的复向量,从而统一建模语义和位置信息。其次,它开发了一种差分旋转机制,语义旋转角度可以通过自适应函数控制,从而根据语义上下文自适应地整合语义和位置信息。此外,提出了一个相位对比学习任务,以改善EulerFormer中上下文表示的等向性。我们的理论框架具有很高的完整性和普适性。从原理上讲,它对语义变化更加健壮,并具有更优越的理论特性。在四个公共数据集上进行的大量实验证明了我们方法的有效性和效率。

更新时间: 2024-04-04 14:29:34

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2403.17729v2

Performance of computer vision algorithms for fine-grained classification using crowdsourced insect images

With fine-grained classification, we identify unique characteristics to distinguish among classes of the same super-class. We are focusing on species recognition in Insecta, as they are critical for biodiversity monitoring and at the base of many ecosystems. With citizen science campaigns, billions of images are collected in the wild. Once these are labelled, experts can use them to create distribution maps. However, the labelling process is time-consuming, which is where computer vision comes in. The field of computer vision offers a wide range of algorithms, each with its strengths and weaknesses; how do we identify the algorithm that is in line with our application? To answer this question, we provide a full and detailed evaluation of nine algorithms among deep convolutional networks (CNN), vision transformers (ViT), and locality-based vision transformers (LBVT) on 4 different aspects: classification performance, embedding quality, computational cost, and gradient activity. We offer insights that we haven't yet had in this domain proving to which extent these algorithms solve the fine-grained tasks in Insecta. We found that the ViT performs the best on inference speed and computational cost while the LBVT outperforms the others on performance and embedding quality; the CNN provide a trade-off among the metrics.

Updated: 2024-04-04 14:26:58

标题: 计算机视觉算法利用众包昆虫图像进行细粒度分类的性能

摘要: 通过细粒度分类,我们识别出独特的特征来区分同一超类别的类别。我们专注于昆虫界中的物种识别,因为它们对于生物多样性监测至关重要,并且是许多生态系统的基础。通过公民科学活动,数十亿张野外图像被收集起来。一旦这些图像被标记,专家们就可以利用它们来创建分布图。然而,标记过程耗时,这就是计算机视觉发挥作用的地方。计算机视觉领域提供了广泛的算法,每种算法都有其优势和劣势;我们如何确定与我们应用相符的算法?为了回答这个问题,我们对深度卷积网络(CNN)、视觉变压器(ViT)和基于局部的视觉变压器(LBVT)等九种算法在分类性能、嵌入质量、计算成本和梯度活动等四个不同方面进行了全面详细的评估。我们提供了在这一领域尚未有的见解,证明这些算法在昆虫界的细粒度任务中解决问题的程度。我们发现ViT在推理速度和计算成本方面表现最好,而LBVT在性能和嵌入质量方面优于其他算法;CNN在各指标之间提供了一种权衡。

更新时间: 2024-04-04 14:26:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.03474v1

Generalization Bounds for Message Passing Networks on Mixture of Graphons

We study the generalization capabilities of Message Passing Neural Networks (MPNNs), a prevalent class of Graph Neural Networks (GNN). We derive generalization bounds specifically for MPNNs with normalized sum aggregation and mean aggregation. Our analysis is based on a data generation model incorporating a finite set of template graphons. Each graph within this framework is generated by sampling from one of the graphons with a certain degree of perturbation. In particular, we extend previous MPNN generalization results to a more realistic setting, which includes the following modifications: 1) we analyze simple random graphs with Bernoulli-distributed edges instead of weighted graphs; 2) we sample both graphs and graph signals from perturbed graphons instead of clean graphons; and 3) we analyze sparse graphs instead of dense graphs. In this more realistic and challenging scenario, we provide a generalization bound that decreases as the average number of nodes in the graphs increases. Our results imply that MPNNs with higher complexity than the size of the training set can still generalize effectively, as long as the graphs are sufficiently large.
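
The data model is easy to state in code: draw latent positions uniformly, perturb the graphon, and sample Bernoulli edges. The template graphon and the perturbation form below are illustrative, and the paper's sparse regime would further rescale the edge probabilities.

import numpy as np

def sample_graph(W, n, eps=0.05, seed=0):
    # latent positions U_i ~ Uniform[0,1]; edges A_ij ~ Bernoulli(W(U_i, U_j))
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    P = W(u[:, None], u[None, :]) + eps * rng.normal(size=(n, n))  # perturbation
    P = np.clip((P + P.T) / 2, 0.0, 1.0)        # symmetric, valid probabilities
    A = (rng.uniform(size=(n, n)) < P).astype(int)
    A = np.triu(A, 1)
    return A + A.T                               # simple graph, no self-loops

W = lambda x, y: 0.9 * np.exp(-3.0 * np.abs(x - y))   # one template graphon
A = sample_graph(W, n=200)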

Updated: 2024-04-04 14:26:47

标题: 混合图上消息传递网络的泛化界限

摘要: 我们研究了信息传递神经网络(MPNNs),一类常见的图神经网络(GNN)的泛化能力。我们特别为具有规范化求和聚合和均值聚合的MPNNs推导了泛化界限。我们的分析基于一个包含有限数量模板图谱的数据生成模型。在这个框架中,每个图形通过从其中一个图谱中抽样生成,并伴有一定程度的扰动。特别地,我们将先前的MPNN泛化结果扩展到了一个更为现实的设置,其中包括以下修改:1)我们分析了具有伯努利分布边缘的简单随机图,而不是加权图;2)我们从扰动图谱中抽样生成图形和图信号,而不是干净的图谱;3)我们分析了稀疏图而不是密集图。在这种更为现实和具有挑战性的情景中,我们提供了一个随着图形中节点平均数量增加而减少的泛化界限。我们的结果表明,只要图形足够大,比训练集的大小复杂度更高的MPNNs仍然可以有效地泛化。

更新时间: 2024-04-04 14:26:47

领域: cs.LG

下载: http://arxiv.org/abs/2404.03473v1

Challenges for Reinforcement Learning in Quantum Circuit Design

Quantum computing (QC) in the current NISQ era is still limited in size and precision. Hybrid applications mitigating those shortcomings are prevalent to gain early insight and advantages. Hybrid quantum machine learning (QML) comprises both the application of QC to improve machine learning (ML) and ML to improve QC architectures. This work considers the latter, leveraging reinforcement learning (RL) to improve the search for viable quantum architectures, which we formalize by a set of generic challenges. Furthermore, we propose a concrete framework, formalized as a Markov decision process, to enable learning policies capable of controlling a universal set of continuously parameterized quantum gates. Finally, we provide benchmark comparisons to assess the shortcomings and strengths of current state-of-the-art RL algorithms.

Updated: 2024-04-04 14:26:06

标题: 量子电路设计中强化学习面临的挑战

摘要: 在当前的NISQ时代,量子计算(QC)在规模和精度上仍然存在限制。为了获得早期洞察和优势,混合应用程序可以缓解这些缺点。混合量子机器学习(QML)包括利用QC来改进机器学习(ML)和利用ML来改进QC架构。本文考虑了后者,利用强化学习(RL)来改进寻找可行量子架构的过程,我们将其形式化为一组通用挑战。此外,我们提出了一个具体的框架,形式化为马尔可夫决策过程,以实现能够控制一组不断参数化的量子门的学习策略。最后,我们提供基准比较来评估当前最先进的RL算法的缺点和优势。

更新时间: 2024-04-04 14:26:06

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2312.11337v2

Reevaluating Bias Detection in Language Models: The Role of Implicit Norm

Large language models (LLMs), trained on vast datasets, can carry biases that manifest in various forms, from overt discrimination to implicit stereotypes. One facet of bias is performance disparities in LLMs, often harming underprivileged groups, such as racial minorities. A common approach to quantifying bias is to use template-based bias probes, which explicitly state group membership (e.g. White) and evaluate if the outcome of a task, sentiment analysis for instance, is invariant to the change of group membership (e.g. change White race to Black). This approach is widely used in bias quantification. However, in this work, we find evidence of an unexpectedly overlooked consequence of using template-based probes for LLM bias quantification. We find that in doing so, text examples associated with White ethnicities appear to be classified as exhibiting negative sentiment at elevated rates. We hypothesize that this scenario arises artificially from a mismatch between the pre-training text of LLMs and the templates used to measure bias, driven by reporting bias: unstated norms that imply group membership without explicit statement. Our finding highlights the potentially misleading impact of varying group membership through explicit mention in bias quantification.
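
For concreteness, here is the shape of a template-based probe; sentiment_score is a hypothetical stand-in for any sentiment model, and the templates are invented examples.

def bias_gap(sentiment_score,
             groups=("White", "Black"),
             templates=("The {} man went to the store.",
                        "My {} neighbor lent me a ladder.",
                        "A {} doctor explained the results.")):
    # vary only the explicitly stated group and compare average sentiment;
    # sentiment_score is any callable mapping text to a score in [-1, 1]
    means = {}
    for g in groups:
        scores = [sentiment_score(t.format(g)) for t in templates]
        means[g] = sum(scores) / len(scores)
    # caveat from the paper: explicitly naming a group that pre-training text
    # usually leaves implicit can itself shift sentiment, so gaps measured
    # this way may be artifacts of the probe rather than model bias
    return means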

Updated: 2024-04-04 14:24:06

标题: 重新评估语言模型中的偏见检测:隐含规范的作用

摘要: 大型语言模型(LLMs)经过大量数据集训练后可能存在各种形式的偏见,从明显的歧视到隐含的刻板印象。偏见的一个方面是LLMs中的表现差异,常常对弱势群体(如种族少数群体)造成伤害。量化偏见的常见方法是使用基于模板的偏见探针,明确说明群体成员身份(例如白人),并评估任务结果(例如情感分析)是否对群体成员身份的更改(例如将白人种族更改为黑人)保持不变。这种方法在偏见量化中被广泛使用。然而,在这项工作中,我们发现了使用基于模板的探针进行LLM偏见量化的一个意外被忽视的后果的证据。我们发现,这样做时,与白人种族相关的文本例子似乎被分类为以更高的比率展现负面情感。我们假设这种情况是通过LLMs的预训练文本与用于通过报告偏见来衡量偏见的模板之间的不匹配人为产生的,报告偏见是指暗示群体成员身份而没有明确说明的规范。我们的发现强调了通过在偏见量化中明确提及不同群体成员身份可能导致误导的潜在影响。

更新时间: 2024-04-04 14:24:06

领域: cs.CL,cs.CY,cs.LG

下载: http://arxiv.org/abs/2404.03471v1

Parametric-Task MAP-Elites

Optimizing a set of functions simultaneously by leveraging their similarity is called multi-task optimization. Current black-box multi-task algorithms only solve a finite set of tasks, even when the tasks originate from a continuous space. In this paper, we introduce Parametric-Task MAP-Elites (PT-ME), a new black-box algorithm for continuous multi-task optimization problems. This algorithm (1) solves a new task at each iteration, effectively covering the continuous space, and (2) exploits a new variation operator based on local linear regression. The resulting dataset of solutions makes it possible to create a function that maps any task parameter to its optimal solution. We show that PT-ME outperforms all baselines, including the deep reinforcement learning algorithm PPO on two parametric-task toy problems and a robotic problem in simulation.

Updated: 2024-04-04 14:21:13

标题: 参数化任务MAP-Elites

摘要: 将一组函数同时优化,利用它们的相似性被称为多任务优化。目前的黑盒多任务算法只能解决有限数量的任务,即使这些任务来自连续空间。本文介绍了一种新的黑盒算法Parametric-Task MAP-Elites(PT-ME),用于连续多任务优化问题。该算法(1)在每次迭代中解决一个新任务,有效覆盖连续空间,并且(2)利用基于局部线性回归的新变异算子。由此产生的解集数据集使得可以创建一个将任何任务参数映射到其最优解的函数。我们展示了PT-ME在两个参数化任务玩具问题和模拟中的机器人问题上优于所有基线,包括深度强化学习算法PPO。

更新时间: 2024-04-04 14:21:13

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2402.01275v2

Hessian Aware Low-Rank Weight Perturbation for Continual Learning

Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. The source code is at https://github.com/lijiaqi/HALRP.
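
The parameter structure can be sketched as a shared weight plus a per-task low-rank perturbation; the fixed rank below is an assumption, whereas the paper sets ranks from a Hessian-aware criterion and prunes unimportant parameters.

import torch
import torch.nn as nn

class LowRankTaskLinear(nn.Module):
    # shared weight W plus a per-task low-rank perturbation U_t V_t^T;
    # HALRP picks the rank per layer from a Hessian-aware criterion and
    # prunes unimportant parameters, so the fixed rank here is an assumption
    def __init__(self, d_in, d_out, n_tasks, rank=4):
        super().__init__()
        self.W = nn.Parameter(0.02 * torch.randn(d_out, d_in))
        self.U = nn.ParameterList([nn.Parameter(torch.zeros(d_out, rank))
                                   for _ in range(n_tasks)])
        self.V = nn.ParameterList([nn.Parameter(0.02 * torch.randn(rank, d_in))
                                   for _ in range(n_tasks)])
    def forward(self, x, task):
        return x @ (self.W + self.U[task] @ self.V[task]).T

layer = LowRankTaskLinear(16, 8, n_tasks=3)
y = layer(torch.randn(5, 16), task=1)   # only task 1's U, V adapt for task 1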

Updated: 2024-04-04 14:12:11

标题: 赫西亚感知低秩权重扰动用于持续学习

摘要: 持续学习旨在按顺序学习一系列任务,而不会忘记从先前任务中获得的知识。在这项工作中,我们提出了一种用于持续学习的Hessian Aware Low-Rank Perturbation算法。通过使用权重矩阵变换来建模参数在顺序任务中的转换,我们提出在神经网络的每一层上应用低秩逼近于任务自适应参数。具体来说,我们在理论上证明了Hessian和所提出的低秩逼近之间的定量关系。然后,根据由每一层特定梯度和低秩逼近误差估计的经验损失的边际增量全局确定逼近秩。此外,我们通过修剪不太重要的参数来控制模型容量,以减少参数增长。我们在各种基准测试上进行了大量实验,包括一个具有大规模任务的数据集,并将我们的方法与一些最新的最先进方法进行比较,以展示我们提出的方法的有效性和可扩展性。实证结果表明,我们的方法在不同基准测试中表现更好,特别是在实现任务顺序鲁棒性和处理遗忘问题方面。源代码位于https://github.com/lijiaqi/HALRP。

更新时间: 2024-04-04 14:12:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2311.15161v2

Causal hybrid modeling with double machine learning

Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing Double Machine Learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the $Q_{10}$ model, we demonstrate that DML-based hybrid modeling is superior in estimating causal parameters over end-to-end deep neural network (DNN) approaches, proving efficiency, robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.
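
The DML building block the paper relies on is standard partialling-out with cross-fitting: residualize both the outcome and the "treatment" on the confounders with flexible ML models, then regress residual on residual. Below is a minimal sketch on synthetic data with a known effect; the random-forest nuisance models and the toy data-generating process are illustrative assumptions, not the paper's $Q_{10}$ setup.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # confounders
T = np.sin(X[:, 0]) + 0.5 * rng.normal(size=n)   # treatment-like driver
theta = 1.5                                      # true causal effect
Y = theta * T + np.cos(X[:, 1]) + 0.5 * rng.normal(size=n)

# cross-fitted residualization of Y and T on X (the "double" in DML)
res_Y = Y - cross_val_predict(RandomForestRegressor(), X, Y, cv=2)
res_T = T - cross_val_predict(RandomForestRegressor(), X, T, cv=2)
theta_hat = LinearRegression(fit_intercept=False).fit(res_T[:, None], res_Y).coef_[0]
print(theta_hat)  # should land near the true effect of 1.5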

Updated: 2024-04-04 14:02:47

标题: 用双机器学习进行因果混合建模

摘要: 混合建模将机器学习与科学知识相结合,以提高解释性、泛化性和遵循自然规律。然而,等效性和正则化偏差在混合建模中实现这些目的时存在挑战。本文介绍了一种通过因果推断框架估计混合模型的新方法,具体采用双机器学习(DML)来估计因果效应。我们展示了其在地球科学领域中应用于两个与二氧化碳通量相关的问题。在$Q_{10}$模型中,我们证明基于DML的混合建模在估计因果参数方面优于端到端深度神经网络(DNN)方法,证明了其效率、对正则化方法偏差的稳健性和规避等效性。我们的方法应用于碳通量分配,展现出在适应异质性因果效应方面的灵活性。该研究强调了明确定义因果图和关系的必要性,并主张将其作为一种一般最佳实践。我们鼓励继续探索混合模型中因果关系,以获得更具解释性和可信度的知识引导机器学习结果。

更新时间: 2024-04-04 14:02:47

领域: cs.LG,stat.ME

下载: http://arxiv.org/abs/2402.13332v2

Sequential Monte Carlo Bandits

We extend Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision making problem where the goal is to learn a policy that maximizes long term payoff, where only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except for simple, stationary cases. We here utilize SMC for estimation of the statistics Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems: i.e., MABs with nonlinear, stateless- and context-dependent reward distributions that evolve over time. We showcase how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., in non-stationary bandits with nonlinear rewards.
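
A minimal SMC-based Thompson-sampling agent for a stationary Gaussian bandit is sketched below; the particle counts, the known noise scale, and the resample-move jitter are assumptions, and the paper's setting extends this to nonlinear and non-stationary rewards.

import numpy as np

rng = np.random.default_rng(0)
K, P, noise = 3, 500, 0.3                     # arms, particles per arm, reward std
true_mu = np.array([0.2, 0.5, 0.8])
particles = rng.normal(0.0, 1.0, size=(K, P)) # particles over each arm's mean
weights = np.full((K, P), 1.0 / P)

for t in range(500):
    # Thompson sampling: draw one particle per arm, play the argmax
    draws = np.array([rng.choice(particles[k], p=weights[k]) for k in range(K)])
    a = int(np.argmax(draws))
    r = true_mu[a] + noise * rng.standard_normal()
    # SMC update for the played arm: reweight by the Gaussian likelihood
    weights[a] *= np.exp(-0.5 * ((r - particles[a]) / noise) ** 2)
    weights[a] /= weights[a].sum()
    # resample-move when the effective sample size degenerates
    if 1.0 / np.sum(weights[a] ** 2) < P / 2:
        idx = rng.choice(P, size=P, p=weights[a])
        particles[a] = particles[a][idx] + 0.01 * rng.standard_normal(P)
        weights[a] = np.full(P, 1.0 / P)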

Updated: 2024-04-04 14:00:42

标题: 顺序蒙特卡洛赌博机

摘要: 我们通过使用序贯蒙特卡罗(SMC)方法,将贝叶斯多臂老虎机(MAB)算法扩展到原始设定之外。 MAB是一个顺序决策问题,其目标是学习一个最大化长期回报的策略,只有执行动作的奖励被观测到。在随机MAB中,每个动作的奖励是从未知分布中生成的,通常假定为稳定的。为了决定下一步要采取哪个动作,MAB代理必须学习未知奖励分布的特性,例如计算其充分统计量。然而,除了简单的稳态情况外,这些统计量的封闭形式表达在解析上是难以处理的。 我们在这里利用SMC来估计贝叶斯MAB代理计算的统计量,并设计灵活的策略,可以解决一类丰富的老虎机问题:即具有非线性、无状态和上下文相关奖励分布,并随时间变化的MAB。我们展示了非稳态老虎机,其中时间动态通过线性动态系统建模,可以通过基于SMC的贝叶斯老虎机代理成功解决。我们在几个MAB场景中实证地证明了所提出的基于SMC的老虎机策略在一直难以解决的情况下(即具有非线性奖励的非稳态老虎机)的良好遗憾性能。

更新时间: 2024-04-04 14:00:42

领域: stat.ML,cs.LG,stat.CO,62L05, 62L12, 62L20, 62M05,I.2.6

下载: http://arxiv.org/abs/1808.02933v4

Conditioning of Banach Space Valued Gaussian Random Variables: An Approximation Approach Based on Martingales

In this paper we investigate the conditional distributions of two Banach space valued, jointly Gaussian random variables. These conditional distributions are again Gaussian and their means and covariances are determined by a general approximation scheme based upon a martingale idea. We then apply our general results to the case of Gaussian processes with continuous paths conditioned to partial observations of their paths.
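
In finite dimensions the conditional law is given by the familiar Schur-complement formulas, which the paper's martingale scheme approximates in the Banach-space setting; a small numerical check:

import numpy as np

def condition(mu, Sigma, obs, y):
    # conditional mean/covariance of the hidden block given the observed block:
    #   mean = mu1 + S12 S22^{-1} (y - mu2),  cov = S11 - S12 S22^{-1} S21
    hid = [i for i in range(len(mu)) if i not in obs]
    S11 = Sigma[np.ix_(hid, hid)]
    S12 = Sigma[np.ix_(hid, obs)]
    S22 = Sigma[np.ix_(obs, obs)]
    K = S12 @ np.linalg.inv(S22)
    return mu[hid] + K @ (y - mu[obs]), S11 - K @ S12.T

mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
mean, cov = condition(mu, Sigma, obs=[2], y=np.array([1.0]))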

Updated: 2024-04-04 13:57:44

标题: Banach空间值高斯随机变量的条件化:基于鞅的逼近方法

摘要: 在这篇论文中,我们研究了两个Banach空间值的联合高斯随机变量的条件分布。这些条件分布仍然是高斯分布,其均值和协方差由一种基于鞅思想的一般逼近方案确定。然后我们将一般结果应用于具有连续路径、并以路径的部分观测为条件的高斯过程的情形。

更新时间: 2024-04-04 13:57:44

领域: math.PR,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2404.03453v1

SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Autonomous Agents

Autonomous mobile agents (e.g., UAVs and UGVs) are typically expected to incur low power/energy consumption for solving machine learning tasks (such as object recognition), as these mobile agents are usually powered by portable batteries. These requirements can be fulfilled by Spiking Neural Networks (SNNs), since their bio-inspired spike-based operations offer high accuracy and ultra low-power/energy computation. Currently, most of the SNN architectures are derived from Artificial Neural Networks whose neurons' architectures and operations are different from SNNs, or developed without considering memory budgets from the underlying processing hardware of autonomous mobile agents. These limitations hinder SNNs from reaching their full potential in accuracy and efficiency. Toward this, we propose SpikeNAS, a novel fast memory-aware neural architecture search (NAS) framework for SNNs that quickly finds an appropriate SNN architecture with high accuracy under the given memory budgets from autonomous mobile agents. To do this, our SpikeNAS employs several key steps: analyzing the impacts of network operations on the accuracy, enhancing the network architecture to improve the learning quality, and developing a fast memory-aware search algorithm. The experimental results show that our SpikeNAS improves the searching time and maintains high accuracy as compared to state-of-the-art while meeting the given memory budgets (e.g., 4.4x faster search with 1.3% accuracy improvement for CIFAR100, using an Nvidia RTX 6000 Ada GPU machine), thereby quickly providing the appropriate SNN architecture for the memory-constrained autonomous mobile agents.

Updated: 2024-04-04 13:55:05

标题: SpikeNAS:用于脉冲神经网络自主代理的快速记忆感知神经架构搜索框架

摘要: 自主移动代理(例如,UAV和UGV)通常期望在解决机器学习任务(如目标识别)时消耗低功耗/能量,因为这些移动代理通常由便携式电池供电。这些要求可以通过脉冲神经网络(SNN)来实现,因为它们的生物启发式基于脉冲的操作提供了高准确性和超低功耗/能量计算。目前,大多数SNN架构都是从人工神经网络中派生出来的,其神经元结构和操作与SNN不同,或者是在不考虑自主移动代理底层处理硬件的内存预算的情况下开发的。这些限制阻碍了SNN在准确性和效率方面发挥其全部潜力。为此,我们提出了SpikeNAS,这是一个新颖的快速记忆感知神经架构搜索(NAS)框架,用于在给定自主移动代理的内存预算下快速找到一个具有高准确性的适当SNN架构。为了做到这一点,我们的SpikeNAS采用了几个关键步骤:分析网络操作对准确性的影响,改进网络架构以提高学习质量,并开发一个快速的记忆感知搜索算法。实验结果显示,与最先进的方法相比,我们的SpikeNAS提高了搜索时间并保持了高准确性(例如,在Nvidia RTX 6000 Ada GPU机器上使用CIFAR100,搜索速度提高了4.4倍,准确性提高了1.3%),从而为内存受限的自主移动代理快速提供适当的SNN架构。

更新时间: 2024-04-04 13:55:05

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.11322v2

UINav: A Practical Approach to Train On-Device Automation Agents

Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices, yet achieving high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations UINav can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy.

Updated: 2024-04-04 13:51:56

标题: UINav:一种训练设备上自动化代理的实用方法

摘要: 能够自主操作应用程序用户界面以完成用户任务的自动化系统大有裨益,尤其是当用户存在情境性或永久性障碍时。先前的自动化系统不能产生可推广的模型,而基于人工智能的自动化代理只能在简单的手工制作应用程序中可靠地工作,或者产生高计算成本。我们提出了UINav,这是一种基于演示的方法,用于训练适合移动设备的自动化代理,同时在少量演示的情况下实现高成功率。为了减少演示的开销,UINav使用一个裁判模型,为代理失败的任务提供用户即时反馈,并自动增加人类演示,以增加训练数据的多样性。我们的评估表明,仅需10次演示,UINav就可以达到70%的准确率,而足够多的演示可以使其超过90%的准确率。

更新时间: 2024-04-04 13:51:56

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2312.10170v3

Metric-aware LLM inference for regression and scoring

Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. Building on prior work on Minimum Bayes Risk Decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks, and associated evaluation metrics. As a remedy, we propose metric aware LLM inference: a decision theoretic approach optimizing for custom regression and scoring metrics at inference time. We report improvements over baselines on academic benchmarks and publicly available models.
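
The decision-theoretic core is easy to sketch: sample several outputs, then return the Bayes-optimal decision for the target metric (the mean for squared error, the median for absolute error). Parsing generations into numbers is assumed away here, and the sample values are hypothetical.

import numpy as np

def metric_aware_decode(samples, metric="mae"):
    # Minimum-Bayes-risk decision over sampled outputs: the mean minimizes
    # expected squared error, the median minimizes expected absolute error
    s = np.asarray(samples, dtype=float)
    if metric == "mse":
        return s.mean()
    if metric == "mae":
        return float(np.median(s))
    raise ValueError(metric)

# hypothetical scores parsed from eight sampled generations
print(metric_aware_decode([7, 7, 8, 6, 7, 9, 7, 30], metric="mae"))  # 7.0, robust to the outlier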

Updated: 2024-04-04 13:48:19

标题: 度量感知的LLM推断用于回归和评分

摘要: 大型语言模型(LLMs)已经在一系列NLP任务中展示出强大的结果。通常,输出是通过从LLM的基础分布中进行自回归抽样获得的。在先前关于最小贝叶斯风险解码的工作基础上,我们展示了这种推断策略在一系列回归和评分任务以及相关评估指标上可能是次优的。作为一种补救措施,我们提出了度量感知的LLM推断:一种决策理论方法,在推断时优化定制的回归和评分指标。我们在学术基准测试和公开可用模型上报告了相对于基线的改进。

更新时间: 2024-04-04 13:48:19

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.04182v2

SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

Deep clustering, which learns representation and semantic clustering without labels information, poses a great challenge for deep learning-based approaches. Despite significant progress in recent years, most existing methods focus on uniformly distributed datasets, significantly limiting the practical applicability of their methods. In this paper, we propose a more practical problem setting named deep imbalanced clustering, where the underlying classes exhibit an imbalance distribution. To address this challenge, we introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport (SP$^2$OT) problem, which progressively transports each sample to imbalanced clusters under several prior distribution and semantic relation constraints, thus generating high-quality and imbalance-aware pseudo-labels. To solve SP$^2$OT, we develop a Majorization-Minimization-based optimization algorithm. To be more precise, we employ the strategy of majorization to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and can be solved efficiently by a fast matrix scaling algorithm. Experiments on various datasets, including a human-curated long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of fine-grained iNaturalist2018 datasets, demonstrate the superiority of our method.

Updated: 2024-04-04 13:46:52

标题: SP$^2$OT: 语义正则化的渐进式部分最优输运在不平衡聚类中的应用

摘要: 深度聚类是一种学习表示和语义聚类的方法,不需要标签信息,对于基于深度学习的方法提出了巨大挑战。尽管近年来取得了显著进展,但大多数现有方法都集中在均匀分布的数据集上,显著限制了它们方法的实际适用性。本文提出了一个更实际的问题设置,称为深度不平衡聚类,其中底层类别呈现不平衡分布。为了解决这一挑战,我们引入了一种基于最优传输的伪标签学习框架。我们的框架将伪标签生成形式化为一个基于语义正则化的渐进式部分最优传输(SP$^2$OT)问题,根据几个先验分布和语义关系约束逐步将每个样本传输到不平衡的簇中,从而生成高质量和不平衡感知的伪标签。为了解决SP$^2$OT问题,我们开发了一种基于Majorization-Minimization的优化算法。更具体地说,我们利用主极化策略将SP$^2$OT问题重构为一个渐进式部分最优传输问题,这可以转化为一个带有增强约束的不平衡最优传输问题,并可以通过快速矩阵缩放算法有效解决。在包括人工策划的长尾CIFAR100、具有挑战性的ImageNet-R和大规模细粒度iNaturalist2018数据集的各种数据集上的实验证明了我们方法的优越性。

更新时间: 2024-04-04 13:46:52

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.03446v1

OptoGPT: A Foundation Model for Inverse Design in Optical Multilayer Thin Film Structures

Optical multilayer thin film structures have been widely used in numerous photonic applications. However, existing inverse design methods have many drawbacks because they either fail to quickly adapt to different design targets, or are difficult to suit for different types of structures, e.g., designing for different materials at each layer. These methods also cannot accommodate versatile design situations under different angles and polarizations. In addition, how to benefit practical fabrication and manufacturing has not yet been extensively considered. In this work, we introduce OptoGPT (Opto Generative Pretrained Transformer), a decoder-only transformer, to solve all these drawbacks and issues simultaneously.

Updated: 2024-04-04 13:45:51

标题: OptoGPT:光学多层薄膜结构中逆向设计的基础模型

摘要: 光学多层薄膜结构被广泛应用于许多光子应用中。然而,现有的逆向设计方法存在许多缺点,因为它们要么无法快速适应不同的设计目标,要么难以适用于不同类型的结构,例如,设计每层不同材料。这些方法也无法适应不同角度和极化下的多种设计情况。此外,如何使实际制造受益并且制造工艺尚未得到广泛考虑。在本研究中,我们引入了OptoGPT(光学生成预训练变压器),一个仅解码器的变压器,以同时解决所有这些缺点和问题。

更新时间: 2024-04-04 13:45:51

领域: physics.optics,cs.LG

下载: http://arxiv.org/abs/2304.10294v2

Self-organized arrival system for urban air mobility

Urban air mobility is an innovative mode of transportation in which electric vertical takeoff and landing (eVTOL) vehicles operate between nodes called vertiports. We outline a self-organized vertiport arrival system based on deep reinforcement learning. The airspace around the vertiport is assumed to be circular, and the vehicles can freely operate inside. Each aircraft is considered an individual agent and follows a shared policy, resulting in decentralized actions that are based on local information. We investigate the development of the reinforcement learning policy during training and illustrate how the algorithm moves from suboptimal local holding patterns to a safe and efficient final policy. The latter is validated in simulation-based scenarios and also deployed on small-scale unmanned aerial vehicles to showcase its real-world usability.

Updated: 2024-04-04 13:43:17

标题: 城市空中移动的自组织到达系统

摘要: 城市空中移动是一种创新的交通模式,电动垂直起降(eVTOL)飞行器在称为垂直港口的节点之间运行。我们概述了基于深度强化学习的自组织垂直港口到达系统。假设垂直港口周围的空域是圆形的,飞行器可以自由运行在内部。每架飞行器被视为一个独立的智能体,并遵循共享策略,导致基于本地信息的分散动作。我们研究了在训练过程中强化学习策略的发展,并说明了算法如何从次优的本地保持模式转变为安全高效的最终策略。后者在基于模拟的场景中得到验证,并且还在小型无人机上部署,展示其实际可用性。

更新时间: 2024-04-04 13:43:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03710v1

Effective Learning with Node Perturbation in Deep Neural Networks

Backpropagation (BP) is the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided noise-based search process. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives together with input decorrelation at every layer significantly enhances performance of NP learning with significant improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible.
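
For a single linear layer, standard NP looks as follows: two forward passes and a weight update proportional to the injected noise times the induced loss change. The layer, learning rate, and noise scale are toy choices; the paper's method adds directional-derivative alignment and layer-wise input decorrelation on top of this.

import numpy as np

rng = np.random.default_rng(0)

def np_step(W, x, y, sigma=0.01, lr=0.01):
    # forward pass 1: clean activations and loss
    a = W @ x
    loss = np.sum((a - y) ** 2)
    # forward pass 2: noise injected into the activations
    xi = sigma * rng.standard_normal(a.shape)
    loss_noisy = np.sum((a + xi - y) ** 2)
    # NP update: loss change times injected noise, outer product with input;
    # a stochastic estimate of a directional derivative, no backprop needed
    dL = loss_noisy - loss
    return W - lr * (dL / sigma**2) * np.outer(xi, x)

W = 0.1 * rng.standard_normal((3, 5))
x, y = rng.standard_normal(5), np.zeros(3)
for _ in range(500):
    W = np_step(W, x, y)   # drives W @ x toward y in expectation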

Updated: 2024-04-04 13:40:51

标题: 深度神经网络中节点扰动的有效学习

摘要: 反向传播(BP)是训练深度神经网络模型参数的主要且最成功的方法。然而,BP依赖于两个计算上不同的阶段,无法提供对生物学习的满意解释,并且在训练具有不连续性或噪声节点动态的网络时可能具有挑战性。相比之下,节点扰动(NP)提出通过向网络激活注入噪声进行学习,并测量引起的损失变化。NP依赖于两个前向(推理)传递,不使用网络导数,并已被提出作为生物系统学习的模型。然而,标准NP由于其无指导的基于噪声的搜索过程而高度数据低效且不稳定。在这项工作中,我们研究了不同的NP形式,并将其与方向导数的概念联系起来,同时将其与层输入的去相关机制结合起来。我们发现,与方向导数更接近以及每一层输入的去相关显著增强了NP学习的性能,参数收敛得到显著改善,并且在测试数据上表现更好,接近于BP的性能。此外,我们的新颖公式允许应用于噪声系统,其中噪声过程本身是不可访问的。

更新时间: 2024-04-04 13:40:51

领域: cs.LG

下载: http://arxiv.org/abs/2310.00965v3

Privacy Engineering From Principles to Practice: A Roadmap

Privacy engineering is gaining momentum in industry and academia alike. So far, manifold low-level primitives and higher-level methods and strategies have successfully been established. Still, fostering adoption in real-world information systems calls for additional aspects to be consciously considered in research and practice.

Updated: 2024-04-04 13:39:49

标题: 隐私工程:从原则到实践的路线图

摘要: 隐私工程在工业界和学术界都越来越受到重视。到目前为止,已经成功建立了多种低级原语和高级方法和策略。然而,在现实世界的信息系统中促进采用需要在研究和实践中有意识地考虑其他方面。

更新时间: 2024-04-04 13:39:49

领域: cs.CR,cs.CY,cs.SE,K.5.0; H.1.0; D.2.1; D.2.2

下载: http://arxiv.org/abs/2404.03442v1

Benchmarking ChatGPT on Algorithmic Reasoning

We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite that is designed for GNNs. The benchmark requires the use of a specified classical algorithm to solve a given problem. We find that ChatGPT outperforms specialist GNN models, using Python to successfully solve these problems. This raises new points in the discussion about learning algorithms with neural networks.

Updated: 2024-04-04 13:39:06

标题: 在算法推理上对ChatGPT进行基准测试

摘要: 我们评估了ChatGPT解决CLRS基准套件中针对GNNs设计的算法问题的能力。该基准要求使用特定的经典算法来解决给定的问题。我们发现,使用Python,ChatGPT在成功解决这些问题方面胜过专门的GNN模型。这在关于使用神经网络学习算法的讨论中提出了新的观点。

更新时间: 2024-04-04 13:39:06

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.03441v1

Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging

Unlike typical visual scene recognition domains, in which massive datasets are accessible to deep neural networks, medical image interpretations are often obstructed by the paucity of data. In this paper, we investigate the effectiveness of data-based few-shot learning in medical imaging by exploring different data attribute representations in a low-dimensional space. We introduce different types of non-negative matrix factorization (NMF) in few-shot learning, addressing the data scarcity issue in medical image classification. Extensive empirical studies are conducted in terms of validating the effectiveness of NMF, especially its supervised variants (e.g., discriminative NMF, and supervised and constrained NMF with sparseness), and the comparison with principal component analysis (PCA), i.e., the collaborative representation-based dimensionality reduction technique derived from eigenvectors. With 14 different datasets covering 11 distinct illness categories, thorough experimental results and comparison with related techniques demonstrate that NMF is a competitive alternative to PCA for few-shot learning in medical imaging, and the supervised NMF algorithms are more discriminative in the subspace with greater effectiveness. Furthermore, we show that the part-based representation of NMF, especially its supervised variants, is dramatically impactful in detecting lesion areas in medical imaging with limited samples.
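
A plain (unsupervised) variant of the pipeline can be sketched with scikit-learn: factorize support features into a non-negative subspace, project queries, and classify by nearest centroid. The random features and episode sizes are stand-ins, and the paper's supervised and discriminative NMF variants go beyond this.

import numpy as np
from sklearn.decomposition import NMF
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
# stand-ins for non-negative deep features of a 5-way 3-shot episode
X_support = rng.random((5 * 3, 256))
y_support = np.repeat(np.arange(5), 3)
X_query = rng.random((25, 256))

nmf = NMF(n_components=10, init="nndsvda", max_iter=500)
H_support = nmf.fit_transform(X_support)   # coordinates in the NMF subspace
H_query = nmf.transform(X_query)

pred = NearestCentroid().fit(H_support, y_support).predict(H_query)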

Updated: 2024-04-04 13:30:59

标题: 医学影像中用于少样本学习的非负子空间特征表示

摘要: 与典型的视觉场景识别领域不同,深度神经网络通常可以访问大量数据集,而医学图像解释常常受到数据稀缺的阻碍。本文探讨了在医学影像中探索低维空间中不同数据属性表示的数据驱动少样本学习的有效性。我们引入了不同类型的非负矩阵分解(NMF)在少样本学习中,解决了医学图像分类中的数据稀缺问题。通过大量实证研究验证了NMF的有效性,尤其是其监督变体(例如,判别式NMF和具有稀疏性的监督和受限制的NMF),以及与基于协同表示的特征向量导出的主成分分析(PCA)进行比较的实验。涵盖11个不同疾病类别的14个不同数据集,详尽的实验结果和与相关技术的比较表明,NMF是医学影像中少样本学习的竞争性替代品,监督NMF算法在具有更大有效性的子空间中更具区分性。此外,我们展示了NMF的部分表示,尤其是其监督变体,在医学影像中检测病灶区域具有显著的影响,尤其是在有限样本的情况下。

更新时间: 2024-04-04 13:30:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2404.02656v2

Data Upcycling Knowledge Distillation for Image Super-Resolution

Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, current KD methods for super-resolution (SR) networks overlook the nature of the SR task: the outputs of the teacher model are noisy approximations to the ground-truth distribution of high-quality images (GT), which obscures the teacher model's knowledge and limits the effect of KD. To utilize the teacher model beyond the GT upper bound, we present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher model's knowledge to the student model through upcycled in-domain data derived from the training data. Besides, we impose label consistency regularization on KD for SR via paired invertible augmentations to improve the student model's performance and robustness. Comprehensive experiments demonstrate that the DUKD method significantly outperforms previous arts on several SR tasks.

Updated: 2024-04-04 13:29:25

标题: 数据再利用知识蒸馏用于图像超分辨率

摘要: 知识蒸馏(KD)通过将任务相关知识从繁琐的预训练教师模型转移到紧凑的学生模型来压缩深度神经网络。然而,目前针对超分辨率(SR)网络的KD方法忽视了SR任务的一个本质:教师模型的输出只是高质量图像真实分布(GT)的带噪近似,这使教师模型的知识被掩盖,导致KD效果有限。为了在GT上限之外利用教师模型,我们提出了数据再利用知识蒸馏(DUKD),通过由训练数据衍生的域内再利用数据将教师模型的知识传递给学生模型。此外,我们通过成对的可逆增强为SR的KD施加标签一致性正则化,以提高学生模型的性能和鲁棒性。全面的实验表明,DUKD方法在多个SR任务上明显优于先前的方法。

更新时间: 2024-04-04 13:29:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2309.14162v3

RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models

This study presents RoleCraft-GLM, an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs). RoleCraft-GLM addresses the key issue of lacking personalized interactions in conversational AI, and offers a solution with detailed and emotionally nuanced character portrayals. We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas, thus enhancing the realism and complexity of language modeling interactions. Additionally, our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant. The effectiveness of RoleCraft-GLM is validated through various case studies, highlighting its versatility and skill in different scenarios. Our framework excels in generating dialogues that accurately reflect characters' personality traits and emotions, thereby boosting user engagement. In conclusion, RoleCraft-GLM marks a significant leap in personalized AI interactions, and paves the way for more authentic and immersive AI-assisted role-playing experiences by enabling more nuanced and emotionally rich dialogues.

Updated: 2024-04-04 13:27:38

标题: RoleCraft-GLM:推进大型语言模型中的个性化角色扮演

摘要: 这项研究介绍了RoleCraft-GLM,这是一个旨在通过大型语言模型(LLMs)增强个性化角色扮演的创新框架。RoleCraft-GLM解决了对话人工智能中缺乏个性化互动的关键问题,并提供了具有详细和情感细腻的角色刻画的解决方案。我们贡献了一个独特的对话数据集,从传统的以名人为中心的角色转变为多样化、非名人的人物,从而增强了语言建模互动的现实感和复杂性。此外,我们的方法包括精心的角色发展,确保对话既现实又具有情感共鸣。通过各种案例研究验证了RoleCraft-GLM的有效性,突显了其在不同场景中的多功能性和技巧。我们的框架在生成准确反映角色个性特征和情感的对话方面表现出色,从而提升了用户参与度。总之,RoleCraft-GLM标志着个性化人工智能互动的重大飞跃,并为更真实和沉浸式的人工智能辅助角色扮演体验铺平道路,实现更细腻和情感丰富的对话。

更新时间: 2024-04-04 13:27:38

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.09432v2

Learning From Simplicial Data Based on Random Walks and 1D Convolutions

Triggered by limitations of graph-based deep learning methods in terms of computational expressivity and model flexibility, recent years have seen a surge of interest in computational models that operate on higher-order topological domains such as hypergraphs and simplicial complexes. While the increased expressivity of these models can indeed lead to a better classification performance and a more faithful representation of the underlying system, the computational cost of these higher-order models can increase dramatically. To this end, we here explore a simplicial complex neural network learning architecture based on random walks and fast 1D convolutions (SCRaWl), in which we can adjust the increase in computational cost by varying the length and number of random walks considered while accounting for higher-order relationships. Importantly, due to the random walk-based design, the expressivity of the proposed architecture is provably incomparable to that of existing message-passing simplicial neural networks. We empirically evaluate SCRaWl on real-world datasets and show that it outperforms other simplicial neural networks.

Updated: 2024-04-04 13:27:22

标题: 基于随机游走和一维卷积的单纯数据学习

摘要: 由于基于图的深度学习方法在计算表达能力和模型灵活性方面存在局限,近年来人们对在高阶拓扑域(如超图和单纯复形)上运行的计算模型表现出了极大的兴趣。尽管这些模型的增强表达能力确实可以带来更好的分类性能和对底层系统更忠实的表示,但这些高阶模型的计算成本可能会大幅增加。为此,我们在这里探索了一种基于随机游走和快速一维卷积的单纯复形神经网络学习架构(SCRaWl),在该架构中,我们可以通过调整所考虑的随机游走的长度和数量来控制计算成本的增加,同时兼顾高阶关系。重要的是,由于基于随机游走的设计,所提出架构的表达能力可证明与现有的消息传递单纯神经网络互不可比。我们在真实数据集上对SCRaWl进行了实证评估,并展示了它优于其他单纯复形神经网络。

更新时间: 2024-04-04 13:27:22

领域: cs.LG

下载: http://arxiv.org/abs/2404.03434v1

Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection

This paper introduces MarketSenseAI, an innovative framework leveraging GPT-4's advanced reasoning for selecting stocks in financial markets. By integrating Chain of Thought and In-Context Learning, MarketSenseAI analyzes diverse data sources, including market trends, news, fundamentals, and macroeconomic factors, to emulate expert investment decision-making. The development, implementation, and validation of the framework are elaborately discussed, underscoring its capability to generate actionable and interpretable investment signals. A notable feature of this work is employing GPT-4 both as a predictive mechanism and signal evaluator, revealing the significant impact of the AI-generated explanations on signal accuracy, reliability and acceptance. Through empirical testing on the competitive S&P 100 stocks over a 15-month period, MarketSenseAI demonstrated exceptional performance, delivering excess alpha of 10% to 30% and achieving a cumulative return of up to 72% over the period, while maintaining a risk profile comparable to the broader market. Our findings highlight the transformative potential of Large Language Models in financial decision-making, marking a significant leap in integrating generative AI into financial analytics and investment strategies.

Updated: 2024-04-04 13:18:55

标题: 大语言模型能击败华尔街吗?揭示AI在股票选择中的潜力

摘要: 本文介绍了MarketSenseAI,这是一个创新框架,利用GPT-4的先进推理来选择金融市场的股票。通过整合思维链和上下文学习,MarketSenseAI分析多样的数据来源,包括市场趋势、新闻、基本面和宏观经济因素,以模拟专业投资决策。该框架的开发、实施和验证得到了详细讨论,强调了其生成可操作和可解释的投资信号的能力。这项工作的一个显着特点是将GPT-4既作为预测机制又作为信号评估器,揭示了人工智能生成的解释对信号的准确性、可靠性和接受度的显著影响。通过在竞争激烈的标准普尔100支股票上进行为期15个月的实证测试,MarketSenseAI展现出了出色的表现,提供了10%到30%的超额阿尔法,并在该期间实现了最高达72%的累计回报,同时保持了与更广泛市场相当的风险配置。我们的发现突出了大型语言模型在金融决策中的变革潜力,标志着在金融分析和投资策略中整合生成式人工智能的重大飞跃。

更新时间: 2024-04-04 13:18:55

领域: q-fin.CP,cs.AI,cs.CE,cs.CL,cs.LG,68T07, 68T50, 91G10, 91G15,I.2.1; I.2.7; J.4

下载: http://arxiv.org/abs/2401.03737v2

Accurate estimation of feature importance faithfulness for tree models

In this paper, we consider a perturbation-based metric of predictive faithfulness of feature rankings (or attributions) that we call PGI squared. When applied to decision tree-based regression models, the metric can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. In particular, the computation does not involve the Monte Carlo sampling that has typically been used for computing similar metrics and which is inherently prone to inaccuracies. Moreover, we propose a method of ranking features by their importance for the tree model's predictions based on PGI squared. Our experiments indicate that in some respects, the method may identify the globally important features better than the state-of-the-art SHAP explainer.
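
The paper's exact, sampling-free computation for tree ensembles is its main technical contribution and is not reproduced here; for orientation, the generic Monte Carlo estimator that such prediction-gap metrics are usually built on — and that the exact method replaces — looks roughly like the following, where the marginal-resampling perturbation is our illustrative choice:

    # Hedged sketch: Monte Carlo estimate of a squared prediction-gap metric for a
    # feature subset -- perturb those features independently and measure the mean
    # squared change of the model's prediction. (The paper computes this exactly
    # for tree models, without sampling.)
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=400, n_features=8, random_state=0)
    model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

    def pgi_squared_mc(model, X, features, n_draws=200, seed=0):
        rng = np.random.default_rng(seed)
        base = model.predict(X)
        total = 0.0
        for _ in range(n_draws):
            Xp = X.copy()
            for j in features:  # independent perturbation per feature (marginal resample)
                Xp[:, j] = rng.permutation(X[:, j])
            total += np.mean((model.predict(Xp) - base) ** 2)
        return total / n_draws

    # Rank single features by the squared prediction gap they induce.
    scores = {j: pgi_squared_mc(model, X, [j]) for j in range(X.shape[1])}
    print(sorted(scores, key=scores.get, reverse=True))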

Updated: 2024-04-04 13:09:26

标题: 树模型中特征重要性忠实度的准确估计

摘要: 在本文中,我们考虑了一种基于扰动的特征排名(或归因)预测忠实度度量,我们称之为PGI平方。当应用于基于决策树的回归模型时,对于任意独立的特征扰动分布,该度量都可以被准确而高效地计算。特别地,该计算不涉及通常用于计算类似度量的蒙特卡罗抽样,后者本质上容易产生不准确性。此外,我们提出了一种基于PGI平方、按特征对树模型预测的重要性进行排名的方法。我们的实验表明,在某些方面,该方法可能比最先进的SHAP解释器更好地识别全局重要的特征。

更新时间: 2024-04-04 13:09:26

领域: cs.LG

下载: http://arxiv.org/abs/2404.03426v1

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model

Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have their inherent shortcomings. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing change detection tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features and obtain accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex strategies or tricks, fully demonstrating the potential of the Mamba architecture. Specifically, we obtained 83.11%, 88.39% and 94.19% F1 scores on the three BCD datasets SYSU, LEVIR-CD+, and WHU-CD; on the SCD dataset SECOND, we obtained 24.04% SeK; and on the xBD dataset, we obtained 81.41% overall F1 score. The source code will be available at https://github.com/ChenHongruixuan/MambaCD

Updated: 2024-04-04 13:06:25

标题: ChangeMamba:基于时空状态空间模型的遥感变化检测

摘要: 卷积神经网络(CNN)和Transformer在遥感变化检测(CD)领域取得了显著进展。然而,这两种架构都有其固有的缺点。最近,基于状态空间模型的Mamba架构在一系列自然语言处理任务中表现出了卓越的性能,可以有效弥补上述两种架构的缺点。在本文中,我们首次探索了Mamba架构在遥感变化检测任务中的潜力。我们为二值变化检测(BCD)、语义变化检测(SCD)和建筑损害评估(BDA)分别定制了相应的框架,分别称为MambaBCD、MambaSCD和MambaBDA。这三个框架都采用了最前沿的视觉Mamba架构作为编码器,可以从输入图像中完全学习全局空间上下文信息。对于三种架构中均包含的变化解码器,我们提出了三种时空关系建模机制,可以自然地与Mamba架构结合,并充分利用其特性,实现多时相特征的时空交互,从而获得准确的变化信息。在五个基准数据集上,我们提出的框架在不使用任何复杂策略或技巧的情况下,优于当前基于CNN和Transformer的方法,充分展示了Mamba架构的潜力。具体来说,我们在三个BCD数据集SYSU、LEVIR-CD+和WHU-CD上分别获得了83.11%、88.39%和94.19%的F1得分;在SCD数据集SECOND上,我们获得了24.04%的SeK;在xBD数据集上,我们获得了81.41%的总体F1得分。源代码将在https://github.com/ChenHongruixuan/MambaCD上提供。

更新时间: 2024-04-04 13:06:25

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2404.03425v1

Set-Type Belief Propagation with Applications to Poisson Multi-Bernoulli SLAM

Belief propagation (BP) is a useful probabilistic inference algorithm for efficiently computing approximate marginal probability densities of random variables. However, in its standard form, BP is only applicable to vector-type random variables with a fixed and known number of vector elements, while certain applications rely on random finite sets (RFSs) with an unknown number of vector elements. In this paper, we develop BP rules for factor graphs defined on sequences of RFSs, where each RFS has an unknown number of elements, with the intention of deriving novel inference methods for RFSs. Furthermore, we show that vector-type BP is a special case of set-type BP, where each RFS follows the Bernoulli process. To demonstrate the validity of the developed set-type BP, we apply it to the Poisson multi-Bernoulli (PMB) filter for simultaneous localization and mapping (SLAM), which naturally leads to new set-type BP mapping, SLAM, multi-target tracking, and simultaneous localization and tracking filters. Finally, we explore the relationships between the vector-type BP and the proposed set-type BP PMB-SLAM implementations and show a performance gain of the proposed set-type BP PMB-SLAM filter over the vector-type BP-SLAM filter.

Updated: 2024-04-04 12:59:14

标题: 集合型信念传播及其在泊松多伯努利SLAM中的应用

摘要: 信念传播(BP)是一种有用的概率推理算法,用于高效计算随机变量的近似边际概率密度。然而,在其标准形式中,BP仅适用于具有固定和已知向量元素数量的向量型随机变量,而某些应用依赖于具有未知向量元素数量的RFSs。在本文中,我们开发了在由具有未知元素数量的RFSs序列上定义的因子图的BP规则,旨在推导RFSs的新推理方法。此外,我们表明向量型BP是集合型BP的一个特例,其中每个RFS遵循伯努利过程。为了证明开发的集合型BP的有效性,我们将其应用于SLAM的PMB滤波器,这自然地导致了新的集合型BP-映射、SLAM、多目标跟踪和同时定位和跟踪滤波器。最后,我们探讨了向量型BP和建议的集合型BP PMB-SLAM实现之间的关系,并展示了建议的集合型BP PMB-SLAM滤波器相对于向量型BP-SLAM滤波器的性能增益。

更新时间: 2024-04-04 12:59:14

领域: cs.AI,eess.SP

下载: http://arxiv.org/abs/2305.04797v3

An Optimization Framework to Personalize Passive Cardiac Mechanics

Personalized cardiac mechanics modeling is a powerful tool for understanding the biomechanics of cardiac function in health and disease and assisting in treatment planning. However, current models are limited to using medical images acquired at a single cardiac phase, often limiting their applicability for processing dynamic image acquisitions. This study introduces an inverse finite element analysis (iFEA) framework to estimate the passive mechanical properties of cardiac tissue using time-dependent medical image data. The iFEA framework relies on a novel nested optimization scheme, in which the outer iterations utilize a traditional optimization method to best approximate material parameters that fit image data, while the inner iterations employ an augmented Sellier's algorithm to estimate the stress-free reference configuration. With a focus on characterizing the passive mechanical behavior, the framework employs structurally based anisotropic hyperelastic constitutive models and physiologically relevant boundary conditions to simulate myocardial mechanics. We use a stabilized variational multiscale formulation for solving the governing nonlinear elastodynamics equations, verified for cardiac mechanics applications. The framework is tested in myocardium models of biventricle and left atrium derived from cardiac phase-resolved computed tomographic (CT) images of a healthy subject and three patients with hypertrophic obstructive cardiomyopathy (HOCM). The impact of the choice of optimization methods and other numerical settings, including fiber direction parameters, mesh size, initial parameters for optimization, and perturbations to optimal material parameters, is assessed using a rigorous sensitivity analysis. The performance of the current iFEA is compared against an assumed power-law-based pressure-volume relation, typically used for single-phase image acquisition.

Updated: 2024-04-04 12:54:30

标题: 用于个性化被动心脏力学的优化框架

摘要: 个性化心脏力学建模是理解健康与疾病状态下心脏功能生物力学的强大工具,并可协助治疗规划。然而,当前模型仅限于使用在单一心动相位获取的医学影像,这通常限制了它们处理动态图像采集的适用性。本研究引入了一个逆有限元分析(iFEA)框架,使用随时间变化的医学影像数据估计心脏组织的被动力学特性。iFEA框架依赖于一种新颖的嵌套优化方案:外层迭代利用传统优化方法寻找最能拟合图像数据的材料参数,内层迭代则利用增强的Sellier算法估计无应力参考构型。该框架着重于表征被动力学行为,采用基于结构的各向异性超弹性本构模型和符合生理的边界条件来模拟心肌力学。我们使用稳定化的变分多尺度格式求解非线性弹性动力学控制方程,该方法已在心脏力学应用中得到验证。该框架在双心室和左心房心肌模型上进行了测试,这些模型来自一名健康受试者和三名肥厚性梗阻性心肌病(HOCM)患者的心动相位分辨计算机断层扫描(CT)图像。通过严格的敏感性分析,评估了优化方法选择和其他数值设置(包括纤维方向参数、网格尺寸、优化初始参数以及对最优材料参数的扰动)的影响。并将当前iFEA的性能与通常用于单相图像采集的假定幂律压力-容积关系进行了比较。

更新时间: 2024-04-04 12:54:30

领域: physics.med-ph,cs.AI

下载: http://arxiv.org/abs/2404.02807v2

Integrating Hyperparameter Search into GramML

Automated Machine Learning (AutoML) has become increasingly popular in recent years due to its ability to reduce the amount of time and expertise required to design and develop machine learning systems. This is very important for the practice of machine learning, as it allows building strong baselines quickly, improving the efficiency of the data scientists, and reducing the time to production. However, despite the advantages of AutoML, it faces several challenges, such as defining the solutions space and exploring it efficiently. Recently, some approaches have been shown to be able to do it using tree-based search algorithms and context-free grammars. In particular, GramML presents a model-free reinforcement learning approach that leverages pipeline configuration grammars and operates using Monte Carlo tree search. However, one of the limitations of GramML is that it uses default hyperparameters, limiting the search problem to finding optimal pipeline structures for the available data preprocessors and models. In this work, we propose an extension to GramML that supports larger search spaces including hyperparameter search. We evaluated the approach using an OpenML benchmark and found significant improvements compared to other state-of-the-art techniques.
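
As a toy illustration of the extension (not GramML's actual grammar, reward, or MCTS search), hyperparameter values can be folded into a context-free pipeline grammar as extra productions, so the same derivation search that picks components also picks their settings; random derivations stand in for tree search below:

    # Toy sketch: a pipeline grammar whose productions include hyperparameter
    # terminals, sampled by random derivation and scored with scikit-learn.
    import random
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler, MinMaxScaler
    from sklearn.ensemble import RandomForestClassifier

    GRAMMAR = {
        "pipeline": [["scaler", "model"]],
        "scaler":   [[StandardScaler], [MinMaxScaler]],
        "model":    [[RandomForestClassifier, "n_estimators", "max_depth"]],
        "n_estimators": [[10], [50], [100]],      # hyperparameter productions
        "max_depth":    [[2], [5], [None]],
    }

    def derive(symbol):
        """Randomly expand a nonterminal; terminals pass through unchanged."""
        if not isinstance(symbol, str):
            return symbol
        return [derive(s) for s in random.choice(GRAMMAR[symbol])]

    X, y = load_iris(return_X_y=True)
    best = None
    for _ in range(20):  # random search over derivations; GramML would use MCTS here
        (scaler,), (model, n_est, depth) = derive("pipeline")
        pipe = make_pipeline(scaler(), model(n_estimators=n_est[0], max_depth=depth[0]))
        score = cross_val_score(pipe, X, y, cv=3).mean()
        best = max(best or (score, pipe), (score, pipe), key=lambda t: t[0])
    print(best[0])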

Updated: 2024-04-04 12:54:13

标题: 将超参数搜索集成到GramML中

摘要: 自动机器学习(AutoML)近年来越来越受欢迎,因为它能够减少设计和开发机器学习系统所需的时间和专业知识。这对机器学习的实践非常重要,因为它能够快速建立强大的基线,提高数据科学家的效率,并缩短投入生产的时间。然而,尽管AutoML具有诸多优势,但它也面临一些挑战,比如定义解决方案空间和有效地探索它。最近,一些方法已经表明能够使用基于树的搜索算法和无上下文语法来做到这一点。特别是,GramML提出了一种基于模型无关的强化学习方法,利用管道配置语法,并使用蒙特卡洛树搜索进行操作。然而,GramML的一个限制是它使用默认超参数,限制了搜索问题仅限于为可用的数据预处理器和模型找到最佳的管道结构。在这项工作中,我们提出了对GramML的扩展,支持更大的搜索空间,包括超参数搜索。我们使用OpenML基准进行了评估,并发现与其他最先进技术相比有显著的改进。

更新时间: 2024-04-04 12:54:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03419v1

Permissible Knowledge Pooling

Information pooling has been extensively formalised across various logical frameworks in distributed systems, characterized by diverse information-sharing patterns. These approaches generally adopt an intersection perspective, aggregating all possible information, regardless of whether it is known or unknown to the agents. In contrast, this work adopts a unique stance, emphasising that sharing knowledge means distributing what is known, rather than what remains uncertain. This paper introduces a dynamic logic for knowledge pooling or sharing and further discusses a potential framework for permissible knowledge pooling.

Updated: 2024-04-04 12:51:28

标题: 可容许的知识汇聚

摘要: 信息汇集已在分布式系统的多种逻辑框架中被广泛形式化,其特征是多样化的信息共享模式。这些方法通常采用交集视角,聚合所有可能的信息,而不论这些信息对代理而言是已知还是未知。相比之下,本工作采取独特立场,强调共享知识意味着分发已知的内容,而非仍不确定的内容。本文引入了一种用于知识汇集或共享的动态逻辑,并进一步讨论了可容许知识汇集的一个潜在框架。

更新时间: 2024-04-04 12:51:28

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2404.03418v1

An Adaptive Hydropower Management Approach for Downstream Ecosystem Preservation

Hydropower plants play a pivotal role in advancing clean and sustainable energy production, contributing significantly to the global transition towards renewable energy sources. However, hydropower plants are currently perceived both positively as sources of renewable energy and negatively as disruptors of ecosystems. In this work, we highlight the overlooked potential of using hydropower plants as protectors of ecosystems by using adaptive ecological discharges. To advocate for this perspective, we propose using a neural network to predict the minimum ecological discharge value at each desired time. Additionally, we present a novel framework that seamlessly integrates it into hydropower management software, taking advantage of the well-established approach of using traditional constrained optimisation algorithms. This novel approach not only protects ecosystems from climate change but can also potentially increase electricity production.
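
A minimal sketch of the proposed coupling — a learned predictor supplies the time-varying minimum ecological discharge, which then enters a classical constrained optimisation as a lower bound — might look as follows; the features, synthetic targets, and toy revenue objective are placeholders, not the paper's:

    # Hedged sketch: predicted minimum ecological discharge as a constraint bound.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.random((200, 4))                       # hypothetical hydro/climate features
    y = 0.5 + X @ np.array([0.2, 0.1, 0.3, 0.05])  # synthetic min. ecological discharge
    predictor = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=0).fit(X, y)

    x_now = rng.random(4)
    q_min = float(predictor.predict(x_now[None])[0])   # adaptive lower bound
    q_max, price, head_coeff = 5.0, 40.0, 8.5          # placeholder plant constants

    # Maximise a toy revenue (negated for minimisation) subject to q_min <= q <= q_max.
    res = minimize(lambda q: -(price * head_coeff * q[0]),
                   x0=[q_min], bounds=[(q_min, q_max)])
    print(f"ecological minimum {q_min:.2f}, optimal discharge {res.x[0]:.2f}")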

Updated: 2024-04-04 12:47:28

标题: 一种适应性水电管理方法,用于下游生态系统保护

摘要: 水电站在推动清洁和可持续能源生产方面发挥着关键作用,对全球向可再生能源转型做出了重大贡献。然而,水电站目前被认为既是可再生能源的来源,又是生态系统的破坏者。在这项工作中,我们强调了利用自适应生态排放来将水电站作为生态系统的保护者的潜力。为了支持这一观点,我们提出使用神经网络来预测每个所需时间的最小生态排放值。此外,我们提出了一个新颖的框架,将其无缝集成到水电管理软件中,利用传统的受限优化算法。这种新颖方法不仅保护生态系统免受气候变化的影响,还有可能增加电力生产量。

更新时间: 2024-04-04 12:47:28

领域: cs.LG,cs.CE,math.OC,J.2; I.5.1; G.1.6

下载: http://arxiv.org/abs/2403.02821v2

Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method on the multi-hop extractive question answering (QA) benchmarks HotpotQA and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines regarding answer prediction accuracy. We also find that reinforcement learning helps the model to produce higher-quality rationales with improved QA performance.
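
Stripped to its inference-time skeleton, the framework is two chained generation calls: the small LM writes a rationale that is spliced into the prompt of the frozen large LM. A sketch with Hugging Face pipelines, where the model names are mere stand-ins and the paper's distillation and RL fine-tuning of the small LM are omitted:

    # Inference-time sketch of LM-guided CoT: rationale from a small LM,
    # answer from a frozen large LM conditioned on that rationale.
    from transformers import pipeline

    small_lm = pipeline("text-generation", model="gpt2")      # stand-in small LM
    large_lm = pipeline("text-generation", model="gpt2-xl")   # stand-in "large" LM

    question = "Who directed the film that won Best Picture in 1998?"

    rationale = small_lm(f"Question: {question}\nRationale:",
                         max_new_tokens=64, do_sample=False)[0]["generated_text"]

    answer = large_lm(f"{rationale}\nAnswer:",
                      max_new_tokens=16, do_sample=False)[0]["generated_text"]
    print(answer)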

Updated: 2024-04-04 12:46:37

标题: 小语言模型是否能帮助大语言模型更好地推理?:LM引导的思维链

摘要: 我们引入了一个新颖的框架,LM-Guided CoT,利用一个轻量级(即<1B)语言模型(LM)来指导黑盒大型(即>10B)LM进行推理任务。具体来说,轻量级LM首先为每个输入实例生成一个理由。然后,冻结的大型LM被提示根据轻量级LM生成的理由预测任务输出。我们的方法在资源利用上非常高效,因为它只需要训练轻量级LM。我们通过知识蒸馏和从基于理由和任务导向的奖励信号中进行强化学习来优化模型。我们通过多跳抽取式问答(QA)基准HotpotQA和2WikiMultiHopQA评估了我们的方法。实验结果显示,我们的方法在答案预测准确性方面优于所有基准。我们还发现,强化学习有助于模型生成质量更高的理由,并提高了QA性能。

更新时间: 2024-04-04 12:46:37

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03414v1

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates performance reproduction and fair comparison. Moreover, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks compared to open-source LLMs and MLLMs. (2) Llama2 and Qwen-VL-Chat are more robust compared to other open-source models. (3) The transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods. The dataset and code can be found at https://anonymous.4open.science/r/red_teaming_gpt4-C1CE/README.md.

Updated: 2024-04-04 12:38:14

标题: 红队对抗GPT-4V:GPT-4V是否能抵御单/多模态越狱攻击?

摘要: 各种越狱攻击方法已被提出,用于对大型语言模型(LLMs)进行红队测试,并揭示了LLMs脆弱的防护措施。此外,一些方法不仅限于文本模态,还通过扰动视觉输入将越狱攻击扩展到多模态大型语言模型(MLLMs)。然而,缺乏通用的评估基准使性能复现和公平比较变得复杂。此外,对于封闭源的最先进(SOTA)模型,尤其是GPT-4V等MLLMs,缺乏全面的评估。为解决这些问题,本研究首先构建了一个全面的越狱评估数据集,包含覆盖11种不同安全策略的1445个有害问题。基于该数据集,在11种不同的LLMs和MLLMs上进行了广泛的红队实验,包括SOTA专有模型和开源模型。然后,我们对评估结果进行了深入分析,发现(1)与开源LLMs和MLLMs相比,GPT-4和GPT-4V对越狱攻击表现出更好的鲁棒性;(2)与其他开源模型相比,Llama2和Qwen-VL-Chat更加稳健;(3)与文本越狱方法相比,视觉越狱方法的可迁移性相对有限。数据集和代码可在https://anonymous.4open.science/r/red_teaming_gpt4-C1CE/README.md找到。

更新时间: 2024-04-04 12:38:14

领域: cs.LG,cs.CL,cs.CR

下载: http://arxiv.org/abs/2404.03411v1

Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting

Seasonal forecasting is a crucial task when it comes to detecting the extreme heat and colds that occur due to climate change. Confidence in the predictions should be reliable since a small increase in the temperatures in a year has a big impact on the world. Calibration of the neural networks provides a way to ensure our confidence in the predictions. However, calibrating regression models is an under-researched topic, especially in forecasters. We calibrate a UNet++ based architecture, which was shown to outperform physics-based models in temperature anomalies. We show that with a slight trade-off between prediction error and calibration error, it is possible to get more reliable and sharper forecasts. We believe that calibration should be an important part of safety-critical machine learning applications such as weather forecasters.
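
For probabilistic regression outputs (a predictive mean and standard deviation per location, as a Bayesian UNet++ would provide), calibration can be checked by comparing nominal central-interval coverage with empirical coverage; a generic version of that check (not necessarily the paper's exact metric) is:

    # Sketch: reliability check for Gaussian regression outputs.
    # For each nominal level p, the fraction of targets inside the central
    # p-interval of N(mu, sigma^2) should be close to p when calibrated.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    y = rng.normal(0.0, 1.0, 10_000)          # targets (e.g. temperature anomalies)
    mu = y + rng.normal(0.0, 0.3, y.shape)    # predicted means
    sigma = np.full_like(y, 0.25)             # predicted stds (here: overconfident)

    levels = np.linspace(0.1, 0.9, 9)
    z = norm.ppf(0.5 + levels / 2)            # half-width multipliers of central intervals
    coverage = [(np.abs(y - mu) <= zi * sigma).mean() for zi in z]

    cal_error = np.mean(np.abs(np.array(coverage) - levels))  # a simple calibration error
    for p, c in zip(levels, coverage):
        print(f"nominal {p:.1f}  empirical {c:.3f}")
    print("mean calibration error:", round(cal_error, 3))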

Updated: 2024-04-04 12:35:33

标题: 为亚季节预测校准贝叶斯UNet++

摘要: 季节性预测在检测由气候变化引起的极端高温和寒冷时是至关重要的任务。对预测的信心应该是可靠的,因为一年中温度的小幅增加会对世界产生巨大影响。神经网络的校准提供了一种确保我们对预测的信心的方法。然而,校准回归模型是一个研究不足的话题,尤其是在预测者中。我们校准了基于UNet++的架构,该架构在温度异常方面表现优于基于物理的模型。我们展示了在预测误差和校准误差之间略微进行权衡,可以获得更可靠和更锐利的预测。我们相信校准应该是安全关键的机器学习应用的重要组成部分,例如天气预报员。

更新时间: 2024-04-04 12:35:33

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2403.16612v2

Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.

Updated: 2024-04-04 12:32:43

标题: 朝着使用贝叶斯自适应动量正则化实现稳健的持续学习

摘要: 追求长期自主性要求机器人代理必须不断适应其不断变化的环境并学会解决新任务。持续学习旨在克服灾难性遗忘的挑战,即学习解决新任务会导致模型忘记先前学习的信息。基于先验的持续学习方法对机器人应用具有吸引力,因为它们空间效率高,且计算复杂度通常不会随任务数量的增长而增加。尽管具有这些理想特性,基于先验的方法通常在重要基准测试中失败,因此与基于记忆的对应方法相比,其潜在应用受到限制。我们引入了一种名为贝叶斯自适应矩正则化(BAdam)的新型基于先验的方法,它能更好地约束参数增长,从而降低灾难性遗忘。我们的方法具有一系列适合机器人应用的理想特性,例如轻量且无需任务标签、收敛迅速,并提供对安全的真实世界部署至关重要的校准不确定性。结果显示,BAdam在具有挑战性的单头类增量实验(如Split MNIST和Split FashionMNIST)中实现了基于先验方法的最新性能,并且不依赖任务标签或离散的任务边界。

更新时间: 2024-04-04 12:32:43

领域: cs.LG

下载: http://arxiv.org/abs/2309.08546v2

Incorporating Recklessness to Collaborative Filtering based Recommender Systems

Recommender systems are intrinsically tied to a reliability/coverage dilemma: the more reliable we desire the forecasts, the more conservative the decisions will be and, thus, the fewer items will be recommended. This leads to a significant drop in the novelty of these systems, since instead of recommending uncertain unusual items, they focus on predicting items with guaranteed success. In this paper, we propose the inclusion of a new term in the learning process of matrix factorization-based recommender systems, called recklessness, which takes into account the variance of the output probability distribution of the predicted ratings. In this way, by gauging this recklessness measure we can force a spikier output distribution, enabling control of the risk level desired when making decisions about the reliability of a prediction. Experimental results demonstrate that recklessness not only allows for risk regulation but also improves the quantity and quality of predictions provided by the recommender system.
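
A hedged sketch of the idea (not the paper's exact formulation): if the factorization outputs a distribution over discrete rating values, the variance of that distribution can be added to the loss, and its weight acts as the risk knob — penalising variance, as below, forces the spikier output distributions the abstract describes:

    # Hedged sketch: matrix factorisation with a distribution over discrete
    # ratings, trained with cross-entropy plus a "recklessness" variance term.
    import torch

    n_users, n_items, n_ratings, k = 100, 80, 5, 16
    U = torch.randn(n_users, k, requires_grad=True)
    V = torch.randn(n_items, k, n_ratings, requires_grad=True)
    users = torch.randint(0, n_users, (500,))
    items = torch.randint(0, n_items, (500,))
    ratings = torch.randint(0, n_ratings, (500,))
    values = torch.arange(n_ratings, dtype=torch.float)
    lam = 0.1                                  # recklessness weight (risk knob)
    opt = torch.optim.Adam([U, V], lr=0.01)

    for _ in range(200):
        logits = torch.einsum("bk,bkr->br", U[users], V[items])
        probs = torch.softmax(logits, dim=-1)
        mean = (probs * values).sum(-1)
        var = (probs * (values - mean[:, None]) ** 2).sum(-1)  # output variance
        loss = torch.nn.functional.cross_entropy(logits, ratings) + lam * var.mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(float(var.mean()))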

Updated: 2024-04-04 12:26:03

标题: 将鲁莽性纳入基于协作过滤的推荐系统

摘要: 推荐系统与可靠性/覆盖率困境密切相关:我们希望预测更可靠,决策就会更保守,因此推荐的物品数量会减少。这导致这些系统的新颖性显著下降,因为它们不再推荐不确定的不寻常物品,而是专注于预测成功保证的物品。在本文中,我们提出在基于矩阵分解的推荐系统的学习过程中包含一个新术语,称为鲁莽度,它考虑了预测评分的输出概率分布的方差。通过衡量这种鲁莽度度量,我们可以强制更尖锐的输出分布,从而控制在对预测可靠性进行决策时所需的风险水平。实验结果表明,鲁莽度不仅可以实现风险调节,还可以改善推荐系统提供的预测的数量和质量。

更新时间: 2024-04-04 12:26:03

领域: cs.IR,cs.AI,cs.LG,stat.ML,Primary: 68T05, Secondary: 68T42, 62M20,I.2; I.5

下载: http://arxiv.org/abs/2308.02058v2

A replica analysis of Self-Training of Linear Classifier

Self-training (ST) is a simple and standard approach in semi-supervised learning that has been applied to many machine learning problems. Despite its widespread acceptance and practical effectiveness, it is still not well understood why and how ST improves performance by fitting the model to potentially erroneous pseudo-labels. To investigate the properties of ST, in this study, we derive and analyze a sharp characterization of the behavior of iterative ST when training a linear classifier by minimizing the ridge-regularized convex loss for binary Gaussian mixtures, in the asymptotic limit where input dimension and data size diverge proportionally. The derivation is based on the replica method of statistical mechanics. The result indicates that, when the total number of iterations is large, ST may find a classification plane with the optimal direction regardless of the label imbalance by accumulating small parameter updates over long iterations. It is argued that this is because the small update of ST can accumulate information of the data in an almost noiseless way. However, when a label imbalance is present in true labels, the performance of the ST is significantly lower than that of supervised learning with true labels, because the ratio between the norm of the weight and the magnitude of the bias can become significantly large. To overcome the problems in label imbalanced cases, several heuristics are introduced. By numerically analyzing the asymptotic formula, it is demonstrated that with the proposed heuristics, ST can find a classifier whose performance is nearly compatible with supervised learning using true labels even in the presence of significant label imbalance.
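
The analysed procedure itself is short and easy to state in code: fit a ridge-regularised linear classifier on the labelled set, then repeatedly pseudo-label the unlabelled pool and take small interpolation steps toward fitting those pseudo-labels (the replica computation is analytic and not shown). A plain numpy rendering on a binary Gaussian mixture:

    # Sketch of iterative self-training on a binary Gaussian mixture with a
    # ridge-regularised linear classifier and small interpolation steps.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_lab, n_unlab, lam, eta, T = 50, 40, 2000, 1.0, 0.05, 200
    mu = rng.normal(size=d) / np.sqrt(d)

    def sample(n):
        y = rng.choice([-1.0, 1.0], size=n)
        return y[:, None] * mu + rng.normal(size=(n, d)), y

    def ridge_fit(X, y, lam):
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    Xl, yl = sample(n_lab)
    Xu, _ = sample(n_unlab)
    w = ridge_fit(Xl, yl, lam)                 # initialise on the labelled set
    for _ in range(T):
        pseudo = np.sign(Xu @ w)               # pseudo-labels, possibly erroneous
        w = (1 - eta) * w + eta * ridge_fit(Xu, pseudo, lam)   # small ST update

    Xt, yt = sample(5000)
    print("test accuracy:", np.mean(np.sign(Xt @ w) == yt))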

Updated: 2024-04-04 12:14:30

标题: 线性分类器自训练的复本分析

摘要: 自训练(ST)是半监督学习中一种简单而标准的方法,已应用于许多机器学习问题。尽管它被广泛接受且实际有效,但人们仍不清楚ST为什么以及如何通过将模型拟合到潜在错误的伪标签来提升性能。为了研究ST的特性,在这项研究中,我们推导并分析了迭代ST行为的精确刻画:在输入维度与数据规模成比例发散的渐近极限下,通过最小化二元高斯混合的岭正则化凸损失来训练线性分类器。推导基于统计力学的复本方法。结果表明,当迭代总数很大时,ST可以通过在长迭代中积累小的参数更新,找到方向最优的分类平面,而不受标签不平衡的影响。这是因为ST的小幅更新能够以几乎无噪声的方式积累数据的信息。然而,当真实标签存在标签不平衡时,ST的性能显著低于使用真实标签的监督学习,因为权重范数与偏置幅度之比可能变得非常大。为了克服标签不平衡情况下的问题,我们引入了几种启发式方法。通过对渐近公式进行数值分析,证明了使用所提出的启发式方法,即使存在显著的标签不平衡,ST也能找到性能几乎与使用真实标签的监督学习相当的分类器。

更新时间: 2024-04-04 12:14:30

领域: stat.ML,cond-mat.dis-nn,cond-mat.stat-mech,cs.LG,math.ST,stat.TH

下载: http://arxiv.org/abs/2205.07739v2

Implementation of Entropically Secure Encryption: Securing Personal Health Data

Entropically Secure Encryption (ESE) offers unconditional security with shorter keys compared to the One-Time Pad. In this paper, we present the first implementation of ESE for bulk encryption. The main computational bottleneck for bulk ESE is a multiplication in a very large finite field. This involves multiplication of polynomials followed by modular reduction. We have implemented polynomial multiplication based on the gf2x library, with some modifications that avoid inputs of vastly different length, thus improving speed. Additionally, we have implemented a recently proposed efficient reduction algorithm that works for any polynomial degree. We investigate two use cases: X-ray images of patients and human genome data. We conduct entropy estimation using compression methods whose results determine the key lengths required for ESE. We report running times for all steps of the encryption. We discuss the potential of ESE to be used in conjunction with Quantum Key Distribution (QKD), in order to achieve full information-theoretic security of QKD-protected links for these use cases.
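
The bottleneck operation described — carry-less multiplication of binary polynomials followed by reduction modulo the field polynomial — is easy to state in Python, where arbitrary-precision integers stand in for GF(2) polynomials (bit i holds the coefficient of x^i); production implementations such as the gf2x-based one use word-level, subquadratic algorithms instead:

    # GF(2^m) arithmetic sketch: polynomials as Python ints (bit i = coeff of x^i).
    def clmul(a: int, b: int) -> int:
        """Carry-less (GF(2)[x]) multiplication: XOR shifted copies of a."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r

    def reduce_mod(p: int, mod: int) -> int:
        """Schoolbook reduction of p modulo the field polynomial mod."""
        dm = mod.bit_length() - 1
        while p.bit_length() - 1 >= dm:
            p ^= mod << (p.bit_length() - 1 - dm)
        return p

    # Example in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
    MOD = 0x11B
    print(hex(reduce_mod(clmul(0x57, 0x83), MOD)))  # 0xc1, the FIPS-197 test value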

Updated: 2024-04-04 12:07:33

标题: 实施熵安全加密:保护个人健康数据

摘要: 熵安全加密(Entropically Secure Encryption, ESE)相比一次性密码本,能以更短的密钥提供无条件安全性。本文介绍了ESE用于批量加密的首个实现。批量ESE的主要计算瓶颈是一个非常大的有限域中的乘法运算,包括多项式乘法及随后的模约简。我们基于gf2x库实现了多项式乘法,并做了一些修改以避免长度悬殊的输入,从而提高了速度。此外,我们还实现了一个最近提出的、适用于任意多项式次数的高效约简算法。我们研究了两种用例:病人的X射线图像和人类基因组数据。我们使用压缩方法进行熵估计,其结果决定了ESE所需的密钥长度。我们报告了加密各步骤的运行时间。我们还讨论了将ESE与量子密钥分发(QKD)结合使用的潜力,以便在这些用例中实现受QKD保护链路的完全信息论安全性。

更新时间: 2024-04-04 12:07:33

领域: cs.CR

下载: http://arxiv.org/abs/2404.16857v1

Proceedings 12th International Workshop on Theorem proving components for Educational software

The ThEdu series pursues the smooth transition from an intuitive way of doing mathematics at secondary school to a more formal approach to the subject in STEM education, while favouring software support for this transition by exploiting the power of theorem-proving technologies. What follows is a brief description of how the present volume contributes to this enterprise. The 12th International Workshop on Theorem Proving Components for Educational Software(ThEdu'23), was a satellite event of the 29th international Conference on Automated Deduction (CADE 2023), July 1-4, 2023, Rome, Italy. ThEdu'23 was very successful, with one invited talk, by Yves Bertot (Inria, France), "The challenges of using Type Theory to teach Mathematics", and seven regular contributions. An open call for papers was then issued, to which eight contributions were submitted. Seven submissions have been accepted by our reviewers, who jointly produced at least three careful reports on each of the contributions. The resulting revised papers are collected in the present volume. We, the volume editors, hope that this collection of papers will further promote the development of theorem-proving based software, and that it will allow to improve the mutual understanding between computer scientists, mathematicians and stakeholders in education. PC Chairs:Julien Narboux (University of Strasbourg, France); Walther Neuper (JKU, Johannes Kepler University, Linz, Austria); Pedro Quaresma (University of Coimbra, Portugal)

Updated: 2024-04-04 11:51:26

标题: 第12届国际工作坊论文集:用于教育软件的定理证明组件

摘要: ThEdu系列致力于在中学阶段从直觉性数学方法向STEM教育中更正式的学科方法的平稳过渡,同时通过利用定理证明技术的力量来支持这种过渡。接下来是对本卷如何促进这一目标的简要描述。 第12届国际教育软件定理证明组件研讨会(ThEdu'23)是第29届国际自动推理大会(CADE 2023)的一个卫星活动,于2023年7月1-4日在意大利罗马举行。ThEdu'23非常成功,有一次特邀演讲,由Yves Bertot(法国Inria)主讲“使用类型理论教授数学的挑战”,以及七篇常规投稿。随后发布了论文征集通知,共收到八篇投稿。审稿人员共同对每篇投稿至少进行了三份细致的报告,最终接受了七篇投稿。修改后的论文收集在本卷中。 我们,本卷编辑,希望这些论文的集合将进一步促进基于定理证明的软件的发展,并且能够改善计算机科学家、数学家和教育相关利益相关者之间的相互理解。 PC主席:Julien Narboux(法国斯特拉斯堡大学);Walther Neuper(奥地利琳茨约翰内斯·开普勒大学);Pedro Quaresma(葡萄牙科英布拉大学)

更新时间: 2024-04-04 11:51:26

领域: cs.LO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03709v1

Using construction waste hauling trucks' GPS data to classify earthwork-related locations: A Chengdu case study

Earthwork-related locations (ERLs), such as construction sites, earth dumping grounds, and concrete mixing stations, are major sources of urban dust pollution (particulate matter). The effective management of ERLs is crucial and requires timely and efficient tracking of these locations throughout the city. This work aims to identify and classify urban ERLs using GPS trajectory data of over 16,000 construction waste hauling trucks (CWHTs), as well as 58 urban features encompassing geographic, land cover, POI and transport dimensions. We compare several machine learning models and examine the impact of various spatial-temporal features on classification performance using real-world data in Chengdu, China. The results demonstrate that 77.8% classification accuracy can be achieved with a limited number of features. This classification framework was implemented in the Alpha MAPS system in Chengdu, which successfully identified 724 construction sites/earth dumping grounds, 48 concrete mixing stations, and 80 truck parking locations in the city during December 2023, enabling the local authority to effectively manage urban dust pollution at low personnel costs.
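
The classification stage itself is standard supervised learning over the 58 engineered features; a generic stand-in for that stage (synthetic features and labels, one of the commonly compared model families) is simply:

    # Generic sketch of the ERL classification stage: 58 engineered features,
    # three classes (construction site / earth dumping ground, concrete mixing
    # station, truck parking), evaluated by held-out accuracy.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.random((1000, 58))     # placeholder geographic/land-cover/POI/transport features
    y = rng.integers(0, 3, 1000)   # placeholder ERL class labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))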

Updated: 2024-04-04 11:41:04

标题: 利用建筑废弃物运输卡车的GPS数据对土方工程相关位置进行分类:以成都为例研究

摘要: 土方工程相关位置(ERLs),如施工现场、土方倾倒场和混凝土搅拌站,是城市扬尘污染(颗粒物)的主要来源。有效管理ERLs至关重要,需要及时、高效地跟踪全市范围内的这些位置。本研究旨在利用超过16,000辆建筑垃圾运输卡车(CWHTs)的GPS轨迹数据,以及涵盖地理、土地覆盖、POI和交通维度的58个城市特征,识别和分类城市ERLs。我们比较了几种机器学习模型,并使用中国成都的真实数据检验了各种时空特征对分类性能的影响。结果表明,仅用有限数量的特征即可达到77.8%的分类准确率。该分类框架已在成都的Alpha MAPS系统中落地,于2023年12月在全市成功识别了724个施工现场/土方倾倒场、48个混凝土搅拌站和80个卡车停放位置,使地方政府能够以较低的人力成本有效管理城市扬尘污染。

更新时间: 2024-04-04 11:41:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.14698v3

Heckler: Breaking Confidential VMs with Malicious Interrupts

Hardware-based Trusted execution environments (TEEs) offer an isolation granularity of virtual machine abstraction. They provide confidential VMs (CVMs) that host security-sensitive code and data. AMD SEV-SNP and Intel TDX enable CVMs and are now available on popular cloud platforms. The untrusted hypervisor in these settings is in control of several resource management and configuration tasks, including interrupts. We present Heckler, a new attack wherein the hypervisor injects malicious non-timer interrupts to break the confidentiality and integrity of CVMs. Our insight is to use the interrupt handlers that have global effects, such that we can manipulate a CVM's register states to change the data and control flow. With AMD SEV-SNP and Intel TDX, we demonstrate Heckler on OpenSSH and sudo to bypass authentication. On AMD SEV-SNP we break execution integrity of C, Java, and Julia applications that perform statistical and text analysis. We explain the gaps in current defenses and outline guidelines for future defenses.

Updated: 2024-04-04 11:37:59

标题: Heckler:利用恶意中断破坏机密虚拟机

摘要: 基于硬件的可信执行环境(TEEs)提供了虚拟机抽象粒度的隔离。它们提供承载安全敏感代码和数据的保密虚拟机(CVMs)。AMD SEV-SNP和Intel TDX支持CVMs,并且已在主流云平台上可用。在这些设置中,不受信任的虚拟机监控程序(hypervisor)控制着多项资源管理和配置任务,其中包括中断。我们提出了一种名为Heckler的新攻击:虚拟机监控程序注入恶意的非定时器中断,以破坏CVMs的保密性和完整性。我们的核心思路是利用具有全局影响的中断处理程序,从而操纵CVM的寄存器状态,改变其数据流和控制流。利用AMD SEV-SNP和Intel TDX,我们在OpenSSH和sudo上演示了Heckler绕过身份验证。在AMD SEV-SNP上,我们破坏了执行统计和文本分析的C、Java和Julia应用程序的执行完整性。我们分析了当前防御措施中的差距,并概述了未来防御的指导方针。

更新时间: 2024-04-04 11:37:59

领域: cs.CR

下载: http://arxiv.org/abs/2404.03387v1

SENSOR: Imitate Third-Person Expert's Behaviors via Active Sensoring

In many real-world visual Imitation Learning (IL) scenarios, there is a misalignment between the agent's and the expert's perspectives, which might lead to the failure of imitation. Previous methods have generally solved this problem by domain alignment, which incurs extra computation and storage costs, and these methods fail to handle the hard cases where the viewpoint gap is too large. To alleviate the above problems, we introduce active sensoring in the visual IL setting and propose a model-based SENSory imitatOR (SENSOR) to automatically change the agent's perspective to match the expert's. SENSOR jointly learns a world model to capture the dynamics of latent states, a sensor policy to control the camera, and a motor policy to control the agent. Experiments on visual locomotion tasks show that SENSOR can efficiently simulate the expert's perspective and strategy, and outperforms most baseline methods.

Updated: 2024-04-04 11:37:55

标题: 传感器:通过主动传感器模仿第三人专家的行为

摘要: 在许多现实世界的视觉模仿学习(IL)场景中,代理和专家的视角存在不一致,这可能导致模仿失败。先前的方法通常通过领域对齐来解决这个问题,这会产生额外的计算和存储成本,并且这些方法无法处理视角差距过大的“难题”。为了缓解上述问题,我们在视觉IL设置中引入了主动传感器,并提出了一种基于模型的传感器模仿器(SENSOR),以自动改变代理的视角以匹配专家。SENSOR联合学习了一个世界模型来捕捉潜在状态的动态,一个传感器策略来控制摄像头,以及一个运动策略来控制代理。在视觉运动任务上的实验表明,SENSOR能够有效模拟专家的视角和策略,并且胜过大多数基线方法。

更新时间: 2024-04-04 11:37:55

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03386v1

REST: Retrieval-Based Speculative Decoding

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. This method draws from the reservoir of existing knowledge, retrieving and employing relevant tokens based on the current context. Its plug-and-play nature allows for seamless integration and acceleration of any language models, all without necessitating additional training. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation. The code of REST is available at https://github.com/FasterDecoding/REST.
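
The retrieval idea can be demonstrated at token level with a plain suffix lookup: match the longest suffix of the current context in a datastore, propose the continuation found there as draft tokens, and keep only the prefix the target model itself would have generated. In the toy sketch below, a bigram table stands in for the target LLM and a word list for the suffix index; real REST verifies all draft tokens in one batched forward pass and also keeps the first disagreeing token from that pass:

    # Toy retrieval-based speculative decoding: propose draft tokens by suffix
    # match in a datastore, then keep the prefix the target model agrees with.
    datastore = "the cat sat on the mat and the dog sat on the rug".split()

    # Stub "target model": greedy bigram preferences (a real LLM forward pass here).
    bigram = {"the": "cat", "cat": "sat", "sat": "on", "on": "the", "mat": "and",
              "and": "the", "dog": "sat", "rug": "<eos>"}
    target_next = lambda ctx: bigram.get(ctx[-1], "<eos>")

    def retrieve_draft(ctx, k=3):
        """Longest-suffix match of ctx inside the datastore; propose next k tokens."""
        for n in range(min(3, len(ctx)), 0, -1):
            for i in range(len(datastore) - n + 1):
                if datastore[i:i + n] == ctx[-n:]:
                    return datastore[i + n:i + n + k]
        return []

    ctx = ["the"]
    while ctx[-1] != "<eos>" and len(ctx) < 12:
        draft = retrieve_draft(ctx)
        accepted = 0
        for tok in draft:                       # verify draft tokens left to right
            if tok != target_next(ctx):
                break
            ctx.append(tok); accepted += 1
        if accepted < len(draft) or not draft:  # on mismatch/empty draft, fall back
            ctx.append(target_next(ctx))
    print(" ".join(ctx))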

Updated: 2024-04-04 11:37:01

标题: REST: 基于检索的推测解码

摘要: 我们介绍了一种名为基于检索的推测解码(REST)的新算法,旨在加速语言模型生成。推动REST开发的关键见解是:文本生成过程通常包含某些常见的阶段和模式。与先前依赖草稿语言模型进行推测解码的方法不同,REST利用检索的力量生成草稿标记。该方法利用现有知识库,根据当前上下文检索并使用相关标记。其即插即用的特性允许无缝集成并加速任何语言模型,而无需额外训练。在单批次设置中对7B和13B语言模型进行基准测试时,REST在代码或文本生成上实现了1.62倍至2.36倍的显著加速。REST的代码可在https://github.com/FasterDecoding/REST获取。

更新时间: 2024-04-04 11:37:01

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2311.08252v2

Deep Learning in Cardiology

The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or for creating insights using big data. Deep learning has emerged as a more accurate and effective technology in a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use.

Updated: 2024-04-04 11:34:52

标题: 在心脏病学中的深度学习

摘要: 医学领域正在产生大量数据,医生们无法有效地解读和利用这些数据。此外,基于规则的专家系统在解决复杂的医学任务或利用大数据进行洞察方面效率低下。深度学习已经成为一种更准确和有效的技术,可以应用于诊断、预测和干预等各种医学问题。深度学习是一种表示学习方法,由多层组成,可以非线性地转换数据,从而揭示层次关系和结构。在本综述中,我们调查了利用心脏病学中的结构化数据、信号和成像模式的深度学习应用论文。我们讨论了在心脏病学中应用深度学习的优势和局限性,这些也适用于医学的其他领域,并提出了一些在临床应用中最可行的方向。

更新时间: 2024-04-04 11:34:52

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/1902.11122v5

DIDA: Denoised Imitation Learning based on Domain Adaptation

Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise that often occurs during the processes of data collection or transmission. Previous IL methods improve the robustness of learned policies by injecting an adversarially learned Gaussian noise into pure expert data or utilizing additional ranking information, but they may fail in the LND setting. To alleviate the above problems, we propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data, facilitating a feature encoder to learn task-related but domain-agnostic representations. Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.

Updated: 2024-04-04 11:29:05

标题: DIDA:基于领域自适应的去噪模仿学习

摘要: 从低质量数据集(例如次优示范和带有干扰因素的观测)中模仿技能,在现实世界的应用中很常见。在这项工作中,我们专注于从嘈杂示范中学习(LND)的问题,即模仿者需要从数据收集或传输过程中常常产生噪声的数据中学习。先前的IL方法通过向纯专家数据注入对抗学习得到的高斯噪声,或利用额外的排序信息,来提高所学策略的鲁棒性,但它们在LND设定下可能失效。为缓解上述问题,我们提出了基于领域自适应的去噪模仿学习(DIDA),设计两个判别器分别区分数据的噪声水平和专业水平,促使特征编码器学习与任务相关但与领域无关的表示。在MuJoCo上的实验结果表明,DIDA能够成功处理来自含有各类噪声的示范的具有挑战性的模仿任务,优于大多数基线方法。

更新时间: 2024-04-04 11:29:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03382v1

Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes

Industrial production processes, especially in the pharmaceutical industry, are complex systems that require continuous monitoring to ensure efficiency, product quality, and safety. This paper presents a hybrid unsupervised learning strategy (HULS) for monitoring complex industrial processes. Addressing the limitations of traditional Self-Organizing Maps (SOMs), especially in scenarios with unbalanced data sets and highly correlated process variables, HULS combines existing unsupervised learning techniques to address these challenges. To evaluate the performance of the HULS concept, comparative experiments are performed based on a laboratory batch process.

Updated: 2024-04-04 11:26:58

标题: 工业批处理过程监测的混合无监督学习策略

摘要: 工业生产过程,尤其是制药行业中的生产过程,是需要持续监测以确保效率、产品质量和安全性的复杂系统。本文提出了一种用于监测复杂工业过程的混合无监督学习策略(HULS)。针对传统自组织映射(SOMs)的局限性,尤其是在数据集不平衡和过程变量高度相关的情形下,HULS结合了现有的无监督学习技术来应对这些挑战。为评估HULS概念的性能,我们基于一个实验室批处理过程开展了比较实验。

更新时间: 2024-04-04 11:26:58

领域: cs.LG,cs.SY,eess.SP,eess.SY

下载: http://arxiv.org/abs/2403.13032v2

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers

Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interaction via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the exploration of extending graph transformers to higher-order variants is just starting. Both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and sparse variants. We first show that, an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler Lehman ($k$-WL) test despite its high computational cost. We then explore strategies to both sparsify and enhance the higher-order graph transformers, aiming to improve both their efficiency and expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about input graph structures. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient, but also expressive -- as expressive as $k$-WL test. We further study several other sparse graph attention models that are computationally efficient and provide their expressiveness analysis. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.

Updated: 2024-04-04 11:26:51

标题: 关于高阶图变换器的理论表达能力和设计空间

摘要: 图形转换器最近在图学习中受到了重视,部分原因是它们通过自注意力能够捕捉更多全局交互。然而,尽管高阶图神经网络已经得到了相当好的研究,但将图形转换器扩展到高阶变体的探索才刚刚开始。理论理解和实证结果都有限。在本文中,我们系统地研究了高阶图形转换器和稀疏变体的理论表达能力。我们首先证明,没有额外结构信息的高阶图形转换器在表达能力上不如$k$-Weisfeiler Lehman ($k$-WL)测试,尽管其计算成本很高。然后我们探索了稀疏化和增强高阶图形转换器的策略,旨在提高它们的效率和表达能力。事实上,基于邻域信息的稀疏化可以增强表达能力,因为它提供了有关输入图结构的额外信息。特别地,我们展示了一个自然的基于邻域的稀疏高阶图形转换器模型不仅计算效率高,而且表达能力强 -- 和$k$-WL测试一样强。我们进一步研究了几种其他计算效率高并提供了表达能力分析的稀疏图注意力模型。最后,我们提供实验结果展示了不同稀疏化策略的有效性。

更新时间: 2024-04-04 11:26:51

领域: cs.LG,cs.CG,math.GN

下载: http://arxiv.org/abs/2404.03380v1

Identifying Climate Targets in National Laws and Policies using Machine Learning

Quantified policy targets are a fundamental element of climate policy, typically characterised by domain-specific and technical language. Current methods for curating comprehensive views of global climate policy targets entail significant manual effort. At present there are few scalable methods for extracting climate targets from national laws or policies, which limits policymakers' and researchers' ability to (1) assess private and public sector alignment with global goals and (2) inform policy decisions. In this paper we present an approach for extracting mentions of climate targets from national laws and policies. We create an expert-annotated dataset identifying three categories of target ('Net Zero', 'Reduction' and 'Other' (e.g. renewable energy targets)) and train a classifier to reliably identify them in text. We investigate bias and equity impacts related to our model and identify specific years and country names as problematic features. Finally, we investigate the characteristics of the dataset produced by running this classifier on the Climate Policy Radar (CPR) dataset of global national climate laws and policies and UNFCCC submissions, highlighting the potential of automated and scalable data collection for existing climate policy databases and supporting further research. Our work represents a significant upgrade in the accessibility of these key climate policy elements for policymakers and researchers. We publish our model at https://huggingface.co/ClimatePolicyRadar/national-climate-targets and related dataset at https://huggingface.co/datasets/ClimatePolicyRadar/national-climate-targets.
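
Since the model checkpoint is published, trying the classifier should amount to a few lines; the snippet below assumes (without having verified the model card) that the checkpoint is a standard Transformers sequence-classification model whose labels follow the paper's target categories:

    # Hedged usage sketch: assumes the published checkpoint exposes the standard
    # Transformers text-classification interface (labels per the model card).
    from transformers import pipeline

    clf = pipeline("text-classification",
                   model="ClimatePolicyRadar/national-climate-targets",
                   top_k=None)  # return scores for all target categories

    text = ("The country commits to reducing greenhouse gas emissions by 55% "
            "by 2030 compared to 1990 levels, and to net zero by 2050.")
    for item in clf(text):
        print(item["label"], round(item["score"], 3))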

Updated: 2024-04-04 11:23:59

标题: 利用机器学习在国家法律和政策中确定气候目标

摘要: 量化的政策目标是气候政策的基本要素,通常以特定领域和技术语言为特征。目前,策划全面了解全球气候政策目标的方法需要大量的人工工作。目前很少有可扩展的方法来从国家法律或政策中提取气候目标,这限制了决策者和研究人员评估私营和公共部门与全球目标的一致性以及为政策决策提供信息的能力。在本文中,我们提出了一种从国家法律和政策中提取气候目标提及的方法。我们创建了一个专家注释的数据集,识别三类目标(“零排放”,“减少”和“其他”(例如可再生能源目标)),并训练一个分类器可靠地识别它们在文本中。我们调查了与我们的模型相关的偏见和公平影响,并确定特定年份和国家名称作为问题特征。最后,我们通过在全球国家气候法律和政策以及联合国气候变化框架公约提交的Climate Policy Radar(CPR)数据集上运行该分类器产生的数据集的特征,突显了自动化和可扩展数据收集对现有气候政策数据库的潜力以及支持进一步研究的重要性。我们的工作显著提升了这些关键气候政策要素对决策者和研究人员的可访问性。我们在 https://huggingface.co/ClimatePolicyRadar/national-climate-targets 发布了我们的模型,相关数据集在 https://huggingface.co/datasets/ClimatePolicyRadar/national-climate-targets。

更新时间: 2024-04-04 11:23:59

领域: cs.CY,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.02822v2

Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning

Artificial neural networks (ANNs) are at the core of most Deep learning (DL) algorithms that successfully tackle complex problems like image recognition, autonomous driving, and natural language processing. However, unlike biological brains, which tackle similar problems in a very efficient manner, DL algorithms require a large number of trainable parameters, making them energy-intensive and prone to overfitting. Here, we show that a new ANN architecture that incorporates the structured connectivity and restricted sampling properties of biological dendrites counteracts these limitations. We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks while using significantly fewer trainable parameters. This is achieved through the adoption of a different learning strategy, whereby most of the nodes respond to several classes, unlike classical ANNs that strive for class-specificity. These findings suggest that the incorporation of dendrites can make learning in ANNs precise, resilient, and parameter-efficient and shed new light on how biological features can impact the learning strategies of ANNs.

Updated: 2024-04-04 11:22:58

标题: 树突赋予人工神经网络准确、稳健和参数高效的学习能力

摘要: 人工神经网络(ANNs)是大多数成功解决复杂问题的深度学习(DL)算法的核心,如图像识别、自动驾驶和自然语言处理。然而,与生物大脑不同,生物大脑以一种非常高效的方式解决类似问题,DL算法需要大量可训练参数,使其耗能大且容易过拟合。在这里,我们展示了一种新的ANN架构,结合了生物树突的结构连接性和受限采样属性,以抵消这些限制。我们发现,树突ANNs对过拟合更具鲁棒性,在几个图像分类任务上表现优于传统ANNs,同时使用明显更少的可训练参数。这是通过采用不同的学习策略实现的,其中大多数节点对多个类别做出响应,而不像传统ANNs那样追求类别特异性。这些发现表明,树突的融入可以使ANNs中的学习变得精确、弹性和参数高效,并揭示了生物特征如何影响ANNs的学习策略。

更新时间: 2024-04-04 11:22:58

领域: cs.NE,cs.LG,q-bio.NC

下载: http://arxiv.org/abs/2404.03708v1

Better-than-KL PAC-Bayes Bounds

Let $f(\theta, X_1),$ $ \dots,$ $ f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data-dependent posterior distribution $P_n$. In this paper, we consider the problem of proving concentration inequalities to estimate the mean of the sequence. An example of such a problem is the estimation of the generalization error of some predictor trained by a stochastic algorithm, such as a neural network where $f$ is a loss function. Classically, this problem is approached through a PAC-Bayes analysis where, in addition to the posterior, we choose a prior distribution which captures our belief about the inductive bias of the learning problem. Then, the key quantity in PAC-Bayes concentration bounds is a divergence that captures the complexity of the learning problem where the de facto standard choice is the KL divergence. However, the tightness of this choice has rarely been questioned. In this paper, we challenge the tightness of the KL-divergence-based bounds by showing that it is possible to achieve a strictly tighter bound. In particular, we demonstrate new high-probability PAC-Bayes bounds with a novel and better-than-KL divergence that is inspired by Zhang et al. (2022). Our proof is inspired by recent advances in regret analysis of gambling algorithms, and its use to derive concentration inequalities. Our result is first-of-its-kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL. Thus, we believe our work marks the first step towards identifying optimal rates of PAC-Bayes bounds.
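
For context, the KL-based bound the paper sets out to beat has, in one classical (McAllester/Maurer-style) form for a $[0,1]$-bounded loss: with probability at least $1-\delta$, simultaneously for all posteriors $\rho$ and a data-independent prior $\pi$,

$$\mathbb{E}_{\theta\sim\rho}[L(\theta)] \;\le\; \mathbb{E}_{\theta\sim\rho}[\hat{L}_n(\theta)] \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi)+\ln(2\sqrt{n}/\delta)}{2n}},$$

where $L$ and $\hat{L}_n$ are the population and empirical losses. The paper's divergence replaces $\mathrm{KL}(\rho\,\|\,\pi)$ in bounds of this shape with a strictly tighter quantity.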

Updated: 2024-04-04 11:22:20

标题: 优于KL的PAC-Bayes界限

摘要: 设$f(\theta, X_1),$ $\dots,$ $f(\theta, X_n)$为一列随机元素,其中$f$是一个固定的标量函数,$X_1, \dots, X_n$是独立随机变量(数据),$\theta$是服从某个依赖于数据的后验分布$P_n$的随机参数。在本文中,我们考虑证明集中不等式以估计该序列均值的问题。此类问题的一个例子是估计由随机算法训练的某个预测器(例如神经网络)的泛化误差,此时$f$是损失函数。经典地,这个问题通过PAC-Bayes分析来处理:除后验分布外,我们还选择一个先验分布,用以刻画我们对学习问题归纳偏置的信念。于是,PAC-Bayes集中界中的关键量是一个刻画学习问题复杂性的散度,其事实上的标准选择是KL散度。然而,这一选择的紧致性很少受到质疑。在本文中,我们通过证明可以获得严格更紧的界,挑战了基于KL散度的界的紧致性。特别地,我们展示了一类新的高概率PAC-Bayes界,其采用受Zhang等人(2022)启发的、优于KL的新型散度。我们的证明受到博弈(赌博)算法遗憾分析的最新进展及其在推导集中不等式方面应用的启发。我们的结果是同类中的首例:现有采用非KL散度的PAC-Bayes界并不知道能严格优于KL。因此,我们相信这项工作迈出了确定PAC-Bayes界最优速率的第一步。

更新时间: 2024-04-04 11:22:20

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.09201v2

Elementary Analysis of Policy Gradient Methods

Projected policy gradient under the simplex parameterization, policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There have been a flurry of recent activities in studying these algorithms from the theoretical aspect. Despite this, their convergence behavior is still not fully understood, even given the access to exact policy evaluations. In this paper, we focus on the discounted MDP setting and conduct a systematic study of the aforementioned policy optimization methods. Several novel results are presented, including 1) global linear convergence of projected policy gradient for any constant step size, 2) sublinear convergence of softmax policy gradient for any constant step size, 3) global linear convergence of softmax natural policy gradient for any constant step size, 4) global linear convergence of entropy regularized softmax policy gradient for a wider range of constant step sizes than existing result, 5) tight local linear convergence rate of entropy regularized natural policy gradient, and 6) a new and concise local quadratic convergence rate of soft policy iteration without the assumption on the stationary distribution under the optimal policy. New and elementary analysis techniques have been developed to establish these results.
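
As a concrete reference for the softmax results, the object being analysed is plain gradient ascent on $V^{\pi_\theta}(\rho)$ with a tabular softmax policy, for which the gradient has the closed form $\frac{1}{1-\gamma}\, d_\rho^{\pi}(s)\, \pi_\theta(a|s)\, A^{\pi}(s,a)$. A compact exact-gradient run on a random MDP (constant step size, exact policy evaluation, no sampling):

    # Exact softmax policy gradient on a small random MDP (no sampling).
    import numpy as np

    rng = np.random.default_rng(0)
    S, A, gamma, eta = 4, 3, 0.9, 1.0
    P = rng.dirichlet(np.ones(S), size=(S, A))    # transition tensor P[s, a, s']
    R = rng.random((S, A))
    rho = np.full(S, 1 / S)                        # initial state distribution
    theta = np.zeros((S, A))

    for _ in range(2000):
        pi = np.exp(theta); pi /= pi.sum(1, keepdims=True)
        Ppi = np.einsum("sa,sat->st", pi, P)
        rpi = (pi * R).sum(1)
        V = np.linalg.solve(np.eye(S) - gamma * Ppi, rpi)          # exact evaluation
        Q = R + gamma * P @ V
        d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * Ppi.T, rho)
        theta += eta / (1 - gamma) * d[:, None] * pi * (Q - V[:, None])

    print("V(rho) =", rho @ V, " greedy gap =", (Q.max(1) - (pi * Q).sum(1)).max())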

Updated: 2024-04-04 11:16:16

标题: 策略梯度方法的基础分析

摘要: 单纯形参数化下的投影策略梯度,以及softmax参数化下的策略梯度和自然策略梯度,是强化学习中的基本算法。最近从理论角度研究这些算法的工作层出不穷。尽管如此,即使可以获得精确的策略评估,它们的收敛行为仍未被完全理解。在本文中,我们聚焦折扣MDP设定,对上述策略优化方法进行系统研究。我们给出了若干新结果,包括:1)对于任意常数步长,投影策略梯度的全局线性收敛;2)对于任意常数步长,softmax策略梯度的次线性收敛;3)对于任意常数步长,softmax自然策略梯度的全局线性收敛;4)相比已有结果,熵正则化softmax策略梯度在更宽的常数步长范围内的全局线性收敛;5)熵正则化自然策略梯度的紧致局部线性收敛率;以及6)在不假设最优策略下平稳分布的情况下,软策略迭代的一个新的、简洁的局部二次收敛率。为建立这些结果,我们发展了新的初等分析技术。

更新时间: 2024-04-04 11:16:16

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2404.03372v1

Graph Neural Networks for Electric and Hydraulic Data Fusion to Enhance Short-term Forecasting of Pumped-storage Hydroelectricity

Pumped-storage hydropower plants (PSH) actively participate in grid power-frequency control and therefore often operate under dynamic conditions, which results in rapidly varying system states. Predicting these dynamically changing states is essential for comprehending the underlying sensor and machine conditions. This understanding aids in detecting anomalies and faults, ensuring the reliable operation of the connected power grid, and in identifying faulty and miscalibrated sensors. PSH are complex, highly interconnected systems encompassing electrical and hydraulic subsystems, each characterized by their respective underlying networks that can individually be represented as graphs. To take advantage of this relational inductive bias, graph neural networks (GNNs) have been separately applied to state forecasting tasks in the individual subsystems, but without considering their interdependencies. In PSH, however, these subsystems depend on the same control input, making their operations highly interdependent and interconnected. Consequently, hydraulic and electrical sensor data should be fused across PSH subsystems to improve state forecasting accuracy. This approach has not been explored in GNN literature yet because many available PSH graphs are limited to their respective subsystem boundaries, which makes the method unsuitable to be applied directly. In this work, we introduce the application of spectral-temporal graph neural networks, which leverage self-attention mechanisms to concurrently capture and learn meaningful subsystem interdependencies and the dynamic patterns observed in electric and hydraulic sensors. Our method effectively fuses data from the PSH's subsystems by operating on a unified, system-wide graph learned directly from the data. This approach leads to demonstrably improved state forecasting performance and enhanced generalizability.

Updated: 2024-04-04 11:09:49

标题: 图神经网络用于电力和水力数据融合,以提高抽水蓄能水电短期预测效果

摘要: 抽水蓄能电站(PSH)积极参与电网功率频率控制,因此通常在动态条件下运行,导致系统状态迅速变化。预测这些动态变化的状态对于理解底层传感器和设备状态至关重要。这种理解有助于检测异常和故障,确保连接的电网可靠运行,并识别故障和校准不准确的传感器。PSH是复杂的、高度相互连接的系统,包括电气和液压子系统,每个子系统都有其相应的底层网络,可以单独表示为图。为了利用这种关系归纳偏差,图神经网络(GNNs)已被分别应用于各个子系统中的状态预测任务,但没有考虑它们之间的相互依赖关系。然而,在PSH中,这些子系统依赖于同一控制输入,使它们的运行高度相互依赖和相互连接。因此,应该跨PSH子系统融合液压和电气传感器数据以提高状态预测的准确性。这种方法尚未在GNN文献中探讨,因为许多可用的PSH图限于其各自的子系统边界,这使得该方法无法直接应用。在这项工作中,我们引入了谱-时间图神经网络的应用,利用自注意机制同时捕捉和学习有意义的子系统相互依赖关系和电气和液压传感器中观察到的动态模式。我们的方法通过在直接从数据中学习的统一、系统范围的图上运行,有效地融合了PSH子系统的数据。这种方法导致了明显改进的状态预测性能和增强的泛化能力。

更新时间: 2024-04-04 11:09:49

领域: cs.LG,cs.SY,eess.SP,eess.SY

下载: http://arxiv.org/abs/2404.03368v1
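
To make the fusion idea concrete, below is a sketch of an assumed, much-simplified architecture (plain temporal embeddings plus one self-attention step, rather than the paper's spectral-temporal design): sensors from both subsystems share one attention layer, so a system-wide adjacency is learned directly from the data.

```python
import torch
import torch.nn as nn

class FusedGraphForecaster(nn.Module):
    def __init__(self, n_sensors, window, hidden=32):
        super().__init__()
        self.embed = nn.Linear(window, hidden)     # per-sensor temporal embedding
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)            # one-step-ahead forecast

    def forward(self, x):                          # x: (batch, n_sensors, window)
        h = torch.relu(self.embed(x))
        attn = torch.softmax(                      # learned, system-wide adjacency
            self.q(h) @ self.k(h).transpose(1, 2) / h.shape[-1] ** 0.5, dim=-1)
        h = attn @ h                               # message passing over learned graph
        return self.out(h).squeeze(-1)             # (batch, n_sensors)

# Hypothetical usage: 12 electric + 8 hydraulic sensors, 64-step history.
model = FusedGraphForecaster(n_sensors=20, window=64)
x = torch.randn(4, 20, 64)
print(model(x).shape)   # torch.Size([4, 20])
```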

Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis

Recent AI research charts a promising future for automated chemical reactions within the chemistry community. This study proposes Chemist-X, a transformative AI agent that automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology. To emulate expert chemists' strategies when solving RCR tasks, Chemist-X utilizes advanced RAG schemes to interrogate online molecular databases and distill critical data from the latest literature database. Further, the agent leverages state-of-the-art computer-aided design (CAD) tools with a large language model (LLM) supervised programming interface. With the ability to utilize updated chemical knowledge and CAD tools, our agent significantly outperforms conventional synthesis AIs confined to the fixed knowledge within their training data. Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems, thereby bringing computational techniques and chemical research closer together and making a remarkable leap toward harnessing AI's full capabilities in scientific discovery.

Updated: 2024-04-04 10:57:56

标题: 化学家-X:大型语言模型增强的代理程序,用于化学合成中反应条件推荐

摘要: 最近的人工智能研究描绘了在化学社会中自动化化学反应的光明前景。本研究提出了Chemist-X,一个利用检索增强生成(RAG)技术自动化化学合成中反应条件推荐(RCR)任务的变革性人工智能代理。为了模拟专业化学家在解决RCR任务时的策略,Chemist-X利用先进的RAG方案来查询在线分子数据库并从最新的文献数据库中提炼关键数据。此外,该代理利用最先进的计算机辅助设计(CAD)工具和大型语言模型(LLM)监督编程接口。凭借使用更新的化学知识和CAD工具的能力,我们的代理明显优于传统合成人工智能,后者受限于其训练数据中的固定知识。Chemist-X显著减轻了化学家的工作量,使他们能够专注于更基础和创造性的问题,从而将计算技术和化学研究更为紧密地联系在一起,并在科学发现中充分发挥人工智能的能力迈出了一大步。

更新时间: 2024-04-04 10:57:56

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2311.10776v5

REACT: Revealing Evolutionary Action Consequence Trajectories for Interpretable Reinforcement Learning

To enhance the interpretability of Reinforcement Learning (RL), we propose Revealing Evolutionary Action Consequence Trajectories (REACT). In contrast to the prevalent practice of validating RL models based on their optimal behavior learned during training, we posit that considering a range of edge-case trajectories provides a more comprehensive understanding of their inherent behavior. To induce such scenarios, we introduce a disturbance to the initial state, optimizing it through an evolutionary algorithm to generate a diverse population of demonstrations. To evaluate the fitness of trajectories, REACT incorporates a joint fitness function that encourages both local and global diversity in the encountered states and chosen actions. Through assessments with policies trained for varying durations in discrete and continuous environments, we demonstrate the descriptive power of REACT. Our results highlight its effectiveness in revealing nuanced aspects of RL models' behavior beyond optimal performance, thereby contributing to improved interpretability.

Updated: 2024-04-04 10:56:30

标题: REACT:揭示进化行动后果轨迹以进行可解释的强化学习

摘要: 为增强强化学习(RL)的可解释性,我们提出了揭示进化行为后果轨迹(REACT)的方法。与通常基于训练期间学到的最佳行为验证RL模型的做法相反,我们认为考虑一系列边缘案例轨迹可以更全面地理解其固有行为。为了诱导这种情景,我们引入了一种扰动到初始状态,并通过进化算法对其进行优化,以生成多样化的演示群体。为了评估轨迹的适应性,REACT融合了一个联合适应性函数,鼓励遇到的状态和选择的行为在局部和全局上的多样性。通过在离散和连续环境中训练不同持续时间的策略进行评估,我们展示了REACT的描述能力。我们的结果突显了它在揭示RL模型行为微妙方面的有效性,从而有助于提高可解释性。

更新时间: 2024-04-04 10:56:30

领域: cs.LG,cs.AI,cs.NE

下载: http://arxiv.org/abs/2404.03359v1
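
As a rough illustration of the evolutionary procedure, the sketch below perturbs an initial state and selects for a joint local-plus-global diversity fitness. `env_rollout` is a hypothetical stand-in for rolling out the trained policy; the dynamics and all hyperparameters are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_rollout(init_state):
    # Placeholder dynamics; a real study would roll out the trained RL policy.
    traj = [init_state]
    for _ in range(20):
        traj.append(traj[-1] * 0.95 + rng.normal(0, 0.05, init_state.shape))
    return np.stack(traj)

def fitness(traj, population_trajs):
    # Joint fitness: within-trajectory spread plus distance to the rest
    # of the population (local and global diversity).
    local = np.linalg.norm(np.diff(traj, axis=0), axis=1).mean()
    if population_trajs:
        ref = np.stack([t.mean(0) for t in population_trajs])
        global_div = np.linalg.norm(traj.mean(0) - ref, axis=1).min()
    else:
        global_div = 0.0
    return local + global_div

base = np.zeros(4)
pop = [base + rng.normal(0, 0.1, 4) for _ in range(16)]
for gen in range(30):
    trajs = [env_rollout(p) for p in pop]
    scores = [fitness(t, trajs[:i] + trajs[i + 1:]) for i, t in enumerate(trajs)]
    elite = [pop[i] for i in np.argsort(scores)[-8:]]
    pop = elite + [e + rng.normal(0, 0.05, 4) for e in elite]   # mutate elites

print("max fitness in final generation:", max(scores))
```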

Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study

Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. While the CLTR models can be theoretically unbiased when the user behavior assumption is correct and the propensity estimation is accurate, their effectiveness is usually empirically evaluated via simulation-based experiments due to a lack of widely-available, large-scale, real click logs. However, the mainstream simulation-based experiments are somewhat limited as they often feature a single, deterministic production ranker and simplified user simulation models to generate the synthetic click logs. As a result, the robustness of CLTR models in complex and diverse situations is largely unknown and needs further investigation. To address this problem, in this paper, we aim to investigate the robustness of existing CLTR models in a reproducibility study with extensive simulation-based experiments that (1) use both deterministic and stochastic production rankers, each with different ranking performance, and (2) leverage multiple user simulation models with different user behavior assumptions. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation. Besides, the existing CLTR models often fail to outperform the naive click baselines when the production ranker has relatively high ranking performance or certain randomness, which suggests an urgent need for developing new CLTR algorithms that work for these settings.

Updated: 2024-04-04 10:54:38

标题: 研究反事实学习排序模型的鲁棒性:一项可重复性研究

摘要: 反事实学习排名(CLTR)因其利用大规模记录的用户交互数据训练排名模型的能力而引起了信息检索社区的广泛关注。虽然当用户行为假设正确并且倾向估计准确时,CLTR模型在理论上可以是无偏的,但由于缺乏广泛可用的大规模真实点击日志,它们的有效性通常通过基于模拟的实验进行经验评估。然而,主流基于模拟的实验在某种程度上受限,因为它们通常具有单一确定性生产排名器和简化的用户仿真模型来生成合成点击日志。因此,在复杂和多样化情况下CLTR模型的稳健性在很大程度上是未知的,需要进一步研究。 为了解决这个问题,本文旨在通过广泛的基于模拟的实验,研究现有CLTR模型的稳健性,其中(1)使用具有不同排名性能的确定性和随机生产排名器,以及(2)利用具有不同用户行为假设的多个用户仿真模型。我们发现,DLA模型和IPS-DCM在各种模拟设置下表现出比IPS-PBM和具有离线倾向估计的PRS更好的稳健性。此外,当生产排名器具有相对较高的排名性能或某种随机性时,现有的CLTR模型通常无法超越天真点击基线,这表明迫切需要开发适用于这些设置的新CLTR算法。

更新时间: 2024-04-04 10:54:38

领域: cs.LG,cs.AI,cs.IR

下载: http://arxiv.org/abs/2404.03707v1
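
For readers unfamiliar with the CLTR setup, here is a minimal sketch of inverse-propensity-scored (IPS) training on synthetic click logs under a position-based examination model; the propensities, features, and learning rate are illustrative assumptions rather than any configuration from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 8
X = rng.normal(size=(n, d))                  # query-document features
w_true = rng.normal(size=d)
rel = (X @ w_true > 0).astype(float)         # latent relevance
rank = rng.integers(1, 11, size=n)           # logged display position
prop = 1.0 / rank                            # examination propensity
click = rel * (rng.random(n) < prop)         # clicks = examined & relevant

w = np.zeros(d)
lr = 0.1
for _ in range(200):
    s = 1 / (1 + np.exp(-X @ w))
    # IPS reweighting: click/prop is an unbiased estimate of relevance,
    # so the debiased cross-entropy gradient matches the full-information one.
    grad = X.T @ (s - click / prop) / n
    w -= lr * grad

print("cosine(w, w_true) =",
      w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```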

Towards Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep shift neural networks (DSNNs) offer a solution by leveraging shift operations to reduce computational complexity at inference. Following the insights from standard DNNs, we are interested in leveraging the full potential of DSNNs by means of AutoML techniques. We study the impact of hyperparameter optimization (HPO) to maximize DSNN performance while minimizing resource consumption. Since this combines multi-objective (MO) optimization with accuracy and energy consumption as potentially complementary objectives, we propose to combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization. Experimental results demonstrate the effectiveness of our approach, resulting in models with over 80% accuracy and low computational cost. Overall, our method accelerates efficient model development while enabling sustainable AI applications.

Updated: 2024-04-04 10:54:04

标题: 朝着利用AutoML实现可持续深度学习的方向:基于深度移位神经网络的多目标HPO方法

摘要: 深度学习(DL)通过从大型数据集中提取复杂模式,推动了各个领域的发展。然而,DL模型的计算需求带来了环境和资源挑战。深度移位神经网络(DSNNs)通过利用移位操作来降低推断时的计算复杂性,提供了一种解决方案。在借鉴标准DNNs的启示后,我们对利用AutoML技术充分发挥DSNNs的潜力感兴趣。我们研究了超参数优化(HPO)对最大化DSNN性能和最小化资源消耗的影响。由于这将多目标(MO)优化与准确性和能量消耗作为潜在互补目标相结合,我们建议将最新的多保真度(MF)HPO与多目标优化相结合。实验结果表明我们的方法的有效性,产生了准确率超过80%且计算成本低的模型。总的来说,我们的方法加速了高效模型开发,同时实现了可持续的AI应用。

更新时间: 2024-04-04 10:54:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.01965v2
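
A minimal sketch of the multi-objective, multi-fidelity flavour of HPO described above: sample configurations, screen them cheaply, re-evaluate the survivors at full budget, and keep the accuracy/energy Pareto front. The `evaluate` proxy and the search space are hypothetical stand-ins for training a DSNN.

```python
import random

random.seed(0)

def evaluate(cfg, epochs):
    # Hypothetical proxy: deeper/wider nets are more accurate but costlier.
    err = 1.0 / (cfg["depth"] * cfg["width"]) + 0.3 / epochs + random.gauss(0, 0.01)
    energy = cfg["depth"] * cfg["width"] * epochs * 1e-3
    return err, energy

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

configs = [{"depth": random.choice([2, 4, 8]),
            "width": random.choice([16, 32, 64])} for _ in range(30)]
# Multi-fidelity screening: cheap low-epoch scores decide who gets a full budget.
scores = [(cfg, evaluate(cfg, epochs=2)) for cfg in configs]
survivors = sorted(scores, key=lambda t: t[1][0])[:10]
results = [(cfg, evaluate(cfg, epochs=20)) for cfg, _ in survivors]

front = [(c, s) for c, s in results
         if not any(dominates(s2, s) for _, s2 in results)]
for cfg, (err, energy) in front:
    print(cfg, f"err={err:.3f}", f"energy={energy:.2f}")
```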

A Comprehensive Survey on Self-Supervised Learning for Recommendation

Recommender systems play a crucial role in tackling the challenge of information overload by delivering personalized recommendations based on individual user preferences. Deep learning techniques, such as RNNs, GNNs, and Transformer architectures, have significantly propelled the advancement of recommender systems by enhancing their comprehension of user behaviors and preferences. However, supervised learning methods encounter challenges in real-life scenarios due to data sparsity, resulting in limitations in their ability to learn representations effectively. To address this, self-supervised learning (SSL) techniques have emerged as a solution, leveraging inherent data structures to generate supervision signals without relying solely on labeled data. By leveraging unlabeled data and extracting meaningful representations, recommender systems utilizing SSL can make accurate predictions and recommendations even when confronted with data sparsity. In this paper, we provide a comprehensive review of self-supervised learning frameworks designed for recommender systems, encompassing a thorough analysis of over 170 papers. We conduct an exploration of nine distinct scenarios, enabling a comprehensive understanding of SSL-enhanced recommenders in different contexts. For each domain, we elaborate on different self-supervised learning paradigms, namely contrastive learning, generative learning, and adversarial learning, so as to present technical details of how SSL enhances recommender systems in various contexts. We consistently maintain the related open-source materials at https://github.com/HKUDS/Awesome-SSLRec-Papers.

Updated: 2024-04-04 10:45:23

标题: 一个关于自监督学习在推荐系统中的综合调查

摘要: 推荐系统在应对信息过载挑战中发挥着关键作用,通过基于个人用户偏好提供个性化推荐。深度学习技术,如RNNs、GNNs和Transformer架构,显著推动了推荐系统的进步,增强了对用户行为和偏好的理解。然而,在现实场景中,监督学习方法面临数据稀疏性等挑战,导致其有效学习表示的能力受限。为解决这一问题,自监督学习(SSL)技术出现作为解决方案,利用内在数据结构生成监督信号,而不仅仅依赖于标记数据。通过利用无标签数据并提取有意义的表示,利用SSL的推荐系统可以在面对数据稀疏性时做出准确的预测和推荐。本文对为推荐系统设计的自监督学习框架进行了全面回顾,涵盖了超过170篇论文的深入分析。我们探索了九种不同的场景,使人们全面了解不同情境下SSL增强推荐系统的情况。对于每个领域,我们详细阐述了不同的自监督学习范式,即对比学习、生成学习和对抗学习,以展示SSL如何在不同情境中增强推荐系统的技术细节。我们始终在https://github.com/HKUDS/Awesome-SSLRec-Papers上保持相关的开源材料。

更新时间: 2024-04-04 10:45:23

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2404.03354v1

Bi-level Guided Diffusion Models for Zero-Shot Medical Imaging Inverse Problems

In the realm of medical imaging, inverse problems aim to infer high-quality images from incomplete, noisy measurements, with the objective of minimizing expenses and risks to patients in clinical settings. Diffusion models have recently emerged as a promising approach to such practical challenges, proving particularly useful for the zero-shot inference of images from partially acquired measurements in Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). A central challenge in this approach, however, is how to guide an unconditional prediction to conform to the measurement information. Existing methods rely on deficient projection or inefficient posterior score approximation guidance, which often leads to suboptimal performance. In this paper, we propose Bi-level Guided Diffusion Models (BGDM), a zero-shot imaging framework that efficiently steers the initial unconditional prediction through a bi-level guidance strategy. Specifically, BGDM first approximates an inner-level conditional posterior mean as an initial measurement-consistent reference point and then solves an outer-level proximal optimization objective to reinforce the measurement consistency. Our experimental findings, using publicly available MRI and CT medical datasets, reveal that BGDM is more effective and efficient compared to the baselines, faithfully generating high-fidelity medical images and substantially reducing hallucinatory artifacts in cases of severe degradation.

Updated: 2024-04-04 10:36:56

标题: 双层引导扩散模型用于零样本医学成像逆问题

摘要: 在医学影像领域,逆问题旨在从不完整、嘈杂的测量中推断高质量图像,其目标是在临床环境中尽量减少费用和风险。扩散模型最近已经成为解决这些实际挑战的一种有前途的方法,特别适用于从部分获取的测量中零样本推断磁共振成像(MRI)和计算机断层扫描(CT)图像。然而,这种方法的一个核心挑战是如何引导无条件预测符合测量信息。现有方法依赖于不足的投影或低效的后验分数逼近引导,通常导致次优性能。在本文中,我们提出了双层引导扩散模型(BGDM),这是一个零样本成像框架,通过双层引导策略有效地引导初始无条件预测。具体来说,BGDM首先将一个内层条件后验均值近似为一个初始的符合测量的参考点,然后解决一个外层近端优化目标来加强测量一致性。我们的实验结果,使用公开可用的MRI和CT医学数据集,显示BGDM相比基线更为有效和高效,能够忠实地生成高保真医学图像,并在严重退化情况下显著减少幻觉性伪影。

更新时间: 2024-04-04 10:36:56

领域: eess.IV,cs.LG

下载: http://arxiv.org/abs/2404.03706v1
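
The following is a highly simplified sketch of the bi-level idea on a linear inverse problem: an inner measurement-consistent reference point is formed from the current x0 estimate, then an outer proximal objective is solved in closed form. The `denoise` placeholder stands in for a trained diffusion model's posterior-mean prediction, and the schedule and weights are assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 32
A = rng.normal(size=(m, d)) / np.sqrt(m)
x_true = rng.normal(size=d)
y = A @ x_true + 0.01 * rng.normal(size=m)

def denoise(x, sigma):
    # Placeholder prior: shrink toward zero; a trained model goes here.
    return x / (1 + sigma ** 2)

x = rng.normal(size=d)
for sigma in np.linspace(1.0, 0.05, 50):
    x0 = denoise(x, sigma)                      # unconditional x0-hat
    # Inner level: measurement-consistent reference point (a least-squares
    # step toward the data from the current x0-hat).
    ref = x0 - 0.5 * A.T @ (A @ x0 - y)
    # Outer level: proximal objective trading data fit against staying near
    # the reference; closed form for this quadratic case.
    lam = 1.0
    x = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y + lam * ref)
    x += sigma * 0.1 * rng.normal(size=d)       # re-noise for the next step

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```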

Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations

In recent years, there has been a notable increase in the deployment of machine learning (ML) models as services (MLaaS) across diverse production software applications. In parallel, explainable AI (XAI) continues to evolve, addressing the necessity for transparency and trustworthiness in ML models. XAI techniques aim to enhance the transparency of ML models by providing insights, in terms of the model's explanations, into their decision-making process. Simultaneously, some MLaaS platforms now offer explanations alongside the ML prediction outputs. This setup has elevated concerns regarding vulnerabilities in MLaaS, particularly in relation to privacy leakage attacks such as model extraction attacks (MEA). This is due to the fact that explanations can unveil insights about the inner workings of the model which could be exploited by malicious users. In this work, we focus on investigating how model explanations, particularly generative adversarial network (GAN)-based counterfactual explanations (CFs), can be exploited for performing MEA within the MLaaS platform. We also delve into assessing the effectiveness of incorporating differential privacy (DP) as a mitigation strategy. To this end, we first propose a novel MEA methodology based on Knowledge Distillation (KD) to enhance the efficiency of extracting a substitute model of a target model exploiting CFs. Then, we propose an approach for training CF generators that incorporates DP to generate private CFs. We conduct thorough experimental evaluations on real-world datasets and demonstrate that our proposed KD-based MEA can yield a high-fidelity substitute model with fewer queries than baseline approaches. Furthermore, our findings reveal that the inclusion of a privacy layer impacts the performance of the explainer and the quality of CFs, and results in a reduction in the MEA performance.

Updated: 2024-04-04 10:28:55

标题: 基于知识蒸馏的模型提取攻击:使用私有反事实解释

摘要: 近年来,机器学习模型作为服务(MLaaS)在各种生产软件应用中的部署显著增加。与此同时,可解释人工智能(XAI)不断发展,解决了ML模型透明度和可信度的必要性。XAI技术旨在通过提供对模型解释的见解,揭示其决策过程,从而增强ML模型的透明度。同时,一些MLaaS平台现在在ML预测输出旁边提供解释。这种设置引起了人们对MLaaS的漏洞特别是隐私泄露攻击(如模型提取攻击)的担忧。这是因为解释可以揭示模型内部运作的见解,这可能被恶意用户利用。在这项工作中,我们重点研究模型解释,特别是基于生成对抗网络(GANs)的反事实解释(CFs),如何被利用在MLaaS平台内进行MEA。我们还深入评估了将差分隐私(DP)作为缓解策略的有效性。为此,我们首先提出了一种基于知识蒸馏(KD)的新型MEA方法,以增强利用CFs提取目标模型的替代模型的效率。然后,我们建议一种采用DP的方法来训练CF生成器,生成私有CFs。我们对真实数据集进行了彻底的实验评估,并证明我们提出的基于KD的MEA可以产生具有较少查询的高保真度替代模型,相对于基线方法。此外,我们的研究结果显示,隐私层的加入影响了解释器的性能和CF的质量,并导致MEA性能下降。

更新时间: 2024-04-04 10:28:55

领域: cs.LG,cs.AI,cs.CR,cs.CY

下载: http://arxiv.org/abs/2404.03348v1

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks

Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have been proven extremely vulnerable to adversarial attacks. Current defenses mainly focus on known attacks, but adversarial robustness to unknown attacks is seriously overlooked. Besides, the commonly used adaptive learning and fine-tuning techniques are unsuitable for adversarial defense since defense is essentially a zero-shot problem when deployed. Thus, to tackle this challenge, we propose an attack-agnostic defense method named Meta Invariance Defense (MID). Specifically, various combinations of adversarial attacks are randomly sampled from a manually constructed Attacker Pool to constitute different defense tasks against unknown attacks, in which a student encoder is supervised by multi-consistency distillation to learn the attack-invariant features via a meta principle. The proposed MID has two merits: 1) full distillation at the pixel, feature and prediction levels between benign and adversarial samples facilitates the discovery of attack-invariance, and 2) the model simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration. Theoretical and empirical studies on numerous benchmarks such as ImageNet verify the generalizable robustness and superiority of MID under various attacks.

Updated: 2024-04-04 10:10:38

标题: 元不变性防御:实现对未知对抗攻击的泛化鲁棒性

摘要: 尽管深度神经网络(DNN)模型为计算机视觉任务提供了高性能解决方案,但已被证明极易受到对抗攻击的影响。目前的防御主要集中在已知攻击上,但对于未知攻击的对抗性鲁棒性严重被忽视。此外,常用的自适应学习和微调技术对于对抗性防御是不适当的,因为在部署时实质上是一个零样本问题。因此,为应对这一挑战,我们提出了一种攻击无关的防御方法,名为元不变性防御(MID)。具体来说,从手动构建的攻击者池中随机抽样不同的对抗攻击组合,构成针对未知攻击的不同防御任务,其中一个学生编码器通过多一致性蒸馏监督学习攻击不变特征,遵循元原则。所提出的MID具有两个优点:1)在良性和对抗样本之间进行像素级、特征级和预测级的全面蒸馏有助于发现攻击不变性。2)该模型同时在高级图像分类中实现了对不可察觉对抗扰动的鲁棒性,以及在低级鲁棒图像重建中实现了攻击抑制。在诸如ImageNet等许多基准测试上的理论和实证研究验证了MID在各种攻击下的一般化鲁棒性和优越性。

更新时间: 2024-04-04 10:10:38

领域: cs.CV,cs.CR,cs.LG

下载: http://arxiv.org/abs/2404.03340v1

Generative Semi-supervised Graph Anomaly Detection

This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal, in contrast to the unsupervised setting in most GAD studies with a fully unlabeled graph. As expected, we find that having access to these normal nodes helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, their utilization of these normal nodes is limited. In this paper, we propose a novel Generative GAD approach (GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate outlier nodes that assimilate anomaly nodes in both local structure and node representations for providing effective negative node samples in training a discriminative one-class classifier. There have been many generative anomaly detection approaches, but they are designed for non-graph data, and as a result, they fail to take account of the graph structure information. Our approach tackles this problem by generating graph structure-aware outlier nodes that have asymmetric affinity separability from normal nodes while being enforced to achieve egocentric closeness to normal nodes in the node representation space. Comprehensive experiments on four real-world datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes. Code will be made available at https://github.com/mala-lab/GGAD.

Updated: 2024-04-04 10:08:25

标题: 生成式半监督图异常检测

摘要: 这项工作考虑了一个实际的半监督图异常检测(GAD)场景,其中图中的部分节点被认为是正常的,与大多数GAD研究中使用完全未标记的图的无监督设置形成对比。正如预期的那样,我们发现当现有的无监督GAD方法适应半监督设置时,可以利用这些正常节点来增强检测性能。然而,它们利用这些正常节点的能力是有限的。在本文中,我们提出了一种新颖的半监督场景下的生成式GAD方法(GGAD),以更好地利用正常节点。其关键思想是生成异常节点,这些节点在本地结构和节点表示中模拟异常节点,为训练一个判别性单类分类器提供有效的负节点样本。已经有许多生成式异常检测方法,但它们设计用于非图数据,因此它们未能考虑图结构信息。我们的方法通过生成具有与正常节点不对称的亲和性可分性的图结构感知异常节点来解决这个问题,同时强制它们在节点表示空间中实现与正常节点的自我封闭性。对四个真实数据集进行了全面实验,建立了一个半监督GAD的基准,并表明GGAD在不同数量的训练正常节点下大大优于现有的无监督和半监督GAD方法。代码将在https://github.com/mala-lab/GGAD 上提供。

更新时间: 2024-04-04 10:08:25

领域: cs.LG

下载: http://arxiv.org/abs/2402.11887v3

Decoding Natural Images from EEG for Object Recognition

Electroencephalography (EEG) signals, known for convenient non-invasive acquisition but low signal-to-noise ratio, have recently gained substantial attention due to the potential to decode natural images. This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals, particularly for object recognition. The framework utilizes image and EEG encoders to extract features from paired image stimuli and EEG responses. Contrastive learning aligns these two modalities by constraining their similarity. With the framework, we attain significantly above-chance results on a comprehensive EEG-image dataset, achieving a top-1 accuracy of 15.6% and a top-5 accuracy of 42.8% in challenging 200-way zero-shot tasks. Moreover, we perform extensive experiments to explore the biological plausibility by resolving the temporal, spatial, spectral, and semantic aspects of EEG signals. Besides, we introduce attention modules to capture spatial correlations, providing implicit evidence of the brain activity perceived from EEG data. These findings yield valuable insights for neural decoding and brain-computer interfaces in real-world scenarios. The code will be released on https://github.com/eeyhsong/NICE-EEG.

Updated: 2024-04-04 10:08:10

标题: 用脑电图解码自然图像以进行物体识别

摘要: 脑电图(EEG)信号以方便的无创获取而闻名,但信噪比低,最近因有望解码自然图像而引起了相当大的关注。本文提出了一个自监督框架,以展示从EEG信号学习图像表示的可行性,特别是用于物体识别。该框架利用图像和EEG编码器从配对的图像刺激和EEG响应中提取特征。对比学习通过约束它们的相似性来对齐这两种模态。通过该框架,在一个全面的EEG-图像数据集上取得了显著的超过随机的结果,在具有挑战性的200种零样本任务中,获得了15.6%的top-1准确率和42.8%的top-5准确率。此外,我们进行了大量实验,通过解决EEG信号的时间、空间、频谱和语义等方面,来探索生物学合理性。此外,我们引入了注意力模块来捕获空间相关性,为从EEG数据中感知到的大脑活动提供了隐含证据。这些发现为神经解码和现实世界场景中的脑机接口提供了宝贵的见解。代码将在https://github.com/eeyhsong/NICE-EEG上发布。

更新时间: 2024-04-04 10:08:10

领域: cs.HC,cs.AI,eess.SP,q-bio.NC

下载: http://arxiv.org/abs/2308.13234v3
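
A minimal sketch of the contrastive alignment step described above, with linear stand-ins for the EEG and image encoders and a symmetric InfoNCE loss; the dimensions and the temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, eeg_dim, img_dim, shared = 32, 128, 512, 64
eeg_proj = torch.nn.Linear(eeg_dim, shared)
img_proj = torch.nn.Linear(img_dim, shared)

eeg = torch.randn(B, eeg_dim)          # EEG encoder output (placeholder)
img = torch.randn(B, img_dim)          # image encoder output (placeholder)

z_e = F.normalize(eeg_proj(eeg), dim=-1)
z_i = F.normalize(img_proj(img), dim=-1)
logits = z_e @ z_i.t() / 0.07          # temperature-scaled similarities
labels = torch.arange(B)               # paired samples are the positives
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
print("contrastive loss:", loss.item())
```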

Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection

We reveal a one-class homophily phenomenon, which is one prevalent property we find empirically in real-world graph anomaly detection (GAD) datasets, i.e., normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker than normal nodes. However, this anomaly-discriminative property is ignored by existing GAD methods that are typically built using a conventional anomaly detection objective, such as data reconstruction. In this work, we explore this property to introduce a novel unsupervised anomaly scoring measure for GAD, local node affinity, that assigns a larger anomaly score to nodes that are less affiliated with their neighbors, with the affinity defined as similarity on node attributes/representations. We further propose Truncated Affinity Maximization (TAM) that learns tailored node representations for our anomaly measure by maximizing the local affinity of nodes to their neighbors. Optimizing on the original graph structure can be biased by nonhomophily edges (i.e., edges connecting normal and abnormal nodes). Thus, TAM is instead optimized on truncated graphs where non-homophily edges are removed iteratively to mitigate this bias. The learned representations result in significantly stronger local affinity for normal nodes than abnormal nodes. Extensive empirical results on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. Our code is available at https://github.com/mala-lab/TAM-master/.

Updated: 2024-04-04 10:06:34

标题: 截断亲和力最大化:图异常检测的单类同质建模

摘要: 我们揭示了一种单类同质性现象,这是我们在真实世界图异常检测(GAD)数据集中实证发现的一种普遍特性,即正常节点倾向于彼此具有强连接/亲和力,而异常节点中的同质性显著弱于正常节点。然而,这种区分异常的特性被现有的通常使用传统异常检测目标(如数据重建)构建的GAD方法所忽视。在这项工作中,我们探索了这种特性,引入了一种新颖的无监督异常评分测量方法,即局部节点亲和度,该方法为与其邻居关联较小的节点分配较大的异常评分,亲和度定义为节点属性/表示上的相似性。我们进一步提出了截断亲和性最大化(TAM),通过最大化节点与其邻居的局部亲和度来学习适合我们异常度量的节点表示。在原始图结构上的优化可能会受到非同质性边缘的偏见(即连接正常和异常节点的边缘)。因此,TAM改为在截断图上进行优化,其中非同质性边逐步被移除以减轻这种偏见。所学到的表示使正常节点的局部亲和度显著强于异常节点。对10个真实世界GAD数据集的广泛实证结果显示,TAM在挑战性数据集上比七个竞争模型表现出色,与最佳竞争者相比,AUROC/AUPRC增加了10%以上。我们的代码可以在https://github.com/mala-lab/TAM-master/找到。

更新时间: 2024-04-04 10:06:34

领域: cs.SI,cs.AI,cs.LG

下载: http://arxiv.org/abs/2306.00006v5
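
The local node affinity measure is easy to state in code. The sketch below scores nodes by their mean attribute cosine similarity to neighbours on a synthetic homophilous graph with a few injected anomalies; it illustrates only the scoring, not the truncation or representation learning.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 16
base = rng.normal(size=d)
X = base + 0.3 * rng.normal(size=(n, d))       # homophilous normal nodes
X[:5] = rng.normal(size=(5, d))                # injected anomalies break it

A = (rng.random((n, n)) < 0.05).astype(float)  # random undirected graph
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)

Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = Xn @ Xn.T                                # attribute cosine similarity
deg = A.sum(1).clip(min=1)
affinity = (A * sim).sum(1) / deg              # mean affinity to neighbours
score = -affinity                              # low affinity => anomalous
print("nodes ranked most anomalous:", np.argsort(score)[-5:])
```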

LancBiO: dynamic Lanczos-aided bilevel optimization via Krylov subspace

Bilevel optimization, with broad applications in machine learning, has an intricate hierarchical structure. Gradient-based methods have emerged as a common approach to large-scale bilevel problems. However, the computation of the hyper-gradient, which involves a Hessian inverse vector product, limits the efficiency and is regarded as a bottleneck. To circumvent the inverse, we construct a sequence of low-dimensional approximate Krylov subspaces with the aid of the Lanczos process. As a result, the constructed subspace is able to dynamically and incrementally approximate the Hessian inverse vector product with less effort and thus leads to a favorable estimate of the hyper-gradient. Moreover, we propose a provable subspace-based framework for bilevel problems where one central step is to solve a small-size tridiagonal linear system. To the best of our knowledge, this is the first time that subspace techniques are incorporated into bilevel optimization. This successful trial not only enjoys an $\mathcal{O}(\epsilon^{-1})$ convergence rate but also demonstrates efficiency in a synthetic problem and two deep learning tasks.

Updated: 2024-04-04 09:57:29

标题: LancBiO: 利用Krylov子空间动态Lanczos辅助的双层优化

摘要: 双层优化在机器学习中有广泛的应用,具有复杂的层次结构。基于梯度的方法已经成为处理大规模双层问题的常见方法。然而,涉及Hessian逆向量乘积的超梯度计算限制了效率,并被视为瓶颈。为了避免逆向计算,我们利用Lanczos过程构建了一系列低维近似Krylov子空间。因此,构建的子空间能够以较少的工作量动态和逐步地逼近Hessian逆向量乘积,从而导致对超梯度的有利估计。此外,我们提出了一个可证明的基于子空间的双层问题框架,其中一个关键步骤是解决一个小规模的三对角线性系统。据我们所知,这是首次将子空间技术纳入双层优化。这次成功的尝试不仅享有$O(\epsilon^{-1})$收敛速度,而且在一个合成问题和两个深度学习任务中展现出高效性。

更新时间: 2024-04-04 09:57:29

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03331v1
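
As a concrete illustration of the core primitive, the sketch below runs the Lanczos process to approximate a Hessian-inverse vector product by solving only a small tridiagonal system. The explicit SPD matrix and subspace size are illustrative assumptions; a bilevel solver would use Hessian-vector products only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 20
M = rng.normal(size=(n, n))
H = M @ M.T / n + np.eye(n)                    # SPD stand-in "Hessian"
v = rng.normal(size=n)

# Lanczos: build an orthonormal basis Q of the Krylov subspace and the
# tridiagonal projection T = Q^T H Q.
Q = np.zeros((n, k)); alpha = np.zeros(k); beta = np.zeros(k)
Q[:, 0] = v / np.linalg.norm(v)
for j in range(k):
    w = H @ Q[:, j]
    alpha[j] = Q[:, j] @ w
    w -= alpha[j] * Q[:, j]
    if j > 0:
        w -= beta[j - 1] * Q[:, j - 1]
    if j < k - 1:
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]

T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
e1 = np.zeros(k); e1[0] = np.linalg.norm(v)
x = Q @ np.linalg.solve(T, e1)                 # approximate H^{-1} v
print("relative residual:", np.linalg.norm(H @ x - v) / np.linalg.norm(v))
```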

MPOFI: Multichannel Partially Observed Functional Modeling for Defect Classification with Imbalanced Dataset via Deep Metric Learning

In modern manufacturing, most of the product lines are conforming. A few products are nonconforming, with different defect types. The identification of defect types can help further root-cause diagnosis of production lines. With the development of sensing technology, continuous signals of process variables can be collected in high resolution, which can be regarded as multichannel functional data. They have abundant information to characterize the process and help identify the defect types. Motivated by a real example from the pipe tightening process, we target defect classification when each sample is multichannel functional data. However, the available samples for each defect type are limited and imbalanced. Moreover, the functions are partially observed, since the pre-tightening process that precedes the pipe tightening process is unobserved. Classifying the defect samples based on imbalanced, multichannel, and partially observed functional data is very important but challenging. Thus, we propose an innovative framework known as "Multichannel Partially Observed Functional Modeling for Defect Classification with an Imbalanced Dataset" (MPOFI). The framework leverages the power of deep metric learning in conjunction with a neural network specially crafted for processing functional data. This paper introduces a neural network explicitly tailored for handling multichannel and partially observed functional data, complemented by developing a corresponding loss function for training on imbalanced datasets. The results from a real-world case study demonstrate the superior accuracy of our framework when compared to existing benchmarks.

Updated: 2024-04-04 09:55:11

标题: MPOFI:通过深度度量学习进行不平衡数据集的缺陷分类的多通道部分观测功能建模

摘要: 在现代制造业中,大多数产品线都是符合规范的。少数产品存在不符合规范的情况,但缺陷类型各不相同。识别缺陷类型可以帮助进一步诊断生产线的根本原因。随着传感器技术的发展,可以以高分辨率收集过程变量的连续信号,这些信号可以被视为多通道功能数据。它们具有丰富的信息来表征过程并帮助识别缺陷类型。受管道紧固过程的一个真实例子的启发,我们旨在在每个样本都是多通道功能数据时进行检测分类。然而,每种缺陷类型的可用样本数量有限且不平衡。此外,由于管道紧固过程前的预紧过程未被观察到,函数是部分观察到的。基于不平衡、多通道和部分观察到的功能数据对缺陷样本进行分类非常重要但具有挑战性。因此,我们提出了一种创新的框架,名为“用于不平衡数据集的多通道部分观察到的功能建模缺陷分类”(MPOFI)。该框架利用了深度度量学习的力量,结合了专门为处理功能数据而设计的神经网络。本文介绍了一种专门针对处理多通道和部分观察到的功能数据的神经网络,同时开发了相应的损失函数,用于在不平衡数据集上进行训练。来自一个真实案例研究的结果表明,与现有基准相比,我们的框架具有更高的准确性。

更新时间: 2024-04-04 09:55:11

领域: cs.LG,eess.SP,stat.ML

下载: http://arxiv.org/abs/2404.03329v1

Embodied Neuromorphic Artificial Intelligence for Robotics: Perspectives, Challenges, and Research Development Stack

Robotic technologies have been an indispensable part for improving human productivity since they have been helping humans in completing diverse, complex, and intensive tasks in a fast yet accurate and efficient way. Therefore, robotic technologies have been deployed in a wide range of applications, ranging from personal to industrial use-cases. However, current robotic technologies and their computing paradigm still lack embodied intelligence to efficiently interact with operational environments, respond with correct/expected actions, and adapt to changes in the environments. Toward this, recent advances in neuromorphic computing with Spiking Neural Networks (SNN) have demonstrated the potential to enable the embodied intelligence for robotics through bio-plausible computing paradigm that mimics how the biological brain works, known as "neuromorphic artificial intelligence (AI)". However, the field of neuromorphic AI-based robotics is still at an early stage, therefore its development and deployment for solving real-world problems expose new challenges in different design aspects, such as accuracy, adaptability, efficiency, reliability, and security. To address these challenges, this paper will discuss how we can enable embodied neuromorphic AI for robotic systems through our perspectives: (P1) Embodied intelligence based on effective learning rule, training mechanism, and adaptability; (P2) Cross-layer optimizations for energy-efficient neuromorphic computing; (P3) Representative and fair benchmarks; (P4) Low-cost reliability and safety enhancements; (P5) Security and privacy for neuromorphic computing; and (P6) A synergistic development for energy-efficient and robust neuromorphic-based robotics. Furthermore, this paper identifies research challenges and opportunities, as well as elaborates our vision for future research development toward embodied neuromorphic AI for robotics.

Updated: 2024-04-04 09:52:22

标题: 具身神经形态人工智能在机器人领域的应用:观点、挑战和研究发展堆栈

摘要: Robotic technologies have played a crucial role in enhancing human productivity by assisting with diverse, complex, and intensive tasks in a fast and efficient manner. However, current robotic technologies and their computing methods lack the ability to interact efficiently with their environments, respond accurately, and adapt to changes. Recent advancements in neuromorphic computing, specifically with Spiking Neural Networks (SNN), show promise in enabling embodied intelligence in robotics through a bio-plausible computing approach that mimics the workings of the human brain, known as "neuromorphic artificial intelligence (AI)". Despite this potential, the field of neuromorphic AI-based robotics is still in its early stages, presenting new challenges in design aspects such as accuracy, adaptability, efficiency, reliability, and security. This paper aims to explore how embodied neuromorphic AI can be enabled in robotic systems through various perspectives, including effective learning rules, training mechanisms, and adaptability; energy-efficient cross-layer optimizations for neuromorphic computing; representative and fair benchmarks; low-cost reliability and safety enhancements; security and privacy considerations for neuromorphic computing; and a synergistic approach to developing energy-efficient and robust neuromorphic-based robotics. Additionally, the paper identifies research challenges, opportunities, and outlines a vision for future research development in the field of embodied neuromorphic AI for robotics.

更新时间: 2024-04-04 09:52:22

领域: cs.RO,cs.AI,cs.LG,cs.NE

下载: http://arxiv.org/abs/2404.03325v1

Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning

We propose a novel architecture and method for explainable classification with Concept Bottleneck Models (CBMs). While SOTA approaches to the image classification task work as black boxes, there is a growing demand for models that provide interpreted results. Such models often learn to predict the distribution over class labels using an additional description of the target instances, called concepts. However, existing bottleneck methods have a number of limitations: their accuracy is lower than that of a standard model, and CBMs require an additional set of concepts to leverage. We provide a framework for creating Concept Bottleneck Models from pre-trained multi-modal encoders and new CLIP-like architectures. By introducing a new type of layer known as the Concept Bottleneck Layer, we outline three methods for training them: with $\ell_1$-loss, contrastive loss, and a loss function based on the Gumbel-Softmax distribution (Sparse-CBM), while the final FC layer is still trained with cross-entropy. We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models, which means that a sparse representation of the concept activation vector is meaningful in Concept Bottleneck Models. Moreover, with our Concept Matrix Search algorithm we can improve CLIP predictions on complex datasets without any additional training or fine-tuning. The code is available at: https://github.com/Andron00e/SparseCBM.

Updated: 2024-04-04 09:43:43

标题: 稀疏概念瓶颈模型:对比学习中的古贝尔技巧

摘要: 我们提出了一种新颖的架构和可解释分类方法,使用概念瓶颈模型(CBMs)。虽然目前领先的图像分类方法作为黑盒运行,但越来越多的人需要能够提供解释结果的模型。这种模型通常会学习使用目标实例的附加描述来预测类标签的分布,这些描述被称为概念。然而,现有的瓶颈方法存在一些局限性:它们的准确性低于标准模型,CBMs需要额外的概念集来发挥作用。我们提供了一个从预训练的多模态编码器和新的类似CLIP的架构中创建概念瓶颈模型的框架。通过引入一种新类型的层称为概念瓶颈层,我们概述了三种训练方法:使用$\ell_1$损失、对比损失和基于Gumbel-Softmax分布的损失函数(稀疏CBM),而最终的FC层仍然是用交叉熵训练的。我们展示了在基于CLIP的瓶颈模型中使用稀疏隐藏层可以显著提高准确性。这意味着概念激活向量的稀疏表示在概念瓶颈模型中是有意义的。此外,通过我们的概念矩阵搜索算法,我们可以在不进行任何额外训练或微调的情况下改进CLIP在复杂数据集上的预测。代码可在以下链接找到:https://github.com/Andron00e/SparseCBM。

更新时间: 2024-04-04 09:43:43

领域: cs.CV,cs.AI,I.2.6; I.2.10; I.4.10; I.5.1; I.5.4; I.5.5

下载: http://arxiv.org/abs/2404.03323v1
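
A minimal sketch of a Gumbel-Softmax-sparsified concept bottleneck layer in the spirit of Sparse-CBM; the dimensions and gating details are assumptions, not the authors' exact layer.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, n_concepts, n_classes = 512, 64, 10

concept_layer = torch.nn.Linear(feat_dim, n_concepts)
classifier = torch.nn.Linear(n_concepts, n_classes)

x = torch.randn(8, feat_dim)                   # CLIP-like image features
logits = concept_layer(x)
# Straight-through Gumbel-Softmax over an (on, off) pair per concept yields a
# sparse yet differentiable concept-activation vector.
gate_logits = torch.stack([logits, -logits], dim=-1)
gates = F.gumbel_softmax(gate_logits, tau=0.5, hard=True)[..., 0]
concepts = gates * torch.sigmoid(logits)

pred = classifier(concepts)                    # final FC trained with CE
loss = F.cross_entropy(pred, torch.randint(0, n_classes, (8,)))
loss.backward()
print("active concepts per sample:", gates.sum(-1).tolist())
```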

MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation

We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM (``Madras Mixture Model''), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.

Updated: 2024-04-04 09:38:42

标题: MMM和MMMSynth:异构表格数据的聚类和合成数据生成

摘要: 我们提供了两个与异构表格数据集相关的任务的新算法:聚类和合成数据生成。表格数据集通常包含不同数据类型(数值、顺序、分类)的列,但可能在其行中具有隐藏的聚类结构:例如,它们可能来自不同的(地理、社会经济、方法论)来源,因此它们描述的结果变量(如疾病的存在)可能不仅取决于其他变量,还取决于聚类上下文。此外,医学数据的共享通常受到患者隐私法律的限制,目前对于通过深度学习从真实数据生成合成表格数据的算法有着浓厚的兴趣。 我们展示了一种新颖的基于EM的聚类算法MMM(“马德拉斯混合模型”),在确定合成异构数据中的聚类和恢复真实数据中的结构方面优于标准算法。基于此,我们展示了一种合成表格数据生成算法MMMsynth,它对输入数据进行预聚类,并根据输入列的簇特定数据分布生成簇内合成数据。我们通过测试标准机器学习算法在合成数据上训练并在真实已发布数据集上测试时的性能来对该算法进行基准测试。我们的合成数据生成算法优于其他文献中的表格数据生成器,并接近仅使用真实数据进行训练的性能。

更新时间: 2024-04-04 09:38:42

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2310.19454v2

Exploring Lightweight Federated Learning for Distributed Load Forecasting

Federated Learning (FL) is a distributed learning scheme that enables deep learning to be applied to sensitive data streams and applications in a privacy-preserving manner. This paper focuses on the use of FL for analyzing smart energy meter data with the aim to achieve comparable accuracy to state-of-the-art methods for load forecasting while ensuring the privacy of individual meter data. We show that with a lightweight fully connected deep neural network, we are able to achieve forecasting accuracy comparable to existing schemes, both at each meter source and at the aggregator, by utilising the FL framework. The use of lightweight models further reduces the energy and resource consumption caused by complex deep-learning models, making this approach ideally suited for deployment across resource-constrained smart meter systems. With our proposed lightweight model, we are able to achieve an overall average load forecasting RMSE of 0.17, with the model having a negligible energy overhead of 50 mWh when performing training and inference on an Arduino Uno platform.

Updated: 2024-04-04 09:35:48

标题: 探索轻量级联邦学习在分布式负载预测中的应用

摘要: 联邦学习(FL)是一种分布式学习方案,使深度学习能够以保护隐私的方式应用于敏感数据流和应用程序。本文侧重于利用FL分析智能能源表数据,旨在实现与最先进的负荷预测方法相媲美的准确性,同时确保个体表数据的隐私。我们展示了通过利用FL框架,借助轻量级全连接深度神经网络,我们能够在每个表源和聚合器处实现与现有方案相媲美的预测准确性。轻量级模型的使用进一步降低了由复杂深度学习模型引起的能源和资源消耗,使该方法非常适合部署在资源受限的智能表系统中。通过我们提出的轻量级模型,我们能够实现总体平均负荷预测RMSE为0.17,当在Arduino Uno平台上进行训练和推断时,模型的能源开销极小,仅为50mWh。

更新时间: 2024-04-04 09:35:48

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03320v1
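
To make the setup concrete, here is a minimal FedAvg sketch for per-meter load forecasting, with a linear model standing in for the paper's lightweight fully connected network; the client data, local epochs, and learning rates are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, d = 5, 24                         # 24-hour lag features

def make_client():
    X = rng.normal(size=(200, d))
    w_local = rng.normal(size=d) * 0.1 + 0.5 # clients are similar but not equal
    y = X @ w_local + 0.1 * rng.normal(size=200)
    return X, y

clients = [make_client() for _ in range(n_clients)]
w_global = np.zeros(d)

for rnd in range(20):                        # communication rounds
    updates = []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(5):                   # local SGD epochs (raw data stays put)
            grad = X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        updates.append(w)
    w_global = np.mean(updates, axis=0)      # server-side weight averaging

X, y = clients[0]
rmse = np.sqrt(np.mean((X @ w_global - y) ** 2))
print(f"client-0 RMSE after FedAvg: {rmse:.3f}")
```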

Neural Random Forest Imitation

We present Neural Random Forest Imitation - a novel approach for transforming random forests into neural networks. Existing methods propose a direct mapping and produce very inefficient architectures. In this work, we introduce an imitation learning approach by generating training data from a random forest and learning a neural network that imitates its behavior. This implicit transformation creates very efficient neural networks that learn the decision boundaries of a random forest. The generated model is differentiable, can be used as a warm start for fine-tuning, and enables end-to-end optimization. Experiments on several real-world benchmark datasets demonstrate superior performance, especially when training with very few training examples. Compared to state-of-the-art methods, we significantly reduce the number of network parameters while achieving the same or even improved accuracy due to better generalization.

Updated: 2024-04-04 09:30:55

标题: 神经随机森林模拟

摘要: 我们提出了神经随机森林模仿 - 一种将随机森林转化为神经网络的新方法。现有方法提出了直接映射,并产生非常低效的架构。在这项工作中,我们引入了一种模仿学习方法,通过从随机森林中生成训练数据并学习一个模仿其行为的神经网络。这种隐式转换创建了非常高效的神经网络,可以学习随机森林的决策边界。生成的模型是可微分的,可以用作微调的热启动,并实现端到端的优化。在多个真实世界基准数据集上的实验证明了优越的性能,特别是在使用非常少的训练样本进行训练时。与最先进的方法相比,我们显著减少了网络参数的数量,同时由于更好的泛化能力,实现了相同甚至更高的准确性。

更新时间: 2024-04-04 09:30:55

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/1911.10829v2
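
A minimal sketch of the imitation recipe with scikit-learn: train a forest, label perturbed inputs with it, and fit a compact MLP on the generated data. The perturbation scheme and model sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Generated training data: perturb real inputs and label them with the
# forest, so the student learns the forest's decision boundaries.
X_gen = np.vstack([X + np.random.default_rng(i).normal(0, 0.3, X.shape)
                   for i in range(5)])
y_gen = forest.predict(X_gen)

student = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                        random_state=0).fit(X_gen, y_gen)
print("agreement with forest:",
      (student.predict(X) == forest.predict(X)).mean())
```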

LightFAt: Mitigating Control-flow Explosion via Lightweight PMU-based Control-flow Attestation

With the continuous evolution of computational devices, more and more applications are being executed remotely. The applications operate on a wide spectrum of devices, ranging from IoT nodes with low computational capabilities to large cloud providers with high capabilities. Remote execution often deals with sensitive data or executes proprietary software. Hence, the challenge of ensuring that the code execution will not be compromised arises. Remote attestation deals with this challenge. It ensures the code is executed in a non-compromised environment by calculating a potentially large sequence of cryptographic hash values. Each hash calculation is computationally intensive, and over a large sequence the overhead becomes extremely high. In this work, we propose LightFAt: a Lightweight Control Flow Attestation scheme. Instead of relying on expensive cryptographic hash calculations, LightFAt leverages the readings from the processor's Performance Monitor Unit (PMU) in conjunction with a lightweight unsupervised machine learning (ML) classifier to detect whether a target application's control flow is compromised, hence improving the system's security. On the verifier's side, LightFAt reaches a detection accuracy of over 95%, with low false-negative and false-positive rates.

Updated: 2024-04-04 09:20:33

标题: LightFAt: 通过轻量级PMU基础控制流认证缓解控制流爆炸

摘要: 随着计算设备的不断演化,越来越多的应用程序被远程执行。这些应用程序在各种设备上运行,从计算能力较低的物联网节点到计算能力较高的大型云提供商。远程执行通常涉及处理敏感数据或执行专有软件。因此,确保代码执行不会受到损害的挑战日益增加。远程证明处理这一挑战。它通过计算潜在的大量密码哈希值来确保代码在未被破坏的环境中执行。每个哈希计算都需要大量计算资源,在大量序列中开销变得极高。在这项工作中,我们提出了LightFAt:一种轻量级控制流认证方案。LightFAt不依赖昂贵的密码哈希计算,而是利用处理器的性能监视器单元(PMU)的读数,结合轻量级的无监督机器学习(ML)分类器,来检测目标应用程序的控制流是否受到破坏,从而提高系统的安全性。在验证方面,LightFAt的检测准确率达到了95%以上,误报率和漏报率低。

更新时间: 2024-04-04 09:20:33

领域: cs.CR

下载: http://arxiv.org/abs/2404.02608v2

Sticky Fingers: Resilience of Satellite Fingerprinting against Jamming Attacks

In the wake of increasing numbers of attacks on radio communication systems, a range of techniques are being deployed to increase the security of these systems. One such technique is radio fingerprinting, in which the transmitter can be identified and authenticated by observing small hardware differences expressed in the signal. Fingerprinting has been explored in particular in the defense of satellite systems, many of which are insecure and cannot be retrofitted with cryptographic security. In this paper, we evaluate the effectiveness of radio fingerprinting techniques under interference and jamming attacks, usually intended to deny service. By taking a pre-trained fingerprinting model and gathering a new dataset in which different levels of Gaussian noise and tone jamming have been added to the legitimate signal, we assess the attacker power required in order to disrupt the transmitter fingerprint such that it can no longer be recognized. We compare this to Gaussian jamming on the data portion of the signal, obtaining the remarkable result that transmitter fingerprints are still recognizable even in the presence of moderate levels of noise. Through deeper analysis of the results, we conclude that it takes a similar amount of jamming power in order to disrupt the fingerprint as it does to jam the message contents itself, so it is safe to include a fingerprinting system to authenticate satellite communication without opening up the system to easier denial-of-service attacks.

Updated: 2024-04-04 09:13:48

标题: 黏性手指:卫星指纹识别对干扰攻击的抵抗力

摘要: 随着对无线通信系统的攻击数量不断增加,人们正在部署一系列技术来增强这些系统的安全性。其中一种技术是无线指纹识别,通过观察信号中表现出的小硬件差异,可以识别和认证发射机。指纹识别在卫星系统的防御中得到了特别的探索,其中许多系统存在安全漏洞,无法进行加密安全性的改造。 本文评估了在干扰和干扰攻击下无线指纹识别技术的有效性,这些攻击通常旨在拒绝服务。通过采用预先训练的指纹模型,并收集一个新的数据集,其中对合法信号添加了不同水平的高斯噪声和音调干扰,我们评估了攻击者所需的能量以破坏发射机指纹,使其无法被识别。我们将此与对信号数据部分进行高斯干扰进行比较,得出了一个显著的结果,即即使在存在中等水平的噪声时,发射机指纹仍然是可识别的。通过对结果的深入分析,我们得出结论,为了认证卫星通信而包含指纹系统时,打乱指纹所需的干扰能量与干扰消息内容本身所需的能量相似,因此可以安全地将指纹识别系统包含在卫星通信中,而不会使系统更容易遭受拒绝服务攻击。

更新时间: 2024-04-04 09:13:48

领域: cs.CR

下载: http://arxiv.org/abs/2402.05042v2

Site-specific Deterministic Temperature and Humidity Forecasts with Explainable and Reliable Machine Learning

Site-specific weather forecasts are essential to accurate prediction of power demand and are consequently of great interest to energy operators. However, weather forecasts from current numerical weather prediction (NWP) models lack the fine-scale detail to capture all important characteristics of localised real-world sites. Instead they provide weather information representing a rectangular gridbox (usually kilometres in size). Even after post-processing and bias correction, area-averaged information is usually not optimal for specific sites. Prior work on site-optimised forecasts has focused on linear methods, weighted consensus averaging, time-series methods, and others. Recent developments in machine learning (ML) have prompted increasing interest in applying ML as a novel approach to this problem. In this study, we investigate the feasibility of optimising forecasts at sites by adopting the popular gradient boosting decision tree model, supported by the Python version of the XGBoost package. Regression trees have been trained with historical NWP and site observations as training data, aimed at predicting temperature and dew point at multiple site locations across Australia. We developed a working ML framework, named 'Multi-SiteBoost', and initial testing results show a significant improvement compared with gridded values from bias-corrected NWP models. The improvement from XGBoost is found to be comparable with non-ML methods reported in the literature. With the insights provided by SHapley Additive exPlanations (SHAP), this study also tests various approaches to understand the ML predictions and increase the reliability of the forecasts generated by ML.

Updated: 2024-04-04 09:12:13

标题: 具有可解释和可靠机器学习的特定站点确定性温度和湿度预测

摘要: 特定站点的天气预报对于准确预测电力需求至关重要,因此对能源运营商具有极大的兴趣。然而,当前数值天气预报(NWP)模型的天气预报缺乏细致的细节,无法捕捉到局部现实世界站点的所有重要特征。相反,它们提供代表矩形网格箱(通常为几公里大小)的天气信息。即使经过后处理和偏差校正,区域平均信息通常也不适用于特定站点。先前关于站点优化预测的研究主要集中在线性方法、加权共识平均、时间序列方法等方面。近年来,机器学习(ML)领域的最新进展促使人们对将ML作为解决这一问题的新方法产生了越来越多的兴趣。在本研究中,我们通过采用流行的机器学习模型梯度提升决策树,结合Python版本的XGBoost软件包,探讨了通过采用机器学习模型在站点上进行预测的可行性。回归树已经通过历史NWP和站点观测作为训练数据进行训练,旨在预测澳大利亚各个站点位置的温度和露点。我们开发了一个名为“Multi-SiteBoost”的工作机器学习框架,初步测试结果显示与偏差校正NWP模型的网格值相比有显著改进。XGBoost的改进被发现与文献中报道的非ML方法相当。通过SHapley Additive exPlanations(SHAP)提供的见解,本研究还测试了各种方法,以了解机器学习预测并提高由ML生成的预测的可靠性。

更新时间: 2024-04-04 09:12:13

领域: physics.ao-ph,cs.LG

下载: http://arxiv.org/abs/2404.03310v1
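
A minimal sketch of the site-optimisation idea with XGBoost and SHAP on synthetic data: trees learn a site-specific correction to the grid-box NWP temperature, and SHAP attributes each forecast to its inputs. The features and the toy relationship are assumptions, not the paper's dataset.

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
n = 2000
nwp_temp = 15 + 10 * rng.random(n)            # grid-box NWP temperature
hour = rng.integers(0, 24, n)
elev_diff = rng.normal(0, 50, n)              # site vs grid-box elevation
obs = (nwp_temp - 0.0065 * elev_diff          # hypothetical site response
       + 1.5 * np.sin(2 * np.pi * hour / 24)
       + rng.normal(0, 0.5, n))

X = np.column_stack([nwp_temp, hour, elev_diff])
model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X[:1500], obs[:1500])

pred = model.predict(X[1500:])
print("RMSE (XGBoost vs raw NWP):",
      np.sqrt(np.mean((pred - obs[1500:]) ** 2)),
      np.sqrt(np.mean((nwp_temp[1500:] - obs[1500:]) ** 2)))

explainer = shap.TreeExplainer(model)
print("mean |SHAP| per feature:",
      np.abs(explainer.shap_values(X[1500:])).mean(axis=0))
```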

P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models

In this work, we take a first step towards designing summarization systems that are faithful to the author's intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P^3SUM, a diffusion model-based summarization approach controlled by political perspective classifiers. In P^3SUM, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article's original stance incurs a loss back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P^3SUM outperforms state-of-the-art summarization systems and large language models by up to 13.7% in terms of the success rate of stance preservation, with competitive performance on standard metrics of summarization quality. Our findings present a first analysis of preservation of pragmatic features in summarization, highlight the lacunae in existing summarization models -- that even state-of-the-art models often struggle to preserve author's intents -- and develop new summarization systems that are more faithful to author's perspectives.

Updated: 2024-04-04 09:10:34

标题: P^3SUM:使用扩散语言模型在新闻摘要中保留作者的观点

摘要: 在这项工作中,我们迈出了朝着设计摘要系统的第一步,这些系统忠实于作者的意图,而不仅仅是文章的语义内容。通过关注在新闻摘要中保留政治观点的案例研究,我们发现现有方法在超过50%的摘要中改变了新闻文章的政治观点和立场,误传了新闻作者的意图和观点。因此,我们提出了基于扩散模型的摘要方法P^3SUM,由政治观点分类器控制。在P^3SUM中,生成的摘要的政治倾向在每个解码步骤中被迭代评估,并且任何偏离文章原始立场的情况都会造成损失,反向传播到嵌入层,引导摘要在推断时的政治立场。在三个新闻摘要数据集上进行的大量实验表明,P^3SUM在保持立场成功率方面优于最先进的摘要系统和大型语言模型高达13.7%,在摘要质量的标准指标上表现出有竞争力的性能。我们的发现首次分析了在摘要中保留实用特征的情况,突显了现有摘要模型的不足之处,即即使是最先进的模型也经常难以保留作者的意图,并开发出更忠实于作者观点的新摘要系统。

更新时间: 2024-04-04 09:10:34

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2311.09741v2

Optimistic Online Non-stochastic Control via FTRL

This paper brings the concept of "optimism" to the new and promising framework of online Non-stochastic Control (NSC). Namely, we study how NSC can benefit from a prediction oracle of unknown quality responsible for forecasting future costs. The posed problem is first reduced to an optimistic learning problem with delayed feedback, which is handled through the Optimistic Follow the Regularized Leader (OFTRL) algorithmic family. This reduction enables the design of OptFTRL-C, the first Disturbance Action Controller (DAC) with optimistic policy regret bounds. These new bounds are commensurate with the oracle's accuracy, ranging from $\mathcal{O}(1)$ for perfect predictions to the order-optimal $\mathcal{O}(\sqrt{T})$ even when all predictions fail. By addressing the challenge of incorporating untrusted predictions into control systems, our work contributes to the advancement of the NSC framework and paves the way towards effective and robust learning-based controllers.

Updated: 2024-04-04 09:08:04

标题: 通过FTRL进行乐观的在线非随机控制

摘要: 本文将“乐观主义”概念引入了在线非随机控制(NSC)的新领域。具体来说,我们研究了NSC如何从一个负责预测未来成本的未知质量的预测神谕中受益。首先将提出的问题简化为一个具有延迟反馈问题的乐观学习,通过乐观跟随正则化领导者(OFTRL)算法族来处理。这种简化使得能够设计出OptFTRL-C,第一个具有乐观政策遗憾边界的干扰行为控制器(DAC)。这些新的边界与神谕的准确性相符,从完美预测的$\mathcal{O}(1)$到当所有预测失败时的顺序最优$\mathcal{O}(\sqrt{T})$。通过解决将不受信任的预测纳入控制系统的挑战,我们的工作有助于推动NSC框架的发展,并为有效和稳健的基于学习的控制器铺平道路。

更新时间: 2024-04-04 09:08:04

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2404.03309v1
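
For intuition, the sketch below implements optimistic FTRL with a quadratic regularizer over the unit ball, feeding the oracle's hint into the regularized-leader step; the loss sequence and oracle noise are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, eta = 200, 3, 0.5
losses = rng.normal(size=(T, d))
predictions = losses + rng.normal(0, 0.1, size=(T, d))  # good-but-noisy oracle

g_sum = np.zeros(d)
total = 0.0
for t in range(T):
    # FTRL with a quadratic regularizer on the unit ball has a closed form:
    # scale -eta * (past gradients + optimistic hint), then project.
    x = -eta * (g_sum + predictions[t])
    x /= max(1.0, np.linalg.norm(x))
    total += losses[t] @ x          # incur the true loss after playing x
    g_sum += losses[t]

x_star = -g_sum / np.linalg.norm(g_sum)  # best fixed action in hindsight
print("regret:", total - losses.sum(0) @ x_star)
```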

Improvement of Performance in Freezing of Gait Detection in Parkinson's Disease Using Transformer Networks and a Single Waist-worn Triaxial Accelerometer

Freezing of gait (FOG) is one of the most incapacitating symptoms in Parkinson's disease, affecting more than 50 percent of patients in advanced stages of the disease. The presence of FOG may lead to falls and a loss of independence, with a consequent reduction in the quality of life. Wearable technology and artificial intelligence have been used for automatic FOG detection to optimize monitoring. However, differences between laboratory and daily-life conditions present challenges for the implementation of reliable detection systems. Consequently, improvement of FOG detection methods remains important to provide accurate monitoring mechanisms intended for free-living and real-time use. This paper presents advances in automatic FOG detection using a single body-worn triaxial accelerometer and a novel classification algorithm based on Transformers and convolutional networks. This study was performed with data from 21 patients who manifested FOG episodes while performing activities of daily living in a home setting. Results indicate that the proposed FOG-Transformer can bring a significant improvement in FOG detection using leave-one-subject-out cross-validation (LOSO CV). These results bring opportunities for the implementation of accurate monitoring systems for use in ambulatory or home settings.

Updated: 2024-04-04 09:02:17

标题: 使用Transformer网络和单个佩戴在腰部的三轴加速度计改进帕金森病冻结步态检测的性能

摘要: "冻结步态(Freezing of gait,FOG)是帕金森病中最具瘫痪性症状之一,影响超过50%的晚期病患。FOG的存在可能导致跌倒和独立性丧失,进而降低生活质量。可穿戴技术和人工智能被用于自动检测FOG以优化监测。然而,实验室和日常生活条件之间的差异为可靠检测系统的实施带来挑战。因此,改进FOG检测方法仍然十分重要,以提供用于自由生活和实时使用的准确监测机制。本文介绍了使用单个身体佩戴的三轴加速度计和基于Transformer和卷积网络的新型分类算法进行自动FOG检测的进展。该研究使用了在家庭环境中进行日常生活活动时出现FOG发作的21名患者的数据。结果表明,提出的FOG-Transformer在通过留一主体交叉验证(LOSO CV)进行FOG检测方面可以带来显著改进。这些结果为在步行或家庭环境中使用准确监测系统的实施带来了机会。"

更新时间: 2024-04-04 09:02:17

领域: cs.LG,cs.AI,eess.SP

下载: http://arxiv.org/abs/2404.03704v1

Concept -- An Evaluation Protocol on Conversation Recommender Systems with System- and User-centric Factors

The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors. We conceptualise three key characteristics in representing such factors and further divide them into six primary abilities. To implement Concept, we adopt a LLM-based user simulator and evaluator with scoring rubrics that are tailored for each primary ability. Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons in current CRS models. Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS, thereby setting the foundation for CRS improvement.

Updated: 2024-04-04 08:56:48

标题: 概念——一个基于系统和用户中心因素的对话推荐系统评估协议

摘要: 虽然学术界在对话推荐系统(CRS)方面取得了重大进展,但在现实场景中,其用户体验却受到批评。现有的CRS评估协议可能会优先考虑系统相关因素,如对话效果和流畅性,而忽视了用户相关方面。因此,我们提出了一个新的全面评估协议Concept,将系统和用户相关因素整合在一起。我们概念化了代表这些因素的三个关键特征,并进一步将其分为六个主要能力。为了实施Concept,我们采用了基于LLM的用户模拟器和评估器,具有针对每个主要能力量身定制的评分标准。我们的协议Concept具有双重目的。首先,它提供了当前CRS模型的优缺点概述。其次,它指出了"全能"ChatGPT中低可用性的问题,并为评估CRS提供了全面的参考指南,从而为CRS的改进奠定了基础。

更新时间: 2024-04-04 08:56:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03304v1

SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models

Synthetic tabular data is crucial for sharing and augmenting data across silos, especially for enterprises with proprietary data. However, existing synthesizers are designed for centrally stored data. Hence, they struggle with real-world scenarios where features are distributed across multiple silos, necessitating on-premise data storage. We introduce SiloFuse, a novel generative framework for high-quality synthesis from cross-silo tabular data. To ensure privacy, SiloFuse utilizes a distributed latent tabular diffusion architecture. Through autoencoders, latent representations are learned for each client's features, masking their actual values. We employ stacked distributed training to improve communication efficiency, reducing the number of rounds to a single step. Under SiloFuse, we prove the impossibility of data reconstruction for vertically partitioned synthesis and quantify privacy risks through three attacks using our benchmark framework. Experimental results on nine datasets showcase SiloFuse's competence against centralized diffusion-based synthesizers. Notably, SiloFuse scores 43.8 and 29.8 percentage points higher than GANs in resemblance and utility, respectively. Experiments on communication show stacked training's fixed cost compared to the growing costs of end-to-end training as the number of training iterations increases. Additionally, SiloFuse proves robust to feature permutations and varying numbers of clients.

Updated: 2024-04-04 08:48:30

标题: SiloFuse:使用潜在表格扩散模型进行跨筒仓合成数据生成

摘要: 人工合成的表格数据对于在不同信息孤岛之间共享和增强数据至关重要,尤其是对于拥有专有数据的企业而言。然而,现有的合成器设计都是针对中央存储的数据的。因此,在现实世界中,特征分布在多个信息孤岛之间,需要在现场存储数据。我们介绍了SiloFuse,这是一个新颖的生成框架,用于从跨信息孤岛的表格数据中进行高质量合成。为了确保隐私,SiloFuse利用了分布式潜在表格扩散架构。通过自动编码器,学习了每个客户端特征的潜在表示,掩盖了它们的实际值。我们采用堆叠式分布式训练来提高通信效率,将轮数减少到一个步骤。在SiloFuse下,我们证明了对于垂直分区合成数据的不可能性,并通过三种攻击使用我们的基准框架来量化隐私风险。对九个数据集的实验结果展示了SiloFuse相对于中心扩散合成器的能力。值得注意的是,SiloFuse在相似性和实用性方面比GANs分别提高了43.8和29.8个百分点。在通信方面的实验显示了与端到端训练随着训练迭代次数增加而增加的成本相比,堆叠训练的固定成本。此外,SiloFuse证明了对特征排列和不同客户端数量的稳健性。

更新时间: 2024-04-04 08:48:30

领域: cs.LG,cs.CR,cs.DB,cs.DC

下载: http://arxiv.org/abs/2404.03299v1

Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding

Large Language Models (LLMs) have demonstrated good performance in many reasoning tasks, but they still struggle with some complicated reasoning tasks, including logical reasoning. One non-negligible reason for LLMs' suboptimal performance on logical reasoning is that they overlook the correct understanding of logical fallacies. To evaluate LLMs' capability of logical fallacy understanding (LFU), we propose five concrete tasks from three cognitive dimensions of WHAT, WHY, and HOW in this paper. Towards these LFU tasks, we have successfully constructed a new dataset, LFUD, based on GPT-4 with a small amount of human effort. Our extensive experiments justify that our LFUD can be used not only to evaluate LLMs' LFU capability, but also to fine-tune LLMs to obtain significantly enhanced performance on logical reasoning.

Updated: 2024-04-04 08:38:03

标题: 从谬误中得出理由:通过理解逻辑谬误增强大型语言模型的逻辑推理

摘要: 大型语言模型(LLMs)在许多推理任务中表现出良好的性能,但它们仍然在一些复杂的推理任务中遇到困难,包括逻辑推理。LLMs在逻辑推理方面表现不佳的一个非常重要的原因是它们忽视了正确理解逻辑谬误。为了评估LLMs对逻辑谬误理解(LFU)的能力,本文提出了来自WHAT、WHY和HOW三个认知维度的五个具体任务。针对这些LFU任务,我们成功地构建了一个新的基于GPT-4的数据集LFUD,辅以少量人力。我们的大量实验证明,我们的LFUD不仅可以用于评估LLMs的LFU能力,还可以对LLMs进行微调,从而显著提高逻辑推理的性能。

更新时间: 2024-04-04 08:38:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.04293v1

The power of a single Haar random state: constructing and separating quantum pseudorandomness

In this work, we focus on the following question: what are the cryptographic implications of having access to an oracle that provides a single Haar random quantum state? We show, perhaps surprisingly, that such an oracle is sufficient to construct quantum pseudorandomness. Pseudorandom states (PRS) are a family of states for which it is hard to distinguish between polynomially many copies of either a state sampled uniformly from the family or a Haar random state. A weaker notion, called single-copy pseudorandom states (1PRS), satisfies this property with respect to a single copy. Our main result is that 1PRS (as well as bit-commitments) exist relative to an oracle that provides a single Haar random state. We build on this result to show the existence of an oracle relative to which 1PRS exist, but PRS do not. This provides one of the first black-box separations between different forms of quantum pseudorandomness.

Updated: 2024-04-04 08:36:44

标题: 一个单一的哈尔随机态的力量:构建和分离量子伪随机性

摘要: 在这项工作中,我们关注以下问题:拥有一个提供单个哈尔随机量子态的预言机会有什么密码学含义?我们展示了,或许令人惊讶的是,这样一个预言机足以构建量子伪随机性。 伪随机态(PRS)是一组态,对于这些态的多项式数量的拷贝中的任何一个,很难区分是从该组态中均匀采样的态还是一个哈尔随机态。一个更弱的概念,称为单个拷贝伪随机态(1PRS),对于单个拷贝满足这个属性。我们的主要结果是,相对于提供单个哈尔随机态的预言机,1PRS(以及比特承诺)存在。我们基于这一结果,展示了相对于一个预言机存在1PRS,但PRS不存在。这提供了量子伪随机性不同形式之间的第一个黑匣子分离。

更新时间: 2024-04-04 08:36:44

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2404.03295v1

Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation

We apply PAC-Bayesian theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and an explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones derived from a (deterministic) worst-case analysis. The results rely on PAC-Bayesian bounds for general, possibly unbounded loss functions based on exponential families. We then reformulate the learning procedure into a one-dimensional minimization problem and study the possibility of finding a global minimum. Furthermore, we provide a concrete algorithmic realization of the framework and new methodologies for learning-to-optimize, and we conduct four practically relevant experiments to support our theory. With this, we showcase that the proposed learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude.

Updated: 2024-04-04 08:24:57

标题: 学习优化与PAC-Bayesian保证:理论考虑和实际实施

摘要: 我们利用PAC-Bayesian理论来进行学习优化的设置。据我们所知,我们提出了第一个能够学习具有可证明泛化保证(PAC-Bayesian界限)和明确收敛保证与收敛速度之间权衡的优化算法框架,这与典型的最坏情况分析形成对比。我们学到的优化算法可以明确优于从(确定性)最坏情况分析中得出的相关算法。这些结果依赖于基于指数族的通用、可能无界的损失函数的PAC-Bayesian界限。然后,我们将学习过程重新表述为一维最小化问题,研究找到全局最小值的可能性。此外,我们提供了框架的具体算法实现和学习优化的新方法,进行了四个实际相关的实验来支持我们的理论。通过这一点,我们展示了提供的学习框架产生的优化算法可以明确优于现有技术水平数倍。

更新时间: 2024-04-04 08:24:57

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2404.03290v1

Recording and Describing Poker Hands

This paper introduces the Poker Hand History (PHH) file format, designed to standardize the recording of poker hands across different game variants. Despite poker's widespread popularity in mainstream culture as a mind sport and its prominence in the field of artificial intelligence (AI) research as a benchmark for imperfect-information AI agents, it lacks a consistent format that humans can use to document poker hands across different variants and that can also easily be parsed by machines. To address this gap in the literature, we propose the PHH format, which provides a concise, human-readable, machine-friendly representation of hand history that comprehensively captures various details of the hand, ranging from initial game parameters and actions to contextual parameters including but not limited to the venue, players, and time control information. In the supplementary material, we provide 10,088 hands covering 11 different variants in the PHH format. The source code of the parser is available on GitHub: https://github.com/uoftcprg/pokerkit

Updated: 2024-04-04 08:06:03

标题: 记录和描述扑克牌手

摘要: 本文介绍了扑克手牌历史(PHH)文件格式,旨在标准化记录不同游戏变体中的扑克手牌。尽管扑克在主流文化中广受欢迎作为一种头脑运动,并在人工智能(AI)研究领域中占据重要地位,作为不完全信息AI代理的基准,但缺乏一种一致的格式,供人类用于记录跨不同变体的扑克手牌,并且机器可以轻松解析。为了填补文献中的这一空白,我们提出了PHH格式,提供了一种简洁易读的机器友好的手牌历史表示,全面捕捉手牌的各种细节,从初始游戏参数和动作到包括但不限于场地、玩家和时间控制信息的上下文参数。在补充材料中,我们提供了10,088手不同变体的手牌历史,以PHH格式呈现。解析器的源代码可在GitHub上找到:https://github.com/uoftcprg/pokerkit

更新时间: 2024-04-04 08:06:03

领域: cs.AI

下载: http://arxiv.org/abs/2312.11753v3

A Deep Reinforcement Learning Approach for Security-Aware Service Acquisition in IoT

The novel Internet of Things (IoT) paradigm is composed of a growing number of heterogeneous smart objects and services that are transforming architectures and applications, increasing systems' complexity, and the need for reliability and autonomy. In this context, both smart objects and services are often provided by third parties which do not give full transparency regarding the security and privacy of the features offered. Although machine-based Service Level Agreements (SLA) have been recently leveraged to establish and share policies in Cloud-based scenarios, and also in the IoT context, the issue of making end users aware of the overall system security levels and the fulfillment of their privacy requirements through the provision of the requested service remains a challenging task. To tackle this problem, we propose a complete framework that defines suitable levels of privacy and security requirements in the acquisition of services in IoT, according to the user needs. Through the use of a Reinforcement Learning based solution, a user agent, inside the environment, is trained to choose the best smart objects granting access to the target services. Moreover, the solution is designed to guarantee deadline requirements and user security and privacy needs. Finally, to evaluate the correctness and the performance of the proposed approach we illustrate an extensive experimental analysis.
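
For intuition only, here is a minimal bandit-style sketch of the selection step (numpy; the scores, reward shape, and single-state simplification are invented, and the paper's actual agent is a deep RL policy over a richer state):

    import numpy as np

    rng = np.random.default_rng(0)
    n_objects = 6                              # candidate smart objects for the service
    security = rng.uniform(0, 1, n_objects)    # assumed per-object security score
    latency = rng.uniform(0, 1, n_objects)     # assumed per-object response time

    Q = np.zeros(n_objects)
    alpha, eps, deadline = 0.1, 0.2, 0.7
    for step in range(5000):
        a = rng.integers(n_objects) if rng.random() < eps else int(Q.argmax())
        # reward favors secure objects and penalizes deadline violations
        r = security[a] - (1.0 if latency[a] > deadline else 0.0)
        Q[a] += alpha * (r - Q[a])             # single-state (bandit-style) update
    print("preferred object:", int(Q.argmax()))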

Updated: 2024-04-04 08:00:12

标题: 一种用于物联网中安全感知服务获取的深度强化学习方法

摘要: 新兴的物联网(IoT)范式由日益增长的异构智能对象和服务组成,这些对象和服务正在改变架构和应用程序,增加系统的复杂性,以及对可靠性和自主性的需求。在这种背景下,智能对象和服务通常由第三方提供,这些第三方并未提供有关所提供功能的安全性和隐私性的充分透明度。尽管最近已经利用基于机器的服务级别协议(SLA)来在基于云的场景以及物联网环境中建立和共享政策,但使最终用户意识到整个系统的安全级别以及通过提供所请求的服务来满足其隐私需求仍然是一个具有挑战性的任务。为了解决这个问题,我们提出了一个完整的框架,根据用户需求定义了适当的隐私和安全要求,用于在物联网中获取服务。通过使用基于强化学习的解决方案,环境内的用户代理被训练为选择授予对目标服务访问权限的最佳智能对象。此外,该解决方案旨在保证截止日期要求和用户的安全性和隐私性需求。最后,为了评估所提出方法的正确性和性能,我们进行了广泛的实验分析。

更新时间: 2024-04-04 08:00:12

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2404.03276v1

DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models

Recent advancements in Large Language Models (LLMs) have sparked a revolution across various research fields. In particular, the integration of common-sense knowledge from LLMs into robot task and motion planning has proven to be a game-changer, elevating performance in terms of explainability and downstream task efficiency to unprecedented heights. However, managing the vast knowledge encapsulated within these large models has posed challenges, often resulting in infeasible plans generated by LLM-based planning systems due to hallucinations or missing domain information. To overcome these challenges and obtain even greater planning feasibility and computational efficiency, we propose a novel LLM-driven task planning approach called DELTA. To achieve better grounding of environmental topology into actionable knowledge, DELTA leverages the power of scene graphs as environment representations within LLMs, enabling the fast generation of precise planning problem descriptions. To obtain higher planning performance, we use LLMs to decompose long-term task goals into an autoregressive sequence of sub-goals for an automated task planner to solve. Our contribution enables a more efficient and fully automatic task planning pipeline, achieving higher planning success rates and significantly shorter planning times compared to the state of the art.
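
A toy sketch of the decomposition step, assuming only a generic llm(prompt) -> str callable (the prompt format and all names are illustrative, not DELTA's actual interface):

    def decompose_goal(llm, scene_graph, goal):
        # `llm` is a stand-in callable: prompt string in, completion string out.
        prompt = (f"Scene graph: {scene_graph}\n"
                  f"Long-term goal: {goal}\n"
                  "List the sub-goals to achieve this goal, one per line:")
        return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

    # Stubbed LLM for demonstration; each returned sub-goal would be handed
    # to the downstream automated task planner in sequence.
    subgoals = decompose_goal(
        lambda p: "1. find cup\n2. grasp cup\n3. place cup on table",
        {"cup": "on shelf", "table": "empty"},
        "set the table")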

Updated: 2024-04-04 07:59:24

标题: DELTA:使用大型语言模型分解高效长期机器人任务规划

摘要: 最近大语言模型(LLMs)的进展在各个研究领域引发了一场革命。特别是,将LLMs中的常识知识整合到机器人任务和运动规划中被证明是一个改变游戏规则的举措,提高了解释性能和下游任务效率达到了前所未有的高度。然而,管理这些大型模型中封装的庞大知识面带来了挑战,通常导致由LLM驱动的规划系统生成的计划不可行,因为出现了幻觉或缺少领域信息。为了克服这些挑战,实现更大规划可行性和计算效率,我们提出了一种名为DELTA的新型LLM驱动任务规划方法。为了将环境拓扑结构更好地转化为可操作的知识,DELTA利用了场景图作为LLMs中的环境表示,实现了快速生成精确的规划问题描述。为了获得更高的规划性能,我们使用LLMs将长期任务目标分解为自回归序列的子目标,供自动任务规划器解决。我们的贡献实现了更高效和完全自动的任务规划流程,较现有技术实现了更高的规划成功率和显著缩短的规划时间。

更新时间: 2024-04-04 07:59:24

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.03275v1

Bias Behind the Wheel: Fairness Analysis of Autonomous Driving Systems

This paper analyzes fairness in automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight state-of-the-art deep learning-based pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues, particularly related to age. The undetected proportions for children are 20.14% higher compared to adults. Furthermore, we explore how various driving scenarios affect the fairness of pedestrian detectors. We find that pedestrian detectors demonstrate significant gender biases during night time, potentially exacerbating the prevalent societal issue of female safety concerns during nighttime out. Moreover, we observe that pedestrian detectors can demonstrate both enhanced fairness and superior performance under specific driving conditions, which challenges the fairness-performance trade-off theory widely acknowledged in the fairness literature. We publicly release the code, data, and results to support future research on fairness in autonomous driving.
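
The headline fairness measurement reduces to comparing undetected proportions across demographic groups. A minimal pandas sketch with made-up records:

    import pandas as pd

    # Hypothetical per-pedestrian detection log with demographic labels.
    df = pd.DataFrame({
        "group":    ["adult", "adult", "child", "child", "child"],
        "detected": [True,    True,    False,   True,    False],
    })
    miss_rate = 1.0 - df.groupby("group")["detected"].mean()
    gap = miss_rate["child"] - miss_rate["adult"]   # disparity in undetected proportion
    print(miss_rate, f"gap = {gap:.2%}")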

Updated: 2024-04-04 07:56:59

标题: 驾驶偏见:自动驾驶系统的公平性分析

摘要: 本文分析了自动行人检测中的公平性,这是自动驾驶系统中一个至关重要但鲜为人知的问题。我们评估了八种最先进的基于深度学习的行人检测器在大规模真实世界数据集上跨不同人口群体的表现。为了进行彻底的公平性测试,我们为数据集提供了大量注释,共有8311张图像,16070个性别标签,20115个年龄标签和3513个肤色标签。我们的研究发现了显著的公平性问题,尤其是与年龄有关。与成年人相比,儿童的未检测比例高出20.14%。此外,我们探讨了不同驾驶情境如何影响行人检测器的公平性。我们发现,在夜间,行人检测器表现出显著的性别偏见,可能加剧了晚上女性安全问题的普遍社会问题。此外,我们观察到,在特定驾驶条件下,行人检测器可以展现出既增强的公平性又卓越的性能,这挑战了公平性文献中被广泛认可的公平性 - 性能权衡理论。我们公开发布代码、数据和结果,以支持未来研究自动驾驶中的公平性。

更新时间: 2024-04-04 07:56:59

领域: cs.CY,cs.AI,cs.CV,cs.SE

下载: http://arxiv.org/abs/2308.02935v3

Gaussian-Smoothed Sliced Probability Divergences

Gaussian-smoothed sliced Wasserstein distance has recently been introduced for comparing probability distributions while preserving privacy on the data. It has been shown to provide performances similar to its non-smoothed (non-private) counterpart. However, the computational and statistical properties of such a metric have not yet been well-established. This work investigates the theoretical properties of this distance as well as those of generalized versions denoted as Gaussian-smoothed sliced divergences. We first show that smoothing and slicing preserve the metric property and the weak topology. To study the sample complexity of such divergences, we then introduce $\hat{\hat\mu}_{n}$, the double empirical distribution for the smoothed-projected $\mu$. The distribution $\hat{\hat\mu}_{n}$ results from a double sampling process: one according to the original distribution $\mu$, and the second according to the convolution of the projection of $\mu$ onto the unit sphere with the Gaussian smoothing. We particularly focus on the Gaussian-smoothed sliced Wasserstein distance and prove that it converges at a rate $O(n^{-1/2})$. We also derive other properties, including continuity, of different divergences with respect to the smoothing parameter. We support our theoretical findings with empirical studies in the context of privacy-preserving domain adaptation.
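
For concreteness, the Gaussian-smoothed sliced 2-Wasserstein distance admits a simple Monte Carlo estimator: draw random directions on the unit sphere, project both samples, add Gaussian noise, and average the 1-D distances. A numpy sketch (illustrative only, assuming equal sample sizes so the 1-D distance reduces to sorted matching):

    import numpy as np

    def gaussian_smoothed_sw2(X, Y, sigma=1.0, n_proj=200, seed=0):
        # X, Y: (n, d) samples from the two distributions, with len(X) == len(Y)
        rng = np.random.default_rng(seed)
        d, total = X.shape[1], 0.0
        for _ in range(n_proj):
            theta = rng.normal(size=d)
            theta /= np.linalg.norm(theta)                    # uniform direction on the sphere
            x = X @ theta + sigma * rng.normal(size=len(X))   # smoothed projections
            y = Y @ theta + sigma * rng.normal(size=len(Y))
            total += np.mean((np.sort(x) - np.sort(y)) ** 2)  # 1-D squared W2
        return np.sqrt(total / n_proj)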

Updated: 2024-04-04 07:55:46

标题: 高斯平滑的切片概率散度

摘要: 最近引入了高斯平滑的切片Wasserstein距离,用于比较概率分布,同时保护数据的隐私。已经证明,它提供的性能类似于其非平滑(非私有)的对应物。然而,这种度量的计算和统计特性尚未得到充分的确立。本文研究了这种距离的理论特性以及被标记为高斯平滑切片差异的广义版本的特性。我们首先展示了平滑和切片保留了度量性质和弱拓扑性质。为了研究这种差异的样本复杂性,我们引入了平滑投影$\mu$的双重经验分布$\hat{\hat\mu}_{n}$。分布$\hat{\hat\mu}_{n}$是一个双重采样过程的结果:一个是根据原始分布$\mu$进行采样,另一个是根据$\mu$在单位球面上的投影和高斯平滑的卷积进行采样。我们特别关注高斯平滑的切片Wasserstein距离,并证明它以$O(n^{-1/2})$的速率收敛。我们还推导了关于不同差异的连续性等其他特性,与平滑参数相关。我们通过在隐私保护领域自适应的上下文中的实证研究支持我们的理论发现。

更新时间: 2024-04-04 07:55:46

领域: cs.LG,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2404.03273v1

Security Weaknesses of Copilot Generated Code in GitHub

Modern code generation tools, utilizing AI models like Large Language Models (LLMs), have gained popularity for producing functional code. However, their usage presents security challenges, often resulting in insecure code merging into the code base. Evaluating the quality of generated code, especially its security, is crucial. While prior research explored various aspects of code generation, the focus on security has been limited, mostly examining code produced in controlled environments rather than real-world scenarios. To address this gap, we conducted an empirical study, analyzing code snippets generated by GitHub Copilot from GitHub projects. Our analysis identified 452 snippets generated by Copilot, revealing a high likelihood of security issues, with 32.8% of Python and 24.5% of JavaScript snippets affected. These issues span 38 different Common Weakness Enumeration (CWE) categories, including significant ones like CWE-330: Use of Insufficiently Random Values, CWE-78: OS Command Injection, and CWE-94: Improper Control of Generation of Code. Notably, eight CWEs are among the 2023 CWE Top-25, highlighting their severity. Our findings confirm that developers should be careful when adding code generated by Copilot and should also run appropriate security checks as they accept the suggested code. It also shows that practitioners should cultivate corresponding security awareness and skills.

Updated: 2024-04-04 07:53:03

标题: GitHub中由Copilot生成的代码的安全弱点

摘要: 现代代码生成工具,利用人工智能模型如大型语言模型(LLMs),已经因生成功能性代码而变得流行。然而,它们的使用带来了安全挑战,往往导致不安全代码合并到代码库中。评估生成代码的质量,特别是其安全性,至关重要。尽管以往研究探讨了代码生成的各个方面,但对安全性的关注却有限,主要是检查在受控环境中生成的代码,而不是在真实世界情景下。为了弥补这一空白,我们进行了经验研究,分析了GitHub项目中由GitHub Copilot生成的代码片段。我们的分析发现,Copilot生成了452个代码片段,揭示了高概率的安全问题,其中32.8%的Python和24.5%的JavaScript代码片段受到影响。这些问题涉及38个不同的常见弱点枚举(CWE)类别,包括重要的如CWE-330:使用不足随机值、CWE-78:操作系统命令注入和CWE-94:生成代码的不当控制。值得注意的是,八个CWE属于2023年CWE Top-25,突显了它们的严重性。我们的研究结果证实,开发人员在添加Copilot生成的代码时应谨慎,并在接受建议的代码时运行适当的安全检查。这也表明从业者应培养相应的安全意识和技能。

更新时间: 2024-04-04 07:53:03

领域: cs.SE,cs.CR

下载: http://arxiv.org/abs/2310.02059v2

Mitigating analytical variability in fMRI results with style transfer

We propose a novel approach to improve the reproducibility of neuroimaging results by converting statistic maps across different functional MRI pipelines. We make the assumption that pipelines can be considered as a style component of data and propose to use different generative models, among which, Diffusion Models (DM) to convert data between pipelines. We design a new DM-based unsupervised multi-domain image-to-image transition framework and constrain the generation of 3D fMRI statistic maps using the latent space of an auxiliary classifier that distinguishes statistic maps from different pipelines. We extend traditional sampling techniques used in DM to improve the transition performance. Our experiments demonstrate that our proposed methods are successful: pipelines can indeed be transferred, providing an important source of data augmentation for future medical studies.

Updated: 2024-04-04 07:49:39

标题: 使用样式转移减轻fMRI结果中的分析变异性

摘要: 我们提出了一种新颖的方法,通过在不同功能磁共振成像流程之间转换统计图来提高神经影像结果的可重复性。我们假设流程可以被视为数据的风格组件,并建议使用不同的生成模型,其中包括扩散模型(DM)来在流程之间转换数据。我们设计了一个基于DM的新型无监督多域图像转换框架,并通过辅助分类器的潜在空间来约束生成3D fMRI统计图。我们扩展了用于DM的传统抽样技术以改进转换性能。我们的实验表明,我们提出的方法是成功的:流程确实可以被转移,为未来医学研究提供了重要的数据增强来源。

更新时间: 2024-04-04 07:49:39

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.03703v1

Cryptographic Hardness of Score Estimation

We show that $L^2$-accurate score estimation, in the absence of strong assumptions on the data distribution, is computationally hard even when sample complexity is polynomial in the relevant problem parameters. Our reduction builds on the result of Chen et al. (ICLR 2023), who showed that the problem of generating samples from an unknown data distribution reduces to $L^2$-accurate score estimation. Our hard-to-estimate distributions are the "Gaussian pancakes" distributions, originally due to Diakonikolas et al. (FOCS 2017), which have been shown to be computationally indistinguishable from the standard Gaussian under widely believed hardness assumptions from lattice-based cryptography (Bruna et al., STOC 2021; Gupte et al., FOCS 2022).

Updated: 2024-04-04 07:49:09

标题: 评分估计的密码学难度

摘要: 我们发现,在没有对数据分布进行强烈假设的情况下,即使样本复杂度在相关问题参数的多项式时间内,$L^2$-准确的分数估计也是计算上困难的。我们的约简建立在Chen等人(ICLR 2023)的结果之上,他们表明从未知数据分布生成样本的问题可以约简为$L^2$-准确的分数估计。我们难以估计的分布是“高斯煎饼”分布,最初由Diakonikolas等人(FOCS 2017)提出,根据基于格的密码学的普遍难度假设(Bruna等人,STOC 2021;Gupte等人,FOCS 2022),这些分布已被证明在计算上与标准高斯分布无法区分。

更新时间: 2024-04-04 07:49:09

领域: cs.LG,cs.CC,cs.CR,math.ST,stat.ML,stat.TH

下载: http://arxiv.org/abs/2404.03272v1

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method, which we call saliency unlearning (SalUn), narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points). To the best of our knowledge, SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not. Codes are available at https://github.com/OPTML-Group/Unlearn-Saliency. (WARNING: This paper contains model outputs that may be offensive in nature.)
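
A minimal PyTorch sketch of the weight-saliency idea: take the gradient of the forgetting loss on the data to be erased and keep a binary mask of the large-magnitude entries (the thresholding rule here is a simplification of SalUn's actual criterion):

    import torch

    def weight_saliency_mask(model, loss_fn, forget_batch, threshold=1e-3):
        # Gradient of the forgetting loss w.r.t. weights; large-magnitude
        # entries are deemed salient and targeted by the unlearning updates.
        x, y = forget_batch
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return {name: (p.grad.abs() >= threshold).float()
                for name, p in model.named_parameters() if p.grad is not None}

The mask then gates the parameter updates, so unlearning touches only the salient weights instead of the entire model.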

Updated: 2024-04-04 07:45:38

标题: SalUn:通过基于梯度的权重显著性在图像分类和生成中赋能机器去学习

摘要: 随着数据法规的不断发展,机器遗忘(MU)已成为促进当今AI模型信任和安全的重要工具。然而,现有的MU方法主要关注数据和/或权重角度,往往在遗忘准确性、稳定性和跨领域适用性方面存在局限性。为解决这些挑战,我们引入了“权重显著性”概念,为MU引入了模型解释中的输入显著性进行类比。这一创新将MU的注意力集中在特定模型权重上,而不是整个模型,从而提高效果和效率。我们称之为显著性遗忘(SalUn)的结果方法缩小了与“精确”遗忘(在删除遗忘数据点后从头开始重新训练模型)之间的性能差距。据我们所知,SalUn是第一个能够有效消除图像分类和生成任务中遗忘数据、类别或概念影响的合理MU方法。正如下面所强调的,例如,SalUn在高方差随机数据遗忘中具有稳定性优势,例如,在CIFAR-10数据集上与精确遗忘相比有0.2%的差距。此外,在防止条件扩散模型生成有害图像方面,SalUn实现了近乎100%的遗忘准确性,优于当前的最先进基线模型,如Erased Stable Diffusion和Forget-Me-Not。代码可在https://github.com/OPTML-Group/Unlearn-Saliency找到。(警告:本文包含可能具有冒犯性质的模型输出。)

更新时间: 2024-04-04 07:45:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.12508v5

AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

Language significantly influences the formation and evolution of human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that studying how language affects human behavior requires placing it in the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of language in shaping collective behavior through interactive debate scenarios. Central to this simulation are characters engaging in dynamic conversational interactions. To enable the simulation, we introduce the Verbal Strategist Agent, which utilizes large language models to enhance interaction strategies by incorporating elements of persona and action. We set four narrative scenarios based on AgentGroupChat to demonstrate the simulation's capacity to mimic complex language use in group dynamics. Evaluations focus on aligning agent behaviors with human expectations and on the emergence of collective behaviors within the simulation. Results reveal that emergent behaviors materialize from a confluence of factors: a conducive environment for extensive information exchange, characters with diverse traits, high linguistic comprehension, and strategic adaptability. During discussions on "the impact of AI on humanity" in the AgentGroupChat simulation, philosophers commonly agreed that "AI could enhance societal welfare with judicious limitations" and even came to the conclusion that "the essence of true intelligence encompasses understanding the necessity to constrain self abilities". Additionally, in the competitive domain of casting for primary roles in films in AgentGroupChat, certain actors were ready to reduce their remuneration or accept lesser roles, motivated by their deep-seated desire to contribute to the project.

Updated: 2024-04-04 07:40:31

标题: AgentGroupChat:一个交互式小组聊天模拟系统,以更好地引发新兴行为

摘要: 语言在人类新兴行为的形成和演变中起着重要作用,这对于理解人类社会内集体智慧至关重要。考虑到研究语言如何影响人类行为需要将其置于动态场景中,本文介绍了AgentGroupChat,这是一个模拟,深入探讨语言在塑造集体行为中的复杂作用,通过互动辩论场景。该模拟的核心是角色参与动态对话互动。为了实现模拟,我们引入了言语策略家代理,利用大型语言模型通过融入角色和行动元素来增强互动策略。我们基于AgentGroupChat设定了四个叙事场景,以展示模拟能够模仿群体动态中复杂的语言使用能力。评估侧重于调整代理行为与人类期望的一致性以及模拟中群体行为的形成。结果显示,新兴行为源于多个因素的交汇:有利于广泛信息交流的环境、具有多样特征的角色、高度语言理解能力和战略适应性。在AgentGroupChat模拟中关于“人工智能对人类的影响”的讨论中,哲学家们普遍认为“人工智能可以在谨慎限制下增强社会福祉”,甚至得出结论“真正智慧的本质包含了理解约束自身能力的必要性”。此外,在AgentGroupChat中竞争激烈的电影主演角色选拔领域,某些演员愿意降低报酬或接受较小的角色,出于他们内心深处的渴望为项目做出贡献的动机。

更新时间: 2024-04-04 07:40:31

领域: cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2403.13433v2

Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions

Foundation models, which are pre-trained on broad data and able to adapt to a wide range of tasks, are advancing healthcare. They promote the development of healthcare artificial intelligence (AI) models, breaking the contradiction between limited AI models and diverse healthcare practices. Much more widespread healthcare scenarios will benefit from the development of a healthcare foundation model (HFM), improving their advanced intelligent healthcare services. Despite the impending widespread deployment of HFMs, there is currently a lack of clear understanding of how they work in the healthcare field, their current challenges, and where they are headed in the future. To answer these questions, this survey presents a comprehensive and deep analysis of the challenges, opportunities, and future directions of HFMs. It first provides a comprehensive overview of the HFM, including the methods, data, and applications, for a quick grasp of the current progress. It then explores in depth the challenges present in data, algorithms, and computing infrastructures for constructing and widely applying foundation models in healthcare. The survey also identifies emerging and promising directions in this field for future development. We believe that this survey will enhance the community's comprehension of the current progress of HFMs and serve as a valuable source of guidance for future development in this field. The latest HFM papers and related resources are maintained on our website: https://github.com/YutingHe-list/Awesome-Foundation-Models-for-Advancing-Healthcare.

Updated: 2024-04-04 07:39:55

标题: 医疗保健推进基础模型:挑战、机遇与未来方向

摘要: 基础模型,预先在广泛数据上进行了预训练,并能够适应各种任务,正在推动医疗保健领域的发展。它促进了医疗人工智能(AI)模型的发展,打破了有限的AI模型与多样化医疗实践之间的矛盾。更广泛的医疗场景将受益于医疗基础模型(HFM)的发展,改进其先进智能医疗服务。尽管HFM的广泛部署即将到来,但目前对于它们在医疗领域的工作原理、当前挑战以及未来发展方向缺乏清晰的理解。为了回答这些问题,在本调查中提出了对HFM的挑战、机遇和未来方向进行全面深入调查。首先,对HFM进行了全面概述,包括方法、数据和应用,以便快速了解当前进展。然后,对在数据、算法和计算基础设施方面存在的挑战进行了深入探讨,以建立和广泛应用医疗基础模型。该调查还确定了这一领域未来发展的新兴和有前景的方向。我们相信,这项调查将增进社区对HFM当前进展的理解,并为该领域未来发展提供有价值的指导来源。最新的HFM论文和相关资源可在我们的网站上找到:https://github.com/YutingHe-list/Awesome-Foundation-Models-for-Advancing-Healthcare。

更新时间: 2024-04-04 07:39:55

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2404.03264v1

On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models

In this paper, we propose that small models may not need to absorb the cost of pre-training to reap its benefits. Instead, they can capitalize on the astonishing results achieved by modern, enormous models to a surprising degree. We observe that, when distilled on a task from a pre-trained teacher model, a small model can achieve or surpass the performance it would achieve if it was pre-trained and then finetuned on that task. To allow this phenomenon to be easily leveraged, we establish a connection reducing knowledge distillation to modern contrastive learning, opening two doors: (1) vastly different model architecture pairings can work for the distillation, and (2) most contrastive learning algorithms rooted in the theory of Noise Contrastive Estimation can be easily applied and used. We demonstrate this paradigm using pre-trained teacher models from open-source model hubs, Transformer- and convolution-based model combinations, and a novel distillation algorithm that massages the Alignment/Uniformity perspective of contrastive learning by Wang & Isola (2020) into a distillation objective. We choose this flavor of contrastive learning due to its low computational cost, an overarching theme of this work. We also observe that this phenomenon tends not to occur if the task is data-limited. However, this can be alleviated by leveraging yet another scale-inspired development: large, pre-trained generative models for dataset augmentation. Again, we use an open-source model, and our rudimentary prompts are sufficient to boost the small model's performance. Thus, we highlight a training method for small models that is up to 94% faster than the standard pre-training paradigm without sacrificing performance. For practitioners discouraged from fully utilizing modern foundation datasets for their small models due to the prohibitive scale, we believe our work keeps that door open.
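
To make the distillation objective concrete, here is a hedged PyTorch sketch of an alignment/uniformity-style loss in the spirit of Wang & Isola (2020), adapted so the alignment term pulls student embeddings toward the (frozen) teacher's; hyperparameters and the exact form used in the paper may differ:

    import torch
    import torch.nn.functional as F

    def align_uniform_distill_loss(student_z, teacher_z, lam=1.0, t=2.0):
        s = F.normalize(student_z, dim=1)    # student embeddings on the unit sphere
        te = F.normalize(teacher_z, dim=1)   # frozen teacher embeddings
        align = (s - te).pow(2).sum(dim=1).mean()                    # match the teacher
        uniform = torch.pdist(s).pow(2).mul(-t).exp().mean().log()   # spread students out
        return align + lam * uniform

Both terms need only pairwise distances within a batch, which is part of what keeps the computational cost low.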

Updated: 2024-04-04 07:38:11

标题: 关于蒸馏作为一种替代预训练小模型的惊人有效性

摘要: 在本文中,我们提出小模型可能不需要承担预训练的成本就能获得其好处。相反,它们可以在很大程度上利用现代巨大模型取得的惊人结果。我们观察到,当在来自预训练教师模型的任务上进行蒸馏时,小模型可以实现或超越在该任务上进行预训练然后微调时的性能。为了让这一现象更容易被利用,我们建立了一种将知识蒸馏简化为现代对比学习的联系,打开了两个大门:(1)大不同的模型架构配对可以适用于蒸馏,以及(2)大多数根植于噪声对比估计理论的对比学习算法可以轻松应用和使用。我们使用来自开源模型中心的预训练教师模型,Transformer和基于卷积的模型组合,以及一种新的蒸馏算法,将Wang&Isola(2020)的对比学习中的对准/均匀性视角转化为蒸馏目标。我们选择这种对比学习的风格是因为它的计算成本低,这是本文的一个主题。我们还观察到,如果任务受到数据限制,这种现象往往不会发生。然而,通过利用另一个受规模启发的发展:大型预训练生成模型进行数据增强,可以缓解这种情况。再次,我们使用开源模型,我们的基本提示足以提升小模型的性能。因此,我们强调了一种针对小模型的训练方法,其速度比标准预训练范式快多达94%,而不牺牲性能。对于由于规模过大而感到沮丧而无法充分利用现代基础数据集的从业者,我们相信我们的工作保持了这扇门的敞开。

更新时间: 2024-04-04 07:38:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03263v1

Enhancing the Performance of Aspect-Based Sentiment Analysis Systems

Aspect-based sentiment analysis aims to predict sentiment polarity with fine granularity. While Graph Convolutional Networks (GCNs) are widely utilized for sentiment feature extraction, their naive application to syntactic feature extraction can compromise information preservation. This study introduces an innovative edge-enhanced GCN, named SentiSys, to navigate the syntactic graph while preserving intact feature information, leading to enhanced performance. Specifically, we first integrate a bidirectional long short-term memory (Bi-LSTM) network and a self-attention-based transformer. This combination facilitates effective text encoding, preventing information loss and enabling prediction over long-dependency text. A bidirectional GCN (Bi-GCN) with message passing is then employed to encode relationships between entities. Additionally, unnecessary information is filtered out using an aspect-specific masking technique. To validate the effectiveness of our proposed model, we conduct extensive evaluation experiments and ablation studies on four benchmark datasets. The results consistently demonstrate improved performance in aspect-based sentiment analysis when employing SentiSys. This approach successfully addresses the challenges associated with syntactic feature extraction, highlighting its potential for advancing sentiment analysis methodologies.
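
The aspect-specific masking step can be pictured as zeroing every token representation outside the aspect span. A toy PyTorch sketch (shapes and the span convention are assumptions):

    import torch

    def aspect_specific_mask(hidden, start, end):
        # hidden: (batch, seq_len, dim); keep only aspect-span token states.
        mask = torch.zeros_like(hidden[..., :1])
        mask[:, start:end] = 1.0
        return hidden * mask

    h = torch.randn(2, 10, 16)
    h_masked = aspect_specific_mask(h, 3, 5)   # tokens 3-4 form the aspect term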

Updated: 2024-04-04 07:31:56

标题: 提升基于方面的情感分析系统的性能

摘要: 基于方面的情感分析旨在以细粒度预测情感极性。虽然图卷积网络(GCNs)被广泛用于情感特征提取,但它们在句法特征提取方面的简单应用可能会损害信息保存。本研究引入了一种创新的增强边缘的GCN,命名为SentiSys,用于在保留完整特征信息的同时导航句法图,从而提高性能。具体来说,我们首先整合了一个双向长短期记忆(Bi-LSTM)网络和一个基于自注意力的变换器。这种组合有助于有效地对文本进行编码,防止信息丢失,并预测长依赖文本。然后,采用具有消息传递的双向GCN(Bi-GCN)来编码实体之间的关系。此外,采用特定于方面的屏蔽技术来过滤掉不必要的信息。为了验证我们提出的模型的有效性,我们对四个基准数据集进行了广泛的评估实验和消融研究。结果一致表明,在使用SentiSys时,在基于方面的情感分析中,性能得到了改善。这种方法成功地解决了与句法特征提取相关的挑战,突显了它在推进情感分析方法论方面的潜力。

更新时间: 2024-04-04 07:31:56

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03259v1

A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we introduce the first comprehensive NPC MRI dataset, encompassing MR axial imaging of 277 primary NPC patients. This dataset includes T1-weighted, T2-weighted, and contrast-enhanced T1-weighted sequences, totaling 831 scans. In addition to the corresponding clinical data, manually annotated and labeled segmentations by experienced radiologists offer high-quality data resources from untreated primary NPC.

Updated: 2024-04-04 07:19:31

标题: 一个包含多种模态分割的原发性鼻咽癌MRI数据集

摘要: 多模式磁共振成像数据结合不同序列,有助于鼻咽癌(NPC)的早期诊断、肿瘤分割和疾病分期管理。公开可用的综合数据集的缺乏限制了NPC的诊断、治疗规划和机器学习算法开发的进展。为满足这一迫切需求,我们介绍了首个全面的NPC磁共振成像数据集,包括277名原发性NPC患者的MR轴向成像。该数据集包括T1加权、T2加权和增强T1加权序列,总共831次扫描。除了相应的临床数据外,经验丰富的放射科医生手动注释和标记的分割提供了来自未经治疗的原发性NPC的高质量数据资源。

更新时间: 2024-04-04 07:19:31

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.03253v1

Multi-task learning via robust regularized clustering with non-convex group penalties

Multi-task learning (MTL) aims to improve estimation and prediction performance by sharing common information among related tasks. One natural assumption in MTL is that tasks are classified into clusters based on their characteristics. However, existing MTL methods based on this assumption often ignore outlier tasks that have large task-specific components or no relation to other tasks. To address this issue, we propose a novel MTL method called Multi-Task Learning via Robust Regularized Clustering (MTLRRC). MTLRRC incorporates robust regularization terms inspired by robust convex clustering, which is further extended to handle non-convex and group-sparse penalties. The extension allows MTLRRC to simultaneously perform robust task clustering and outlier task detection. The connection between the extended robust clustering and the multivariate M-estimator is also established. This provides an interpretation of the robustness of MTLRRC against outlier tasks. An efficient algorithm based on a modified alternating direction method of multipliers is developed for the estimation of the parameters. The effectiveness of MTLRRC is demonstrated through simulation studies and application to real data.

Updated: 2024-04-04 07:09:43

标题: 多任务学习通过具有非凸组惩罚的稳健正则化聚类

摘要: 多任务学习(MTL)旨在通过在相关任务之间共享共同信息来改善估计和预测性能。MTL中的一个自然假设是根据任务的特征将任务分类为簇。然而,基于这一假设的现有MTL方法经常忽略具有较大任务特定组件或与其他任务没有关系的异常任务。为解决这一问题,我们提出了一种名为多任务学习 via 鲁棒正则化聚类(MTLRRC)的新型MTL方法。MTLRRC结合了受鲁棒凸聚类启发的鲁棒正则化项,进一步扩展以处理非凸和组稀疏惩罚。此扩展允许MTLRRC同时执行鲁棒任务聚类和异常任务检测。扩展的鲁棒聚类与多元M-估计器之间的联系也得以建立。这为MTLRRC对抗异常任务的鲁棒性提供了解释。为了估计参数,基于修改的交替方向法的多参数算法被开发出来。通过模拟研究和实际数据应用,证明了MTLRRC的有效性。

更新时间: 2024-04-04 07:09:43

领域: stat.ME,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03250v1

Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. This paper proposes a novel method called SemiGCL to tackle graph Semi-supervised domain adaptation with Graph Contrastive Learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks. The source codes of SemiGCL are publicly available at https://github.com/JiarenX/SemiGCL.

Updated: 2024-04-04 07:08:25

标题: 在图上使用对比学习和极小熵的半监督域自适应

摘要: 标签稀缺在图中经常在真实世界的应用中遇到,这是由于数据标签的高成本造成的。为此,图上的半监督领域自适应(SSDA)旨在利用标记源图的知识来帮助在具有有限标签的目标图上进行节点分类。SSDA任务需要克服源图和目标图之间的领域差距。然而,迄今为止,为跨图节点分类设计的现有方法尚未正式考虑这个具有挑战性的研究问题。本文提出了一种名为SemiGCL的新方法,以应对图上的半监督领域自适应,采用图对比学习和极小熵训练。SemiGCL通过对比从图的局部和全局视图学习的表示来生成信息丰富的节点表示。此外,SemiGCL通过未标记目标节点的熵损失进行对抗优化,以减少领域分歧。在基准数据集上的实验结果表明,SemiGCL在SSDA任务上优于最先进的基准线。SemiGCL的源代码可以在https://github.com/JiarenX/SemiGCL上公开获取。

更新时间: 2024-04-04 07:08:25

领域: cs.LG

下载: http://arxiv.org/abs/2309.07402v2

Knowledge-Based Convolutional Neural Network for the Simulation and Prediction of Two-Phase Darcy Flows

Physics-informed neural networks (PINNs) have gained significant prominence as a powerful tool in the field of scientific computing and simulations. Their ability to seamlessly integrate physical principles into deep learning architectures has revolutionized the approaches to solving complex problems in physics and engineering. However, a persistent challenge faced by mainstream PINNs lies in their handling of discontinuous input data, leading to inaccuracies in predictions. This study addresses these challenges by incorporating the discretized forms of the governing equations into the PINN framework. We propose to combine the power of neural networks with the dynamics imposed by the discretized differential equations. By discretizing the governing equations, the PINN learns to account for the discontinuities and accurately capture the underlying relationships between inputs and outputs, improving the accuracy compared to traditional interpolation techniques. Moreover, by leveraging the power of neural networks, the computational cost associated with numerical simulations is substantially reduced. We evaluate our model on a large-scale dataset for the prediction of pressure and saturation fields demonstrating high accuracies compared to non-physically aware models.
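
As a simplified picture of folding discretized governing equations into the loss, consider a one-dimensional analogue of the Darcy pressure equation, d/dx(k dp/dx) = 0, penalized through its finite-difference residual (PyTorch sketch; the paper's two-phase formulation is substantially richer):

    import torch

    def darcy_residual_loss(p, k, dx):
        # Discrete residual of d/dx(k dp/dx) = 0 on a uniform 1-D grid.
        flux = k[:-1] * (p[1:] - p[:-1]) / dx   # face fluxes
        res = (flux[1:] - flux[:-1]) / dx       # divergence at interior nodes
        return (res ** 2).mean()

    p = torch.linspace(1.0, 0.0, 50, requires_grad=True)  # stand-in pressure prediction
    loss = darcy_residual_loss(p, torch.ones(50), dx=1.0 / 49)
    loss.backward()   # in a PINN, p would be the network output, so gradients reach the weights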

Updated: 2024-04-04 06:56:32

标题: 基于知识的卷积神经网络在两相达西流动模拟和预测中的应用

摘要: 物理信息神经网络(PINNs)作为科学计算和仿真领域中强大工具已经获得了显著的关注。它们能够将物理原理无缝地整合到深度学习架构中,彻底改变了解决物理和工程领域复杂问题的方法。然而,主流PINNs面临的一个持续挑战在于处理不连续的输入数据,导致预测的不准确性。本研究通过将控制方程的离散形式纳入PINN框架来解决这些挑战。我们提议将神经网络的强大功能与离散化的微分方程所施加的动力学相结合。通过离散化控制方程,PINN学习考虑不连续性,并准确捕捉输入和输出之间的潜在关系,提高了与传统插值技术相比的准确性。此外,通过利用神经网络的能力,与数值模拟相关的计算成本大大降低。我们在大规模数据集上评估了我们的模型,用于预测压力和饱和度场,相较于非物理感知模型展示了高准确性。

更新时间: 2024-04-04 06:56:32

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2404.03240v1

Exploring Emotions in Multi-componential Space using Interactive VR Games

Emotion understanding is a complex process that involves multiple components. The ability to recognise emotions not only leads to new context awareness methods but also enhances the effectiveness of system interaction by perceiving and expressing emotions. Despite the attention given to discrete and dimensional models, neuroscientific evidence supports the view that emotions are complex and multi-faceted. One framework that resonates well with such findings is the Component Process Model (CPM), a theory that considers the complexity of emotions with five interconnected components: appraisal, expression, motivation, physiology and feeling. However, the relationship between the CPM and discrete emotions has not yet been fully explored. Therefore, to better understand the processes underlying emotions, we operationalised a data-driven approach using interactive Virtual Reality (VR) games and collected multimodal measures (self-reports, physiological and facial signals) from 39 participants. We used Machine Learning (ML) methods to identify the unique contributions of each component to emotion differentiation. Our results showed the role of different components in emotion differentiation, with the model including all components demonstrating the most significant contribution. Moreover, we found that at least five dimensions are needed to represent the variation of emotions in our dataset. These findings also have implications for using VR environments in emotion research and highlight the role of physiological signals in emotion recognition within such environments.

Updated: 2024-04-04 06:54:44

标题: 使用交互式虚拟现实游戏探索多组分空间中的情感

摘要: 情绪理解是一个复杂的过程,涉及多个组成部分。识别情绪的能力不仅可以带来新的情境意识方法,还可以通过感知和表达情绪来增强系统交互的有效性。尽管人们关注离散和维度模型,神经科学证据支持情绪是复杂且多方面的。一个与这些发现相吻合的框架是组件过程模型(CPM),这是一个考虑情绪复杂性的理论,包括五个相互关联的组件:评估、表达、动机、生理和感觉。然而,CPM与离散情绪之间的关系尚未完全探讨。因此,为了更好地理解情感潜在过程,我们采用了一种数据驱动方法,利用交互式虚拟现实(VR)游戏收集了39名参与者的多模态测量(自我报告、生理和面部信号)。我们使用机器学习(ML)方法识别每个组件对情绪差异化的独特贡献。我们的结果显示了不同组件在情绪差异化中的作用,包括所有组件的模型表现出最显著的贡献。此外,我们发现至少需要五个维度来代表我们数据集中情绪的变化。这些发现还对在情感研究中使用VR环境具有影响,并强调了在这种环境中情绪识别中生理信号的作用。

更新时间: 2024-04-04 06:54:44

领域: cs.HC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03239v1

Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning

Machine unlearning has become a promising solution for fulfilling the "right to be forgotten", under which individuals can request the deletion of their data from machine learning models. However, existing studies of machine unlearning mainly focus on the efficacy and efficiency of unlearning methods, while neglecting the investigation of the privacy vulnerability during the unlearning process. With two versions of a model available to an adversary, that is, the original model and the unlearned model, machine unlearning opens up a new attack surface. In this paper, we conduct the first investigation to understand the extent to which machine unlearning can leak the confidential content of the unlearned data. Specifically, under the Machine Learning as a Service setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample by only accessing the original and unlearned model. The effectiveness of the proposed unlearning inversion attacks is evaluated through extensive experiments on benchmark datasets across various model architectures and on both exact and approximate representative unlearning approaches. The experimental results indicate that the proposed attack can reveal the sensitive information of the unlearned data. As such, we identify three possible defenses that help to mitigate the proposed attacks, while at the cost of reducing the utility of the unlearned model. The study in this paper uncovers an underexplored gap between machine unlearning and the privacy of unlearned data, highlighting the need for the careful design of mechanisms for implementing unlearning without leaking the information of the unlearned data.

Updated: 2024-04-04 06:37:46

标题: 学习你想要忘记的内容:机器学习遗忘反击的逆向攻击

摘要: 机器遗忘已成为实现“被遗忘权”的一种有前途的解决方案,根据该权利,个人可以要求从机器学习模型中删除其数据。然而,现有的机器遗忘研究主要关注遗忘方法的功效和效率,而忽视了遗忘过程中的隐私漏洞调查。当对手拥有两个版本的模型,即原始模型和已遗忘模型时,机器遗忘开辟了一个新的攻击面。本文开展了首次调查,以了解机器遗忘在多大程度上可能泄露已遗忘数据的机密内容。具体来说,在“机器学习即服务”设置下,我们提出了可以通过仅访问原始和已遗忘模型来揭示已遗忘样本的特征和标签信息的遗忘逆推攻击。通过在各种模型架构上对基准数据集进行广泛实验以及对精确和近似代表性的遗忘方法进行评估,验证了所提出的遗忘逆推攻击的有效性。实验结果表明,所提出的攻击可以揭示已遗忘数据的敏感信息。因此,我们确定了三种可能的防御措施,有助于减轻所提出的攻击,但会降低已遗忘模型的效用。本文的研究揭示了机器遗忘和已遗忘数据的隐私之间尚未充分探索的差距,强调了需要谨慎设计机制以实现遗忘而不泄露已遗忘数据信息的必要性。

更新时间: 2024-04-04 06:37:46

领域: cs.CR

下载: http://arxiv.org/abs/2404.03233v1

Short-term prediction of construction waste transport activities using AI-Truck

Construction waste hauling trucks (or "slag trucks") are among the most commonly seen heavy-duty diesel vehicles in urban streets; they not only produce significant carbon, NO$_x$ and PM$_{2.5}$ emissions but are also a major source of on-road and on-site fugitive dust. Slag trucks are subject to a series of spatial and temporal access restrictions imposed by local traffic and environmental policies. This paper addresses the practical problem of predicting levels of slag truck activity at a city scale during heavy pollution episodes, so that environmental law enforcement units can take timely and proactive measures against localized truck aggregation. A deep ensemble learning framework (coined AI-Truck) is designed, which employs a soft-vote integrator that utilizes Bi-LSTM, TCN, STGCN, and PDFormer as base classifiers. AI-Truck employs a combination of downsampling and a weighted loss to address sample imbalance, and utilizes truck trajectories to extract more accurate and effective geographic features. The framework was deployed for truck activity prediction at a resolution of 1km$\times$1km$\times$0.5h in a 255 km$^2$ area in Chengdu, China. As a classifier, AI-Truck achieves a macro F1 of 0.747 in predicting levels of slag truck activity over a 0.5-h prediction horizon, and enables personnel to spot high-activity locations 1.5 hrs ahead with over 80% accuracy.
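
The soft-vote integrator itself is simple: average the class-probability outputs of the base classifiers and take the argmax. A toy numpy sketch with fabricated probabilities:

    import numpy as np

    def soft_vote(probas, weights=None):
        # probas: list of (n_samples, n_classes) arrays, one per base classifier
        return np.average(np.stack(probas), axis=0, weights=weights).argmax(axis=1)

    p_lstm = np.array([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]])
    p_tcn  = np.array([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]])
    levels = soft_vote([p_lstm, p_tcn])   # predicted activity level per cell-time slot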

Updated: 2024-04-04 06:31:36

标题: 基于AI-Truck的建筑废弃物运输活动短期预测

摘要: 建筑废料运输卡车(或“渣土卡车”)是城市街道上最常见的重型柴油车辆之一,不仅产生大量的碳、NO$_{\textbf{x}}$和PM$_{\textbf{2.5}}$排放,而且也是道路和工地上游离尘埃的主要来源。渣土卡车受到当地交通和环境政策的一系列空间和时间访问限制。本文解决了在重度污染事件期间在城市范围内预测渣土卡车活动水平的实际问题,以便环境执法单位可以及时和主动地采取措施防止局部卡车聚集。设计了一个深度集成学习框架(命名为AI-Truck),该框架采用软投票整合器,利用Bi-LSTM、TCN、STGCN和PDFormer作为基本分类器。AI-Truck采用下采样和加权损失的组合来解决样本不平衡问题,并利用卡车轨迹提取更准确和有效的地理特征。该框架在中国成都255平方公里的区域以1km$\times$1km$\times$0.5h的分辨率部署,用于预测卡车活动。作为分类器,AI-Truck在0.5小时预测时长内预测渣土卡车活动水平的macro F1为0.747,并使人员能够提前1.5小时以超过80\%的准确率发现高活动区域。

更新时间: 2024-04-04 06:31:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2312.04609v2

Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science

In the domain of data science, the predictive tasks of classification, regression, and imputation of missing values are commonly encountered challenges associated with tabular data. This research endeavors to apply Large Language Models (LLMs) towards addressing these predictive tasks. Despite their proficiency in comprehending natural language, LLMs fall short in dealing with structured tabular data. This limitation stems from their lacking exposure to the intricacies of tabular data during their foundational training. Our research aims to mitigate this gap by compiling a comprehensive corpus of tables annotated with instructions and executing large-scale training of Llama-2 on this enriched dataset. Furthermore, we investigate the practical application of applying the trained model to zero-shot prediction, few-shot prediction, and in-context learning scenarios. Through extensive experiments, our methodology has shown significant improvements over existing benchmarks. These advancements highlight the efficacy of tailoring LLM training to solve table-related problems in data science, thereby establishing a new benchmark in the utilization of LLMs for enhancing tabular intelligence.
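
A common way to expose a table row to an LLM is to serialize it into an instruction. A hypothetical template (not necessarily the one used to build the training corpus here):

    def serialize_row(row, target):
        # Hypothetical instruction template for zero-shot tabular prediction.
        features = "; ".join(f"{k} is {v}" for k, v in row.items())
        return (f"Given a record where {features}, "
                f"predict the value of {target}. Answer:")

    prompt = serialize_row({"age": 42, "income": "52k", "tenure": 7}, "churn")

For few-shot prediction, several labeled rows serialized the same way would be prepended to the prompt as in-context examples.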

Updated: 2024-04-04 06:28:25

标题: 释放大型语言模型在数据科学中预测性表格任务中的潜力

摘要: 在数据科学领域,分类、回归和缺失值插补等预测任务通常与表格数据相关的挑战密切相关。本研究旨在应用大型语言模型(LLMs)解决这些预测任务。尽管LLMs在理解自然语言方面表现出色,但在处理结构化表格数据方面表现不佳。这种局限性源自它们在基础训练过程中未接触到表格数据的复杂性。我们的研究旨在通过整理一套包含指导说明的表格注释语料库,并在这一丰富的数据集上对Llama-2进行大规模训练,以弥补这一差距。此外,我们研究了将训练模型应用于零样本预测、少样本预测和上下文学习场景的实际应用。通过大量实验证明,我们的方法在现有基准上取得了显著改进。这些进展突显了将LLM训练定制为解决数据科学中与表格相关的问题的有效性,从而建立了LLM用于增强表格智能的新基准。

更新时间: 2024-04-04 06:28:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.20208v3

Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks

We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a multi-hop wireless network with statistically-identical agents. Agents cache the most recent samples from others and communicate over wireless collision channels governed by an underlying graph topology. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies, considering both oblivious (where decision-making is independent of the physical processes) and non-oblivious policies (where decision-making depends on physical processes). We prove that in oblivious policies, minimizing estimation error is equivalent to minimizing the age of information. The complexity of the problem, especially the multi-dimensional action spaces and arbitrary network topologies, makes theoretical methods for finding optimal transmission policies intractable. We optimize the policies using a graphical multi-agent reinforcement learning framework, where each agent employs a permutation-equivariant graph neural network architecture. Theoretically, we prove that our proposed framework exhibits desirable transferability properties, allowing transmission policies trained on small- or moderate-size networks to be executed effectively on large-scale topologies. Numerical experiments demonstrate that (i) Our proposed framework outperforms state-of-the-art baselines; (ii) The trained policies are transferable to larger networks, and their performance gains increase with the number of agents; (iii) The training procedure withstands non-stationarity even if we utilize independent learning techniques; and, (iv) Recurrence is pivotal in both independent learning and centralized training and decentralized execution, and improves the resilience to non-stationarity in independent learning.

Updated: 2024-04-04 06:24:11

标题: 基于图神经网络的去中心化学习策略用于估计误差最小化

摘要: 我们研究了在具有具有统计相同特性的代理的多跳无线网络中,对自回归马尔可夫过程进行采样和远程估计的挑战。代理缓存来自其他代理的最新样本,并通过由底层图拓扑结构管理的无线碰撞信道进行通信。我们的目标是通过分散可扩展的采样和传输策略,最小化时间平均估计误差和/或信息时代,考虑到无意识(决策独立于物理过程)和非无意识策略(决策依赖于物理过程)。我们证明,在无意识策略中,最小化估计误差等同于最小化信息时代。问题的复杂性,特别是多维动作空间和任意网络拓扑,使得在理论上找到最优传输策略的方法不可行。我们使用图形多代理强化学习框架来优化策略,其中每个代理采用置换等变图神经网络架构。从理论上讲,我们证明我们提出的框架具有理想的可迁移性特性,允许在小型或中等大小网络上训练的传输策略在大规模拓扑上有效执行。数值实验表明:(i)我们提出的框架优于现有技术基线;(ii)训练好的策略可以迁移到更大的网络,并且它们的性能收益随着代理数量的增加而增加;(iii)即使我们使用独立学习技术,训练过程也能承受非稳态;以及(iv)在独立学习和集中式训练以及分散式执行中,复现对抗非稳态具有关键作用。

更新时间: 2024-04-04 06:24:11

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2404.03227v1

FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification

Deep Learning (DL) models for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), while delivering improved performance, have been shown to be quite vulnerable to adversarial attacks. Existing works improve robustness by training models on adversarial samples. However, by focusing mostly on attacks that manipulate images randomly, they neglect the real-world feasibility of such attacks. In this paper, we propose FACTUAL, a novel Contrastive Learning framework for Adversarial Training and robust SAR classification. FACTUAL consists of two components: (1) differing from existing works, a novel perturbation scheme that incorporates realistic physical adversarial attacks (such as OTSA) to build a supervised adversarial pre-training network, which utilizes class labels to cluster clean and perturbed images together into a more informative feature space; and (2) a linear classifier cascaded after the encoder that uses the computed representations to predict the target labels. By pre-training and fine-tuning our model on both clean and adversarial samples, we show that our model achieves high prediction accuracy in both cases. Our model achieves 99.7% accuracy on clean samples and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.

Updated: 2024-04-04 06:20:22

标题: FACTUAL:基于对比学习的鲁棒SAR图像分类的新框架

摘要: 深度学习(DL)模型用于合成孔径雷达(SAR)自动目标识别(ATR)时,虽然提高了性能,但表现出对对抗攻击相当脆弱的特点。现有研究通过在对抗样本上训练模型来提高鲁棒性。然而,由于主要关注随机操纵图像的攻击,他们忽视了这种攻击在现实世界中的可行性。在本文中,我们提出了FACTUAL,一种新颖的对比学习框架,用于对抗训练和鲁棒的SAR分类。FACTUAL包括两个组件:(1)与现有作品不同,一种新颖的扰动方案,将现实物理对抗攻击(如OTSA)纳入监督对抗预训练网络中。该网络利用类标签将干净和扰动图像聚类到更具信息性的特征空间中。 (2)在编码器之后级联的线性分类器,使用计算的表示来预测目标标签。通过在干净和对抗样本上进行预训练和微调,我们展示了我们的模型在两种情况下都实现了高预测准确性。我们的模型在干净样本上实现了99.7%的准确率,在扰动样本上实现了89.6%的准确率,均优于先前的最先进方法。

更新时间: 2024-04-04 06:20:22

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.03225v1

Conversational Disease Diagnosis via External Planner-Controlled Large Language Models

The advancement of medical artificial intelligence (AI) has set the stage for the realization of conversational diagnosis, where AI systems mimic human doctors by engaging in dialogue with patients to deduce diagnoses. This study introduces an innovative approach using external planners augmented with large language models (LLMs) to develop a medical task-oriented dialogue system. The system comprises a policy module for information gathering and an LLM-based module for natural language understanding and generation, addressing the limitations of previous AI systems in these areas. By emulating the two-phase decision-making process of doctors, namely disease screening and differential diagnosis, we designed two distinct planners. The first focuses on collecting patient symptoms to identify potential diseases, while the second delves into specific inquiries to confirm or exclude these diseases. Utilizing reinforcement learning and active learning with LLMs, we trained these planners to navigate medical dialogues effectively. Our evaluation on the MIMIC-IV dataset demonstrated the system's capability to outperform existing models, indicating a significant step towards achieving automated conversational disease diagnostics and enhancing the precision and accessibility of medical diagnoses.

Updated: 2024-04-04 06:16:35

标题: 通过外部计划控制的大型语言模型进行对话式疾病诊断

摘要: 医学人工智能的进步为实现对话式诊断奠定了基础,其中人工智能系统通过与患者对话以推断诊断,模拟人类医生的行为。本研究引入了一种创新方法,利用增强了大型语言模型(LLMs)的外部规划器开发医学任务导向的对话系统。该系统包括一个用于信息收集的策略模块,一个基于LLM的自然语言理解和生成模块,解决了以往人工智能系统在这些领域的局限性。通过模拟医生疾病筛查和鉴别诊断的两阶段决策过程,我们设计了两个不同的规划器。第一个重点收集患者症状以识别潜在疾病,而第二个则深入具体询问以确认或排除这些疾病。利用强化学习和LLMs的主动学习,我们训练这些规划器有效地导航医学对话。我们在MIMIC-IV数据集上的评估表明,该系统具有超越现有模型的能力,标志着实现自动对话式疾病诊断并提高医学诊断的准确性和可访问性迈出了重要一步。

更新时间: 2024-04-04 06:16:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.04292v1

Enabling Clean Energy Resilience with Machine Learning-Empowered Underground Hydrogen Storage

To address the urgent challenge of climate change, there is a critical need to transition away from fossil fuels towards sustainable energy systems, with renewable energy sources playing a pivotal role. However, the inherent variability of renewable energy, without effective storage solutions, often leads to imbalances between energy supply and demand. Underground Hydrogen Storage (UHS) emerges as a promising long-term storage solution to bridge this gap, yet its widespread implementation is impeded by the high computational costs associated with high fidelity UHS simulations. This paper introduces UHS from a data-driven perspective and outlines a roadmap for integrating machine learning into UHS, thereby facilitating the large-scale deployment of UHS.

Updated: 2024-04-04 06:10:57

标题: 利用机器学习技术实现清洁能源弹性:地下储氢技术

摘要: 为了应对气候变化带来的紧迫挑战,迫切需要摆脱化石燃料向可持续能源系统过渡,可再生能源扮演着关键角色。然而,可再生能源固有的变化性,如果没有有效的储存解决方案,往往会导致能源供需之间的不平衡。地下氢气储存(UHS)作为一个有前途的长期储存解决方案,能够填补这一差距,但其广泛实施受制于与高保真度UHS模拟相关的高计算成本。本文从数据驱动的角度介绍了UHS,并概述了将机器学习整合到UHS中的路线图,从而促进UHS的大规模部署。

更新时间: 2024-04-04 06:10:57

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03222v1

Commitments are equivalent to one-way state generators

One-way state generators (OWSG) are natural quantum analogs to classical one-way functions. We show that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs ($n$ represents the input length) are equivalent to $poly(n)$-copy OWSG and to quantum commitments. Since known results show that $o\left(\frac{n}{\log(n)}\right)$-copy OWSG cannot imply commitments, this shows that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs are the weakest OWSGs from which we can get commitments (and hence much of quantum cryptography). Our construction follows along the lines of Håstad, Impagliazzo, Levin and Luby [HILL], who obtained classical pseudorandom generators (PRG) from classical one-way functions (OWF), however with crucial modifications. Our construction, when applied to the classical case, provides an alternative to the construction provided by [HILL]. Since we do not argue conditioned on the output of the one-way function, our construction and analysis are arguably simpler and may be of independent interest.

Updated: 2024-04-04 06:06:38

标题: 承诺等同于单向状态生成器

摘要: 单向状态生成器(OWSG)是经典单向函数的自然量子类比。我们展示了$O\left(\frac{n}{\log(n)}\right)$-copy OWSGs($n$代表输入长度)等价于$poly(n)$-copy OWSG和量子承诺。由于已知结果表明$o\left(\frac{n}{\log(n)}\right)$-copy OWSG不能暗示承诺,这表明$O\left(\frac{n}{\log(n)}\right)$-copy OWSGs是我们可以获得承诺(因此也是量子密码学的许多内容)的最弱OWSGs。 我们的构造沿着H\r{a}stad,Impagliazzo,Levin和Luby[HILL]的思路进行,他们从经典单向函数(OWF)获得了经典伪随机生成器(PRG),不过有关键的修改。我们的构造在应用于经典情况时,提供了[HILL]提供的构造的另一种选择。由于我们没有根据单向函数的输出进行论证,我们的构造和分析可能更简单,可能具有独立的兴趣。

更新时间: 2024-04-04 06:06:38

领域: quant-ph,cs.CR

下载: http://arxiv.org/abs/2404.03220v1

T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems

To address the interpretability challenge in machine learning (ML) systems, counterfactual explanations (CEs) have emerged as a promising solution. CEs are unique as they provide workable suggestions to users, in addition to explaining why a certain outcome was predicted. The application of CEs encounters two main challenges: general user preferences and variable ML systems. User preferences tend to be general rather than specific, and CEs need to be adaptable to variable ML models while maintaining robustness even as these models change. Facing these challenges, we present a solution rooted in validated general user preferences, which are derived from thorough user research. We map these preferences to the properties of CEs. Additionally, we introduce a novel method, Tree-based Conditions Optional Links (T-COL), which incorporates two optional structures and multiple condition groups for generating CEs adaptable to general user preferences. Meanwhile, we employ T-COL to enhance the robustness of CEs with specific conditions, making them more valid even when the ML model is replaced. Our experimental comparisons under different user preferences show that T-COL outperforms all baselines, including Large Language Models which are shown to be able to generate counterfactuals.

Updated: 2024-04-04 05:59:22

标题: T-COL:为可变机器学习系统上的一般用户偏好生成反事实解释

摘要: 为了解决机器学习(ML)系统中的可解释性挑战,反事实解释(CEs)已经成为一种有希望的解决方案。CEs独特之处在于除了解释为什么预测出某种结果外,还为用户提供可操作的建议。CEs的应用遇到两个主要挑战:一是普遍用户偏好,二是不同的ML系统。用户偏好往往是普遍性的而非具体的,而CEs需要能够适应不同的ML模型,同时保持鲁棒性,即使这些模型发生变化。面对这些挑战,我们提出了一种解决方案,基于经过深入用户研究得出的验证过的普遍用户偏好。我们将这些偏好映射到CEs的属性上。此外,我们引入了一种新颖的方法,基于树的条件可选链接(T-COL),它结合了两种可选结构和多个条件组,用于生成适应普遍用户偏好的CEs。同时,我们利用T-COL增强了具体条件的CEs的鲁棒性,使它们在ML模型被替换时更加有效。我们在不同用户偏好下进行的实验比较表明,T-COL优于所有基线,包括已被证明能够生成反事实的大型语言模型。

更新时间: 2024-04-04 05:59:22

领域: cs.AI

下载: http://arxiv.org/abs/2309.16146v2

Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both the data and the ML model. However, it slows down non-secure inference by up to five orders of magnitude, the root cause being the replacement of non-polynomial operators (ReLU and MaxPooling) with high-degree Polynomial Approximated Functions (PAFs). We propose SmartPAF, a framework to replace non-polynomial operators with low-degree PAFs and then recover the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT) -- adjust PAF coefficients based on the input distributions before training, (2) Progressive Approximation (PA) -- progressively replace one non-polynomial operator at a time, followed by fine-tuning, (3) Alternate Training (AT) -- alternate the training between PAFs and other linear operators in a decoupled manner, and (4) Dynamic Scale (DS) / Static Scale (SS) -- dynamically scale the PAF input value within (-1, 1) during training, and fix the scale as the running max value in FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to enhance the accuracy of various models approximated by PAFs of various low degrees on multiple datasets. For ResNet-18 on ImageNet-1k, the Pareto frontier spotted by SmartPAF in the latency-accuracy tradeoff space achieves 1.42x ~ 13.64x accuracy improvement and 6.79x ~ 14.9x speedup over prior works. Further, SmartPAF enables a 14-degree PAF ($f_1^2 g_1^2$) to achieve a 7.81x speedup compared to the 27-degree PAF obtained by minimax approximation, with the same 69.4% post-replacement accuracy. Our code is available at https://github.com/TorchFHE/SmartPAF.
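
The coefficient-tuning and dynamic-scaling ideas can be illustrated with plain least squares: fit a low-degree polynomial to ReLU under the observed, rescaled input distribution rather than over a worst-case interval (numpy sketch; the degree and data are placeholders, and SmartPAF's actual tuning is integrated with training):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10000)        # stand-in pre-activation samples
    x = x / np.abs(x).max()               # dynamic scaling into (-1, 1)

    # Coefficient tuning: least-squares fit of a low-degree polynomial to ReLU
    # under the observed input distribution, not a worst-case minimax interval.
    coeffs = np.polyfit(x, np.maximum(x, 0.0), deg=7)
    relu_paf = np.poly1d(coeffs)
    print(np.max(np.abs(relu_paf(x) - np.maximum(x, 0.0))))  # max error on samples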

Updated: 2024-04-04 05:45:52

标题: 在同态加密中进行快速私密推断的非多项式运算符的准确低次多项式逼近

摘要: 随着机器学习(ML)渗透到诸如医疗保健、人脸识别和区块链等领域,保护敏感数据的需求日益加剧。全同态加密(FHE)允许对加密数据进行推断,保护数据和ML模型的隐私。然而,它会使非安全推断速度降低达到五个数量级,根本原因是使用高次多项式逼近函数(PAF)替换非多项式运算符(ReLU和MaxPooling)。我们提出了SmartPAF,一个框架,用低次数PAF替换非多项式运算符,然后通过四种技术恢复PAF逼近模型的准确性:(1)系数调整(CT)-在训练之前根据输入分布调整PAF系数,(2)渐进逼近(PA)-逐步替换一个非多项式运算符,然后进行微调,(3)交替训练(AT)-在解耦的方式下交替训练PAF和其他线性运算符,以及(4)动态尺度(DS)/静态尺度(SS)-在训练中动态缩放PAF输入值至(-1, 1),并在FHE部署中将尺度固定为运行最大值。CT、PA、AT和DS/SS的协同效应使SmartPAF能够提高在多个数据集下由各种低次数PAF逼近的各种模型的准确性。对于ImageNet-1k下的ResNet-18,SmartPAF在延迟-准确性权衡空间中发现的帕累托前沿实现了1.42x ~ 13.64x的准确性改进和6.79x ~ 14.9x的速度提升,比以前的工作更快。此外,SmartPAF使一个14次PAF(f1^2 g_1^2)相比于通过最小最大逼近获得的27次PAF实现了7.81x的速度提升,后者具有相同的69.4%替换后准确性。我们的代码可在https://github.com/TorchFHE/SmartPAF找到。

更新时间: 2024-04-04 05:45:52

领域: cs.CR

下载: http://arxiv.org/abs/2404.03216v1

Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share similar underlying exploration requirements as the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.

Updated: 2024-04-04 05:37:52

标题: 利用上下文结构生成有用的辅助任务

摘要: 强化学习需要与环境互动,这对机器人来说是昂贵的。这种约束需要采取方法来最大限度地利用先前的经验,从而减少与环境的互动。我们提出了一种方法,通过生成并同时学习有用的辅助任务,最大化经验的重复利用,从而学习解决给定任务。为了生成这些任务,我们构建了给定任务的抽象时间逻辑表示,并利用大型语言模型生成上下文感知的对象嵌入,以促进对象替换。反事实推理和离线策略方法使我们能够在解决给定的目标任务的同时学习这些辅助任务。我们将这些见解结合到一个新颖的多任务强化学习框架中,并实验证明我们生成的辅助任务与给定任务共享类似的探索需求,从而最大化定向探索的效用。我们的方法允许代理自动学习额外有用的策略,而无需额外的环境互动。

更新时间: 2024-04-04 05:37:52

领域: cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2303.05038v2

Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we study the mean square asymptotic stability of a class of random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous ones. Secondly, we introduce the concept of the random Tikhonov regularization path, and show that if the regularization path is slowly time-varying in some sense, then the output of the algorithm is consistent with the regularization path in mean square. Furthermore, if the data streams also satisfy the RKHS persistence of excitation condition, i.e., there exists a fixed-length time period such that each eigenvalue of the conditional expectation of the operators induced by the input data accumulated over every time period has a uniformly positive lower bound with respect to time, then the output of the algorithm is consistent with the unknown function in mean square. Finally, for the case with independent and non-identically distributed data streams, the algorithm achieves mean square consistency provided the marginal probability measures induced by the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
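
As a concrete reference point, online regularized learning in an RKHS is typically a stochastic-gradient recursion of the form below; this is a standard instance written from the abstract's ingredients, and the paper's exact algorithm and step-size conditions may differ in details.

```latex
% One recursive regularized learning step in an RKHS with kernel K:
\[
  f_{t+1} \;=\; f_t \;-\; a_t \Big( \big( f_t(x_t) - y_t \big)\, K(x_t, \cdot) \;+\; \lambda_t f_t \Big),
\]
% where (x_t, y_t) is the online sample at time t, a_t > 0 is the step size,
% and \lambda_t > 0 is the Tikhonov regularization parameter. The random
% regularization path is then the sequence of minimizers of the corresponding
% regularized least-squares objectives, which the iterate is shown to track.
```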

Updated: 2024-04-04 05:35:59

标题: 在线正则化统计学习在含非平稳数据的再生核希尔伯特空间中的收敛条件

摘要: 我们研究了在具有依赖性和非平稳在线数据流的再生核希尔伯特空间(RKHS)中递归正规化学习算法的收敛性。首先,我们研究了RKHS中一类随机差分方程的均方渐近稳定性,其非齐次项是依赖于齐次项的鞅差分序列。其次,我们引入了随机Tikhonov正则化路径的概念,并证明如果正则化路径在某种意义上缓慢变化,那么算法的输出与正则化路径在均方上是一致的。此外,如果数据流还满足RKHS持续激励条件,即存在一个固定长度的时间段,使得每个时间段累积的输入数据诱导的操作符的条件期望的每个特征值都在时间上具有均匀正的下界,那么算法的输出与未知函数在均方上是一致的。最后,对于具有独立且非同分布数据流的情况,如果由输入数据诱导的边缘概率测度在时间上缓慢变化,并且每个固定长度时间段上的平均测度具有统一严格正的下界,则算法实现了均方一致性。

更新时间: 2024-04-04 05:35:59

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03211v1

Planning and Editing What You Retrieve for Enhanced Tool Learning

Recent advancements in integrating external tools with Large Language Models (LLMs) have opened new frontiers, with applications in mathematical reasoning, code generators, and smart assistants. However, existing methods, relying on simple one-time retrieval strategies, fall short on effectively and accurately shortlisting relevant tools. This paper introduces a novel PLUTO (Planning, Learning, and Understanding for TOols) approach, encompassing Plan-and-Retrieve (P&R) and Edit-and-Ground (E&G) paradigms. The P&R paradigm consists of a neural retrieval module for shortlisting relevant tools and an LLM-based query planner that decomposes complex queries into actionable tasks, enhancing the effectiveness of tool utilization. The E&G paradigm utilizes LLMs to enrich tool descriptions based on user scenarios, bridging the gap between user queries and tool functionalities. Experimental results demonstrate that these paradigms significantly improve recall and NDCG in tool retrieval tasks, surpassing current state-of-the-art models.
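
A minimal Plan-and-Retrieve loop might look like the sketch below: a planner decomposes the query into subtasks and a dense retriever shortlists tools per subtask. Here `plan_subtasks` and `embed` are crude placeholders for an LLM call and a neural encoder, and the tool catalog is invented for illustration.

```python
import numpy as np

TOOLS = {
    "unit_converter": "convert between metric and imperial units",
    "calculator": "evaluate arithmetic expressions",
    "web_search": "look up facts on the public web",
}

def embed(text: str) -> np.ndarray:
    """Placeholder encoder: hash words into a small dense vector."""
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def plan_subtasks(query: str) -> list[str]:
    """Placeholder for the LLM-based query planner."""
    return [s.strip() for s in query.split(" and ")]

def plan_and_retrieve(query: str, k: int = 1):
    names = list(TOOLS)
    tool_vecs = np.stack([embed(TOOLS[n]) for n in names])
    shortlist = []
    for task in plan_subtasks(query):
        scores = tool_vecs @ embed(task)           # cosine similarity per tool
        top = [names[i] for i in np.argsort(scores)[::-1][:k]]
        shortlist.append((task, top))
    return shortlist

print(plan_and_retrieve("convert 5 miles to km and evaluate arithmetic 3*7"))
```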

Updated: 2024-04-04 05:33:07

标题: 计划和编辑检索到的内容,以增强工具学习

摘要: 最近在将外部工具与大规模语言模型(LLMs)集成方面取得了重大进展,开拓了新的领域,应用于数学推理、代码生成器和智能助手。然而,现有方法依赖简单的一次性检索策略,无法有效和准确地筛选出相关工具。本文介绍了一种新颖的PLUTO(Planning, Learning, and Understanding for TOols)方法,包括“规划和检索(P&R)”和“编辑和落实(E&G)”范式。P&R范式包括一个神经检索模块,用于筛选相关工具,以及一个基于LLM的查询规划器,将复杂查询分解为可操作的任务,增强工具利用效果。E&G范式利用LLMs根据用户场景丰富工具描述,弥合用户查询与工具功能之间的差距。实验结果表明,这些范式显著提高了工具检索任务中的召回率和NDCG,明显超越了当前最先进的模型。

更新时间: 2024-04-04 05:33:07

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2404.00450v2

Multimodal hierarchical multi-task deep learning framework for jointly predicting and explaining Alzheimer disease progression

Early identification of Mild Cognitive Impairment (MCI) subjects who will eventually progress to Alzheimer Disease (AD) is challenging. Existing deep learning models are mostly single-modality single-task models predicting risk of disease progression at a fixed timepoint. We proposed a multimodal hierarchical multi-task learning approach which can monitor the risk of disease progression at each timepoint of the visit trajectory. Longitudinal visit data from multiple modalities (MRI, cognition, and clinical data) were collected from MCI individuals of the Alzheimer Disease Neuroimaging Initiative (ADNI) dataset. Our hierarchical model predicted at every timepoint a set of neuropsychological composite cognitive function scores as auxiliary tasks and used the forecasted scores at every timepoint to predict the future risk of disease. Relevance weights for each composite function provided explanations about potential factors for disease progression. Our proposed model performed better than state-of-the-art baselines in predicting AD progression risk and the composite scores. An ablation study on the number of modalities demonstrated that imaging and cognition data contributed most towards the outcome. Model explanations at each timepoint can inform clinicians, 6 months in advance, of the potential cognitive function decline that can lead to progression to AD in the future. Our model monitors the risk of AD progression every 6 months throughout each individual's visit trajectory. The hierarchical learning of auxiliary tasks allowed better optimization and longitudinal explanations for the outcome. Our framework is flexible with the number of input modalities and the selection of auxiliary tasks and hence can be generalized to other clinical problems too.

Updated: 2024-04-04 05:30:03

标题: 多模态分层多任务深度学习框架,用于联合预测和解释阿尔茨海默病的进展

摘要: 早期识别轻度认知障碍(MCI)患者最终可能发展为阿尔茨海默病(AD)是具有挑战性的。现有的深度学习模型大多是单模态单任务模型,预测在固定时间点疾病进展的风险。我们提出了一种多模态分层多任务学习方法,可以监测每次访问轨迹的时间点上疾病进展的风险。从阿尔茨海默病神经影像学倡议(ADNI)数据集的MCI个体中收集了多模态(MRI、认知和临床数据)的纵向访问数据。我们的分层模型在每个时间点预测了一组神经心理学综合认知功能评分作为辅助任务,并利用每个时间点的预测分数来预测未来疾病的风险。每个组合功能的相关权重提供了关于疾病进展潜在因素的解释。我们提出的模型在预测AD进展风险和组合评分方面优于最先进的基线。对模态数量的消融研究表明,影像和认知数据对结果的贡献最大。每个时间点的模型解释可以提前6个月告知临床医生未来可能导致AD进展的认知功能下降。我们的模型在个体的访问轨迹中每6个月监测其AD进展的风险。辅助任务的分层学习使优化更好,并允许为结果提供纵向解释。我们的框架对输入模态的数量和辅助任务的选择具有灵活性,因此也可以推广到其他临床问题中。

更新时间: 2024-04-04 05:30:03

领域: cs.LG

下载: http://arxiv.org/abs/2404.03208v1

Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or knowledge of label distribution before the attack. In this work, we bridge the gap by proposing InferGuard, a novel Byzantine-robust aggregation rule aimed at defending against client-side training data distribution inference attacks. In our proposed InferGuard, the server first calculates the coordinate-wise median of all the model updates it receives. A client's model update is considered malicious if it significantly deviates from the computed median update. We conduct a thorough evaluation of our proposed InferGuard on five benchmark datasets and perform a comparison with ten baseline methods. The results of our experiments indicate that our defense mechanism is highly effective in protecting against client-side training data distribution inference attacks, even against strong adaptive attacks. Furthermore, our method substantially outperforms the baseline methods in various practical FL scenarios.
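
The aggregation rule, as described in the abstract, is compact enough to sketch: compute the coordinate-wise median of the client updates and discard updates that deviate too far from it. The specific deviation threshold below is an illustrative assumption, not the paper's exact rule.

```python
import numpy as np

def inferguard_aggregate(updates: np.ndarray, tau: float = 2.0) -> np.ndarray:
    """updates: (n_clients, n_params). Returns the mean of the kept updates."""
    median = np.median(updates, axis=0)                # coordinate-wise median
    dists = np.linalg.norm(updates - median, axis=1)   # deviation per client
    keep = dists <= tau * np.median(dists)             # flag large deviations as malicious
    return updates[keep].mean(axis=0)

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 0.1, size=(9, 5))
malicious = np.full((1, 5), 5.0)                       # crafted outlier update
agg = inferguard_aggregate(np.vstack([benign, malicious]))
print(agg)   # close to the benign mean; the malicious update is filtered out
```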

Updated: 2024-04-04 05:23:39

标题: 强大的联邦学习减轻客户端训练数据分布推断攻击

摘要: 最近的研究发现,联邦学习(FL)曾被认为安全,因为客户端不与服务器共享其私人数据,但容易受到攻击,如客户端端训练数据分布推断,恶意客户端可以重建受害者的数据。虽然存在各种对策,但它们并不实用,通常假设服务器在攻击之前可以访问一些训练数据或知道标签分布。 在这项工作中,我们通过提出InferGuard,一个新颖的拜占庭鲁棒聚合规则,旨在抵御客户端端训练数据分布推断攻击,弥合了这一差距。在我们提出的InferGuard中,服务器首先计算接收到的所有模型更新的坐标中值。如果客户端的模型更新与计算的中值更新明显偏离,则认为其是恶意的。我们在五个基准数据集上对我们提出的InferGuard进行了深入评估,并与十种基准方法进行了比较。我们实验的结果表明,我们的防御机制在保护对抗客户端端训练数据分布推断攻击方面非常有效,甚至对抗强适应性攻击也能很好地应对。此外,我们的方法在各种实际的FL场景中都明显优于基准方法。

更新时间: 2024-04-04 05:23:39

领域: cs.CR,cs.DC,cs.LG

下载: http://arxiv.org/abs/2403.03149v2

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

Autonomous driving (AD) has made significant strides in recent years. However, existing frameworks struggle to interpret and execute spontaneous user instructions, such as "overtake the car ahead." Large Language Models (LLMs) have demonstrated impressive reasoning capabilities showing potential to bridge this gap. In this paper, we present LaMPilot, a novel framework that integrates LLMs into AD systems, enabling them to follow user instructions by generating code that leverages established functional primitives. We also introduce LaMPilot-Bench, the first benchmark dataset specifically designed to quantitatively evaluate the efficacy of language model programs in AD. Adopting the LaMPilot framework, we conduct extensive experiments to assess the performance of off-the-shelf LLMs on LaMPilot-Bench. Our results demonstrate the potential of LLMs in handling diverse driving scenarios and following user instructions in driving. To facilitate further research in this area, we release our code and data at https://github.com/PurdueDigitalTwin/LaMPilot.

Updated: 2024-04-04 05:20:11

标题: LaMPilot:一份用于自动驾驶的开放基准数据集,带有语言模型程序

摘要: 自动驾驶(AD)在近年取得了显著进展。然而,现有框架难以解释和执行用户的即时指令,例如“超越前方的车辆”。大型语言模型(LLMs)显示出令人印象深刻的推理能力,有潜力弥合这一差距。本文介绍了LaMPilot,一个将LLMs集成到AD系统中的新框架,使其能够通过生成利用已建立的功能原语的代码来遵循用户指令。我们还介绍了LaMPilot-Bench,这是第一个专门设计用于定量评估语言模型程序在AD中有效性的基准数据集。采用LaMPilot框架,我们进行了广泛实验,评估了现成的LLMs在LaMPilot-Bench上的性能。我们的结果表明LLMs在处理各种驾驶场景和遵循用户驾驶指令方面具有潜力。为了促进该领域的进一步研究,我们在https://github.com/PurdueDigitalTwin/LaMPilot 上发布了我们的代码和数据。

更新时间: 2024-04-04 05:20:11

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2312.04372v2

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $6.3\%$ (without reranking) and $2.1\%$ (with reranking) to $2.8\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.
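
The duration-guided attention idea can be illustrated with a toy mask: the predicted phoneme durations determine, for each speech-token step, which phonemes the model may attend to. The window size and durations below are invented, and the paper integrates this into Transformer self-attention weights rather than as the hard mask shown here.

```python
import numpy as np

def duration_guided_mask(durations, window=1):
    """durations[i] = predicted number of speech tokens for phoneme i.
    Returns a (T, P) boolean mask over (speech step, phoneme)."""
    T, P = int(sum(durations)), len(durations)
    centers = np.repeat(np.arange(P), durations)   # aligned phoneme per speech step
    mask = np.zeros((T, P), dtype=bool)
    for t, c in enumerate(centers):
        lo, hi = max(0, c - window), min(P, c + window + 1)
        mask[t, lo:hi] = True                      # attend only near the aligned phoneme
    return mask

print(duration_guided_mask([2, 3, 1]).astype(int))
```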

Updated: 2024-04-04 05:15:07

标题: RALL-E:用于文本到语音合成的鲁棒编解码器语言建模和思维链提示

摘要: 我们提出了一种强大的语言建模方法RALL-E,用于文本到语音(TTS)合成。尽管先前基于大型语言模型(LLMs)的工作在零样本TTS上表现出色,但这种方法常常受到稳定性差的困扰,如不稳定的韵律(奇怪的音高和节奏/持续时间)和高的词错误率(WER),这是由于语言模型的自回归预测风格导致的。RALL-E背后的核心思想是链式思维(CoT)提示,将任务分解为更简单的步骤,以增强基于LLM的TTS的稳健性。为了实现这一想法,RALL-E首先预测输入文本的韵律特征(音高和持续时间),并将它们用作预测语音令牌的中间条件,以CoT风格。其次,RALL-E利用预测的持续时间提示来指导Transformer中自注意力权重的计算,以强制模型在预测语音令牌时专注于相应的音素和韵律特征。全面客观和主观评估的结果表明,与强大的基线方法VALL-E相比,RALL-E显著改善了零样本TTS的WER,分别从$6.3\%$(无重新排序)和$2.1\%$(重新排序后)提高到$2.8\%$和$1.0\%$。此外,我们证明了RALL-E正确合成了对VALL-E来说困难的句子,并将错误率从$68\%$降低到$4\%$。

更新时间: 2024-04-04 05:15:07

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2404.03204v1

Crowdsourcing Fraud Detection over Heterogeneous Temporal MMMA Graph

The rise of the click farm business using Multi-purpose Messaging Mobile Apps (MMMAs) tempts cybercriminals to perpetrate crowdsourcing frauds that cause financial losses to click farm workers. In this paper, we propose a novel contrastive multi-view learning method named CMT for crowdsourcing fraud detection over the heterogeneous temporal graph (HTG) of an MMMA. CMT captures both the heterogeneity and the dynamics of the HTG and generates high-quality representations for crowdsourcing fraud detection in a self-supervised manner. We deploy CMT to detect crowdsourcing frauds on an industry-scale HTG of WeChat, a representative MMMA, and it significantly outperforms other methods. CMT also shows promising results for fraud detection on a large-scale public financial HTG, indicating that it can be applied in other graph anomaly detection tasks. We provide our implementation at https://github.com/KDEGroup/CMT.

Updated: 2024-04-04 05:10:06

标题: 众包欺诈检测在异构时间MMMA图上

摘要: 随着多功能消息移动应用程序(MMMA)的兴起,点击农场业务诱使网络犯罪分子实施众包欺诈,给点击农场工作者造成财务损失。在本文中,我们提出了一种名为CMT的新型对比多视图学习方法,用于在MMMA的异构时间图(HTG)上进行众包欺诈检测。CMT捕捉了HTG的异质性和动态性,并以自监督方式生成了高质量的表示,用于众包欺诈检测。我们部署了CMT来检测代表性MMMA微信的行业规模HTG上的众包欺诈,它明显优于其他方法。CMT在大规模公共财务HTG上的欺诈检测也显示出有希望的结果,表明它可以应用于其他图异常检测任务。我们将我们的实现提供在https://github.com/KDEGroup/CMT。

更新时间: 2024-04-04 05:10:06

领域: cs.SI,cs.AI

下载: http://arxiv.org/abs/2308.02793v2

Future-Proofing Class Incremental Learning

Exemplar-Free Class Incremental Learning is a highly challenging setting where replay memory is unavailable. Methods relying on frozen feature extractors have drawn attention recently in this setting due to their impressive performances and lower computational costs. However, those methods are highly dependent on the data used to train the feature extractor and may struggle when an insufficient amount of classes are available during the first incremental step. To overcome this limitation, we propose to use a pre-trained text-to-image diffusion model in order to generate synthetic images of future classes and use them to train the feature extractor. Experiments on the standard benchmarks CIFAR100 and ImageNet-Subset demonstrate that our proposed method can be used to improve state-of-the-art methods for exemplar-free class incremental learning, especially in the most difficult settings where the first incremental step only contains few classes. Moreover, we show that using synthetic samples of future classes achieves higher performance than using real data from different classes, paving the way for better and less costly pre-training methods for incremental learning.

Updated: 2024-04-04 05:08:51

标题: 面向未来的类增量学习

摘要: 无范例类增量学习是一个极具挑战性的设置,其中回放记忆不可用。最近在这种情况下,依赖冻结特征提取器的方法引起了关注,因为它们表现出色并且计算成本较低。然而,这些方法高度依赖用于训练特征提取器的数据,当第一个增量步骤中可用的类别数量不足时可能会遇到困难。为了克服这个限制,我们建议使用预训练的文本到图像扩散模型生成未来类别的合成图像,并使用它们来训练特征提取器。在标准基准测试CIFAR100和ImageNet-Subset上进行的实验表明,我们提出的方法可以用于改进无范例类增量学习的最新方法,特别是在第一个增量步骤中仅包含少量类别的最困难情况下。此外,我们展示了使用未来类别的合成样本比使用来自不同类别的真实数据获得更高的性能,为增量学习提供更好且成本更低的预训练方法铺平了道路。

更新时间: 2024-04-04 05:08:51

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2404.03200v1

DeepIPC: Deeply Integrated Perception and Control for an Autonomous Vehicle in Real Environments

In this work, we introduce DeepIPC, a novel end-to-end model tailored for autonomous driving, which seamlessly integrates perception and control tasks. Unlike traditional models that handle these tasks separately, DeepIPC innovatively combines a perception module, which processes RGBD images for semantic segmentation and generates bird's eye view (BEV) mappings, with a controller module that utilizes these insights along with GNSS and angular speed measurements to accurately predict navigational waypoints. This integration allows DeepIPC to efficiently translate complex environmental data into actionable driving commands. Our comprehensive evaluation demonstrates DeepIPC's superior performance in terms of drivability and multi-task efficiency across diverse real-world scenarios, setting a new benchmark for end-to-end autonomous driving systems with a leaner model architecture. The experimental results underscore DeepIPC's potential to significantly enhance autonomous vehicular navigation, promising a step forward in the development of autonomous driving technologies. For further insights and replication, we will make our code and datasets available at https://github.com/oskarnatan/DeepIPC.

Updated: 2024-04-04 04:52:43

标题: DeepIPC:在真实环境中为自主车辆深度集成感知和控制

摘要: 在这项工作中,我们介绍了DeepIPC,这是一种专为自动驾驶定制的全新端到端模型,无缝地整合了感知和控制任务。与传统模型分别处理这些任务的方式不同,DeepIPC创新地将一个处理RGBD图像进行语义分割并生成鸟瞰图(BEV)映射的感知模块与一个利用这些见解以及GNSS和角速度测量来精确预测导航航点的控制器模块无缝结合。这种整合使得DeepIPC能够高效地将复杂的环境数据转化为可操作的驾驶命令。我们的全面评估显示,DeepIPC在驾驶性能和多任务效率方面表现出优越的性能,跨越了各种真实场景,为端到端自动驾驶系统设定了一个新的基准,并具有更精简的模型结构。实验结果强调了DeepIPC显著提升自主车辆导航能力的潜力,为自动驾驶技术的发展迈出了一大步。为了进一步深入了解和复制,我们将在https://github.com/oskarnatan/DeepIPC 上提供我们的代码和数据集。

更新时间: 2024-04-04 04:52:43

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2207.09934v7

AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs

Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weight the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that were not seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
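
The additive form is straightforward: each comment's AQuA score is a weighted sum of its per-index adapter scores, with weights derived from the expert/non-expert correlation analysis. The three indices and weight values below are invented stand-ins for the paper's 20 indices.

```python
def aqua_score(index_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of deliberative-quality indices for one comment."""
    return sum(weights[name] * index_scores[name] for name in weights)

# Hypothetical indices and weights for illustration only.
weights = {"justification": 0.25, "storytelling": 0.08, "respect": 0.17}
comment_scores = {"justification": 0.9, "storytelling": 0.2, "respect": 0.7}
print(f"AQuA = {aqua_score(comment_scores, weights):.3f}")
```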

Updated: 2024-04-04 04:34:31

标题: AQuA - 结合专家和非专家观点,利用LLMs评估在线讨论中的审慎质量

摘要: 在政治在线讨论中衡量贡献质量对于协商研究和计算机科学至关重要。研究已经确定了各种指标来评估在线讨论质量,并借助深度学习的进展,自动化这些测量变得可行。虽然一些研究着重于分析特定质量指标,但通常更倾向于包含各种协商方面的综合质量评分。在这项工作中,我们引入了AQuA,这是一个附加分数,它从多个指标中计算出每个讨论帖子的统一协商质量分数。与其他单一分数不同,AQuA保留了评论中存在的协商方面的信息,增强了模型的透明度。我们为20个协商指标开发了适配器模型,并计算了专家注释与非专家对协商性的感知之间的相关系数,以将各个指标加权为单一协商分数。我们证明了AQuA分数可以轻松地从预训练的适配器中计算出,并且与在训练期间未见过的其他数据集上的注释很好地吻合。专家与非专家注释的分析证实了社会科学文献中的理论发现。

更新时间: 2024-04-04 04:34:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.02761v2

Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation

In self-supervised monocular depth estimation tasks, discrete disparity prediction has been proven to attain higher quality depth maps than common continuous methods. However, current discretization strategies often divide depth ranges of scenes into bins in a handcrafted and rigid manner, limiting model performance. In this paper, we propose a learnable module, Adaptive Discrete Disparity Volume (ADDV), which is capable of dynamically sensing depth distributions in different RGB images and generating adaptive bins for them. Without any extra supervision, this module can be integrated into existing CNN architectures, allowing networks to produce representative values for bins and a probability volume over them. Furthermore, we introduce novel training strategies - uniformizing and sharpening - through a loss term and temperature parameter, respectively, to provide regularizations under self-supervised conditions, preventing model degradation or collapse. Empirical results demonstrate that ADDV effectively processes global information, generating appropriate bins for various scenes and producing higher quality depth maps compared to handcrafted methods.
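
The discrete-disparity readout that ADDV makes adaptive can be sketched as follows: per-pixel logits over learned bins form a probability volume, and the predicted disparity is the probability-weighted sum of bin values; a temperature parameter stands in for the sharpening regularization mentioned above. Shapes and values are illustrative.

```python
import numpy as np

def expected_disparity(logits, bin_values, temperature=1.0):
    """logits: (H, W, K) scores over K bins; bin_values: (K,) learned bin centers."""
    z = logits / temperature                      # lower temperature => sharper distribution
    p = np.exp(z - z.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)            # softmax probability volume
    return p @ bin_values                         # (H, W) expected disparity

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 8))
bins = np.sort(rng.uniform(0.1, 10.0, size=8))   # adaptive, scene-dependent bins
print(expected_disparity(logits, bins, temperature=0.5).round(2))
```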

Updated: 2024-04-04 04:22:25

标题: 自监督单眼深度估计的自适应离散视差体积

摘要: 在自监督单目深度估计任务中,已经证明离散视差预测比常见的连续方法能够获得更高质量的深度图。然而,当前的离散化策略通常以手工制定和刚性方式将场景的深度范围划分为区间,限制了模型的性能。在本文中,我们提出了一个可学习的模块,自适应离散视差体积(ADDV),能够动态感知不同RGB图像中的深度分布并为其生成自适应的区间。在没有额外监督的情况下,这个模块可以集成到现有的CNN架构中,使网络能够为区间生成代表性值和概率分布。此外,我们引入了新的训练策略 - 均匀化和锐化 - 通过损失项和温度参数,分别在自监督条件下提供正则化,防止模型恶化或崩溃。实证结果表明,ADDV有效处理全局信息,为各种场景生成适当的区间,并产生比手工方法更高质量的深度图。

更新时间: 2024-04-04 04:22:25

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2404.03190v1

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

In order to oversee advanced AI systems, it is important to understand their underlying decision-making process. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations are faithful, i.e., truly capture the factors responsible for the model's predictions. In this work, we introduce Correlational Explanatory Faithfulness (CEF), a metric that can be used in faithfulness tests based on input interventions. Previous metrics used in such tests take into account only binary changes in the predictions. Our metric accounts for the total shift in the model's predicted label distribution, more accurately reflecting the explanations' faithfulness. We then introduce the Correlational Counterfactual Test (CCT) by instantiating CEF on the Counterfactual Test (CT) from Atanasova et al. (2023). We evaluate the faithfulness of free-text explanations generated by few-shot-prompted LLMs from the Llama2 family on three NLP tasks. We find that our metric measures aspects of faithfulness which the CT misses.
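
One simplified reading of CEF, sketched below with our own choices of distance and correlation measure: across a set of input interventions, correlate the total shift in the model's predicted label distribution (here, total variation distance) with a measure of how much the explanation changed. The paper's exact estimator may differ.

```python
import numpy as np

def cef(pred_before, pred_after, expl_change):
    """pred_*: (n, C) label distributions per intervention; expl_change: (n,)."""
    tv = 0.5 * np.abs(pred_after - pred_before).sum(axis=1)   # total prediction shift
    tv_c, ec_c = tv - tv.mean(), expl_change - expl_change.mean()
    return (tv_c @ ec_c) / (np.linalg.norm(tv_c) * np.linalg.norm(ec_c) + 1e-12)

rng = np.random.default_rng(0)
before = rng.dirichlet(np.ones(3), size=50)
after = rng.dirichlet(np.ones(3), size=50)
# Simulated explanation change that tracks the prediction shift, plus noise.
expl_change = 0.5 * np.abs(after - before).sum(axis=1) + rng.normal(0, 0.05, 50)
print(f"CEF ~ {cef(before, after, expl_change):.2f}")  # near 1: faithful
```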

Updated: 2024-04-04 04:20:04

标题: 概率也很重要:大型语言模型中自由文本解释忠实度的更准确度量方式

摘要: 为了监督先进的人工智能系统,了解其潜在的决策过程至关重要。在被要求时,大型语言模型(LLMs)可以提供自然语言解释或推理过程,听起来似乎合理,并得到人类注释者的高评分。然而,目前尚不清楚这些解释在多大程度上是忠实的,即是否真正捕捉到了影响模型预测的因素。在这项工作中,我们引入了相关性解释忠实度(CEF),这是一个可以在基于输入干预的忠实度测试中使用的度量。先前在此类测试中使用的度量仅考虑预测中的二进制变化。我们的度量考虑了模型预测标签分布的总体变化,更准确地反映了解释的忠实度。然后,我们通过在Atanasova等人(2023年)的反事实测试(CT)上实例化CEF,引入了相关性反事实测试(CCT)。我们评估了Llama2系列中几次触发的LLMs生成的自由文本解释在三个自然语言处理任务上的忠实度。我们发现我们的度量衡量了CT遗漏的忠实度方面。

更新时间: 2024-04-04 04:20:04

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03189v1

Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture

Nasopharyngeal carcinoma (NPC) is one of the most understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic Bidayuh. NPC is often diagnosed late because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (LHP), nasopharyngeal carcinoma (NPC) and normal tissue. This paper is our first initiative to identify the difference between NPC, NPI and normal cases. Seven whole slide images (WSIs) with gigapixel resolutions from seven different patients and two hospitals were experimented with using two test setups, each consisting of a different set of images. The tissue regions are patched into smaller blocks and classified using a DenseNet architecture with 21 dense layers. Two tests are carried out, one for proof of concept (Test 1) and one for a real-test scenario (Test 2). The accuracy achieved for the NPC class is 94.8% for Test 1 and 67.0% for Test 2.

Updated: 2024-04-04 04:16:31

标题: 使用DenseNet深度学习架构对鼻咽病例进行分类

摘要: 鼻咽癌(NPC)是东南亚地区研究较少但致命的癌症之一。在马来西亚,患病率主要集中在沙捞越州的比达优族。由于早期没有症状,NPC通常被晚期诊断。鼻咽活检中有几种组织表现,如鼻咽炎(NPI)、淋巴组织增生(LHP)、鼻咽癌(NPC)和正常组织。这篇论文是我们首次尝试区分NPC、NPI和正常情况之间的差异。从两家不同医院的七名患者采集了分辨率为gigapixel的七个全切片图像(WSIs),采用两种不同的图像集进行实验。组织区域被划分为较小的块,并使用具有21个密集层的DenseNet架构进行分类。进行了两项测试,分别用于概念验证(测试1)和真实测试场景(测试2)。在测试1中,NPC类的准确率达到94.8%,在测试2中为67.0%。

更新时间: 2024-04-04 04:16:31

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.03188v1

Image Outlier Detection Without Training using RANSAC

Image outlier detection (OD) is an essential tool to ensure the quality of images used in computer vision tasks. Existing algorithms often involve training a model to represent the inlier distribution, and outliers are determined by some deviation measure. Although existing methods proved effective when trained on strictly inlier samples, their performance remains questionable when undesired outliers are included during training. As a result of this limitation, it is necessary to carefully examine the data when developing OD models for new domains. In this work, we present a novel image OD algorithm called RANSAC-NN that eliminates the need of data examination and model training altogether. Unlike existing approaches, RANSAC-NN can be directly applied on datasets containing outliers by sampling and comparing subsets of the data. Our algorithm maintains favorable performance compared to existing methods on a range of benchmarks. Furthermore, we show that RANSAC-NN can enhance the robustness of existing methods by incorporating our algorithm as part of the data preparation process.
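
A rough sketch of the sampling-and-comparison idea: score each image by its best feature similarity to randomly sampled subsets of the dataset; inliers keep finding similar neighbors while outliers do not, with no training involved. The features, subset size, and trial count are illustrative rather than the paper's exact procedure.

```python
import numpy as np

def ransac_nn_scores(feats, n_trials=50, subset=8, rng=None):
    """feats: (N, d) image features. Low score => likely outlier."""
    rng = rng or np.random.default_rng(0)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    scores = np.full(len(feats), -np.inf)
    for _ in range(n_trials):
        idx = rng.choice(len(feats), size=subset, replace=False)
        sims = feats @ feats[idx].T                             # cosine sims to the subset
        sims[np.arange(len(feats))[:, None] == idx] = -np.inf   # ignore self-matches
        scores = np.maximum(scores, sims.max(axis=1))           # best neighbor so far
    return scores

rng = np.random.default_rng(0)
inliers = rng.normal(0, 0.1, size=(95, 32)) + 1.0   # tight cluster of inlier features
outliers = rng.normal(0, 1.0, size=(5, 32))          # scattered outlier features
s = ransac_nn_scores(np.vstack([inliers, outliers]), rng=rng)
print("mean inlier score:", s[:95].mean(), "mean outlier score:", s[95:].mean())
```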

Updated: 2024-04-04 04:11:05

标题: 使用RANSAC进行无需训练的图像异常值检测

摘要: 图像异常值检测(OD)是确保计算机视觉任务中使用的图像质量的关键工具。现有算法通常涉及训练模型以表示正常值分布,并通过某些偏差度量确定异常值。尽管现有方法在严格训练正常值样本时被证明是有效的,但当在训练过程中包含不希望的异常值时,它们的性能仍然存在疑问。因此,有必要在开发新领域的OD模型时仔细检查数据。在这项工作中,我们提出了一种名为RANSAC-NN的新颖图像OD算法,它消除了对数据检查和模型训练的需求。与现有方法不同,RANSAC-NN可以直接应用于包含异常值的数据集,通过对数据子集进行抽样和比较。我们的算法在一系列基准测试中保持了有利的性能,此外,我们展示了RANSAC-NN可以通过将我们的算法作为数据准备过程的一部分来增强现有方法的稳健性。

更新时间: 2024-04-04 04:11:05

领域: cs.CV,cs.IR,cs.LG

下载: http://arxiv.org/abs/2307.12301v3

DeepIPCv2: LiDAR-powered Robust Environmental Perception and Navigational Control for Autonomous Vehicle

We present DeepIPCv2, an autonomous driving model that perceives the environment using a LiDAR sensor for more robust drivability, especially when driving under poor illumination conditions where everything is not clearly visible. DeepIPCv2 takes a set of LiDAR point clouds as the main perception input. Since point clouds are not affected by illumination changes, they can provide a clear observation of the surroundings no matter what the condition is. This results in a better scene understanding and stable features provided by the perception module to support the controller module in estimating navigational control properly. To evaluate its performance, we conduct several tests by deploying the model to predict a set of driving records and perform real automated driving under three different conditions. We also conduct ablation and comparative studies with some recent models to justify its performance. Based on the experimental results, DeepIPCv2 shows a robust performance by achieving the best drivability in all driving scenarios. Furthermore, to support future research, we will upload the codes and data to https://github.com/oskarnatan/DeepIPCv2.

Updated: 2024-04-04 04:07:48

标题: DeepIPCv2:基于激光雷达的自主车辆稳健环境感知和导航控制

摘要: 我们提出了DeepIPCv2,这是一个自主驾驶模型,通过使用LiDAR传感器来感知环境,以实现更加稳健的驾驶性能,特别是在光照条件较差的情况下,其中一切并不清晰可见。DeepIPCv2以一组LiDAR点云作为主要感知输入。由于点云不受照明变化影响,无论条件如何,它们都可以提供对周围环境的清晰观察。这导致了对场景的更好理解,感知模块提供的稳定特征能够支持控制模块准确估计导航控制。为了评估其性能,我们进行了几项测试,通过部署该模型来预测一组驾驶记录并在三种不同条件下进行真实自动驾驶。我们还进行了消融和比较研究,以证明其性能。根据实验结果,DeepIPCv2在所有驾驶场景中均表现出稳健的性能,实现了最佳的驾驶性能。此外,为了支持未来的研究,我们将代码和数据上传至https://github.com/oskarnatan/DeepIPCv2。

更新时间: 2024-04-04 04:07:48

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2307.06647v3

The Death of Feature Engineering? BERT with Linguistic Features on SQuAD 2.0

Machine reading comprehension is an essential natural language processing task, which takes a context-query pair as input and predicts the corresponding answer to the query. In this project, we developed an end-to-end question answering model incorporating BERT and additional linguistic features. We conclude that the BERT base model is improved by incorporating the features. The EM score and F1 score are improved by 2.17 and 2.14 points compared with BERT (base). Our best single model reaches an EM score of 76.55 and an F1 score of 79.97 on the hidden test set. Our error analysis also shows that the linguistic architecture helps the model understand the context better, in that it can locate answers for which the BERT-only model wrongly predicted "No Answer".

Updated: 2024-04-04 03:50:34

标题: 特征工程的消亡?在SQuAD 2.0上使用具有语言特征的BERT

摘要: 机器阅读理解是一项重要的自然语言处理任务,它接受一对上下文和查询,并预测相应的查询答案。在这个项目中,我们开发了一个集成了BERT和额外语言特征的端到端问答模型。我们得出结论,通过集成这些特征,BERT基础模型将得到改进。与BERT(基础)相比,EM分数和F1分数分别提高了2.17和2.14。我们最佳的单一模型在隐藏测试集中达到了EM分数76.55和F1分数79.97。我们的错误分析还显示,语言架构有助于模型更好地理解上下文,它可以定位BERT模型错误预测为“无答案”的答案。

更新时间: 2024-04-04 03:50:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03184v1

Missing Data Imputation Based on Structural Equation Modeling Enhanced with Self-Attention

Addressing missing data in complex datasets like Electronic Health Records (EHR) is critical for ensuring accurate analysis and decision-making in healthcare. This paper proposes Structural Equation Modeling (SEM) enhanced with the Self-Attention method (SESA), an innovative approach for data imputation in EHR. SESA innovates beyond traditional SEM-based methods by incorporating self-attention mechanisms, enhancing the model's adaptability and accuracy across diverse EHR datasets. This enhancement allows SESA to dynamically adjust and optimize imputation processes, overcoming the limitations of static SEM frameworks. Our experimental analyses demonstrate that SESA achieves robust predictive performance, effectively handling missing data in EHR. Moreover, SESA's architecture not only rectifies potential mis-specifications in SEM but also synergizes with causal discovery algorithms to refine its imputation logic based on underlying data structures. These features highlight SESA's advanced capabilities and its potential for broader application in EHR data analysis and beyond, marking a significant leap forward in the field of data imputation.

Updated: 2024-04-04 03:38:32

标题: 基于结构方程建模的缺失数据插补,结合自注意力机制增强

摘要: 复杂数据集如电子健康记录(EHR)中的缺失数据对于确保医疗保健领域准确分析和决策至关重要。本文提出了结构方程模型(SEM)增强自注意力方法(SESA),这是一种创新的方法,用于EHR中的数据插补。SESA通过整合自注意力机制,超越传统基于SEM的方法,增强了模型在不同EHR数据集上的适应性和准确性。这种增强使SESA能够动态调整和优化插补过程,克服静态SEM框架的局限性。我们的实验分析表明,SESA实现了强大的预测性能,有效处理EHR中的缺失数据。此外,SESA的架构不仅纠正了SEM中潜在的错误规范,而且与因果发现算法协同作用,根据底层数据结构优化其插补逻辑。这些特性突显了SESA的先进能力和在EHR数据分析及其他领域广泛应用的潜力,标志着数据插补领域的重大飞跃。

更新时间: 2024-04-04 03:38:32

领域: cs.LG

下载: http://arxiv.org/abs/2308.12388v3

Goldfish: An Efficient Federated Unlearning Framework

With recent legislation on the right to be forgotten, machine unlearning has emerged as a crucial research area. It facilitates the removal of a user's data from federated trained machine learning models without the necessity for retraining from scratch. However, current machine unlearning algorithms are confronted with challenges of efficiency and validity. To address the above issues, we propose a new framework, named Goldfish. It comprises four modules: basic model, loss function, optimization, and extension. To address the challenge of low validity in existing machine unlearning algorithms, we propose a novel loss function. It takes into account the loss arising from the discrepancy between predictions and actual labels in the remaining dataset. Simultaneously, it takes into consideration the bias of predicted results on the removed dataset. Moreover, it accounts for the confidence level of predicted results. Additionally, to enhance efficiency, we adopt a knowledge distillation technique in the basic model and introduce an optimization module that encompasses an early termination mechanism guided by empirical risk and a data partition mechanism. Furthermore, to bolster the robustness of the aggregated model, we propose an extension module that incorporates a mechanism using adaptive distillation temperature to address the heterogeneity of user local data and a mechanism using adaptive weights to handle the variety in the quality of uploaded models. Finally, we conduct comprehensive experiments to illustrate the effectiveness of the proposed approach.

Updated: 2024-04-04 03:29:41

标题: 金鱼:一种高效的联邦遗忘框架

摘要: 随着最近有关被遗忘权的立法,机器去学习已经成为一个关键的研究领域。它有助于从联邦训练的机器学习模型中删除用户数据,而无需从头重新训练。然而,当前的机器去学习算法面临效率和有效性方面的挑战。为了解决上述问题,我们提出了一个名为Goldfish的新框架。它包括四个模块:基本模型、损失函数、优化和扩展。为了解决现有机器去学习算法中有效性不足的挑战,我们提出了一种新颖的损失函数。它考虑了由于剩余数据集中预测和实际标签之间的差异而产生的损失。同时,它考虑了在移除数据集上预测结果的偏差。此外,它考虑了预测结果的置信水平。此外,为了提高效率,我们在基本模型中采用了知识蒸馏技术,并引入了一个优化模块,其中包括由经验风险引导的早期终止机制和数据分区机制。此外,为了增强聚合模型的鲁棒性,我们提出了一个扩展模块,其中包括使用自适应蒸馏温度来处理用户本地数据的异质性的机制,以及使用自适应权重来处理上传模型质量的多样性的机制。最后,我们进行了全面的实验来说明所提出方法的有效性。

更新时间: 2024-04-04 03:29:41

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2404.03180v1

Information-Theoretic Generalization Bounds for Deep Neural Networks

Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the train and test distributions of the network internal representations. The KL divergence bound shrinks as the layer index increases, while the Wasserstein bound implies the existence of a layer that serves as a generalization funnel, which attains a minimal 1-Wasserstein distance. Analytic expressions for both bounds are derived under the setting of binary Gaussian classification with linear DNNs. To quantify the contraction of the relevant information measures when moving deeper into the network, we analyze the strong data processing inequality (SDPI) coefficient between consecutive layers of three regularized DNN models: Dropout, DropConnect, and Gaussian noise injection. This enables refining our generalization bounds to capture the contraction as a function of the network architecture parameters. Specializing our results to DNNs with a finite parameter space and the Gibbs algorithm reveals that deeper yet narrower network architectures generalize better in those examples, although how broadly this statement applies remains a question.
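
For orientation, bounds of this kind refine the classical input-output mutual information bound of Xu and Raginsky (2017); we state it here for contrast, with the caveat that the paper's layer-wise bounds are not derived in this exact form.

```latex
% Classical bound: for a sigma-sub-Gaussian loss and n i.i.d. training samples,
\[
  \big| \mathbb{E}\big[\mathrm{gen}(W, S)\big] \big|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
% where W is the learned hypothesis, S the training set, and I(W; S) their
% mutual information. The hierarchical bounds above swap this single
% dependence measure for layer-wise KL or 1-Wasserstein distances between the
% train and test distributions of internal representations, which is what
% allows the bound to shrink with depth.
```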

Updated: 2024-04-04 03:20:35

标题: 深度神经网络的信息论泛化界限

摘要: 深度神经网络(DNNs)在实际应用中表现出异常的泛化能力。本研究旨在通过信息论泛化界限来捕捉深度对监督学习的影响和好处。我们首先推导了两个层次性的泛化误差界限,其中以Kullback-Leibler(KL)散度或网络内部表示的训练和测试分布之间的1-Wasserstein距离为基础。KL散度界限随着层索引的增加而缩小,而Wasserstein界限暗示存在一个层作为泛化漏斗,达到最小的1-Wasserstein距离。在线性DNNs的二元高斯分类设置下推导了两个界限的分析表达式。为了量化在网络深入时相关信息度量的收缩,我们分析了三种正则化DNN模型(Dropout、DropConnect和高斯噪声注入)连续层之间的强数据处理不等式(SDPI)系数。这使得我们能够将我们的泛化界限细化为捕捉作为网络架构参数函数的收缩。将我们的结果专门应用于具有有限参数空间和Gibbs算法的DNNs揭示,尽管这种说法适用范围如何仍然是一个问题,但更深而较窄的网络架构在这些示例中表现出更好的泛化能力。

更新时间: 2024-04-04 03:20:35

领域: cs.LG,cs.IT,math.IT

下载: http://arxiv.org/abs/2404.03176v1

Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization

Large language models (LLMs) like Llama, Baichuan and Bloom show remarkable ability with instruction fine-tuning in many natural language tasks. Nevertheless, for the dialogue summarization task, which aims to generate summaries for different roles in a dialogue, most of the state-of-the-art methods are built on small models (e.g., Bart and Bert). Existing methods try to add task-specific optimizations to small models, such as adding a global-local centrality score to the model. In this paper, we propose an instruction fine-tuning model, Baichuan2-Sum, for role-oriented dialogue summarization. By setting different instructions for different roles, the model can learn from the dialogue interactions and output the expected summaries. Furthermore, we applied the NEFTune technique to add suitable noise during training to improve the results. The experiments demonstrate that the proposed model achieves new state-of-the-art results on two public dialogue summarization datasets: CSDS and SAMSUM. We release our model and related codes to facilitate future studies on the dialogue summarization task.
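
NEFTune itself is a one-line trick and easy to show: during training, add uniform noise to the token embeddings, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d, as in the NEFTune paper. The alpha value below is a commonly used default, not necessarily the one used in this work.

```python
import torch

def neftune_embed(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """embeddings: (batch, seq_len, dim) output of the embedding layer."""
    _, seq_len, dim = embeddings.shape
    scale = alpha / (seq_len * dim) ** 0.5
    noise = torch.empty_like(embeddings).uniform_(-1.0, 1.0) * scale
    return embeddings + noise            # apply during training only, not at inference

emb = torch.randn(2, 16, 64)
print(neftune_embed(emb).shape)          # torch.Size([2, 16, 64])
```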

Updated: 2024-04-04 03:15:15

标题: Baichuan2-Sum:指导微调Baichuan2-7B模型用于对话摘要

摘要: 大型语言模型(LLMs)如Llama、Baichuan和Bloom模型在许多自然语言任务中展现出出色的指导微调能力。然而,对于对话摘要任务,其目标是为对话中的不同角色生成摘要,大多数最先进的方法都是基于小型模型(例如Bart和Bert)进行的。现有方法尝试在小型模型上添加任务特定的优化,如向模型添加全局-局部中心度分数。在本文中,我们提出了一个指导微调模型:Baichuan2-Sum,用于面向角色的对话摘要。通过为不同角色设置不同的指导,模型可以从对话互动中学习并输出预期的摘要。此外,我们应用了NEFTune技术,在训练过程中添加适当的噪音以改善结果。实验表明,所提出的模型在两个公共对话摘要数据集(CSDS和SAMSUM)上实现了新的最先进结果。我们发布了我们的模型和相关代码,以促进未来对话摘要任务的研究。

更新时间: 2024-04-04 03:15:15

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.15496v3

Distributed Representations of Entities in Open-World Knowledge Graphs

Graph neural network (GNN)-based methods have demonstrated remarkable performance in various knowledge graph (KG) tasks. However, most existing approaches rely on observing all entities during training, posing a challenge in real-world knowledge graphs where new entities emerge frequently. To address this limitation, we introduce Decentralized Attention Network (DAN). DAN leverages neighbor context as the query vector to score the neighbors of an entity, thereby distributing the entity semantics only among its neighbor embeddings. To effectively train a DAN, we introduce self-distillation, a technique that guides the network in generating desired representations. Theoretical analysis validates the effectiveness of our approach. We implement an end-to-end framework and conduct extensive experiments to evaluate our method, showcasing competitive performance on conventional entity alignment and entity prediction tasks. Furthermore, our method significantly outperforms existing methods in open-world settings.
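
The core scoring step can be sketched in a few lines: build the query from the neighbor context (here simply the mean of the neighbor embeddings, an assumption on our part) and attend over the neighbors, so that an unseen entity is represented purely from its neighborhood. Learned projection matrices are omitted for brevity.

```python
import numpy as np

def dan_entity_repr(neighbor_embs: np.ndarray) -> np.ndarray:
    """neighbor_embs: (n_neighbors, d). Returns a d-dim entity representation."""
    query = neighbor_embs.mean(axis=0)                       # neighbor context as query
    scores = neighbor_embs @ query / np.sqrt(neighbor_embs.shape[1])
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                                       # attention over neighbors
    return attn @ neighbor_embs                              # semantics live in neighbor embeddings

rng = np.random.default_rng(0)
print(dan_entity_repr(rng.normal(size=(5, 8))).shape)        # (8,)
```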

Updated: 2024-04-04 03:12:50

标题: 开放世界知识图中实体的分布式表示

摘要: 基于图神经网络(GNN)的方法在各种知识图(KG)任务中展现出了显著的性能。然而,大多数现有方法依赖于在训练过程中观察所有实体,这在现实世界的知识图中新实体频繁出现时存在挑战。为了解决这一限制,我们引入了分散式注意力网络(DAN)。DAN利用邻居上下文作为查询向量对实体的邻居进行评分,从而仅在其邻居嵌入之间分发实体语义。为了有效训练DAN,我们引入了自我蒸馏技术,该技术指导网络生成所需表示。理论分析验证了我们方法的有效性。我们实现了一个端到端框架,并进行了大量实验来评估我们的方法,在传统的实体对齐和实体预测任务中展示出竞争性表现。此外,我们的方法在开放世界设置中明显优于现有方法。

更新时间: 2024-04-04 03:12:50

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2010.08114v2

Stable Anisotropic Regularization

Given the success of Large Language Models (LLMs), there has been considerable interest in studying the properties of model activations. The literature overwhelmingly agrees that LLM representations are dominated by a few "outlier dimensions" with exceedingly high variance and magnitude. Several studies in Natural Language Processing (NLP) have sought to mitigate the impact of such outlier dimensions and force LLMs to be isotropic (i.e., have uniform variance across all dimensions in embedding space). Isotropy is thought to be a desirable property for LLMs that improves model performance and more closely aligns textual representations with human intuition. However, many of the claims regarding isotropy in NLP have been based on the average cosine similarity of embeddings, which has recently been shown to be a flawed measure of isotropy. In this paper, we propose I-STAR: IsoScore*-based STable Anisotropic Regularization, a novel regularization method that can be used to increase or decrease levels of isotropy in embedding space during training. I-STAR uses IsoScore*, the first accurate measure of isotropy that is both differentiable and stable on mini-batch computations. In contrast to several previous works, we find that decreasing isotropy in contextualized embeddings improves performance on the majority of tasks and models considered in this paper.

Updated: 2024-04-04 03:04:12

标题: 稳定各向异性正则化

摘要: 鉴于大型语言模型(LLMs)的成功,人们对研究模型激活特性表现出了极大的兴趣。文献普遍认为LLM表示由少数“异常维度”主导,这些维度具有异常高的方差和幅度。自然语言处理(NLP)领域的几项研究试图减轻这些异常维度的影响,并迫使LLMs成为各向同性(即在嵌入空间的所有维度上具有均匀方差)。各向同性被认为是LLMs的一个理想特性,可以提高模型性能,并更接近人类直觉的文本表示。然而,关于NLP中各向同性的许多说法基于嵌入的平均余弦相似度,最近已经证明这是一个有缺陷的各向同性度量。在本文中,我们提出了I-STAR:基于IsoScore*的稳定各向异性正则化,这是一种新颖的正则化方法,可用于在训练过程中增加或减少嵌入空间的各向同性水平。I-STAR使用IsoScore*,这是第一个准确测量各向同性的指标,对小批量计算既可微分又稳定。与之前的几项工作相反,我们发现在上下文化嵌入中减少各向同性可以提高本文考虑的大多数任务和模型的性能。

更新时间: 2024-04-04 03:04:12

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2305.19358v3

Multi-modal Learning for WebAssembly Reverse Engineering

The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives the demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet, the generalization of task-specific ML solutions remains challenging, because their effectiveness hinges on the availability of an ample supply of high-quality task-specific labeled data. Moreover, previous works overlook the high-level semantics present in source code and its documentation. Acknowledging the abundance of available source code with documentation, which can be compiled into WebAssembly, we propose to learn representations of them concurrently and harness their mutual relationships for effective WebAssembly reverse engineering. In this paper, we present WasmRev, the first multi-modal pre-trained language model for WebAssembly reverse engineering. WasmRev is pre-trained using self-supervised learning on a large-scale multi-modal corpus encompassing source code, code documentation and the compiled WebAssembly, without requiring labeled data. WasmRev incorporates three tailored multi-modal pre-training tasks to capture various characteristics of WebAssembly and cross-modal relationships. WasmRev is only trained once to produce general-purpose representations that can broadly support WebAssembly reverse engineering tasks through few-shot fine-tuning with much less labeled data, improving data efficiency. We fine-tune WasmRev onto three important reverse engineering tasks: type recovery, function purpose identification and WebAssembly summarization. Our results show that WasmRev pre-trained on the corpus of multi-modal samples establishes a robust foundation for these tasks, achieving high task accuracy and outperforming the state-of-the-art ML methods for WebAssembly reverse engineering.

Updated: 2024-04-04 03:03:38

标题: 多模式学习用于WebAssembly反向工程

摘要: 随着WebAssembly(Wasm)在性能关键和安全敏感任务中的日益采用,对WebAssembly程序理解和逆向工程的需求也在增加。最近的研究引入了基于机器学习(ML)的WebAssembly逆向工程工具。然而,特定任务的ML解决方案的普适性仍然具有挑战性,因为它们的有效性取决于大量高质量的特定任务标记数据的可用性。此外,先前的研究忽视了源代码及其文档中存在的高级语义。鉴于现有源代码与文档的丰富性,可以将其编译成WebAssembly,我们建议同时学习它们的表示,并利用它们之间的相互关系进行有效的WebAssembly逆向工程。 在本文中,我们提出了WasmRev,这是第一个用于WebAssembly逆向工程的多模态预训练语言模型。WasmRev使用自监督学习在一个大规模的多模态语料库上进行预训练,涵盖了源代码、代码文档和编译后的WebAssembly,无需标记数据。WasmRev包含三个定制的多模态预训练任务,以捕捉WebAssembly的各种特征和跨模态关系。WasmRev仅需训练一次即可生成通用表示,可以广泛支持WebAssembly逆向工程任务,通过少量标记数据进行少样本微调,提高数据效率。我们将WasmRev微调到三个重要的逆向工程任务上:类型恢复、函数目的识别和WebAssembly摘要。我们的结果表明,基于多模态样本语料库进行预训练的WasmRev为这些任务奠定了坚实的基础,实现了高任务准确度,并超越了WebAssembly逆向工程的最新ML方法。

更新时间: 2024-04-04 03:03:38

领域: cs.SE,cs.LG,cs.PL

下载: http://arxiv.org/abs/2404.03171v1

Designing for Human-Agent Alignment: Understanding what humans want from their agents

Our ability to build autonomous agents that leverage Generative AI continues to increase by the day. As builders and users of such agents it is unclear what parameters we need to align on before the agents start performing tasks on our behalf. To discover these parameters, we ran a qualitative empirical research study about designing agents that can negotiate during a fictional yet relatable task of selling a camera online. We found that for an agent to perform the task successfully, humans/users and agents need to align over 6 dimensions: 1) Knowledge Schema Alignment 2) Autonomy and Agency Alignment 3) Operational Alignment and Training 4) Reputational Heuristics Alignment 5) Ethics Alignment and 6) Human Engagement Alignment. These empirical findings expand previous work related to process and specification alignment and the need for values and safety in Human-AI interactions. Subsequently we discuss three design directions for designers who are imagining a world filled with Human-Agent collaborations.

Updated: 2024-04-04 03:01:57

标题: 设计人-代理对齐:了解人类希望从他们的代理人那里得到什么

摘要: 我们的能力构建利用生成式人工智能的自主代理每天都在增加。作为这些代理的构建者和用户,在代理开始代表我们执行任务之前,我们需要对什么参数进行调整尚不清楚。为了发现这些参数,我们进行了一项定性实证研究,研究了设计代理人的能力,可以在虚构但可相关的在线相机销售任务中进行谈判。我们发现,为了使代理成功完成任务,人类/用户和代理需要在6个维度上进行调整:1)知识模式对齐 2)自治和代理对齐 3)运作对齐和培训 4)声誉启发式对齐 5)伦理对齐和6)人类参与对齐。这些实证发现扩展了先前与过程和规格对齐以及人机互动中价值观和安全性需求相关的工作。随后,我们讨论了三个设计方向,供那些想象一个充满人类-代理合作的世界的设计者参考。

更新时间: 2024-04-04 03:01:57

领域: cs.AI,cs.HC,cs.LG,I.2.0

下载: http://arxiv.org/abs/2404.04289v1

CONFLARE: CONFormal LArge language model REtrieval

Retrieval-augmented generation (RAG) frameworks enable large language models (LLMs) to retrieve relevant information from a knowledge base and incorporate it into the context for generating responses. This mitigates hallucinations and allows for the updating of knowledge without retraining the LLM. However, RAG does not guarantee valid responses if retrieval fails to identify the necessary information as the context for response generation. Also, if there is contradictory content, the RAG response will likely reflect only one of the two possible responses. Therefore, quantifying uncertainty in the retrieval process is crucial for ensuring RAG trustworthiness. In this report, we introduce a four-step framework for applying conformal prediction to quantify retrieval uncertainty in RAG frameworks. First, a calibration set of questions answerable from the knowledge base is constructed. Each question's embedding is compared against document embeddings to identify the most relevant document chunks containing the answer and record their similarity scores. Given a user-specified error rate ($\alpha$), these similarity scores are then analyzed to determine a similarity score cutoff threshold. During inference, all chunks with similarity exceeding this threshold are retrieved to provide context to the LLM, ensuring the true answer is captured in the context with a $(1-\alpha)$ confidence level. We provide a Python package that enables users to implement the entire workflow proposed in our work, only using LLMs and without human intervention.
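
Steps one through three condense to a conformal quantile computation, sketched below: record the similarity score of the chunk that truly answers each calibration question, then set the retrieval threshold to the appropriate lower quantile so that the answer-bearing chunk is retrieved with (1 - alpha) coverage. The similarity scores here are simulated stand-ins for real embedding comparisons.

```python
import numpy as np

def calibrate_threshold(answer_sims: np.ndarray, alpha: float = 0.1) -> float:
    """answer_sims[i]: similarity of the true answer chunk for calibration question i."""
    n = len(answer_sims)
    k = int(np.floor(alpha * (n + 1)))          # conformal rank
    return np.sort(answer_sims)[max(k - 1, 0)]  # k-th smallest calibration score

def retrieve(chunk_sims: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of all chunks similar enough to hand to the LLM as context."""
    return np.flatnonzero(chunk_sims >= threshold)

rng = np.random.default_rng(0)
calib = rng.beta(8, 2, size=500)                # simulated answer-chunk similarities
tau = calibrate_threshold(calib, alpha=0.1)
print(f"threshold = {tau:.3f}", retrieve(rng.uniform(0, 1, size=20), tau))
```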

Updated: 2024-04-04 02:58:21

标题: CONFLARE: 正式的大型语言模型检索

摘要: 检索增强生成(RAG)框架使大型语言模型(LLMs)能够从知识库中检索相关信息,并将其合并到生成响应的上下文中。这有助于减少虚构,并允许更新知识而无需重新训练LLM。然而,如果检索未能识别必要的信息作为响应生成的上下文,RAG并不能保证有效的响应。此外,如果存在矛盾内容,RAG响应可能只反映两种可能响应中的一种。因此,在检索过程中量化不确定性对于确保RAG的可靠性至关重要。在本报告中,我们介绍了一个四步框架,将符合性预测应用于量化RAG框架中的检索不确定性。首先,构建可从知识库回答的校准问题集。将每个问题的嵌入与文档嵌入进行比较,以识别包含答案的最相关文档块,并记录它们的相似度分数。给定用户指定的错误率(α),然后分析这些相似度分数以确定相似度分数的截止阈值。在推断过程中,所有相似度超过此阈值的块将被检索以为LLM提供上下文,确保真实答案以(1-α)的置信水平捕获在上下文中。我们提供一个Python包,使用户能够实现我们提出的整个工作流程,仅使用LLMs且无需人工干预。

更新时间: 2024-04-04 02:58:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.04287v1

Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture detail information present in most image data. To overcome this trade-off, we present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method; then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables, adding detail information while maintaining conditioning on the previously learned disentangled factors. Taken together, our multi-stage modeling approach results in a single, coherent probabilistic model that is theoretically justified by the principle of D-separation and can be realized with a variety of model classes including likelihood-based models such as variational autoencoders, implicit models such as generative adversarial networks, and tractable models like normalizing flows or mixtures of Gaussians. We demonstrate that our multi-stage model has higher reconstruction quality than current state-of-the-art methods with equivalent disentanglement performance across multiple standard benchmarks. In addition, we apply the multi-stage model to generate synthetic tabular datasets, showcasing an enhanced performance over benchmark models across a variety of metrics. The interpretability analysis further indicates that the multi-stage model can effectively uncover distinct and meaningful features of variations from which the original distribution can be recovered.

Updated: 2024-04-04 02:47:09

标题: 通过多阶段建模改进解缠表示学习器的重建

摘要: 目前基于自动编码器的解耦表示学习方法通过对(聚合)后验进行惩罚来促进潜在因素的统计独立性,从而实现解耦。这种方法引入了解耦表示学习和重建质量之间的权衡,因为模型没有足够的容量来学习捕获大多数图像数据中存在的详细信息的相关潜在变量。为了克服这种权衡,我们提出了一种新颖的多阶段建模方法,首先使用基于惩罚的解耦表示学习方法学习解耦因子;然后,通过另一个深度生成模型改进低质量重建,该模型经过训练以建模缺失的相关潜在变量,同时添加详细信息并保持对先前学习的解耦因子的条件化。总的来说,我们的多阶段建模方法产生了一个统一的概率模型,根据D-分离原则在理论上得到了证明,并且可以通过各种模型类别实现,包括基于似然的模型,如变分自动编码器,隐式模型,如生成对抗网络,以及可处理的模型,如正态流或高斯混合模型。我们证明,我们的多阶段模型在多个标准基准测试中具有更高的重建质量,与当前最先进的方法具有相当的解耦性能。此外,我们将多阶段模型应用于生成合成表格数据集,展示了在各种指标上优于基准模型的性能。可解释性分析进一步表明,多阶段模型能够有效地揭示出不同和有意义的特征变化,从中可以恢复原始分布。

更新时间: 2024-04-04 02:47:09

领域: stat.ML,cs.CV,cs.LG

下载: http://arxiv.org/abs/2010.13187v2

Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

The existing federated learning (FL) methods for spatio-temporal forecasting fail to capture the inherent spatio-temporal heterogeneity, which calls for personalized FL (PFL) methods to model the spatio-temporally variant patterns. While the contrastive learning approach is promising in addressing spatio-temporal heterogeneity, the existing methods are ineffective in determining negative pairs and can hardly be applied to the PFL paradigm. To tackle this limitation, we propose a novel PFL method, named Federated dUal sEmantic aLignment-based contraStive learning (FUELS), which can adaptively align positive and negative pairs based on semantic similarity, thereby injecting precise spatio-temporal heterogeneity into the latent representation space via auxiliary contrastive tasks. From the temporal perspective, a hard negative filtering module is introduced to dynamically align heterogeneous temporal representations for the supplemented intra-client contrastive task. From the spatial perspective, we design lightweight-but-efficient prototypes as client-level semantic representations, based on which the server evaluates spatial similarity and yields client-customized global prototypes for the supplemented inter-client contrastive task. Extensive experiments demonstrate that FUELS outperforms state-of-the-art methods, with communication cost decreasing by around 94%.

Updated: 2024-04-04 02:43:56

标题: 基于双语义对齐对比方法的时空预测个性化联邦学习

摘要: 现有的面向时空预测的联邦学习(FL)方法未能捕捉固有的时空异质性,这需要个性化的FL(PFL)方法来建模时空变异模式。虽然对比学习方法在解决时空异质性方面很有前景,但现有方法在确定负对时效果不佳,几乎无法应用于PFL范式。为解决这一限制,我们提出了一种新颖的PFL方法,名为Federated dUal sEmantic aLignment-based contraStive learning(FUELS),它可以根据语义相似性自适应地对齐正负对,从而通过辅助对比任务将精确的时空异质性注入潜在表示空间。从时间角度看,引入了一个硬负过滤模块,动态地对齐异质时间表示,用于补充客户端内对比任务。从空间角度看,我们设计了轻量级但高效的原型作为客户端级语义表示,基于这些原型,服务器评估空间相似性并产生客户定制的全局原型,用于补充客户间的对比任务。大量实验证明,FUELS的性能优于最先进的方法,通信成本降低约94%。

更新时间: 2024-04-04 02:43:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03702v1

A Learning-Based Caching Mechanism for Edge Content Delivery

With the advent of 5G networks and the rise of the Internet of Things (IoT), Content Delivery Networks (CDNs) are increasingly extending into the network edge. This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge. These edge environments can host traffic classes characterized by varied object-size distributions and object-access patterns. Such complexity makes it difficult for traditional caching strategies, which often rely on metrics like request frequency or time intervals, to be effective. Despite these complexities, the optimization of edge caching is crucial. Improved byte hit rates at the edge not only alleviate the load on the network backbone but also minimize operational costs and expedite content delivery to end-users. In this paper, we introduce HR-Cache, a comprehensive learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering, a rule originally formulated to compute an upper bound on cache performance. HR-Cache leverages this rule to guide future object eviction decisions. It employs a lightweight machine learning model to learn from caching decisions made based on HR ordering, subsequently predicting the "cache-friendliness" of incoming requests. Objects deemed "cache-averse" are placed into cache as priority candidates for eviction. Through extensive experimentation, we demonstrate that HR-Cache not only consistently enhances byte hit rates compared to existing state-of-the-art methods but also achieves this with minimal prediction overhead. Our experimental results, using three real-world traces and one synthetic trace, indicate that HR-Cache consistently achieves 2.2-14.6% greater WAN traffic savings than LRU. It outperforms not only heuristic caching strategies but also the state-of-the-art learning-based algorithm.
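
The hazard-rate rule that HR-Cache learns from can be approximated from request history alone: estimate the hazard h(t) = f(t) / S(t) of an object's inter-request time distribution and evaluate it at the object's current age (time since last request); objects with low hazard are the natural eviction candidates. The binning scheme and synthetic traces below are illustrative simplifications, not the paper's procedure.

```python
import numpy as np

def hazard_at(age, inter_arrivals, bins=20):
    """Empirical hazard rate h(age) = f(age) / S(age) of the inter-arrival times."""
    hist, edges = np.histogram(inter_arrivals, bins=bins, density=True)
    i = int(np.clip(np.searchsorted(edges, age, side="right") - 1, 0, bins - 1))
    surv = 1.0 - np.sum(hist[:i] * np.diff(edges)[:i])   # survival up to bin i
    return hist[i] / max(surv, 1e-9)

rng = np.random.default_rng(0)
objects = {
    "hot":  rng.exponential(5.0,  size=500),   # short gaps between requests
    "cold": rng.exponential(50.0, size=500),   # long gaps between requests
}
for name, gaps in objects.items():
    print(name, round(hazard_at(age=3.0, inter_arrivals=gaps), 4))
# the "hot" object has the higher hazard at age 3, so it is the one to keep
```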

Updated: 2024-04-04 02:41:11

标题: 边缘内容交付的基于学习的缓存机制

摘要: 随着5G网络的出现和物联网(IoT)的兴起,内容交付网络(CDN)越来越延伸到网络边缘。这种转变带来了独特的挑战,尤其是由于边缘的缓存存储有限以及多样化的请求模式。这些边缘环境可能承载以各种对象大小分布和对象访问模式为特征的流量类别。这种复杂性使得传统的缓存策略难以有效实施,这些策略通常依赖于请求频率或时间间隔等指标。尽管存在这些复杂性,优化边缘缓存至关重要。在边缘提高字节命中率不仅可以减轻网络骨干负载,还可以最小化运营成本并加快内容交付给终端用户的速度。 在本文中,我们介绍了HR-Cache,这是一个基于学习的缓存框架,基于危险率(HR)排序原则,这是最初为了计算缓存性能上限而制定的规则。HR-Cache利用这一规则指导未来的对象驱逐决策。它采用轻量级机器学习模型,从基于HR排序的缓存决策中学习,随后预测传入请求的“缓存友好性”。被视为“不适合缓存”的对象会被放入缓存,作为优先候选对象进行驱逐。通过广泛的实验,我们展示了HR-Cache不仅与现有的最先进方法相比始终提高了字节命中率,而且实现了最小的预测开销。 我们的实验结果使用三个真实世界迹线和一个合成迹线,表明HR-Cache始终比LRU实现了2.2-14.6%更大的广域网(WAN)流量节约。它不仅优于启发式缓存策略,还优于最先进的基于学习的算法。

更新时间: 2024-04-04 02:41:11

领域: cs.NI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2402.02795v2

Does Knowledge Graph Really Matter for Recommender Systems?

Recommender systems (RSs) are designed to provide personalized recommendations to users. Recently, knowledge graphs (KGs) have been widely introduced in RSs to improve recommendation accuracy. In this study, however, we demonstrate that RSs do not necessarily perform worse even if the KG is downgraded to the user-item interaction graph only (or removed). We propose an evaluation framework KG4RecEval to systematically evaluate how much a KG contributes to the recommendation accuracy of a KG-based RS, using our defined metric KGER (KG utilization efficiency in recommendation). We consider the scenarios where knowledge in a KG gets completely removed, randomly distorted and decreased, and also where recommendations are for cold-start users. Our extensive experiments on four commonly used datasets and a number of state-of-the-art KG-based RSs reveal that: to remove, randomly distort or decrease knowledge does not necessarily decrease recommendation accuracy, even for cold-start users. These findings inspire us to rethink how to better utilize knowledge from existing KGs, whereby we discuss and provide insights into what characteristics of datasets and KG-based RSs may help improve KG utilization efficiency.

Updated: 2024-04-04 02:32:58

标题: 知识图谱对推荐系统真的很重要吗?

摘要: 推荐系统(RSs)旨在为用户提供个性化推荐。最近,知识图(KGs)已被广泛引入RSs中,以提高推荐准确性。然而,在这项研究中,我们证明即使将KG降级为仅用户-项目交互图(或删除),RSs也不一定表现更差。我们提出了一个评估框架KG4RecEval,以系统评估KG对基于KG的RS的推荐准确性的贡献,使用我们定义的度量KGER(推荐中KG利用效率)。我们考虑在KG中知识完全被删除、随机失真和减少的情况,以及推荐面向冷启动用户的情况。我们在四个常用数据集和一些最先进的基于KG的RS上进行了广泛实验,揭示:删除、随机失真或减少知识不一定会降低推荐准确性,即使对于冷启动用户也是如此。这些发现启发我们重新思考如何更好地利用现有KG的知识,我们讨论并提供见解,探讨数据集和基于KG的RS的特征可能有助于提高KG利用效率。

更新时间: 2024-04-04 02:32:58

领域: cs.IR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.03164v1

Uncertainty in Language Models: Assessment through Rank-Calibration

Language Models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in responding to given inputs. In addition to verbalized confidence elicited via prompting, many uncertainty measures (e.g., semantic entropy and affinity-graph-based measures) have been proposed. However, these measures can differ greatly, and it is unclear how to compare them, partly because they take values over different ranges (e.g., $[0,\infty)$ or $[0,1]$). In this work, we address this issue by developing a novel and practical framework, termed Rank-Calibration, to assess uncertainty and confidence measures for LMs. Our key tenet is that higher uncertainty (or lower confidence) should imply lower generation quality, on average. Rank-calibration quantifies deviations from this ideal relationship in a principled manner, without requiring ad hoc binary thresholding of the correctness score (e.g., ROUGE or METEOR). The broad applicability and the granular interpretability of our methods are demonstrated empirically.
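
To make the "higher uncertainty should mean lower quality, on average" tenet concrete, here is one assumed estimator (a sketch, not the paper's exact procedure): compare the normalized rank of each uncertainty score against the rank of the corresponding negated quality score; a perfectly rank-calibrated measure gives zero error.

import numpy as np
from scipy.stats import rankdata

def rank_calibration_error(uncertainty, quality):
    # Ideal: the rank of an output's uncertainty matches the rank of its
    # negated quality, i.e. more uncertain answers are worse on average.
    n = len(uncertainty)
    u_rank = rankdata(uncertainty) / n
    q_rank = rankdata(-np.asarray(quality)) / n
    return float(np.mean(np.abs(u_rank - q_rank)))

print(rank_calibration_error([0.9, 0.1, 0.5], [0.2, 0.95, 0.6]))  # 0.0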

Updated: 2024-04-04 02:31:05

标题: 语言模型中的不确定性:通过排名校准进行评估

摘要: 语言模型(LMs)在自然语言生成中表现出有希望的性能。然而,由于LMs经常生成不正确或幻想的响应,因此正确量化它们在响应给定输入时的不确定性非常重要。除了通过提示引出的口头信心之外,许多不确定性度量(例如,语义熵和基于亲和图的度量)已被提出。然而,这些度量可能有很大差异,目前尚不清楚如何比较它们,部分原因是它们取值范围不同(例如,[0,∞)或[0,1])。在这项工作中,我们通过开发一种新颖且实用的框架,称为Rank-Calibration,来解决这个问题,以评估LMs的不确定性和信心度量。我们的关键理念是,更高的不确定性(或更低的信心)应该意味着平均生成质量更低。Rank-Calibration以一种原则性的方式量化了与这种理想关系的偏差,而无需对正确性分数进行临时的二进制阈值处理(例如,ROUGE或METEOR)。我们的方法在实证上展示了其广泛适用性和细粒度可解释性。

更新时间: 2024-04-04 02:31:05

领域: cs.CL,cs.AI,cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03163v1

LTRDetector: Exploring Long-Term Relationship for Advanced Persistent Threats Detection

Advanced Persistent Threats (APTs) are challenging to detect due to their prolonged duration, infrequent occurrence, and adept concealment techniques. Existing approaches primarily concentrate on the observable traits of attack behaviors, neglecting the intricate relationships formed throughout the persistent attack lifecycle. Thus, we present an innovative APT detection framework named LTRDetector, implementing an end-to-end holistic operation. LTRDetector employs an innovative graph embedding technique to retain comprehensive contextual information, then derives long-term features from these embedded provenance graphs. During this process, we compress the data of the system provenance graph for effective feature learning. Furthermore, to detect attacks conducted with zero-day exploits, LTRDetector captures the system's regular behavior and detects abnormal activities without relying on predefined attack signatures. We also conducted extensive evaluations using five prominent datasets; the results underscore the superiority of LTRDetector over existing state-of-the-art techniques.

Updated: 2024-04-04 02:30:51

标题: LTRDetector:探索用于高级持续威胁检测的长期关系

摘要: 高级持续威胁(APT)由于持续时间长、发生不经常且隐藏技术熟练而具有挑战性。现有方法主要集中在攻击行为的可观察特征上,忽视了在持续攻击生命周期中形成的复杂关系。因此,我们提出了一种名为LTRDetector的创新APT检测框架,实施端到端的整体操作。LTRDetector采用创新的图嵌入技术来保留全面的上下文信息,然后从这些嵌入的来源图中导出长期特征。在此过程中,我们压缩系统来源图的数据以进行有效的特征学习。此外,为了检测使用零日漏洞进行的攻击,我们捕获了系统的常规行为,并在不依赖预定义攻击签名的情况下检测异常活动。我们还使用五个知名数据集进行了广泛的评估,其有效性评估凸显了与现有最先进技术相比LTRDetector的优越性。

更新时间: 2024-04-04 02:30:51

领域: cs.CR

下载: http://arxiv.org/abs/2404.03162v1

API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access

This study aims to address the pervasive challenge of quantifying uncertainty in large language models (LLMs) without logit-access. Conformal Prediction (CP), known for its model-agnostic and distribution-free features, is a desired approach for various LLMs and data distributions. However, existing CP methods for LLMs typically assume access to the logits, which are unavailable for some API-only LLMs. In addition, logits are known to be miscalibrated, potentially leading to degraded CP performance. To tackle these challenges, we introduce a novel CP method that (1) is tailored for API-only LLMs without logit-access; (2) minimizes the size of prediction sets; and (3) ensures a statistical guarantee of the user-defined coverage. The core idea of this approach is to formulate nonconformity measures using both coarse-grained (i.e., sample frequency) and fine-grained uncertainty notions (e.g., semantic similarity). Experimental results on both close-ended and open-ended Question Answering tasks show our approach can mostly outperform the logit-based CP baselines.
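
A hedged sketch of the logit-free recipe: nonconformity blends sampled-response frequency (the coarse signal) with semantic similarity (the fine signal), and the prediction set is calibrated with the standard conformal quantile. The `similarity` function, the blend weight, and the candidate answers are stand-ins; the paper's exact nonconformity measure may differ.

import numpy as np

def nonconformity(answer, samples, similarity, lam=0.5):
    # Coarse signal: how often the answer appears among sampled responses.
    freq = sum(s == answer for s in samples) / len(samples)
    # Fine signal: average semantic similarity to the sampled responses.
    sim = np.mean([similarity(answer, s) for s in samples])
    return 1.0 - (lam * freq + (1.0 - lam) * sim)

def conformal_threshold(cal_scores, alpha=0.1):
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n     # finite-sample correction
    return np.quantile(cal_scores, min(q, 1.0))

def prediction_set(candidates, samples, similarity, tau):
    return [a for a in candidates
            if nonconformity(a, samples, similarity) <= tau]

sim = lambda a, b: 1.0 if a == b else 0.0      # toy similarity stand-in
print(prediction_set(["Paris", "Lyon", "Nice"],
                     ["Paris", "Paris", "Lyon"], sim, tau=0.6))  # ['Paris']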

Updated: 2024-04-04 02:15:39

标题: API已足够:大型语言模型的无需逻辑访问的符合性预测

摘要: 该研究旨在解决在没有对数访问的情况下量化大型语言模型(LLMs)中的不确定性的普遍挑战。符合预测(CP)以其与模型无关和无分布特性而闻名,是各种LLMs和数据分布的理想方法。然而,现有的LLMs的CP方法通常假设可以访问对数,而对于一些仅API的LLMs则不可用。此外,已知对数可能未校准,可能导致CP性能下降。为了解决这些挑战,我们引入了一种新的CP方法,该方法(1)专为无法访问对数的仅API的LLMs定制;(2)最小化预测集的大小;以及(3)确保用户定义的覆盖率的统计保证。该方法的核心思想是利用粗粒度(例如,样本频率)和细粒度不确定性概念(例如,语义相似性)来制定不符合度量。在封闭式和开放式问答任务的实验结果显示,我们的方法大多能胜过基于对数的CP基线。

更新时间: 2024-04-04 02:15:39

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.01216v2

Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
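
For flavor, here is a toy mirror-descent decoder on a frame-by-action cost matrix. The paper solves a fused Gromov-Wasserstein problem with a temporal-consistency prior; this sketch only shows the multiplicative-update-plus-projection pattern that makes such solvers GPU-friendly.

import numpy as np

def mirror_descent_decode(C, n_iter=50, lr=1.0):
    T, K = C.shape
    P = np.full((T, K), 1.0 / (T * K))   # uniform initial transport plan
    for _ in range(n_iter):
        P = P * np.exp(-lr * C)          # multiplicative (mirror) step
        P = P / P.sum(axis=1, keepdims=True) / T   # project frame marginals
    return P.argmax(axis=1)              # per-frame action labels

C = np.array([[0.1, 1.0], [0.2, 0.9], [1.0, 0.1]])  # 3 frames, 2 actions
print(mirror_descent_decode(C))          # [0 0 1]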

Updated: 2024-04-04 02:06:15

标题: "时间一致的不平衡最优输运用于无监督动作分割"

摘要: 我们提出了一种针对长时间未修剪视频的动作分割任务的新方法,基于解决最优输运问题。通过将时间一致性先验编码到Gromov-Wasserstein问题中,我们能够从视频帧和动作类别之间的嘈杂亲和性/匹配成本矩阵中解码出一个时间一致的分割。与先前的方法不同,我们的方法不需要知道视频的动作顺序才能实现时间一致性。此外,我们得到的(融合的)Gromov-Wasserstein问题可以通过在GPU上使用几次迭代的投影镜下降方法高效求解。我们在一个无监督学习环境中展示了我们方法的有效性,其中我们的方法用于生成自我训练的伪标签。我们在Breakfast、50-Salads、YouTube Instructions和Desktop Assembly数据集上评估了我们的分割方法和无监督学习流程,产生了无监督视频动作分割任务的最新结果。

更新时间: 2024-04-04 02:06:15

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2404.01518v2

Learning Generalizable Tool-use Skills through Trajectory Generation

Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of matching human-level intelligence in terms of adapting to novel tools. Prior works based on affordance often make strong assumptions about the environments and cannot scale to more complex, contact-rich tasks. In this work, we tackle this challenge and explore how agents can learn to use previously unseen tools to manipulate deformable objects. We propose to learn a generative model of the tool-use trajectories as a sequence of tool point clouds, which generalizes to different tool shapes. Given any novel tool, we first generate a tool-use trajectory and then optimize the sequence of tool poses to align with the generated trajectory. We train a single model on four different challenging deformable object manipulation tasks, using demonstration data from only one tool per task. The model generalizes to various novel tools, significantly outperforming baselines. We further test our trained policy in the real world with unseen tools, where it achieves performance comparable to humans. Additional materials can be found on our project website: https://sites.google.com/view/toolgen.

Updated: 2024-04-04 02:03:20

标题: 通过轨迹生成学习可推广的工具使用技能

摘要: 自主系统可以有效地利用工具来帮助人类完成许多常见任务,如烹饪和清洁。然而,当前的系统在适应新工具方面仍无法与人类智能相匹敌。以工具使用为基础的先前工作经常对环境做出强烈假设,无法扩展到更复杂、接触丰富的任务。在这项工作中,我们面对这一挑战,探讨了智能体如何学习使用以前未见过的工具来操纵可变形物体。我们提出学习工具使用轨迹的生成模型,作为工具点云序列,这种模型可以推广到不同的工具形状。对于任何新工具,我们首先生成一个工具使用轨迹,然后优化工具姿势序列,使其与生成的轨迹对齐。我们在四种不同的具有挑战性的可变形物体操纵任务上训练了一个单一模型,仅使用每个任务的一个工具的演示数据。该模型推广到各种新工具,明显优于基线。我们进一步测试了我们训练好的策略在真实世界中使用未见过的工具时的表现,其性能可与人类相媲美。我们的项目网站上可以找到更多相关资料:https://sites.google.com/view/toolgen。

更新时间: 2024-04-04 02:03:20

领域: cs.RO,cs.AI,I.2.9

下载: http://arxiv.org/abs/2310.00156v3

Language Model Evolution: An Iterated Learning Perspective

With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase. Notably, recent advancements in multi-round self-improving methods allow LLMs to generate new examples for training subsequent models. At the same time, multi-agent LLM systems, involving automated interactions among agents, are also increasing in prominence. Thus, in both short and long terms, LLMs may actively engage in an evolutionary process. We draw parallels between the behavior of LLMs and the evolution of human culture, as the latter has been extensively studied by cognitive scientists for decades. Our approach involves leveraging Iterated Learning (IL), a Bayesian framework that elucidates how subtle biases are magnified during human cultural evolution, to explain some behaviors of LLMs. This paper outlines key characteristics of agents' behavior in the Bayesian-IL framework, including predictions that are supported by experimental verification with various LLMs. This theoretical framework could help to more effectively predict and guide the evolution of LLMs in desired directions.
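
The Bayesian-IL dynamic is easy to simulate: each generation computes a posterior from the previous generation's data, samples a hypothesis, and teaches the next generation, so a small prior bias compounds over iterations. A toy two-hypothesis sketch (all numbers are illustrative, not from the paper):

import numpy as np

rng = np.random.default_rng(1)
prior = np.array([0.55, 0.45])          # slight bias toward hypothesis 0
lik = np.array([[0.8, 0.2],             # P(datum | hypothesis 0)
                [0.2, 0.8]])            # P(datum | hypothesis 1)
data = rng.integers(0, 2, size=10)      # initial random "language" data
for generation in range(20):
    log_post = np.log(prior) + np.array(
        [np.log(lik[h, data]).sum() for h in (0, 1)])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    h = rng.choice(2, p=post)                 # learner samples a hypothesis
    data = rng.choice(2, size=10, p=lik[h])   # and teaches the next generation
print("hypothesis after 20 generations:", h)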

Updated: 2024-04-04 02:01:25

标题: 语言模型演化:迭代学习的视角

摘要: 随着大型语言模型(LLMs)的广泛采用,预计这些模型之间的迭代交互的普及程度将增加。值得注意的是,最近在多轮自我改进方法方面取得的进展使LLMs能够为训练后续模型生成新的示例。与此同时,涉及代理之间自动交互的多代理LLM系统也日益突显。因此,在短期和长期内,LLMs可能积极参与演化过程。我们将LLMs的行为与人类文化的演化进行了类比,因为后者已经被认知科学家们研究了几十年。我们的方法涉及利用迭代学习(IL),这是一个贝叶斯框架,阐明了人类文化演化过程中微妙偏见是如何放大的,以解释LLMs的一些行为。本文概述了贝叶斯-IL框架中代理行为的关键特征,包括通过与各种LLMs进行实验验证支持的预测。这个理论框架有助于更有效地预测和引导LLMs朝着期望的方向发展。

更新时间: 2024-04-04 02:01:25

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.04286v1

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
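
At the level of layer indices, depth up-scaling can be sketched as below. This is a simplification consistent with the paper's description of duplicating the base model and trimming layers at the seam; SOLAR starts from a 32-layer base and reaches 48 layers, after which continued pretraining recovers performance.

def depth_up_scale(n_layers=32, m=8):
    first = list(range(0, n_layers - m))   # layers 0..23 of copy A
    second = list(range(m, n_layers))      # layers 8..31 of copy B
    return first + second                  # 48 layers in total

print(len(depth_up_scale()))  # 48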

Updated: 2024-04-04 01:53:38

标题: SOLAR 10.7B:用简单而有效的深度放大来扩展大型语言模型

摘要: 我们介绍了SOLAR 10.7B,这是一个具有107亿参数的大型语言模型(LLM),在各种自然语言处理(NLP)任务中展现出优越性能。受到最近有效地扩大规模LLM的努力的启发,我们提出了一种称为深度扩展(DUS)的LLM扩展方法,其中包括深度缩放和持续预训练。与其他使用专家混合的LLM扩展方法不同,DUS不需要复杂的更改以有效地进行训练和推断。我们实验证明,DUS在从小型LLM扩展至高性能LLM方面既简单又有效。在DUS模型的基础上,我们另外提出了SOLAR 10.7B-Instruct,这是一个针对遵循指令功能进行微调的变体,超越了Mixtral-8x7B-Instruct。SOLAR 10.7B在Apache 2.0许可下公开提供,促进了在LLM领域的广泛访问和应用。

更新时间: 2024-04-04 01:53:38

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.15166v3

L2MAC: Large Language Model Automatic Computer for Extensive Code Generation

Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and coherent outputs. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long output generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based stored-program automatic computer (von Neumann architecture) framework, an LLM-based multi-agent system, for long and consistent output generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction in turn is executed by a separate LLM agent, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate extensive outputs, bypassing the constraints of the finite context window while producing outputs that fulfill a complex user-specified task. We empirically demonstrate that L2MAC achieves state-of-the-art performance in generating large codebases for system design tasks, significantly outperforming other coding methods in implementing the detailed user-specified task, and we provide valuable insights into the reasons for this performance gap.
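
A minimal sketch of the stored-program loop described above; `call_llm`, the instruction wording, and the write-target heuristic are placeholders rather than the actual implementation:

file_store = {}                           # final and intermediate outputs

instruction_registry = [                  # the "prompt program"
    "Write module A to file a.py",
    "Write tests for a.py to file test_a.py",
]

def call_llm(prompt):                     # stand-in for a chat-model API call
    return f"# generated for: {prompt.splitlines()[0]}\n"

def control_unit(instruction):
    # Precise memory read: build a fresh agent context from the instruction
    # plus the current file store, then write the output back.
    context = [instruction] + [f"{name}:\n{body}"
                               for name, body in file_store.items()]
    output = call_llm("\n".join(context))
    target = instruction.split()[-1]      # naive write-target heuristic
    file_store[target] = output

for instr in instruction_registry:
    control_unit(instr)
print(sorted(file_store))                 # ['a.py', 'test_a.py']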

Updated: 2024-04-04 01:53:27

标题: L2MAC:用于广泛代码生成的大型语言模型自动计算机

摘要: 基于Transformer的大型语言模型(LLM)受到底层Transformer架构固定上下文窗口的限制,阻碍了它们产生长而连贯的输出的能力。记忆增强的LLM是一个有前途的解决方案,但目前的方法无法处理长时间的输出生成任务,因为它们要么只关注读取内存并将其演变为新内存的串联,要么使用无法适应其他领域的非常专门化的内存。本文提出了L2MAC,这是第一个实用的基于LLM的存储程序自动计算机(冯·诺伊曼架构)框架,一个基于LLM的多智能体系统,用于长期和连贯的输出生成。它的记忆包括两个组件:指令注册表,其中填充了一个用于解决用户给定任务的提示程序,以及一个文件存储器,其中将包含最终和中间输出。每个指令依次由一个单独的LLM代理执行,其上下文由一个控制单元管理,该控制单元能够精确地读写内存,以确保与文件存储器的有效交互。这些组件使L2MAC能够生成大量输出,绕过有限上下文窗口的限制,同时产生满足复杂用户指定任务的输出。我们在实证上证明,L2MAC在为系统设计任务生成大型代码库方面达到了最先进的性能,明显优于其他编码方法在实现详细用户指定任务方面的表现,并为此性能差距提供了有价值的见解。

更新时间: 2024-04-04 01:53:27

领域: cs.SE,cs.AI,cs.LG,cs.PL,I.2.7; I.2.6; I.2.5; D.2.2; D.2.3; D.3.4

下载: http://arxiv.org/abs/2310.02003v4

Data-Efficient Strategies for Probabilistic Voltage Envelopes under Network Contingencies

This work presents an efficient data-driven method to construct probabilistic voltage envelopes (PVE) using power flow learning in grids with network contingencies. First, a network-aware Gaussian process (GP) termed Vertex-Degree Kernel (VDK-GP), developed in prior work, is used to estimate voltage-power functions for a few network configurations. The paper introduces a novel multi-task vertex degree kernel (MT-VDK) that amalgamates the learned VDK-GPs to determine power flows for unseen networks, with a significant reduction in the computational complexity and hyperparameter requirements compared to alternate approaches. Simulations on the IEEE 30-Bus network demonstrate the retention and transfer of power flow knowledge in both N-1 and N-2 contingency scenarios. The MT-VDK-GP approach achieves over 50% reduction in mean prediction error for novel N-1 contingency network configurations in low training data regimes (50-250 samples) over VDK-GP. Additionally, MT-VDK-GP outperforms a hyper-parameter based transfer learning approach in over 75% of N-2 contingency network structures, even without historical N-2 outage data. The proposed method demonstrates the ability to achieve PVEs using sixteen times fewer power flow solutions compared to Monte-Carlo sampling-based methods.
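
The abstract does not give the MT-VDK construction; purely as an illustration of amalgamating learned kernels, the sketch below averages per-configuration kernels into a single multi-task kernel, with simple RBF kernels standing in for the learned VDK-GPs:

import numpy as np

def multi_task_kernel(kernels):
    # Fuse kernels learned on observed contingencies into one kernel
    # usable for an unseen network configuration (assumed fusion rule).
    return lambda x, y: float(np.mean([k(x, y) for k in kernels]))

rbf = lambda ls: (lambda x, y: np.exp(-np.sum((x - y) ** 2) / (2 * ls ** 2)))
k_fused = multi_task_kernel([rbf(1.0), rbf(2.0)])   # two "learned" kernels
print(k_fused(np.zeros(3), np.ones(3)))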

Updated: 2024-04-04 01:52:32

标题: 网络事故下概率电压包络的高效数据策略

摘要: 这项工作提出了一种高效的数据驱动方法,利用电网网络中的功率流学习来构建概率电压包络(PVE)。首先,利用先前开发的网络感知高斯过程(GP)——顶点-度核(VDK-GP)来估计几种网络配置的电压-功率函数。本文引入了一种新颖的多任务顶点度核(MT-VDK),将学习到的VDK-GP合并起来,以确定未知网络的功率流,相比于其他方法,显著减少了计算复杂性和超参数要求。在IEEE 30-Bus网络上的模拟表明,在N-1和N-2事故情形下,功率流知识的保留和转移。MT-VDK-GP方法在低训练数据(50-250个样本)的情况下,相比VDK-GP,新颖的N-1事故网络配置可实现超过50%的平均预测误差减少。此外,即使没有历史N-2故障数据,MT-VDK-GP在超过75%的N-2事故网络结构中表现优于基于超参数的迁移学习方法。所提出的方法展示了使用比蒙特卡洛采样方法少16倍的功率流解决方案,实现PVE的能力。

更新时间: 2024-04-04 01:52:32

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2310.00763v3

NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA

This paper presents our submission to SemEval 2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. We present two approaches to the task of legal answer validation, given an introduction to the case, a question, and an answer candidate. First, we fine-tuned pre-trained BERT-based models and found that models trained on domain knowledge perform better. Second, we performed few-shot prompting on GPT models and found that reformulating the answer validation task as a multiple-choice QA task remarkably improves the performance of the model. Our best submission is a BERT-based model that placed 7th out of 20.
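
The reformulation that helped can be sketched as a prompt template; the wording below is illustrative, not the exact prompt used in the submission:

def to_multiple_choice(intro, question, candidate):
    return (
        f"Case introduction: {intro}\n"
        f"Question: {question}\n"
        f"Proposed answer: {candidate}\n"
        "Is the proposed answer correct?\n"
        "(A) Correct\n"
        "(B) Incorrect\n"
        "Answer with A or B."
    )

print(to_multiple_choice("The parties dispute venue.",
                         "Is venue proper in the Southern District?",
                         "Yes, because the defendant resides there."))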

Updated: 2024-04-04 01:50:20

标题: NLP在UC圣塔克鲁兹参加SemEval-2024任务5:使用少样本多选问答验证的法律答案

摘要: 本文介绍了我们对SemEval 2024任务5的提交:民事诉讼中的法律论证任务。我们提出了两种解决法律答案验证任务的方法,给定一个案例介绍、一个问题和一个答案候选。首先,我们对预训练的BERT模型进行了微调,发现在领域知识上训练的模型表现更好。其次,我们在GPT模型上执行了少量提示,并发现将答案验证任务重新构造为多选问答任务显著提高了模型的性能。我们最佳的提交是一个基于BERT的模型,在20个模型中排名第7。

更新时间: 2024-04-04 01:50:20

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.03150v1

Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning

Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. We further highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework. Finally, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3.2x.
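
The bi-level split can be illustrated with a toy lower level: given a desired vehicle distribution proposed by the RL agent, a small LP finds minimum-cost rebalancing flows. The node set, costs, and flow-conservation form are simplified stand-ins for the full space-charge graph.

import numpy as np
from scipy.optimize import linprog

def rebalance(current, desired, cost):
    # Variables: flows f[i, j] >= 0 moving vehicles from node i to node j,
    # with net outflow at each node matching current - desired.
    n = len(current)
    A_eq, b_eq = [], []
    for i in range(n):
        row = np.zeros(n * n)
        row[i * n:(i + 1) * n] += 1.0    # outflows of node i
        row[i::n] -= 1.0                 # inflows into node i
        A_eq.append(row)
        b_eq.append(current[i] - desired[i])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(n, n)

flows = rebalance(np.array([4, 0, 2]), np.array([2, 2, 2]),
                  np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))
print(flows.round(2))                    # moves 2 vehicles from node 0 to 1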

Updated: 2024-04-04 01:43:42

标题: 实时控制电动自主移动需求系统通过图强化学习

摘要: 电动自主移动出行(E-AMoD)车队的运营商需要做出一些实时决策,例如将可用车辆与乘车请求匹配、将空闲车辆重新平衡到高需求区域,并为车辆充电以确保足够的续航里程。虽然这个问题可以被描述为一个线性规划,优化空间-充电-时间图上的流动,但是由此产生的优化问题的规模并不允许在现实环境中进行实时实施。在这项工作中,我们通过强化学习的视角提出了E-AMoD控制问题,并提出了一个基于图网络的框架,实现了大幅度的可扩展性和优越的性能,超越了启发式方法。具体来说,我们采用双层制定,在这个制定中,我们(1)利用基于图网络的RL代理来指定空间-充电图中的期望下一个状态,(2)解决更易处理的线性规划,以最佳实现期望状态同时确保可行性。使用旧金山和纽约市的真实数据进行实验表明,我们的方法实现了理论最优解利润的高达89%,同时计算时间加快了超过100倍。我们进一步突出了我们学习策略在跨城通用化和服务区域扩展等任务上的有前途的零-shot转移能力,从而展示了我们框架的实用性、可扩展性和灵活性。最后,我们的方法在具有可比运行时间的最佳领域特定启发式方法中表现出色,利润增加了高达3.2倍。

更新时间: 2024-04-04 01:43:42

领域: eess.SY,cs.LG,cs.RO,cs.SY

下载: http://arxiv.org/abs/2311.05780v2

Eigenpruning

We introduce eigenpruning, a method that removes singular values from weight matrices in an LLM to improve its performance on a particular task. This method is inspired by interpretability methods designed to automatically find subnetworks of a model which solve a specific task. In our tests, the pruned model outperforms the original model by a large margin, while requiring only minimal computation to prune the weight matrices. On a small synthetic integer-multiplication task, the Phi-2 model can improve its accuracy on the test set from 13.75% to 97.50%. Interestingly, these results seem to indicate the existence of a computation path that can solve the task very effectively, but one that was not being used by the original model. Finally, we plan to open-source our implementation in the camera-ready version of our work.
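
The core operation is a truncated SVD of a weight matrix. A minimal sketch follows; note that the paper selects which singular values to remove based on the target task, whereas this illustration simply drops the smallest ones:

import numpy as np

def eigenprune(W, keep):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    S[keep:] = 0.0                 # remove selected singular values
    return (U * S) @ Vt            # reconstruct the pruned weight matrix

W = np.random.default_rng(0).normal(size=(64, 64))
print(np.linalg.matrix_rank(eigenprune(W, keep=16)))  # 16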

Updated: 2024-04-04 01:42:28

标题: 特征修剪

摘要: 我们引入了特征修剪(eigenpruning)方法,该方法从LLM中的权重矩阵中删除奇异值,以提高其在特定任务中的性能。这种方法受启发于设计用于自动找到解决特定任务的模型的可解释性方法。在我们的测试中,修剪后的模型表现远远优于原始模型,同时只需要最少的计算来修剪权重矩阵。在一个小的合成任务(整数乘法)中,Phi-2模型可以将其在测试集中的准确率从13.75%提高到97.50%。有趣的是,这些结果似乎表明存在一个可以非常有效地解决任务的计算路径,但原始模型没有使用。最后,我们计划在我们的工作的最终版本中开源我们的实现。

更新时间: 2024-04-04 01:42:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2404.03147v1

A Survey on Large Language Model based Autonomous Agents

Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at https://github.com/Paitesanshi/LLM-Agent-Survey.

Updated: 2024-04-04 01:32:04

标题: 基于大型语言模型的自主代理的调查

摘要: 自主代理已经成为学术界和工业界长期以来的研究重点。这一领域的先前研究通常集中在训练在孤立环境中具有有限知识的代理,这与人类学习过程有着显著差异,因此使得代理难以实现类似人类的决策。最近,通过获取大量网络知识,大型语言模型(LLMs)已经展示出实现人类水平智能的巨大潜力。这引发了对基于LLM的自主代理进行研究的热潮。在本文中,我们从整体的角度对基于LLM的自主代理的研究进行了全面调查,提供了该领域的系统综述。具体来说,我们首先讨论了基于LLM的自主代理的构建,我们提出了一个涵盖大多数先前工作的统一框架。然后,我们全面介绍了基于LLM的自主代理在社会科学、自然科学和工程领域的各种应用。最后,我们深入探讨了常用于基于LLM的自主代理的评估策略。基于先前的研究,我们还提出了该领域的几个挑战和未来方向。为了跟踪这一领域并不断更新我们的调查,我们维护了一个相关参考资料的存储库,网址为https://github.com/Paitesanshi/LLM-Agent-Survey。

更新时间: 2024-04-04 01:32:04

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2308.11432v5

Theoretical and Empirical Insights into the Origins of Degree Bias in Graph Neural Networks

Graph Neural Networks (GNNs) often perform better for high-degree nodes than low-degree nodes on node classification tasks. This degree bias can reinforce social marginalization by, e.g., sidelining authors of lowly-cited papers when predicting paper topics in citation networks. While researchers have proposed numerous hypotheses for why GNN degree bias occurs, we find via a survey of 38 degree bias papers that these hypotheses are often not rigorously validated, and can even be contradictory. Thus, we provide an analysis of the origins of degree bias in message-passing GNNs with different graph filters. We prove that high-degree test nodes tend to have a lower probability of misclassification regardless of how GNNs are trained. Moreover, we show that degree bias arises from a variety of factors that are associated with a node's degree (e.g., homophily of neighbors, diversity of neighbors). Furthermore, we show that during training, some GNNs may adjust their loss on low-degree nodes more slowly than on high-degree nodes; however, with sufficiently many epochs of training, message-passing GNNs can achieve their maximum possible training accuracy, which is not significantly limited by their expressive power. Throughout our analysis, we connect our findings to previously-proposed hypotheses for the origins of degree bias, supporting and unifying some while drawing doubt to others. We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias.
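
The basic diagnostic behind such studies is easy to reproduce: bucket test nodes by degree and compare accuracy per bucket, as in this short sketch (bucket edges are illustrative):

import numpy as np

def accuracy_by_degree(degrees, y_true, y_pred, edges=(0, 5, 20, np.inf)):
    degrees = np.asarray(degrees)
    correct = np.asarray(y_true) == np.asarray(y_pred)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (degrees >= lo) & (degrees < hi)
        if mask.any():
            print(f"degree [{lo}, {hi}): acc = {correct[mask].mean():.3f}")

accuracy_by_degree([1, 3, 10, 40], [0, 1, 1, 0], [0, 0, 1, 0])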

Updated: 2024-04-04 01:24:27

标题: 图神经网络中度偏差起源的理论和实证洞见

摘要: 图神经网络(GNNs)在节点分类任务中通常对高度节点的表现优于低度节点。这种度偏差可能通过在引用网络中预测论文主题时,将低引用论文的作者排除在外而加剧社会边缘化。虽然研究人员提出了许多有关GNN度偏差产生原因的假设,但我们通过对38篇度偏差论文的调查发现,这些假设通常没有得到严格验证,甚至可能相互矛盾。因此,我们对传递消息的GNNs中度偏差的起源进行了分析,使用不同的图滤波器。我们证明,无论GNNs如何训练,高度测试节点往往更不容易被错误分类。此外,我们展示度偏差源自与节点度数相关的各种因素(例如,邻居的同质性,邻居的多样性)。此外,我们展示,在训练过程中,一些GNNs可能会比高度节点上的低度节点更慢地调整其损失;然而,通过足够多次的训练,传递消息的GNNs可以实现其最大可能的训练准确度,这并不受其表达能力的显著限制。在整个分析过程中,我们将我们的发现与先前提出的关于度偏差起源的假设联系起来,支持和统一一些假设,对其他假设提出质疑。我们在8个常见的现实世界网络上验证了我们的理论发现,并基于我们的理论和实证洞察,描述了缓解度偏差的路线图。

更新时间: 2024-04-04 01:24:27

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2404.03139v1

A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching

Sentence semantic matching is a research hotspot in natural language processing and is of considerable significance in key scenarios such as community question answering, search, chatbots, and recommendation. Since most advanced models directly model the semantic relevance among words between two sentences while neglecting the keywords and intents they express, DC-Match was proposed to disentangle keywords from intents and utilize them to optimize matching performance. Although DC-Match is a simple yet effective method for semantic matching, it highly depends on external NER techniques to identify the keywords of sentences, which limits semantic-matching performance for minor languages, since satisfactory NER tools are usually hard to obtain. In this paper, we propose to generally and flexibly resolve the text into multiple concepts for multilingual semantic matching, liberating the model from its reliance on NER models. To this end, we devise a Multi-Concept Parsed Semantic Matching framework based on pre-trained language models, abbreviated as MCP-SM, to extract various concepts and infuse them into the classification tokens. We conduct comprehensive experiments on the English datasets QQP and MRPC, and the Chinese dataset Medical-SM. Besides, we experiment on the Arabic datasets MQ2Q and XNLI; the outstanding performance further proves MCP-SM's applicability in low-resource languages.

Updated: 2024-04-04 01:07:24

标题: 一个通用且灵活的多概念解析框架,用于多语言语义匹配

摘要: 句子语义匹配是自然语言处理中的研究热点,在各种关键场景中具有相当重要的意义,例如社区问答、搜索、聊天机器人和推荐系统。由于大多数先进模型直接建模两个句子之间的词语语义相关性,而忽略了它们的关键词和意图概念,因此提出了DC-Match来将关键词和意图分离,并利用它们来优化匹配性能。尽管DC-Match是一种简单而有效的语义匹配方法,但它高度依赖外部命名实体识别技术来识别句子的关键词,这限制了对少数语言的语义匹配性能,因为通常很难获得令人满意的命名实体识别工具。在本文中,我们提出了一种通用且灵活地将文本解析为多个概念的多语言语义匹配方法,以使模型摆脱对命名实体识别模型的依赖。为此,我们设计了基于预训练语言模型的多概念解析语义匹配框架,简称为MCP-SM,以提取各种概念并将其融入分类标记中。我们在英语数据集QQP和MRPC以及中文数据集Medical-SM上进行了全面实验。此外,我们还在阿拉伯语数据集MQ2Q和XNLI上进行了实验,出色的性能进一步证明了MCP-SM在资源稀缺语言中的适用性。

更新时间: 2024-04-04 01:07:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.02975v2

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Scene Graph Generation (SGG) is a challenging task of detecting objects and predicting relationships between objects. After DETR was developed, one-stage SGG models based on a one-stage object detector have been actively studied. However, complex modeling is used to predict the relationships between objects, and the inherent relationships between object queries learned in the multi-head self-attention of the object detector have been neglected. We propose a lightweight one-stage SGG model that extracts the relation graph from the various relationships learned in the multi-head self-attention layers of the DETR decoder. By fully utilizing the self-attention by-products, the relation graph can be extracted effectively with a shallow relation extraction head. Considering the dependency of the relation extraction task on the object detection task, we propose a novel relation smoothing technique that adaptively adjusts the relation label according to the quality of the detected objects. Through relation smoothing, the model is trained with a continuous curriculum that focuses on the object detection task at the beginning of training and shifts to multi-task learning as object detection performance gradually improves. Furthermore, we propose a connectivity prediction task that predicts whether a relation exists between object pairs as an auxiliary task of relation extraction. We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Image V6 datasets. Our code is publicly available at https://github.com/naver-ai/egtr.

Updated: 2024-04-04 00:59:51

标题: EGTR:从Transformer中提取图形以生成场景图

摘要: 场景图生成(SGG)是一项具有挑战性的任务,涉及检测物体并预测物体之间的关系。在开发了DETR之后,基于一阶段目标检测器的一阶段SGG模型得到了积极研究。然而,为了预测物体之间的关系,通常使用复杂的建模方法,而在目标检测器的多头自注意力中学习的物体查询之间的固有关系却被忽视了。我们提出了一个轻量级的一阶段SGG模型,从DETR解码器的多头自注意力层中学习的各种关系中提取关系图。通过充分利用自注意力的副产品,可以有效地使用浅层关系提取头提取关系图。考虑到关系提取任务对目标检测任务的依赖性,我们提出了一种新颖的关系平滑技术,根据检测到的物体的质量自适应地调整关系标签。通过关系平滑,模型根据连续的课程进行训练,一开始侧重于目标检测任务,随着目标检测性能逐渐提高,会执行多任务学习。此外,我们提出了一个连接性预测任务,作为关系提取的辅助任务,预测物体对之间是否存在关系。我们在Visual Genome和Open Image V6数据集上展示了我们方法的有效性和效率。我们的代码公开可用于https://github.com/naver-ai/egtr。

更新时间: 2024-04-04 00:59:51

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2404.02072v2

A Framework for Guided Motion Planning

Randomized sampling based algorithms are widely used in robot motion planning due to the problem's intractability, and are experimentally effective on a wide range of problem instances. Most variants bias their sampling using various heuristics related to the known underlying structure of the search space. In this work, we formalize the intuitive notion of guided search by defining the concept of a guiding space. This new language encapsulates many seemingly distinct prior methods under the same framework, and allows us to reason about guidance, a previously obscured core contribution of different algorithms. We suggest an information theoretic method to evaluate guidance, which experimentally matches intuition when tested on known algorithms in a variety of environments. The language and evaluation of guidance suggests improvements to existing methods, and allows for simple hybrid algorithms that combine guidance from multiple sources.

Updated: 2024-04-04 00:58:19

标题: 一个引导运动规划的框架

摘要: 基于随机抽样的算法广泛应用于机器人运动规划中,因为问题的复杂性,这些算法在各种问题实例上实验有效。大多数变体使用各种与搜索空间的已知基本结构相关的启发式方法来偏置它们的抽样。在这项工作中,我们通过定义引导空间的概念,形式化了引导搜索的直观概念。这种新语言将许多看似独立的先前方法封装在同一框架下,并使我们能够推理出引导的核心贡献,这是不同算法以前被掩盖的。我们建议一种信息理论方法来评估引导,在各种环境中测试已知算法时,实验结果与直觉相匹配。引导的语言和评估方法提出了对现有方法的改进,并允许简单的混合算法,将来自多个来源的引导结合起来。

更新时间: 2024-04-04 00:58:19

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2404.03133v1

Predictive Analytics of Varieties of Potatoes

We explore the application of machine learning algorithms to predict the suitability of Russet potato clones for advancement in breeding trials. Leveraging data from manually collected trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature engineering, and imputation to address missing values. We focus on several key metrics such as accuracy, F1-score, and Matthews correlation coefficient (MCC) for model evaluation. The top-performing models, namely the multi-layer perceptron (MLPC), histogram-based gradient boosting classifier (HGBC), and a support vector machine (SVC), demonstrate consistent and significant results. Variable selection further enhances model performance and identifies influential features in predicting trial outcomes. The findings emphasize the potential of machine learning in streamlining the selection process for potato varieties, offering benefits such as increased efficiency, substantial cost savings, and judicious resource utilization. Our study contributes insights into precision agriculture and showcases the relevance of advanced technologies for informed decision-making in breeding programs.
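
The modeling pipeline described above maps directly onto scikit-learn; a sketch with placeholder data follows (the Oregon trial features and any tuned hyperparameters are not reproduced here):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                 # placeholder trial features
y = rng.integers(0, 2, size=300)              # placeholder advance/drop labels
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
for name, model in [("MLPC", MLPClassifier(max_iter=500)),
                    ("HGBC", HistGradientBoostingClassifier()),
                    ("SVC", SVC())]:
    pipe = make_pipeline(SimpleImputer(), StandardScaler(), model)
    pred = pipe.fit(Xtr, ytr).predict(Xte)
    print(name, accuracy_score(yte, pred),
          f1_score(yte, pred), matthews_corrcoef(yte, pred))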

Updated: 2024-04-04 00:49:05

标题: 土豆品种的预测分析

摘要: 我们探讨了将机器学习算法应用于预测Russet马铃薯克隆在育种试验中推进的适宜性。利用来自俄勒冈州手动收集试验的数据,我们研究了各种最先进的二元分类模型的潜力。我们对数据集进行了全面的分析,包括预处理、特征工程和填补缺失值。我们关注准确性、F1分数和Matthews相关系数(MCC)等几个关键指标用于模型评估。表现最佳的模型,即多层感知器(MLPC)、基于直方图的梯度提升分类器(HGBC)和支持向量机(SVC),表现出一致且显著的结果。变量选择进一步提高了模型性能,并确定了在预测试验结果中具有影响力的特征。研究结果强调了机器学习在简化马铃薯品种选择过程中的潜力,提供了增加效率、大幅节省成本和明智利用资源等好处。我们的研究为精准农业提供了见解,并展示了先进技术在育种计划中为明智决策提供信息的相关性。

更新时间: 2024-04-04 00:49:05

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2404.03701v1

A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling

Spiking Neural Networks (SNNs) can offer ultra low power/ energy consumption for machine learning-based applications due to their sparse spike-based operations. Currently, most of the SNN architectures need a significantly larger model size to achieve higher accuracy, which is not suitable for resource-constrained embedded applications. Therefore, developing SNNs that can achieve high accuracy with acceptable memory footprint is highly needed. Toward this, we propose a novel methodology that improves the accuracy of SNNs through kernel size scaling. Its key steps include investigating the impact of different kernel sizes on the accuracy, devising new sets of kernel sizes, generating SNN architectures based on the selected kernel sizes, and analyzing the accuracy-memory trade-offs for SNN model selection. The experimental results show that our methodology achieves higher accuracy than state-of-the-art (93.24% accuracy for CIFAR10 and 70.84% accuracy for CIFAR100) with less than 10M parameters and up to 3.45x speed-up of searching time, thereby making it suitable for embedded applications.

Updated: 2024-04-04 00:36:18

标题: 通过核大小缩放改善嵌入式脉冲神经网络准确性的方法论

摘要: 脉冲神经网络(SNNs)由于其稀疏的基于脉冲的操作,可以为基于机器学习的应用提供极低的功耗/能耗。目前,大多数SNN架构需要更大的模型尺寸才能实现更高的准确性,这对于资源受限的嵌入式应用不太适合。因此,急需开发能够在可接受的内存占用下实现高准确性的SNNs。为此,我们提出了一种通过核大小缩放提高SNNs准确性的新方法。其关键步骤包括调查不同核大小对准确性的影响,设计新的一组核大小,基于选定的核大小生成SNN架构,并分析SNN模型选择的准确性与内存的权衡。实验结果表明,我们的方法实现了比最先进技术更高的准确性(对于CIFAR10达到93.24%的准确性,对于CIFAR100达到70.84%的准确性),参数少于10M,并且搜索时间加速了最多3.45倍,因此适用于嵌入式应用。

更新时间: 2024-04-04 00:36:18

领域: cs.NE,cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.01685v2

Algorithmic Persuasion Through Simulation

We study a Bayesian persuasion problem where a sender wants to persuade a receiver to take a binary action, such as purchasing a product. The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities. Motivated by customer surveys, user studies, and recent advances in generative AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior. After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives. We characterize the sender's optimal messaging policy given any distribution over receiver types. We then design a polynomial-time querying algorithm that optimizes the sender's expected utility in this Bayesian persuasion game. We also consider approximate oracles, more general query structures, and costly queries.

Updated: 2024-04-04 00:16:08

标题: 算法模拟的说服力

摘要: 我们研究了一个贝叶斯说服问题,其中发送方希望说服接收方采取二元行动,例如购买产品。发送方了解世界的(二元)状态,例如产品的质量是高还是低,但只对接收方的信念和效用有有限的信息。受客户调查、用户研究和生成式人工智能的最新进展的启发,我们允许发送方通过查询模拟接收方行为的预言者来更多地了解接收方。在固定数量的查询后,发送方制定了一项信息政策,接收方根据她收到的信息来最大化她的预期效用。我们表征了发送方在任何接收方类型分布下的最优信息传递政策。然后,我们设计了一个多项式时间查询算法,优化了发送方在这个贝叶斯说服游戏中的预期效用。我们还考虑了近似预言者、更一般的查询结构和昂贵的查询。

更新时间: 2024-04-04 00:16:08

领域: cs.GT,cs.AI,econ.TH

下载: http://arxiv.org/abs/2311.18138v3

Robust deep learning for eye fundus images: Bridging real and synthetic data for enhancing generalization

Deep learning applications for assessing medical images are limited because the datasets are often small and imbalanced. The use of synthetic data has been proposed in the literature, but neither a robust comparison of the different methods nor generalizability has been reported. Our approach integrates a retinal image quality assessment model and StyleGAN2 architecture to enhance Age-related Macular Degeneration (AMD) detection capabilities and improve generalizability. This work compares ten different Generative Adversarial Network (GAN) architectures to generate synthetic eye-fundus images with and without AMD. We combined subsets of three public databases (iChallenge-AMD, ODIR-2019, and RIADD) to form a single training and test set. We employed the STARE dataset for external validation, ensuring a comprehensive assessment of the proposed approach. The results show that StyleGAN2 reached the lowest Frechet Inception Distance (166.17), and clinicians could not accurately differentiate between real and synthetic images. ResNet-18 architecture obtained the best performance with 85% accuracy and outperformed the two human experts (80% and 75%) in detecting AMD fundus images. The accuracy rates were 82.8% for the test set and 81.3% for the STARE dataset, demonstrating the model's generalizability. The proposed methodology for synthetic medical image generation has been validated for robustness and accuracy, with free access to its code for further research and development in this field.

Updated: 2024-04-04 00:13:42

标题: 强大的深度学习用于眼底图像:通过融合真实和合成数据来增强泛化能力

摘要: 医学图像评估的深度学习应用受限于数据集通常较小且不平衡。文献中提出了使用合成数据的方法,但对不同方法的强健比较和泛化性并未报道。我们的方法将视网膜图像质量评估模型与StyleGAN2架构相结合,以增强年龄相关性黄斑变性(AMD)的检测能力并改善泛化性。本研究比较了十种不同的生成对抗网络(GAN)架构,生成带有和不带有AMD的合成眼底图像。我们将三个公共数据库的子集(iChallenge-AMD,ODIR-2019和RIADD)组合成一个训练和测试集。我们使用STARE数据集进行外部验证,确保对所提出方法进行全面评估。结果显示,StyleGAN2达到了最低的Frechet Inception Distance(166.17),临床医生无法准确区分真实和合成图像。ResNet-18架构表现最佳,准确率达到85%,在检测AMD眼底图像方面超过了两位人类专家(80%和75%)。测试集准确率为82.8%,STARE数据集准确率为81.3%,展示了模型的泛化性。所提出的合成医学图像生成方法已经通过鲁棒性和准确性验证,并可免费获取其代码,以进一步研究和发展该领域。

更新时间: 2024-04-04 00:13:42

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2203.13856v2

Long-context LLMs Struggle with Long In-context Learning

Large Language Models (LLMs) have made significant strides in handling long sequences exceeding 32K tokens. However, their performance evaluation has largely been confined to metrics like perplexity and synthetic tasks, which may not fully capture their abilities in more nuanced, real-world scenarios. This study introduces a specialized benchmark (LongICLBench) focusing on long in-context learning within the realm of extreme-label classification. We meticulously selected six datasets with a label range spanning 28 to 174 classes, covering different input (few-shot demonstration) lengths from 2K to 50K tokens. Our benchmark requires LLMs to comprehend the entire input to recognize the massive label spaces and make correct predictions. We evaluate 13 long-context LLMs on our benchmark. We find that the long-context LLMs perform relatively well on less challenging tasks with shorter demonstration lengths by effectively utilizing the long context window. However, on the most challenging task, Discovery, with 174 labels, all the LLMs struggle to understand the task definition, reaching performance close to zero. This suggests a notable gap in current LLM capabilities for processing and understanding long, context-rich sequences. Further analysis revealed a tendency among models to favor predictions for labels presented toward the end of the sequence; their ability to reason over multiple pieces of information in a long sequence remains limited. Our study reveals that long-context understanding and reasoning is still a challenging task for existing LLMs. We believe LongICLBench could serve as a more realistic evaluation for future long-context LLMs.

Updated: 2024-04-04 00:01:25

标题: 长上下文LLMs在长上下文学习中面临困难

摘要: 大型语言模型(LLMs)在处理超过32K标记的长序列方面取得了重大进展。然而,它们的性能评估主要局限于困惑度和合成任务等指标,这些指标可能无法完全捕捉它们在更微妙、真实世界场景中的能力。本研究引入了一个专门的基准(LongICLBench),重点关注长上下文学习在极端标签分类领域中的应用。我们精心选择了六个数据集,标签范围从28到174类,覆盖了不同长度的输入(少样本演示)从2K到50K个标记。我们的基准要求LLMs理解整个输入,以识别庞大的标签空间来进行正确的预测。我们在我们的基准上评估了13个长上下文LLMs。我们发现,长上下文LLMs在较短演示长度的较少挑战性任务上表现相对良好,有效利用长上下文窗口。然而,在具有174个标签的最具挑战性的任务发现中,所有LLMs都难以理解任务定义,因此表现接近零。这表明当前LLMs在处理和理解长、上下文丰富序列方面存在显著差距。进一步的分析揭示了模型倾向于为序列末尾呈现的标签做出预测。它们在长序列中推理多个部分的能力有待改善。我们的研究表明,长上下文理解和推理对现有LLMs仍然是一项具有挑战性的任务。我们相信LongICLBench可以作为未来长上下文LLMs更现实的评估。

更新时间: 2024-04-04 00:01:25

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.02060v2

Scalable tensor methods for nonuniform hypergraphs

While multilinear algebra appears natural for studying the multiway interactions modeled by hypergraphs, tensor methods for general hypergraphs have been stymied by theoretical and practical barriers. A recently proposed adjacency tensor is applicable to nonuniform hypergraphs, but is prohibitively costly to form and analyze in practice. We develop tensor times same vector (TTSV) algorithms for this tensor which improve complexity from $O(n^r)$ to a low-degree polynomial in $r$, where $n$ is the number of vertices and $r$ is the maximum hyperedge size. Our algorithms are implicit, avoiding formation of the order $r$ adjacency tensor. We demonstrate the flexibility and utility of our approach in practice by developing tensor-based hypergraph centrality and clustering algorithms. We also show these tensor measures offer complementary information to analogous graph-reduction approaches on data, and are also able to detect higher-order structure that many existing matrix-based approaches provably cannot.
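
The implicit idea is to iterate over hyperedges instead of materializing the order-$r$ adjacency tensor. A first-order TTSV sketch follows; for the nonuniform adjacency tensor analyzed in the paper, each edge contribution also carries combinatorial blow-up coefficients, which are omitted here for clarity:

import numpy as np

def ttsv1(hyperedges, x, n):
    # b_i accumulates, for every hyperedge containing i, the product of the
    # vector entries at the remaining vertices of that edge.
    b = np.zeros(n)
    for e in hyperedges:
        for i in e:
            rest = [j for j in e if j != i]
            b[i] += np.prod(x[rest])
    return b

print(ttsv1([(0, 1, 2), (1, 3)], np.ones(4), n=4))  # [1. 2. 1. 1.]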

Updated: 2024-04-04 00:00:12

标题: 非均匀超图的可扩展张量方法

摘要: 尽管多线性代数似乎适用于研究由超图建模的多路径交互作用,但一般超图的张量方法受到理论和实际障碍的制约。最近提出的邻接张量适用于非均匀超图,但在实践中形成和分析成本过高。我们为这个张量开发了张量乘以相同向量(TTSV)算法,将复杂度从$O(n^r)$降低到$r$的低次多项式,其中$n$是顶点数,$r$是最大超边大小。我们的算法是隐式的,避免形成$r$阶邻接张量。我们通过开发基于张量的超图中心性和聚类算法来实践展示我们方法的灵活性和效用。我们还展示这些张量度量值在数据上提供了补充信息,与类似图缩减方法相比,并且还能够检测许多现有基于矩阵的方法无法证明的高阶结构。

更新时间: 2024-04-04 00:00:12

领域: math.NA,cs.LG,cs.NA,cs.SI,math.CO,physics.soc-ph,05C65, 15A69, 05C50, 05C85

下载: http://arxiv.org/abs/2306.17825v2

By Xinhai (Sean) Zou.