Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning
We describe version 2 of the SPICE dataset, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original dataset by adding much broader sampling of chemical space and more data on non-covalent interactions. We train a set of potential energy functions called Nutmeg on it. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large-scale charge distribution. Evaluation of the new models shows they do an excellent job of reproducing energy differences between conformations, even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories, and are fast enough to be useful for routine simulation of small molecules.
Updated: 2024-06-18 23:54:21
Categories: physics.chem-ph,cs.LG
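To make the charge-injection mechanism concrete, the following is a minimal sketch of the general idea rather than the actual Nutmeg architecture: precomputed per-atom partial charges are fed into the network alongside learned element embeddings, so downstream layers see a reference for the molecule's overall charge distribution. All module names, dimensions, and the example charges are illustrative.

```python
import torch
import torch.nn as nn

class ChargeAwareEmbedding(nn.Module):
    """Toy illustration: concatenate a precomputed partial charge to each
    atom's learned element embedding, giving downstream layers a reference
    for the molecule's large-scale charge distribution."""

    def __init__(self, num_elements: int = 100, embed_dim: int = 32):
        super().__init__()
        self.element_embed = nn.Embedding(num_elements, embed_dim)
        # +1 input feature for the injected partial charge
        self.mix = nn.Linear(embed_dim + 1, embed_dim)

    def forward(self, atomic_numbers, partial_charges):
        h = self.element_embed(atomic_numbers)    # (n_atoms, embed_dim)
        q = partial_charges.unsqueeze(-1)         # (n_atoms, 1)
        return torch.relu(self.mix(torch.cat([h, q], dim=-1)))

# Example: a water-like fragment with illustrative charges
atoms = torch.tensor([8, 1, 1])                   # O, H, H
charges = torch.tensor([-0.83, 0.415, 0.415])     # e.g., from a fast charge model
features = ChargeAwareEmbedding()(atoms, charges)
print(features.shape)  # torch.Size([3, 32])
```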
Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition
Few-shot named entity recognition (NER) systems recognize entities using a few labeled training examples. The general pipeline consists of a span detector to identify entity spans in text and an entity-type classifier to assign types to entities. Current span detectors rely on extensive manual labeling to guide training. Almost every span detector requires initial training on basic span features followed by adaptation to task-specific features. This process leads to repetitive training of the basic span features among span detectors. Additionally, metric-based entity-type classifiers, such as prototypical networks, typically employ a specific metric that gauges the distance between the query sample and entity-type referents, ultimately assigning the most probable entity type to the query sample. However, these classifiers encounter the sample dependency problem, primarily stemming from the limited samples available for each entity-type referent. To address these challenges, we propose an improved few-shot NER pipeline. First, we introduce a steppingstone span detector that is pre-trained on open-domain Wikipedia data. It can be used to initialize the pipeline span detector to reduce the repetitive training of basic features. Second, we leverage a large language model (LLM) to set reliable entity-type referents, eliminating reliance on few-shot samples of each type. Our model exhibits superior performance with fewer training steps and less human-labeled data compared with baselines, as demonstrated through extensive experiments on various datasets. Particularly in fine-grained few-shot NER settings, our model outperforms strong baselines, including ChatGPT. We will publicly release the code, datasets, LLM outputs, and model checkpoints.
Updated: 2024-06-18 23:45:14
Categories: cs.CL,cs.AI
Accelerating Complex Disease Treatment through Network Medicine and GenAI: A Case Study on Drug Repurposing for Breast Cancer
The objective of this research is to introduce a network specialized in predicting drugs that can be repurposed by investigating real-world evidence sources, such as clinical trials and biomedical literature. Specifically, it aims to generate drug combination therapies for complex diseases (e.g., cancer, Alzheimer's). We present a multilayered network medicine approach, empowered by a highly configured ChatGPT prompt engineering system, which is constructed on the fly to extract drug mentions in clinical trials. Additionally, we introduce a novel algorithm that connects real-world evidence with disease-specific signaling pathways (e.g., KEGG database). This sheds light on the repurposability of drugs if they are found to bind with one or more protein constituents of a signaling pathway. To demonstrate, we instantiated the framework for breast cancer and found that, out of 46 breast cancer signaling pathways, the framework identified 38 pathways that were covered by at least two drugs. This evidence signals the potential for combining those drugs. Specifically, the most covered signaling pathway, ID hsa:2064, was covered by 108 drugs, some of which can be combined. Conversely, the signaling pathway ID hsa:1499 was covered by only two drugs, indicating a significant gap for further research. Our network medicine framework, empowered by GenAI, shows promise in identifying drug combinations with a high degree of specificity, knowing the exact signaling pathways and proteins that serve as targets. It is noteworthy that ChatGPT successfully accelerated the process of identifying drug mentions in clinical trials, though further investigations are required to determine the relationships among the drug mentions.
Updated: 2024-06-18 23:40:00
Categories: cs.AI,cs.CL,cs.IR,I.2; I.2.6
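The pathway-coverage logic described above reduces to simple set bookkeeping once drug-pathway evidence has been extracted. A toy sketch, with made-up (drug, pathway) pairs standing in for the framework's clinical-trial extraction step:

```python
from collections import defaultdict

# Illustrative drug-pathway evidence: pairs that would come from matching
# clinical-trial drug mentions to proteins in KEGG signaling pathways.
evidence = [
    ("trastuzumab", "hsa:2064"), ("lapatinib", "hsa:2064"),
    ("neratinib", "hsa:2064"), ("alpelisib", "hsa:1499"),
    ("everolimus", "hsa:1499"),
]

coverage = defaultdict(set)
for drug, pathway in evidence:
    coverage[pathway].add(drug)

# Pathways covered by at least two drugs signal candidate combinations.
for pathway, drugs in coverage.items():
    if len(drugs) >= 2:
        print(pathway, sorted(drugs))
```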
$TrIND$: Representing Anatomical Trees by Denoising Diffusion of Implicit Neural Fields
Anatomical trees play a central role in clinical diagnosis and treatment planning. However, accurately representing anatomical trees is challenging due to their varying and complex topology and geometry. Traditional methods for representing tree structures, captured using medical imaging, while invaluable for visualizing vascular and bronchial networks, exhibit drawbacks in terms of limited resolution, flexibility, and efficiency. Recently, implicit neural representations (INRs) have emerged as a powerful tool for representing shapes accurately and efficiently. We propose a novel approach, $TrIND$, for representing anatomical trees using INR, while also capturing the distribution of a set of trees via denoising diffusion in the space of INRs. We accurately capture the intricate geometries and topologies of anatomical trees at any desired resolution. Through extensive qualitative and quantitative evaluation, we demonstrate high-fidelity tree reconstruction with arbitrary resolution yet compact storage, and versatility across anatomical sites and tree complexities. The code is available at: \texttt{\url{https://github.com/sinashish/TreeDiffusion}}.
Updated: 2024-06-18 23:32:30
Categories: cs.CV,cs.AI
A Generic Method for Fine-grained Category Discovery in Natural Language Texts
Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermore, some evaluation techniques that rely on pre-collected test samples are inadequate for real-time applications. To address these shortcomings, we introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function. The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space and to form distinct clusters that represent fine-grained categories. We also propose a centroid inference mechanism to support real-time applications. The efficacy of the method is both theoretically justified and empirically confirmed on three benchmark tasks. The proposed objective function is integrated in multiple contrastive learning based neural models. Its results surpass existing state-of-the-art approaches in terms of Accuracy, Adjusted Rand Index and Normalized Mutual Information of the detected fine-grained categories. Code and data will be available at https://github.com/XX upon publication.
Updated: 2024-06-18 23:27:46
Categories: cs.AI,cs.LG
On instabilities in neural network-based physics simulators
When neural networks are trained from data to simulate the dynamics of physical systems, they encounter a persistent challenge: the long-time dynamics they produce are often unphysical or unstable. We analyze the origin of such instabilities when learning linear dynamical systems, focusing on the training dynamics. We make several analytical findings which empirical observations suggest extend to nonlinear dynamical systems. First, the rate of convergence of the training dynamics is uneven and depends on the distribution of energy in the data. As a special case, the dynamics in directions where the data have no energy cannot be learned. Second, in the unlearnable directions, the dynamics produced by the neural network depend on the weight initialization, and common weight initialization schemes can produce unstable dynamics. Third, injecting synthetic noise into the data during training adds damping to the training dynamics and can stabilize the learned simulator, though doing so undesirably biases the learned dynamics. For each contributor to instability, we suggest mitigative strategies. We also highlight important differences between learning discrete-time and continuous-time dynamics, and discuss extensions to nonlinear systems.
Updated: 2024-06-18 23:25:14
Categories: cs.LG,cs.CE,nlin.CD
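The first two findings are easy to reproduce for a 2-D linear system: when the training trajectory has energy along only one direction, gradient descent never updates the learned dynamics in the orthogonal direction, which therefore stays at its (possibly unstable) random initialization. A minimal numpy illustration under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])          # stable system x_{t+1} = A x_t

# The trajectory starts along e1 and stays there, so the data
# carry no energy in the e2 direction: that direction is unlearnable.
X = np.zeros((2, 200))
X[:, 0] = [1.0, 0.0]
for t in range(199):
    X[:, t + 1] = A_true @ X[:, t]

A_hat = 0.5 * rng.standard_normal((2, 2))   # common random initialization
for _ in range(5000):
    grad = (A_hat @ X[:, :-1] - X[:, 1:]) @ X[:, :-1].T / X.shape[1]
    A_hat -= 0.1 * grad

print("column along the data direction:   ", A_hat[:, 0])  # converges to A_true[:, 0]
print("column in the zero-energy direction:", A_hat[:, 1])  # stays at its init
```

If the untouched column happens to give the learned matrix a spectral radius above one, rollouts of the learned simulator diverge, which is the instability mechanism the abstract describes.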
Simple Cracking of (Noise-Based) Dynamic Watermarking in Smart Grids
Previous research employing a conceptual approach with a digital twin has demonstrated that (noise-based) dynamic watermarking is incapable of providing unconditional security in smart electrical grid systems. However, the implementation of digital twins can be prohibitively costly or infeasible due to limited available data on critical infrastructure. In this study, we first analyze the spectral properties of dynamic watermarking and its associated protocol. Subsequently, we present a straightforward attack inspired by the digital twin method, which extracts and utilizes the grid noises and completely breaches the security of dynamic watermarking without requiring knowledge of the private watermarking signal. The attacker can fully expose the grid while evading detection by the controller. Our findings indicate that in the absence of secure and authenticated communications, dynamic watermarking offers neither conditional nor unconditional security. Conversely, when communication lines, sensors, and communicators are equipped with tamper-resistant and secure/authenticated links, dynamic watermarking becomes redundant for grid security.
Updated: 2024-06-18 23:24:22
Categories: cs.CR
Pretraining Strategy for Neural Potentials
We propose a mask pretraining method for Graph Neural Networks (GNNs) to improve their performance on fitting potential energy surfaces, particularly in water systems. GNNs are pretrained by recovering spatial information related to masked-out atoms from molecules, then transferred and fine-tuned on atomic force fields. Through such pretraining, GNNs learn a meaningful prior about the structural and underlying physical information of molecular systems that is useful for downstream tasks. Through comprehensive experiments and ablation studies, we show that the proposed method improves both accuracy and convergence speed compared to GNNs trained from scratch or with other pretraining techniques such as denoising. Moreover, our pretraining method is suitable for both energy-centric and force-centric GNNs. This approach showcases its potential to enhance the performance and data efficiency of GNNs in fitting molecular force fields.
Updated: 2024-06-18 23:19:42
Categories: cs.LG,physics.chem-ph
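A minimal sketch of the masking objective, under simplifying assumptions: a plain MLP on random coordinates stands in for a GNN on real molecular conformations. One atom's coordinates are hidden, and the model is trained to recover them from the remaining atoms.

```python
import torch
import torch.nn as nn

# Toy stand-in for the pretraining objective: predict a masked-out atom's
# position from the remaining atoms. A real setup would use a GNN on
# actual molecular conformations rather than an MLP on random coordinates.
n_mols, n_atoms = 256, 8
coords = torch.randn(n_mols, n_atoms, 3)

model = nn.Sequential(nn.Linear((n_atoms - 1) * 3, 64),
                      nn.SiLU(),
                      nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    mask_idx = int(torch.randint(n_atoms, (1,)))
    keep = [i for i in range(n_atoms) if i != mask_idx]
    inputs = coords[:, keep, :].reshape(n_mols, -1)   # masked atom removed
    target = coords[:, mask_idx, :]                   # its true position
    loss = nn.functional.mse_loss(model(inputs), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```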
Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models
We present a latent diffusion model over 3D scenes that can be trained using only 2D image data. To achieve this, we first design an autoencoder that maps multi-view images to 3D Gaussian splats and simultaneously builds a compressed latent representation of these splats. Then, we train a multi-view diffusion model over the latent space to learn an efficient generative model. This pipeline requires neither object masks nor depths, and is suitable for complex scenes with arbitrary camera positions. We conduct careful experiments on two large-scale datasets of complex real-world scenes -- MVImgNet and RealEstate10K. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, from a single input view, or from sparse input views. It produces diverse and high-quality results while running an order of magnitude faster than non-latent diffusion models and earlier NeRF-based generative models.
Updated: 2024-06-18 23:14:29
Categories: cs.CV,cs.LG
Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism
Physics-Informed Neural Networks (PINNs) have emerged recently as a promising application of deep neural networks to the numerical solution of nonlinear partial differential equations (PDEs). However, it has been recognized that adaptive procedures are needed to force the neural network to fit accurately the stubborn spots in the solution of "stiff" PDEs. In this paper, we propose a fundamentally new way to train PINNs adaptively, where the adaptation weights are fully trainable and applied to each training point individually, so the neural network learns autonomously which regions of the solution are difficult and is forced to focus on them. The self-adaptation weights specify a multiplicative soft attention mask, which is reminiscent of similar mechanisms used in computer vision. The basic idea behind these SA-PINNs is to make the weights increase as the corresponding losses increase, which is accomplished by training the network to simultaneously minimize the losses and maximize the weights. In addition, we show how to build a continuous map of self-adaptive weights using Gaussian Process regression, which allows the use of stochastic gradient descent in problems where conventional gradient descent is not enough to produce accurate solutions. Finally, we derive the Neural Tangent Kernel matrix for SA-PINNs and use it to obtain a heuristic understanding of the effect of the self-adaptive weights on the dynamics of training in the limiting case of infinitely-wide PINNs, which suggests that SA-PINNs work by producing a smooth equalization of the eigenvalues of the NTK matrix corresponding to the different loss terms. In numerical experiments with several linear and nonlinear benchmark problems, the SA-PINN outperformed other state-of-the-art PINN algorithms in L2 error, while using fewer training epochs.
Updated: 2024-06-18 23:09:32
Categories: cs.LG,stat.ML
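The core of the scheme, "minimize the losses and maximize the weights", can be sketched as simultaneous descent/ascent on a per-point weighted loss. In the sketch below the residual is a plain regression target rather than a PDE residual, and the softplus mask is just one plausible choice of positive weighting, so this illustrates the update rule rather than a full SA-PINN:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
x = torch.linspace(0, 1, 100).unsqueeze(-1)
w = torch.zeros(100, 1, requires_grad=True)     # raw self-adaptive weights

opt_net = torch.optim.Adam(net.parameters(), lr=1e-3)
opt_w = torch.optim.Adam([w], lr=5e-3)

for step in range(1000):
    # Toy residual: fit u(x) = sin(2*pi*x). A real PINN would use PDE residuals.
    residual = net(x) - torch.sin(2 * torch.pi * x)
    # softplus keeps the effective weights positive (one of several mask choices)
    loss = (torch.nn.functional.softplus(w) * residual ** 2).mean()
    opt_net.zero_grad()
    opt_w.zero_grad()
    loss.backward()
    w.grad.neg_()     # flip the sign: gradient *ascent* on the weights
    opt_net.step()
    opt_w.step()

print("final weighted loss:", loss.item())
```

Points with persistently large residuals accumulate large weights, so the network is forced to concentrate capacity on the difficult regions.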
DLP: towards active defense against backdoor attacks with decoupled learning process
Deep learning models are well known to be susceptible to backdoor attacks, where the attacker only needs to provide a tampered dataset in which triggers have been injected. Models trained on the dataset will passively implant the backdoor, and triggers on the input can mislead the models during testing. Our study shows that the model exhibits different learning behaviors on the clean and poisoned subsets during training. Based on this observation, we propose a general training pipeline to defend against backdoor attacks actively. Benign models can be trained from the unreliable dataset by decoupling the learning process into three stages, i.e., supervised learning, active unlearning, and active semi-supervised fine-tuning. The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
Updated: 2024-06-18 23:04:38
Categories: cs.CR
Exploring and Benchmarking the Planning Capabilities of Large Language Models
We seek to elevate the planning capabilities of Large Language Models (LLMs) by investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and natural language scenarios. This suite includes algorithms to generate instances with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Second, we investigate the use of in-context learning (ICL) to enhance LLM planning, exploring the direct relationship between increased context length and improved planning performance. Third, we demonstrate the positive impact of fine-tuning LLMs on optimal planning paths, as well as the effectiveness of incorporating model-driven search procedures. Finally, we investigate the performance of the proposed methods in out-of-distribution scenarios, assessing the ability to generalize to novel and unseen planning challenges.
Updated: 2024-06-18 22:57:06
Categories: cs.CL,cs.AI,cs.LG
RITA: A Real-time Interactive Talking Avatars Framework
RITA presents a high-quality real-time interactive framework built upon generative models, designed with practical applications in mind. Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions. By leveraging the latest advancements in generative modeling, we have developed a versatile platform that not only enhances the user experience through dynamic conversational avatars but also opens new avenues for applications in virtual reality, online education, and interactive gaming. This work showcases the potential of integrating computer vision and natural language processing technologies to create immersive and interactive digital personas, pushing the boundaries of how we interact with digital content.
Updated: 2024-06-18 22:53:15
Categories: cs.CV,cs.AI,cs.HC
Why is plausibility surprisingly problematic as an XAI criterion?
Explainable artificial intelligence (XAI) is motivated by the problem of making AI predictions understandable, transparent, and responsible, as AI becomes increasingly impactful in society and high-stakes domains. XAI algorithms are designed to explain AI decisions in human-understandable ways. The evaluation and optimization criteria of XAI are gatekeepers for XAI algorithms to achieve their expected goals and should withstand rigorous inspection. To improve the scientific rigor of XAI, we conduct the first critical examination of a common XAI criterion: plausibility. It measures how convincing the AI explanation is to humans, and is usually quantified by metrics on feature localization or correlation of feature attribution. Our examination shows that, although plausible explanations can improve users' understanding of and local trust in an AI decision, doing so comes at the cost of abandoning other possible approaches to enhancing understandability, increasing misleading explanations that manipulate users, failing to achieve complementary human-AI task performance, and deteriorating users' global trust in the overall AI system. Because the flaws outweigh the benefits, we do not recommend using plausibility as a criterion to evaluate or optimize XAI algorithms. We also identify new directions to improve XAI on understandability and utility to users, including complementary human-AI task performance.
Updated: 2024-06-18 22:38:32
Categories: cs.AI,cs.HC
NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation
Lightweight autonomous unmanned aerial vehicles (UAV) are emerging as a central component of a broad range of applications. However, autonomous navigation necessitates the implementation of perception algorithms, often deep neural networks (DNN), that process the input of sensor observations, such as that from cameras and LiDARs, for control logic. The complexity of such algorithms clashes with the severe constraints of these devices in terms of computing power, energy, memory, and execution time. In this paper, we propose NaviSplit, the first instance of a lightweight navigation framework embedding a distributed and dynamic multi-branched neural model. At its core is a DNN split at a compression point, resulting in two model parts: (1) the head model, that is executed at the vehicle, which partially processes and compacts perception from sensors; and (2) the tail model, that is executed at an interconnected compute-capable device, which processes the remainder of the compacted perception and infers navigation commands. Different from prior work, the NaviSplit framework includes a neural gate that dynamically selects a specific head model to minimize channel usage while efficiently supporting the navigation network. In our implementation, the perception model extracts a 2D depth map from a monocular RGB image captured by the drone using the robust simulator Microsoft AirSim. Our results demonstrate that the NaviSplit depth model achieves an extraction accuracy of 72-81% while transmitting an extremely small amount of data (1.2-18 KB) to the edge server. When using the neural gate, as utilized by NaviSplit, we obtain a slightly higher navigation accuracy as compared to a larger static network by 0.3% while significantly reducing the data rate by 95%. To the best of our knowledge, this is the first exemplar of dynamic multi-branched model based on split DNNs for autonomous navigation.
Updated: 2024-06-18 22:25:09
Categories: cs.RO,cs.AI
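A structural sketch of the split, not the NaviSplit implementation: a head branch compresses a camera frame on the UAV into a small code, a tail completes inference at the edge, and a gate chooses among branches with different code sizes. Here the gate is replaced by a hard-coded choice, and all shapes and names are illustrative:

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    """Runs on the UAV: compresses a frame into a small code for transmission."""
    def __init__(self, code_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1),
                                 nn.ReLU(), nn.AdaptiveAvgPool2d(4),
                                 nn.Flatten(), nn.Linear(8 * 16, code_dim))
    def forward(self, x):
        return self.enc(x)

class Tail(nn.Module):
    """Runs at the edge server: infers navigation commands from the code."""
    def __init__(self, code_dim, n_commands=4):
        super().__init__()
        self.dec = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_commands))
    def forward(self, z):
        return self.dec(z)

# Branches with different compression levels; a neural gate would pick one.
heads = {dim: Head(dim) for dim in (8, 32, 128)}
tails = {dim: Tail(dim) for dim in (8, 32, 128)}

frame = torch.randn(1, 3, 144, 256)     # monocular RGB frame
code_dim = 32                           # stand-in for the gate's selection
command_logits = tails[code_dim](heads[code_dim](frame))
print(command_logits.shape)  # torch.Size([1, 4])
```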
Learning to Extract Structured Entities Using Language Models
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new model that harnesses the power of LMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction.
Updated: 2024-06-18 22:11:27
Categories: cs.CL,cs.LG
Online Learning with Set-Valued Feedback
We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.
Updated: 2024-06-18 22:11:04
Categories: cs.LG,stat.ML
An Experimental Characterization of Combined RowHammer and RowPress Read Disturbance in Modern DRAM Chips
DRAM read disturbance can break memory isolation, a fundamental property for ensuring system robustness (i.e., reliability, security, safety). RowHammer and RowPress are two different DRAM read disturbance phenomena. RowHammer induces bitflips in physically adjacent victim DRAM rows by repeatedly opening and closing an aggressor DRAM row, while RowPress induces bitflips by keeping an aggressor DRAM row open for a long period of time. In this study, we characterize a DRAM access pattern that combines RowHammer and RowPress in 84 real DDR4 DRAM chips from all three major DRAM manufacturers. Our key results show that 1) this combined RowHammer and RowPress pattern takes a significantly smaller amount of time (up to 46.1% faster) to induce the first bitflip compared to the state-of-the-art RowPress pattern, and 2) at the minimum aggressor row activation count needed to induce at least one bitflip, the bits that flip differ across the RowHammer, RowPress, and combined patterns. Based on our results, we provide a key hypothesis that the read disturbance effect caused by RowPress from one of the two aggressor rows in a double-sided pattern is much more significant than that of the other.
Updated: 2024-06-18 21:57:45
Categories: cs.AR,cs.CR
LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views
Fine-tuning is becoming widely used for leveraging the power of pre-trained foundation models in new downstream tasks. While there are many successes of fine-tuning on various tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitations of fine-tuning data and regulate fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt its OOD generalization. It can be especially catastrophic when new tasks are from different (sub)domains compared to pre-training data. To address the issues in both pre-training and fine-tuning data, we propose a novel generalizable fine-tuning method LEVI (Layer-wise Ensemble of different VIews), where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model, while preserving its efficiencies. By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks. Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization via emphasizing different views from fine-tuning data and pre-trained features.
Updated: 2024-06-18 21:56:54
Categories: cs.LG,cs.AI
Apple Tasting: Combinatorial Dimensions and Minimax Rates
In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1". First studied by \cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \emph{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner, under apple tasting feedback, can be $\Theta(1), \Theta(\sqrt{T})$, or $\Theta(T)$. This is in contrast to the full-information realizable setting where only $\Theta(1)$ and $\Theta(T)$ are possible.
Updated: 2024-06-18 21:54:41
Categories: cs.LG,stat.ML
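The feedback model itself is easy to simulate: the learner observes the true label only on rounds where it predicts "1", so it must sometimes predict "1" purely to obtain information. The threshold learner and exploration scheme below are illustrative, not the paper's algorithms:

```python
import random

random.seed(0)
T, mistakes = 1000, 0
p_explore = 0.1            # occasionally predict 1 just to obtain feedback

def true_label(x):         # hidden target: a threshold function, illustrative
    return 1 if x > 0.5 else 0

threshold = 0.0            # learner's running estimate of the boundary
for t in range(T):
    x = random.random()
    pred = 1 if (x > threshold or random.random() < p_explore) else 0
    y = true_label(x)
    mistakes += int(pred != y)
    if pred == 1:          # apple tasting: feedback only when we predict "1"
        if y == 0:
            threshold = max(threshold, x)   # this x was below the true boundary
print("mistakes:", mistakes)
```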
Exact Community Recovery (under Side Information): Optimality of Spectral Algorithms
In this paper, we study the problem of exact community recovery in general, two-community block models considering both Bernoulli and Gaussian matrix models, capturing the Stochastic Block Model, submatrix localization, and $\mathbb{Z}_2$-synchronization as special cases. We also study the settings where $side$ $information$ about community assignment labels is available, modeled as passing the true labels through a noisy channel: either the binary erasure channel (where some community labels are known while others are erased) or the binary symmetric channel (where some labels are flipped). We provide a unified analysis of the effect of side information on the information-theoretic limits of exact recovery, generalizing prior works and extending to new settings. Additionally, we design a simple but optimal spectral algorithm that incorporates side information (when present) along with the eigenvectors of the matrix observation. Using the powerful tool of entrywise eigenvector analysis [Abbe, Fan, Wang, Zhong 2020], we show that our spectral algorithm can mimic the so called $genie$-$aided$ $estimators$, where the $i^{\mathrm{th}}$ genie-aided estimator optimally computes the estimate of the $i^{\mathrm{th}}$ label, when all remaining labels are revealed by a genie. This perspective provides a unified understanding of the optimality of spectral algorithms for various exact recovery problems in a recent line of work.
Updated: 2024-06-18 21:48:59
Categories: cs.SI,cs.LG
Improving Signed Propagation of Graph Neural Network Under Multiple Classes
Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement under a binary class scenario. However, we notice that prior analyses do not align well with multi-class benchmark datasets. This paper provides a new understanding of signed propagation for multi-class scenarios and points out two drawbacks in terms of message-passing and parameter update: (1) Message-passing: if two nodes belong to different classes but have a high similarity, signed propagation can decrease the separability. (2) Parameter update: the prediction uncertainty (e.g., conflict evidence) of signed neighbors increases during training, which can impede the stability of the algorithm. Based on these observations, we introduce two novel strategies for improving signed propagation under multi-class graphs. The proposed scheme combines calibration to secure robustness while reducing uncertainty. We demonstrate the efficacy of our approach through extensive experiments on six benchmark graph datasets.
Updated: 2024-06-18 21:48:38
Categories: cs.LG,cs.SI
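For readers new to signed propagation, here is a toy numpy sketch of the underlying mechanism the paper analyzes: messages over edges judged heterophilic are flipped in sign during aggregation. Edge signs are hand-assigned here; in practice they must be estimated:

```python
import numpy as np

# Signed propagation on a toy 4-node graph: messages over edges judged
# heterophilic (connecting different classes) are multiplied by -1.
H = np.array([[1.0, 0.0],    # node features (2-dim)
              [0.9, 0.1],
              [0.1, 0.9],
              [0.0, 1.0]])
edges = [(0, 1, +1), (1, 2, -1), (2, 3, +1), (0, 2, -1)]  # (u, v, sign)

H_new = H.copy()
for u, v, sign in edges:
    H_new[u] += sign * H[v]  # signed aggregation, both directions
    H_new[v] += sign * H[u]
print(H_new)
```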
PIPPIN: Generating variable length full events from partons
This paper presents a novel approach for directly generating full events at detector-level from parton-level information, leveraging cutting-edge machine learning techniques. To address the challenge of multiplicity variations between parton and reconstructed object spaces, we employ transformers, score-based models and normalizing flows. Our method tackles the inherent complexities of the stochastic transition between these two spaces and achieves remarkably accurate results. The combination of innovative techniques and the achieved accuracy demonstrates the potential of our approach in advancing the field and opens avenues for further exploration. This research contributes to the ongoing efforts in high-energy physics and generative modelling, providing a promising direction for enhanced precision in fast detector simulation.
Updated: 2024-06-18 21:47:28
Categories: hep-ph,cs.LG,hep-ex,stat.ML
NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks
The exponential adoption of machine learning (ML) is propelling the world into a future of intelligent automation and data-driven solutions. However, the proliferation of malicious data manipulation attacks against ML, namely adversarial and backdoor attacks, jeopardizes its reliability in safety-critical applications. The existing detection methods against such attacks are built upon assumptions, limiting them in diverse practical scenarios. Thus, motivated by the need for a more robust and unified defense mechanism, we investigate the shared traits of adversarial and backdoor attacks and propose NoiSec that leverages solely the noise, the foundational root cause of such attacks, to detect any malicious data alterations. NoiSec is a reconstruction-based detector that disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation. Experimental evaluations conducted on the CIFAR10 dataset demonstrate the efficacy of NoiSec, achieving AUROC scores exceeding 0.954 and 0.852 under white-box and black-box adversarial attacks, respectively, and 0.992 against backdoor attacks. Notably, NoiSec maintains a high detection performance, keeping the false positive rate within only 1\%. Comparative analyses against MagNet-based baselines reveal NoiSec's superior performance across various attack scenarios.
Updated: 2024-06-18 21:44:51
Categories: cs.LG,cs.CR,cs.CV
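A structural sketch of a reconstruction-based detector in the spirit described above, with untrained placeholder modules: reconstruct the input, take the residual as the "noise", and score features of that noise. None of this is the paper's architecture; it only shows the data flow:

```python
import torch
import torch.nn as nn

# Skeleton of a reconstruction-based detector: (1) reconstruct the input,
# (2) take the residual as the "noise", (3) score noise features for
# systematic manipulation. All modules are illustrative stand-ins.
autoencoder = nn.Sequential(           # stand-in reconstructor
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Unflatten(1, (3, 32, 32)))
noise_encoder = nn.Sequential(         # stand-in noise feature extractor
    nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
detector = nn.Linear(64, 1)            # anomaly score head

x = torch.randn(8, 3, 32, 32)          # CIFAR10-sized batch
noise = x - autoencoder(x)             # disentangled "noise" component
score = torch.sigmoid(detector(noise_encoder(noise)))
print(score.squeeze(-1))               # higher = more likely manipulated
```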
Differentiable All-pole Filters for Time-varying Audio Systems
Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and feed-forward compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin at https://diffapf.github.io/web/.
Updated: 2024-06-18 21:37:25
Categories: eess.AS,cs.LG,cs.SD
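The forward recursion of a time-varying all-pole filter is short to state; the paper's contribution is an efficient, exact backward pass through this recursion, which the sketch below does not implement. A plain numpy version of the forward pass:

```python
import numpy as np

def time_varying_allpole(x, a):
    """y[n] = x[n] - sum_k a[n, k-1] * y[n - k], with per-sample coefficients.
    x: (T,) input signal; a: (T, K) time-varying denominator coefficients."""
    T, K = a.shape
    y = np.zeros(T)
    for n in range(T):
        acc = x[n]
        for k in range(1, K + 1):
            if n - k >= 0:
                acc -= a[n, k - 1] * y[n - k]
        y[n] = acc
    return y

T = 8
x = np.random.randn(T)
a = np.tile([0.5, -0.1], (T, 1))   # second-order filter, here constant in time
print(time_varying_allpole(x, a))
```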
Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?
We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, which the projection onto is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
Updated: 2024-06-18 21:33:56
Categories: stat.ML,cs.LG,stat.CO
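The sticking-the-landing estimator differs from the plain reparameterized ELBO gradient only in stopping gradients through the variational parameters inside log q, keeping the pathwise dependence through the samples. A minimal PyTorch sketch for a diagonal Gaussian variational family (the target density is illustrative):

```python
import torch

def log_p(z):                          # unnormalized target density, illustrative
    return -0.5 * ((z - 2.0) ** 2).sum(-1)

def log_q(z, mu, log_sigma):           # diagonal Gaussian log-density
    return (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
            - log_sigma
            - 0.5 * torch.log(torch.tensor(2 * torch.pi))).sum(-1)

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

eps = torch.randn(64, 2)
z = mu + log_sigma.exp() * eps         # reparameterized samples

elbo_plain = (log_p(z) - log_q(z, mu, log_sigma)).mean()
# STL: detach the variational parameters inside log q, keeping only the
# pathwise dependence through the samples z (the score term is dropped).
elbo_stl = (log_p(z) - log_q(z, mu.detach(), log_sigma.detach())).mean()

g_plain = torch.autograd.grad(elbo_plain, [mu, log_sigma], retain_graph=True)
g_stl = torch.autograd.grad(elbo_stl, [mu, log_sigma])
print("plain grad norms:", [g.norm().item() for g in g_plain])
print("STL grad norms:  ", [g.norm().item() for g in g_stl])
```

At an exact match between q and the target, the STL gradient has zero variance, which is the property underlying the geometric convergence result.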
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to complete training $n$-grams and (ii) $n$-novelty, the proportion of $n$-grams generated by an LM that did not appear in the training data (for arbitrarily large $n$). To enable arbitrary-length $n$-gram search over a corpus in constant time, we develop Rusty-DAWG, a novel search tool inspired by indexing of genomic data. We compare the novelty of LM-generated text to human-written text and explore factors that affect generation novelty, focusing on the Pythia models. We find that, for $n > 4$, LM-generated text is less novel than human-written text, though it is more novel for smaller $n$. Larger LMs and more constrained decoding strategies both decrease novelty. Finally, we show that LMs complete $n$-grams with lower loss if they are less frequent in the training data. Overall, our results reveal factors influencing the novelty of LM-generated text, and we release Rusty-DAWG to facilitate further pretraining data research.
Updated: 2024-06-18 21:31:19
Categories: cs.CL,cs.AI
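A naive baseline for the n-novelty metric stores every training n-gram in a hash set, which is exactly what becomes impractical for arbitrarily large n over pretraining corpora and what Rusty-DAWG's constant-time index avoids. A sketch of the metric itself on toy token lists:

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def n_novelty(generated, training, n):
    """Fraction of n-grams in `generated` that never appear in `training`."""
    gen = [tuple(generated[i:i + n]) for i in range(len(generated) - n + 1)]
    train = ngrams(training, n)
    return sum(g not in train for g in gen) / max(len(gen), 1)

train_tokens = "the cat sat on the mat".split()
gen_tokens = "the cat sat on the rug".split()
for n in (2, 3, 4):
    print(n, n_novelty(gen_tokens, train_tokens, n))
```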
ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization
This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), learning with imbalanced data, reinforcement learning, and learning to rank. To better solve these problems, we introduce an efficient single-loop primal-dual block-coordinate proximal algorithm, dubbed ALEXR. This algorithm leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions of involved functions, which not only improve the best rates in previous works on smooth cFCCO problems but also expand the realm of cFCCO for solving more challenging non-smooth problems such as the dual form of GDRO. Finally, we present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.
Updated: 2024-06-18 21:31:08
Categories: math.OC,cs.LG
MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification
The improvement of language model robustness, including successful defense against adversarial attacks, remains an open problem. In computer vision settings, the stochastic noising and de-noising process provided by diffusion models has proven useful for purifying input images, thus improving model robustness against adversarial attacks. Similarly, some initial work has explored the use of random noising and de-noising to mitigate adversarial attacks in an NLP setting, but improving the quality and efficiency of these methods is necessary for them to remain competitive. We extend upon methods of input text purification that are inspired by diffusion processes, which randomly mask and refill portions of the input text before classification. Our novel method, MaskPure, exceeds or matches robustness compared to other contemporary defenses, while also requiring no adversarial classifier training and without assuming knowledge of the attack type. In addition, we show that MaskPure is provably certifiably robust. To our knowledge, MaskPure is the first stochastic-purification method with demonstrated success against both character-level and word-level attacks, indicating the generalizable and promising nature of stochastic denoising defenses. In summary: the MaskPure algorithm bridges literature on the current strongest certifiable and empirical adversarial defense methods, showing that both theoretical and practical robustness can be obtained together. Code is available on GitHub at https://github.com/hubarruby/MaskPure.
Updated: 2024-06-18 21:27:13
Categories: cs.LG,cs.AI
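A sketch of the mask-and-refill control flow with majority voting. The refill and classify callables are hypothetical stand-ins (a real instantiation might refill with a masked language model); this illustrates the defense's structure, not the paper's implementation:

```python
import random

def purify_and_vote(text, classify, refill, n_votes=10, mask_rate=0.3):
    """MaskPure-style defense sketch: randomly mask tokens, refill them,
    classify each purified copy, and return the majority label."""
    tokens = text.split()
    votes = []
    for _ in range(n_votes):
        masked = [("[MASK]" if random.random() < mask_rate else tok)
                  for tok in tokens]
        purified = refill(" ".join(masked))   # hypothetical infill step
        votes.append(classify(purified))      # hypothetical classifier
    return max(set(votes), key=votes.count)

# Toy stand-ins so the sketch runs end to end:
label = purify_and_vote(
    "this movie was surprisingly great",
    classify=lambda t: "positive" if "great" in t else "negative",
    refill=lambda t: t.replace("[MASK]", "good"),
)
print(label)
```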
Machine Learning and Optimization Techniques for Solving Inverse Kinematics in a 7-DOF Robotic Arm
As the pace of AI technology continues to accelerate, more tools have become available to researchers for solving longstanding problems. Hybrid approaches available today continue to push the computational limits of efficiency and precision. One such problem is the inverse kinematics of redundant systems. This paper explores the complexities of a 7-degree-of-freedom manipulator and evaluates 13 optimization techniques to solve it. Additionally, a novel approach is proposed to contribute to the field of algorithmic research; it was found to be over 200 times faster than the well-known traditional Particle Swarm Optimization technique. This new method may serve as a new field of search that combines the explorative capabilities of Machine Learning with the exploitative capabilities of numerical methods.
Updated: 2024-06-18 21:23:51
Categories: cs.RO,cs.AI,cs.LG
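As a toy version of the underlying optimization problem, here is numerical inverse kinematics for a planar 7-joint arm using scipy, minimizing the squared end-effector error. The paper's manipulator is a full 3-D 7-DOF arm and it compares 13 techniques; this only shows the objective being solved:

```python
import numpy as np
from scipy.optimize import minimize

LINKS = np.ones(7)                      # 7 joints, unit link lengths (planar toy)

def forward_kinematics(thetas):
    angles = np.cumsum(thetas)          # absolute angle of each link
    x = np.sum(LINKS * np.cos(angles))
    y = np.sum(LINKS * np.sin(angles))
    return np.array([x, y])

target = np.array([3.0, 2.0])

def objective(thetas):                  # squared end-effector position error
    return np.sum((forward_kinematics(thetas) - target) ** 2)

result = minimize(objective, x0=np.zeros(7), method="L-BFGS-B")
print("reached:", forward_kinematics(result.x), "error:", result.fun)
```

Because the arm is redundant, many joint configurations reach the same target; which one the solver returns depends on the initial guess, one reason the paper benchmarks many optimizers.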
On the Diminishing Returns of Width for Continual Learning
While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from \emph{catastrophic forgetting} when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze Continual Learning Theory and prove that width is directly related to forgetting in Feed-Forward Networks (FFN). Specifically, we demonstrate that increasing network widths to reduce forgetting yields diminishing returns. We empirically verify our claims at widths hitherto unexplored in prior studies where the diminishing returns are clearly observed as predicted by our theory.
Updated: 2024-06-18 21:22:10
Categories: cs.LG,cs.AI
Physics-inspired spatiotemporal-graph AI ensemble for the detection of higher order wave mode signals of spinning binary black hole mergers
We present a new class of AI models for the detection of quasi-circular, spinning, non-precessing binary black hole mergers whose waveforms include the higher order gravitational wave modes $(l, |m|)=\{(2, 2), (2, 1), (3, 3), (3, 2), (4, 4)\}$, and mode mixing effects in the $l = 3, |m| = 2$ harmonics. These AI models combine hybrid dilated convolution neural networks to accurately model both short- and long-range temporal sequential information of gravitational waves; and graph neural networks to capture spatial correlations among gravitational wave observatories to consistently describe and identify the presence of a signal in a three detector network encompassing the Advanced LIGO and Virgo detectors. We first trained these spatiotemporal-graph AI models using synthetic noise, using 1.2 million modeled waveforms to densely sample this signal manifold, within 1.7 hours using 256 A100 GPUs in the Polaris supercomputer at the ALCF. Our distributed training approach had optimal performance, and strong scaling up to 512 A100 GPUs. With these AI ensembles we processed data from a three detector network, and found that an ensemble of 4 AI models achieves state-of-the-art performance for signal detection, and reports two misclassifications for every decade of searched data. We distributed AI inference over 128 GPUs in the Polaris supercomputer and 128 nodes in the Theta supercomputer, and completed the processing of a decade of gravitational wave data from a three detector network within 3.5 hours. Finally, we fine-tuned these AI ensembles to process the entire month of February 2020, which is part of the O3b LIGO/Virgo observation run, and found 6 gravitational waves, concurrently identified in Advanced LIGO and Advanced Virgo data, and zero false positives. This analysis was completed in one hour using one A100 GPU.
Updated: 2024-06-18 21:18:22
领域: astro-ph.IM,cs.AI,gr-qc,68T01, 68T35, 83C35, 83C57
Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization
Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than at the surface. They hold significant importance due to their capacity to carry substantial energy, thus influencing pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter data from remote sensing instruments. However, cloud cover in optical remote sensing images variably obscures ground information, leading to blurred or missing surface observations. As such, this paper aims to develop altimeter-based machine learning solutions to automatically locate ISWs. The challenges, however, lie in the following two aspects: 1) the altimeter data has low resolution, which requires a strong machine learner; 2) labeling data is extremely labor-intensive, leading to very limited data for training. In recent years, the rapid progress of deep learning has demonstrated strong learning capacity given abundant data. Besides, more recent studies on efficient learning and self-supervised learning have laid solid foundations to tackle the aforementioned challenges. In this paper, we propose to inject prior knowledge to achieve a strong and efficient learner. Specifically, intrinsic patterns in altimetry data are efficiently captured using a scale-translation equivariant convolutional neural network (ST-ECNN). By considering inherent symmetries in neural network design, ST-ECNN achieves higher efficiency and better performance than baseline models. Furthermore, we also introduce prior knowledge from massive unsupervised data to enhance our solution, using the SimCLR framework for pre-training. Our final solution achieves overall better performance than baselines on our handcrafted altimetry dataset. Data and codes are available at https://github.com/ZhangWan-byte/Internal_Solitary_Wave_Localization .
Updated: 2024-06-18 21:09:56
标题: 海洋内部孤立波定位的尺度平移等变网络
摘要: 内部孤立波(ISWs)是重力波,通常在海洋内部而非表面观察到。它们具有重要意义,因为能够携带大量能量,从而影响污染物传输、油田平台操作、潜艇航行等。研究人员通过光学图像、合成孔径雷达(SAR)图像和遥感仪器的测高仪数据研究了ISWs。然而,光学遥感图像中的云层遮挡地面信息,导致地面观测模糊或缺失。因此,本文旨在基于测高仪的机器学习解决方案自动定位ISWs。然而,挑战在于以下两个方面:1)测高仪数据分辨率低,需要强大的机器学习器;2)标记数据极其耗时,导致训练数据非常有限。近年来,深度学习的巨大进展表明在有充足数据的情况下具有强大学习能力。此外,对高效学习和自监督学习的最新研究奠定了解决上述挑战的坚实基础。在本文中,我们提出注入先验知识以实现强大和高效学习器。具体地,通过使用尺度平移等变卷积神经网络(ST-ECNN)高效捕捉测高数据中的内在模式。通过考虑神经网络设计中的固有对称性,ST-ECNN实现了比基准模型更高的效率和更好的性能。此外,我们还引入大规模无监督数据中的先验知识,利用SimCLR框架进行预训练以增强我们的解决方案。我们的最终解决方案在我们手工制作的测高数据集上实现了整体更好的性能。数据和代码可在https://github.com/ZhangWan-byte/Internal_Solitary_Wave_Localization获取。
更新时间: 2024-06-18 21:09:56
领域: cs.LG,cs.AI,stat.AP
Informed along the road: roadway capacity driven graph convolution network for network-wide traffic prediction
While deep learning has shown success in predicting traffic states, most methods treat it as a general prediction task without considering transportation aspects. Recently, graph neural networks have proven effective for this task, but few incorporate external factors that impact roadway capacity and traffic flow. This study introduces the Roadway Capacity Driven Graph Convolution Network (RCDGCN) model, which incorporates static and dynamic roadway capacity attributes in spatio-temporal settings to predict network-wide traffic states. The model was evaluated on two real-world datasets with different transportation factors: the ICM-495 highway network and an urban network in Manhattan, New York City. Results show RCDGCN outperformed baseline methods in forecasting accuracy. Analyses, including ablation experiments, weight analysis, and case studies, investigated the effect of capacity-related factors. The study demonstrates the potential of using RCDGCN for transportation system management.
Updated: 2024-06-18 21:04:23
标题: 沿途知情:基于道路容量的图卷积网络用于全网络交通预测
摘要: 深度学习在预测交通状态方面取得了成功,但大多数方法将其视为一般性预测任务,而没有考虑交通运输方面的因素。最近,图神经网络已被证明对这一任务非常有效,但很少有方法考虑影响道路容量和交通流量的外部因素。本研究引入了道路容量驱动的图卷积网络(RCDGCN)模型,该模型在时空环境中结合静态和动态道路容量属性,以预测整个网络的交通状态。该模型在两个真实数据集上进行了评估,分别是ICM-495高速公路网络和纽约曼哈顿的城市网络,这两个数据集具有不同的交通因素。结果显示,RCDGCN在预测准确性方面优于基准方法。分析包括消融实验、权重分析和案例研究,研究了与容量相关因素的影响。该研究表明了利用RCDGCN进行交通系统管理的潜力。
更新时间: 2024-06-18 21:04:23
领域: cs.LG,cs.AI
Scalable Ensembling For Mitigating Reward Overoptimisation
Reinforcement Learning from Human Feedback (RLHF) has enabled significant advancements within language modeling for powerful, instruction-following models. However, the alignment of these models remains a pressing challenge, as the policy tends to overfit the learned "proxy" reward model past an inflection point of utility as measured by a more performant "gold" reward model -- a phenomenon known as overoptimisation. Prior work has mitigated this issue by computing a pessimistic statistic over an ensemble of reward models, which is common in Offline Reinforcement Learning but incredibly costly for language models with high memory requirements, making such approaches infeasible for sufficiently large models. To this end, we propose using a shared encoder but separate linear heads. We find this leads to similar performance as the full ensemble while allowing tremendous savings in the memory and training time required for models of similar size.
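One way to read the proposed design is sketched below (our interpretation of the abstract, not the paper's code: a toy MLP encoder stands in for the language-model backbone, and the pessimistic ensemble statistic is assumed to be a minimum over heads):

import torch
import torch.nn as nn

class SharedEncoderRewardEnsemble(nn.Module):
    def __init__(self, dim_in=128, dim_hidden=256, n_heads=5):
        super().__init__()
        self.encoder = nn.Sequential(          # shared: one set of weights
            nn.Linear(dim_in, dim_hidden), nn.GELU(),
            nn.Linear(dim_hidden, dim_hidden), nn.GELU(),
        )
        self.heads = nn.ModuleList(            # cheap: k linear layers only
            [nn.Linear(dim_hidden, 1) for _ in range(n_heads)]
        )

    def forward(self, x):
        h = self.encoder(x)
        rewards = torch.cat([head(h) for head in self.heads], dim=-1)
        return rewards.min(dim=-1).values      # pessimistic ensemble reward

model = SharedEncoderRewardEnsemble()
x = torch.randn(4, 128)                        # a batch of 4 pooled responses
print(model(x).shape)                          # torch.Size([4])

Since only the linear heads are duplicated, memory grows by a few vectors per ensemble member rather than by a full reward model.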
Updated: 2024-06-18 20:53:08
标题: 可扩展的集成方法用于减轻奖励过度优化
摘要: 基于人类反馈的强化学习(RLHF)已经为强大的指令跟随语言模型带来了显著进展。然而,这些模型的对齐仍然是一个紧迫的挑战:策略往往会对学习到的“代理”奖励模型过度拟合,使其效用(以性能更强的“金标”奖励模型衡量)越过拐点后不升反降,这一现象被称为过度优化。先前的工作通过在一组奖励模型上计算悲观统计量来缓解这个问题,这在离线强化学习中很常见,但对于内存需求很高的语言模型来说成本极高,使得这类方法对足够大的模型不可行。为此,我们建议使用共享编码器搭配分开的线性头部。我们发现这样可以获得与完整集成相近的性能,同时在相似规模模型所需的内存和训练时间上实现巨大节省。
更新时间: 2024-06-18 20:53:08
领域: cs.LG,cs.CL
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method
This paper explores the rising concern of utilizing Large Language Models (LLMs) in spear phishing message generation, and their performance compared to human-authored counterparts. Our pilot study compares the effectiveness of smishing (SMS phishing) messages created by GPT-4 and human authors, which have been personalized to willing targets. The targets assessed the messages in a modified ranked-order experiment using a novel methodology we call TRAPD (Threshold Ranking Approach for Personalized Deception). Specifically, targets provide personal information (job title and location, hobby, item purchased online), spear smishing messages are created using this information by humans and GPT-4, targets are invited back to rank-order 12 messages from most to least convincing (and identify which they would click on), and then asked questions about why they ranked messages the way they did. They also guess which messages are created by an LLM and their reasoning. Results from 25 targets show that LLM-generated messages are most often perceived as more convincing than those authored by humans, with messages related to jobs being the most convincing. We characterize different criteria used when assessing the authenticity of messages, including word choice, style, and personal relevance. Results also show that targets were unable to identify whether the messages were AI-generated or human-authored and struggled to identify criteria to use in order to make this distinction. This study aims to highlight the urgent need for further research and improved countermeasures against personalized AI-enabled social engineering attacks.
Updated: 2024-06-18 20:47:16
标题: 评估人工智能与人类编写的鱼叉式钓鱼短信攻击:使用TRAPD方法进行的实证研究
摘要: 本文探讨了利用大型语言模型(LLMs)生成鱼叉式钓鱼信息这一日益令人担忧的问题,并将其效果与人类编写的信息进行比较。我们的初步研究比较了由GPT-4和人类作者针对自愿目标个性化创建的短信钓鱼(smishing)信息的有效性。目标使用我们称之为TRAPD(个性化欺骗的阈值排名方法)的新方法,在一个改进的排序实验中评估这些信息。具体而言,目标提供个人信息(职位和地点、爱好、在线购买的物品),人类和GPT-4利用这些信息创建鱼叉式钓鱼短信,然后邀请目标对12条信息按说服力从高到低排序(并指出他们会点击哪些),随后询问他们如此排序的原因。他们还要猜测哪些信息由LLM创建并说明理由。来自25个目标的结果表明,LLM生成的信息通常被认为比人类创作的更具说服力,其中与工作相关的信息最具说服力。我们归纳了评估信息真实性时使用的不同标准,包括措辞、风格和个人相关性。结果还显示,目标无法辨别信息是由AI生成还是人类编写,并且难以找出用于区分二者的标准。本研究旨在强调迫切需要进一步研究并改进针对个性化AI社会工程攻击的对策。
更新时间: 2024-06-18 20:47:16
领域: cs.CY,cs.AI
Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection
The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure on the capacity for classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
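The XOR construction is easy to verify in numpy (our illustration; the Gaussian bump below is one plausible smooth approximation, not necessarily the paper's exact choice):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                 # XOR labels

w, b = np.array([1.0, 1.0]), -1.0          # hyperplane x1 + x2 - 1 = 0
margin = X @ w + b                         # signed distance (up to scale)

# Equality separation: class 1 iff the point lies within the margin.
pred_hard = (np.abs(margin) < 0.5).astype(int)

# Smooth surrogate suitable for a neural network pipeline.
pred_soft = np.exp(-margin**2 / (2 * 0.25**2))

print(pred_hard)          # [0 1 1 0] -- XOR from a single linear functional
print(pred_soft.round(3))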
Updated: 2024-06-18 20:46:32
标题: 重新审视不可分二元分类及其在异常检测中的应用
摘要: 无法线性分类异或(XOR)问题一直以来推动着深度学习的许多发展。我们重新审视这个古老的问题,并证明对XOR进行线性分类确实是可能的。我们提出了一个略有不同的范式——等式分离,它调整SVM的目标,区分数据位于边际之内还是之外,而不是用半空间来分隔数据。我们的分类器随后可以借助平滑近似集成到神经网络管道中。根据其性质,我们直观地认为等式分离适用于异常检测。为了形式化这一概念,我们引入了闭合数,一种衡量分类器为异常检测形成封闭决策区域能力的定量指标。基于二元分类与异常检测之间的这一理论联系,我们在监督异常检测实验中检验了我们的假设,结果显示等式分离可以同时检测到已见和未见的异常。
更新时间: 2024-06-18 20:46:32
领域: cs.LG,cs.AI,stat.ML,68T37 (Primary), 68T07 (Secondary),I.2.6; I.5.1
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization Levels and Rank Values through Differentiable Bayesian Gates
It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices or do not use any quantization technique, which has been shown to be one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA (B-LoRA), which approaches matrix decomposition and quantization from a Bayesian perspective by employing a prior distribution on both the quantization levels and the rank values of the learned low-rank matrices. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% relative to the baselines.
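A heavily simplified sketch of the rank side of this idea follows (our illustration, not the paper's method: B-LoRA places Bayesian priors on quantization levels and rank values, whereas this sketch shows only a deterministic sigmoid-gate relaxation of the rank and omits quantization entirely):

import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r_max=8, alpha=16.0):
        super().__init__()
        self.base = base                      # frozen pre-trained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r_max, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r_max))
        self.gate_logits = nn.Parameter(torch.zeros(r_max))
        self.scale = alpha / r_max

    def forward(self, x):
        g = torch.sigmoid(self.gate_logits)   # gate near 0 prunes a rank-1 term
        delta = (self.B * g) @ self.A         # effective rank = open gates
        return self.base(x) + self.scale * (x @ delta.T)

layer = GatedLoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)        # torch.Size([2, 64])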
Updated: 2024-06-18 20:26:30
标题: 贝叶斯-LoRA:通过可微分贝叶斯门选取最优量化级别和秩值的LoRA参数高效微调
摘要: 在自然语言处理中,通常会对一个通用领域的单一模型进行预训练,然后再对下游任务进行微调。然而,在大型语言模型中,对整个模型进行微调可能会造成计算开销巨大,导致能源消耗非常高。因此,最近提出了几种参数高效的微调(PEFT)方法。其中最流行的方法之一是低秩适应(LoRA),其关键洞察是将预训练模型的更新权重分解为两个低秩矩阵。然而,现有的方法要么在所有不同的权重矩阵上使用相同的秩值,要么不使用任何量化技术,而量化技术被证明是影响模型能源消耗最重要的因素之一。在这项工作中,我们提出了贝叶斯-LoRA(B-LoRA),通过在学习到的低秩矩阵的量化水平和秩值上采用先验分布,从贝叶斯角度处理矩阵分解和量化。因此,B-LoRA能够对预训练模型进行微调以适应特定的下游任务,找到每个低秩矩阵的最优秩值和量化水平。我们通过在GLUE基准上微调预训练的DeBERTaV3来验证提出的模型。此外,我们将其与相关基准进行比较,并展示定性和定量结果,显示出所提出的方法如何学习到最优秩量化矩阵。B-LoRA的表现与基准相当或更好,同时相对于基准减少了大约70%的总位操作量。
更新时间: 2024-06-18 20:26:30
领域: cs.AI
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning the computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that might contribute a significant portion of the overall runtime, thus limiting the scalability of this technique within a group of devices with high-speed interconnects, such as GPUs with NVLinks in a node. This paper proposes a novel method, Flux, to significantly hide communication latencies with dependent computations for GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.
Updated: 2024-06-18 20:25:56
标题: FLUX:通过内核融合在GPU上实现快速的基于软件的通信重叠
摘要: 大型深度学习模型已经展示出在广泛的应用领域中解决许多任务的强大能力。这些大型模型通常需要分布式进行训练和推断。张量并行是一种常见的技术,将操作或层的计算在多个设备之间分区,以克服单个处理器的内存容量限制,并/或加速计算以满足特定的延迟要求。然而,这种并行技术引入了额外的通信,可能会对整体运行时间贡献显著比例。因此,在具有高速互连的设备组内,如具有NVLinks的GPU节点中,这种技术的可扩展性受到限制。本文提出了一种新颖的方法Flux,通过依赖计算显著隐藏GPU的通信延迟。Flux将通信和计算操作过度分解为更细粒度的操作,并进一步将它们融合成一个更大的内核,以有效地隐藏通信而不影响内核效率。在给定一个融合内核的情况下,Flux可以潜在地重叠高达96%的通信。总体而言,它可以在拥有各种GPU代和互连的128个GPU集群上,相对于Megatron-LM,实现高达1.24倍的训练加速,并在拥有各种GPU代和互连的8个GPU集群上,相对于vLLM,实现高达1.66倍和1.30倍的预填和解码推断加速。
更新时间: 2024-06-18 20:25:56
领域: cs.LG,cs.DC
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
Updated: 2024-06-18 20:23:37
标题: 迈向大规模多人游戏中的人机对齐
摘要: 在复杂的多智能体游戏中实现人机对齐,对于创建能增强游戏体验的可信赖AI代理至关重要。我们提出了一种使用可解释任务集框架来评估这种对齐的方法,重点放在高级行为任务而不是低级策略上。我们的方法有三个组成部分。首先,我们分析了来自Xbox游戏《Bleeding Edge》的大量人类游戏数据(超过10万场比赛),揭示了复杂任务空间中的行为模式。这个任务空间作为基集,构成一个刻画若干可解释轴的行为流形:战斗-逃避、探索-利用和单人-多智能体。其次,我们训练一个使用生成式预训练因果Transformer来玩《Bleeding Edge》的AI代理,并测量其行为。第三,我们将人类和AI的游戏行为投影到所提出的行为流形上进行比较和对照。这使我们能够将策略差异解释为更高层次的行为概念,例如,我们发现,尽管人类玩家在战斗-逃避和探索-利用行为上表现出多样性,AI玩家却趋于一致。此外,AI代理主要进行单人游戏,而人类经常参与合作和竞争的多智能体模式。这些显著差异凸显了在与人类对齐的应用中对AI进行可解释评估、设计和集成的必要性。我们的研究推进了AI尤其是生成式AI研究中的对齐讨论,为多人游戏中可解释的人-代理对齐提供了可度量的框架。
更新时间: 2024-06-18 20:23:37
领域: cs.AI,cs.HC
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
Lower-bound analyses for nonconvex strongly-concave minimax optimization problems have shown that stochastic first-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ oracle complexity to find an $\varepsilon$-stationary point. Some works indicate that this complexity can be improved to $\mathcal{O}(\varepsilon^{-3})$ when the loss gradient is Lipschitz continuous. The question of achieving enhanced convergence rates under distinct conditions remains unresolved. In this work, we address this question for optimization problems that are nonconvex in the minimization variable and strongly concave or Polyak-Lojasiewicz (PL) in the maximization variable. We introduce novel bias-corrected momentum algorithms utilizing efficient Hessian-vector products. We establish convergence conditions and demonstrate a lower iteration complexity of $\mathcal{O}(\varepsilon^{-3})$ for the proposed algorithms. The effectiveness of the method is validated through applications to robust logistic regression using real-world datasets.
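For reference, one standard form of bias-corrected momentum is the STORM-style recursion below (our notational guess; the paper's recursion, which additionally uses Hessian-vector products for the minimax structure, may differ in detail):

$$ d_t \;=\; \nabla_x f(x_t, y_t; \xi_t) \;+\; (1-\beta_t)\bigl(d_{t-1} - \nabla_x f(x_{t-1}, y_{t-1}; \xi_t)\bigr), \qquad x_{t+1} \;=\; x_t - \eta\, d_t, $$

where the correction term $d_{t-1} - \nabla_x f(x_{t-1}, y_{t-1}; \xi_t)$ removes the bias that plain momentum accumulates from stale stochastic gradients.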
Updated: 2024-06-18 20:14:52
标题: 基于偏差校正动量的加速随机极大极小优化
摘要: 对非凸-强凹极小极大优化问题的下界分析表明,随机一阶算法至少需要$\mathcal{O}(\varepsilon^{-4})$的oracle复杂度才能找到一个$\varepsilon$-驻点。一些工作表明,当损失梯度Lipschitz连续时,这一复杂度可以改进为$\mathcal{O}(\varepsilon^{-3})$。在不同条件下能否获得更快收敛速度的问题仍未解决。在本研究中,我们针对在最小化变量上非凸、在最大化变量上强凹或满足Polyak-Lojasiewicz(PL)条件的优化问题回答了这个问题。我们引入了利用高效Hessian-向量积的新型偏差校正动量算法。我们建立了收敛条件,并证明所提算法具有较低的迭代复杂度$\mathcal{O}(\varepsilon^{-3})$。该方法的有效性通过在真实数据集上的鲁棒逻辑回归应用得到验证。
更新时间: 2024-06-18 20:14:52
领域: cs.LG,math.OC
Traffic Prediction considering Multiple Levels of Spatial-temporal Information: A Multi-scale Graph Wavelet-based Approach
Although traffic prediction has been receiving considerable attention with a number of successes in the context of intelligent transportation systems, the prediction of traffic states over a complex transportation network that contains different road types has remained a challenge. This study proposes a multi-scale graph wavelet temporal convolution network (MSGWTCN) to predict the traffic states in complex transportation networks. Specifically, a multi-scale spatial block is designed to simultaneously capture the spatial information at different levels, and the gated temporal convolution network is employed to extract the temporal dependencies of the data. The model jointly learns to capture multiple levels of spatial interactions by stacking graph wavelets with different scales. Two real-world datasets are used in this study to investigate the model performance, including a highway network in Seattle and a dense road network of Manhattan in New York City. Experiment results show that the proposed model outperforms other baseline models. Furthermore, different scales of graph wavelets are found to be effective in extracting local, intermediate and global information at the same time and thus enable the model to learn a complex transportation network topology with various types of road segments. By carefully customizing the scales of wavelets, the model is able to improve the prediction performance and better adapt to different network configurations.
Updated: 2024-06-18 20:05:47
标题: 考虑多个空间-时间信息层次的交通预测:基于多尺度图小波的方法
摘要: 尽管交通预测在智能交通系统的背景下受到了广泛关注,并取得了一些成功,但对包含不同道路类型的复杂交通网络中的交通状态进行预测仍然是一个挑战。本研究提出了一种多尺度图小波时空卷积网络(MSGWTCN),用于预测复杂交通网络中的交通状态。具体而言,设计了一个多尺度空间块,同时捕捉不同级别的空间信息,并利用门控时间卷积网络提取数据的时间依赖性。该模型通过堆叠具有不同尺度的图小波来联合学习多个级别的空间交互作用。本研究使用了西雅图的高速公路网络和纽约市曼哈顿的密集道路网络两个真实世界数据集来调查模型的性能。实验结果显示,所提出的模型优于其他基准模型。此外,发现不同尺度的图小波同时有效地提取了本地、中间和全局信息,从而使模型能够学习具有各种类型道路段的复杂交通网络拓扑结构。通过精心定制小波的尺度,模型能够提高预测性能并更好地适应不同的网络配置。
更新时间: 2024-06-18 20:05:47
领域: cs.AI,eess.SP
Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities
Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincar\'e inequality offers improved bounds.
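As background (our addition, not a statement from the paper), the classical dimension-free LSI for a standard Gaussian reference $\mu = N(0, I_d)$ bounds relative entropy by relative Fisher information,

$$ \mathrm{KL}(\pi \,\|\, \mu) \;\le\; \tfrac{1}{2}\, I(\pi \,\|\, \mu), \qquad I(\pi \,\|\, \mu) \;=\; \int \Bigl\| \nabla \log \tfrac{d\pi}{d\mu} \Bigr\|^{2} \, d\pi; $$

dimensional refinements of this inequality make the dependence on $d$ explicit, which is the structure the paper exploits to certify low-dimensional approximations.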
Updated: 2024-06-18 20:02:44
标题: 通过维度对数Sobolev不等式对概率测度中的低维结构进行敏锐检测
摘要: 在高维概率测度中识别低维结构是高效采样的一个关键预处理步骤。我们介绍了一种方法,用于识别和逼近目标测度 $\pi$,将其视为给定参考测度 $\mu$ 沿 $\mathbb{R}^{d}$ 中少数重要方向的扰动。参考测度可以是高斯分布或高斯分布的非线性变换,这在生成模型中很常见。我们的方法扩展了先前通过最小化 Kullback--Leibler 散度的上界(majorization)来在这类测度中寻找最优逼近的工作。我们的主要贡献揭示了\emph{维度}对数Sobolev不等式(LSI)与这种拟设(ansatz)下的逼近之间的联系。具体而言,当目标和参考均为高斯分布时,我们证明最小化维度LSI等价于最小化限制在该拟设上的KL散度。对于一般的非高斯测度,维度LSI给出的上界在基于梯度的降维中一致优于先前的上界。我们进一步展示了该分析对平方Hellinger距离的适用性,类似的推理表明维度Poincar\'e不等式提供了更好的界。
更新时间: 2024-06-18 20:02:44
领域: stat.ML,cs.LG,math.PR,math.ST,stat.CO,stat.TH
Real-time Yemeni Currency Detection
Banknote recognition is a major problem faced by visually challenged people. We therefore propose an application to help visually challenged people identify the different types of Yemeni currency through deep learning techniques. As money plays a significant role in daily life for any business transaction, real-time detection and recognition of banknotes becomes necessary for a person, especially one who is blind or visually impaired, or for a system that sorts the data. This paper presents a real-time Yemeni currency detection system for visually impaired persons. The proposed system exploits a deep learning approach to enable visually impaired people to reliably recognize banknotes. For real-time recognition, we have deployed the system in a mobile application.
Updated: 2024-06-18 19:57:15
标题: 实时也门货币检测
摘要: 纸币识别是视觉障碍人士面临的主要问题。因此,我们提出了一种应用程序,通过深度学习技术帮助视觉障碍人士识别不同类型的也门货币。由于货币在日常生活中对于任何业务交易都具有重要作用,因此对于一个人,尤其是盲人或视觉受限者,或者用于对数据进行排序的系统,实时检测和识别纸币变得必要。本文提出了一个用于视觉受限者的实时也门货币检测系统。所提出的系统利用深度学习方法,便于视觉受限者成功识别纸币。为了实现实时识别,我们将系统部署到了一个移动应用程序中。
更新时间: 2024-06-18 19:57:15
领域: cs.CV,cs.AI
"False negative -- that one is going to kill you": Understanding Industry Perspectives of Static Analysis based Security Testing
The demand for automated security analysis techniques, such as static analysis based security testing (SAST) tools continues to increase. To develop SASTs that are effectively leveraged by developers for finding vulnerabilities, researchers and tool designers must understand how developers perceive, select, and use SASTs, what they expect from the tools, whether they know of the limitations of the tools, and how they address those limitations. This paper describes a qualitative study that explores the assumptions, expectations, beliefs, and challenges experienced by developers who use SASTs. We perform in-depth, semi-structured interviews with 20 practitioners who possess a diverse range of software development expertise, as well as a variety of unique security, product, and organizational backgrounds. We identify 17 key findings that shed light on developer perceptions and desires related to SASTs, and also expose gaps in the status quo - challenging long-held beliefs in SAST design priorities. Finally, we provide concrete future directions for researchers and practitioners rooted in an analysis of our findings.
Updated: 2024-06-18 19:46:47
标题: “假阴性 - 那个会要了你的命”:理解业界对基于静态分析的安全测试的看法
摘要: 对于自动化安全分析技术的需求,如基于静态分析的安全测试(SAST)工具的需求仍在增加。为了开发能够有效被开发人员利用来发现漏洞的SAST,研究人员和工具设计者必须了解开发人员如何看待、选择和使用SAST,他们对工具有什么期望,是否了解工具的局限性以及他们如何解决这些局限性。本文描述了一项定性研究,探讨了使用SAST的开发人员所经历的假设、期望、信念和挑战。我们对拥有多样化软件开发专业知识以及各种独特安全、产品和组织背景的20名从业者进行了深入、半结构化的访谈。我们确定了17个关键发现,揭示了与SAST相关的开发人员看法和愿望,同时也暴露了现状中的差距 - 挑战了长期以来对SAST设计优先级的信念。最后,我们根据我们的发现提供了具体的未来研究方向,供研究人员和从业者参考。
更新时间: 2024-06-18 19:46:47
领域: cs.CR,cs.SE
ABNet: Attention BarrierNet for Safe and Scalable Robot Learning
Safe learning is central to AI-enabled robots, where a single failure may lead to catastrophic results. Barrier-based methods are among the dominant approaches for safe robot learning. However, these methods are not scalable, are hard to train, and tend to generate unstable signals under noisy inputs, making them challenging to deploy on robots. To address these challenges, we propose a novel Attention BarrierNet (ABNet) that scales to build larger foundational safe models in an incremental manner. Each head of the BarrierNet in ABNet can learn safe robot control policies from different features and focus on a specific part of the observation. In this way, we do not need to construct a large model for complex tasks in one shot, which significantly facilitates training while ensuring stable output. Most importantly, we can still formally prove the safety guarantees of ABNet. We demonstrate the strength of ABNet in 2D robot obstacle avoidance, safe robot manipulation, and vision-based end-to-end autonomous driving, with results showing much better robustness and guarantees than existing models.
Updated: 2024-06-18 19:37:44
标题: ABNet:用于安全和可扩展机器人学习的注意力屏障网络
摘要: 安全学习对于AI赋能的机器人至关重要,因为单一故障就可能导致灾难性后果。基于屏障函数的方法是安全机器人学习的主流方法之一。然而,这类方法不具备可扩展性,难以训练,并且在噪声输入下容易产生不稳定的信号,难以部署到机器人上。为了解决这些挑战,我们提出了一种新颖的注意力屏障网络(ABNet),可以以增量方式可扩展地构建更大的基础安全模型。ABNet中BarrierNet的每个头可以从不同的特征中学习安全机器人控制策略,并专注于观测的特定部分。通过这种方式,我们不需要一次性为复杂任务构建大模型,这显著简化了模型训练,同时确保其输出稳定。最重要的是,我们仍然可以形式化地证明ABNet的安全性保证。我们展示了ABNet在2D机器人避障、安全机器人操纵和基于视觉的端到端自动驾驶中的能力,结果显示其鲁棒性和保证性远优于现有模型。
更新时间: 2024-06-18 19:37:44
领域: cs.LG,cs.RO,cs.SY,eess.SY
Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency
We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.
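For orientation (our paraphrase of the standard frame-averaging construction, with notation that may differ from the paper's), a frame $F(x) \subseteq G$ turns any backbone $\phi$ into an equivariant map via

$$ \langle \phi \rangle_{F}(x) \;=\; \frac{1}{|F(x)|} \sum_{g \in F(x)} \rho_2(g)\, \phi\bigl(\rho_1(g)^{-1} x\bigr), $$

provided the frame is itself equivariant, $F(\rho_1(g)x) = g\,F(x)$. MFA's contribution is constructing provably minimal frames, so that this sum is as short as possible while remaining exactly equivariant.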
Updated: 2024-06-18 19:37:05
标题: 通过最小帧平均实现等变性:更多对称性与更高效率
摘要: 我们考虑通过帧平均实现机器学习系统中的等变性。当前的帧平均方法要么涉及对大帧的昂贵求和,要么依赖基于采样的方法,而后者只能产生近似的等变性。在这里,我们提出最小帧平均(MFA),这是一个用于构造可证明最小且精确等变的帧的数学框架。MFA的一般基础还使我们能够将帧平均扩展到比以往考虑的更多的群,包括用于描述时空对称性的洛伦兹群,以及用于复值域的酉群。结果表明,通过MFA编码对称性在各种任务中都高效且有效,包括$n$-体模拟、对撞机物理中的顶夸克标记和弛豫能量预测。我们的代码可在https://github.com/divelab/MFA找到。
更新时间: 2024-06-18 19:37:05
领域: cs.LG
Stackelberg Games with $k$-Submodular Function under Distributional Risk-Receptiveness and Robustness
We study submodular optimization in adversarial context, applicable to machine learning problems such as feature selection using data susceptible to uncertainties and attacks. We focus on Stackelberg games between an attacker (or interdictor) and a defender where the attacker aims to minimize the defender's objective of maximizing a $k$-submodular function. We allow uncertainties arising from the success of attacks and inherent data noise, and address challenges due to incomplete knowledge of the probability distribution of random parameters. Specifically, we introduce Distributionally Risk-Averse $k$-Submodular Interdiction Problem (DRA $k$-SIP) and Distributionally Risk-Receptive $k$-Submodular Interdiction Problem (DRR $k$-SIP) along with finitely convergent exact algorithms for solving them. The DRA $k$-SIP solution allows risk-averse interdictor to develop robust strategies for real-world uncertainties. Conversely, DRR $k$-SIP solution suggests aggressive tactics for attackers, willing to embrace (distributional) risk to inflict maximum damage, identifying critical vulnerable components, which can be used for the defender's defensive strategies. The optimal values derived from both DRA $k$-SIP and DRR $k$-SIP offer a confidence interval-like range for the expected value of the defender's objective function, capturing distributional ambiguity. We conduct computational experiments using instances of feature selection and sensor placement problems, and Wisconsin breast cancer data and synthetic data, respectively.
Updated: 2024-06-18 19:30:46
标题: 分布风险接受性与鲁棒性下带$k$-次模函数的Stackelberg博弈
摘要: 我们研究在对抗环境中的次模优化,适用于机器学习问题,例如使用易受不确定性和攻击的数据进行特征选择。我们关注攻击者(或干扰者)和防御者之间的Stackelberg博弈,攻击者旨在最小化防御者最大化$k$-次模函数的目标。我们允许由于攻击成功和固有数据噪声而产生的不确定性,并解决由于对随机参数的概率分布的不完全了解而产生的挑战。具体而言,我们引入了Distributionally Risk-Averse $k$-Submodular Interdiction Problem(DRA $k$-SIP)和Distributionally Risk-Receptive $k$-Submodular Interdiction Problem(DRR $k$-SIP),以及用于解决它们的有限收敛精确算法。DRA $k$-SIP解决方案允许风险厌恶的干扰者为真实世界的不确定性制定强大的策略。相反,DRR $k$-SIP解决方案建议攻击者采取激进策略,愿意接受(分布)风险以造成最大伤害,识别关键的易受攻击组件,可用于防守者的防御策略。从DRA $k$-SIP和DRR $k$-SIP得出的最优值为防御者目标函数的期望值提供了类似置信区间的范围,捕捉分布模糊性。我们使用特征选择和传感器位置问题的实例,以及威斯康星乳腺癌数据和合成数据进行计算实验。
更新时间: 2024-06-18 19:30:46
领域: math.OC,cs.LG
How Safe Am I Given What I See? Calibrated Prediction of Safety Chances for Image-Controlled Autonomy
End-to-end learning has emerged as a major paradigm for developing autonomous systems. Unfortunately, with its performance and convenience comes an even greater challenge of safety assurance. A key factor of this challenge is the absence of the notion of a low-dimensional and interpretable dynamical state, around which traditional assurance methods revolve. Focusing on the online safety prediction problem, this paper proposes a configurable family of learning pipelines based on generative world models, which do not require low-dimensional states. To implement these pipelines, we overcome the challenges of learning safety-informed latent representations and missing safety labels under prediction-induced distribution shift. These pipelines come with statistical calibration guarantees on their safety chance predictions based on conformal prediction. We perform an extensive evaluation of the proposed learning pipelines on two case studies of image-controlled systems: a racing car and a cartpole.
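To give a flavor of the conformal step, here is a generic split-conformal sketch (our illustration; the paper's calibration over world-model rollouts is more involved, and the beta-distributed scores below are synthetic stand-ins for a learned safety predictor):

import numpy as np

rng = np.random.default_rng(0)

# Stand-in model scores in [0, 1]: unsafe episodes tend to score lower.
scores = np.concatenate([rng.beta(2, 8, 500), rng.beta(8, 2, 500)])
unsafe = np.concatenate([np.ones(500, bool), np.zeros(500, bool)])

alpha = 0.1
s_unsafe = scores[unsafe]                   # calibration scores, unsafe cases
n = len(s_unsafe)
level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample corrected quantile
q = np.quantile(s_unsafe, level, method="higher")

# Exchangeability guarantee: a fresh unsafe episode scores <= q with
# probability >= 1 - alpha, so flagging "score <= q" catches it.
print(f"flag episodes with score <= {q:.3f}")
print("missed unsafe fraction (calibration):", float(np.mean(s_unsafe > q)))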
Updated: 2024-06-18 19:30:36
标题: 鉴于所见,我有多安全?图像控制自主系统安全概率的校准预测
摘要: 端到端学习已经成为开发自主系统的主要范式。不幸的是,随着其性能和便利性而来的是更大的安全保障挑战。这一挑战的一个关键因素是缺乏低维且可解释的动态状态的概念,而传统保障方法正是围绕这一概念展开的。本文针对在线安全预测问题,提出了一族基于生成式世界模型的可配置学习流程,无需低维状态。为了实现这些流程,我们克服了学习蕴含安全信息的潜在表示,以及在预测引起的分布偏移下安全标签缺失的挑战。这些流程基于共形预测,对其安全概率预测提供统计校准保证。我们在两个图像控制系统的案例研究(一辆赛车和一个倒立摆)上对所提出的学习流程进行了广泛评估。
更新时间: 2024-06-18 19:30:36
领域: cs.LG
As Advertised? Understanding the Impact of Influencer VPN Ads
Influencer VPN ads (sponsored segments) on YouTube often disseminate misleading information about both VPNs, and security & privacy more broadly. However, it remains unclear how (or whether) these ads affect users' perceptions and knowledge about VPNs. In this work, we explore the relationship between YouTube VPN ad exposure and users' mental models of VPNs, security, and privacy. We use a novel VPN ad detection model to calculate the ad exposure of 217 participants via their YouTube watch histories, and we develop scales to characterize their mental models in relation to claims commonly made in VPN ads. Through (pre-registered) regression-based analysis, we find that exposure to VPN ads is significantly correlated with familiarity with VPN brands and increased belief in (hyperbolic) threats. While not specific to VPNs, these threats are often discussed in VPN ads. In contrast, although many participants agree with both factual and misleading mental models of VPNs that often appear in ads, we find no significant correlation between exposure to VPN ads and these mental models. These findings suggest that, if VPN ads do impact mental models, then it is predominantly emotional (i.e., threat perceptions) rather than technical.
Updated: 2024-06-18 19:22:37
标题: 名副其实?理解网红VPN广告的影响
摘要: YouTube上的网红VPN广告(赞助片段)经常传播关于VPN以及更广泛的安全和隐私的误导信息。然而,目前还不清楚这些广告如何(或是否)影响用户对VPN的认知和了解。在这项研究中,我们探讨了YouTube VPN广告曝光与用户关于VPN、安全和隐私的心智模型之间的关系。我们使用一种新颖的VPN广告检测模型,通过217位参与者的YouTube观看历史计算他们的广告曝光量,并开发量表来刻画他们与VPN广告常见声明相关的心智模型。通过(预注册的)基于回归的分析,我们发现VPN广告曝光与对VPN品牌的熟悉度以及对(夸大的)威胁的信念增强显著相关。这些威胁虽然并非专门针对VPN,却经常在VPN广告中被讨论。相比之下,尽管许多参与者同意广告中常出现的关于VPN的事实性和误导性心智模型,我们并未发现VPN广告曝光与这些心智模型之间存在显著相关性。这些发现表明,如果VPN广告确实影响心智模型,那么影响主要是情感层面的(即威胁认知),而非技术层面。
更新时间: 2024-06-18 19:22:37
领域: cs.CR,cs.HC
Is the System Message Really Important to Jailbreaks in Large Language Models?
The rapid evolution of Large Language Models (LLMs) has rendered them indispensable in modern society. While security measures are typically taken to align LLMs with human values prior to release, recent studies have unveiled a concerning phenomenon named "Jailbreak". This term refers to the unexpected and potentially harmful responses generated by LLMs when prompted with malicious questions. Most existing research focuses on generating jailbreak prompts, but system message configurations vary significantly across experiments. In this paper, we aim to answer a question: Is the system message really important for jailbreaks in LLMs? We conduct experiments in mainstream LLMs to generate jailbreak prompts with varying system messages: short, long, and none. We discover that different system messages have distinct resistances to jailbreaks. Therefore, we explore the transferability of jailbreaks across LLMs with different system messages. Furthermore, we propose the System Messages Evolutionary Algorithm (SMEA) to generate system messages that are more resistant to jailbreak prompts, even with minor changes. Through SMEA, we obtain a robust population of system messages with little change in their length. Our research not only bolsters LLMs security but also raises the bar for jailbreaks, fostering advancements in this field of study.
Updated: 2024-06-18 19:22:19
标题: 大语言模型中系统消息对越狱是否真的很重要?
摘要: 大型语言模型(LLMs)的快速发展使它们在现代社会中变得不可或缺。虽然通常会采取安全措施以在发布前与人类价值观保持一致,但最近的研究揭示了一个令人担忧的现象,被称为“越狱”。这个术语指的是当LLMs被提示恶意问题时生成的意外且可能有害的回应。大多数现有研究侧重于生成越狱提示,但在实验中系统消息配置存在显著差异。在本文中,我们旨在回答一个问题:系统消息对LLMs中的越狱是否真的很重要?我们在主流LLMs中进行实验,生成具有不同系统消息的越狱提示:短、长和无。我们发现不同的系统消息对越狱有不同的抵抗力。因此,我们探讨了在具有不同系统消息的LLMs之间越狱的可传递性。此外,我们提出了系统消息演化算法(SMEA),以生成更能抵抗越狱提示的系统消息,即使只有微小的变化。通过SMEA,我们得到了一个稳健的系统消息群体,而系统消息的长度几乎没有变化。我们的研究不仅增强了LLMs的安全性,还提高了越狱的门槛,促进了这一研究领域的进展。
更新时间: 2024-06-18 19:22:19
领域: cs.CL,cs.AI,cs.CR
The troublesome kernel -- On hallucinations, no free lunches and the accuracy-stability trade-off in inverse problems
Methods inspired by Artificial Intelligence (AI) are starting to fundamentally change computational science and engineering through breakthrough performances on challenging problems. However, reliability and trustworthiness of such techniques is a major concern. In inverse problems in imaging, the focus of this paper, there is increasing empirical evidence that methods may suffer from hallucinations, i.e., false, but realistic-looking artifacts; instability, i.e., sensitivity to perturbations in the data; and unpredictable generalization, i.e., excellent performance on some images, but significant deterioration on others. This paper provides a theoretical foundation for these phenomena. We give mathematical explanations for how and when such effects arise in arbitrary reconstruction methods, with several of our results taking the form of `no free lunch' theorems. Specifically, we show that (i) methods that overperform on a single image can wrongly transfer details from one image to another, creating a hallucination, (ii) methods that overperform on two or more images can hallucinate or be unstable, (iii) optimizing the accuracy-stability trade-off is generally difficult, (iv) hallucinations and instabilities, if they occur, are not rare events, and may be encouraged by standard training, (v) it may be impossible to construct optimal reconstruction maps for certain problems. Our results trace these effects to the kernel of the forward operator whenever it is nontrivial, but also apply to the case when the forward operator is ill-conditioned. Based on these insights, our work aims to spur research into new ways to develop robust and reliable AI-based methods for inverse problems in imaging.
Updated: 2024-06-18 19:18:47
标题: 麻烦的核:论逆问题中的幻觉、没有免费午餐与准确性-稳定性权衡
摘要: 受人工智能(AI)启发的方法正在通过在具有挑战性的问题上取得突破性表现,从根本上改变计算科学与工程。然而,这些技术的可靠性和可信度是一个主要关切。在本文关注的成像逆问题中,越来越多的经验证据表明,这些方法可能出现幻觉,即虚假但看起来逼真的伪影;不稳定性,即对数据扰动敏感;以及不可预测的泛化,即在一些图像上表现出色,但在其他图像上显著恶化。本文为这些现象提供了理论基础。我们从数学上解释了这些效应在任意重建方法中如何以及何时出现,其中若干结果采取“没有免费午餐”定理的形式。具体而言,我们证明:(i)在单张图像上过度拟合的方法可能会错误地将细节从一张图像转移到另一张,造成幻觉;(ii)在两张或更多图像上过度拟合的方法可能产生幻觉或不稳定;(iii)优化准确性-稳定性权衡通常是困难的;(iv)幻觉和不稳定性一旦出现并非罕见事件,并且可能被标准训练所助长;(v)对于某些问题,可能无法构建最优的重建映射。只要前向算子的核非平凡,我们的结果就将这些效应追溯到该核;结果同样适用于前向算子病态的情形。基于这些见解,我们的工作旨在推动研究新的途径,为成像逆问题开发稳健可靠的基于AI的方法。
更新时间: 2024-06-18 19:18:47
领域: cs.LG,cs.CV,65R32, 94A08, 68T05, 65M12
Inverse Optimization for Routing Problems
We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO). The IO framework falls into the supervised learning category and builds on the premise that the target behavior is an optimizer of an unknown cost function. This cost function is to be learned through historical data, and in the context of routing problems, can be interpreted as the routing preferences of the decision-makers. In this view, the main contributions of this study are to propose an IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems. We further test our IO approach in the Amazon Last Mile Routing Research Challenge, where the goal is to learn models that replicate the routing preferences of human drivers, using thousands of real-world routing examples. Our final IO-learned routing model achieves a score that ranks 2nd compared with the 48 models that qualified for the final round of the challenge. Our examples and results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.
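The core loop of learning costs from observed routes can be sketched as follows (our illustration, not the paper's algorithm: the toy graph, single observed route, and plain subgradient update stand in for their hypothesis function, loss function, and stochastic first-order method):

import networkx as nx

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
theta = {e: 1.0 for e in edges}            # learnable per-edge costs
observed = [0, 1, 3]                       # the decision-maker's route 0 -> 3

def shortest(theta):
    G = nx.DiGraph()
    for (u, v), c in theta.items():
        G.add_edge(u, v, weight=c)
    return nx.shortest_path(G, 0, 3, weight="weight")

def path_edges(p):
    return list(zip(p[:-1], p[1:]))

lr = 0.1
for _ in range(50):
    pred = shortest(theta)
    if path_edges(pred) == path_edges(observed):
        break
    # Subgradient of cost(observed) - cost(shortest path): cheapen the
    # observed route, penalize the predicted one, keep costs positive.
    for e in path_edges(observed):
        theta[e] = max(theta[e] - lr, 0.01)
    for e in path_edges(pred):
        theta[e] += lr

print("learned costs:", {e: round(c, 2) for e, c in theta.items()})
print("shortest path under learned costs:", shortest(theta))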
Updated: 2024-06-18 19:18:16
标题: 路由问题的逆优化
摘要: 我们提出了一种使用逆优化(IO)学习路由问题中决策者行为的方法。IO框架属于监督学习范畴,其前提是目标行为是某个未知成本函数的优化器。这个成本函数要通过历史数据来学习,在路由问题的背景下,可以解释为决策者的路由偏好。在这一视角下,本研究的主要贡献是提出了一种针对路由问题量身定制的IO方法论,包括假设函数、损失函数和随机一阶算法。我们进一步在亚马逊最后一公里路由研究挑战赛(Amazon Last Mile Routing Research Challenge)中测试了我们的IO方法,其目标是利用数千个真实世界的路由示例,学习能复现人类驾驶员路由偏好的模型。我们最终通过IO学习得到的路由模型在晋级挑战赛决赛的48个模型中排名第二。我们的示例和结果展示了所提IO方法论在路由问题中从决策者的决策中学习的灵活性和现实潜力。
更新时间: 2024-06-18 19:18:16
领域: math.OC,cs.LG
Deriving Hematological Disease Classes Using Fuzzy Logic and Expert Knowledge: A Comprehensive Machine Learning Approach with CBC Parameters
In the intricate field of medical diagnostics, capturing the subtle manifestations of diseases remains a challenge. Traditional methods, often binary in nature, may not encapsulate the nuanced variances that exist in real-world clinical scenarios. This paper introduces a novel approach by leveraging Fuzzy Logic Rules to derive disease classes based on expert domain knowledge from a medical practitioner. By recognizing that diseases do not always fit into neat categories, and that expert knowledge can guide the fuzzification of these boundaries, our methodology offers a more sophisticated and nuanced diagnostic tool. Using a dataset procured from a prominent hospital, containing detailed patient blood count records, we harness Fuzzy Logic Rules, a computational technique celebrated for its ability to handle ambiguity. This approach, moving through stages of fuzzification, rule application, inference, and ultimately defuzzification, produces refined diagnostic predictions. When combined with the Random Forest classifier, the system adeptly predicts hematological conditions using Complete Blood Count (CBC) parameters. Preliminary results showcase high accuracy levels, underscoring the advantages of integrating fuzzy logic into the diagnostic process. When juxtaposed with traditional diagnostic techniques, it becomes evident that Fuzzy Logic, especially when guided by medical expertise, offers significant advancements in the realm of hematological diagnostics. This paper not only paves the path for enhanced patient care but also beckons a deeper dive into the potentialities of fuzzy logic in various medical diagnostic applications.
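A minimal fuzzy-inference sketch over a single CBC parameter looks like this (our illustration only: the hemoglobin membership boundaries below are invented for demonstration, whereas in the paper they come from the medical expert):

import numpy as np

def tri(x, a, b, c):
    # Triangular membership function peaking at b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzify_hgb(hgb):                      # hemoglobin in g/dL, toy ranges
    return {
        "low":    tri(hgb, 5.0, 8.0, 12.0),
        "normal": tri(hgb, 11.0, 14.0, 17.0),
        "high":   tri(hgb, 16.0, 19.0, 22.0),
    }

def infer(hgb):
    mu = fuzzify_hgb(hgb)
    # Expert rules map membership degrees to soft disease-class strengths.
    rules = {"anemia-like": mu["low"],
             "healthy": mu["normal"],
             "polycythemia-like": mu["high"]}
    total = sum(rules.values()) or 1.0
    return {k: float(v) / float(total) for k, v in rules.items()}

print(infer(11.5))   # partly "anemia-like", partly "healthy" -- not binary

Soft class strengths like these can then be thresholded or fed as features to a downstream classifier such as the Random Forest used in the paper.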
Updated: 2024-06-18 19:16:32
标题: 用模糊逻辑和专家知识推导血液病类别:基于CBC参数的全面机器学习方法
摘要: 在医学诊断领域,捕捉疾病的微妙表现仍然是一个挑战。传统方法往往是二元的,可能无法涵盖现实临床情景中存在的微妙差异。本文介绍了一种新颖方法,通过利用模糊逻辑规则来根据医学实践者的专业领域知识推导疾病类别。通过认识到疾病并不总是符合明确的类别,以及专家知识可以引导这些界限的模糊化,我们的方法提供了一种更复杂和微妙的诊断工具。 使用从知名医院获取的包含详细患者血液计数记录的数据集,我们利用模糊逻辑规则,这是一种因其处理模糊性能力而备受推崇的计算技术。这种方法,通过模糊化、规则应用、推理,最终解模糊化的阶段,产生了精细的诊断预测。当与随机森林分类器结合时,系统能够使用完整血细胞计数(CBC)参数熟练预测血液学状况。 初步结果展示了高准确性水平,强调了将模糊逻辑整合到诊断过程中的优势。与传统诊断技术相比,显而易见,模糊逻辑,特别是在医学专业知识的指导下,为血液学诊断领域带来了重大进展。本文不仅为提升患者护理铺平了道路,还呼吁深入探讨模糊逻辑在各种医学诊断应用中的潜力。
更新时间: 2024-06-18 19:16:32
领域: cs.AI,cs.LG
Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles
A dog whistle is a form of coded communication that carries a secondary meaning to specific audiences and is often weaponized for racial and socioeconomic discrimination. Dog whistling historically originated from United States politics, but in recent years has taken root in social media as a means of evading hate speech detection systems and maintaining plausible deniability. In this paper, we present an approach for word-sense disambiguation of dog whistles from standard speech using Large Language Models (LLMs), and leverage this technique to create a dataset of 16,550 high-confidence coded examples of dog whistles used in formal and informal communication. Silent Signals is the largest dataset of disambiguated dog whistle usage, created for applications in hate speech detection, neology, and political science. The dataset can be found at https://huggingface.co/datasets/SALT-NLP/silent_signals.
Updated: 2024-06-18 19:11:11
标题: 沉默的信号,巨大的影响:LLMs用于编码的狗哨词义消歧
摘要: 狗哨是一种编码沟通形式,对特定受众具有次要含义,并经常被用作种族和社会经济歧视的武器。狗哨在历史上起源于美国政治,但近年来已在社交媒体上扎根,成为规避仇恨言论检测系统并保持合理推诿的手段。在本文中,我们提出了一种使用大型语言模型(LLMs)对狗哨与标准语言进行词义消歧的方法,并利用这一技术创建了一个包含16,550个高置信度编码示例的数据集,涵盖正式和非正式沟通中使用的狗哨。Silent Signals是最大的狗哨用法消歧数据集,可用于仇恨言论检测、新词研究和政治科学。数据集可在https://huggingface.co/datasets/SALT-NLP/silent_signals找到。
更新时间: 2024-06-18 19:11:11
领域: cs.CL,cs.LG,J.4; K.4.1; K.4.2
MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation
Online Bayesian bipartite matching is a central problem in digital marketplaces and exchanges, including advertising, crowdsourcing, ridesharing, and kidney exchange. We introduce a graph neural network (GNN) approach that emulates the problem's combinatorially-complex optimal online algorithm, which selects actions (e.g., which nodes to match) by computing each action's value-to-go (VTG) -- the expected weight of the final matching if the algorithm takes that action, then acts optimally in the future. We train a GNN to estimate VTG and show empirically that this GNN returns high-weight matchings across a variety of tasks. Moreover, we identify a common family of graph distributions in spatial crowdsourcing applications, such as rideshare, under which VTG can be efficiently approximated by aggregating information within local neighborhoods in the graphs. This structure matches the local behavior of GNNs, providing theoretical justification for our approach.
Updated: 2024-06-18 19:06:04
标题: MAGNOLIA:通过GNN逼近在线未来价值(value-to-go)的匹配算法
摘要: 在线贝叶斯二部匹配是数字市场和交易平台(包括广告、众包、拼车和肾脏交换)中的一个核心问题。我们引入了一种图神经网络(GNN)方法,模拟该问题组合复杂的最优在线算法:该算法通过计算每个动作的未来价值(value-to-go,VTG),即采取该动作后在未来以最优方式行动所得最终匹配的期望权重,来选择动作(例如匹配哪些节点)。我们训练一个GNN来估计VTG,并通过实验证明该GNN在各种任务中都能返回高权重的匹配。此外,我们在拼车等空间众包应用中识别出一族常见的图分布,在这些分布下,VTG可以通过聚合图中局部邻域内的信息被高效近似。这种结构与GNN的局部行为相匹配,为我们的方法提供了理论依据。
更新时间: 2024-06-18 19:06:04
领域: cs.LG,cs.DS
Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models
The promise of tabular generative models is to produce realistic synthetic data that can be shared and safely used without dangerous leakage of information from the training set. In evaluating these models, a variety of methods have been proposed to measure the tendency to copy data from the training dataset when generating a sample. However, these methods suffer from one of several shortcomings: they do not consider data-copying from a privacy threat perspective, they are not motivated by recent results in the data-copying literature, or they are difficult to make compatible with the high dimensional, mixed type nature of tabular data. This paper proposes a new similarity metric and Membership Inference Attack called Data Plagiarism Index (DPI) for tabular data. We show that DPI evaluates a new intuitive definition of data-copying and characterizes the corresponding privacy risk. We show that the data-copying identified by DPI poses both privacy and fairness threats to common, high performing architectures; underscoring the necessity for more sophisticated generative modeling techniques to mitigate this issue.
Updated: 2024-06-18 19:05:24
标题: 数据剽窃指数:表格生成模型中数据复制的隐私风险特征化
摘要: 表格生成模型的承诺是产生逼真的合成数据,可以共享和安全使用,而不会从训练集中泄露信息。在评估这些模型时,已经提出了各种方法来衡量在生成样本时从训练数据集中复制数据的倾向。然而,这些方法要么没有考虑数据复制对隐私威胁的角度,要么没有受到最近数据复制文献中的结果的启发,要么难以与表格数据的高维、混合类型的特性相兼容。本文提出了一种适用于表格数据的新的相似度度量和成员推断攻击,称为数据剽窃指数(DPI)。我们表明,DPI评估了对数据复制的新直观定义,并表征了相应的隐私风险。我们表明,DPI所识别的数据复制对常见的高性能架构都构成隐私和公平威胁,强调更复杂的生成模型技术以缓解这一问题的必要性。
更新时间: 2024-06-18 19:05:24
领域: cs.LG,cs.CR,stat.ML
Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors
Accurate text summarization is one of the most common and important tasks performed by Large Language Models, where the costs of human review for an entire document may be high, but the costs of errors in summarization may be even greater. We propose Detecting Errors through Ensembling Prompts (DEEP) - an end-to-end large language model framework for detecting factual errors in text summarization. Our framework uses a diverse set of LLM prompts to identify factual inconsistencies, treating their outputs as binary features, which are then fed into ensembling models. We then calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination. We demonstrate that prior models for detecting factual errors in summaries perform significantly worse without optimizing the thresholds on subsets of the evaluated dataset. Our framework achieves state-of-the-art (SOTA) balanced accuracy on the AggreFact-XSUM FTSOTA, TofuEval Summary-Level, and HaluEval Summarization benchmarks in detecting factual errors within transformer-generated text summaries. It does so without any fine-tuning of the language model or reliance on thresholding techniques not available in practical settings.
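A small end-to-end sketch of the recipe follows (our illustration: the binary prompt verdicts are faked with synthetic data, whereas in DEEP they come from LLM calls; the logistic link here supplies a simple Platt-style calibration rather than the paper's exact scheme):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, k = 400, 6                          # 400 summaries, 6 prompt "judges"

y = rng.integers(0, 2, n)              # 1 = factually consistent (synthetic)
# Each prompt agrees with the truth with its own reliability in [0.6, 0.9].
reliab = np.linspace(0.6, 0.9, k)
V = np.stack([np.where(rng.random(n) < r, y, 1 - y) for r in reliab], axis=1)

ens = LogisticRegression().fit(V[:300].astype(float), y[:300])
probs = ens.predict_proba(V[300:].astype(float))[:, 1]  # calibrated scores
print("held-out accuracy:", float(((probs > 0.5) == (y[300:] == 1)).mean()))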
Updated: 2024-06-18 18:59:37
标题: 通过集成提示检测错误(DEEP):一种用于检测事实错误的端到端LLM框架
摘要: 准确的文本摘要是大型语言模型执行的最常见、最重要的任务之一:对整个文档进行人工审查的成本可能很高,而摘要出错的代价可能更大。我们提出了DEEP(Detecting Errors through Ensembling Prompts),一个用于检测文本摘要中事实错误的端到端大型语言模型框架。我们的框架使用多样化的LLM提示集来识别事实不一致之处,将它们的输出视为二值特征,再输入集成模型。随后,我们对集成模型进行校准,以产生经验上准确的概率,表示文本是否事实一致、是否不含幻觉。我们证明,如果不在评估数据集的子集上优化阈值,先前用于检测摘要事实错误的模型表现会明显变差。我们的框架在AggreFact-XSUM FTSOTA、TofuEval Summary-Level和HaluEval Summarization基准上检测Transformer生成的文本摘要中的事实错误时,实现了最先进(SOTA)的平衡准确率,且无需对语言模型进行任何微调,也不依赖实际场景中不可用的阈值技术。
更新时间: 2024-06-18 18:59:37
领域: cs.CL,cs.AI
ClaudesLens: Uncertainty Quantification in Computer Vision Models
In a world where more decisions are made using artificial intelligence, it is of utmost importance to ensure these decisions are well-grounded. Neural networks are the modern building blocks for artificial intelligence. Modern neural network-based computer vision models are often used for object classification tasks. Correctly classifying objects with certainty has become of great importance in recent times. However, quantifying the inherent uncertainty of the output from neural networks is a challenging task. Here we show a possible method to quantify and evaluate the uncertainty of the output of different computer vision models based on Shannon entropy. By adding perturbation of different levels, on different parts, ranging from the input to the parameters of the network, one introduces entropy to the system. By quantifying and evaluating the perturbed models on the proposed PI and PSI metrics, we can conclude that our theoretical framework can grant insight into the uncertainty of predictions of computer vision models. We believe that this theoretical framework can be applied to different applications for neural networks. We believe that Shannon entropy may eventually have a bigger role in the SOTA (State-of-the-art) methods to quantify uncertainty in artificial intelligence. One day we might be able to apply Shannon entropy to our neural systems.
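The entropy probe itself is simple; the sketch below (our illustration, with a random linear layer standing in for a trained classifier) measures the mean Shannon entropy of the softmax output as input perturbations grow:

import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()          # Shannon entropy in nats

W = rng.normal(size=(10, 64))              # stand-in model: one linear layer
x = rng.normal(size=64)                    # a fixed input

for sigma in [0.0, 0.1, 0.5, 1.0, 2.0]:    # input perturbation levels
    H = np.mean([entropy(softmax(W @ (x + sigma * rng.normal(size=64))))
                 for _ in range(100)])
    print(f"sigma={sigma:.1f}  mean output entropy={H:.3f} nats")

Parameter-level perturbation, as the abstract also describes, amounts to adding the noise to W instead of x.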
Updated: 2024-06-18 18:58:54
标题: ClaudesLens:计算机视觉模型中的不确定性量化
摘要: 在一个越来越多决策依赖人工智能的世界中,确保这些决策是有根据的至关重要。神经网络是人工智能的现代基石。现代基于神经网络的计算机视觉模型经常用于对象分类任务。最近,准确分类对象并带有确定性已经变得非常重要。然而,量化神经网络输出的固有不确定性是一项具有挑战性的任务。在这里,我们展示了一种可能的方法,基于香农熵来量化和评估不同计算机视觉模型输出的不确定性。通过在不同部分,从输入到网络参数,添加不同级别的扰动,引入熵到系统中。通过在提出的PI和PSI指标上量化和评估扰动模型,我们可以得出结论,我们的理论框架可以提供对计算机视觉模型预测不确定性的洞察。我们相信这个理论框架可以应用于不同的神经网络应用。我们相信香农熵可能最终在量化人工智能不确定性的最新方法中发挥更大的作用。有一天,我们可能能够将香农熵应用到我们的神经系统中。
更新时间: 2024-06-18 18:58:54
领域: cs.CV,cs.AI
Information-theoretic generalization bounds for learning from quantum data
Learning tasks play an increasingly prominent role in quantum information and computation. They range from fundamental problems such as state discrimination and metrology, through the framework of quantum probably approximately correct (PAC) learning, to the recently proposed shadow variants of state tomography. However, the many directions of quantum learning theory have so far evolved separately. We propose a general mathematical formalism for describing quantum learning by training on classical-quantum data and then testing how well the learned hypothesis generalizes to new data. In this framework, we prove bounds on the expected generalization error of a quantum learner in terms of classical and quantum information-theoretic quantities measuring how strongly the learner's hypothesis depends on the specific data seen during training. To achieve this, we use tools from quantum optimal transport and quantum concentration inequalities to establish non-commutative versions of decoupling lemmas that underlie recent information-theoretic generalization bounds for classical machine learning. Our framework encompasses and gives intuitively accessible generalization bounds for a variety of quantum learning scenarios such as quantum state discrimination, PAC learning quantum states, quantum parameter estimation, and quantumly PAC learning classical functions. Thereby, our work lays a foundation for a unifying quantum information-theoretic perspective on quantum learning.
Updated: 2024-06-18 18:47:54
标题: 学习量子数据的信息理论泛化界限
摘要: 学习任务在量子信息与计算中扮演着日益重要的角色。它们涵盖了从态区分和量子计量等基本问题,到量子“可能近似正确”(PAC)学习框架,再到最近提出的量子态层析的阴影(shadow)变体。然而,量子学习理论的诸多方向迄今为止各自独立发展。我们提出了一个描述量子学习的一般数学形式化:在经典-量子数据上训练,然后检验学到的假设在新数据上的泛化能力。在这个框架下,我们用经典和量子信息论量来刻画学习者的假设对训练中所见具体数据的依赖强度,并据此证明量子学习者期望泛化误差的界。为此,我们利用量子最优传输和量子集中不等式的工具,建立了非交换版本的解耦引理,这些引理是近期经典机器学习信息论泛化界的基础。我们的框架涵盖了多种量子学习场景,如量子态区分、PAC学习量子态、量子参数估计以及以量子方式PAC学习经典函数,并为它们给出直观可及的泛化界。由此,我们的工作为关于量子学习的统一量子信息论视角奠定了基础。
更新时间: 2024-06-18 18:47:54
领域: quant-ph,cs.CC,cs.IT,cs.LG,math.IT
Discounted Adaptive Online Learning: Towards Better Regularization
We study online learning in adversarial nonstationary environments. Since the future can be very different from the past, a critical challenge is to gracefully forget the history while new data comes in. To formalize this intuition, we revisit the discounted regret in online convex optimization, and propose an adaptive (i.e., instance optimal), FTRL-based algorithm that improves the widespread non-adaptive baseline -- gradient descent with a constant learning rate. From a practical perspective, this refines the classical idea of regularization in lifelong learning: we show that designing good regularizers can be guided by the principled theory of adaptive online optimization. Complementing this result, we also consider the (Gibbs and Cand\`es, 2021)-style online conformal prediction problem, where the goal is to sequentially predict the uncertainty sets of a black-box machine learning model. We show that the FTRL nature of our algorithm can simplify the conventional gradient-descent-based analysis, leading to instance-dependent performance guarantees.
Updated: 2024-06-18 18:47:21
标题: 折扣自适应在线学习:迈向更好的正则化
摘要: 我们研究在对抗性非稳态环境中的在线学习。由于未来可能与过去截然不同,一个关键挑战是在新数据输入时优雅地忘记历史。为了明确这种直觉,我们重新审视在线凸优化中的折扣遗憾,并提出了一种自适应(即实例最优)的基于FTRL的算法,改进了广泛使用的非自适应基准--具有恒定学习率的梯度下降。从实际角度来看,这是对终身学习中正则化经典思想的完善:我们展示了设计良好正则化器可以通过自适应在线优化的原则理论来指导。 作为这一结果的补充,我们还考虑了(Gibbs and Cand\`es, 2021)风格的在线符合预测问题,其目标是顺序预测黑盒机器学习模型的不确定性集。我们展示了我们算法的FTRL特性可以简化传统基于梯度下降的分析,从而导致实例相关的性能保证。
更新时间: 2024-06-18 18:47:21
领域: cs.LG,stat.ML
Variational quantum simulation: a case study for understanding warm starts
The barren plateau phenomenon, characterized by loss gradients that vanish exponentially with system size, poses a challenge to scaling variational quantum algorithms. Here we explore the potential of warm starts, whereby one initializes closer to a solution in the hope of enjoying larger loss variances. Focusing on an iterative variational method for learning shorter-depth circuits for quantum real and imaginary time evolution, we conduct a case study to elucidate the potential and limitations of warm starts. We start by proving that the iterative variational algorithm will exhibit substantial (at worst vanishing polynomially in system size) gradients in a small region around the initializations at each time-step. Convexity guarantees for these regions are then established, suggesting trainability for polynomial size time-steps. However, our study highlights scenarios where a good minimum shifts outside the region with trainability guarantees. Our analysis leaves open the question whether such minima jumps necessitate optimization across barren plateau landscapes or whether there exist gradient flows, i.e., fertile valleys away from the plateau with substantial gradients, that allow for training.
Updated: 2024-06-18 18:45:36
标题: 变分量子模拟:理解热启动的案例研究
摘要: 贫瘠高原现象的特征是损失梯度随系统规模呈指数级消失,这对变分量子算法的扩展构成了挑战。在这里,我们探索热启动的潜力,即通过从更接近解的位置初始化,以期获得更大的损失方差。我们聚焦于一种用于学习量子实时间与虚时间演化的较浅电路的迭代变分方法,开展案例研究以阐明热启动的潜力与局限。我们首先证明,该迭代变分算法在每个时间步初始化点附近的一个小区域内会表现出可观的梯度(最坏情况下随系统规模多项式消失)。随后我们为这些区域建立了凸性保证,表明在多项式规模的时间步内具备可训练性。然而,我们的研究也突出了良好极小值移出可训练性保证区域之外的情形。我们的分析留下了一个悬而未决的问题:这种极小值跳跃是否必然要求跨越贫瘠高原景观进行优化,还是存在梯度流,即远离高原、具有可观梯度的“肥沃山谷”,从而允许训练。
更新时间: 2024-06-18 18:45:36
领域: quant-ph,cs.LG,stat.ML
Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study
Context: Deep learning has achieved remarkable progress in various domains. However, like any software system, deep learning systems contain bugs, some of which can have severe impacts, as evidenced by crashes involving autonomous vehicles. Despite substantial advancements in deep learning techniques, little research has focused on reproducing deep learning bugs, which is an essential step for their resolution. Existing literature suggests that only 3% of deep learning bugs are reproducible, underscoring the need for further research. Objective: This paper examines the reproducibility of deep learning bugs. We identify edit actions and useful information that could improve the reproducibility of deep learning bugs. Method: First, we construct a dataset of 668 deep-learning bugs from Stack Overflow and GitHub across three frameworks and 22 architectures. Second, out of the 668 bugs, we select 165 bugs using stratified sampling and attempt to determine their reproducibility. While reproducing these bugs, we identify edit actions and useful information for their reproduction. Third, we use the Apriori algorithm to identify useful information and edit actions required to reproduce specific types of bugs. Finally, we conduct a user study involving 22 developers to assess the effectiveness of our findings in real-life settings. Results: We successfully reproduced 148 out of 165 bugs attempted. We identified ten edit actions and five useful types of component information that can help us reproduce the deep learning bugs. With the help of our findings, the developers were able to reproduce 22.92% more bugs and reduce their reproduction time by 24.35%. Conclusions: Our research addresses the critical issue of deep learning bug reproducibility. Practitioners and researchers can leverage our findings to improve deep learning bug reproducibility.
Updated: 2024-06-18 18:42:46
标题: 朝着增强深度学习缺陷的可重现性:一项实证研究
摘要: 背景:深度学习在各个领域取得了显著进展。然而,就像任何软件系统一样,深度学习系统也存在缺陷,其中一些可能会产生严重影响,正如涉及自动驾驶汽车的崩溃所证明的那样。尽管深度学习技术取得了显著进展,但很少有研究专注于重现深度学习缺陷,这是解决问题的重要步骤。现有文献表明,只有3%的深度学习缺陷是可重现的,强调了进一步研究的必要性。 目标:本文研究深度学习缺陷的重现性。我们确定了可以提高深度学习缺陷重现性的编辑操作和有用信息。 方法:首先,我们从Stack Overflow和GitHub构建了一个包含668个深度学习缺陷的数据集,涵盖了三种框架和22种架构。其次,我们从这668个缺陷中使用分层抽样选择了165个缺陷,并尝试确定它们的重现性。在重现这些缺陷的过程中,我们确定了编辑操作和有用信息,以便进行重现。第三,我们使用Apriori算法识别了重现特定类型的缺陷所需的有用信息和编辑操作。最后,我们进行了一项涉及22名开发人员的用户研究,以评估我们的研究结果在现实环境中的有效性。 结果:我们成功重现了165个缺陷中的148个。我们确定了十种编辑操作和五种有用的组件信息类型,可以帮助我们重现深度学习缺陷。借助我们的研究结果,开发人员能够多重现22.92%的缺陷,并将重现时间缩短了24.35%。 结论:我们的研究解决了深度学习缺陷重现性的关键问题。从业者和研究人员可以利用我们的研究结果来提高深度学习缺陷的重现性。
更新时间: 2024-06-18 18:42:46
领域: cs.SE,cs.LG
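To make the Apriori step above concrete, here is a minimal, hypothetical sketch of mining which edit actions co-occur across reproduction attempts. The action names and thresholds are illustrative placeholders, not the paper's taxonomy:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# each set lists the edit actions one (hypothetical) reproduction attempt needed
reports = [
    {"add_imports", "pin_versions", "add_dataset_stub"},
    {"add_imports", "pin_versions"},
    {"pin_versions", "set_seed"},
    {"add_imports", "pin_versions", "set_seed"},
]
actions = sorted(set().union(*reports))
onehot = pd.DataFrame([{a: (a in r) for a in actions} for r in reports])

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])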
Joint Optimization of Piecewise Linear Ensembles
Tree ensembles achieve state-of-the-art performance on numerous prediction tasks. We propose Joint Optimization of Piecewise Linear ENsembles (JOPLEN), which jointly fits piecewise linear models at all leaf nodes of an existing tree ensemble. In addition to enhancing the expressiveness of an ensemble, JOPLEN allows several common penalties, including sparsity-promoting matrix norms and subspace-norms, to be applied to nonlinear prediction. We demonstrate the performance of JOPLEN on over 100 regression and classification datasets and with a variety of penalties. JOPLEN leads to improved prediction performance relative to not only standard random forest and gradient boosted tree ensembles, but also other methods for enhancing tree ensembles. We demonstrate that JOPLEN with a nuclear norm penalty learns subspace-aligned functions. Additionally, JOPLEN combined with a Dirty LASSO penalty is an effective feature selection method for nonlinear prediction in multitask learning.
Updated: 2024-06-18 18:40:09
标题: 分段线性集成的联合优化
摘要: 树集成在许多预测任务中取得了最先进的性能。我们提出了Piecewise Linear ENsembles的联合优化(JOPLEN),它在现有树集成的所有叶节点上联合拟合分段线性模型。除了增强集成的表达能力外,JOPLEN还允许应用于非线性预测的几种常见惩罚,包括促进稀疏性的矩阵范数和子空间范数。我们在100多个回归和分类数据集上展示了JOPLEN的性能,使用了各种惩罚。相对于标准随机森林和梯度提升树集成,JOPLEN在预测性能上表现出改进,还优于其他增强树集成的方法。我们证明,JOPLEN与核范数惩罚学习子空间对齐函数。此外,JOPLEN与Dirty LASSO惩罚相结合是多任务学习中非线性预测的有效特征选择方法。
更新时间: 2024-06-18 18:40:09
领域: cs.LG
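A simplified sketch of the core JOPLEN idea follows: keep the leaf partition of a trained tree ensemble, but jointly refit one linear model per leaf with a shared ridge penalty. This is an illustration under a plain L2 penalty, not the authors' code; the paper also supports nuclear-norm and Dirty LASSO penalties:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=500)

ensemble = RandomForestRegressor(n_estimators=25, max_depth=3).fit(X, y)
enc = OneHotEncoder(handle_unknown="ignore").fit(ensemble.apply(X))

def piecewise_design(X):
    Z = enc.transform(ensemble.apply(X)).toarray()   # leaf-membership indicators
    Xa = np.hstack([np.ones((len(X), 1)), X])        # per-leaf intercept + slopes
    return (Z[:, :, None] * Xa[:, None, :]).reshape(len(X), -1)

joplen = Ridge(alpha=1.0).fit(piecewise_design(X), y)   # one joint fit over all leaves
print(joplen.predict(piecewise_design(X[:3])))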
Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech
Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encodec comprises an articulatory analysis model that infers articulatory features from speech audio, and an articulatory synthesis model that synthesizes speech audio from articulatory features. The articulatory features are kinematic traces of vocal tract articulators and source features, which are intuitively interpretable and controllable, being the actual physical interface of speech production. An additional speaker identity encoder is jointly trained with the articulatory synthesizer to inform the voice texture of individual speakers. By training on large-scale speech data, we achieve a fully intelligible, high-quality articulatory synthesizer that generalizes to unseen speakers. Furthermore, the speaker embedding is effectively disentangled from articulations, which enables accent-preserving zero-shot voice conversion. To the best of our knowledge, this is the first demonstration of universal, high-performance articulatory inference and synthesis, suggesting the proposed framework as a powerful coding system of speech.
Updated: 2024-06-18 18:38:17
标题: 发音编码:声道运动学作为语音编解码器
摘要: 声道发音是语音产生的一个自然、有物理基础的控制空间。发音器官的时空协调与声源相结合,塑造出可懂的语音,从而实现有效的口语交流。基于语音的这一生理基础,我们提出了一种新的语音神经编码-解码框架——发音编解码器(articulatory encodec)。发音编解码器包括一个从语音音频推断发音特征的发音分析模型,以及一个从发音特征合成语音音频的发音合成模型。发音特征是声道发音器官的运动学轨迹和声源特征,作为语音产生的实际物理接口,它们直观、可解释且可控。此外,还与发音合成器联合训练了一个说话人身份编码器,以提供各个说话人的音色信息。通过在大规模语音数据上训练,我们实现了一个完全可懂、高质量且可泛化到未见说话人的发音合成器。此外,说话人嵌入与发音有效解耦,从而实现了保留口音的零样本语音转换。据我们所知,这是通用、高性能发音推断与合成的首次演示,表明所提出的框架是一种强大的语音编码系统。
更新时间: 2024-06-18 18:38:17
领域: eess.AS,cs.AI,cs.CL,cs.SD
InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models
Extractive summarization can produce faithful summaries but often requires additional constraints such as a desired summary length. Traditional sentence compression models do not typically consider the constraints because of their restricted model abilities, which require model modifications for coping with them. To bridge this gap, we propose Instruction-based Compression (InstructCMP), an approach to the sentence compression task that can consider the length constraint through instructions by leveraging the zero-shot task-solving abilities of Large Language Models (LLMs). For this purpose, we created new evaluation datasets by transforming traditional sentence compression datasets into an instruction format. By using the datasets, we first reveal that the current LLMs still face challenges in accurately controlling the length for a compressed text. To address this issue, we propose an approach named "length priming," that incorporates additional length information into the instructions without external resources. While the length priming effectively works in a zero-shot setting, a training dataset with the instructions would further improve the ability of length control. Thus, we additionally created a training dataset in an instruction format to fine-tune the model on it. Experimental results and analysis show that applying the length priming significantly improves performances of InstructCMP in both zero-shot and fine-tuning settings without the need of any model modifications.
Updated: 2024-06-18 18:35:52
标题: InstructCMP:通过基于指令的大型语言模型控制句子压缩的长度
摘要: 提取式摘要可以产生忠实的摘要,但通常需要额外的约束,如所需的摘要长度。传统的句子压缩模型通常不考虑这些约束,因为它们的模型能力受限,需要对其进行修改以应对这些约束。为了弥合这一差距,我们提出了基于指令的压缩(InstructCMP),这是一种句子压缩任务方法,通过指令来考虑长度约束,利用大型语言模型(LLMs)的零样本任务解决能力。为此,我们将传统的句子压缩数据集转换为指令格式,创建了新的评估数据集。通过使用这些数据集,我们首先揭示了当前的LLMs在准确控制压缩文本长度方面仍面临挑战。为解决这一问题,我们提出了一种名为"长度引导"的方法,将额外的长度信息纳入指令中,而无需外部资源。虽然长度引导在零样本设置中有效,但使用带有指令的训练数据集将进一步提高长度控制的能力。因此,我们另外创建了一个指令格式的训练数据集,对模型进行微调。实验结果和分析显示,应用长度引导显著改善了InstructCMP在零样本和微调设置中的性能,无需进行任何模型修改。
更新时间: 2024-06-18 18:35:52
领域: cs.CL,cs.AI,I.2.7
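A hedged sketch of the "length priming" idea: the instruction states the source length and the target length explicitly instead of only asking for compression. The exact wording below is illustrative, not the paper's template:

def length_primed_instruction(sentence: str, target_len: int) -> str:
    n_words = len(sentence.split())
    return (
        f"Sentence ({n_words} words): {sentence}\n"
        f"Compress the sentence to exactly {target_len} words by deleting words, "
        f"keeping it faithful and grammatical.\n"
        f"Compressed sentence ({target_len} words):"
    )

print(length_primed_instruction(
    "The committee will meet again on Thursday to discuss the new budget proposal.", 7))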
Additive regularization schedule for neural architecture search
Neural network structures have a critical impact on the accuracy and stability of forecasting. Neural architecture search procedures help design an optimal neural network according to some loss function, which represents a set of quality criteria. This paper investigates the problem of neural network structure optimization. It proposes a way to construct a loss function, which contains a set of additive elements. Each element is called the regularizer. It corresponds to some part of the neural network structure and represents a criterion to optimize. The optimization procedure changes the structure in iterations. To optimize various parts of the structure, the procedure changes the set of regularizers according to some schedule. The authors propose a way to construct the additive regularization schedule. By comparing regularized models with non-regularized ones for a collection of datasets the computational experiments show that the proposed method finds efficient neural network structure and delivers accurate networks of low complexity.
Updated: 2024-06-18 18:32:13
标题: 神经架构搜索的加法正则化计划
摘要: 神经网络结构对预测的准确性和稳定性有关键影响。神经架构搜索程序有助于根据某个损失函数设计最佳的神经网络,该函数代表一组质量标准。本文研究了神经网络结构优化问题。它提出了一种构建包含一组附加元素的损失函数的方法。每个元素称为正则化器,对应神经网络结构的某个部分,并代表一个优化标准。优化过程通过迭代改变结构。为了优化结构的各个部分,该过程根据某个时间表改变正则化器集合。作者提出了一种构建附加正则化计划的方法。通过比较一系列数据集的经过正则化和未经正则化的模型,计算实验表明所提出的方法找到了高效的神经网络结构,并提供了复杂度低、准确的网络。
更新时间: 2024-06-18 18:32:13
领域: cs.LG
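A minimal PyTorch sketch of an additive regularization schedule: the loss is the task term plus a set of regularizers whose coefficients change over iterations, so different parts of the structure are optimized at different stages. The particular regularizers and schedule below are illustrative:

import torch

def l1_weights(model):      # sparsity-inducing term
    return sum(p.abs().sum() for p in model.parameters())

def l2_weights(model):      # smoothness term
    return sum((p ** 2).sum() for p in model.parameters())

regularizers = [l1_weights, l2_weights]

def schedule(step, total):  # e.g. emphasize L1 early, L2 late
    t = step / total
    return [1e-4 * (1 - t), 1e-4 * t]

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 10), torch.randn(64, 1)
for step in range(100):
    coeffs = schedule(step, 100)
    loss = torch.nn.functional.mse_loss(model(x), y) \
         + sum(c * r(model) for c, r in zip(coeffs, regularizers))
    opt.zero_grad(); loss.backward(); opt.step()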
Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices
This study designs an adaptive experiment for efficiently estimating average treatment effects (ATEs). In each round of our adaptive experiment, an experimenter sequentially samples an experimental unit, assigns a treatment, and observes the corresponding outcome immediately. At the end of the experiment, the experimenter estimates an ATE using the gathered samples. The objective is to estimate the ATE with a smaller asymptotic variance. Existing studies have designed experiments that adaptively optimize the propensity score (treatment-assignment probability). As a generalization of such an approach, we propose optimizing the covariate density as well as the propensity score. First, we derive the efficient covariate density and propensity score that minimize the semiparametric efficiency bound and find that optimizing both covariate density and propensity score minimizes the semiparametric efficiency bound more effectively than optimizing only the propensity score. Next, we design an adaptive experiment using the efficient covariate density and propensity score sequentially estimated during the experiment. Lastly, we propose an ATE estimator whose asymptotic variance aligns with the minimized semiparametric efficiency bound.
Updated: 2024-06-18 18:20:08
标题: 主动自适应实验设计用于治疗效果估计与协变量选择
摘要: 这项研究设计了一种自适应实验,以有效地估计平均治疗效应(ATEs)。在我们的自适应实验的每一轮中,实验者依次对实验单元进行抽样,分配治疗,并立即观察相应的结果。在实验结束时,实验者利用收集到的样本估计ATE。其目标是用更小的渐近方差估计ATE。现有研究已经设计了自适应优化倾向得分(治疗分配概率)的实验。作为这种方法的一个推广,我们提出优化协变量密度以及倾向得分。首先,我们推导出最小化半参数效率界的有效协变量密度和倾向得分,并发现优化协变量密度和倾向得分比仅优化倾向得分更有效地最小化半参数效率界。接下来,我们设计了一个自适应实验,使用在实验期间依次估计的有效协变量密度和倾向得分。最后,我们提出了一个ATE估计器,其渐近方差与最小化的半参数效率界一致。
更新时间: 2024-06-18 18:20:08
领域: stat.ME,cs.LG,econ.EM,stat.ML
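A toy numpy sketch of the adaptive loop: each round assigns treatment with a propensity estimated from data so far (a Neyman-style allocation toward the noisier arm), and the ATE is finally estimated by inverse propensity weighting. Joint optimization of the covariate density, which the paper proposes, is omitted for brevity, and the simulated outcome model is an assumption of this illustration:

import numpy as np

rng = np.random.default_rng(1)

def outcome(x, d):                                   # unknown to the experimenter
    return d * (1.0 + x) + rng.normal(scale=2.0 if d else 1.0)

ds, ys, es = [], [], []
for t in range(2000):
    x = rng.uniform(-1, 1)                           # covariate draw
    if t < 200:
        e = 0.5                                      # uniform warm-up propensity
    else:
        yarr, darr = np.array(ys), np.array(ds)
        s1, s0 = yarr[darr == 1].std(), yarr[darr == 0].std()
        e = float(np.clip(s1 / (s1 + s0), 0.1, 0.9)) # adaptive allocation
    d = int(rng.random() < e)
    ds.append(d); ys.append(outcome(x, d)); es.append(e)

d, y, e = map(np.array, (ds, ys, es))
ate = np.mean(d * y / e - (1 - d) * y / (1 - e))     # IPW estimate
print(f"IPW ATE estimate: {ate:.3f} (truth: 1.0)")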
HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation
Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to their flexibility and adaptability leveraging human-like morphology. However, research in humanoid robots is often bottlenecked by the costly and fragile hardware setups. To accelerate algorithmic research in humanoid robots, we present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands and a variety of challenging whole-body manipulation and locomotion tasks. Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies, such as walking or reaching. With HumanoidBench, we provide the robotics community with a platform to identify the challenges arising when solving diverse tasks with humanoid robots, facilitating prompt verification of algorithms and ideas. The open-source code is available at https://humanoid-bench.github.io.
Updated: 2024-06-18 18:11:07
标题: HumanoidBench:用于全身运动和操作的仿真人形机器人基准测试
摘要: 人形机器人在不同环境和任务中协助人类具有巨大潜力,因为它们利用类似人类形态的灵活性和适应性。然而,人形机器人的研究往往受到昂贵且脆弱的硬件设置的限制。为了加速人形机器人的算法研究,我们提出了一个高维度的模拟机器人学习基准,HumanoidBench,其中包含一个配备灵巧手和各种具有挑战性的全身操纵和运动任务的人形机器人。我们的研究发现,最先进的强化学习算法在大多数任务中表现艰难,而在支持强健的低级策略(如行走或伸手)的情况下,分层学习方法取得了更好的性能。通过HumanoidBench,我们为机器人学界提供了一个平台,用于识别解决人形机器人各种任务时出现的挑战,促进算法和思想的及时验证。开源代码可在https://humanoid-bench.github.io 上找到。
更新时间: 2024-06-18 18:11:07
领域: cs.RO,cs.AI,cs.LG
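A hedged usage sketch in the Gymnasium style that simulated-robot benchmarks typically expose. The import name and the task id "h1hand-walk-v0" are assumptions taken as examples; consult the HumanoidBench repository for the exact registration names:

import gymnasium as gym
import humanoid_bench  # assumed import that registers the benchmark environments

env = gym.make("h1hand-walk-v0")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()          # replace with a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()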
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute-binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audios have the same acoustic events but with different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance, thereby struggling with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audios. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities.
Updated: 2024-06-18 18:03:28
标题: CompA:解决音频语言模型中的组合推理差距
摘要: 音频的一个基本特征是其构成性质。使用对比方法(例如CLAP)训练的音频-语言模型(ALMs)学习了音频和语言模态之间的共享表示,在许多下游应用中提高了性能,包括零样本音频分类、音频检索等。然而,这些模型有效执行构成推理的能力仍然未被充分探索,需要额外的研究。在本文中,我们提出了CompA,这是一个包含大多数真实世界音频样本的两个专家注释基准集,用于评估ALMs中的构成推理。我们提出的CompA-order评估ALM在理解音频中声学事件的顺序或发生方面的表现,CompA-attribute评估声学事件的属性绑定。任何基准集的一个实例包括两个音频-标题对,其中两个音频具有相同的声学事件,但构成不同。ALM根据其将正确的音频匹配到正确的标题的表现进行评估。使用这个基准集,我们首先展示了当前ALMs仅略优于随机机会,因此在构成推理方面有困难。接下来,我们提出了CompA-CLAP,通过一种新的学习方法对CLAP进行微调,以提高其构成推理能力。为了训练CompA-CLAP,我们首先提出了改进的对比训练方法,包括考虑构成的困难负例,从而实现更加集中的训练。接下来,我们提出了一种新颖的模块对比损失,帮助模型学习细粒度的构成理解,并克服了公开可用构成音频的严重稀缺。CompA-CLAP在CompA基准集上明显优于所有我们的基线模型,表明其优越的构成推理能力。
更新时间: 2024-06-18 18:03:28
领域: cs.SD,cs.AI,cs.CL,eess.AS
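A sketch of contrastive training with composition-aware hard negatives: each audio's mismatched-composition caption (same acoustic events, different order or attribute binding) is scored as an explicit extra negative alongside the usual in-batch negatives. This is illustrative, not the authors' exact loss:

import torch
import torch.nn.functional as F

def composition_aware_loss(audio_emb, caption_emb, hard_neg_emb, tau=0.07):
    # audio_emb, caption_emb, hard_neg_emb: (B, D), assumed L2-normalized
    logits = audio_emb @ caption_emb.t() / tau                    # in-batch negatives
    hard = (audio_emb * hard_neg_emb).sum(-1, keepdim=True) / tau # hard negative column
    logits = torch.cat([logits, hard], dim=1)
    target = torch.arange(audio_emb.size(0), device=audio_emb.device)
    return F.cross_entropy(logits, target)

B, D = 8, 512
a = F.normalize(torch.randn(B, D), dim=-1)
c = F.normalize(torch.randn(B, D), dim=-1)
n = F.normalize(torch.randn(B, D), dim=-1)
print(composition_aware_loss(a, c, n))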
Reinforcement Learning for Corporate Bond Trading: A Sell Side Perspective
A corporate bond trader in a typical sell side institution such as a bank provides liquidity to the market participants by buying/selling securities and maintaining an inventory. Upon receiving a request for a buy/sell price quote (RFQ), the trader provides a quote by adding a spread over a prevalent market price. For illiquid bonds, the market price is harder to observe, and traders often resort to available benchmark bond prices (such as MarketAxess, Bloomberg, etc.). In [Bergault et al., 2023], the concept of a "Fair Transfer Price" for an illiquid corporate bond was introduced, which is derived from an infinite horizon stochastic optimal control problem (for maximizing the trader's expected P&L, regularized by the quadratic variation). In this paper, we consider the same optimization objective; however, we approach the estimation of an optimal bid-ask spread quoting strategy in a data driven manner and show that it can be learned using Reinforcement Learning. Furthermore, we perform extensive outcome analysis to examine the reasonableness of the trained agent's behavior.
Updated: 2024-06-18 18:02:35
标题: 企业债券交易的强化学习:卖方视角
摘要: 在典型的卖方机构(如银行)中,企业债券交易员通过买卖证券并维护库存为市场参与者提供流动性。在收到要求报出买入/卖出价格的询价(RFQ)后,交易员通过在流行市场价格上加价来提供报价。对于不流动的债券,市场价格很难观察到,交易员通常会求助于可用的基准债券价格(如MarketAxess、Bloomberg等)。在[Bergault et al., 2023]中,介绍了针对不流动企业债券的"公平转让价格"概念,该概念来源于一个无限期随机最优控制问题(用于最大化交易员的预期P&L,通过二次变差进行正则化)。在本文中,我们考虑相同的优化目标,然而我们以数据驱动的方式来估计最优买卖价差报价策略,并展示它可以通过强化学习来学习。此外,我们进行了广泛的结果分析,以检验训练代理的行为的合理性。
更新时间: 2024-06-18 18:02:35
领域: q-fin.CP,cs.LG,math.OC
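A toy sketch of the setting: on each RFQ the trader quotes a spread over a reference price; wider spreads earn more when filled but fill less often. The single-state tabular Q-learning below (effectively a bandit) and the fill-probability model are purely illustrative, far simpler than the paper's setup:

import numpy as np

rng = np.random.default_rng(0)
spreads = np.array([0.05, 0.10, 0.20, 0.40])     # candidate quotes
q = np.zeros(len(spreads))                       # single-state Q-table
eps, alpha = 0.1, 0.05

def fill_prob(s):                                # clients fill tighter quotes more often
    return np.exp(-8 * s)

for t in range(20000):
    a = rng.integers(len(spreads)) if rng.random() < eps else int(q.argmax())
    filled = rng.random() < fill_prob(spreads[a])
    reward = spreads[a] if filled else 0.0       # realized P&L of the quote
    q[a] += alpha * (reward - q[a])
print(dict(zip(spreads, q.round(4))), "-> best spread:", spreads[q.argmax()])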
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is struggling to keep pace with these rapid advancements, with ongoing debates about whether generated text might plagiarize copyrighted materials. Current LLMs may infringe on copyrights or overly restrict non-copyrighted texts, leading to these challenges: (i) the need for a comprehensive evaluation benchmark to assess copyright compliance from multiple aspects; (ii) evaluating robustness against safeguard bypassing attacks; and (iii) developing effective defenses targeted against the generation of copyrighted text. To tackle these challenges, we introduce a curated dataset to evaluate methods, test attack strategies, and propose lightweight, real-time defenses to prevent the generation of copyrighted text, ensuring the safe and lawful use of LLMs. Our experiments demonstrate that current LLMs frequently output copyrighted text, and that jailbreaking attacks can significantly increase the volume of copyrighted output. Our proposed defense mechanisms significantly reduce the volume of copyrighted text generated by LLMs by effectively refusing malicious requests. Code is publicly available at https://github.com/xz-liu/SHIELD
Updated: 2024-06-18 18:00:03
标题: SHIELD:LLM文本生成中版权合规的评估和防御策略
摘要: 大型语言模型(LLMs)已经改变了机器学习,但由于它们可能产生侵犯版权的文本,引发了重大的法律关注,导致了几起备受关注的诉讼。法律界正在努力跟上这些快速发展的步伐,就生成的文本可能抄袭受版权保护的材料展开持续的辩论。当前的LLMs可能侵犯版权或过度限制无版权的文本,导致了以下挑战:(i)需要一个全面的评估基准来从多个方面评估版权合规性;(ii)评估抵御绕过攻击的稳健性;以及(iii)开发针对生成受版权保护文本的有效防御。为了解决这些挑战,我们引入了一个策划的数据集,以评估方法,测试攻击策略,并提出轻量级、实时的防御措施,以防止生成受版权保护的文本,确保LLMs的安全和合法使用。我们的实验表明,当前的LLMs经常输出受版权保护的文本,而越狱攻击可以显著增加受版权保护的输出量。我们提出的防御机制通过有效拒绝恶意请求,显著减少了LLMs生成的受版权保护文本的数量。代码可在https://github.com/xz-liu/SHIELD公开获取。
更新时间: 2024-06-18 18:00:03
领域: cs.CL,cs.AI,cs.CY
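A sketch of a lightweight, real-time defense in the spirit of the paper: before returning text, check n-gram overlap against a corpus of protected works and refuse on a match. The corpus, n-gram size, threshold, and refusal string are placeholders:

def ngrams(text, n=8):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

protected_texts = []                   # load the copyrighted reference corpus here
protected = set().union(set(), *(ngrams(t) for t in protected_texts))

def guard(generated: str, threshold: int = 1) -> str:
    if len(ngrams(generated) & protected) >= threshold:
        return "I can't help with reproducing that text; it appears to be copyrighted."
    return generated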
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conventional RMs are trained on pairwise responses to the same user request, with relative ratings indicating which response humans prefer. The trained RM serves as a proxy for human preferences. However, due to the black-box nature of RMs, their outputs lack interpretability, as humans cannot intuitively understand why an RM thinks a response is good or not. As RMs act as human preference proxies, we believe they should be human-interpretable to ensure that their internal decision processes are consistent with human preferences and to prevent reward hacking in LLM alignment. To build RMs with interpretable preferences, we propose a two-stage approach: i) train an Absolute-Rating Multi-Objective Reward Model (ArmoRM) with multi-dimensional absolute-rating data, each dimension corresponding to a human-interpretable objective (e.g., honesty, verbosity, safety); ii) employ a Mixture-of-Experts (MoE) strategy with a gating network that automatically selects the most suitable reward objectives based on the context. We efficiently trained an ArmoRM with Llama-3 8B and a gating network consisting of a shallow MLP on top of the ArmoRM. Our trained model, ArmoRM-Llama3-8B, obtains state-of-the-art performance on RewardBench, a benchmark evaluating RMs for language modeling. Notably, the performance of our model surpasses the LLM-as-a-judge method with GPT-4 judges by a margin, and approaches the performance of the much larger Nemotron-4 340B reward model.
Updated: 2024-06-18 17:58:28
标题: 可解释的偏好:通过多目标奖励建模和专家混合模型
摘要: 基于人类反馈的强化学习(RLHF)已成为使大型语言模型(LLMs)与人类偏好对齐的主要方法。 RLHF过程通常从使用人类偏好数据训练奖励模型(RM)开始。传统的RM是根据对同一用户请求的成对响应进行训练的,相对评分指示人类更喜欢哪种响应。训练有素的RM充当了人类偏好的代理。但是,由于RM的黑盒特性,它们的输出缺乏可解释性,因为人类无法直观地理解为什么RM认为某个响应是好的还是不好的。由于RMs充当人类偏好代理,我们认为它们应该是人类可解释的,以确保它们的内部决策过程与人类偏好一致,并防止LLM对齐中的奖励黑客。为了构建具有可解释偏好的RMs,我们提出了一个两阶段方法:i)使用多维绝对评分数据训练绝对评分多目标奖励模型(ArmoRM),每个维度对应一个人类可解释的目标(例如,诚实性,冗长性,安全性); ii)采用基于上下文自动选择最合适的奖励目标的门控网络的专家混合(MoE)策略。我们有效地使用Llama-3 8B训练了一个ArmoRM和一个由位于ArmoRM之上的浅层MLP组成的门控网络。我们训练的模型ArmoRM-Llama3-8B,在评估语言建模RMs的基准RewardBench上取得了最先进的性能。值得注意的是,我们的模型性能以明显优势超过了以GPT-4为评委的LLM-as-a-judge方法,并接近更大的Nemotron-4 340B奖励模型的性能。
更新时间: 2024-06-18 17:58:28
领域: cs.LG,cs.CL
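A structural sketch of the two stages: a multi-objective head that regresses interpretable absolute ratings, and a gating network that mixes objectives conditioned on the prompt context. Dimensions and objective names are illustrative, not the released model's:

import torch
import torch.nn as nn

class MultiObjectiveRewardHead(nn.Module):
    def __init__(self, hidden=4096, n_objectives=3):   # e.g. honesty, verbosity, safety
        super().__init__()
        self.reward_head = nn.Linear(hidden, n_objectives)   # absolute ratings per objective
        self.gate = nn.Sequential(                           # shallow MLP gating network
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, n_objectives), nn.Softmax(dim=-1),
        )

    def forward(self, h_response, h_prompt):
        objectives = self.reward_head(h_response)     # (B, K) per-objective rewards
        weights = self.gate(h_prompt)                 # (B, K) context-dependent mixture
        return (weights * objectives).sum(-1)         # scalar reward

head = MultiObjectiveRewardHead()
print(head(torch.randn(2, 4096), torch.randn(2, 4096)))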
Synergizing Foundation Models and Federated Learning: A Survey
The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such as the Internet, domain-specific FMs need proprietary data, posing a practical challenge regarding the amount of data available due to privacy concerns. Federated Learning (FL) is a collaborative learning paradigm that breaks the barrier of data availability from different participants. Therefore, it provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy. This survey paper discusses the potentials and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated paper collection on FM-FL is available at https://github.com/lishenghui/awesome-fm-fl.
Updated: 2024-06-18 17:58:09
标题: "基于基础模型和联邦学习的协同作用:一项调查"
摘要: 最近发展的Foundation Models (FMs),如大型语言模型、视觉变换器和多模态模型,已经在学术界和工业界产生了重大影响。与小规模模型相比,FMs在预训练阶段对大量数据有更强的需求。尽管一般的FMs可以在从互联网等开源收集的数据上进行预训练,但特定领域的FMs需要专有数据,由于隐私问题而导致数据量可用性方面的实际挑战。联邦学习(FL)是一种协作学习范式,打破了来自不同参与者的数据可用性障碍。因此,它提供了一种有前途的解决方案,可以利用分布式数据集定制和调整FMs以适应各种特定领域的任务,同时保护隐私。这篇综述论文讨论了协同FL和FMs的潜力和挑战,并总结了核心技术、未来方向和应用。关于FM-FL的定期更新的文集可在https://github.com/lishenghui/awesome-fm-fl 上找到。
更新时间: 2024-06-18 17:58:09
领域: cs.LG,cs.AI
Can Go AIs be adversarially robust?
Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect against previously discovered attacks. Unfortunately, we also find that none of these defenses are able to withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even in narrow domains such as Go. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai.
Updated: 2024-06-18 17:57:49
标题: 围棋AI能否具有对抗鲁棒性?
摘要: 先前的研究发现,像KataGo这样的超级人工智能在围棋中可以被简单的对抗策略击败。在本文中,我们研究了简单的防御是否可以提高KataGo的最坏情况表现。我们测试了三种自然的防御方法:对手构造位置的对抗训练,迭代对抗训练,以及改变网络架构。我们发现其中一些防御方法能够抵御先前发现的攻击。不幸的是,我们也发现这些防御方法都无法抵御自适应攻击。特别是,我们能够训练新的对手,他们能够可靠地击败我们的受保护代理,使它们犯错,这是人类不会犯的错误。我们的结果表明,即使在围棋等狭窄领域中,构建强大的人工智能系统也具有挑战性。有关攻击的互动示例和我们的代码库链接,请参见https://goattack.far.ai。
更新时间: 2024-06-18 17:57:49
领域: cs.LG,cs.AI,stat.ML
A Characterization of Semi-Involutory MDS Matrices
In symmetric cryptography, maximum distance separable (MDS) matrices with computationally simple inverses have wide applications. Many block ciphers like AES, SQUARE, SHARK, and hash functions like PHOTON use an MDS matrix in the diffusion layer. In this article, we first characterize all $3 \times 3$ irreducible semi-involutory matrices over the finite field of characteristic $2$. Using this matrix characterization, we provide a necessary and sufficient condition to construct MDS semi-involutory matrices using only their diagonal entries and the entries of an associated diagonal matrix. Finally, we count the number of $3 \times 3$ semi-involutory MDS matrices over any finite field of characteristic $2$.
Updated: 2024-06-18 17:57:46
标题: 半对合MDS矩阵的表征
摘要: 在对称加密中,具有计算简单逆矩阵的最大距离可分(MDS)矩阵具有广泛的应用。许多分组密码,如AES、SQUARE、SHARK,以及哈希函数如PHOTON,在扩散层中使用MDS矩阵。在本文中,我们首先对特征为2的有限域上的所有$3 \times 3$不可约半对合矩阵进行表征。利用这一矩阵表征,我们提供了仅使用其对角元素和相关对角矩阵的元素来构造MDS半对合矩阵的充要条件。最后,我们计算了特征为2的任意有限域上$3 \times 3$半对合MDS矩阵的数量。
更新时间: 2024-06-18 17:57:46
领域: cs.CR,05B20, 12E20, 15B99 (Primary), 94A60, 94B05 (Secondary)
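A brute-force sketch of checking the MDS property over GF(2^4) with the galois package, using the standard criterion that a matrix is MDS iff every square submatrix is nonsingular. The example matrix entries are arbitrary, not one of the paper's semi-involutory constructions:

import itertools
import numpy as np
import galois

GF = galois.GF(2**4)

def is_mds(M):
    n = M.shape[0]
    for k in range(1, n + 1):
        for rows in itertools.combinations(range(n), k):
            for cols in itertools.combinations(range(n), k):
                if np.linalg.det(M[np.ix_(rows, cols)]) == 0:
                    return False                  # a singular minor breaks MDS
    return True

M = GF([[1, 2, 4], [2, 1, 2], [4, 2, 1]])         # arbitrary example entries
print(is_mds(M))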
Demystifying Higher-Order Graph Neural Networks
Higher-order graph neural networks (HOGNNs) are an important class of GNN models that harness polyadic relations between vertices beyond plain edges. They have been used to eliminate issues such as over-smoothing or over-squashing, to significantly enhance the accuracy of GNN predictions, to improve the expressiveness of GNN architectures, and for numerous other goals. A plethora of HOGNN models have been introduced, and they come with diverse neural architectures, and even with different notions of what the "higher-order" means. This richness makes it very challenging to appropriately analyze and compare HOGNN models, and to decide in what scenario to use specific ones. To alleviate this, we first design an in-depth taxonomy and a blueprint for HOGNNs. This facilitates designing models that maximize performance. Then, we use our taxonomy to analyze and compare the available HOGNN models. The outcomes of our analysis are synthesized in a set of insights that help to select the most beneficial GNN model in a given scenario, and a comprehensive list of challenges and opportunities for further research into more powerful HOGNNs.
Updated: 2024-06-18 17:57:11
标题: 解密高阶图神经网络
摘要: 高阶图神经网络(HOGNNs)是一类重要的图神经网络模型,利用了顶点之间的多重关系,超越了简单的边缘。它们已被用于消除过度平滑或过度压缩等问题,显著提高了图神经网络预测的准确性,改善了图神经网络架构的表现力,以及实现了众多其他目标。已经引入了大量的HOGNN模型,它们具有不同的神经架构,甚至对“高阶”意味着什么有不同的概念。这种丰富性使得适当分析和比较HOGNN模型变得非常具有挑战性,以及决定在何种情况下使用特定模型。为了缓解这一问题,我们首先设计了一种深入的分类法和HOGNN的蓝图。这有助于设计最大化性能的模型。然后,我们使用我们的分类法来分析和比较可用的HOGNN模型。我们分析的结果综合在一组见解中,有助于在特定场景中选择最有益的GNN模型,并提供了进一步研究更强大HOGNN的一系列挑战和机遇的全面列表。
更新时间: 2024-06-18 17:57:11
领域: cs.LG,cs.AI,cs.SI
Evaluating the design space of diffusion-based generative models
Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting that qualitatively agree with the ones used in [Karras et al. 2022]. It also provides some perspectives on why the time and variance schedule used in [Karras et al. 2022] could be better tuned than the pioneering version in [Song et al. 2020].
Updated: 2024-06-18 17:56:10
标题: 评估基于扩散的生成模型的设计空间
摘要: 大多数现有的关于扩散模型精度的理论研究虽然重要,但都假设得分函数已被近似到一定精度,然后利用这个先验界限来控制生成误差。本文则首次对整个生成过程(即训练和采样)给出定量理解。更准确地说,它对梯度下降下的去噪得分匹配进行了非渐近收敛分析。此外,还提供了对方差爆炸模型的精细采样误差分析。这两个结果的结合产生了一个完整的误差分析,从理论角度阐明了应如何设计训练和采样过程以实现有效生成。例如,我们的理论暗示了对噪声分布和损失加权的偏好,与[Karras等人2022]中使用的在定性上一致。它还提供了一些观点,说明为什么[Karras等人2022]中使用的时间和方差调度可能比[Song等人2020]中的开创性版本调得更好。
更新时间: 2024-06-18 17:56:10
领域: cs.LG,math.DS,math.OC,math.PR,stat.ML
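A compact sketch of the training side analyzed in the paper: denoising score matching with a log-normal noise-level distribution and a loss weighting in the style of [Karras et al. 2022]. The hyperparameters and placeholder denoiser are illustrative:

import torch

def dsm_loss(denoiser, x0, P_mean=-1.2, P_std=1.2, sigma_data=0.5):
    # sample noise levels: ln(sigma) ~ N(P_mean, P_std^2)
    sigma = (torch.randn(x0.shape[0], 1) * P_std + P_mean).exp()
    x_noisy = x0 + sigma * torch.randn_like(x0)
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data)**2  # EDM-style weighting
    return (weight * (denoiser(x_noisy, sigma) - x0).pow(2)).mean()

denoiser = lambda x, sigma: x / (1 + sigma**2)   # placeholder for a trained network
x0 = torch.randn(128, 16)
print(dsm_loss(denoiser, x0))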
LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging
Recent works show that reducing the number of layers in a convolutional neural network can enhance efficiency while maintaining the performance of the network. Existing depth compression methods remove redundant non-linear activation functions and merge the consecutive convolution layers into a single layer. However, these methods suffer from a critical drawback; the kernel size of the merged layers becomes larger, significantly undermining the latency reduction gained from reducing the depth of the network. We show that this problem can be addressed by jointly pruning convolution layers and activation functions. To this end, we propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove, to achieve a desired inference speed-up while minimizing performance loss. Since the corresponding selection problem involves an exponential search space, we formulate a novel surrogate optimization problem and efficiently solve it via dynamic programming. Empirical results demonstrate that our method consistently outperforms existing depth compression and layer pruning methods on various network architectures, both on image classification and generation tasks. We release the code at https://github.com/snu-mllab/LayerMerge.
Updated: 2024-06-18 17:55:15
标题: LayerMerge:通过层修剪和合并实现神经网络深度压缩
摘要: 最近的研究表明,在卷积神经网络中减少层数可以提高效率,同时保持网络性能。现有的深度压缩方法去除多余的非线性激活函数,并将连续的卷积层合并为单个层。然而,这些方法存在一个关键缺点;合并层的核大小变大,显著削弱了减少网络深度带来的延迟减少。我们表明,这个问题可以通过联合修剪卷积层和激活函数来解决。为此,我们提出了LayerMerge,一种新颖的深度压缩方法,选择要删除的激活层和卷积层,以实现所需的推断加速,同时最小化性能损失。由于相应的选择问题涉及指数搜索空间,我们制定了一种新颖的替代优化问题,并通过动态规划有效解决。实证结果表明,我们的方法在各种网络架构上始终优于现有的深度压缩和层修剪方法,无论是在图像分类还是生成任务上。我们在https://github.com/snu-mllab/LayerMerge发布了代码。
更新时间: 2024-06-18 17:55:15
领域: cs.LG,cs.CV
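A sketch of the surrogate selection problem solved by dynamic programming: for each block, either keep its activation (paying the unmerged latency and retaining its importance) or drop it so the block merges into the next convolution (cheaper, importance forfeited), maximizing retained importance under a latency budget. The importance and latency tables stand in for profiled values, and this is a simplification of the paper's formulation:

def layer_merge_dp(importance, latency_keep, latency_merge, budget):
    n = len(importance)
    NEG = float("-inf")
    dp = [[NEG] * (budget + 1) for _ in range(n + 1)]   # dp[i][b]: best importance
    dp[0][0] = 0.0
    for i in range(n):
        for b in range(budget + 1):
            if dp[i][b] == NEG:
                continue
            for keep, cost in ((1, latency_keep[i]), (0, latency_merge[i])):
                nb = b + cost
                if nb <= budget:
                    gain = importance[i] if keep else 0.0
                    dp[i + 1][nb] = max(dp[i + 1][nb], dp[i][b] + gain)
    return max(dp[n])

print(layer_merge_dp([5, 1, 3, 2], [4, 4, 4, 4], [1, 1, 1, 1], budget=10))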
Influence Maximization via Graph Neural Bandits
We consider a ubiquitous scenario in the study of Influence Maximization (IM), in which there is limited knowledge about the topology of the diffusion network. We set the IM problem in a multi-round diffusion campaign, aiming to maximize the number of distinct users that are influenced. Leveraging the capability of bandit algorithms to effectively balance the objectives of exploration and exploitation, as well as the expressivity of neural networks, our study explores the application of neural bandit algorithms to the IM problem. We propose the framework IM-GNB (Influence Maximization with Graph Neural Bandits), where we provide an estimate of the users' probabilities of being influenced by influencers (also known as diffusion seeds). This initial estimate forms the basis for constructing both an exploitation graph and an exploration one. Subsequently, IM-GNB handles the exploration-exploitation tradeoff, by selecting seed nodes in real-time using Graph Convolutional Networks (GCN), in which the pre-estimated graphs are employed to refine the influencers' estimated rewards in each contextual setting. Through extensive experiments on two large real-world datasets, we demonstrate the effectiveness of IM-GNB compared with other baseline methods, significantly improving the spread outcome of such diffusion campaigns, when the underlying network is unknown.
Updated: 2024-06-18 17:54:33
标题: 基于图神经老虎机的影响力最大化
摘要: 我们考虑影响力最大化(IM)研究中一个普遍的场景,即对扩散网络的拓扑结构了解有限。我们将IM问题设置在一个多轮扩散活动中,旨在最大化受影响的不同用户数量。利用多臂老虎机算法有效平衡探索与利用目标的能力,以及神经网络的表达能力,我们的研究探讨了神经老虎机算法在IM问题中的应用。我们提出了IM-GNB(基于图神经老虎机的影响力最大化)框架,其中我们对用户被影响者(也称为扩散种子)影响的概率给出估计。这一初步估计构成了构建利用图和探索图的基础。随后,IM-GNB通过使用图卷积网络(GCN)实时选择种子节点来处理探索与利用的权衡,其中预估计的图用于在每个上下文设置中细化影响者的估计奖励。通过在两个大型真实世界数据集上进行大量实验,我们展示了IM-GNB相对于其他基准方法的有效性,在底层网络未知的情况下显著改善了此类扩散活动的传播效果。
更新时间: 2024-06-18 17:54:33
领域: cs.LG,cs.AI,cs.IR,cs.SI
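A stripped-down sketch of the bandit layer: LinUCB-style seed selection over influencer embeddings, which IM-GNB would instead refine with GCNs over the estimated exploitation and exploration graphs. The embeddings and reward model here are synthetic stand-ins:

import numpy as np

rng = np.random.default_rng(0)
K, D = 20, 8
emb = rng.normal(size=(K, D))                   # influencer embeddings
theta_true = rng.normal(size=D)                 # hidden "spread response" parameter

A, b = np.eye(D), np.zeros(D)                   # LinUCB sufficient statistics
for t in range(300):
    theta = np.linalg.solve(A, b)
    Ainv = np.linalg.inv(A)
    ucb = emb @ theta + 0.5 * np.sqrt(np.einsum("kd,de,ke->k", emb, Ainv, emb))
    k = int(ucb.argmax())                       # seed to activate this round
    reward = emb[k] @ theta_true + rng.normal(scale=0.1)  # observed spread
    A += np.outer(emb[k], emb[k]); b += reward * emb[k]
print("best seed found:", int((emb @ theta_true).argmax()), "vs chosen:", k)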
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics
Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics. Access to the dataset and the code used for benchmarking are available here: https://github.com/cruiseresearchgroup/DIEF_BTS .
Updated: 2024-06-18 17:54:13
标题: BTS:建筑时间序列数据集:赋能大规模建筑分析
摘要: 建筑物在人类福祉中发挥着至关重要的作用,影响着居住者的舒适度、健康和安全。此外,它们在全球能源消耗中占比显著,约占总能源使用量的三分之一,并产生大量碳排放。优化建筑性能为应对气候变化、促进人类福祉提供了重要机会。然而,建筑分析研究一直受制于缺乏可获取、可用且全面的涵盖多种建筑运行的真实数据集。在本文中,我们介绍了建筑时间序列(BTS)数据集。我们的数据集涵盖三栋建筑物、为期三年,包括一万多个时间序列数据点和数百个独特的本体。此外,元数据采用Brick模式进行了标准化。为了展示该数据集的实用性,我们在两个任务上进行了基准测试:时间序列本体分类和零样本预测。这些任务是解决建筑分析互操作性挑战的重要第一步。可以在此处访问数据集和用于基准测试的代码:https://github.com/cruiseresearchgroup/DIEF_BTS。
更新时间: 2024-06-18 17:54:13
领域: cs.LG
LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation
Low-rank adaptation (LoRA) has become the default approach to fine-tune large language models (LLMs) due to its significant reduction in trainable parameters. However, trainable parameter demand for LoRA increases with increasing model embedding dimensions, leading to high compute costs. Additionally, its backward updates require storing high-dimensional intermediate activations and optimizer states, demanding high peak GPU memory. In this paper, we introduce large model fine-tuning via spectrally decomposed low-dimensional adaptation (LaMDA), a novel approach to fine-tuning large language models, which leverages low-dimensional adaptation to achieve significant reductions in trainable parameters and peak GPU memory footprint. LaMDA freezes a first projection matrix (PMA) in the adaptation path while introducing a low-dimensional trainable square matrix, resulting in substantial reductions in trainable parameters and peak GPU memory usage. LaMDA gradually freezes a second projection matrix (PMB) during the early fine-tuning stages, reducing the compute cost associated with weight updates to enhance parameter efficiency further. We also present an enhancement, LaMDA++, incorporating a "lite-weight" adaptive rank allocation for the LoRA path via normalized spectrum analysis of pre-trained model weights. We evaluate LaMDA/LaMDA++ across various tasks, including natural language understanding with the GLUE benchmark, text summarization, natural language generation, and complex reasoning on different LLMs. Results show that LaMDA matches or surpasses the performance of existing alternatives while requiring up to 17.7x fewer parameter updates and up to 1.32x lower peak GPU memory usage during fine-tuning. Code will be publicly available.
Updated: 2024-06-18 17:52:59
标题: LaMDA:通过谱分解低维适应实现大型模型微调
摘要: 低秩适应(LoRA)已成为微调大型语言模型(LLMs)的默认方法,因为它显著减少了可训练参数。然而,LoRA的可训练参数需求随着模型嵌入维度的增加而增加,导致计算成本高昂。此外,其反向更新需要存储高维中间激活和优化器状态,需要高峰GPU内存。在本文中,我们介绍了通过谱分解低维适应(LaMDA)进行大模型微调的方法,这是一种微调大型语言模型的新方法,利用低维适应实现了可训练参数和峰值GPU内存占用的显著减少。LaMDA在适应路径中冻结了第一个投影矩阵(PMA),同时引入了一个低维可训练方阵,从而实现了可训练参数和峰值GPU内存使用量的大幅减少。LaMDA在早期微调阶段逐渐冻结第二个投影矩阵(PMB),减少了与权重更新相关的计算成本,从而进一步增强了参数效率。我们还提出了一个增强版,LaMDA++,通过对预训练模型权重进行归一化谱分析,在LoRA路径上引入了“轻量级”自适应秩分配。我们在各种任务上评估了LaMDA/LaMDA++,包括使用GLUE基准测试进行自然语言理解、文本摘要、自然语言生成以及不同LLMs上的复杂推理。结果表明,LaMDA在需要更新参数的情况下匹敌或超过了现有替代方案的性能,同时在微调过程中减少了最多17.7倍的参数更新次数,峰值GPU内存使用量降低了最多1.32倍。代码将公开提供。
更新时间: 2024-06-18 17:52:59
领域: cs.CL,cs.AI,cs.LG
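A sketch of the adapter structure described above: freeze a projection-down matrix A (PMA) and a projection-up matrix B (PMB), and train only a small r x r square matrix S between them, so the weight update is W + B S A. The gradual freezing of B and the LaMDA++ spectral rank allocation are omitted, and the initialization scheme is an assumption of this illustration:

import torch
import torch.nn as nn

class LaMDALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 32):
        super().__init__()
        self.base = base.requires_grad_(False)        # frozen pre-trained weight
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in**0.5, requires_grad=False)
        self.B = nn.Parameter(torch.randn(d_out, r) / r**0.5, requires_grad=False)
        self.S = nn.Parameter(torch.zeros(r, r))      # the only trainable part

    def forward(self, x):
        # x @ (B S A)^T = x @ A^T @ S^T @ B^T
        return self.base(x) + x @ self.A.t() @ self.S.t() @ self.B.t()

layer = LaMDALinear(nn.Linear(4096, 4096), r=32)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # r*r = 1024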
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if finetuning is involved, risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in \textit{a single autoregressive step}, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture which performs arithmetic. Our implementation using Llama 3 8B Instruct with OccamNet as a symbolic model (OccamLlama) achieves 100\% accuracy on single arithmetic operations ($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT 4o and on par with GPT 4o using a code interpreter. OccamLlama also outperforms GPT 4o both with and without a code interpreter on mathematical problem solving benchmarks involving challenging arithmetic, thus enabling small LLMs to match the arithmetic performance of even much larger models. We will make our code public shortly.
Updated: 2024-06-18 17:51:42
标题: OccamLLM:一步快速准确的语言模型算术
摘要: 尽管文本生成和推理方面取得了显著进展,但大型语言模型(LLMs)在准确执行复杂算术运算方面仍面临挑战。为了实现准确计算,语言模型系统通常会使LLMs生成算术运算的代码。然而,这种方法会牺牲速度和安全性,并且如果涉及微调,可能会使语言模型失去先前的能力。我们提出了一个框架,可以在一个自回归步骤中实现精确的算术运算,从而提供更快、更安全、更可解释的LLM系统,并具有算术能力。我们利用LLM的隐藏状态来控制一个执行算术运算的符号架构。我们的实现使用Llama 3 8B Instruct与OccamNet作为符号模型(OccamLlama),在单个算术运算($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$)上实现了100%的准确率,优于GPT 4o,并与使用代码解释器的GPT 4o相媲美。OccamLlama在涉及具有挑战性算术的数学问题解决基准测试中,无论是否使用代码解释器,都比GPT 4o表现更好,从而使小型LLM能够在算术性能上匹敌规模大得多的模型。我们将很快公开我们的代码。
更新时间: 2024-06-18 17:51:42
领域: cs.CL,cs.AI,cs.LG
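A conceptual sketch of the routing idea: small heads read the LLM's hidden state to decide whether the current step should be handled by an exact symbolic unit and which operation to apply, so the arithmetic is computed exactly in ordinary code rather than decoded token by token. The heads, threshold, and operation set are illustrative, not the OccamNet architecture:

import torch
import torch.nn as nn

OPS = {0: lambda a, b: a + b, 1: lambda a, b: a - b,
       2: lambda a, b: a * b, 3: lambda a, b: a / b}

class ArithmeticRouter(nn.Module):
    def __init__(self, hidden=4096):
        super().__init__()
        self.use_symbolic = nn.Linear(hidden, 1)    # route this step to the symbolic unit?
        self.op_head = nn.Linear(hidden, len(OPS))  # which operation to apply

    def forward(self, h, a, b):
        if torch.sigmoid(self.use_symbolic(h)) > 0.5:
            op = int(self.op_head(h).argmax())
            return OPS[op](a, b)                    # exact result in a single step
        return None                                 # fall back to normal decoding

router = ArithmeticRouter()
print(router(torch.randn(4096), 1234.0, 5678.0))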
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemporal VIdeo Adaptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, the foundation of VIA is a novel test-time editing adaptation method, which adapts a pre-trained image editing model for improving consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control. Furthermore, to maintain global consistency over the video sequence, we introduce spatiotemporal adaptation that adapts consistent attention variables in key frames and strategically applies them across the whole sequence to realize the editing effects. Extensive experiments demonstrate that, compared to baseline methods, our VIA approach produces edits that are more faithful to the source videos, more coherent in the spatiotemporal context, and more precise in local control. More importantly, we show that VIA can achieve consistent long video editing in minutes, unlocking the potential for advanced video editing tasks over long video sequences.
Updated: 2024-06-18 17:51:37
标题: VIA:一个用于全局和局部视频编辑的时空视频适应框架
摘要: 视频编辑是数字媒体的基石,涵盖了从娱乐和教育到专业通信的各个领域。然而,先前的方法往往忽视了全局和局部环境的全面理解的必要性,在时空维度上导致编辑不准确和不一致,特别是对于长视频。在本文中,我们介绍了一个统一的时空视频适应框架VIA,用于全局和局部视频编辑,将一分钟长视频的一致编辑推向极限。首先,为了确保单个帧内的局部一致性,VIA的基础是一种新颖的测试时间编辑适应方法,该方法通过调整预训练的图像编辑模型来提高潜在编辑方向与文本指令之间的一致性,并调整掩码潜在变量以实现精确的局部控制。此外,为了在视频序列中保持全局一致性,我们引入了时空适应,通过在关键帧中调整一致的注意变量,并在整个序列中策略性地应用它们来实现编辑效果。大量实验证明,与基准方法相比,我们的VIA方法产生的编辑更忠实于原始视频,在时空环境中更连贯,并且在局部控制方面更精确。更重要的是,我们展示了VIA可以在几分钟内实现一致的长视频编辑,释放了在长视频序列上进行高级视频编辑任务的潜力。
更新时间: 2024-06-18 17:51:37
领域: cs.CV,cs.AI,cs.MM
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt. This approach has risen in popularity due to practical applications in various applications of language models in search, question/answering, and chat-bots. However, the exact nature of how this approach works isn't clearly understood. In this paper, we mechanistically examine the RAG pipeline to highlight that language models take shortcuts and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory. We probe this mechanistic behavior in language models with: (i) Causal Mediation Analysis, to show that the parametric memory is minimally utilized when answering a question, and (ii) Attention Contributions and Knockouts, to show that the last token residual stream is not enriched from the subject token in the question, but is enriched from other informative tokens in the context. We find this pronounced shortcut behaviour to hold across both the LLaMa and Phi families of models.
Updated: 2024-06-18 17:46:08
标题: 从RAG到丰富参数:探究语言模型如何利用外部知识而不是参数信息来处理事实查询
摘要: 检索增强生成(RAG)丰富了语言模型利用外部上下文推理的能力,以增强针对给定用户提示的响应。这种方法因在搜索、问答和聊天机器人等语言模型应用中的实际应用而日益受到欢迎。然而,这种方法的确切工作方式尚不清楚。在本文中,我们从机制上考察了RAG管道,以突出语言模型采取捷径并倾向于仅利用上下文信息来回答问题,而最小程度地依赖其参数化记忆。我们通过以下方式探究语言模型的这种机制行为:(i)因果中介分析表明,在回答问题时,参数化记忆被最小程度利用;(ii)注意力贡献和敲除表明,最后一个标记的残差流并未从问题中的主语标记得到丰富,而是从上下文中其他信息性标记得到丰富。我们发现这种明显的捷径行为在LLaMa和Phi两个模型家族中均成立。
更新时间: 2024-06-18 17:46:08
领域: cs.CL,cs.AI
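A hedged sketch of measuring "attention contributions": how much attention mass the final token assigns to the retrieved-context span versus the question's subject tokens, averaged over layers and heads. The model name is a placeholder (the paper studies LLaMa and Phi models), and the span indices are hand-picked assumptions that must be adjusted to the actual tokenization:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                   # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

prompt = "Context: The Eiffel Tower is in Paris.\nQ: Where is the Eiffel Tower?\nA:"
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    att = model(**ids, output_attentions=True).attentions  # per layer: (1, heads, q, k)

last = torch.stack(att)[:, 0, :, -1, :]         # attention row of the final token
ctx = list(range(2, 12))                        # hand-picked context-token span
subj = list(range(14, 18))                      # hand-picked subject-token span
print("context mass:", last[..., ctx].sum(-1).mean().item())
print("subject mass:", last[..., subj].sum(-1).mean().item())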
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Large language models, particularly multilingual ones, are designed, claimed, and expected to cater to native speakers of varied languages. We hypothesise that the current practices of fine-tuning and evaluating these models may mismatch this intention owing to a heavy reliance on translation, which can introduce translation artefacts and defects. It remains unknown whether the nature of the instruction data has an impact on the model output; on the other hand, it remains questionable whether translated test sets can capture such nuances. Due to the often coupled practices of using translated data in both stages, such imperfections could have been overlooked. This work investigates these issues by using controlled native or translated data during instruction tuning and evaluation stages and observing model results. Experiments on eight base models and eight different benchmarks reveal that native or generation benchmarks display a notable difference between native and translated instruction data especially when model performance is high, whereas other types of test sets cannot. Finally, we demonstrate that regularization is beneficial to bridging this gap on structured but not generative tasks.
Updated: 2024-06-18 17:43:47
标题: 是适合多语言指令微调的好数据,还是对大型语言模型的糟糕多语言评估?
摘要: 大型语言模型,尤其是多语言模型,旨在为各种语言的母语使用者提供服务。我们假设目前的微调和评估这些模型的做法可能不符合这一意图,因为过度依赖翻译可能会引入翻译的人为错误和缺陷。目前尚不清楚指导数据的性质是否会影响模型输出;另一方面,翻译后的测试集能否捕捉到这些细微差别也是一个问题。由于在两个阶段都使用翻译数据的常见做法,这些缺陷可能被忽视了。本研究通过在指导微调和评估阶段使用受控的母语或翻译数据,并观察模型结果来调查这些问题。对八个基础模型和八个不同基准的实验表明,母语或生成基准在模型表现良好时显示出母语和翻译指导数据之间的显著差异,而其他类型的测试集则不能。最后,我们证明正则化对于在结构化但不是生成性任务上弥合这一差距是有益的。
更新时间: 2024-06-18 17:43:47
领域: cs.CL,cs.AI
DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting
In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 10% on average. Notably, performance gains remain consistent across longer forecasting horizons.
Updated: 2024-06-18 17:42:52
标题: DeformTime:使用可变形注意力捕捉时间序列预测中的变量依赖关系
摘要: 在多元时间序列(MTS)预测中,现有的深度学习方法往往专注于自回归形式,并忽略外生指标中的信息。为了解决这一局限性,我们提出了DeformTime,一种神经网络架构,旨在捕捉输入空间中的相关时间模式,从而提高预测准确性。它部署了由可变形注意力块(DABs)执行的两个核心操作:学习不同时间步长之间变量之间的依赖关系(变量DAB),并保存先前时间步长数据中的时间依赖关系(时间DAB)。输入数据转换被明确设计为通过DAB传递时增强学习来自信息的变形系列。我们在6个MTS数据集上进行了大量实验,使用先前建立的基准以及具有更多外生变量的具有挑战性的传染病建模任务。结果表明,DeformTime在绝大多数MTS预测任务中提高了准确性,平均减少了10%的平均绝对误差。值得注意的是,性能增益在更长的预测时间范围内保持一致。
更新时间: 2024-06-18 17:42:52
领域: cs.LG
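A minimal sketch of the temporal DAB idea: each query time step predicts continuous offsets into the past, samples the series there by linear interpolation, and aggregates the sampled values with learned weights. This is much simplified relative to the paper's blocks (the variable DAB and input transformations are omitted):

import torch
import torch.nn as nn

class TemporalDeformableAttention(nn.Module):
    def __init__(self, d, n_points=4):
        super().__init__()
        self.offsets = nn.Linear(d, n_points)     # how far back to look
        self.weights = nn.Linear(d, n_points)     # how much to trust each sample
        self.n_points = n_points

    def forward(self, x):                         # x: (B, T, D)
        B, T, D = x.shape
        t = torch.arange(T, device=x.device).float().view(1, T, 1)
        pos = (t - self.offsets(x).relu()).clamp(min=0)   # sample positions <= t
        lo = pos.floor().long(); frac = pos - lo
        hi = (lo + 1).clamp(max=T - 1)
        gather = lambda idx: torch.gather(
            x.unsqueeze(2).expand(B, T, self.n_points, D), 1,
            idx.unsqueeze(-1).expand(B, T, self.n_points, D))
        sampled = (1 - frac).unsqueeze(-1) * gather(lo) + frac.unsqueeze(-1) * gather(hi)
        w = self.weights(x).softmax(-1).unsqueeze(-1)
        return (w * sampled).sum(2)               # (B, T, D)

attn = TemporalDeformableAttention(d=16)
print(attn(torch.randn(2, 50, 16)).shape)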
Neural Approximate Mirror Maps for Constrained Diffusion Models
Diffusion models excel at creating visually-convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a diffusion model not only improves its distribution-matching accuracy but also makes it more reliable for generating valid synthetic data and solving constrained inverse problems. However, existing methods for constrained diffusion models are inflexible with different types of constraints. Recent work proposed to learn mirror diffusion models (MDMs) in an unconstrained space defined by a mirror map and to impose the constraint with an inverse mirror map, but analytical mirror maps are challenging to derive for complex constraints. We propose neural approximate mirror maps (NAMMs) for general constraints. Our approach only requires a differentiable distance function from the constraint set. We learn an approximate mirror map that pushes data into an unconstrained space and a corresponding approximate inverse that maps data back to the constraint set. A generative model, such as an MDM, can then be trained in the learned mirror space and its samples restored to the constraint set by the inverse map. We validate our approach on a variety of constraints, showing that compared to an unconstrained diffusion model, a NAMM-based MDM substantially improves constraint satisfaction. We also demonstrate how existing diffusion-based inverse-problem solvers can be easily applied in the learned mirror space to solve constrained inverse problems.
Updated: 2024-06-18 17:36:09
标题: 神经近似镜像地图用于受限扩散模型
摘要: 扩散模型在创建视觉上令人信服的图像方面表现出色,但它们经常很难满足训练数据中固有的微妙约束。这些约束可能是基于物理的(例如,满足PDE),几何的(例如,尊重对称性),或语义的(例如,包括特定数量的对象)。当训练数据都满足某个特定约束时,在扩散模型上强制执行这个约束不仅提高了其分布匹配精度,还使其更可靠地生成有效的合成数据和解决受约束的反问题。然而,现有的受约束扩散模型方法对不同类型的约束缺乏灵活性。最近的工作提出在由镜像映射定义的无约束空间中学习镜像扩散模型(MDMs),并使用逆镜像映射来施加约束,但对于复杂约束来说,解析镜像映射是具有挑战性的。我们提出了用于一般约束的神经近似镜像映射(NAMMs)。我们的方法仅需要从约束集合中获得可微分的距离函数。我们学习一个近似镜像映射,将数据推送到无约束空间,并学习一个相应的近似逆映射,将数据映射回约束集合。然后,可以在学习的镜像空间中训练生成模型,如MDM,并通过逆映射将其样本恢复到约束集合。我们验证了我们的方法在各种约束条件下的效果,结果显示,与无约束扩散模型相比,基于NAMM的MDM大大提高了约束满足度。我们还展示了如何在学习的镜像空间中轻松应用现有基于扩散的反问题求解器来解决受约束的反问题。
更新时间: 2024-06-18 17:36:09
领域: cs.LG,cs.CV,eess.IV
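A schematic sketch of learning an approximate mirror map: a forward network f maps constrained data into an unconstrained space, an inverse network g maps back, and training combines cycle consistency with a penalty that g's outputs lie close to the constraint set. The networks, constraint (points on the unit sphere), and loss weights are toy assumptions:

import torch
import torch.nn as nn

def constraint_distance(x):                # toy constraint set: the unit sphere
    return (x.norm(dim=-1) - 1.0).abs().mean()

f = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 3))   # approximate mirror map
g = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 3))   # approximate inverse
opt = torch.optim.Adam([*f.parameters(), *g.parameters()], lr=1e-3)

for step in range(2000):
    x = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)  # constrained data
    cycle = (g(f(x)) - x).pow(2).mean()                # g(f(x)) should recover x
    feas = constraint_distance(g(torch.randn(256, 3))) # g should land on the constraint set
    loss = cycle + feas
    opt.zero_grad(); loss.backward(); opt.step()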
Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation
Machine learning (ML) and Artificial Intelligence (AI) have fueled remarkable advancements, particularly in healthcare. Within medical imaging, ML models hold the promise of improving disease diagnoses, treatment planning, and post-treatment monitoring. Various computer vision tasks like image classification, object detection, and image segmentation are poised to become routine in clinical analysis. However, privacy concerns surrounding patient data hinder the assembly of large training datasets needed for developing and training accurate, robust, and generalizable models. Federated Learning (FL) emerges as a compelling solution, enabling organizations to collaborate on ML model training by sharing model training information (gradients) rather than data (e.g., medical images). FL's distributed learning framework facilitates inter-institutional collaboration while preserving patient privacy. However, FL, while robust in privacy preservation, faces several challenges. Sensitive information can still be gleaned from shared gradients that are passed on between organizations during model training. Additionally, in medical imaging, quantifying model confidence/uncertainty accurately is crucial due to the noise and artifacts present in the data. Uncertainty estimation in FL encounters unique hurdles due to data heterogeneity across organizations. This paper offers a comprehensive review of FL, privacy preservation, and uncertainty estimation, with a focus on medical imaging. Alongside a survey of current research, we identify gaps in the field and suggest future directions for FL research to enhance privacy and address noisy medical imaging data challenges.
Updated: 2024-06-18 17:35:52
标题: 隐私保护的医学影像联邦学习与不确定性估计
摘要: 机器学习(ML)和人工智能(AI)在医疗领域取得了显著的进展。在医学影像学中,ML模型有望改善疾病诊断、治疗计划和治疗后监测。各种计算机视觉任务,如图像分类、目标检测和图像分割,正逐渐成为临床分析的常规。然而,围绕患者数据的隐私问题阻碍了开发和训练准确、稳健且可泛化模型所需的大型训练数据集的组装。联邦学习(FL)被视为一种引人注目的解决方案,通过共享模型训练信息(梯度)而非数据(如医学影像),使组织能够共同进行ML模型训练。FL的分布式学习框架促进了跨机构合作,同时保护了患者隐私。然而,尽管在隐私保护方面强大,FL面临着几个挑战。在模型训练过程中,组织之间传递的共享梯度仍然可能泄露敏感信息。此外,在医学影像领域,由于数据中存在噪音和伪影,准确量化模型的置信度/不确定性至关重要。由于组织间数据的异质性,FL中的不确定性估计遇到了独特的障碍。本文全面审查了FL、隐私保护和不确定性估计,重点关注医学影像。除了对当前研究的调查,我们还确定了该领域的空白,并提出了未来FL研究的方向,以增强隐私保护并解决医学影像数据中的噪声挑战。
更新时间: 2024-06-18 17:35:52
领域: cs.LG,cs.AI,cs.DC,eess.IV,stat.ML
Adversarial Attacks on Multimodal Agents
Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments. In this paper, we show that multimodal agents raise new safety risks, even though attacking agents is more challenging than prior attacks due to limited access to and knowledge about the environment. Our attacks use adversarial text strings to guide gradient-based perturbation over one trigger image in the environment: (1) our captioner attack attacks white-box captioners if they are used to process images into captions as additional inputs to the VLM; (2) our CLIP attack attacks a set of CLIP models jointly, which can transfer to proprietary VLMs. To evaluate the attacks, we curated VisualWebArena-Adv, a set of adversarial tasks based on VisualWebArena, an environment for web-based multimodal agent tasks. Within an L-infinity norm of $16/256$ on a single image, the captioner attack can make a captioner-augmented GPT-4V agent execute the adversarial goals with a 75% success rate. When we remove the captioner or use GPT-4V to generate its own captions, the CLIP attack can achieve success rates of 21% and 43%, respectively. Experiments on agents based on other VLMs, such as Gemini-1.5, Claude-3, and GPT-4o, show interesting differences in their robustness. Further analysis reveals several key factors contributing to the attack's success, and we also discuss the implications for defenses as well. Project page: https://chenwu.io/attack-agent Code and data: https://github.com/ChenWu98/agent-attack
Updated: 2024-06-18 17:32:48
标题: 多模态代理的对抗性攻击
摘要: 视觉启用的语言模型(VLMs)现在被用于构建能够在真实环境中采取行动的自主多模态代理。在本文中,我们展示了多模态代理引发了新的安全风险,尽管攻击代理比以往更具挑战性,因为对环境的访问和了解有限。我们的攻击使用对抗性文本字符串来引导渐变扰动,覆盖环境中的一个触发图像:(1)我们的标题攻击针对白盒标题生成器进行攻击,如果它们用于将图像处理成标题作为VLM的附加输入;(2)我们的CLIP攻击同时攻击一组CLIP模型,这可能会转移到专有的VLM。为了评估这些攻击,我们策划了基于VisualWebArena的对抗性任务集VisualWebArena-Adv,这是一个用于基于网络的多模态代理任务的环境。在单个图像上的L-无穷范数为$16/256$的情况下,标题攻击可以使标题增强的GPT-4V代理以75%的成功率执行对抗性目标。当我们移除标题生成器或使用GPT-4V生成自己的标题时,CLIP攻击的成功率分别为21%和43%。对基于其他VLMs的代理进行的实验,如Gemini-1.5、Claude-3和GPT-4o,展示了它们在鲁棒性方面的有趣差异。进一步的分析揭示了几个影响攻击成功的关键因素,我们还讨论了对防御的影响。项目页面:https://chenwu.io/attack-agent 代码和数据:https://github.com/ChenWu98/agent-attack
更新时间: 2024-06-18 17:32:48
领域: cs.LG,cs.CL,cs.CR,cs.CV
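A hedged sketch of the CLIP attack's core step: L-infinity PGD on a trigger image to pull its CLIP embedding toward an adversarial text embedding, using the paper's 16/256 budget. The model id, target text, step size, and iteration count are illustrative, and the raw tensor stands in for a properly preprocessed trigger image:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

text = proc(text=["click the buy button"], return_tensors="pt")
with torch.no_grad():
    t_emb = torch.nn.functional.normalize(model.get_text_features(**text), dim=-1)

img = torch.rand(1, 3, 224, 224)           # stand-in for the processed trigger image
delta = torch.zeros_like(img, requires_grad=True)
eps, alpha = 16 / 256, 2 / 256
for _ in range(100):
    i_emb = torch.nn.functional.normalize(
        model.get_image_features(pixel_values=img + delta), dim=-1)
    loss = -(i_emb * t_emb).sum()          # maximize cosine similarity to the target text
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign() # signed gradient step
        delta.clamp_(-eps, eps)            # project back into the L-inf ball
        delta.grad = None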
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks. First, we demonstrate how to successfully leverage access to logprobs for jailbreaking: we initially design an adversarial prompt template (sometimes adapted to the target LLM), and then we apply random search on a suffix to maximize a target logprob (e.g., of the token "Sure"), potentially with multiple restarts. In this way, we achieve nearly 100% attack success rate -- according to GPT-4 as a judge -- on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4, and R2D2 from HarmBench that was adversarially trained against the GCG attack. We also show how to jailbreak all Claude models -- that do not expose logprobs -- via either a transfer or prefilling attack with a 100% success rate. In addition, we show how to use random search on a restricted set of tokens for finding trojan strings in poisoned models -- a task that shares many similarities with jailbreaking -- which is the algorithm that brought us the first place in the SaTML'24 Trojan Detection Competition. The common theme behind these attacks is that adaptivity is crucial: different models are vulnerable to different prompting templates (e.g., R2D2 is very sensitive to in-context learning prompts), some models have unique vulnerabilities based on their APIs (e.g., prefilling for Claude), and in some settings, it is crucial to restrict the token search space based on prior knowledge (e.g., for trojan detection). For reproducibility purposes, we provide the code, logs, and jailbreak artifacts in the JailbreakBench format at https://github.com/tml-epfl/llm-adaptive-attacks.
Updated: 2024-06-18 17:29:04
标题: 用简单的自适应攻击越狱领先的安全对齐LLM
摘要: 我们展示了即使是最新的安全对齐LLM,对简单的自适应越狱攻击也并不鲁棒。首先,我们展示了如何成功利用对logprobs的访问进行越狱:我们首先设计一个对抗性提示模板(有时会针对目标LLM进行调整),然后我们在后缀上应用随机搜索来最大化目标logprob(例如,令牌"Sure"),可能会进行多次重启。通过这种方式,我们在Vicuna-13B、Mistral-7B、Phi-3-Mini、Nemotron-4-340B、Llama-2-Chat-7B/13B/70B、Llama-3-Instruct-8B、Gemma-7B、GPT-3.5、GPT-4和对抗性训练的HarmBench中的R2D2上实现了近乎100%的攻击成功率--根据GPT-4的评判。我们还展示了如何通过转移攻击或预填充攻击以100%的成功率越狱所有不暴露logprobs的Claude模型。此外,我们展示了如何在受污染的模型中的一组受限制的令牌上使用随机搜索来找到特洛伊字符串--这个任务与越狱有很多相似之处--这是我们在SaTML'24特洛伊检测竞赛中夺冠的算法。这些攻击背后的共同主题是适应性至关重要:不同的模型对不同的提示模板(例如,R2D2对于上下文学习提示非常敏感)易受攻击,一些模型基于其API具有独特的漏洞(例如,Claude的预填充),在某些情况下,根据先前的知识限制令牌搜索空间是至关重要的(例如,用于特洛伊检测)。为了可重现性目的,我们以JailbreakBench格式在https://github.com/tml-epfl/llm-adaptive-attacks 提供了代码、日志和越狱工件。
更新时间: 2024-06-18 17:29:04
领域: cs.CR,cs.AI,cs.LG,stat.ML
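A schematic sketch of the random-search component only: mutate one token of a suffix at a time and keep mutations that increase the logprob of a target first token (e.g. "Sure") after the prompt. The model id is a placeholder, the prompt here is benign, and no adversarial template or restarts are included:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                        # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
target_id = tok.encode(" Sure")[0]

def target_logprob(prompt_ids, suffix_ids):
    ids = torch.cat([prompt_ids, suffix_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, -1)[target_id].item()

prompt_ids = torch.tensor(tok.encode("Please answer the next question."))
suffix = torch.randint(0, model.config.vocab_size, (10,))
best = target_logprob(prompt_ids, suffix)
for _ in range(500):
    cand = suffix.clone()
    cand[torch.randint(0, len(cand), (1,))] = torch.randint(0, model.config.vocab_size, (1,))
    score = target_logprob(prompt_ids, cand)
    if score > best:                                 # greedy accept on improvement
        best, suffix = score, cand
print(best, tok.decode(suffix))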
Graph Neural Networks in Histopathology: Emerging Trends and Future Directions
Histopathological analysis of Whole Slide Images (WSIs) has seen a surge in the utilization of deep learning methods, particularly Convolutional Neural Networks (CNNs). However, CNNs often fall short in capturing the intricate spatial dependencies inherent in WSIs. Graph Neural Networks (GNNs) present a promising alternative, adept at directly modeling pairwise interactions and effectively discerning the topological tissue and cellular structures within WSIs. Recognizing the pressing need for deep learning techniques that harness the topological structure of WSIs, the application of GNNs in histopathology has experienced rapid growth. In this comprehensive review, we survey GNNs in histopathology, discuss their applications, and exploring emerging trends that pave the way for future advancements in the field. We begin by elucidating the fundamentals of GNNs and their potential applications in histopathology. Leveraging quantitative literature analysis, we identify four emerging trends: Hierarchical GNNs, Adaptive Graph Structure Learning, Multimodal GNNs, and Higher-order GNNs. Through an in-depth exploration of these trends, we offer insights into the evolving landscape of GNNs in histopathological analysis. Based on our findings, we propose future directions to propel the field forward. Our analysis serves to guide researchers and practitioners towards innovative approaches and methodologies, fostering advancements in histopathological analysis through the lens of graph neural networks.
Updated: 2024-06-18 17:23:50
Categories: cs.CV,cs.AI,cs.LG,q-bio.TO,I.2.10; I.4.10; J.3
Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs
Personalized medicine based on medical images, including predicting future individualized clinical disease progression and treatment response, would have an enormous impact on healthcare and drug development, particularly for diseases (e.g. multiple sclerosis (MS)) with long-term, complex, heterogeneous evolutions and no cure. In this work, we present the first stochastic causal temporal framework to model the continuous temporal evolution of disease progression via Neural Stochastic Differential Equations (NSDE). The proposed causal inference model takes as input the patient's high dimensional images (MRI) and tabular data, and predicts both factual and counterfactual progression trajectories on different treatments in latent space. The NSDE permits the estimation of high-confidence personalized trajectories and treatment effects. Extensive experiments were performed on a large, multi-centre, proprietary dataset of patient 3D MRI and clinical data acquired during several randomized clinical trials for MS treatments. Our results present the first successful uncertainty-based causal Deep Learning (DL) model to: (a) accurately predict future patient MS disability evolution (e.g. EDSS) and treatment effects leveraging baseline MRI, and (b) permit the discovery of subgroups of patients for which the model has high confidence in their response to treatment even in clinical trials which did not reach their clinical endpoints.
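As a minimal illustration of the NSDE ingredient, the sketch below samples latent trajectories with Euler-Maruyama; the tiny PyTorch architecture and all dimensions are assumptions for the example, not the paper's model.

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    def __init__(self, dim: int = 4, hidden: int = 32):
        super().__init__()
        self.drift = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.diffusion = nn.Sequential(  # softplus keeps the diffusion positive
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim), nn.Softplus())

    def sample(self, z0: torch.Tensor, steps: int = 50, dt: float = 0.1) -> torch.Tensor:
        z, path = z0, [z0]
        for _ in range(steps):
            dw = torch.randn_like(z) * dt ** 0.5      # Brownian increment
            z = z + self.drift(z) * dt + self.diffusion(z) * dw
            path.append(z)
        return torch.stack(path)                      # (steps+1, batch, dim)

sde = NeuralSDE()
trajectories = sde.sample(torch.randn(8, 4))          # 8 latent trajectories
print(trajectories.shape)                             # torch.Size([51, 8, 4])
```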
Updated: 2024-06-18 17:22:55
Categories: cs.AI,cs.LG,eess.IV,q-bio.QM
Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents
Configuration settings are essential for tailoring software behavior to meet specific performance requirements. However, incorrect configurations are widespread, and identifying those that impact system performance is challenging due to the vast number and complexity of possible settings. In this work, we present PerfSense, a lightweight framework that leverages Large Language Models (LLMs) to efficiently identify performance-sensitive configurations with minimal overhead. PerfSense employs LLM agents to simulate interactions between developers and performance engineers using advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). Our evaluation of seven open-source Java systems demonstrates that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both our LLM baseline (50.36%) and the previous state-of-the-art method (61.75%). Notably, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision levels. Additionally, a manual analysis of 362 misclassifications reveals common issues, including LLMs' misunderstandings of requirements (26.8%). In summary, PerfSense significantly reduces manual effort in classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.
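For illustration only, a hedged sketch of the prompt-chaining pattern follows; `call_llm`, the prompts, and the two-step decomposition are invented stand-ins rather than PerfSense's actual prompts.

```python
def call_llm(prompt: str) -> str:
    # Stub; replace with a real LLM API call.
    return "stub response"

def classify_config(option_name: str, code_snippet: str) -> str:
    # Step 1: ask the model to summarize what the option controls.
    summary = call_llm(
        f"Summarize what the configuration option '{option_name}' controls, "
        f"given this code:\n{code_snippet}"
    )
    # Step 2: chain that summary into a second, narrower classification prompt.
    return call_llm(
        "Based on this summary, answer only 'performance-sensitive' or "
        f"'not performance-sensitive':\n{summary}"
    )

print(classify_config("cache.size", 'int size = config.getInt("cache.size");'))
```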
Updated: 2024-06-18 17:22:48
Categories: cs.SE,cs.AI
Scalable Rule Lists Learning with Sampling
Learning interpretable models has become a major focus of machine learning research, given the increasing prominence of machine learning in socially important decision-making. Among interpretable models, rule lists are some of the best known and most easily interpretable. However, finding optimal rule lists is computationally challenging, and current approaches are impractical for large datasets. We present a novel and scalable approach to learn nearly optimal rule lists from large datasets. Our algorithm uses sampling to efficiently obtain an approximation of the optimal rule list with rigorous guarantees on the quality of the approximation. In particular, our algorithm guarantees to find a rule list with accuracy very close to the optimal rule list when a rule list with high accuracy exists. Our algorithm builds on the VC-dimension of rule lists, for which we prove novel upper and lower bounds. Our experimental evaluation on large datasets shows that our algorithm identifies nearly optimal rule lists with a speed-up up to two orders of magnitude over state-of-the-art exact approaches. Moreover, our algorithm is as fast as, and sometimes faster than, recent heuristic approaches, while reporting higher quality rule lists. In addition, the rules reported by our algorithm are more similar to the rules in the optimal rule list than the rules from heuristic approaches.
Updated: 2024-06-18 17:15:00
Categories: cs.LG
Gap-Free Clustering: Sensitivity and Robustness of SDP
We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous convex relaxation approaches achieving exact recovery do not allow any small clusters of size $o(\sqrt{n})$, or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes. Mid-sized clusters pose unique challenges to the analysis, since their proximity to the recovery threshold makes them highly sensitive to small noise perturbations and precludes a closed-form candidate solution. We develop novel techniques, including a leave-one-out-style argument which controls the correlation between SDP solutions and noise vectors even when the removal of one row of noise can drastically change the SDP solution. We also develop improved eigenvalue perturbation bounds of potential independent interest. Our results are robust to certain semirandom settings that are challenging for alternative algorithms. Using our gap-free clustering procedure, we obtain efficient algorithms for the problem of clustering with a faulty oracle with superior query complexities, notably achieving $o(n^2)$ sample complexity even in the presence of a large number of small clusters. Our gap-free clustering procedure also leads to improved algorithms for recursive clustering.
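For intuition, a standard SDP relaxation for SBM clustering (maximize $\langle A, X\rangle$ over unit-diagonal PSD matrices, then round via the leading eigenvector) can be written in a few lines with cvxpy; this generic formulation is for context and is not the paper's exact program or analysis.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
labels = rng.integers(0, k, n)
P = np.where(labels[:, None] == labels[None, :], 0.8, 0.2)  # SBM edge probs
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                              # symmetric adjacency

X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(A, X))),   # <A, X>
                  [cp.diag(X) == 1])
prob.solve()

# Rounding: cluster by the sign pattern of the leading eigenvector of X.
w, v = np.linalg.eigh(X.value)
pred = (v[:, -1] > 0).astype(int)
print("agreement:", max(np.mean(pred == labels), np.mean(pred != labels)))
```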
Updated: 2024-06-18 17:13:53
Categories: cs.LG,cs.DS,cs.IT,math.IT,math.OC,stat.ML
Supporting Human Raters with the Detection of Harmful Content using Large Language Models
In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human rating, or surfacing critical context to support human rating. We outline how to support all of these design patterns using a single, optimized prompt. Beyond these synthetic experiments, we share how piloting our proposed techniques in a real-world review queue yielded a 41.5% improvement in optimizing available human rater capacity, and a 9--11% increase (absolute) in precision and recall for detecting violative content.
Updated: 2024-06-18 17:12:50
Categories: cs.CR
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm
We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD converges asymptotically to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.
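A toy sketch of the recursive averaging follows, using the update commonly stated for ROOT-SGD, $v_t = \nabla f(x_t;\xi_t) + (1 - 1/t)(v_{t-1} - \nabla f(x_{t-1};\xi_t))$, on a noisy strongly convex quadratic; treat it as a hedged illustration rather than the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.diag([1.0, 5.0])                  # Hessian of the quadratic objective
x_star = np.array([2.0, -1.0])

def stoch_grad(x, xi):
    return A @ (x - x_star) + xi         # the same noise xi is reused at both points

x_prev = np.zeros(2)
x = np.zeros(2)
v = None
eta = 0.05
for t in range(1, 2001):
    xi = rng.normal(0, 0.5, size=2)
    g_new = stoch_grad(x, xi)
    if v is None:
        v = g_new                        # t = 1: plain stochastic gradient
    else:                                # t > 1: recursive one-over-t averaging
        v = g_new + (1 - 1 / t) * (v - stoch_grad(x_prev, xi))
    x_prev, x = x, x - eta * v

print("distance to optimum:", np.linalg.norm(x - x_star))
```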
Updated: 2024-06-18 17:03:10
Categories: math.OC,cs.LG,stat.ML
The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough
The problem of pure exploration in Markov decision processes has been cast as maximizing the entropy over the state distribution induced by the agent's policy, an objective that has been extensively studied. However, little attention has been dedicated to state entropy maximization under partial observability, despite the latter being ubiquitous in applications, e.g., finance and robotics, in which the agent only receives noisy observations of the true state governing the system's dynamics. How can we address state entropy maximization in those domains? In this paper, we study the simple approach of maximizing the entropy over observations in place of true latent states. First, we provide lower and upper bounds to the approximation of the true state entropy that only depends on some properties of the observation function. Then, we show how knowledge of the latter can be exploited to compute a principled regularization of the observation entropy to improve performance. With this work, we provide both a flexible approach to bring advances in state entropy maximization to the POMDP setting and a theoretical characterization of its intrinsic limits.
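The objective is easy to state in code: a plug-in estimate of the entropy of the observation distribution a policy induces, sketched below for the discrete case as a toy illustration (not the paper's estimator or regularization).

```python
import numpy as np

def observation_entropy(observations: np.ndarray, n_obs: int) -> float:
    counts = np.bincount(observations, minlength=n_obs).astype(float)
    p = counts / counts.sum()            # empirical observation distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
obs = rng.integers(0, 6, size=1000)      # observations gathered under a policy
print(observation_entropy(obs, 6))       # policies can be compared by this score
```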
Updated: 2024-06-18 17:00:13
Categories: cs.LG
In-Context Learning of Energy Functions
In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_{\theta}^{ICL}( x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expressing the next-token distribution as a categorical distribution parameterized by the network's output logits. In this work, we present a more general form of in-context learning without such a limitation that we call \textit{in-context learning of energy functions}. The idea is to instead learn the unconstrained and arbitrary in-context energy function $E_{\theta}^{ICL}(x|\mathcal{D})$ corresponding to the in-context distribution $p_{\theta}^{ICL}(x|\mathcal{D})$. To do this, we use classic ideas from energy-based modeling. We provide preliminary evidence that our method empirically works on synthetic data. Interestingly, our work contributes (to the best of our knowledge) the first example of in-context learning where the input space and output space differ from one another, suggesting that in-context learning is a more-general capability than previously realized.
Updated: 2024-06-18 16:54:43
Categories: cs.LG
Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities
This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
Updated: 2024-06-18 16:47:40
Categories: cs.HC,cs.DB,cs.LG
Plasma Surrogate Modelling using Fourier Neural Operators
Predicting plasma evolution within a Tokamak reactor is crucial to realizing the goal of sustainable fusion. Capabilities in forecasting the spatio-temporal evolution of plasma rapidly and accurately allow us to quickly iterate over design and control strategies on current Tokamak devices and future reactors. Modelling plasma evolution using numerical solvers is often expensive, consuming many hours on supercomputers, and hence, we need alternative inexpensive surrogate models. We demonstrate accurate predictions of plasma evolution both in simulation and experimental domains using deep learning-based surrogate modelling tools, viz., Fourier Neural Operators (FNO). We show that FNO has a speedup of six orders of magnitude over traditional solvers in predicting the plasma dynamics simulated from magnetohydrodynamic models, while maintaining a high accuracy (MSE in the normalised domain $\approx$ $10^{-5}$). Our modified version of the FNO is capable of solving multi-variable Partial Differential Equations (PDE), and can capture the dependence among the different variables in a single model. FNOs can also predict plasma evolution on real-world experimental data observed by the cameras positioned within the MAST Tokamak, i.e., cameras looking across the central solenoid and the divertor in the Tokamak. We show that FNOs are able to accurately forecast the evolution of plasma and have the potential to be deployed for real-time monitoring. We also illustrate their capability in forecasting the plasma shape, the locations of interactions of the plasma with the central solenoid and the divertor for the full (available) duration of the plasma shot within MAST. The FNO offers a viable alternative for surrogate modelling as it is quick to train and infer, and requires fewer data points, while being able to do zero-shot super-resolution and getting high-fidelity solutions.
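For intuition about the architecture family, below is a minimal 1D Fourier layer in the spirit of an FNO, written in PyTorch: a truncated set of Fourier modes is mixed by learned complex weights, with a pointwise skip path. This is a generic toy, not the authors' multi-variable model.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                        # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)                 # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes] = torch.einsum( # mix the lowest modes only
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))

class TinyFNO(nn.Module):
    def __init__(self, channels: int = 16, modes: int = 8):
        super().__init__()
        self.lift = nn.Conv1d(1, channels, 1)    # lift to a wider channel space
        self.spec = SpectralConv1d(channels, modes)
        self.skip = nn.Conv1d(channels, channels, 1)
        self.proj = nn.Conv1d(channels, 1, 1)

    def forward(self, x):
        h = self.lift(x)
        h = torch.relu(self.spec(h) + self.skip(h))
        return self.proj(h)

field = torch.randn(4, 1, 64)                    # 4 fields on a 64-point grid
print(TinyFNO()(field).shape)                    # torch.Size([4, 1, 64])
```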
Updated: 2024-06-18 16:46:44
Categories: physics.plasm-ph,cs.LG
OPFData: Large-scale datasets for AC optimal power flow with topological perturbations
Solving the AC optimal power flow problem (AC-OPF) is critical to the efficient and safe planning and operation of power grids. Small efficiency improvements in this domain have the potential to lead to billions of dollars of cost savings, and significant reductions in emissions from fossil fuel generators. Recent work on data-driven solution methods for AC-OPF shows the potential for large speed improvements compared to traditional solvers; however, no large-scale open datasets for this problem exist. We present the largest readily-available collection of solved AC-OPF problems to date. This collection is orders of magnitude larger than existing readily-available datasets, allowing training of high-capacity data-driven models. Uniquely, it includes topological perturbations - a critical requirement for usage in realistic power grid operations. We hope this resource will spur the community to scale research to larger grid sizes with variable topology.
Updated: 2024-06-18 16:46:21
Categories: cs.LG
Towards Exact Gradient-based Training on Analog In-memory Computing
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training - stochastic gradient descent (SGD) algorithm converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergent issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound of the asymptotic error to show that there is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD and rigorously show its ability to exactly converge to a critical point and hence eliminates the asymptotic error. The simulations verify the correctness of the analyses.
Updated: 2024-06-18 16:43:59
Categories: cs.LG,cs.AR,math.OC
First-Order Methods for Linearly Constrained Bilevel Optimization
Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order linearly constrained optimization methods with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $\epsilon$-stationarity in $\widetilde{O}(\epsilon^{-2})$ gradient oracle calls, which is nearly-optimal. For linear inequality constraints, we attain $(\delta,\epsilon)$-Goldstein stationarity in $\widetilde{O}(d{\delta^{-1} \epsilon^{-3}})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, we obtain for the linear inequality setting dimension-free rates of $\widetilde{O}({\delta^{-1} \epsilon^{-4}})$ oracle complexity under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. We verify these guarantees with preliminary numerical experiments.
Updated: 2024-06-18 16:41:21
Categories: math.OC,cs.LG
Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference
Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, we show how these questions can have compact, closed form solutions in terms of learned representations. The key idea is to apply a variant of contrastive learning to time series data. Prior work already shows that the representations learned by contrastive learning encode a probability ratio. By extending prior work to show that the marginal distribution over representations is Gaussian, we can then prove that joint distribution of representations is also Gaussian. Taken together, these results show that representations learned via temporal contrastive learning follow a Gauss-Markov chain, a graphical model where inference (e.g., prediction, planning) over representations corresponds to inverting a low-dimensional matrix. In one special case, inferring intermediate representations will be equivalent to interpolating between the learned representations. We validate our theory using numerical simulations on tasks up to 46-dimensions.
Updated: 2024-06-18 16:40:32
Categories: cs.LG,stat.ML
Formatics & dairy industry coalition: AI trends and present challenges
Artificial Intelligence (AI) can potentially transform the industry, enhancing the production process and minimizing manual, repetitive tasks. Accordingly, the synergy between high-performance computing and powerful mathematical models enables the application of sophisticated data analysis procedures like Machine Learning. However, challenges exist regarding effective, efficient, and flexible processing to generate valuable knowledge. Consequently, this work comprehensively describes industrial challenges where AI can be exploited, focusing on the dairy industry. The conclusions presented can help researchers apply novel approaches to cattle monitoring and can help farmers by proposing advanced technological solutions to their needs.
Updated: 2024-06-18 16:39:21
Categories: cs.AI,cs.CL
Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video
We introduce latent intuitive physics, a transfer learning framework for physics simulation that can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes. Our key insight is to use latent features drawn from a learnable prior distribution conditioned on the underlying particle states to capture the invisible and complex physical properties. To achieve this, we train a parametrized prior learner given visual observations to approximate the visual posterior of inverse graphics, and both the particle states and the visual posterior are obtained from a learned neural renderer. The converged prior learner is embedded in our probabilistic physics engine, allowing us to perform novel simulations on unseen geometries, boundaries, and dynamics without knowledge of the true physical parameters. We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation. Our model demonstrates strong performance in all three tasks.
Updated: 2024-06-18 16:37:44
Categories: cs.AI,cs.CV
Quasi-Bayes meets Vines
Recently proposed quasi-Bayesian (QB) methods initiated a new era in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions, but extensions to multiple dimensions rely on a conditional decomposition resulting from predefined assumptions on the kernel of the Dirichlet Process Mixture Model, which is the implicit nonparametric model used. Here, we propose a different way to extend Quasi-Bayesian prediction to high dimensions through the use of Sklar's theorem by decomposing the predictive distribution into one-dimensional predictive marginals and a high-dimensional copula. Thus, we use the efficient recursive QB construction for the one-dimensional marginals and model the dependence using highly expressive vine copulas. Further, we tune hyperparameters using robust divergences (e.g., energy score) and show that our proposed Quasi-Bayesian Vine (QB-Vine) is a fully non-parametric density estimator with an analytical form and convergence rate independent of the dimension of data in some situations. Our experiments illustrate that the QB-Vine is appropriate for high dimensional distributions ($\sim$64), needs very few samples to train ($\sim$200) and outperforms state-of-the-art methods with analytical forms for density estimation and supervised tasks by a considerable margin.
Updated: 2024-06-18 16:31:02
Categories: stat.ML,cs.LG,62G07
Implicit Bias of Mirror Flow on Separable Data
We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised 'at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponentially tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the horizon function of the mirror potential and characterises its shape 'at infinity'. When the potential is separable, a simple formula allows this function to be computed. We analyse several examples of potentials and provide numerical experiments highlighting our results.
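For reference, mirror flow with potential $\phi$ on a loss $L$ takes the standard form below; the Euclidean potential recovers plain gradient flow.

```latex
% Mirror flow on a loss L with mirror potential \phi (standard definition):
\frac{\mathrm{d}}{\mathrm{d}t}\,\nabla\phi(w_t) \;=\; -\,\nabla L(w_t),
\qquad w_0 \ \text{given};
% the Euclidean potential \phi(w) = \tfrac{1}{2}\lVert w \rVert_2^2
% recovers plain gradient flow.
```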
Updated: 2024-06-18 16:30:51
Categories: stat.ML,cs.LG,math.OC
Unsupervised explainable activity prediction in competitive Nordic Walking from experimental data
Artificial Intelligence (AI) has found application in Human Activity Recognition (HAR) in competitive sports. To date, most Machine Learning (ML) approaches for HAR have relied on offline (batch) training, imposing higher computational and tagging burdens compared to online processing unsupervised approaches. Additionally, the decisions behind traditional ML predictors are opaque and require human interpretation. In this work, we apply an online processing unsupervised clustering approach based on low-cost wearable Inertial Measurement Units (IMUs). The outcomes generated by the system allow for the automatic expansion of limited tagging available (e.g., by referees) within those clusters, producing pertinent information for the explainable classification stage. Specifically, our work focuses on achieving automatic explainability for predictions related to athletes' activities, distinguishing between correct, incorrect, and cheating practices in Nordic Walking. The proposed solution achieved performance metrics of close to 100 % on average.
Updated: 2024-06-18 16:29:07
Categories: cs.LG,cs.AI,cs.HC
GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping
Machine Learning (ML) for Mineral Prospectivity Mapping (MPM) remains a challenging problem as it requires the analysis of associations between large-scale multi-modal geospatial data and few historical mineral commodity observations (positive labels). Recent MPM works have explored Deep Learning (DL) as a modeling tool with more representation capacity. However, these overparameterized methods may be more prone to overfitting due to their reliance on scarce labeled data. While a large quantity of unlabeled geospatial data exists, no prior MPM works have considered using such information in a self-supervised manner. Our MPM approach uses a masked image modeling framework to pretrain a backbone neural network in a self-supervised manner using unlabeled geospatial data alone. After pretraining, the backbone network provides feature extraction for downstream MPM tasks. We evaluated our approach alongside existing methods to assess mineral prospectivity of Mississippi Valley Type (MVT) and Clastic-Dominated (CD) Lead-Zinc deposits in North America and Australia. Our results demonstrate that self-supervision promotes robustness in learned features, improving prospectivity predictions. Additionally, we leverage explainable artificial intelligence techniques to demonstrate that individual predictions can be interpreted from a geological perspective.
Updated: 2024-06-18 16:24:28
Categories: cs.LG,cs.CV
DataComp-LM: In search of the next generation of training sets for language models
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.
Updated: 2024-06-18 16:22:45
Categories: cs.LG,cs.CL
Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba
Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.
Updated: 2024-06-18 16:22:05
Categories: cs.CL,cs.AI
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.
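As a quick illustration of the basic patterns the survey covers, the following toy prompt skeletons show role-prompting, few-shot prompting, and a chain-of-thought trigger; the tasks and wording are invented for the example.

```python
role_prompt = (
    "You are a senior database administrator. Answer in one sentence.\n"
    "Q: When should I add an index to a column?"
)
few_shot_prompt = (
    "Q: 12 + 7\nA: 19\n"
    "Q: 30 - 4\nA: 26\n"
    "Q: 15 + 9\nA:"                    # few-shot: the model completes the pattern
)
cot_prompt = (
    "Q: A train leaves at 3:00pm and arrives at 6:30pm. How long is the trip?\n"
    "A: Let's think step by step."     # chain-of-thought trigger phrase
)
for p in (role_prompt, few_shot_prompt, cot_prompt):
    print(p, end="\n---\n")
```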
Updated: 2024-06-18 16:21:12
Categories: cs.CL,cs.AI,I.2.7
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.
Updated: 2024-06-18 16:20:53
Categories: cs.CL,cs.AI
Extracting Training Data from Unconditional Diffusion Models
As diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI), the study of their memorization of the raw training data has attracted growing attention. Existing works in this direction aim to establish an understanding of whether or to what extent DPMs learn by memorization. Such an understanding is crucial for identifying potential risks of data leakage and copyright infringement in diffusion models and, more importantly, for more controllable generation and trustworthy application of Artificial Intelligence Generated Content (AIGC). While previous works have made important observations of when DPMs are prone to memorization, these findings are mostly empirical, and the developed data extraction methods only work for conditional diffusion models. In this work, we aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization. Based on the theoretical analysis, we further propose a novel data extraction method called \textbf{Surrogate condItional Data Extraction (SIDE)} that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models. Our empirical results demonstrate that SIDE can extract training data from diffusion models where previous methods fail, and it is on average over 50\% more effective across different scales of the CelebA dataset.
Updated: 2024-06-18 16:20:12
Categories: cs.CR,cs.CV,cs.LG
High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data
This paper introduces a novel formulation of the clustering problem, namely the Minimum Sum-of-Squares Clustering of Infinitely Tall Data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive-cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism, and outperforming even advanced algorithms designed for small and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on synthetic data further confirm HPClust's exceptional scalability and robustness to noise.
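For reference, the underlying minimum sum-of-squares clustering objective, stated for a finite sample, is:

```latex
% Minimum sum-of-squares clustering of points x_1,\dots,x_n with k centroids:
\min_{c_1,\dots,c_k}\; \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert_2^2 .
```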
Updated: 2024-06-18 16:19:56
Categories: cs.DC,cs.LG,math.OC
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics or yield overconfident and random outputs. We find that involving LLMs in role-playing scenario boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach for bias mitigation replacing human feedback in traditional RLHF. We utilize LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning. Our approach comprises two modes: (1) self-reflection, where the same LLM participates in multi-role debates, and (2) teacher-student, where a more advanced LLM like GPT-3.5-turbo guides the LLM to perform this task. Experimental results across different LLMs demonstrate the effectiveness of our approach in bias mitigation.
Updated: 2024-06-18 16:19:40
Categories: cs.AI
Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities
Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process. The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of some given networks without training the network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical understanding of deep learning and have shown great potential on several datasets and NAS benchmarks. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness. To this end, we first review the mainstream zero-shot proxies and discuss their theoretical underpinnings. We then compare these zero-shot proxies through large-scale experiments and demonstrate their effectiveness in both hardware-aware and hardware-oblivious NAS scenarios. Finally, we point out several promising ideas to design better proxies. Our source code and the list of related papers are available on https://github.com/SLDGroup/survey-zero-shot-nas.
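To make the idea concrete, here is a hedged sketch of one simple proxy in the gradient-norm family: each untrained candidate is scored by its gradient norm on a single random batch, with no training. This is a generic illustration of the zero-shot recipe, not a specific proxy from the survey.

```python
import torch
import torch.nn as nn

def grad_norm_proxy(net: nn.Module, in_dim: int = 32, n_classes: int = 10) -> float:
    x = torch.randn(16, in_dim)                       # one random batch
    y = torch.randint(0, n_classes, (16,))
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()                                   # no optimizer step: zero training
    total = 0.0
    for p in net.parameters():
        if p.grad is not None:
            total += p.grad.norm().item()
    return total

candidates = {
    "small": nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)),
    "wide": nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10)),
}
scores = {name: grad_norm_proxy(net) for name, net in candidates.items()}
print(scores)   # candidates are ranked by the proxy score alone
```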
Updated: 2024-06-18 16:09:26
Categories: cs.LG,cs.CV
TSI-Bench: Benchmarking Time Series Imputation
Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modeling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missingness ratios and patterns on model performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes. Our extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse downstream tasks and potential to unlock future directions in time series imputation research and analysis. The source code and experiment logs are available at https://github.com/WenjieDu/AwesomeImputation.
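The evaluation pattern such benchmarks build on is simple to sketch: artificially mask observed values, impute them, and score error only on the held-out mask. Below is a toy NumPy example with linear interpolation standing in for a deep imputer.

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8, 200)) + rng.normal(0, 0.1, 200)
mask = rng.random(200) < 0.2             # entries artificially removed
corrupted = series.copy()
corrupted[mask] = np.nan

# Toy imputer: linear interpolation over the missing positions.
idx = np.arange(200)
imputed = corrupted.copy()
imputed[mask] = np.interp(idx[mask], idx[~mask], corrupted[~mask])

mae = np.abs(imputed[mask] - series[mask]).mean()
print(f"masked MAE: {mae:.4f}")          # scored only where values were removed
```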
Updated: 2024-06-18 16:07:33
Categories: cs.LG,cs.AI
Ensuring Both Positivity and Stability Using Sector-Bounded Nonlinearity for Systems with Neural Network Controllers
This paper introduces a novel method for the stability analysis of positive feedback systems with a class of fully connected feedforward neural networks (FFNN) controllers. By establishing sector bounds for fully connected FFNNs without biases, we present a stability theorem that demonstrates the global exponential stability of linear systems under fully connected FFNN control. Utilizing principles from positive Lur'e systems and the positive Aizerman conjecture, our approach effectively addresses the challenge of ensuring stability in highly nonlinear systems. The crux of our method lies in maintaining sector bounds that preserve the positivity and Hurwitz property of the overall Lur'e system. We showcase the practical applicability of our methodology through its implementation in a linear system managed by a FFNN trained on output feedback controller data, highlighting its potential for enhancing stability in dynamic systems.
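For reference, the standard sector-bound condition that this style of analysis builds on is:

```latex
% A memoryless nonlinearity \varphi lies in the sector [\alpha, \beta] if
\alpha\,x^2 \;\le\; x\,\varphi(x) \;\le\; \beta\,x^2
\qquad \text{for all } x \in \mathbb{R};
% e.g. ReLU satisfies this with [\alpha, \beta] = [0, 1].
```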
Updated: 2024-06-18 16:05:57
Categories: eess.SY,cs.AI,cs.SY,math.OC,G.1.2; I.2.3; I.2.8
Variables are a Curse in Software Vulnerability Prediction
Deep learning-based approaches for software vulnerability prediction currently rely mainly on the original text of software code as the feature of nodes in the code graph, and thus can learn only a representation specific to the code text, rather than one that captures the 'intrinsic' functionality of a program hidden behind that text. One curse that causes this problem is the infinite number of possible names for a variable. To lift the curse, in this work we introduce a new type of edge called name dependence, a type of abstract syntax graph based on the name dependence, and an efficient node representation method named the 3-property encoding scheme. These techniques allow us to remove concrete variable names from code, and facilitate deep learning models to learn the functionality of software hidden in diverse code expressions. The experimental results show that deep learning models built on these techniques outperform those based on existing approaches not only in vulnerability prediction but also in memory requirements: our techniques reduce memory usage by a factor of up to 30,000 compared with existing approaches.
Updated: 2024-06-18 16:02:29
领域: cs.SE,cs.AI,cs.CR,cs.LG,I.2.0; D.2.m
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
The advancement of large language models (LLMs) has significantly broadened the scope of applications in natural language processing, with multi-modal LLMs extending these capabilities to integrate and interpret visual data. However, existing benchmarks for visual language models (VLMs) predominantly focus on single-image inputs, neglecting the crucial aspect of multi-image understanding. In this paper, we introduce a Multi-Image Relational Benchmark MIRB, designed to evaluate VLMs' ability to compare, analyze, and reason across multiple images. Our benchmark encompasses four categories: perception, visual world knowledge, reasoning, and multi-hop reasoning. Through a comprehensive evaluation of a wide range of open-source and closed-source models, we demonstrate that while open-source VLMs were shown to approach the performance of GPT-4V in single-image tasks, a significant performance gap remains in multi-image reasoning tasks. Our findings also reveal that even the state-of-the-art GPT-4V model struggles with our benchmark, underscoring the need for further research and development in this area. We believe our contribution of MIRB could serve as a testbed for developing the next-generation multi-modal models.
Updated: 2024-06-18 16:02:18
标题: 基准测试视觉与语言模型中的多图像理解:感知、知识、推理和多跳推理
摘要: 大型语言模型(LLMs)的发展显著拓宽了自然语言处理应用的范围,多模式LLMs将这些功能扩展到整合和解释视觉数据。然而,现有的视觉语言模型(VLMs)基准主要集中在单幅图像输入上,忽视了多幅图像理解的关键方面。本文介绍了一个名为MIRB的多图像关系基准,旨在评估VLMs在比较、分析和推理多个图像时的能力。我们的基准包括四个类别:感知、视觉世界知识、推理和多跳推理。通过对各种开源和闭源模型进行全面评估,我们证明了虽然开源VLMs在单图像任务中已接近GPT-4V的性能,但在多图像推理任务中仍存在显著的性能差距。我们的研究结果还表明,即使是最先进的GPT-4V模型在我们的基准上也存在困难,强调了在这个领域进一步研究和发展的必要性。我们相信我们的MIRB的贡献可以作为开发下一代多模式模型的试验平台。
更新时间: 2024-06-18 16:02:18
领域: cs.CV,cs.AI,cs.CL
To smooth a cloud or to pin it down: Guarantees and Insights on Score Matching in Denoising Diffusion Models
Denoising diffusion models are a class of generative models which have recently achieved state-of-the-art results across many domains. Gradual noise is added to the data using a diffusion process, which transforms the data distribution into a Gaussian. Samples from the generative model are then obtained by simulating an approximation of the time reversal of this diffusion initialized by Gaussian samples. Recent research has explored adapting diffusion models for sampling and inference tasks. In this paper, we leverage known connections to stochastic control akin to the Föllmer drift to extend established neural network approximation results for the Föllmer drift to denoising diffusion models and samplers.
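For readers less familiar with the setup, the standard score-based formulation (a recap of well-known results, not the paper's specific construction) pairs a forward noising SDE with its time reversal:

$$dX_t = f(X_t, t)\,dt + g(t)\,dW_t, \qquad dX_t = \big[f(X_t, t) - g(t)^2\,\nabla_x \log p_t(X_t)\big]\,dt + g(t)\,d\bar{W}_t,$$

where $p_t$ is the marginal density at time $t$ and the reverse SDE runs backward in time from Gaussian samples. Score matching trains a network $s_\theta(x, t) \approx \nabla_x \log p_t(x)$, and the quality of this approximation is what guarantees of the kind studied here must control.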
Updated: 2024-06-18 16:01:52
标题: 平滑点云还是将其钉住:去噪扩散模型中得分匹配的保证与洞见
摘要: 去噪扩散模型是一类生成模型,最近在许多领域取得了最先进的结果。通过扩散过程逐渐向数据添加噪声,将数据分布转换为高斯分布。然后,通过模拟该扩散的时间逆转的近似(以高斯样本为初始值)来获取生成模型的样本。最近的研究探讨了将扩散模型用于采样和推断任务。在本文中,我们利用与类似Föllmer漂移的随机控制的已知联系,将已有的针对Föllmer漂移的神经网络逼近结果扩展到去噪扩散模型和采样器。
更新时间: 2024-06-18 16:01:52
领域: stat.ML,cs.LG
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. Especially, Self-Control_{prefix} enables a plug-and-play control and jointly controls multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
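A minimal sketch of the suffix-gradient step as I read it (the judge below is a stand-in linear probe; in the actual method the score would be the model's own self-assessment of adherence to the suffix guideline):

    import torch

    def self_control_step(hidden, judge, alpha=0.1):
        """Nudge hidden states along the gradient of a self-judgment score.

        hidden: (batch, seq, dim) LM activations.
        judge:  callable mapping hidden states to a scalar adherence score.
        """
        hidden = hidden.detach().requires_grad_(True)
        score = judge(hidden)                       # scalar self-assessment
        (grad,) = torch.autograd.grad(score, hidden)
        return (hidden + alpha * grad).detach()     # states steered toward the behavior

    probe = torch.nn.Linear(64, 1)                  # hypothetical stand-in judge
    h = torch.randn(1, 8, 64)
    h_controlled = self_control_step(h, lambda x: probe(x).mean())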
Updated: 2024-06-18 15:58:38
标题: 通过将后缀梯度压缩为前缀控制器实现LLM行为的自我控制
摘要: 我们提出了自控(Self-Control)这一新颖方法,利用后缀梯度来控制大型语言模型(LLMs)的行为,而无需明确的人类注释。给定一个以后缀字符串表示的指导方针和模型对自身遵守度的自我评估,Self-Control计算这种自我判断对模型隐藏状态的梯度,直接影响自回归生成过程朝向期望的行为。为了提高效率,我们引入了Self-Control_{prefix},一个紧凑的模块,将从后缀梯度中学习到的表示封装成一个前缀控制器,便于在不同的LLM行为情境下进行控制。我们的实验证明了Self-Control在多个领域的有效性,包括情绪调节、确保无害性以及增强复杂推理。特别地,Self-Control_{prefix}实现了即插即用的控制,并联合控制多个属性,提高了模型输出的质量,而不需要改变模型参数或增加推理时间成本。
更新时间: 2024-06-18 15:58:38
领域: cs.CL,cs.AI
Large Language Model as a Universal Clinical Multi-task Decoder
The development of effective machine learning methodologies for enhancing the efficiency and accuracy of clinical systems is crucial. Despite significant research efforts, managing a plethora of diversified clinical tasks and adapting to emerging new tasks remain significant challenges. This paper presents a novel paradigm that employs a pre-trained large language model as a universal clinical multi-task decoder. This approach leverages the flexibility and diversity of language expressions to handle task topic variations and associated arguments. The introduction of a new task simply requires the addition of a new instruction template. We validate this framework across hundreds of tasks, demonstrating its robustness in facilitating multi-task predictions, performing on par with traditional multi-task learning and single-task learning approaches. Moreover, it shows exceptional adaptability to new tasks, with impressive zero-shot performance in some instances and superior data efficiency in few-shot scenarios. This novel approach offers a unified solution to manage a wide array of new and emerging tasks in clinical applications.
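Since the abstract notes that adding a new task only requires a new instruction template, here is a purely hypothetical example of what such a template might look like (the paper's actual template format is not specified in the abstract):

    # Hypothetical instruction template for a new clinical task.
    TEMPLATE = (
        "Task: {task_name}\n"
        "Patient record: {record}\n"
        "Question: {question}\n"
        "Answer:"
    )
    prompt = TEMPLATE.format(
        task_name="ICD coding",
        record="65-year-old male, chest pain, elevated troponin...",
        question="Which diagnosis codes apply?",
    )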
Updated: 2024-06-18 15:58:36
标题: 大型语言模型作为一个通用的临床多任务解码器
摘要: 为提高临床系统的效率和准确性,开发有效的机器学习方法至关重要。尽管已有大量研究工作,但管理各种不同的临床任务并适应新兴任务仍然是重大挑战。本文提出了一种新颖的范式,将一个预训练的大型语言模型作为通用临床多任务解码器。该方法利用语言表达的灵活性和多样性来处理任务主题的变化和相关参数。引入新任务仅需添加新的指令模板。我们在数百个任务上验证了这一框架,展示了其在促进多任务预测方面的强大鲁棒性,表现与传统的多任务学习和单任务学习方法相当。此外,它对新任务表现出卓越的适应性,在某些情况下具有令人印象深刻的零样本性能,并在少样本场景中表现出更高的数据效率。这种新颖方法为处理临床应用中各种新出现的任务提供了统一的解决方案。
更新时间: 2024-06-18 15:58:36
领域: cs.CL,cs.AI
Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning
The Privacy-sensitive Object Identification (POI) task allocates bounding boxes for privacy-sensitive objects in a scene. The key to POI is settling an object's privacy class (privacy-sensitive or non-privacy-sensitive). In contrast to conventional object classes which are determined by the visual appearance of an object, one object's privacy class is derived from the scene contexts and is subject to various implicit factors beyond its visual appearance. That is, visually similar objects may be totally opposite in their privacy classes. To explicitly derive the objects' privacy class from the scene contexts, in this paper, we interpret the POI task as a visual reasoning task aimed at the privacy of each object in the scene. Following this interpretation, we propose the PrivacyGuard framework for POI. PrivacyGuard contains three stages. i) Structuring: an unstructured image is first converted into a structured, heterogeneous scene graph that embeds rich scene contexts. ii) Data Augmentation: a contextual perturbation oversampling strategy is proposed to create slightly perturbed privacy-sensitive objects in a scene graph, thereby balancing the skewed distribution of privacy classes. iii) Hybrid Graph Generation & Reasoning: the balanced, heterogeneous scene graph is then transformed into a hybrid graph by endowing it with extra "node-node" and "edge-edge" homogeneous paths. These homogeneous paths allow direct message passing between nodes or edges, thereby accelerating reasoning and facilitating the capturing of subtle context changes. Based on this hybrid graph... (For the full abstract, see the original paper.)
Updated: 2024-06-18 15:58:22
标题: 超越视觉外观:基于混合图推理的隐私敏感对象识别
摘要: 隐私敏感对象识别(POI)任务为场景中的隐私敏感对象分配边界框。POI的关键在于确定一个对象的隐私类别(隐私敏感或非隐私敏感)。与传统的由对象的视觉外观确定的对象类别不同,一个对象的隐私类别是从场景上下文中派生的,并受到其视觉外观之外的各种隐含因素的影响。也就是说,视觉上相似的对象在其隐私类别上可能完全相反。为了明确地从场景上下文中派生对象的隐私类别,在本文中,我们将POI任务解释为一个旨在保护场景中每个对象隐私的视觉推理任务。根据这一解释,我们提出了用于POI的PrivacyGuard框架。PrivacyGuard包括三个阶段。i)结构化:将非结构化图像首先转换为包含丰富场景上下文的结构化异构场景图。ii)数据增强:提出了一种上下文扰动过采样策略,用于在场景图中创建略微扰动的隐私敏感对象,从而平衡隐私类别的倾斜分布。iii)混合图生成和推理:然后通过为其增加额外的“节点-节点”和“边-边”同质路径,将平衡的异构场景图转换为混合图。这些同质路径允许节点或边之间直接传递消息,从而加速推理并促进捕捉微妙的上下文变化。基于这个混合图...(有关完整摘要,请参阅原始论文。)
更新时间: 2024-06-18 15:58:22
领域: cs.CV,cs.AI
Automatic generation of insights from workers' actions in industrial workflows with explainable Machine Learning
New technologies such as Machine Learning (ML) offer great potential for evaluating industry workflows and automatically generating key performance indicators (KPIs). However, despite established standards for measuring the efficiency of industrial machinery, there is no precise equivalent for workers' productivity, which would be highly desirable given the lack of a skilled workforce for the next generation of industry workflows. Therefore, an ML solution combining data from manufacturing processes and workers' performance for that goal is required. Additionally, in recent times intense effort has been devoted to explainable ML approaches that can automatically explain their decisions to a human operator, thus increasing their trustworthiness. We propose to apply explainable ML solutions to differentiate between expert and inexpert workers in industrial workflows, which we validate at a quality assessment industrial workstation. Regarding the methodology used, input data are captured by a manufacturing machine and stored in a NoSQL database. Data are processed to engineer features used in automatic classification and to compute workers' KPIs to predict their level of expertise (with all classification metrics exceeding 90%). These KPIs, and the features relevant to the decisions, are textually explained by natural language expansion on an explainability dashboard. These automatic explanations made it possible to infer knowledge from expert workers for inexpert workers. This illustrates the value of research on self-explainable ML for automatically generating insights to improve productivity in industrial workflows.
Updated: 2024-06-18 15:55:11
标题: 利用可解释机器学习从工业工作流程中工人的行为自动生成见解
摘要: 新技术,如机器学习(ML),为评估行业工作流程并自动生成关键绩效指标(KPI)提供了巨大潜力。然而,尽管已经建立了衡量工业机械效率的标准,但对于工人的生产力,尚无精确的等效指标,鉴于下一代工业工作流程缺乏熟练的劳动力,这将是非常理想的。因此,需要一种结合制造过程数据和工人表现数据的ML解决方案来实现这一目标。此外,近年来,人们致力于解释性ML方法,可以自动向人类操作员解释其决策,从而增加其可信度。我们建议将可解释性ML解决方案应用于区分工业工作流程中的专家和非专家工人,在质量评估工作站进行验证。关于所使用的方法论,输入数据由制造机器捕获并存储在NoSQL数据库中。数据经过处理以构建用于自动分类的特征,并计算工人的KPI,以预测他们的专业水平(所有分类指标均超过90%)。这些KPI和相关特征的决策通过可解释性仪表板上的自然语言扩展进行文本解释。这些自动解释使得能够从专家工人那里推断出对非专家工人有用的知识。后者说明了自解释ML研究对于自动生成洞见以提高工业工作流程生产率的兴趣。
更新时间: 2024-06-18 15:55:11
领域: cs.AI,cs.LG
Predicting the energetic proton flux with a machine learning regression algorithm
The need for real-time monitoring and alerting systems for Space Weather hazards has grown significantly in the last two decades. One of the most important challenges for space mission operations and planning is the prediction of solar proton events (SPEs). In this context, artificial intelligence and machine learning techniques have opened a new frontier, providing a new paradigm for statistical forecasting algorithms. The great majority of these models aim to predict the occurrence of an SPE, i.e., they are based on the classification approach. In this work we present a simple and efficient machine learning regression algorithm which is able to forecast the energetic proton flux up to 1 hour ahead by exploiting features derived from the electron flux only. This approach could be helpful to improve monitoring systems of the radiation risk in both deep space and near-Earth environments. The model is very relevant for mission operations and planning, especially when flare characteristics and source location are not available in real time, as at Mars distance.
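A toy sketch of the regression setup (the synthetic data, cadence, and choice of regressor are mine; the paper's features and model may differ): lagged electron-flux values as features, proton flux one hour ahead as the target.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Synthetic stand-in: fluxes sampled every 5 min; forecast 12 steps (1 h) ahead.
    rng = np.random.default_rng(0)
    n = 5000
    electron = rng.lognormal(sigma=0.5, size=n)
    proton = 0.3 * np.roll(electron, -12) + rng.normal(scale=0.02, size=n)

    LAGS, H = 24, 12  # 2 h of electron-flux history, 1 h forecast horizon
    X = np.stack([electron[i - LAGS:i] for i in range(LAGS, n - H)])
    y = proton[LAGS + H:]              # proton flux one hour after each window
    assert len(X) == len(y)

    split = int(0.8 * len(X))          # chronological split, no shuffling
    model = GradientBoostingRegressor().fit(X[:split], y[:split])
    print("test R^2:", model.score(X[split:], y[split:]))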
Updated: 2024-06-18 15:54:50
标题: 使用机器学习回归算法预测高能质子通量
摘要: 在过去的二十年里,对太空天气危险进行实时监测和警报系统的需求显著增长。太空任务运营和规划面临的最重要挑战之一是太阳质子事件(SPEs)的预测。在这种背景下,人工智能和机器学习技术开辟了一个新的领域,为统计预测算法提供了一个新的范式。这些模型中绝大多数旨在预测SPE的发生,即它们基于分类方法。在这项工作中,我们提出了一个简单而高效的机器学习回归算法,能够通过仅利用来自电子通量的特征,提前1小时预测高能质子通量。这种方法对改进深空和近地球环境辐射风险监测系统可能有帮助。该模型对任务运营和规划非常重要,特别是当耀斑特征和源位置无法实时获得时,例如在火星距离处。
更新时间: 2024-06-18 15:54:50
领域: astro-ph.SR,astro-ph.IM,cs.LG,physics.space-ph
Leveraging Generative Models for Covert Messaging: Challenges and Tradeoffs for "Dead-Drop" Deployments
State-of-the-art generative models of human-produced content, in particular generative models of natural language text, are the focus of many recent papers that explore their use for steganographic communication. Loosely, these works (invertibly) encode message-carrying bits into a sequence of samples from the model, ultimately yielding a plausible natural language covertext. By focusing on this narrow steganographic piece, prior work has largely ignored the significant algorithmic challenges, and performance-security tradeoffs, that arise when one actually tries to build a messaging pipeline around it. We make these challenges concrete, by considering the natural application of such a pipeline: namely, "dead-drop" covert messaging over large, public internet platforms (e.g. social media sites). We explicate the challenges and describe approaches to overcome them, surfacing in the process important performance and security tradeoffs that must be carefully tuned. We implement a system around this model-based format-transforming encryption pipeline, and give an empirical analysis of its performance and (heuristic) security.
Updated: 2024-06-18 15:52:51
标题: 利用生成模型进行隐秘通信:针对“死信箱”部署的挑战和权衡
摘要: 人类创作内容的最先进生成模型是近期许多论文的焦点,这些论文探讨了其在隐写通信中的应用,特别是自然语言文本的生成模型。这些工作将携带信息的比特(可逆地)编码为从模型中取样的序列,最终得到一个可信的自然语言覆盖文本。先前的研究主要集中在这个狭窄的隐写片段上,忽略了实际尝试围绕它构建消息传输管道时出现的重要算法挑战和性能安全权衡。我们通过考虑这样一个管道的自然应用,即在大型公共互联网平台(例如社交媒体网站)上进行“死信箱”秘密通信,使这些挑战具体化。我们阐明了挑战,并描述了克服这些挑战的方法,同时揭示了必须谨慎调整的重要性能和安全权衡。我们围绕这种基于模型的格式转换加密管道实现了一个系统,并对其性能和(启发式)安全性进行了实证分析。
更新时间: 2024-06-18 15:52:51
领域: cs.CR,cs.LG
Skin Cancer Images Classification using Transfer Learning Techniques
Skin cancer is one of the most common and deadliest types of cancer. Early diagnosis of skin cancer at a benign stage is critical to reducing cancer mortality. Detecting skin cancer at an earlier stage calls for an automated system that can save the lives of many patients. Many previous studies have addressed the problem of skin cancer diagnosis using various deep learning and transfer learning models. However, existing approaches are limited in accuracy and involve time-consuming procedures. In this work, we applied five different pre-trained transfer learning approaches for binary classification of skin cancer at benign and malignant stages. To increase the accuracy of these models we fine-tuned different layers and activation functions. We used the publicly available ISIC dataset to evaluate the transfer learning approaches. For model stability, data augmentation techniques were applied to improve the randomness of the input dataset. These approaches were evaluated using different hyperparameters such as batch sizes, epochs, and optimizers. The experimental results show that the ResNet-50 model provides an accuracy of 0.935, an F1-score of 0.86, and a precision of 0.94.
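A minimal transfer-learning sketch in the spirit of the paper (PyTorch/torchvision >= 0.13 and the frozen-backbone schedule and hyperparameters below are assumptions of mine; the paper compares five pretrained networks, of which ResNet-50 performed best):

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load an ImageNet-pretrained ResNet-50 and adapt it for benign-vs-malignant.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for p in model.parameters():               # freeze the backbone first
        p.requires_grad = False
    model.fc = nn.Sequential(                  # replace the head for 2 classes
        nn.Dropout(0.3),
        nn.Linear(model.fc.in_features, 2),
    )

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    # (training loop over augmented ISIC images omitted)

    # Once the new head converges, unfreeze deeper layers to fine-tune them too:
    for p in model.layer4.parameters():
        p.requires_grad = True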
Updated: 2024-06-18 15:48:20
标题: 使用迁移学习技术对皮肤癌图像进行分类
摘要: 皮肤癌是最常见和最致命的癌症之一。在良性阶段对皮肤癌进行早期诊断对减少癌症死亡率至关重要。为了在更早的阶段检测皮肤癌,一个能够拯救许多患者生命的自动化系统是必不可少的。许多先前的研究已经解决了使用各种深度学习和迁移学习模型进行皮肤癌诊断的问题。然而,现有文献在准确性和耗时程序方面存在局限性。在这项工作中,我们应用了五种不同的预训练迁移学习方法,用于对皮肤癌在良性和恶性阶段进行二元分类检测。为了提高这些模型的准确性,我们微调了不同的层和激活函数。我们使用公开可用的ISIC数据集来评估迁移学习方法。为了增加模型的稳定性,应用了数据增强技术来改善输入数据集的随机性。这些方法使用不同的超参数进行评估,如批量大小、训练轮数(epochs)和优化器。实验结果显示,ResNet-50模型提供了0.935的准确性,0.86的F1分数和0.94的精度。
更新时间: 2024-06-18 15:48:20
领域: cs.CV,cs.AI,cs.LG
Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction
Historical linguists have long written a kind of incompletely formalized "program", consisting of a series of ordered string rewrite functions (called sound laws), that converts reconstructed words in an ancestor language into words in one of its attested descendants. They do this by observing pairs of words in the reconstructed language (protoforms) and the descendant language (reflexes) and constructing a program that transforms protoforms into reflexes. However, writing these programs is error-prone and time-consuming. Prior work has successfully scaffolded this process computationally, but fewer researchers have tackled Sound Law Induction (SLI), which we approach in this paper by casting it as Programming by Examples. We propose a language-agnostic solution that utilizes the programming ability of Large Language Models (LLMs) by generating Python sound law programs from sound change examples. We evaluate the effectiveness of our approach for various LLMs, propose effective methods to generate additional language-agnostic synthetic data to fine-tune LLMs for SLI, and compare our method with existing automated SLI methods, showing that while LLMs lag behind them they can complement some of their weaknesses.
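To make the object being induced concrete, here is a hypothetical sound law program of the kind an LLM would be asked to generate (the rules are invented for illustration, not a real reconstruction):

    import re

    # A sound law program: an ordered list of string rewrites applied to a protoform.
    SOUND_LAWS = [
        (r"p(?=[aeiou])", "f"),   # p > f before a vowel (Grimm-style lenition)
        (r"ti$",          "tsi"), # word-final ti > tsi (assibilation)
        (r"aa",           "a"),   # long-vowel shortening
    ]

    def apply_sound_laws(protoform: str) -> str:
        reflex = protoform
        for pattern, replacement in SOUND_LAWS:  # order matters: rules feed each other
            reflex = re.sub(pattern, replacement, reflex)
        return reflex

    print(apply_sound_laws("paati"))  # paati -> faati -> faatsi -> fatsi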
Updated: 2024-06-18 15:46:04
标题: 大型语言模型是否能像语言学家一样编码?:低资源声音规律归纳的案例研究
摘要: 历史语言学家长期以来一直在撰写一种未完全形式化的"程序",它由一系列有序的字符串重写函数(称为音变规律)组成,用于将祖先语言中重建的词转换为其有文献记载的后代语言中的词。他们通过观察重建语言(原形)和后代语言(反射)中的词对,并构建一个将原形转换为反射的程序来实现这一点。然而,编写这些程序容易出错且耗时。先前的工作已成功地以计算方式辅助了这一过程,但较少有研究人员涉足音变规律归纳(SLI),我们在本文中将其视为示例编程(Programming by Examples)来处理。我们提出了一种与语言无关的解决方案,利用大型语言模型(LLM)的编程能力,从音变示例生成Python音变规律程序。我们评估了该方法在各种LLM上的有效性,提出了生成额外的与语言无关的合成数据以微调LLM用于SLI的有效方法,并将我们的方法与现有的自动SLI方法进行比较,结果表明LLM虽然落后于这些方法,但可以弥补其中的一些弱点。
更新时间: 2024-06-18 15:46:04
领域: cs.CL,cs.AI
BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, and geographical information. We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on the classification and clustering accuracy. First, we pretrain a masked language model on the DNA barcode sequences of the BIOSCAN-5M dataset, and demonstrate the impact of using this large reference library on species- and genus-level classification performance. Second, we propose a zero-shot transfer learning task applied to images and DNA barcodes to cluster feature embeddings obtained from self-supervised learning, to investigate whether meaningful clusters can be derived from these representation embeddings. Third, we benchmark multi-modality by performing contrastive learning on DNA barcodes, image data, and taxonomic information. This yields a general shared embedding space enabling taxonomic classification using multiple types of information and modalities. The code repository of the BIOSCAN-5M Insect dataset is available at https://github.com/zahrag/BIOSCAN-5M.
Updated: 2024-06-18 15:45:21
标题: BIOSCAN-5M:昆虫生物多样性的多模态数据集
摘要: 作为全球范围内持续进行的理解和监测昆虫生物多样性的努力的一部分,本文向机器学习社区介绍了BIOSCAN-5M昆虫数据集,并建立了几个基准任务。BIOSCAN-5M是一个综合数据集,包含超过500万昆虫标本的多模态信息,通过包含分类标签、原始核苷酸条形码序列、分配的条形码索引号和地理信息,显著扩展了现有的基于图像的生物数据集。我们提出了三个基准实验,以展示多模态数据类型对分类和聚类准确性的影响。首先,我们对BIOSCAN-5M数据集的DNA条形码序列进行掩码语言模型的预训练,并展示使用这个庞大参考库对物种和属级别分类性能的影响。其次,我们提出了一个应用于图像和DNA条形码的零样本迁移学习任务,用于聚类从自监督学习中获得的特征嵌入,以探究是否可以从这些表示嵌入中推导出有意义的簇。第三,我们通过对DNA条形码、图像数据和分类信息进行对比学习来评估多模态性能。这产生了一个通用的共享嵌入空间,可以使用多种信息和模态进行分类。BIOSCAN-5M昆虫数据集的代码存储库可在https://github.com/zahrag/BIOSCAN-5M上找到。
更新时间: 2024-06-18 15:45:21
领域: cs.LG
Structure-Aware Code Vulnerability Analysis With Graph Neural Networks
This study explores the effectiveness of graph neural networks (GNNs) for vulnerability detection in software code, utilizing a real-world dataset of Java vulnerability-fixing commits. The dataset's structure, based on the number of modified methods in each commit, offers a natural partition that facilitates diverse investigative scenarios. The primary focus is to evaluate the general applicability of GNNs in identifying vulnerable code segments and distinguishing these from their fixed versions, as well as from random non-vulnerable code. Through a series of experiments, the research addresses key questions about the suitability of different configurations and subsets of data in enhancing the prediction accuracy of GNN models. Experiments indicate that certain model configurations, such as the pruning of specific graph elements and the exclusion of certain types of code representation, significantly improve performance. Additionally, the study highlights the importance of including random data in training to optimize the detection capabilities of GNNs.
Updated: 2024-06-18 15:44:30
标题: 使用图神经网络进行结构感知代码漏洞分析
摘要: 本研究探讨了图神经网络(GNN)在软件代码漏洞检测中的有效性,利用了一个真实世界的Java漏洞修复提交数据集。数据集的结构基于每个提交中被修改方法的数量,提供了一个自然的划分,有助于开展多样化的调查场景。主要重点是评估GNN在识别易受攻击的代码段,并将其与修复后的版本以及随机的非易受攻击代码区分开来方面的普遍适用性。通过一系列实验,研究解决了关于不同配置和数据子集在提高GNN模型预测准确性方面适用性的关键问题。实验表明,某些模型配置,如修剪特定的图元素和排除某些类型的代码表示,可以显著改善性能。此外,研究强调了在训练中加入随机数据以优化GNN检测能力的重要性。
更新时间: 2024-06-18 15:44:30
领域: cs.CR
Estimating class separability of text embeddings with persistent homology
This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view. Using persistent homology, we demonstrate how tracking the evolution of embedding manifolds during training can inform about class separability. More specifically, we show how this technique can be applied to detect when the training process stops improving the separability of the embeddings. Our results, validated across binary and multi-class text classification tasks, show that the proposed method's estimates of class separability align with those obtained from supervised methods. This approach offers a novel perspective on monitoring and improving the fine-tuning of sentence transformers for classification tasks, particularly in scenarios where labeled data is scarce. We also discuss how tracking these quantities can provide additional insights into the properties of the trained classifier.
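A small, self-contained illustration of the underlying signal (a 2-D toy using the ripser package, not the paper's sentence-transformer pipeline): well-separated classes leave one connected component alive much longer in the H0 persistence diagram.

    import numpy as np
    from ripser import ripser  # pip install ripser

    rng = np.random.default_rng(0)
    separated = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(4, 0.3, (50, 2))])
    overlapping = np.vstack([rng.normal(0, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])

    for name, X in [("separated", separated), ("overlapping", overlapping)]:
        h0 = ripser(X)["dgms"][0]                  # birth/death pairs of components
        deaths = np.sort(h0[np.isfinite(h0[:, 1]), 1])
        # With two clean clusters, one component dies very late (large death time).
        print(name, "longest finite H0 death:", deaths[-1])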
Updated: 2024-06-18 15:43:18
标题: 使用持续同调来估计文本嵌入的类别可分性
摘要: 这篇论文介绍了一种从拓扑角度估计文本数据集类别可分离性的无监督方法。利用持久同调理论,我们展示了在训练过程中跟踪嵌入流形演化如何可以提供关于类别可分离性的信息。更具体地,我们展示了这种技术如何可以应用于检测训练过程何时停止改善嵌入的可分离性。我们的结果在二分类和多分类文本分类任务中验证,表明所提出的方法对类别可分离性的估计与监督方法获得的结果一致。这种方法提供了一个新颖的视角,用于监控和改善用于分类任务的句子转换器的微调,尤其是在标记数据稀缺的情况下。我们还讨论了如何跟踪这些量可以提供对训练分类器属性的额外洞察。
更新时间: 2024-06-18 15:43:18
领域: cs.LG
On the Robustness of Language Models for Tabular Question Answering
Large Language Models (LLMs), originally shown to ace various text comprehension tasks, have also remarkably been shown to tackle table comprehension tasks without specific training. While previous research has explored LLM capabilities with tabular dataset tasks, our study assesses the influence of $\textit{in-context learning}$, $\textit{model scale}$, $\textit{instruction tuning}$, and $\textit{domain biases}$ on Tabular Question Answering (TQA). We evaluate the robustness of LLMs on Wikipedia-based $\textbf{WTQ}$ and financial report-based $\textbf{TAT-QA}$ TQA datasets, focusing on their ability to robustly interpret tabular data under various augmentations and perturbations. Our findings indicate that instructions significantly enhance performance, with recent models like Llama3 exhibiting greater robustness over earlier versions. However, data contamination and practical reliability issues persist, especially with WTQ. We highlight the need for improved methodologies, including structure-aware self-attention mechanisms and better handling of domain-specific tabular data, to develop more reliable LLMs for table comprehension.
Updated: 2024-06-18 15:41:15
标题: 关于表格问答任务中语言模型的稳健性
摘要: 大型语言模型(LLMs)最初被证明能够处理各种文本理解任务,同时也被证明无需特定训练即可处理表格理解任务。尽管先前的研究探索了LLM在表格数据集任务中的能力,我们的研究评估了$\textit{上下文学习}$、$\textit{模型规模}$、$\textit{指令微调}$和$\textit{领域偏置}$对表格问答(TQA)的影响。我们评估了LLMs在基于维基百科的$\textbf{WTQ}$和基于财务报告的$\textbf{TAT-QA}$ TQA数据集上的鲁棒性,重点关注它们在各种增强和扰动下稳健解释表格数据的能力。我们的研究结果表明,指令显著提升了性能,像Llama3这样的较新模型比早期版本表现出更强的鲁棒性。然而,数据污染和实际可靠性问题仍然存在,特别是在WTQ中。我们强调需要改进方法,包括结构感知的自注意力机制和对特定领域表格数据的更好处理,以开发更可靠的表格理解LLMs。
更新时间: 2024-06-18 15:41:15
领域: cs.CL,cs.AI
AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of object hallucinations. Specifically, LVLMs predominantly attend to prompt-independent global image features, while failing to capture prompt-relevant local features, consequently undermining the visual grounding capacity of LVLMs and leading to hallucinations. To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously. Our approach exhibits an image-prompt matching scheme that captures prompt-relevant local features from images, leading to an augmented view of the input image where prompt-relevant content is reserved while irrelevant distractions are masked. With the augmented view, a calibrated decoding distribution can be derived by integrating generative global features from the original image and discriminative local features from the augmented image. Extensive experiments show that AGLA consistently mitigates object hallucinations and enhances general perception capability for LVLMs across various discriminative and generative benchmarks. Our code will be released at https://github.com/Lackel/AGLA.
Updated: 2024-06-18 15:38:41
标题: AGLA:使用全局和局部注意力组装来减轻大型视觉语言模型中的对象幻觉
摘要: 尽管大规模视觉语言模型(LVLMs)在各种多模态任务中取得了巨大成功,但它们面临一个普遍问题,即物体幻觉,即生成的文本响应与给定图像中的实际对象不一致。本文调查了各种LVLMs,并将对辨别性局部图像特征的注意力不足定位为物体幻觉的一个根本原因。具体来说,LVLMs主要关注与提示无关的全局图像特征,而未能捕捉提示相关的局部特征,因此削弱了LVLMs的视觉基础能力,导致幻觉。为此,我们提出了全局和局部注意力的组合(AGLA),这是一种无需训练和即插即用的方法,通过同时探索全局特征用于响应生成和局部特征用于视觉辨别,从而减轻物体幻觉。我们的方法展示了一种图像提示匹配方案,从图像中捕捉提示相关的局部特征,从而在输入图像的扩充视图中保留提示相关内容,同时掩盖无关的干扰。通过扩充视图,可以通过整合原始图像的生成全局特征和扩充图像的辨别性局部特征来得出校准的解码分布。大量实验证明,AGLA始终减轻物体幻觉,并增强LVLMs在各种辨别和生成基准上的一般感知能力。我们的代码将在https://github.com/Lackel/AGLA上发布。
更新时间: 2024-06-18 15:38:41
领域: cs.CV,cs.AI,cs.CL
On Differentially Private Subspace Estimation in a Distribution-Free Setting
Private data analysis faces a significant challenge known as the curse of dimensionality, leading to increased costs. However, many datasets possess an inherent low-dimensional structure. For instance, during optimization via gradient descent, the gradients frequently reside near a low-dimensional subspace. If the low-dimensional structure could be privately identified using a small amount of points, we could avoid paying for the high ambient dimension. On the negative side, Dwork, Talwar, Thakurta, and Zhang (STOC 2014) proved that privately estimating subspaces, in general, requires an amount of points that has a polynomial dependency on the dimension. However, their bound does not rule out the possibility of reducing the number of points for "easy" instances. Yet, providing a measure that captures how much a given dataset is "easy" for this task turns out to be challenging, and was not properly addressed in prior works. Inspired by the work of Singhal and Steinke (NeurIPS 2021), we provide the first measures that quantify easiness as a function of multiplicative singular-value gaps in the input dataset, and support them with new upper and lower bounds. In particular, our results determine the first type of gap that is sufficient and necessary for estimating a subspace with an amount of points that is independent of the dimension. Furthermore, we realize our upper bounds using a practical algorithm and demonstrate its advantage in high-dimensional regimes compared to prior approaches.
Updated: 2024-06-18 15:37:11
标题: 关于无分布假设设定下的差分隐私子空间估计
摘要: 私人数据分析面临一个被称为维度诅咒的重要挑战,导致成本增加。然而,许多数据集具有固有的低维结构。例如,在通过梯度下降进行优化时,梯度经常位于低维子空间附近。如果能够使用少量点私下识别出低维结构,我们就可以避免为高环境维度付出代价。另一方面,Dwork、Talwar、Thakurta和Zhang(STOC 2014)证明,一般情况下私下估计子空间所需的点数与维度呈多项式依赖关系。不过,他们的界限并没有排除在"简单"实例上减少点数的可能性。然而,如何给出一个能够刻画给定数据集对该任务而言有多"简单"的度量颇具挑战性,之前的工作并未妥善解决这一问题。受Singhal和Steinke(NeurIPS 2021)工作的启发,我们首次提出了将简单程度量化为输入数据集乘性奇异值间隙的函数的度量,并用新的上界和下界支持这些度量。特别地,我们的结果确定了第一类间隙,它对于用与维度无关的点数估计子空间而言既是充分的也是必要的。此外,我们用一个实用算法实现了我们的上界,并展示了其在高维情形下相对于先前方法的优势。
更新时间: 2024-06-18 15:37:11
领域: cs.LG,cs.CR,cs.DS
Capturing Knowledge Graphs and Rules with Octagon Embeddings
Region based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules which are captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also their ability to model rules, thus failing to deliver on the main promise of region based models. With the aim of addressing these limitations, we investigate regions which are composed of axis-aligned octagons. Such octagons are particularly easy to work with, as intersections and compositions can be straightforwardly computed, while they are still sufficiently expressive to model arbitrary knowledge graphs. Among others, we also show that our octagon embeddings can properly capture a non-trivial class of rule bases. Finally, we show that our model achieves competitive experimental results.
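As a concrete illustration of why axis-aligned octagons are easy to work with (a 2-D toy of my own, whereas the paper's octagons live in the embedding space of a knowledge graph model): an octagon is cut out by interval bounds on $x$, $y$, $x+y$ and $x-y$, so intersection reduces to elementwise interval intersection.

    from dataclasses import dataclass

    @dataclass
    class Octagon:
        """Axis-aligned octagon: interval bounds on x, y, x+y (s) and x-y (d)."""
        x: tuple
        y: tuple
        s: tuple
        d: tuple

        def intersect(self, other):
            clip = lambda a, b: (max(a[0], b[0]), min(a[1], b[1]))
            return Octagon(clip(self.x, other.x), clip(self.y, other.y),
                           clip(self.s, other.s), clip(self.d, other.d))

        def contains(self, px, py):
            inside = lambda v, bound: bound[0] <= v <= bound[1]
            return (inside(px, self.x) and inside(py, self.y)
                    and inside(px + py, self.s) and inside(px - py, self.d))

    r1 = Octagon((0, 4), (0, 4), (1, 7), (-3, 3))
    r2 = Octagon((2, 6), (1, 5), (4, 10), (-2, 4))
    print(r1.intersect(r2).contains(3, 2))  # True: octagons are closed under intersection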
Updated: 2024-06-18 15:29:55
标题: 使用八边形嵌入捕获知识图和规则
摘要: 基于区域的知识图嵌入将关系表示为几何区域。这具有优势,即模型捕获的规则是明确的,使得可以直接将先前的知识纳入,并检查学习到的模型。不幸的是,现有方法在建模关系组合能力方面受到严重限制,因此也受到规则建模能力的限制,因此未能兑现基于区域模型的主要承诺。为了解决这些限制,我们研究了由轴对齐八边形组成的区域。这样的八边形特别容易处理,因为交集和组合可以直接计算,同时它们仍然足够表达任意知识图。除此之外,我们还展示了我们的八边形嵌入能够正确捕获一类非平凡的规则基础。最后,我们展示了我们的模型取得了具有竞争力的实验结果。
更新时间: 2024-06-18 15:29:55
领域: cs.AI
Learning Useful Representations of Recurrent Neural Network Weight Matrices
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality-specifically, its input-output mapping. We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. We develop a theoretical framework that demonstrates conditions under which the functionalist approach can generate rich representations that help determine RNN behavior. We release the first two 'model zoo' datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other one of classifiers of sequentially processed MNIST digits. With the help of an emulation-based self-supervised learning technique, we compare and evaluate the different RNN weight encoding techniques on multiple downstream applications. On the most challenging one, namely predicting which exact task the RNN was trained on, functionalist approaches show clear superiority.
Updated: 2024-06-18 15:27:16
标题: 学习递归神经网络权重矩阵的有用表示
摘要: 递归神经网络(RNNs)是通用的并行顺序计算机。RNN的程序是其权重矩阵。如何学习有用的RNN权重表示,以便促进RNN分析以及下游任务?虽然机械主义方法直接查看某些RNN的权重以预测其行为,但功能主义方法分析其整体功能,特别是其输入-输出映射。我们考虑了几种RNN权重的机械主义方法,并为RNNs调整了置换等变Deep Weight Space层。我们的两种新颖的功能主义方法通过探测输入来“审问”RNN,从而从RNN权重中提取信息。我们制定了一个理论框架,展示了功能主义方法可以生成有助于确定RNN行为的丰富表示的条件。我们发布了前两个用于RNN权重表示学习的“模型动物园”数据集。一个由一类形式语言的生成模型组成,另一个由依次处理的MNIST数字的分类器组成。借助基于仿真的自监督学习技术,我们比较和评估了不同的RNN权重编码技术在多个下游应用上的表现。在最具挑战性的任务上,即预测RNN训练的确切任务,功能主义方法表现出明显的优势。
更新时间: 2024-06-18 15:27:16
领域: cs.LG,I.2.6
Hypergraph: A Unified and Uniform Definition with Application to Chemical Hypergraph
The conventional definition of hypergraph has two major issues: (1) there is not a standard definition of directed hypergraph and (2) there is not a formal definition of nested hypergraph. To resolve these issues, we propose a new definition of hypergraph that unifies the concepts of undirected, directed and nested hypergraphs, and that is uniform in using hyperedge as a single construct for representing high-order correlations among things, i.e., nodes and hyperedges. Specifically, we define a hyperedge to be a simple hyperedge, a nesting hyperedge, or a directed hyperedge. With this new definition, a hypergraph is nested if it has nesting hyperedge(s), and is directed if it has directed hyperedge(s). Otherwise, a hypergraph is a simple hypergraph. The uniformity and power of this new definition, with visualization, should facilitate the use of hypergraph for representing (hierarchical) high-order correlations in general and chemical systems in particular. Graph has been widely used as a mathematical structure for machine learning on molecular structures and 3D molecular geometries. However, graph has a major limitation: it can represent only pairwise correlations between nodes. Hypergraph extends graph with high-order correlations among nodes. This extension is significant or essential for machine learning on chemical systems. For molecules, this is significant as it allows the direct, explicit representation of multicenter bonds and molecular substructures. For chemical reactions, this is essential since most chemical reactions involve multiple participants. We propose the use of chemical hypergraph, a multilevel hypergraph with simple, nesting and directed hyperedges, as a single mathematical structure for representing chemical systems. We apply the new definition of hypergraph to chemical hypergraph and, as simplified versions, molecular hypergraph and chemical reaction hypergraph.
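A toy sketch of the unified definition as I read the abstract (all names are mine, and the real definition surely carries more structure): every hyperedge is simple, nesting, or directed, and may connect both nodes and other hyperedges.

    from dataclasses import dataclass, field

    @dataclass
    class HyperEdge:
        members: list = field(default_factory=list)  # nodes and/or hyperedges
        nested: list = field(default_factory=list)   # hyperedges it contains
        head: list = field(default_factory=list)     # direction: tail -> head
        tail: list = field(default_factory=list)

        @property
        def kind(self):
            if self.head or self.tail:
                return "directed"
            if self.nested:
                return "nesting"
            return "simple"

    # e.g. a chemical-reaction hyperedge: reactants (tail) -> products (head)
    reaction = HyperEdge(tail=["H2", "O2"], head=["H2O"])
    print(reaction.kind)  # directed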
Updated: 2024-06-18 15:25:28
标题: 超图:一个统一且一致的定义及其在化学超图中的应用
摘要: 超图的传统定义存在两个主要问题:(1)没有标准的有向超图的定义,(2)没有嵌套超图的正式定义。为了解决这些问题,我们提出了一个新的超图定义,统一了无向、有向和嵌套超图的概念,并统一使用超边作为表示物体之间高阶相关性的单一构造,即节点和超边。具体来说,我们将超边定义为简单超边、嵌套超边或有向超边。根据这个新定义,如果超图具有嵌套超边,则为嵌套超图,如果具有有向超边,则为有向超图。否则,超图为简单超图。这个新定义的一致性和强大性,结合可视化,应该有助于使用超图来表示(层次)高阶相关性,特别是化学系统中的相关性。图被广泛用作机器学习分子结构和3D分子几何结构的数学结构。然而,图有一个主要限制:它只能表示节点之间的成对相关性。超图通过节点之间的高阶相关性扩展了图。这种扩展对于化学系统的机器学习是重要或必要的。对于分子而言,这是重要的,因为它允许直接、明确地表示多中心键和分子亚结构。对于化学反应而言,这是必不可少的,因为大多数化学反应涉及多个参与者。我们提出使用化学超图,一个具有简单、嵌套和有向超边的多层超图,作为表示化学系统的单一数学结构。我们将超图的新定义应用于化学超图,并作为简化版本,分子超图和化学反应超图。
更新时间: 2024-06-18 15:25:28
领域: cs.LG,q-bio.QM
What is in the Chrome Web Store? Investigating Security-Noteworthy Browser Extensions
This paper is the first attempt at providing a holistic view of the Chrome Web Store (CWS). We leverage historical data provided by ChromeStats to study global trends in the CWS and security implications. We first highlight the extremely short life cycles of extensions: roughly 60% of extensions stay in the CWS for one year. Second, we define and show that Security-Noteworthy Extensions (SNE) are a significant issue: they pervade the CWS for years and affect almost 350 million users. Third, we identify clusters of extensions with a similar code base. We discuss how code similarity techniques could be used to flag suspicious extensions. By developing an approach to extract URLs from extensions' comments, we show that extensions reuse code snippets from public repositories or forums, leading to the propagation of dated code and vulnerabilities. Finally, we underline a critical lack of maintenance in the CWS: 60% of the extensions in the CWS have never been updated; half of the extensions known to be vulnerable are still in the CWS and still vulnerable 2 years after disclosure; a third of extensions use vulnerable library versions. We believe that these issues should be widely known in order to pave the way for a more secure CWS.
Updated: 2024-06-18 15:25:06
标题: Chrome Web Store中有什么?调查安全上值得关注的浏览器扩展程序
摘要: 这篇论文是对Chrome Web Store(CWS)提供全面视角的第一次尝试。我们利用ChromeStats提供的历史数据来研究CWS的全球趋势和安全影响。我们首先强调了扩展的极短的生命周期:大约60%的扩展在CWS中停留一年。其次,我们定义并展示了安全值得关注的扩展(SNE)是一个重要问题:它们在CWS中存在多年,影响了近3.5亿用户。第三,我们识别了具有相似代码基础的扩展群集。我们讨论了如何使用代码相似性技术来标记可疑扩展。通过开发一种从扩展评论中提取URL的方法,我们展示了扩展重用来自公共仓库或论坛的代码片段,导致陈旧代码和漏洞的传播。最后,我们强调了CWS中关键的维护不足:60%的CWS中的扩展从未更新过;已知有漏洞的扩展中有一半仍在CWS中,而且在披露后两年仍存在漏洞;三分之一的扩展使用有漏洞的库版本。我们认为这些问题应该得到广泛关注,以为更安全的CWS铺平道路。
更新时间: 2024-06-18 15:25:06
领域: cs.CR
Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned
Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in performance. To address this challenge, we presented an innovative paradigm that incorporates three separate forms of curriculum learning specifically targeting from spatial, temporal, and quantile perspectives. Furthermore, our framework incorporates a stacking fusion module to combine diverse information from three types of curriculum learning, resulting in a strong and thorough learning process. We demonstrated the effectiveness of this framework with extensive empirical evaluations, highlighting its better performance in addressing complex ST challenges. We provided thorough ablation studies to investigate the effectiveness of our curriculum and to explain how it contributes to the improvement of learning efficiency on ST data.
Updated: 2024-06-18 15:23:10
标题: 用课程学习增强时空分位数预测:经验教训
摘要: 在时空(ST)数据上训练模型存在一个开放性问题,这是由于数据本身的复杂和多样性,直接在原始ST数据上训练模型的性能很具挑战性。虽然限制训练数据的多样性可以使训练变得更容易,但也可能导致模型缺乏知识和信息,从而降低性能。为了解决这一挑战,我们提出了一个创新的范式,该范式将空间、时间和分位数三种不同形式的课程学习结合起来。此外,我们的框架还包括一个叠加融合模块,用于将三种类型的课程学习中获得的多样信息结合起来,从而实现强大而全面的学习过程。我们通过大量实证评估证明了这一框架的有效性,突显了其在解决复杂ST挑战方面的更好性能。我们提供了彻底的消融研究,以调查我们的课程的有效性,并解释它如何有助于提高ST数据的学习效率。
更新时间: 2024-06-18 15:23:10
领域: cs.LG,cs.AI
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at https://github.com/Haoqiu-Yan/PerceptiveAgent.
Updated: 2024-06-18 15:19:51
标题: 与类人智能代理交谈:通过可感知的声学接收和反应实现共情对话
摘要: 大型语言模型(LLM)-增强代理在人工智能交流中变得越来越普遍,从娱乐到专业领域都具有巨大潜力。然而,当前的多模态对话系统忽视了语音中存在的声学信息,这对于理解人类交流细微差别至关重要。这种疏忽可能导致对说话者意图的误解,导致对话中出现不一致甚至矛盾的回应。为了弥合这一差距,在本文中,我们提出了一种名为PerceptiveAgent的共情多模态对话系统,旨在通过整合语音模态感知来辨别言外之意或更微妙的含义,超越文字的字面解释。利用LLMs作为认知核心,PerceptiveAgent从输入语音中感知声学信息,并根据自然语言描述的说话风格生成共情回应。实验结果表明,PerceptiveAgent在情境理解方面表现出色,能够准确辨别说话者真正意图,即使在语言含义与说话者真实感情相反或不一致的情况下,也能产生更加细腻和富有表现力的口头对话。代码公开可在以下链接获取:https://github.com/Haoqiu-Yan/PerceptiveAgent。
更新时间: 2024-06-18 15:19:51
领域: cs.CL,cs.AI,cs.SD,eess.AS
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.
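A minimal sketch of the general idea (a Bayes-by-Backprop-style variational LoRA of my own construction; BLoB's actual parameterization and training objective may differ): the low-rank factor carries a learned mean and standard deviation, sampled by reparameterization at every forward pass so that both moments are adjusted jointly during fine-tuning.

    import torch
    import torch.nn as nn

    class BayesianLoRALinear(nn.Module):
        """Frozen base layer plus a low-rank update with a Gaussian A-factor."""
        def __init__(self, base: nn.Linear, rank=8):
            super().__init__()
            self.base = base.requires_grad_(False)
            self.A_mu = nn.Parameter(torch.zeros(rank, base.in_features))
            self.A_logstd = nn.Parameter(torch.full((rank, base.in_features), -5.0))
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))

        def forward(self, x):
            # Reparameterization: sample A around its mean on every pass.
            A = self.A_mu + self.A_logstd.exp() * torch.randn_like(self.A_mu)
            return self.base(x) + x @ A.T @ self.B.T

        def kl(self):
            # KL(q || N(0, I)), added to the task loss as a regularizer.
            var = (2 * self.A_logstd).exp()
            return 0.5 * (var + self.A_mu**2 - 1 - 2 * self.A_logstd).sum()

    layer = BayesianLoRALinear(nn.Linear(64, 64))
    loss = layer(torch.randn(4, 64)).pow(2).mean() + 1e-4 * layer.kl()
    loss.backward()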
Updated: 2024-06-18 15:15:04
标题: BLoB:基于反向传播的贝叶斯低秩调整方法用于大型语言模型
摘要: 大型语言模型(LLMs)在推断过程中经常表现出过度自信,特别是在适应有限数据的下游领域特定任务时。先前的研究通过在LLMs训练后采用近似贝叶斯估计来解决这个问题,使其能够量化不确定性。然而,这种后训练方法的性能受到训练期间学习到的参数的严重限制。在本文中,我们超越了后训练贝叶斯化,并提出了通过反向传播的贝叶斯低秩适应(BLoB)算法,该算法在整个微调过程中持续和联合地调整LLM参数的均值和协方差。我们的实证结果验证了BLoB在泛化和不确定性估计方面的有效性,在评估时考虑了分布内和分布外的数据。
更新时间: 2024-06-18 15:15:04
领域: cs.LG,cs.AI,cs.CL,stat.ML
On Efficiently Representing Regular Languages as RNNs
Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This poses a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed -- specifically, those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the efficiency of representing this diverse class of LMs with RNN LMs suggests novel interpretations of their inductive bias.
Updated: 2024-06-18 15:14:18
标题: 关于将正则语言有效地表示为循环神经网络的研究
摘要: Hewitt等人(2020年)最近的研究解释了循环神经网络(RNNs)作为语言模型(LMs)的实证成功。他们表明,RNNs可以有效地表示人类语言中普遍存在的有界分层结构。这表明RNNs的成功可能与其建模层次结构的能力有关。然而,对Hewitt等人(2020年)构建的仔细检查显示,它并不固有地局限于分层结构。这引出一个自然的问题:RNNs可以有效表示哪些其他类别的LMs?为此,我们推广了Hewitt等人(2020年)的构建,并展示RNNs可以有效地表示比先前所述更广泛的LMs类别,具体而言,即那些可以由具有有界堆栈和特定堆栈更新函数的下推自动机表示的LMs。总的来说,RNN LM能够高效表示这一更为多样的LM类别,为其归纳偏置提供了新颖的解释。
更新时间: 2024-06-18 15:14:18
领域: cs.CL,cs.CC,cs.LG
SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation
Self-portraits captured from a short distance might look unnatural or even unattractive due to heavy distortions making facial features malformed, and ill-placed head poses. In this paper, we propose SUPER, a novel method of eliminating distortions and adjusting head pose in a close-up face crop. We perform 3D GAN inversion for a facial image by optimizing camera parameters and face latent code, which gives a generated image. Besides, we estimate depth from the obtained latent code, create a depth-induced 3D mesh, and render it with updated camera parameters to obtain a warped portrait. Finally, we apply the visibility-based blending so that visible regions are reprojected, and occluded parts are restored with a generative model. Experiments on face undistortion benchmarks and on our self-collected Head Rotation dataset (HeRo), show that SUPER outperforms previous approaches both qualitatively and quantitatively, opening new possibilities for photorealistic selfie editing.
Updated: 2024-06-18 15:14:14
标题: SUPER:自拍矫正和头部姿势编辑与身份保留
摘要: 从短距离拍摄的自拍照可能因为严重的扭曲使面部特征变形,头部姿势不正确,看起来不自然甚至不吸引人。在本文中,我们提出了SUPER,一种新颖的方法,用于消除扭曲并调整近距离拍摄的面部裁剪中的头部姿势。我们通过优化摄像机参数和面部潜在代码来执行面部图像的3D GAN反演,从而生成图像。此外,我们从获得的潜在代码中估计深度,创建深度诱导的3D网格,并使用更新后的摄像机参数渲染它以获得扭曲的肖像。最后,我们应用基于可见性的混合,以便重新投影可见区域,并使用生成模型恢复被遮挡的部分。在面部去畸变基准测试和我们自行收集的头部旋转数据集(HeRo)上的实验表明,SUPER在定性和定量上都优于先前的方法,为逼真的自拍编辑开辟了新的可能性。
更新时间: 2024-06-18 15:14:14
领域: cs.CV,cs.LG
LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation
Low-Rank Adaptation (LoRA) is currently the most commonly used parameter-efficient fine-tuning (PEFT) method; it introduces auxiliary parameters for each layer to fine-tune the pre-trained model under limited computing resources. However, it still faces resource consumption challenges during training when scaling up to larger models. Most previous studies have tackled this issue by using pruning techniques, which involve removing LoRA parameters deemed unimportant. Nonetheless, these efforts only analyze LoRA parameter features to evaluate their importance, such as parameter count, size, and gradient. In fact, the output of LoRA (the product of the LoRA parameters and the hidden state) directly impacts the final results. Preliminary experiments indicate that a fraction of LoRA elements possesses significantly high output values, substantially influencing the layer output. Motivated by this observation, we propose LoRA-drop. Concretely, LoRA-drop evaluates the importance of LoRA based on the LoRA output. Then we retain LoRA for important layers and the other layers share the same LoRA. We conduct abundant experiments with models of different scales on NLU and NLG tasks. Results demonstrate that LoRA-drop can achieve performance comparable to full fine-tuning and LoRA, while retaining 50% of the LoRA parameters on average.
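A sketch of the output-based importance criterion as described (tensor shapes and the top-k selection rule below are stand-ins of mine, not the paper's exact procedure):

    import torch

    def lora_output_importance(lora_layers, hidden_samples):
        """Score each LoRA module by the magnitude of its output, not its weights.

        lora_layers: dict name -> (A, B) with A of shape (r, d_in), B of (d_out, r).
        hidden_samples: dict name -> hidden states (n, d_in) seen by that layer.
        """
        scores = {}
        for name, (A, B) in lora_layers.items():
            h = hidden_samples[name]
            out = h @ A.T @ B.T           # the LoRA update actually added to the layer
            scores[name] = out.pow(2).mean().item()
        return scores

    layers = {f"layer{i}": (torch.randn(8, 64), torch.randn(64, 8) * (i / 4))
              for i in range(4)}
    hidden = {name: torch.randn(32, 64) for name in layers}
    scores = lora_output_importance(layers, hidden)
    keep = sorted(scores, key=scores.get, reverse=True)[:2]  # these keep their own LoRA;
    print(keep)                                              # the rest share one LoRA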
Updated: 2024-06-18 15:13:12
标题: LoRA-drop:基于输出评估的高效LoRA参数修剪
摘要: Low-Rank Adaptation (LoRA)是目前最常用的参数高效微调(PEFT)方法,它为每一层引入辅助参数,以在有限的计算资源下微调预训练模型。然而,当扩展到更大的模型时,训练过程仍然面临资源消耗挑战。大多数先前的研究通过使用修剪技术来解决这个问题,这涉及删除被认为不重要的LoRA参数。然而,这些努力仅分析LoRA参数特征以评估其重要性,如参数数量、大小和梯度。事实上,LoRA的输出(LoRA参数和隐藏状态的乘积)直接影响最终结果。初步实验表明,一部分LoRA元素具有显著高的输出值,对层输出产生重大影响。受这一观察的启发,我们提出LoRA-drop。具体来说,LoRA-drop根据LoRA的输出评估LoRA的重要性。然后我们保留重要层的LoRA,其他层共享相同的LoRA。我们在不同规模的模型上进行了大量实验,涉及NLU和NLG任务。结果表明,LoRA-drop可以在保留平均50%的LoRA参数的同时,实现与完全微调和LoRA相当的性能。
更新时间: 2024-06-18 15:13:12
领域: cs.LG,cs.CL
Online-Adaptive Anomaly Detection for Defect Identification in Aircraft Assembly
Anomaly detection deals with detecting deviations from established patterns within data. It has various applications like autonomous driving, predictive maintenance, and medical diagnosis. To improve anomaly detection accuracy, transfer learning can be applied to large, pre-trained models and adapt them to the specific application context. In this paper, we propose a novel framework for online-adaptive anomaly detection using transfer learning. The approach adapts to different environments by selecting visually similar training images and online fitting a normality model to EfficientNet features extracted from the training subset. Anomaly detection is then performed by computing the Mahalanobis distance between the normality model and the test image features. Different similarity measures (SIFT/FLANN, Cosine) and normality models (MVG, OCSVM) are employed and compared with each other. We evaluate the approach on different anomaly detection benchmarks and data collected in controlled laboratory settings. Experimental results showcase a detection accuracy exceeding 0.975, outperforming the state-of-the-art ET-NET approach.
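The scoring step is compact enough to sketch directly (with NumPy and synthetic vectors standing in for the EfficientNet features): fit a multivariate Gaussian (MVG) to the nominal training features, then score test features by Mahalanobis distance.

    import numpy as np

    def fit_mvg(features):                       # features: (n, d) nominal vectors
        mu = features.mean(axis=0)
        cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
        return mu, np.linalg.inv(cov)

    def mahalanobis(x, mu, cov_inv):
        diff = x - mu
        return np.sqrt(diff @ cov_inv @ diff)

    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 32))           # stand-in for pooled CNN features
    mu, cov_inv = fit_mvg(train)
    normal_score = mahalanobis(rng.normal(size=32), mu, cov_inv)
    anomaly_score = mahalanobis(rng.normal(loc=3.0, size=32), mu, cov_inv)
    print(normal_score < anomaly_score)          # anomalies should score higher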
Updated: 2024-06-18 15:11:44
标题: 飞机装配中的缺陷识别的在线自适应异常检测
摘要: 异常检测涉及检测数据中与已建立模式的偏离。它在自动驾驶、预测性维护和医疗诊断等方面有各种应用。为了提高异常检测的准确性,可以将迁移学习应用于大型预训练模型,并将其调整到特定的应用环境中。本文提出了一种利用迁移学习进行在线自适应异常检测的新框架。该方法通过选择视觉上相似的训练图像,并在线拟合从训练子集提取的EfficientNet特征的正常模型来适应不同的环境。然后,通过计算正常模型与测试图像特征之间的马哈拉诺比斯距离来执行异常检测。采用不同的相似度度量(SIFT/FLANN,余弦)和正常模型(MVG,OCSVM),并将它们进行比较。我们在不同的异常检测基准和在受控实验室环境中收集的数据上评估了该方法。实验结果展示了超过0.975的检测准确性,优于最先进的ET-NET方法。
更新时间: 2024-06-18 15:11:44
领域: cs.CV,cs.AI,cs.RO
Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries
Text-to-SQL systems (also known as NL-to-SQL systems) have become an increasingly popular solution for bridging the gap between user capabilities and SQL-based data access. These systems translate user requests in natural language to valid SQL statements for a specific database. Recent Text-to-SQL systems have benefited from the rapid improvement of transformer-based language models. However, while Text-to-SQL systems that incorporate such models continuously reach new high scores on -- often synthetic -- benchmark datasets, a systematic exploration of their robustness towards different data models in a real-world, realistic scenario is notably missing. This paper provides the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice based on a multi-year international project focused on Text-to-SQL interfaces. Our evaluation is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022, during which about 6K natural language questions were asked and executed. All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models. For each data model, we explore the performance of representative Text-to-SQL systems and language models. We further quantify the impact of training data size, pre-, and post-processing steps as well as language model inference time. Our comprehensive evaluation sheds light on the design choices of real-world Text-to-SQL systems and their impact on moving from research prototypes to real deployments. Last, we provide a new benchmark dataset to the community, which is the first to enable the evaluation of different data models for the same dataset and is substantially more challenging than most previous datasets in terms of query complexity.
Updated: 2024-06-18 15:10:01
标题: 基于真实用户查询评估文本到SQL系统的数据模型稳健性
摘要: 文本到SQL系统(也称为NL到SQL系统)已经成为弥合用户能力和基于SQL的数据访问之间差距的日益流行的解决方案。这些系统将用户的自然语言请求转换为特定数据库的有效SQL语句。最近的文本到SQL系统受益于基于变压器的语言模型的快速改进。然而,虽然整合了这些模型的文本到SQL系统不断在 -- 通常是合成的 -- 基准数据集上取得新的高分,但对它们在真实世界、现实场景中对不同数据模型的稳健性进行系统探索明显缺失。本文基于一个多年的国际项目对实际中文本到SQL系统的数据模型稳健性进行了深入评估,该项目专注于文本到SQL界面。我们的评估基于FootballDB的实际部署,该系统在2022年FIFA世界杯期间部署了9个月,期间大约提出并执行了6K个自然语言问题。我们的所有数据都基于实际用户实时向系统提出的问题。我们手动标记和翻译了这些问题的一个子集,针对三种不同的数据模型。对于每个数据模型,我们探索了代表性的文本到SQL系统和语言模型的性能。我们进一步量化了训练数据大小、预处理和后处理步骤以及语言模型推理时间的影响。我们的全面评估揭示了真实世界文本到SQL系统的设计选择及其对从研究原型转向实际部署的影响。最后,我们为社区提供了一个新的基准数据集,这是第一个能够评估同一数据集的不同数据模型的数据集,并在查询复杂性方面比大多数先前的数据集更具挑战性。
更新时间: 2024-06-18 15:10:01
领域: cs.DB,cs.AI,cs.CL
Transformers Can Represent $n$-gram Language Models
Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language acceptance. We contend that this is an ill-suited problem in the study of language models (LMs), which are definitionally probability distributions over strings. In this paper, we focus on the relationship between transformer LMs and $n$-gram LMs, a simple and historically relevant class of language models. We show that transformer LMs using the hard or sparse attention mechanisms can exactly represent any $n$-gram LM, giving us a concrete lower bound on their probabilistic representational capacity. This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.
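For reference, the class being represented is the standard one (a textbook definition, not the paper's notation): an $n$-gram LM factors the probability of a string as

$$p(w_1 \cdots w_T) \;=\; \prod_{t=1}^{T} p\!\left(w_t \mid w_{t-n+1} \cdots w_{t-1}\right),$$

with suitable padding for $t < n$. Intuitively, hard or sparse attention lets each position recover exactly its $n-1$ predecessors, so the output distribution can be made to match the conditional above.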
Updated: 2024-06-18 15:08:24
标题: 变压器可以表示$n$-gram语言模型
摘要: 现有的工作通过形式化的计算模型分析了变压器架构的表征能力。然而,到目前为止,重点一直放在以语言"接受"为基础的架构分析上。我们认为,在研究"语言模型"(LMs)时,这是一个不合适的问题,LMs在定义上是字符串上的概率分布。在本文中,我们关注变压器LMs和$n$-gram LMs之间的关系,$n$-gram LMs是一种简单且历史上相关的语言模型类别。我们展示了使用硬或稀疏注意机制的变压器LMs可以精确表示任何$n$-gram LM,为它们的概率表征能力提供了具体的下限。这为理解变压器LM用以表示字符串上概率分布的机制迈出了第一步。
更新时间: 2024-06-18 15:08:24
领域: cs.CL,cs.AI,cs.CC,cs.FL,cs.LG
XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges
Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, where deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets have limitations in reflecting the ultra-dynamic nature of real-world scenarios, characterized by continuously evolving infrastructures, varying temporal distributions, and temporal gaps due to sensor downtimes or changes in traffic patterns. These limitations inevitably restrict the practical applicability of existing traffic forecasting datasets. To bridge this gap, we present XXLTraffic, the largest available public traffic dataset with the longest timespan and increasing number of sensor nodes over the multiple years observed in the data, curated to support research in ultra-dynamic forecasting. Our benchmark includes both typical time-series forecasting settings with hourly and daily aggregated data and novel configurations that introduce gaps and down-sample the training size to better simulate practical constraints. We anticipate the new XXLTraffic will provide a fresh perspective for the time-series and traffic forecasting communities. It would also offer a robust platform for developing and evaluating models designed to tackle ultra-dynamic and extremely long forecasting problems. Our dataset supplements existing spatio-temporal data resources and leads to new research directions in this domain.
Updated: 2024-06-18 15:06:22
标题: XXLTraffic:用于超动态预测挑战的扩展和极长交通数据集
摘要: 交通预测对于智慧城市和智能交通计划至关重要,在过去几年中,深度学习在建模复杂的时空模式方面取得了显著进展。然而,当前的公共数据集在反映真实世界场景的超动态特性方面存在局限性,这些特性包括不断发展的基础设施、不同的时间分布以及由于传感器停机或交通模式改变而导致的时间间隔。这些局限性不可避免地限制了现有交通预测数据集的实际适用性。为了弥补这一差距,我们提出了XXLTraffic,这是目前公开的拥有最长时间跨度和不断增加传感器节点数量的交通数据集,旨在支持超动态预测研究。我们的基准包括典型的时间序列预测设置,包括按小时和按日汇总的数据,以及引入间隙和减少训练规模以更好地模拟实际约束的新配置。我们期待新的XXLTraffic将为时间序列和交通预测社区提供新的视角。它还将为开发和评估旨在解决超动态和极长预测问题的模型提供强大的平台。我们的数据集补充了现有的时空数据资源,并引领了这一领域的新研究方向。
更新时间: 2024-06-18 15:06:22
领域: cs.LG,cs.AI
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Self-correction in text-to-SQL is the process of prompting large language model (LLM) to revise its previously incorrectly generated SQL, and commonly relies on manually crafted self-correction guidelines by human experts that are not only labor-intensive to produce but also limited by the human ability in identifying all potential error patterns in LLM responses. We introduce MAGIC, a novel multi-agent method that automates the creation of the self-correction guideline. MAGIC uses three specialized agents: a manager, a correction, and a feedback agent. These agents collaborate on the failures of an LLM-based method on the training set to iteratively generate and refine a self-correction guideline tailored to LLM mistakes, mirroring human processes but without human involvement. Our extensive experiments show that MAGIC's guideline outperforms expert human's created ones. We empirically find out that the guideline produced by MAGIC enhance the interpretability of the corrections made, providing insights in analyzing the reason behind the failures and successes of LLMs in self-correction. We make all agent interactions publicly available to the research community, to foster further research in this area, offering a synthetic dataset for future explorations into automatic self-correction guideline generation.
Updated: 2024-06-18 15:06:06
标题: MAGIC:生成上下文文本到SQL自纠正指南
摘要: 自我纠正在文本到SQL中是促使大型语言模型(LLM)修订其先前生成的错误SQL的过程,通常依赖于人类专家手工制定的自我纠正指南,这不仅费时费力,而且受到人类在识别LLM响应中所有潜在错误模式的能力的限制。我们引入了MAGIC,这是一种新颖的多智能体方法,自动创建了自我纠正指南。MAGIC使用三个专门的智能体:一个管理者,一个纠正者和一个反馈智能体。这些智能体合作处理LLM方法在训练集上的失败,迭代生成和完善一个针对LLM错误的自我纠正指南,模仿人类过程但不涉及人类参与。我们的广泛实验表明,MAGIC的指南优于专家人类创建的指南。我们经验性地发现,MAGIC生成的指南提高了纠正的可解释性,为分析LLM在自我纠正中失败和成功的原因提供了洞察。我们将所有智能体的互动公开提供给研究社区,促进进一步研究,为未来探索自动生成自我纠正指南提供一个合成数据集。
更新时间: 2024-06-18 15:06:06
领域: cs.CL,cs.AI,cs.DB,cs.HC
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games. Compared to GPT-4o, novice and expert players perform better, with expert human players significantly outperforming GPT-4o. To deepen our understanding we create a taxonomy of the knowledge types required to successfully categorize words in the Connections game, revealing that LLMs struggle with associative, encyclopedic, and linguistic knowledge. Our findings establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems.
Updated: 2024-06-18 15:02:28
标题: 连接点:使用《纽约时报》连接词游戏评估LLM的抽象推理能力
摘要: 《纽约时报连接》游戏已成为文字谜题爱好者们喜爱且具有挑战性的追求。我们收集了200个《连接》游戏,以评估最先进的大型语言模型(LLMs)在专家和新手人类玩家面前的表现。我们的结果显示,即使是表现最佳的LLM,GPT-4o,在各种基准测试中展现出令人印象深刻的推理能力,也只能完全解决8%的游戏。与GPT-4o相比,新手和专家玩家表现更好,专家人类玩家明显优于GPT-4o。为了加深我们的理解,我们创建了一个知识类型分类法,以成功分类《连接》游戏中的单词所需的知识类型,揭示了LLMs在联想、百科全书和语言知识方面的困难。我们的发现将《纽约时报连接》游戏确立为评估人类和人工智能系统抽象推理能力的具有挑战性的基准。
更新时间: 2024-06-18 15:02:28
领域: cs.CL,cs.AI
The Lie Derivative for Measuring Learned Equivariance
Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, which have no explicit architectural bias towards equivariance, challenges this narrative and suggests that augmentations and training data might also play a significant role in their performance. In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. Using the Lie derivative, we study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. The scale of our analysis allows us to separate the impact of architecture from other factors like model size or training method. Surprisingly, we find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities, and that as models get larger and more accurate they tend to display more equivariance, regardless of architecture. For example, transformers can be more equivariant than convolutional neural networks after training.
Updated: 2024-06-18 15:01:13
标题: 用于测量学习等变性的李导数
摘要: 等变性确保模型的预测捕捉数据中的关键对称性。当图像被平移或旋转时,等变模型对该图像的表示会相应地进行平移或旋转。卷积神经网络的成功历史上与直接编码在其架构中的平移等变性相关联。视觉transformers的崛起成功挑战了这一说法,并表明增强和训练数据也可能在其性能中发挥重要作用。为了更好地理解最近视觉模型中等变性的作用,我们引入了Lie导数,一种具有坚实数学基础和最小超参数的等变性测量方法。使用Lie导数,我们研究了数百个预训练模型的等变性属性,涵盖了CNNs、transformers和Mixer架构。我们的分析规模使我们能够将架构的影响与模型大小或训练方法等其他因素分开。令人惊讶的是,我们发现许多等变性违例可以与普遍网络层中的空间混叠(如逐点非线性)联系起来,并且随着模型变得更大和更准确,它们往往会显示更多的等变性,不管架构如何。例如,在训练后,transformers可以比卷积神经网络更具等变性。
更新时间: 2024-06-18 15:01:13
领域: cs.LG,cs.AI,cs.CV,stat.ML
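
For readers wanting the measurement made concrete: the Lie derivative can be estimated with a finite difference between transforming-then-encoding and encoding-then-transforming. Below is a minimal NumPy sketch assuming 1D signals and a toy translation group; the step size and the moving-average stand-in model are illustrative, not the paper's implementation.

    import numpy as np

    def translate(x, t):
        """The group action g_t: circularly shift a 1D signal by t samples."""
        return np.roll(x, t)

    def lie_derivative_norm(f, x, t=1):
        """Finite-difference estimate of the equivariance error of f at x.

        For an exactly translation-equivariant f, f(g_t x) == g_t f(x) and
        the result is ~0; larger values indicate equivariance violations.
        """
        gap = f(translate(x, t)) - translate(f(x), t)
        return np.linalg.norm(gap) / t

    # Toy check: a moving average is (nearly) translation equivariant.
    f = lambda x: np.convolve(x, np.ones(3) / 3, mode="same")
    x = np.random.randn(64)
    print(lie_derivative_norm(f, x))  # close to 0 up to boundary effects
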
Spatial Sequence Attention Network for Schizophrenia Classification from Structural Brain MR Images
Schizophrenia is a debilitating, chronic mental disorder that significantly impacts an individual's cognitive abilities, behavior, and social interactions. It is characterized by subtle morphological changes in the brain, particularly in the gray matter. These changes are often imperceptible through manual observation, demanding an automated approach to diagnosis. This study introduces a deep learning methodology for the classification of individuals with Schizophrenia. We achieve this by implementing a diversified attention mechanism known as Spatial Sequence Attention (SSA) which is designed to extract and emphasize significant feature representations from structural MRI (sMRI). Initially, we employ the transfer learning paradigm by leveraging pre-trained DenseNet to extract initial feature maps from the final convolutional block which contains morphological alterations associated with Schizophrenia. These features are further processed by the proposed SSA to capture and emphasize intricate spatial interactions and relationships across volumes within the brain. Our experimental studies conducted on a clinical dataset have revealed that the proposed attention mechanism outperforms the existing Squeeze & Excitation Network for Schizophrenia classification.
Updated: 2024-06-18 14:55:41
标题: 空间序列注意力网络用于从结构脑MR图像中对精神分裂症进行分类
摘要: 精神分裂症是一种严重影响个体认知能力、行为和社交互动的慢性精神障碍。其特点是大脑中的微小形态学变化,特别是在灰质方面。这些变化通常通过手动观察难以察觉,需要自动化方法进行诊断。本研究引入了一种用于精神分裂症个体分类的深度学习方法。我们通过实施一种被称为空间序列注意力(SSA)的多样化注意力机制来实现这一目标,该机制旨在从结构磁共振成像(sMRI)中提取和强调重要的特征表示。最初,我们利用迁移学习范式,通过利用预训练的DenseNet从包含与精神分裂症相关的形态学改变的最终卷积块中提取初始特征图。这些特征进一步由提出的SSA处理,以捕捉并强调大脑内体积之间复杂的空间交互作用和关系。我们在临床数据集上进行的实验研究表明,提出的注意力机制在精神分裂症分类方面优于现有的Squeeze & Excitation Network。
更新时间: 2024-06-18 14:55:41
领域: cs.CV,cs.LG
Code Agents are State of the Art Software Testers
Rigorous software testing is crucial for developing and maintaining high-quality code, making automated test generation a promising avenue for both improving software quality and boosting the effectiveness of code generation methods. However, while code generation with Large Language Models (LLMs) is an extraordinarily active research area, test generation remains relatively unexplored. We address this gap and investigate the capability of LLM-based Code Agents for formalizing user issues into test cases. To this end, we propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests. We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed specifically for test generation. Further, as test generation is a similar but more structured task than code generation, it allows for a more fine-grained analysis using fail-to-pass rate and coverage metrics, providing a dual metric for analyzing systems designed for code repair. Finally, we find that generated tests are an effective filter for proposed code fixes, doubling the precision of SWE-Agent.
Updated: 2024-06-18 14:54:37
标题: 代码代理是最先进的软件测试工具
摘要: 严格的软件测试对于开发和维护高质量代码至关重要,因此自动化测试生成是改善软件质量和提高代码生成方法效果的一个有前途的途径。然而,尽管使用大型语言模型(LLMs)进行代码生成是一个非常活跃的研究领域,测试生成仍然相对未被探索。我们填补了这一空白,研究了基于LLM的代码代理的能力,将用户问题形式化为测试用例。为此,我们提出了一个基于流行的GitHub存储库的新型基准,其中包含真实世界的问题、标准修补程序和黄金测试。我们发现,LLMs通常在使用专门设计用于代码修复的代码代理生成相关的测试用例方面表现出色,超过了专门设计用于测试生成的系统的性能。此外,由于测试生成是一个与代码生成类似但更结构化的任务,它允许使用失败转化率和覆盖率指标进行更细粒度的分析,提供了用于分析专门设计用于代码修复的系统的双重指标。最后,我们发现生成的测试是对提议的代码修复的有效过滤器,将SWE-Agent的精确度提高了一倍。
更新时间: 2024-06-18 14:54:37
领域: cs.SE,cs.AI,cs.LG
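
The fail-to-pass rate mentioned above has a direct operational reading: a useful generated test should fail on the unpatched (buggy) code and pass once the ground-truth patch is applied. A minimal sketch of that bookkeeping follows; the run_test interface is a hypothetical placeholder for an actual test runner.

    def fail_to_pass_rate(tests, run_test):
        """Fraction of generated tests that fail on the buggy code and pass
        once the ground-truth patch is applied.

        run_test(test, patched) -> bool is a hypothetical runner returning
        True when the test passes.
        """
        good = sum(
            (not run_test(t, patched=False)) and run_test(t, patched=True)
            for t in tests
        )
        return good / max(len(tests), 1)
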
WRDScore: New Metric for Evaluation of Natural Language Generation Models
The problem of natural language generation, and, more specifically, method name prediction, faces significant difficulties when proposed models need to be evaluated on test data. A suitable evaluation metric would need to account for the versatility with which a single method can be named, with respect to both semantics and syntax. Measuring the direct overlap between the predicted and reference (true) sequences cannot capture these subtleties. Other existing embedding-based metrics either do not measure precision and recall or impose strict, unrealistic assumptions on both sequences. To address these issues, we propose a new metric that, on the one hand, is very simple and lightweight, and, on the other hand, is able to calculate precision and recall without resorting to any assumptions while achieving good agreement with human judgement.
Updated: 2024-06-18 14:53:24
标题: WRDScore:自然语言生成模型评估的新指标
摘要: 自然语言生成问题,更具体地说是方法名称预测问题,在需要在测试数据上评估所提出的模型时面临重大困难。合适的评估度量需要考虑单个方法在语义和句法方面的命名灵活性。直接测量预测序列和参考(真实)序列之间的重叠不能捕捉这些微妙之处。其他现有的基于嵌入的度量要么不测量精确度和召回率,要么对两个序列都施加严格且不切实际的假设。为了解决这些问题,我们提出了一种新的度量方法,一方面非常简单轻量,另一方面能够在不借助任何假设的情况下计算精确度和召回率,同时与人类判断保持良好一致。
更新时间: 2024-06-18 14:53:24
领域: cs.CL,cs.AI
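
The abstract does not spell out WRDScore's formula, so as a hedged illustration of the general idea, embedding-based precision and recall without hard alignment assumptions, here is a greedy cosine-matching sketch in the spirit of soft token matching (all names are illustrative, not the metric's actual definition):

    import numpy as np

    def soft_precision_recall(pred_emb, ref_emb):
        """Embedding-based precision/recall via greedy cosine matching.

        pred_emb: (m, d) embeddings of predicted tokens
        ref_emb:  (n, d) embeddings of reference tokens
        """
        p = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
        r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
        sim = p @ r.T                       # (m, n) cosine similarities
        precision = sim.max(axis=1).mean()  # best match per predicted token
        recall = sim.max(axis=0).mean()     # best match per reference token
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1
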
Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression
Due to their flexibility and theoretical tractability, Gaussian process (GP) regression models have become a central topic in modern statistics and machine learning. While the true posterior in these models is given explicitly, numerical evaluations depend on the inversion of the augmented kernel matrix $ K + \sigma^2 I $, which requires up to $ O(n^3) $ operations. For the large sample sizes n typical of modern applications, this is computationally infeasible and necessitates the use of an approximate version of the posterior. Although such methods are widely used in practice, they typically have very limited theoretical underpinning. In this context, we analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics. They can be interpreted in terms of Lanczos approximate eigenvectors of the kernel matrix or a conjugate gradient approximation of the posterior mean, which are particularly advantageous in truly large-scale applications, as they are fundamentally based only on matrix-vector multiplications amenable to the GPU acceleration of modern software frameworks. We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates. Our theoretical findings are illustrated by numerical experiments.
Updated: 2024-06-18 14:50:42
标题: 共轭梯度和Lanczos近似后验在高斯过程回归中的收缩率
摘要: 由于其灵活性和理论上的可处理性,高斯过程(GP)回归模型已成为现代统计学和机器学习中的核心主题。尽管这些模型中的真实后验已明确给出,但数值评估取决于增广核矩阵$K + \sigma^2 I$的求逆,这需要高达$O(n^3)$的操作。对于现代应用中常见的大样本量n,这在计算上是不可行的,需要使用后验的近似版本。尽管这类方法在实践中被广泛使用,但它们通常只有非常有限的理论基础。 在这种背景下,我们分析了概率数值领域最近提出的一类近似算法。它们可以解释为核矩阵的Lanczos近似特征向量或后验均值的共轭梯度近似,这在真正大规模的应用中特别有优势,因为它们基本上只基于适合现代软件框架GPU加速的矩阵向量乘法。我们将数值分析文献中的结果与关于核矩阵谱的最新集中性结果相结合,得到极小化极大(minimax)收缩速率。我们的理论发现通过数值实验进行了说明。
更新时间: 2024-06-18 14:50:42
领域: stat.ML,cs.LG,math.ST,stat.TH
Memorization in Self-Supervised Learning Improves Downstream Generalization
Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data, often scraped from the internet. This data can still be sensitive, and empirical evidence suggests that SSL encoders memorize private information of their training data and can disclose it at inference time. Since existing theoretical definitions of memorization from supervised learning rely on labels, they do not transfer to SSL. To address this gap, we propose SSLMem, a framework for defining memorization within SSL. Our definition compares the difference in alignment of representations for data points and their augmented views returned by both encoders that were trained on these data points and encoders that were not. Through comprehensive empirical analysis on diverse encoder architectures and datasets, we highlight that even though SSL relies on large datasets and strong augmentations, both known in supervised learning as regularization techniques that reduce overfitting, significant fractions of training data points still experience high memorization. Through our empirical results, we show that this memorization is essential for encoders to achieve higher generalization performance on different downstream tasks.
Updated: 2024-06-18 14:49:32
标题: 自监督学习中的记忆化提高下游泛化
摘要: 最近,自监督学习(SSL)受到了重视,因为它能够在纯粹使用未标记数据(通常来自互联网抓取)训练高性能编码器。这些数据可能仍然是敏感的,经验证据表明,SSL编码器会记住训练数据的私人信息,并在推断时披露这些信息。由于现有的监督学习中关于记忆的理论定义依赖于标签,因此不能直接应用于SSL。为了弥补这一差距,我们提出了SSLMem,一个用于在SSL中定义记忆的框架。我们的定义比较了在数据点及其增强视图的表示之间对齐的差异,这些数据点由训练过这些数据点的编码器和未经训练的编码器返回。通过对不同编码器架构和数据集进行全面的经验分析,我们强调,尽管SSL依赖于大型数据集和强大的增强技术(在监督学习中被视为减少过拟合的正则化技术),仍然有相当大比例的训练数据点经历高度的记忆。通过我们的经验结果,我们表明这种记忆对于编码器在不同下游任务上实现更高的泛化性能是必不可少的。
更新时间: 2024-06-18 14:49:32
领域: cs.LG
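
Schematically, the SSLMem definition can be read as an alignment difference: how much more tightly an encoder trained on a point clusters that point's augmented views than an encoder trained without it. A hedged NumPy sketch in which the encoders and the augmentation are stand-in callables:

    import numpy as np

    def alignment(encoder, x, augment, n_views=8):
        """Mean pairwise cosine similarity of representations of augmented views."""
        reps = np.stack([encoder(augment(x)) for _ in range(n_views)])
        reps /= np.linalg.norm(reps, axis=1, keepdims=True)
        iu = np.triu_indices(n_views, k=1)
        return (reps @ reps.T)[iu].mean()

    def memorization_score(f_in, f_out, x, augment):
        """SSLMem-style score: alignment gain from having trained on x.

        f_in was trained on a dataset containing x; f_out on one without it.
        """
        return alignment(f_in, x, augment) - alignment(f_out, x, augment)
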
Large Language Models Are Zero-Shot Time Series Forecasters
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases toward simplicity and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data through non-numerical text, without imputation, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers and because of poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.
Updated: 2024-06-18 14:48:38
标题: 大型语言模型是零样本时间序列预测器
摘要: 通过将时间序列编码为一串数字,我们可以将时间序列预测视为文本中的下一个标记预测。通过开发这种方法,我们发现大型语言模型(LLMs)如GPT-3和LLaMA-2可以出人意料地在零样本情况下外推时间序列,其水平可与或超过在下游任务上训练的专门构建的时间序列模型的性能。为了促进这种性能,我们提出了有效对时间序列数据进行标记化和将离散分布转换为连续值高度灵活的密度的程序。我们认为LLMs在时间序列中的成功源于它们自然表示多模态分布的能力,以及对简单性和重复的偏见,这与许多时间序列中的显著特征相一致,如重复的季节性趋势。我们还展示了LLMs如何可以自然地处理缺失数据,而无需插补通过非数字文本,容纳文本侧信息,并回答问题以帮助解释预测。虽然我们发现增加模型大小通常会改善时间序列的性能,但我们展示了GPT-4可能比GPT-3表现更差,这是因为它如何标记数字,以及不良的不确定性校准,这可能是对齐干预(如RLHF)的结果。
更新时间: 2024-06-18 14:48:38
领域: cs.LG
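
The encoding step is simple to reproduce: render each value with a fixed number of decimals and separate the digits so the tokenizer cannot merge them. The paper's exact scheme varies by tokenizer; the sketch below assumes one common space-separated variant.

    def serialize_series(values, decimals=2):
        """Render a numeric series as spaced digit tokens, e.g.
        [1.23, 4.5, 6.78] -> '1 2 3 , 4 5 0 , 6 7 8'."""
        tokens = []
        for v in values:
            digits = f"{abs(v):.{decimals}f}".replace(".", "")
            sign = "- " if v < 0 else ""
            tokens.append(sign + " ".join(digits))
        return " , ".join(tokens)

    prompt = serialize_series([1.23, 4.5, 6.78])
    # The LLM continues this string; the completion is parsed back into
    # numbers, and sampled continuations yield a predictive distribution.
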
Sparsifying dimensionality reduction of PDE solution data with Bregman learning
Classical model reduction techniques project the governing equations onto a linear subspace of the original state space. More recent data-driven techniques use neural networks to enable nonlinear projections. Whilst those often enable stronger compression, they may have redundant parameters and lead to suboptimal latent dimensionality. To overcome these issues, we propose a multistep algorithm that induces sparsity in the encoder-decoder networks for effective reduction in the number of parameters and additional compression of the latent space. This algorithm starts with a sparsely initialized network and trains it using linearized Bregman iterations. These iterations have been very successful in computer vision and compressed sensing tasks, but have not yet been used for reduced-order modelling. After the training, we further compress the latent space dimensionality by using a form of proper orthogonal decomposition. Finally, we use a bias propagation technique to convert the induced sparsity into an effective reduction in the number of parameters. We apply this algorithm to three representative PDE models: 1D diffusion, 1D advection, and 2D reaction-diffusion. Compared to conventional training methods like Adam, the proposed method achieves similar accuracy with 30% fewer parameters and a significantly smaller latent space.
Updated: 2024-06-18 14:45:30
标题: 使用Bregman学习稀疏化PDE解数据的降维
摘要: 经典的模型简化技术将控制方程投影到原始状态空间的线性子空间上。最近的数据驱动技术使用神经网络实现非线性投影。虽然这些技术通常能够实现更强的压缩,但可能存在冗余参数并导致次优潜在维度。为了克服这些问题,我们提出了一种多步算法,通过在编码器-解码器网络中引入稀疏性,有效减少参数数量并进一步压缩潜在空间。该算法从稀疏初始化网络开始,并使用线性化的Bregman迭代进行训练。这些迭代在计算机视觉和压缩感知任务中取得了很大成功,但尚未用于降阶建模。在训练之后,我们进一步通过一种适当的正交分解形式压缩潜在空间维度。最后,我们使用偏差传播技术将引入的稀疏性转化为参数有效减少。我们将该算法应用于三个代表性的偏微分方程模型:1D扩散、1D对流和2D反应-扩散。与传统的Adam等训练方法相比,所提出的方法在具有30%更少参数和显著更小潜在空间的情况下实现了类似的精度。
更新时间: 2024-06-18 14:45:30
领域: math.NA,cs.AI,cs.NA,stat.ML,65K10 (Primary) 68T07, 65D99, 41A63 (Secondary),G.1.6; I.2.6
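
Linearized Bregman iterations keep a dense dual variable that accumulates plain gradient steps, while the actual weights pass through a shrinkage map and therefore stay exactly zero until enough gradient evidence accumulates. A generic sketch for an arbitrary differentiable loss (step sizes and names illustrative, not the paper's settings):

    import numpy as np

    def shrink(v, lam):
        """Soft thresholding: the sparsity-inducing proximal map."""
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def linearized_bregman(grad_loss, dim, lam=0.1, tau=0.01, steps=1000):
        v = np.zeros(dim)      # dense dual variable, accumulates gradients
        theta = np.zeros(dim)  # sparse primal weights
        for _ in range(steps):
            v -= tau * grad_loss(theta)  # plain gradient step on the dual
            theta = shrink(v, lam)       # weights stay 0 until |v| > lam
        return theta
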
Stealth edits for provably fixing or attacking large language models
We reveal new methods and the theoretical foundations of techniques for editing large language models. We also show how the new theory can be used to assess the editability of models and to expose their susceptibility to previously unknown malicious attacks. Our theoretical approach shows that a single metric (a specific measure of the intrinsic dimensionality of the model's features) is fundamental to predicting the success of popular editing approaches, and reveals new bridges between disparate families of editing methods. We collectively refer to these approaches as stealth editing methods, because they aim to directly and inexpensively update a model's weights to correct the model's responses to known hallucinating prompts without otherwise affecting the model's behaviour, without requiring retraining. By carefully applying the insight gleaned from our theoretical investigation, we are able to introduce a new network block -- named a jet-pack block -- which is optimised for highly selective model editing, uses only standard network operations, and can be inserted into existing networks. The intrinsic dimensionality metric also determines the vulnerability of a language model to a stealth attack: a small change to a model's weights which changes its response to a single attacker-chosen prompt. Stealth attacks do not require access to or knowledge of the model's training data, therefore representing a potent yet previously unrecognised threat to redistributed foundation models. They are computationally simple enough to be implemented in malware in many cases. Extensive experimental results illustrate and support the method and its theoretical underpinnings. Demos and source code for editing language models are available at https://github.com/qinghua-zhou/stealth-edits.
Updated: 2024-06-18 14:43:18
标题: 隐蔽编辑用于可证明修复或攻击大型语言模型
摘要: 我们揭示了编辑大型语言模型的新方法和理论基础技术。我们还展示了如何利用新理论来评估模型的可编辑性,并暴露它们对先前未知的恶意攻击的敏感性。我们的理论方法显示,一个单一的度量(模型特征的固有维度的特定度量)对于预测流行的编辑方法的成功至关重要,并揭示了编辑方法不同家族之间的新桥梁。我们将这些方法统称为隐形编辑方法,因为它们旨在直接且廉价地更新模型的权重,以纠正模型对已知幻觉提示的响应,而不影响模型的行为,也不需要重新训练。通过仔细应用我们理论研究所得到的见解,我们能够引入一种新的网络模块—名为喷气背包模块—该模块针对高度选择性的模型编辑进行了优化,仅使用标准网络操作,并可以插入现有网络中。固有维度度量还确定了语言模型对隐形攻击的脆弱性:对模型权重的微小更改可能会改变它对单个攻击者选择的提示的响应。隐形攻击不需要访问或了解模型的训练数据,因此对重新分布的基础模型构成了一个强大但以前未被认识的威胁。在许多情况下,它们的计算简单到可以在恶意软件中实现。广泛的实验结果展示并支持了该方法及其理论基础。编辑语言模型的演示和源代码可在https://github.com/qinghua-zhou/stealth-edits 上找到。
更新时间: 2024-06-18 14:43:18
领域: cs.AI,cs.LG,68T07, 68T50, 68W40,I.2.7; F.2.0
A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning
In 2021, Adam Zsolt Wagner proposed an approach to disprove conjectures in graph theory using Reinforcement Learning (RL). Wagner's idea can be framed as follows: consider a conjecture, such as a certain quantity f(G) < 0 for every graph G; one can then play a single-player graph-building game, where at each turn the player decides whether to add an edge or not. The game ends when all edges have been considered, resulting in a certain graph G_T, and f(G_T) is the final score of the game; RL is then used to maximize this score. This brilliant idea is as simple as it is innovative, and it lends itself to systematic generalization. Several different single-player graph-building games can be employed, along with various RL algorithms. Moreover, RL maximizes the cumulative reward, allowing for step-by-step rewards instead of a single final score, provided the final cumulative reward represents the quantity of interest f(G_T). In this paper, we discuss these and various other choices that can be significant in Wagner's framework. As a contribution to this systematization, we present four distinct single-player graph-building games. Each game employs both a step-by-step reward system and a single final score. We also propose a principled approach to select the most suitable neural network architecture for any given conjecture, and introduce a new dataset of graphs labeled with their Laplacian spectra. Furthermore, we provide a counterexample for a conjecture regarding the sum of the matching number and the spectral radius, which is simpler than the example provided in Wagner's original paper. The games have been implemented as environments in the Gymnasium framework, and along with the dataset, are available as open-source supplementary materials.
Updated: 2024-06-18 14:40:20
标题: Wagner框架的系统化:图论猜想和强化学习
摘要: 2021年,Adam Zsolt Wagner提出了一种使用强化学习(RL)来证伪图论猜想的方法。Wagner的想法可以概括如下:考虑一个猜想,比如对于每个图G,某个数量f(G) < 0;然后可以玩一个单人图构建游戏,在每个回合玩家决定是否添加一条边。游戏在所有边都被考虑完时结束,得到一个特定的图G_T,f(G_T)是游戏的最终得分;然后使用RL来最大化这个得分。这个聪明的想法既简单又创新,并且适用于系统化的推广。可以使用几种不同的单人图构建游戏,以及各种RL算法。此外,RL最大化了累积奖励,允许逐步奖励而不是单个最终得分,只要最终累积奖励代表感兴趣的数量f(G_T)。在本文中,我们讨论了在Wagner的框架中可能具有重要意义的这些选择以及其他选择。作为对这种系统化的贡献,我们提出了四种独特的单人图构建游戏。每个游戏都采用逐步奖励系统和单个最终得分。我们还提出了一种有原则的方法来选择针对任何给定猜想最适合的神经网络架构,并介绍了一个带有其拉普拉斯谱标签的新图数据集。此外,我们为关于匹配数和谱半径之和的猜想提供了一个反例,比Wagner原始论文中提供的例子更简单。 这些游戏已经被实现为Gymnasium框架中的环境,并与数据集一起作为开源附加材料提供。
更新时间: 2024-06-18 14:40:20
领域: cs.LG
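
In code, the single-player game is just a loop over candidate edges with a binary decision at each step and the conjectured quantity as the final score. The sketch below uses Wagner's first conjecture as the example score, f(G) = sqrt(n-1) + 1 - (lambda_1 + mu), with lambda_1 the spectral radius and mu the matching number, so a positive final score would be a counterexample; a random policy stands in for the trained RL agent.

    import networkx as nx
    import numpy as np

    def final_score(G):
        """sqrt(n-1) + 1 - (spectral radius + matching number)."""
        n = G.number_of_nodes()
        lam1 = np.linalg.eigvalsh(nx.to_numpy_array(G)).max()
        mu = len(nx.max_weight_matching(G, maxcardinality=True))
        return np.sqrt(n - 1) + 1 - (lam1 + mu)

    def play_episode(n=19, policy=lambda state: np.random.rand() < 0.5):
        G = nx.empty_graph(n)
        for u in range(n):              # visit every candidate edge once
            for v in range(u + 1, n):
                if policy((G, u, v)):   # the agent's add/skip decision
                    G.add_edge(u, v)
        return final_score(G)           # the reward the RL agent maximizes

    print(play_episode())
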
CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis
The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, there have not been efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration to explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset called CollabStory. We focus on single-author ($N=1$) to multi-author (up to $N=5$) scenarios, where multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks that have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks for multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help propel an understanding as well as the development of techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks since LLM-LLM collaboration could potentially overwhelm ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/multi_llm_story_writing.
Updated: 2024-06-18 14:35:12
标题: CollabStory: 多LLM协作故事生成和作者分析
摘要: 具有统一框架的崛起,使得大型语言模型(LLMs)能够实现无缝互操作性,为开放性任务中的LLM-LLM协作提供了可能性。尽管如此,目前尚未有探索这种协作写作的努力。我们迈出了超越人类-LLM协作的下一步,通过生成首个专门由LLM生成的协作故事数据集CollabStory,来探索这种多LLM情境。我们专注于单一作者($N=1$)到多作者(最多$N=5$)的情景,其中多个LLM共同撰写故事。我们使用开源指导调整的LLMs生成了超过32,000个故事。此外,我们从已为人类-人类多作者写作任务和分析设定标准的PAN任务中汲取灵感。我们扩展了他们针对多LLM设置的与作者相关的任务,并提出LLM-LLM协作的基准。我们发现当前的基准无法处理这种新兴情境。因此,CollabStory是一个资源,可以帮助推动对多LLMs使用的理解和技术发展。在写作任务的背景下进行研究是至关重要的,因为LLM-LLM协作可能会对关于抄袭检测、归因分配、在教育环境中保持学术诚信以及解决版权侵权问题等正在进行的挑战造成潜在压力。我们将我们的数据集和代码提供在\texttt{\url{https://github.com/saranya-venkatraman/multi_llm_story_writing}}。
更新时间: 2024-06-18 14:35:12
领域: cs.CL,cs.AI
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
Large Vision-Language Models (LVLMs) excel in integrating visual and linguistic contexts to produce detailed content, facilitating applications such as image captioning. However, using LVLMs to generate descriptions often faces the challenge of object hallucination (OH), where the output text misrepresents actual objects in the input image. While previous studies attribute the occurrence of OH to the inclusion of more details, our study finds technical flaws in existing metrics, leading to unreliable evaluations of models and conclusions about OH. This has sparked a debate on the question: Do more details always introduce more hallucinations in LVLM-based image captioning? In this paper, we address this debate by proposing a novel decoding strategy, Differentiated Beam Decoding (DBD), along with a reliable new set of evaluation metrics: CLIP-Precision, CLIP-Recall, and CLIP-F1. DBD decodes the wealth of information hidden in visual input into distinct language representations called unit facts in parallel. This decoding is achieved via a well-designed differential score that guides the parallel search and candidate screening. The selected unit facts are then aggregated to generate the final caption. Our proposed metrics evaluate the comprehensiveness and accuracy of image captions by comparing the embedding groups of ground-truth image regions and generated text partitions. Extensive experiments on the Visual Genome dataset validate the effectiveness of our approach, demonstrating that it produces detailed descriptions while maintaining low hallucination levels.
Updated: 2024-06-18 14:33:56
标题: 基于LVLM的图像描述中,更多细节总是会引入更多幻觉吗?
摘要: 大型视觉语言模型(LVLMs)在整合视觉和语言上下文以产生详细内容方面表现出色,促进了诸如图像字幕生成等应用。然而,使用LVLMs生成描述往往面临对象幻觉(OH)的挑战,即输出文本误代实际输入图像中的对象。尽管先前的研究将OH的发生归因于包含更多细节,但我们的研究发现现有度量标准中存在技术缺陷,导致对模型的不可靠评估和关于OH的结论。这引发了一个争论:在基于LVLM的图像字幕生成中,更多的细节是否总是会引入更多的幻觉? 在本文中,我们通过提出一种新颖的解码策略Differentiated Beam Decoding(DBD),以及一组可靠的新评估指标:CLIP-Precision、CLIP-Recall和CLIP-F1来解决这一争论。DBD将隐藏在视觉输入中的丰富信息并行解码为称为单元事实的不同语言表示。这种解码是通过一个精心设计的差分分数实现的,该分数引导并行搜索和候选筛选。然后,选择的单元事实被聚合以生成最终的字幕。我们提出的指标通过比较地面真实图像区域的嵌入组和生成文本分区来评估图像字幕的全面性和准确性。对Visual Genome数据集的大量实验验证了我们方法的有效性,证明它能够生成详细描述同时保持低幻觉水平。
更新时间: 2024-06-18 14:33:56
领域: cs.CV,cs.AI
SCORE: A 1D Reparameterization Technique to Break Bayesian Optimization's Curse of Dimensionality
Bayesian optimization (BO) has emerged as a powerful tool for navigating complex search spaces, showcasing practical applications in the fields of science and engineering. However, since it typically relies on a surrogate model to approximate the objective function, BO grapples with heightened computational costs that tend to escalate as the number of parameters and experiments grows. Several methods such as parallelization, surrogate model approximations, and memory pruning have been proposed to cut down computing time, but they all fall short of resolving the core issue behind BO's curse of dimensionality. In this paper, a 1D reparameterization trick is proposed to break this curse and sustain linear time complexity for BO in high-dimensional landscapes. This fast and scalable approach, named SCORE, can successfully find the global minimum of needle-in-a-haystack optimization functions and fit real-world data without the high-performance computing resources typically required by state-of-the-art techniques.
Updated: 2024-06-18 14:28:29
标题: 得分:一种一维重新参数化技术,用于打破贝叶斯优化的维度诅咒
摘要: 贝叶斯优化(BO)已经成为一种在复杂搜索空间中导航的强大工具,展示了在科学和工程领域的实际应用。然而,由于它通常依赖于一个替代模型来近似目标函数,BO面临着随着参数和实验数量增加而不断增加的计算成本。已经提出了几种方法,如并行化、替代模型近似和内存修剪,以减少计算时间,但它们都无法解决BO维度诅咒背后的核心问题。本文提出了一个一维重新参数化技巧,以打破这个诅咒,在高维度景观中维持BO的线性时间复杂度。这种快速和可扩展的方法命名为SCORE,可以成功地找到针在草堆优化函数的全局最小值,并且适应真实世界的数据,而不需要通常由最先进技术所需的高性能计算资源。
更新时间: 2024-06-18 14:28:29
领域: cs.LG,stat.ML
Investigating the Role of Explainability and AI Literacy in User Compliance
AI is becoming increasingly common across different domains. However, as sophisticated AI-based systems are often black-boxed, rendering the decision-making logic opaque, users find it challenging to comply with their recommendations. Although researchers are investigating Explainable AI (XAI) to increase the transparency of the underlying machine learning models, it is unclear what types of explanations are effective and what other factors increase compliance. To better understand the interplay of these factors, we conducted an experiment with 562 participants who were presented with the recommendations of an AI and two different types of XAI. We find that users' compliance increases with the introduction of XAI but is also affected by AI literacy. We also find that the relationships between AI literacy, XAI, and users' compliance are mediated by the users' mental model of AI. Our study has several implications for successfully designing AI-based systems utilizing XAI.
Updated: 2024-06-18 14:28:12
标题: 调查解释性和人工智能素养在用户遵从性中的作用
摘要: 人工智能在不同领域越来越普遍。然而,由于复杂的基于人工智能的系统往往是黑盒的,决策逻辑不透明,用户发现很难遵循其建议。尽管研究人员正在研究可解释人工智能(XAI)以增加底层机器学习模型的透明度,但目前尚不清楚哪种类型的解释是有效的,以及什么其他因素会增加遵从性。为了更好地理解这些因素的相互作用,我们进行了一个实验,涉及562名参与者,他们被呈现了一个人工智能的建议和两种不同类型的XAI。我们发现用户在引入XAI后遵从性增加,但也受到人工智能素养的影响。我们还发现人工智能素养、XAI和用户遵从性之间的关系受用户对人工智能的心理模型的中介作用影响。我们的研究对成功设计利用XAI的基于人工智能系统具有几方面意义。
更新时间: 2024-06-18 14:28:12
领域: cs.AI
A variational Bayes approach to debiased inference for low-dimensional parameters in high-dimensional linear regression
We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field approximation to the nuisance coordinates and carefully modelling the conditional distribution of the target given the nuisance. This requires only a preprocessing step and preserves the computational advantages of mean-field variational Bayes, while ensuring accurate and reliable inference for the target parameter, including for uncertainty quantification. We investigate the numerical performance of our algorithm, showing that it performs competitively with existing methods. We further establish accompanying theoretical guarantees for estimation and uncertainty quantification in the form of a Bernstein--von Mises theorem.
Updated: 2024-06-18 14:27:44
标题: 一种变分贝叶斯方法用于高维线性回归中低维参数的无偏推断
摘要: 我们提出了一种可扩展的变分贝叶斯方法,用于稀疏线性回归中高维参数的单个或低维子集的统计推断。我们的方法依赖于将均场近似分配给无关坐标,并仔细建模给定无关坐标时目标的条件分布。这仅需要一个预处理步骤,并保留了均场变分贝叶斯的计算优势,同时确保了目标参数的准确和可靠推断,包括不确定性量化。我们研究了算法的数值性能,表明它与现有方法具有竞争力。我们进一步建立了估计和不确定性量化的理论保证,形式为Bernstein-von Mises定理。
更新时间: 2024-06-18 14:27:44
领域: stat.ML,cs.LG,math.ST,stat.TH,62
Federated Learning with a Single Shared Image
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained by each client model to the server. One popular method, FedDF, uses distillation to tackle this task over a common, shared dataset on which predictions are exchanged. However, in many contexts such a dataset might be difficult to acquire due to privacy concerns, and clients might not allow for storage of a large shared dataset. To this end, in this paper, we introduce a new method that improves this knowledge distillation approach to rely on only a single image shared between clients and server. In particular, we propose a novel adaptive dataset pruning algorithm that selects the most informative crops generated from only a single image. With this, we show that federated learning with distillation under a limited shared dataset budget works better by using a single image compared to multiple individual ones. Finally, we extend our approach to allow for training heterogeneous client architectures by incorporating a non-uniform distillation schedule and client-model mirroring on the server side.
Updated: 2024-06-18 14:26:09
标题: 使用单个共享图像的联邦学习
摘要: 联邦学习(FL)使多台机器能够共同训练机器学习模型,而无需共享私人训练数据。然而,特别是对于异构模型,一个关键瓶颈仍然是从每个客户端模型向服务器传输所获知识。一种流行的方法FedDF使用蒸馏来处理这个任务,使用一个共同的、共享的数据集,用于交换预测。然而,在许多情况下,这样的数据集可能难以获取,因为隐私问题,客户端可能不允许存储一个大型的共享数据集。因此,在本文中,我们介绍了一种改进这种知识蒸馏方法的新方法,只依赖于客户端和服务器之间的单个共享图像。具体来说,我们提出了一种新颖的自适应数据集修剪算法,从仅一个图像生成的最具信息量的裁剪中选择。通过这种方法,我们展示了在有限的共享数据集预算下,使用单个图像比使用多个单独的图像更好地进行蒸馏的联邦学习。最后,我们扩展了我们的方法,通过在服务器端结合非统一蒸馏时间表和客户端模型镜像,来支持训练异构客户端架构。
更新时间: 2024-06-18 14:26:09
领域: cs.CV,cs.LG
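
The data side of the method reduces to cropping the single shared image and keeping only the most informative crops for prediction exchange. In the sketch below the paper's adaptive pruning criterion is replaced by a pixel-variance heuristic, explicitly a stand-in:

    import numpy as np

    def random_crops(image, size=32, n=500, rng=np.random.default_rng(0)):
        H, W, _ = image.shape
        ys = rng.integers(0, H - size, n)
        xs = rng.integers(0, W - size, n)
        return np.stack([image[y:y + size, x:x + size] for y, x in zip(ys, xs)])

    def prune_crops(crops, keep=100):
        """Keep the most 'informative' crops; pixel variance is only a
        stand-in for the paper's adaptive pruning criterion."""
        scores = crops.reshape(len(crops), -1).var(axis=1)
        return crops[np.argsort(scores)[-keep:]]

    image = np.random.rand(512, 512, 3)  # the single shared image
    distill_set = prune_crops(random_crops(image))
    # Clients exchange predictions on distill_set; the server averages the
    # logits and distills them into the global model (FedDF-style).
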
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks, including the generation of program code from natural language input. However, how to evaluate such LLMs for this task remains an open problem, despite the considerable research effort that has been devoted to evaluating and comparing them. This paper provides a critical review of the existing work on the testing and evaluation of these tools, focusing on two key aspects: the benchmarks and the metrics used in the evaluations. Based on the review, further research directions are discussed.
Updated: 2024-06-18 14:25:34
标题: 代码生成评估的基准和度量标准:一项关键审查
摘要: 随着大型语言模型(LLMs)的快速发展,大量的机器学习模型已经被开发用于辅助编程任务,包括从自然语言输入中生成程序代码。然而,尽管已经付出了大量研究工作来评估和比较这些LLMs,但如何评估这些LLMs在这个任务中的表现仍然是一个悬而未决的问题。本文对现有关于这些工具的测试和评估工作进行了批判性审查,重点关注两个关键方面:评估中使用的基准和指标。根据审查结果,进一步的研究方向被讨论。
更新时间: 2024-06-18 14:25:34
领域: cs.AI,cs.SE
Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics
Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite its advancements, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a novel Ultrasound Embodied Intelligence system that synergistically combines ultrasound robots with large language models (LLMs) and domain-specific knowledge augmentation, enhancing ultrasound robots' intelligence and operational efficiency. Our approach employs a dual strategy: firstly, integrating LLMs with ultrasound robots to interpret doctors' verbal instructions into precise motion planning through a comprehensive understanding of ultrasound domain knowledge, including APIs and operational manuals; secondly, incorporating a dynamic execution mechanism, allowing for real-time adjustments to scanning plans based on patient movements or procedural errors. We demonstrate the effectiveness of our system through extensive experiments, including ablation studies and comparisons across various models, showcasing significant improvements in executing medical procedures from verbal commands. Our findings suggest that the proposed system improves the efficiency and quality of ultrasound scans and paves the way for further advancements in autonomous medical scanning technologies, with the potential to transform non-invasive diagnostics and streamline medical workflows.
Updated: 2024-06-18 14:22:16
标题: 利用具身智能改变超声波机器人手术干预
摘要: 超声波技术已经彻底改变了非侵入性诊断方法,显著提高了各种医学领域患者的预后。尽管技术进步,但将超声技术与机器人系统集成以进行自动扫描仍然存在挑战,包括指令理解和动态执行能力有限。为了解决这些挑战,本文介绍了一种新颖的超声波体现智能系统,将超声波机器人与大型语言模型(LLMs)和领域特定知识增强相结合,增强了超声波机器人的智能和操作效率。我们的方法采用了双重策略:首先,将LLMs与超声波机器人集成,通过全面理解超声波领域知识(包括API和操作手册),将医生的口头指令解释为精确的运动规划;其次,结合动态执行机制,允许根据患者的动作或程序错误实时调整扫描计划。我们通过广泛的实验展示了我们系统的有效性,包括消融研究和各种模型的比较,展示了从口头指令执行医疗程序方面的显著改进。我们的研究结果表明,所提出的系统提高了超声波扫描的效率和质量,并为自主医学扫描技术的进一步发展铺平了道路,有潜力改变非侵入性诊断,并简化医疗工作流程。
更新时间: 2024-06-18 14:22:16
领域: cs.RO,cs.AI,cs.CL,cs.HC
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Vision transformers (ViTs) have emerged as a significant area of focus, particularly for their capacity to be jointly trained with large language models and to serve as robust vision foundation models. Yet, the development of trustworthy explanation methods for ViTs has lagged, particularly in the context of post-hoc interpretations of ViT predictions. Existing sub-image selection approaches, such as feature-attribution and conceptual models, fall short in this regard. This paper proposes five desiderata for explaining ViTs -- faithfulness, stability, sparsity, multi-level structure, and parsimony -- and demonstrates the inadequacy of current methods in meeting these criteria comprehensively. We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. Our qualitative analysis reveals the distributions of patch-level concepts, elucidating the effectiveness of ViTs by modeling the joint distribution of patch embeddings and ViT's predictions. Moreover, these patch-level explanations bridge the gap between image-level and dataset-level explanations, thus completing the multi-level structure of PACE. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that PACE surpasses state-of-the-art methods in terms of the defined desiderata.
Updated: 2024-06-18 14:17:57
标题: 概率性概念解释器:视觉基础模型的可信概念解释
摘要: 视觉转换器(ViTs)已经成为一个重要的关注领域,特别是因为它们可以与大型语言模型联合训练,并作为强大的视觉基础模型。然而,针对ViTs的可信解释方法的发展滞后,特别是在后续解释ViT预测的背景下。现有的子图像选择方法,如特征归因和概念模型,在这方面表现不佳。本文提出了解释ViTs的五个愿望--忠实性、稳定性、稀疏性、多层结构和简洁性,并展示了当前方法在全面满足这些标准方面的不足之处。我们引入了一个变分贝叶斯解释框架,命名为ProbAbilistic Concept Explainers(PACE),该框架模拟了补丁嵌入的分布,以提供可信赖的后续概念解释。我们的定性分析揭示了补丁级别概念的分布,通过建模补丁嵌入和ViT预测的联合分布,阐明了ViTs的有效性。此外,这些补丁级别的解释弥合了图像级别和数据集级别解释之间的差距,从而完成了PACE的多层结构。通过在合成和真实数据集上进行广泛实验,我们证明了PACE在定义的愿望方面超越了最先进的方法。
更新时间: 2024-06-18 14:17:57
领域: cs.LG,cs.AI,cs.CV,stat.ML
An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation
The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potential for performance biases that could mirror those found in task-specific deep learning models like nnU-Net. In this paper, we explored the fairness dilemma concerning large segmentation foundation models. We prospectively curate a benchmark dataset of 3D MRI and CT scans of the organs including liver, kidney, spleen, lung and aorta from a total of 1056 healthy subjects with expert segmentations. Crucially, we document demographic details such as gender, age, and body mass index (BMI) for each subject to facilitate a nuanced fairness analysis. We test state-of-the-art foundation models for medical image segmentation, including the original SAM, medical SAM and SAT models, to evaluate segmentation efficacy across different demographic groups and identify disparities. Our comprehensive analysis, which accounts for various confounding factors, reveals significant fairness concerns within these foundational models. Moreover, our findings highlight not only disparities in overall segmentation metrics, such as the Dice Similarity Coefficient but also significant variations in the spatial distribution of segmentation errors, offering empirical evidence of the nuanced challenges in ensuring fairness in medical image segmentation.
Updated: 2024-06-18 14:14:04
标题: 一个关于多器官图像分割基础模型公平性的实证研究
摘要: 分割基础模型,例如分割任何模型(SAM),已经在医学图像社区引起了越来越多的关注。早期的开创性研究主要集中在评估和改进SAM的整体准确性和效率方面,然而很少有人关注公平性考虑。这种疏忽引发了关于性能偏见的潜在问题,这可能与nnU-Net等任务特定的深度学习模型中发现的类似。在本文中,我们探讨了大型分割基础模型所面临的公平困境。我们预先策划了一个基准数据集,包括来自1056名健康受试者的器官的3D MRI和CT扫描,包括肝脏、肾脏、脾脏、肺部和主动脉,并附有专家分割。关键是,我们为每个受试者记录了性别、年龄和体重指数(BMI)等人口统计学细节,以促进细致的公平分析。我们测试了用于医学图像分割的最先进的基础模型,包括原始的SAM、医学SAM和SAT模型,以评估在不同人口统计学群体中的分割效果并确定差异。我们的全面分析考虑了各种混杂因素,揭示了这些基础模型中的重大公平性问题。此外,我们的发现不仅突显了整体分割度量指标(如Dice相似系数)的差异,还显示了分割错误的空间分布存在显著变化,为确保医学图像分割公平性的微妙挑战提供了实证证据。
更新时间: 2024-06-18 14:14:04
领域: eess.IV,cs.AI,cs.CV
Evaluating Transparency of Machine Generated Fact Checking Explanations
An important factor when it comes to generating fact-checking explanations is the selection of evidence: intuitively, high-quality explanations can only be generated given the right evidence. In this work, we investigate the impact of human-curated vs. machine-selected evidence for explanation generation using large language models. To assess the quality of explanations, we focus on transparency (whether an explanation cites sources properly) and utility (whether an explanation is helpful in clarifying a claim). Surprisingly, we found that large language models generate similar or higher quality explanations using machine-selected evidence, suggesting carefully curated evidence (by humans) may not be necessary. That said, even with the best model, the generated explanations are not always faithful to the sources, suggesting further room for improvement in explanation generation for fact-checking.
Updated: 2024-06-18 14:13:13
标题: 评估机器生成的事实核查解释的透明度
摘要: 在生成事实核查解释时一个重要因素是证据的选择:直觉上,只有在有正确证据的情况下才能生成高质量的解释。在这项工作中,我们研究了使用大型语言模型进行解释生成时人工策划与机器选择证据的影响。为了评估解释的质量,我们关注透明度(解释是否正确引用来源)和实用性(解释是否有助于澄清主张)。令人惊讶的是,我们发现大型语言模型使用机器选择的证据生成类似或更高质量的解释,这表明人工精心策划的证据可能并非必要。尽管如此,即使使用最佳模型,生成的解释也并不总是忠实于来源,这表明在事实核查的解释生成方面仍有进一步改进的空间。
更新时间: 2024-06-18 14:13:13
领域: cs.CL,cs.AI
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models
Assessing the effectiveness of large language models (LLMs) in addressing diverse tasks is essential for comprehending their strengths and weaknesses. Conventional evaluation techniques typically apply a single prompting strategy uniformly across datasets, not considering the varying degrees of task complexity. We introduce the Hierarchical Prompting Taxonomy (HPT), a taxonomy that employs a Hierarchical Prompt Framework (HPF) composed of five unique prompting strategies, arranged from the simplest to the most complex, to assess LLMs more precisely and to offer a clearer perspective. This taxonomy assigns a score, called the Hierarchical Prompting Score (HP-Score), to datasets as well as LLMs based on the rules of the taxonomy, providing a nuanced understanding of their ability to solve diverse tasks and offering a universal measure of task complexity. Additionally, we introduce the Adaptive Hierarchical Prompt framework, which automates the selection of appropriate prompting strategies for each task. This study compares manual and adaptive hierarchical prompt frameworks using four instruction-tuned LLMs, namely Llama 3 8B, Phi 3 3.8B, Mistral 7B, and Gemma 7B, across four datasets: BoolQ, CommonSenseQA (CSQA), IWSLT-2017 en-fr (IWSLT), and SamSum. Experiments demonstrate the effectiveness of HPT, providing a reliable way to compare different tasks and LLM capabilities. This paper leads to the development of a universal evaluation metric that can be used to evaluate both the complexity of the datasets and the capabilities of LLMs. The implementation of both manual HPF and adaptive HPF is publicly available.
Updated: 2024-06-18 14:12:27
标题: 分层提示分类法:大语言模型的通用评估框架
摘要: 评估大型语言模型(LLMs)在处理多样化任务中的有效性对于理解它们的优势和劣势至关重要。传统的评估技术通常在数据集上统一应用单一提示策略,而不考虑任务复杂度的不同程度。我们引入了分层提示分类法(HPT),这是一个采用由最简单到最复杂的五种独特提示策略排列而成的分层提示框架(HPF),以更精确地评估LLMs并提供更清晰的视角。该分类法根据分类法的规则为数据集和LLMs分配一个分数,称为分层提示评分(HP-Score),从而提供对它们解决多样化任务能力的微妙理解,并提供任务复杂度的通用度量。此外,我们引入了自适应分层提示框架,该框架自动选择适当的提示策略来处理每个任务。本研究使用四个经过指令调整的LLMs,分别是Llama 3 8B、Phi 3 3.8B、Mistral 7B和Gemma 7B,以及四个数据集:BoolQ,CommonSenseQA(CSQA),IWSLT-2017 en-fr(IWSLT)和SamSum,比较了手动和自适应分层提示框架的效果。实验证明了HPT的有效性,提供了一种可靠的方法来比较不同任务和LLMs的能力。本文引领了一个可以用来评估数据集复杂度和LLMs能力的通用评估指标的开发。手动HPF和自适应HPF的实施已经公开可用。
更新时间: 2024-06-18 14:12:27
领域: cs.CL,cs.AI
$S^3$ -- Semantic Signal Separation
Topic models are useful tools for discovering latent semantic structures in large textual corpora. Topic modeling historically relied on bag-of-words representations of language. This approach makes models sensitive to the presence of stop words and noise, and does not utilize potentially useful contextual information. Recent efforts have been oriented at incorporating contextual neural representations in topic modeling and have been shown to outperform classical topic models. These approaches are, however, typically slow, volatile and still require preprocessing for optimal results. We present Semantic Signal Separation ($S^3$), a theory-driven topic modeling approach in neural embedding spaces. $S^3$ conceptualizes topics as independent axes of semantic space, and uncovers these with blind-source separation. Our approach provides the most diverse, highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextually sensitive topic model to date. We offer an implementation of $S^3$, among other approaches, in the Turftopic Python package.
Updated: 2024-06-18 14:12:18
标题: $S^3$ -- 语义信号分离
摘要: 主题模型是用于发现大型文本语料库中潜在语义结构的有用工具。主题建模历史上依赖于语言的词袋表示。这种方法使模型对停用词和噪声的存在敏感,并且不利用潜在有用的上下文信息。最近的努力致力于将上下文神经表示纳入主题建模中,并已显示出优于传统主题模型。然而,这些方法通常速度较慢,不稳定,并且仍然需要预处理以获得最佳结果。我们提出了基于理论的主题建模方法Semantic Signal Separation($S^3$),在神经嵌入空间中。$S^3$将主题概念化为语义空间的独立轴,并通过盲源分离来揭示这些主题。我们的方法提供了最多样化、高度连贯的主题,不需要预处理,并被证明是迄今为止最快的上下文敏感主题模型。我们在Turftopic Python包中提供了$S^3$的实现,以及其他方法。
更新时间: 2024-06-18 14:12:18
领域: cs.LG,cs.CL,stat.ML,I.2.7
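
Reading "topics as independent axes" literally suggests an ICA-style decomposition of document embeddings. The reference implementation lives in the Turftopic package; the following scikit-learn sketch only illustrates the blind-source-separation idea and is not the authors' algorithm:

    import numpy as np
    from sklearn.decomposition import FastICA

    def ica_topics(doc_embeddings, term_embeddings, terms, n_topics=10, top_k=8):
        """Recover independent semantic axes from document embeddings and
        describe each axis by the terms loading highest on it. Both
        embedding matrices must come from the same encoder/space."""
        ica = FastICA(n_components=n_topics, random_state=0)
        ica.fit(doc_embeddings)                         # blind source separation
        loadings = term_embeddings @ ica.components_.T  # (n_terms, n_topics)
        return [
            [terms[i] for i in np.argsort(loadings[:, t])[::-1][:top_k]]
            for t in range(n_topics)
        ]
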
STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction
Traffic prediction has been an active research topic in the domain of spatial-temporal data mining. Accurate real-time traffic prediction is essential to improve the safety, stability, and versatility of smart city systems, e.g., traffic control and optimal routing. The complex and highly dynamic spatial-temporal dependencies mean that effective prediction still faces many challenges. Recent studies have shown that spatial-temporal graph neural networks, which combine sequential models with graph convolutional networks to jointly model temporal and spatial correlations, exhibit great potential for traffic prediction. However, a survey of graph learning and spatial-temporal graph models for traffic, together with a fair comparison of baseline models, remains a pending and unavoidable issue. In this paper, we first provide a systematic review of graph learning strategies and commonly used graph convolution algorithms. Then we conduct a comprehensive analysis of the strengths and weaknesses of recently proposed spatial-temporal graph network models. Furthermore, we build a study called STG4Traffic using the deep learning framework PyTorch to establish a standardized and scalable benchmark on two types of traffic datasets. Model performance can then be evaluated by personalizing the model settings under uniform metrics. Finally, we point out some problems in the current study and discuss future directions. Source codes are available at https://github.com/trainingl/STG4Traffic.
Updated: 2024-06-18 14:11:01
标题: STG4Traffic: 交通预测的时空图神经网络调查和基准
摘要: 交通预测一直是空间-时间数据挖掘领域的一个活跃研究课题。准确的实时交通预测对于提高智能城市系统的安全性、稳定性和多功能性至关重要,即交通控制和优化路由。复杂和高度动态的空间-时间依赖关系使得有效的预测仍然面临许多挑战。最近的研究表明,空间-时间图神经网络在交通预测中表现出巨大潜力,它结合了顺序模型和图卷积网络,共同建模了时间和空间的相关性。然而,图学习、空间-时间图模型用于交通的调查研究,以及基准模型的公平比较仍然是待解决和不可避免的问题。在本文中,我们首先对图学习策略和常用的图卷积算法进行系统回顾。然后对最近提出的空间-时间图网络模型的优势和劣势进行全面分析。此外,我们利用深度学习框架PyTorch建立了一个名为STG4Traffic的研究,以在两种类型的交通数据集上建立一个标准化和可扩展的基准。我们可以通过使用统一的度量标准个性化模型设置来评估它们的性能。最后,我们指出了当前研究中的一些问题,并讨论了未来的方向。源代码可在https://github.com/trainingl/STG4Traffic 上找到。
更新时间: 2024-06-18 14:11:01
领域: cs.LG,cs.AI
Research and Implementation of Data Enhancement Techniques for Graph Neural Networks
Data, algorithms, and computing power are the three foundational conditions for deep learning to be effective in an application domain, and data is the focus when developing deep learning algorithms. In practical engineering applications, some data are collected under conditions where more data cannot be obtained, or where the cost of obtaining data is too high, resulting in small datasets (generally several hundred to several thousand samples) that are far smaller than large-scale datasets (tens of thousands of samples). Augmentation methods that generate new samples from the original dataset may fail to reflect the full real environment, such as real-world lighting and silhouette information, when the original data volume is insufficient; in that case it is difficult to generate the required data with simple transformations or neural generative models. This paper first analyses the key points of data enhancement techniques for graph neural networks and introduces the compositional foundations of graph neural networks in depth; on this basis, the data enhancement techniques for graph neural networks are optimized and analysed.
Updated: 2024-06-18 14:07:38
标题: 图神经网络数据增强技术的研究与实现
摘要: 数据、算法和算力是深度学习在应用领域有效的三个基础条件。数据是发展深度学习算法的重点。在实际工程应用中,一些数据受到条件的影响,无法获得更多数据或者获取数据的成本过高,导致数据集较小(通常几百到几千个)且数据规模远远小于大数据集的规模(数万个)。上述两种方法基于原始数据集生成,在原始数据量不足的情况下可能无法反映所有真实环境,例如真实环境的光线、轮廓等信息,如果数据量不足,则很难使用简单的转换或神经网络生成模型生成所需数据。本文研究首先分析了图神经网络数据增强技术的关键点,同时深入介绍了图神经网络的组成基础,基于此优化和分析了图神经网络数据增强技术。
更新时间: 2024-06-18 14:07:38
领域: cs.LG
Ask-before-Plan: Proactive Language Agents for Real-World Planning
The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction, invoke external tools to collect valid information, and generate a plan to fulfill the user's demands. To study this practical problem, we establish a new benchmark dataset, Ask-before-Plan. To tackle the deficiency of LLMs in proactive planning, we propose a novel multi-agent framework, Clarification-Execution-Planning (\texttt{CEP}), which consists of three agents specialized in clarification, execution, and planning. We introduce the trajectory tuning scheme for the clarification agent and static execution agent, as well as the memory recollection mechanism for the dynamic execution agent. Extensive evaluations and comprehensive analyses conducted on the Ask-before-Plan dataset validate the effectiveness of our proposed framework.
Updated: 2024-06-18 14:07:28
标题: 先问后规划(Ask-before-Plan):面向现实世界规划的主动语言代理
摘要: 大型语言模型(LLMs)的发展增强了语言代理在各种现实场景中的规划能力。尽管取得了这些进展,LLM驱动的代理在理解模糊用户指令以进行推理和决策的潜力仍在探索中。在这项工作中,我们介绍了一个新任务,即主动代理规划,需要语言代理根据用户-代理对话和代理-环境交互预测澄清需求,调用外部工具收集有效信息,并生成一个满足用户需求的计划。为了研究这一实际问题,我们建立了一个新的基准数据集,名为Ask-before-Plan。为了解决LLMs在主动规划中的不足,我们提出了一个新颖的多代理框架,澄清-执行-规划(CEP),其中包括专门负责澄清、执行和规划的三个代理。我们引入了用于澄清代理和静态执行代理的轨迹调整方案,以及用于动态执行代理的记忆回忆机制。对Ask-before-Plan数据集进行的广泛评估和全面分析验证了我们提出的框架的有效性。
更新时间: 2024-06-18 14:07:28
领域: cs.CL,cs.AI
Saliency Attention and Semantic Similarity-Driven Adversarial Perturbation
In this paper, we introduce an enhanced textual adversarial attack method, known as Saliency Attention and Semantic Similarity driven adversarial Perturbation (SASSP). The proposed scheme is designed to improve the effectiveness of contextual perturbations by integrating saliency, attention, and semantic similarity. Traditional adversarial attack methods often struggle to maintain semantic consistency and coherence while effectively deceiving target models. Our proposed approach addresses these challenges by incorporating a three-pronged strategy for word selection and perturbation. First, we utilize a saliency-based word selection to prioritize words for modification based on their importance to the model's prediction. Second, attention mechanisms are employed to focus perturbations on contextually significant words, enhancing the attack's efficacy. Finally, an advanced semantic similarity-checking method is employed that includes embedding-based similarity and paraphrase detection. By leveraging models like Sentence-BERT for embedding similarity and fine-tuned paraphrase detection models from the Sentence Transformers library, the scheme ensures that the perturbed text remains contextually appropriate and semantically consistent with the original. Empirical evaluations demonstrate that SASSP generates adversarial examples that not only maintain high semantic fidelity but also effectively deceive state-of-the-art natural language processing models. Moreover, in comparison to the original scheme of contextual perturbation CLARE, SASSP has yielded a higher attack success rate and lower word perturbation rate.
Updated: 2024-06-18 14:07:27
标题: 显著性注意力和语义相似性驱动的对抗扰动
摘要: 在本文中,我们介绍了一种增强的文本对抗攻击方法,称为Saliency Attention and Semantic Similarity driven adversarial Perturbation(SASSP)。所提出的方案旨在通过整合显著性、注意力和语义相似性来提高上下文扰动的有效性。传统的对抗攻击方法经常在有效欺骗目标模型的同时,难以保持语义一致性和连贯性。我们提出的方法通过整合三重策略来解决这些挑战,用于词语选择和扰动。首先,我们利用基于显著性的词语选择,根据其对模型预测的重要性优先考虑修改词语。其次,采用注意力机制将扰动集中在上下文中重要的词语上,增强攻击的有效性。最后,采用高级语义相似性检测方法,包括基于嵌入的相似性和释义检测。通过利用类似Sentence-BERT的模型进行嵌入相似性和从Sentence Transformers库中微调的释义检测模型,该方案确保扰动文本保持上下文适当并与原始文本在语义上保持一致。实证评估表明,SASSP生成的对抗样本不仅保持高语义保真度,还有效欺骗了最先进的自然语言处理模型。此外,与原始的上下文扰动CLARE方案相比,SASSP取得了更高的攻击成功率和更低的词语扰动率。
更新时间: 2024-06-18 14:07:27
领域: cs.CR,cs.CL,cs.LG
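
The three stages translate naturally into a loop: rank words by saliency, try perturbing the top candidates, and accept a substitution only if an embedding-similarity gate passes. A schematic sketch using sentence-transformers for the gate; the saliency and synonym functions are placeholders to be supplied by victim-model gradients and a synonym source:

    from sentence_transformers import SentenceTransformer, util

    sim_model = SentenceTransformer("all-MiniLM-L6-v2")

    def similarity_ok(original, perturbed, threshold=0.85):
        """Semantic gate: accept only perturbations that stay close to the
        original text in embedding space."""
        a, b = sim_model.encode([original, perturbed])
        return util.cos_sim(a, b).item() >= threshold

    def attack(text, saliency, synonyms, max_changes=3):
        """saliency(words, i) and synonyms(word) are placeholder callables."""
        words = text.split()
        order = sorted(range(len(words)), key=lambda i: -saliency(words, i))
        changed = 0
        for i in order:
            if changed >= max_changes:
                break
            for cand in synonyms(words[i]):
                trial = words[:i] + [cand] + words[i + 1:]
                if similarity_ok(text, " ".join(trial)):
                    words, changed = trial, changed + 1
                    break
        return " ".join(words)
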
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
Pre-trained vision-language models like CLIP have shown powerful zero-shot inference ability via image-text matching and prove to be strong few-shot learners in various downstream tasks. However, in real-world scenarios, adapting CLIP to downstream tasks may encounter the following challenges: 1) data may exhibit long-tailed data distributions and might not have abundant samples for all the classes; 2) There might be emerging tasks with new classes that contain no samples at all. To overcome them, we propose a novel framework to achieve efficient and long-tailed generalization, which can be termed as Candle. During the training process, we propose compensating logit-adjusted loss to encourage large margins of prototypes and alleviate imbalance both within the base classes and between the base and new classes. For efficient adaptation, we treat the CLIP model as a black box and leverage the extracted features to obtain visual and textual prototypes for prediction. To make full use of multi-modal information, we also propose cross-modal attention to enrich the features from both modalities. For effective generalization, we introduce virtual prototypes for new classes to make up for their lack of training images. Candle achieves state-of-the-art performance over extensive experiments on 11 diverse datasets while substantially reducing the training time, demonstrating the superiority of our approach. The source code is available at https://github.com/shijxcs/Candle.
Updated: 2024-06-18 14:07:13
标题: 预训练视觉语言模型的高效和长尾泛化
摘要: 像CLIP这样的预训练视觉-语言模型已经展示出强大的零样本推理能力,通过图像-文本匹配,并在各种下游任务中证明了强大的少样本学习能力。然而,在现实世界的场景中,将CLIP调整到下游任务可能会遇到以下挑战:1)数据可能表现出长尾数据分布,可能不具有丰富的样本覆盖所有类别;2)可能会出现包含没有任何样本的新类别的新任务。为了克服这些挑战,我们提出了一个新颖的框架,实现了高效的长尾泛化,可以称为Candle。在训练过程中,我们提出了补偿对数调整损失,以鼓励原型之间的大边界,并减轻基类别和新类别之间以及基类别内部的不平衡。为了实现高效的适应性,我们将CLIP模型视为黑盒,并利用提取的特征来获得用于预测的视觉和文本原型。为了充分利用多模态信息,我们还提出了交叉模态注意力,以丰富来自两种模态的特征。为了有效的泛化,我们引入了新类别的虚拟原型,弥补了它们缺乏训练图像的不足。通过对11个不同数据集的广泛实验,Candle实现了最先进的性能,同时大大减少了训练时间,展示了我们方法的优越性。源代码可在https://github.com/shijxcs/Candle 上找到。
更新时间: 2024-06-18 14:07:13
领域: cs.CV,cs.LG
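
The abstract does not give the exact form of the compensating logit-adjusted loss, but losses of this family typically shift each class logit by a term derived from its prior frequency, so rare (or new) classes receive larger effective margins. A standard logit-adjustment sketch, offered as an analogy rather than Candle's actual loss:

    import torch
    import torch.nn.functional as F

    def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
        """Cross-entropy with prior-dependent logit offsets: rare classes get
        smaller additive offsets, which enlarges their effective margins."""
        prior = class_counts / class_counts.sum()
        adjusted = logits + tau * torch.log(prior + 1e-12)
        return F.cross_entropy(adjusted, targets)

    logits = torch.randn(4, 10)
    targets = torch.randint(0, 10, (4,))
    counts = torch.tensor([1000.0] * 5 + [10.0] * 5)  # long-tailed counts
    loss = logit_adjusted_loss(logits, targets, counts)
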
ScenEval: A Benchmark for Scenario-Based Evaluation of Code Generation
In the scenario-based evaluation of machine learning models, a key problem is how to construct test datasets that represent various scenarios. The methodology proposed in this paper is to construct a benchmark and attach metadata to each test case. Then a test system can be constructed with test morphisms that filter the test cases based on metadata to form a dataset. The paper demonstrates this methodology with large language models for code generation. A benchmark called ScenEval is constructed from problems in textbooks, an online tutorial website and Stack Overflow. Filtering by scenario is demonstrated and the test sets are used to evaluate ChatGPT for Java code generation. Our experiments found that the performance of ChatGPT decreases with the complexity of the coding task. It is weakest for advanced topics like multi-threading, data structure algorithms and recursive methods. The Java code generated by ChatGPT tends to be much shorter than the reference solution in terms of number of lines; when the generated code is correct, it is more likely to be more complex on both cyclomatic and cognitive complexity metrics, whereas incorrect generated code is more likely to be less complex than the reference solution.
Updated: 2024-06-18 14:02:20
标题: ScenEval:代码生成场景评估的基准测试
摘要: 在基于场景的机器学习模型评估中,一个关键问题是如何构建代表各种场景的测试数据集。本文提出的方法是构建一个基准,并为每个测试用例附加元数据。然后可以使用基于元数据的测试态映射来过滤测试用例,形成数据集。 本文展示了这种方法论在大型语言模型用于代码生成方面的应用。一个名为ScenEval的基准是从教科书、在线教程网站和Stack Overflow的问题中构建的。通过场景进行过滤,并使用这些测试集来评估ChatGPT生成Java代码。 我们的实验发现,ChatGPT的性能随着编码任务的复杂性而降低。在高级主题如多线程、数据结构算法和递归方法中表现最差。ChatGPT生成的Java代码在行数方面往往比参考解决方案要短,但在圈复杂度和认知复杂度等指标上更可能更复杂,如果生成的代码是正确的。然而,如果代码不正确,生成的代码更可能比参考解决方案更简单。
更新时间: 2024-06-18 14:02:20
领域: cs.SE,cs.AI
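
A test morphism here is essentially a metadata predicate over the benchmark. A minimal sketch with illustrative field names:

    benchmark = [
        {"task": "...", "topic": "recursion", "difficulty": "advanced"},
        {"task": "...", "topic": "strings", "difficulty": "basic"},
    ]

    def scenario(cases, **criteria):
        """A test morphism: filter benchmark cases by metadata to form a
        scenario-specific test set."""
        return [c for c in cases if all(c.get(k) == v for k, v in criteria.items())]

    recursion_set = scenario(benchmark, topic="recursion", difficulty="advanced")
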
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation
Rapidly growing numbers of multilingual news consumers pose an increasing challenge to news recommender systems in terms of providing customized recommendations. First, existing neural news recommenders, even when powered by multilingual language models (LMs), suffer substantial performance losses in zero-shot cross-lingual transfer (ZS-XLT). Second, the current paradigm of fine-tuning the backbone LM of a neural recommender on task-specific data is computationally expensive and infeasible in few-shot recommendation and cold-start setups, where data is scarce or completely unavailable. In this work, we propose a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained massively multilingual sentence encoder (SE). To this end, we construct and leverage PolyNews and PolyNewsParallel, two multilingual news-specific corpora. With the news-adapted multilingual SE in place, we test the effectiveness of (i.e., question the need for) supervised fine-tuning for news recommendation, and propose a simple and strong baseline based on (i) frozen NaSE embeddings and (ii) late click-behavior fusion. We show that NaSE achieves state-of-the-art performance in ZS-XLT in true cold-start and few-shot news recommendation.
Updated: 2024-06-18 14:01:53
标题: 无国界新闻:面向跨语言新闻推荐的多语言句子嵌入领域适应
摘要: 快速增长的多语言新闻消费者数量对新闻推荐系统提出了越来越大的定制化推荐挑战。首先,即使由多语言语言模型(LMs)驱动,现有的神经新闻推荐系统在零样本跨语言迁移(ZS-XLT)中也会遭受相当大的性能损失。其次,当前在任务特定数据上微调神经推荐系统骨干LM的范式计算代价高昂,并且在数据稀缺或完全不可用的少样本推荐和冷启动设置中不可行。在这项工作中,我们提出了一种新闻适应的句子编码器(NaSE),它由预训练的大规模多语言句子编码器(SE)经领域专门化得到。为此,我们构建并利用了PolyNews和PolyNewsParallel这两个多语言新闻语料库。在新闻适应的多语言SE就位后,我们检验了监督微调对新闻推荐的有效性(即质疑其必要性),并提出了一个基于(i)冻结的NaSE嵌入和(ii)后期点击行为融合的简单而强大的基线。我们展示了NaSE在真正冷启动和少样本新闻推荐的ZS-XLT中实现了最先进的性能。
更新时间: 2024-06-18 14:01:53
领域: cs.IR,cs.AI,I.2.7; H.3.3
SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation
Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks. Existing CLIP-based approaches perform OOD detection by devising novel scoring functions or sophisticated fine-tuning methods. In this work, we propose SeTAR, a novel, training-free OOD detection method that leverages selective low-rank approximation of weight matrices in vision-language and vision-only models. SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm. Based on SeTAR, we further propose SeTAR+FT, a fine-tuning extension optimizing model performance for OOD detection tasks. Extensive evaluations on ImageNet1K and Pascal-VOC benchmarks show SeTAR's superior performance, reducing the false positive rate by up to 18.95% and 36.80% compared to zero-shot and fine-tuning baselines. Ablation studies further validate our approach's effectiveness, robustness, and generalizability across different model backbones. Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
Updated: 2024-06-18 13:55:13
标题: SeTAR:基于选择性低秩近似的分布外检测
摘要: 分布外(OOD)检测对于神经网络的安全部署至关重要。现有基于CLIP的方法通过设计新颖的评分函数或复杂的微调方法来执行OOD检测。在这项工作中,我们提出了SeTAR,一种新颖的、无需训练的OOD检测方法,它利用视觉-语言模型和纯视觉模型中权重矩阵的选择性低秩近似。SeTAR使用一个简单的贪婪搜索算法对模型的权重矩阵进行事后修改,从而增强OOD检测。在SeTAR的基础上,我们进一步提出了SeTAR+FT,一种针对OOD检测任务优化模型性能的微调扩展。在ImageNet1K和Pascal-VOC基准上的广泛评估显示了SeTAR的优越性能:与零样本和微调基线相比,误报率分别最多降低了18.95%和36.80%。消融研究进一步验证了我们的方法在不同模型骨干上的有效性、鲁棒性和泛化能力。我们的工作为OOD检测提供了一种可扩展、高效的解决方案,在该领域树立了新的最先进水平。
更新时间: 2024-06-18 13:55:13
领域: cs.CL,cs.AI,cs.CV
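A minimal sketch of the core operation, selective low-rank approximation of a weight matrix chosen by a greedy search over candidate ranks, is shown below; the validation scoring function and the candidate rank ratios are assumptions, since the abstract does not specify them, and SeTAR's real search spans many layers rather than a single matrix.

    import torch

    def low_rank_approx(W: torch.Tensor, k: int) -> torch.Tensor:
        # Best rank-k approximation of W via truncated SVD.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

    def greedy_truncation(W, score_fn, ratios=(0.9, 0.75, 0.5, 0.25)):
        # Keep the truncation that most improves a validation OOD score
        # (score_fn is assumed to exist; higher is better).
        best_W, best_score = W, score_fn(W)
        for r in ratios:
            k = max(1, int(r * min(W.shape)))
            candidate = low_rank_approx(W, k)
            s = score_fn(candidate)
            if s > best_score:
                best_W, best_score = candidate, s
        return best_W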
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Offering a promising solution to the scalability challenges associated with human evaluation, the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models (LLMs). However, there are still many open questions about the strengths and weaknesses of this paradigm, and what potential biases it may hold. In this paper, we present a comprehensive study of the performance of various LLMs acting as judges. We leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of LLMs and evaluate them alongside human annotations which we found to have a high inter-annotator agreement. Our study includes 9 judge models and 9 exam taker models -- both base and instruction-tuned. We assess the judge model's alignment across different model sizes, families, and judge prompts. Among other results, our research rediscovers the importance of using Cohen's kappa as a metric of alignment as opposed to simple percent agreement, showing that judges with high percent agreement can still assign vastly different scores. We find that both Llama-3 70B and GPT-4 Turbo have an excellent alignment with humans, but in terms of ranking exam taker models, they are outperformed by both JudgeLM-7B and the lexical judge Contains, which have up to 34 points lower human alignment. Through error analysis and various other studies, including the effects of instruction length and leniency bias, we hope to provide valuable lessons for using LLMs as judges in the future.
Updated: 2024-06-18 13:49:54
标题: 评判评委:评估作为评委的大型语言模型的一致性与脆弱性
摘要: LLM作为评委(LLM-as-a-judge)范式为人类评估的可扩展性挑战提供了一种有希望的解决方案,正迅速获得关注,成为评估大型语言模型(LLMs)的一种方法。然而,关于该范式的优势和劣势以及可能存在的潜在偏见,仍有许多未解之谜。在本文中,我们对各种LLM作为评委的表现进行了全面研究。我们利用TriviaQA作为评估LLM客观知识推理能力的基准,并与我们发现具有较高标注者间一致性的人类标注一同进行评估。我们的研究包括9个评委模型和9个考生模型,既有基础模型,也有经过指令微调的模型。我们评估了评委模型在不同模型规模、系列和评委提示下的对齐情况。在其他结果之外,我们的研究重新发现了使用Cohen's kappa(而非简单的百分比一致性)作为对齐度量的重要性,表明百分比一致性很高的评委仍可能给出截然不同的分数。我们发现Llama-3 70B和GPT-4 Turbo都与人类有着极好的对齐,但在对考生模型进行排名方面,它们被JudgeLM-7B和词法评委Contains超越,而后两者的人类对齐度最多要低34个点。通过误差分析和其他各种研究(包括指令长度和宽容偏差的影响),我们希望为未来将LLM用作评委提供宝贵的经验教训。
更新时间: 2024-06-18 13:49:54
领域: cs.CL,cs.AI
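The point about Cohen's kappa versus percent agreement is easy to reproduce. In the self-contained example below, a judge that always answers "correct" reaches 90% agreement with the human labels yet has a kappa of zero; the labels are synthetic, chosen only to illustrate the effect.

    from collections import Counter

    def percent_agreement(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def cohens_kappa(a, b):
        # kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
        # expected by chance from each rater's marginal label frequencies.
        n = len(a)
        p_o = percent_agreement(a, b)
        ca, cb = Counter(a), Counter(b)
        p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))
        return (p_o - p_e) / (1 - p_e)

    human = ["correct"] * 90 + ["wrong"] * 10
    judge = ["correct"] * 100   # a degenerate judge that accepts everything
    print(percent_agreement(human, judge))  # 0.90
    print(cohens_kappa(human, judge))       # 0.0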
User Centric Evaluation of Code Generation Tools
With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as an intelligent tool to generate program code from natural language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans. It is desirable to evaluate their usability when deciding on whether to use an LLM in software production. This paper proposes a user centric method for this purpose. It includes metadata in the test cases of a benchmark to describe their usages, conducts testing in a multi-attempt process that mimics the uses of LLMs, measures LLM generated solutions on a set of quality attributes that reflect usability, and evaluates the performance based on user experiences in the uses of LLMs as a tool. The paper also reports a case study with the method in the evaluation of ChatGPT's usability as a code generation tool for the R programming language. Our experiments demonstrated that ChatGPT is highly useful for generating R program code although it may fail on hard programming tasks. The user experience is good, with an overall average of 1.61 attempts and an average completion time of 47.02 seconds. Our experiments also found that the weakest aspect of usability is conciseness, which has a score of 3.80 out of 5.
Updated: 2024-06-18 13:45:05
标题: 用户中心评估代码生成工具
摘要: 随着机器学习(ML)技术的快速发展,大型语言模型(LLMs)越来越被探索作为从自然语言规范生成程序代码的智能工具。然而,现有的LLMs评估侧重于它们与人类的能力进行比较。在决定是否在软件生产中使用LLM时,评估它们的可用性是值得的。本文提出了一种以用户为中心的方法来实现这一目的。该方法在基准测试的测试用例中包含元数据以描述它们的用途,通过模拟LLMs的使用进行多次尝试的测试,对LLM生成的解决方案进行一组反映可用性的质量属性的测量,并根据用户在使用LLMs作为工具时的体验来评估性能。 本文还报告了一项使用该方法评估ChatGPT在R编程语言中作为代码生成工具的可用性的案例研究。我们的实验表明,尽管在较难的编程任务上可能失败,但ChatGPT非常适用于生成R程序代码。用户体验整体上很好,平均尝试次数为1.61次,平均完成时间为47.02秒。我们的实验还发现可用性中最薄弱的方面是简洁性,得分为5分中的3.80分。
更新时间: 2024-06-18 13:45:05
领域: cs.SE,cs.AI
Learning Diffusion at Lightspeed
Diffusion regulates a phenomenal number of natural processes and the dynamics of many successful generative models. Existing models to learn the diffusion terms from observational data rely on complex bilevel optimization problems and properly model only the drift of the system. We propose a new simple model, JKOnet*, which bypasses altogether the complexity of existing architectures while presenting significantly enhanced representational capacity: JKOnet* recovers the potential, interaction, and internal energy components of the underlying diffusion process. JKOnet* minimizes a simple quadratic loss, runs at lightspeed, and drastically outperforms other baselines in practice. Additionally, JKOnet* provides a closed-form optimal solution for linearly parametrized functionals. Our methodology is based on the interpretation of diffusion processes as energy-minimizing trajectories in the probability space via the so-called JKO scheme, which we study via its first-order optimality conditions, in light of few-weeks-old advancements in optimization in the probability space.
Updated: 2024-06-18 13:44:07
标题: 学习在光速下的扩散
摘要: 扩散调节着数量惊人的自然过程以及许多成功的生成模型的动态。现有的从观测数据中学习扩散项的模型依赖于复杂的双层优化问题,并且只能正确地对系统的漂移进行建模。我们提出了一个新的简单模型JKOnet*,它完全绕过了现有架构的复杂性,同时具有显著增强的表征能力:JKOnet*能够恢复基础扩散过程的势能、相互作用能和内能分量。JKOnet*最小化一个简单的二次损失,运行速度极快,并在实践中大幅优于其他基线。此外,JKOnet*为线性参数化的泛函提供了封闭形式的最优解。我们的方法论基于通过所谓的JKO方案将扩散过程解释为概率空间中能量最小化的轨迹;借助概率空间优化领域的最新进展,我们通过其一阶最优性条件对该方案进行研究。
更新时间: 2024-06-18 13:44:07
领域: cs.LG
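For reference, the JKO scheme invoked above takes the following standard form: each step minimizes an energy functional regularized by the squared Wasserstein-2 distance to the previous measure,

    \rho_{t+1} \in \operatorname*{arg\,min}_{\rho}\; J(\rho) + \frac{1}{2\tau}\, W_2^2(\rho, \rho_t),
    \qquad
    J(\rho) = \int V \,\mathrm{d}\rho
            + \frac{1}{2}\int (U * \rho)\,\mathrm{d}\rho
            + \int f(\rho(x))\,\mathrm{d}x,

where the three terms of J are the potential, interaction, and internal energy components the abstract says JKOnet* recovers. This is the textbook formulation; the paper's exact notation may differ.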
When Are Bias-Free ReLU Networks Like Linear Networks?
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We firstly show that two-layer bias-free ReLU networks have limited expressivity: the only odd function two-layer bias-free ReLU networks can express is a linear one. We then show that, under symmetry conditions on the data, these networks have the same learning dynamics as linear networks. This allows us to give closed-form time-course solutions to certain two-layer bias-free ReLU networks, which has not been done for nonlinear networks outside the lazy learning regime. While deep bias-free ReLU networks are more expressive than their two-layer counterparts, they still share a number of similarities with deep linear networks. These similarities enable us to leverage insights from linear networks, leading to a novel understanding of bias-free ReLU networks. Overall, our results show that some properties established for bias-free ReLU networks arise due to equivalence to linear networks, and suggest that including bias or considering asymmetric data are avenues to engage with nonlinear behaviors.
Updated: 2024-06-18 13:43:58
标题: 无偏置ReLU网络何时与线性网络相似?
摘要: 我们研究了无偏置ReLU网络的表达能力和学习动态。我们首先证明,两层无偏置ReLU网络的表达能力有限:它们能够表达的唯一奇函数是线性函数。接着我们证明,在数据满足对称性条件时,这些网络具有与线性网络相同的学习动态。这使我们能够为某些两层无偏置ReLU网络给出封闭形式的时间演化解,这在懒惰学习(lazy learning)范畴之外的非线性网络中尚未实现。虽然深层无偏置ReLU网络比其两层对应物更具表达能力,但它们仍与深层线性网络有诸多相似之处。这些相似之处使我们能够借鉴线性网络的见解,从而对无偏置ReLU网络形成新的理解。总的来说,我们的结果表明,无偏置ReLU网络的一些既有性质源于其与线性网络的等价性,并提示引入偏置项或考虑非对称数据是接触非线性行为的途径。
更新时间: 2024-06-18 13:43:58
领域: cs.LG
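The linearity claim for odd functions admits a one-line verification. Writing a two-layer bias-free ReLU network as f(x) = W_2 ReLU(W_1 x) and using the elementwise identity ReLU(z) - ReLU(-z) = z,

    f(x) - f(-x) = W_2\bigl(\mathrm{ReLU}(W_1 x) - \mathrm{ReLU}(-W_1 x)\bigr) = W_2 W_1 x,

so the odd part of f is always linear; in particular, an odd f must equal \tfrac{1}{2} W_2 W_1 x.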
EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles
This work introduces EUvsDisinfo, a multilingual dataset of trustworthy and disinformation articles related to pro-Kremlin themes. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting specific disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.
Updated: 2024-06-18 13:43:22
标题: EUvsDisinfo:一个用于多语言检测新闻文章中亲克里姆林宫虚假信息的数据集
摘要: 这项工作介绍了EUvsDisinfo,一个关于亲克里姆林宫主题的可信与虚假信息文章的多语言数据集。该数据集直接取自领导EUvsDisinfo项目的专家撰写的辟谣文章。就文章总数和语言种类而言,我们的数据集是迄今为止最大的资源,同时也提供了最广的主题和时间覆盖范围。利用这个数据集,我们调查了亲克里姆林宫虚假信息在不同语言中的传播情况,揭示了针对特定虚假信息主题的语言特定模式。我们进一步分析了八年间主题分布的演变,注意到在2022年乌克兰遭到全面入侵之前,虚假信息内容出现了显著激增。最后,我们展示了该数据集在多语言环境中训练模型以有效区分虚假信息和可信内容方面的适用性。
更新时间: 2024-06-18 13:43:22
领域: cs.CL,cs.LG
Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence
The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.
Updated: 2024-06-18 13:37:50
标题: 混合人工智能和自然智能:从统计力学到人工智能,再回到湍流
摘要: 本文反思了人工智能在科学研究中的未来角色,并特别关注湍流研究,考察了人工智能的演进,尤其是根植于非平衡统计力学的扩散模型。文章强调了人工智能通过创新性地使用深度神经网络,对推进湍流的简化拉格朗日模型所产生的重大影响。此外,本文还回顾了湍流研究中的各种其他人工智能应用,并概述了在人工智能与统计流体力学同步发展过程中潜在的挑战与机遇。这一讨论为人工智能与湍流研究紧密交织的未来奠定了基础,将为两个领域带来更深刻的洞察和进步。
更新时间: 2024-06-18 13:37:50
领域: cs.LG,cond-mat.stat-mech,cs.AI,physics.flu-dyn
A Single Graph Convolution Is All You Need: Efficient Grayscale Image Classification
Image classifiers often rely on convolutional neural networks (CNN) for their tasks, which are inherently more heavyweight than multilayer perceptrons (MLPs), which can be problematic in real-time applications. Additionally, many image classification models work on both RGB and grayscale datasets. Classifiers that operate solely on grayscale images are much less common. Grayscale image classification has diverse applications, including but not limited to medical image classification and synthetic aperture radar (SAR) automatic target recognition (ATR). Thus, we present a novel grayscale (single channel) image classification approach using a vectorized view of images. We exploit the lightweightness of MLPs by viewing images as a vector and reducing our problem setting to the grayscale image classification setting. We find that using a single graph convolutional layer batch-wise increases accuracy and reduces variance in the performance of our model. Moreover, we develop a customized accelerator on FPGA for the proposed model with several optimizations to improve its performance. Our experimental results on benchmark grayscale image datasets demonstrate the effectiveness of the proposed model, achieving vastly lower latency (up to 16$\times$ less) and competitive or leading performance compared to other state-of-the-art image classification models on various domain-specific grayscale image classification datasets.
Updated: 2024-06-18 13:36:51
标题: 一次图卷积就够了:高效的灰度图像分类
摘要: 图像分类器通常依赖卷积神经网络(CNN)来完成任务,而CNN本质上比多层感知器(MLP)更为庞大,这在实时应用中可能成为问题。此外,许多图像分类模型同时适用于RGB和灰度数据集,而仅处理灰度图像的分类器则少见得多。灰度图像分类具有多种应用,包括但不限于医学图像分类和合成孔径雷达(SAR)自动目标识别(ATR)。因此,我们提出了一种新颖的灰度(单通道)图像分类方法:将图像视为向量,利用MLP的轻量特性,并将问题设定简化为灰度图像分类设定。我们发现,以批为单位使用单个图卷积层可以提高准确性并降低模型性能的方差。此外,我们在FPGA上为所提出的模型开发了定制加速器,并通过多项优化提升其性能。在基准灰度图像数据集上的实验结果证明了所提模型的有效性:与其他最先进的图像分类模型相比,它在各种特定领域的灰度图像分类数据集上延迟大大降低(最多减少16倍),性能具有竞争力甚至领先。
更新时间: 2024-06-18 13:36:51
领域: cs.CV,cs.LG
Bridging Local Details and Global Context in Text-Attributed Graphs
Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information. Research in this field generally consists of two main perspectives: local-level encoding and global-level aggregating, which respectively refer to textual node information unification (e.g., using Language Models) and structure-augmented modeling (e.g., using Graph Neural Networks). Most existing works focus on combining different information levels but overlook the interconnections, i.e., the contextual textual information among nodes, which provides semantic insights to bridge local and global levels. In this paper, we propose GraphBridge, a multi-granularity integration framework that bridges local and global perspectives by leveraging contextual textual information, enhancing fine-grained understanding of TAGs. Besides, to tackle scalability and efficiency challenges, we introduce a graph-aware token reduction module. Extensive experiments across various models and datasets show that our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
Updated: 2024-06-18 13:35:25
标题: 在文本属性图中桥接局部细节与全局上下文
摘要: 文本属性图(TAGs)上的表示学习对现实世界应用至关重要,因为它结合了语义文本信息和上下文结构信息。该领域的研究通常包括两个主要视角:局部级编码和全局级聚合,分别指文本节点信息的统一(例如使用语言模型)和结构增强建模(例如使用图神经网络)。大多数现有工作侧重于结合不同的信息级别,但忽略了它们之间的相互关联,即节点之间的上下文文本信息,而这种信息提供了桥接局部与全局层面的语义洞见。在本文中,我们提出了GraphBridge,一个多粒度集成框架,通过利用上下文文本信息来桥接局部和全局视角,增强对TAGs的细粒度理解。此外,为了解决可扩展性和效率方面的挑战,我们引入了一个图感知的令牌缩减模块。在不同模型和数据集上的广泛实验表明,我们的方法实现了最先进的性能,而图感知令牌缩减模块显著提升了效率并解决了可扩展性问题。
更新时间: 2024-06-18 13:35:25
领域: cs.CL,cs.AI
Attack and Defense of Deep Learning Models in the Field of Web Attack Detection
The challenge of WAD (web attack detection) is growing as hackers continuously refine their methods to evade traditional detection. Deep learning models excel in handling complex unknown attacks due to their strong generalization and adaptability. However, they are vulnerable to backdoor attacks, where contextually irrelevant fragments are inserted into requests, compromising model stability. While backdoor attacks are well studied in image recognition, they are largely unexplored in WAD. This paper introduces backdoor attacks in WAD, proposing five methods and corresponding defenses. Testing on textCNN, biLSTM, and tinybert models shows an attack success rate over 87%, reducible through fine-tuning. Future research should focus on backdoor defenses in WAD. All the code and data of this paper can be obtained at https://anonymous.4open.science/r/attackDefenceinDL-7E05
Updated: 2024-06-18 13:34:02
标题: 深度学习模型在网络攻击检测领域的攻击与防御
摘要: WAD(网络攻击检测)的挑战不断增长,因为黑客不断完善其逃避传统检测方法的技巧。深度学习模型在处理复杂未知攻击方面表现出色,因为它们具有强大的泛化能力和适应性。然而,它们容易受到后门攻击的影响,即在请求中插入上下文无关的片段,从而破坏模型的稳定性。虽然后门攻击在图像识别领域得到了很好的研究,但在WAD领域基本上尚未被探索。本文介绍了WAD中的后门攻击,并提出了五种方法和相应的防御措施。对textCNN、biLSTM和tinybert模型的测试显示攻击成功率超过87%,可以通过微调来降低。未来的研究应该集中于WAD中的后门防御。本文的所有代码和数据都可以在https://anonymous.4open.science/r/attackDefenceinDL-7E05 上获取。
更新时间: 2024-06-18 13:34:02
领域: cs.LG,cs.CR
Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry
This article provides a methodology and open-source implementation of Reinforcement Learning algorithms for finding optimal routes in a packet-optical network scenario. The algorithm uses measurements provided by the physical layer (pre-FEC bit error rate and propagation delay) and the link layer (link load) to configure a set of latency-based rewards and penalties based on such measurements. Then, the algorithm executes Q-learning based on this set of rewards for finding the optimal routing strategies. It is further shown that the algorithm dynamically adapts to changing network conditions by re-calculating optimal policies upon either link load changes or link degradation as measured by pre-FEC BER.
Updated: 2024-06-18 13:32:12
标题: 基于强化学习的混合遥测分组光网络路由
摘要: 本文提供了一种利用强化学习算法在分组光网络场景中寻找最优路由的方法论及其开源实现。该算法利用物理层提供的测量数据(前向纠错前(pre-FEC)误码率和传播时延)以及链路层提供的测量数据(链路负载),基于这些测量配置一组基于时延的奖励和惩罚。然后,该算法基于这组奖励执行Q学习,以寻找最优路由策略。研究进一步表明,当链路负载发生变化或pre-FEC误码率显示链路退化时,该算法会重新计算最优策略,从而动态适应不断变化的网络条件。
更新时间: 2024-06-18 13:32:12
领域: cs.NI,cs.LG
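A minimal Python sketch of the reward shaping and Q-learning loop described above follows; the reward weights, the BER threshold, and the bandit-style simplification (one Q-value per candidate route) are illustrative assumptions, not the article's exact formulation.

    import random

    def reward(delay_ms, load, pre_fec_ber, ber_threshold=1e-3):
        # Latency-based reward built from the hybrid telemetry: propagation
        # delay and link load reduce the reward, and a degraded link
        # (pre-FEC BER above threshold) is penalized hard.
        r = -delay_ms - 10.0 * load
        if pre_fec_ber > ber_threshold:
            r -= 100.0
        return r

    def q_learning_route(routes, telemetry, episodes=1000, alpha=0.1, eps=0.1):
        Q = {r: 0.0 for r in routes}
        for _ in range(episodes):
            r = random.choice(routes) if random.random() < eps else max(Q, key=Q.get)
            delay, load, ber = telemetry(r)   # current measurements for route r
            Q[r] += alpha * (reward(delay, load, ber) - Q[r])
        return max(Q, key=Q.get)  # re-run when load or pre-FEC BER changes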
Generalization bounds for mixing processes via delayed online-to-PAC conversions
We study the generalization error of statistical learning algorithms in a non-i.i.d. setting, where the training data is sampled from a stationary mixing process. We develop an analytic framework for this scenario based on a reduction to online learning with delayed feedback. In particular, we show that the existence of an online learning algorithm with bounded regret (against a fixed statistical learning algorithm in a specially constructed game of online learning with delayed feedback) implies low generalization error of said statistical learning method even if the data sequence is sampled from a mixing time series. The rates demonstrate a trade-off between the amount of delay in the online learning game and the degree of dependence between consecutive data points, with near-optimal rates recovered in a number of well-studied settings when the delay is tuned appropriately as a function of the mixing time of the process.
Updated: 2024-06-18 13:31:15
标题: 混合过程的泛化界限:通过延迟在线到PAC转换
摘要: 我们研究了统计学习算法在非独立同分布设置下的泛化误差,其中训练数据采样自一个平稳混合过程。我们基于将该问题归约为带延迟反馈的在线学习,为这一场景开发了一个分析框架。特别地,我们证明:如果存在一个(在特别构造的带延迟反馈在线学习博弈中,相对于固定统计学习算法)遗憾有界的在线学习算法,那么即使数据序列采样自混合时间序列,该统计学习方法的泛化误差也较低。这些速率展示了在线学习博弈中的延迟量与相邻数据点之间依赖程度的权衡;当延迟作为过程混合时间的函数被适当调节时,可在许多经过充分研究的设置中恢复接近最优的速率。
更新时间: 2024-06-18 13:31:15
领域: cs.LG
PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval
Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes. However, DSIs need full re-training to handle updates in dynamic corpora, causing significant computational inefficiencies. We introduce PromptDSI, a rehearsal-free, prompt-based approach for instance-wise incremental learning in document retrieval. PromptDSI attaches prompts to the frozen PLM's encoder of DSI, leveraging its powerful representation to efficiently index new corpora while maintaining a balance between stability and plasticity. We eliminate the initial forward pass of prompt-based continual learning methods that doubles training and inference time. Moreover, we propose a topic-aware prompt pool that employs neural topic embeddings as fixed keys. This strategy ensures diverse and effective prompt usage, addressing the challenge of parameter underutilization caused by the collapse of the query-key matching mechanism. Our empirical evaluations demonstrate that PromptDSI matches IncDSI in managing forgetting while significantly enhancing recall by over 4% on new corpora.
Updated: 2024-06-18 13:25:18
标题: PromptDSI:用于文档检索的基于提示的免重放逐实例增量学习
摘要: 可微搜索索引(DSI)利用预训练语言模型(PLMs)进行高效文档检索,而不依赖外部索引。然而,DSI需要完全重新训练才能处理动态语料库中的更新,造成显著的计算低效。我们提出PromptDSI,一种免重放(rehearsal-free)、基于提示的方法,用于文档检索中的逐实例增量学习。PromptDSI将提示附加到DSI中冻结的PLM编码器上,利用其强大的表示能力高效地索引新语料库,同时在稳定性和可塑性之间保持平衡。我们消除了基于提示的持续学习方法中会使训练和推理时间翻倍的初始前向传递。此外,我们提出了一个主题感知的提示池,采用神经主题嵌入作为固定键。这一策略确保了提示使用的多样性和有效性,解决了查询-键匹配机制失效导致的参数利用不足的难题。我们的实证评估表明,PromptDSI在控制遗忘方面与IncDSI相当,同时在新语料库上将召回率显著提高了超过4%。
更新时间: 2024-06-18 13:25:18
领域: cs.IR,cs.AI,cs.CL,cs.LG
Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction
Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image reconstruction, has not been thoroughly investigated. This paper shows that machine unlearning is possible in MRI tasks and has the potential to benefit bias removal. We set up a protocol to study how much shared knowledge exists between datasets of different organs, allowing us to effectively quantify the effect of unlearning. Our study reveals that combining training data can lead to hallucinations and reduced image quality in the reconstructed data. We use unlearning to remove hallucinations as a proxy exemplar of undesired data removal. Indeed, we show that machine unlearning is possible without full retraining. Furthermore, our observations indicate that maintaining high performance is feasible even when using only a subset of retain data. We have made our code publicly accessible.
Updated: 2024-06-18 13:20:08
标题: 擦除以增强:MRI重建中的数据高效机器遗忘
摘要: 机器遗忘是一种有希望的范式,用于从训练模型中删除不需要的数据样本,以确保符合隐私法规并限制有害偏见。虽然在分类和推荐系统中已经展示了遗忘的效果,但在医学图像到图像翻译中,特别是在图像重建方面,其潜力尚未得到深入研究。本文表明在MRI任务中机器遗忘是可能的,并有利于偏见消除。我们建立了一个协议来研究不同器官数据集之间存在多少共享知识,从而能够有效量化遗忘的效果。我们的研究表明,合并训练数据可能导致幻觉和重建数据的图像质量降低。我们使用遗忘来移除幻觉,作为不需要的数据移除的代理样本。事实上,我们展示了即使不进行完全重新训练,机器遗忘也是可能的。此外,我们的观察表明,即使仅使用保留数据的子集,保持高性能也是可行的。我们已经将我们的代码公开可访问。
更新时间: 2024-06-18 13:20:08
领域: eess.IV,cs.CV,cs.LG
Discovering Minimal Reinforcement Learning Environments
Reinforcement learning (RL) agents are commonly trained and evaluated in the same environment. In contrast, humans often train in a specialized environment before being evaluated, such as studying a book before taking an exam. The potential of such specialized training environments is still vastly underexplored, despite their capacity to dramatically speed up training. The framework of synthetic environments takes a first step in this direction by meta-learning neural network-based Markov decision processes (MDPs). The initial approach was limited to toy problems and produced environments that did not transfer to unseen RL algorithms. We extend this approach in three ways: Firstly, we modify the meta-learning algorithm to discover environments invariant towards hyperparameter configurations and learning algorithms. Secondly, by leveraging hardware parallelism and introducing a curriculum on an agent's evaluation episode horizon, we can achieve competitive results on several challenging continuous control problems. Thirdly, we surprisingly find that contextual bandits enable training RL agents that transfer well to their evaluation environment, even if it is a complex MDP. Hence, we set up our experiments to train synthetic contextual bandits, which perform on par with synthetic MDPs, yield additional insights into the evaluation environment, and can speed up downstream applications.
Updated: 2024-06-18 13:19:26
标题: 发现最小强化学习环境
摘要: 强化学习(RL)智能体通常在同一环境中进行训练和评估。相比之下,人类往往在接受评估之前先在专门的环境中训练,比如在考试前学习一本书。尽管这种专门训练环境能够大幅加快训练速度,其潜力仍远未得到充分探索。合成环境框架通过元学习基于神经网络的马尔可夫决策过程(MDP),朝这个方向迈出了第一步。最初的方法局限于玩具问题,并且产生的环境无法迁移到未见过的RL算法。我们从三个方面扩展了这一方法:首先,我们修改元学习算法,以发现对超参数配置和学习算法保持不变的环境。其次,通过利用硬件并行性并在智能体的评估回合时域上引入课程学习,我们在若干具有挑战性的连续控制问题上取得了有竞争力的结果。第三,我们惊讶地发现,上下文老虎机(contextual bandits)能够训练出可以很好迁移到其评估环境的RL智能体,即使该环境是一个复杂的MDP。因此,我们设置实验来训练合成上下文老虎机,其表现与合成MDP相当,还能提供对评估环境的额外见解,并可加速下游应用。
更新时间: 2024-06-18 13:19:26
领域: cs.LG
UIFV: Data Reconstruction Attack in Vertical Federated Learning
Vertical Federated Learning (VFL) facilitates collaborative machine learning without the need for participants to share raw private data. However, recent studies have revealed privacy risks where adversaries might reconstruct sensitive features through data leakage during the learning process. Although data reconstruction methods based on gradient or model information are somewhat effective, they reveal limitations in VFL application scenarios. This is because these traditional methods heavily rely on specific model structures and/or have strict limitations on application scenarios. To address this, our study introduces the Unified InverNet Framework into VFL, which yields a novel and flexible approach (dubbed UIFV) that leverages intermediate feature data to reconstruct original data, instead of relying on gradients or model details. The intermediate feature data is the feature exchanged by different participants during the inference phase of VFL. Experiments on four datasets demonstrate that our methods significantly outperform state-of-the-art techniques in attack precision. Our work exposes severe privacy vulnerabilities within VFL systems that pose real threats to practical VFL applications and thus confirms the necessity of further enhancing privacy protection in the VFL architecture.
Updated: 2024-06-18 13:18:52
标题: UIFV:垂直联邦学习中的数据重构攻击
摘要: 垂直联邦学习(VFL)促进了协作机器学习,无需参与者共享原始私人数据。然而,最近的研究揭示了隐私风险,敌对方可能在学习过程中通过数据泄露重新构建敏感特征。虽然基于梯度或模型信息的数据重建方法在某种程度上有效,但它们在VFL应用场景中也存在局限性。这是因为这些传统方法在很大程度上依赖于特定的模型结构和/或在应用场景上有严格的限制。为了解决这个问题,我们的研究将统一逆网络框架引入到VFL中,提出了一种新颖灵活的方法(称为UIFV),通过利用中间特征数据重构原始数据,而不是依赖于梯度或模型细节。中间特征数据是在VFL推理阶段由不同参与者交换的特征。对四个数据集的实验证明,我们的方法在攻击精度方面明显优于最先进的技术。我们的工作揭示了VFL系统中存在严重的隐私漏洞,对实际VFL应用构成真正的威胁,从而确认了在VFL架构中进一步增强隐私保护的必要性。
更新时间: 2024-06-18 13:18:52
领域: cs.LG,cs.AI,cs.CR,stat.ML
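The following PyTorch sketch shows the generic inversion idea behind the attack: an adversary who observes the intermediate features exchanged during VFL inference trains a small network to map them back to raw inputs, using auxiliary (input, feature) pairs. The architecture and MSE objective are assumptions for illustration; UIFV's actual design is not specified in the abstract.

    import torch
    import torch.nn as nn

    class InverNet(nn.Module):
        # Maps intermediate features h back to a reconstruction of x.
        def __init__(self, feat_dim, input_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim),
            )

        def forward(self, h):
            return self.net(h)

    def train_inversion(inver, feats, inputs, epochs=100, lr=1e-3):
        opt = torch.optim.Adam(inver.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(inver(feats), inputs)
            loss.backward()
            opt.step()
        return inver  # x_hat = inver(h) then reconstructs unseen inputs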
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
Ensembling multiple models has always been an effective approach to push the limits of existing performance and is widely used in classification tasks by simply averaging the classification probability vectors from multiple classifiers to achieve better accuracy. However, in the thriving open-source Large Language Model (LLM) community, ensembling methods are rare and typically limited to ensembling the full-text outputs of LLMs, such as selecting the best output using a ranker, which leads to underutilization of token-level probability information. In this paper, we treat the Generation of each token by LLMs as a Classification (GaC) for ensembling. This approach fully exploits the probability information at each generation step and better prevents LLMs from producing early incorrect tokens that lead to snowballing errors. In experiments, we ensemble state-of-the-art LLMs on several benchmarks, including exams, mathematics and reasoning, and observe that our method breaks the existing community performance ceiling. Furthermore, we observed that most of the tokens in the answer are simple and do not affect the correctness of the final answer. Therefore, we also experimented with ensembling only key tokens, and the results showed better performance with lower latency across benchmarks.
Updated: 2024-06-18 13:17:26
标题: 通过将Token生成视为分类进行集成,突破LLM社区的性能上限
摘要: 将多个模型组合起来一直是推动现有性能极限的有效方法,并且在分类任务中被广泛使用,通过简单地对多个分类器的分类概率向量进行平均来实现更高的准确性。然而,在蓬勃发展的开源大型语言模型(LLM)社区中,集成方法很少见,通常局限于集成LLM的全文输出,例如使用排名器选择最佳输出,这导致了对令牌级概率信息的低效利用。在本文中,我们将LLM生成每个令牌视为分类(GaC)进行集成处理。这种方法充分利用了每个生成步骤的概率信息,并更好地防止LLM产生导致错误逐渐扩大的早期不正确令牌。在实验中,我们在几个基准测试中集成了最先进的LLM,并观察到我们的方法突破了现有社区性能上限。此外,我们观察到答案中大多数令牌都很简单,不会影响最终答案的正确性。因此,我们也尝试仅集成关键令牌,结果显示在各项基准测试中具有更好的性能和较低的延迟。
更新时间: 2024-06-18 13:17:26
领域: cs.CL,cs.AI
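A minimal sketch of treating each generation step as a classification for ensembling is given below: at every decoding step, the next-token distributions of all models are averaged and the argmax is emitted. It assumes HuggingFace-style models that share a tokenizer and vocabulary, a reconciliation the actual method would need to handle across model families.

    import torch

    @torch.no_grad()
    def gac_step(models, input_ids):
        # Average the next-token distributions of all ensemble members.
        probs = None
        for m in models:
            logits = m(input_ids).logits[:, -1, :]   # next-token logits
            p = torch.softmax(logits, dim=-1)
            probs = p if probs is None else probs + p
        return (probs / len(models)).argmax(dim=-1)

    def gac_generate(models, input_ids, max_new_tokens=32, eos_id=None):
        for _ in range(max_new_tokens):
            nxt = gac_step(models, input_ids).unsqueeze(-1)
            input_ids = torch.cat([input_ids, nxt], dim=-1)
            if eos_id is not None and (nxt == eos_id).all():
                break
        return input_ids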
The Future of Consumer Edge-AI Computing
In the last decade, Deep Learning has rapidly infiltrated the consumer end, mainly thanks to hardware acceleration across devices. However, as we look towards the future, it is evident that isolated hardware will be insufficient. Increasingly complex AI tasks demand shared resources, cross-device collaboration, and multiple data types, all without compromising user privacy or quality of experience. To address this, we introduce a novel paradigm centered around EdgeAI-Hub devices, designed to reorganise and optimise compute resources and data access at the consumer edge. To this end, we lay a holistic foundation for the transition from on-device to Edge-AI serving systems in consumer environments, detailing their components, structure, challenges and opportunities.
Updated: 2024-06-18 13:15:55
标题: 消费者边缘人工智能计算的未来
摘要: 在过去的十年里,深度学习迅速渗透到消费者端,主要得益于各种设备的硬件加速。然而,展望未来,很明显孤立的硬件将是不够的。越来越复杂的人工智能任务需要共享资源、跨设备协作和多种数据类型,而且不能损害用户隐私或体验质量。为了解决这个问题,我们引入了一个围绕EdgeAI-Hub设备的新范式,旨在重新组织和优化消费者边缘的计算资源和数据访问。为此,我们为在消费者环境中从设备端向边缘人工智能服务系统的过渡奠定了全面的基础,详细介绍了它们的组成、结构、挑战和机遇。
更新时间: 2024-06-18 13:15:55
领域: cs.LG
Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usually text input, and all words, regardless of their logic relationship, are stored in different tokens (i.e., unstructured data \cite{wibisono2023role}). To understand how LLMs learn from the unstructured data in ICL, this paper studies the role of each component in the transformer architecture and provides a theoretical understanding to explain the success of the architecture. In particular, we consider a simple transformer with one/two attention layers and linear regression tasks for the ICL prediction. We observe that (1) a transformer with two layers of (self-)attentions with a look-ahead attention mask can learn from the prompt in the unstructured data, and (2) positional encoding can match the $x_i$ and $y_i$ tokens to achieve a better ICL performance.
Updated: 2024-06-18 13:11:32
标题: 对浅层Transformer在非结构化数据上进行上下文学习的理论理解
摘要: 大型语言模型(LLMs)是强大的模型,可以通过上下文学习(ICL)在推理阶段学习概念。尽管一些理论研究(例如\cite{zhang2023trained})试图解释ICL的机制,但它们假设每个演示示例的输入$x_i$和输出$y_i$位于同一个令牌中(即结构化数据)。然而,在实际应用中,示例通常是文本输入,所有单词无论其逻辑关系如何都存储在不同的令牌中(即非结构化数据\cite{wibisono2023role})。为了理解LLMs如何在ICL中从非结构化数据中学习,本文研究了Transformer架构中每个组件的作用,并给出了解释该架构成功的理论理解。特别地,我们考虑一个具有一层/两层注意力的简单Transformer,以线性回归任务作为ICL预测任务。我们观察到:(1)具有两层(自)注意力和前瞻注意力掩码的Transformer可以从非结构化数据中的提示学习;(2)位置编码可以匹配$x_i$和$y_i$令牌,从而获得更好的ICL性能。
更新时间: 2024-06-18 13:11:32
领域: cs.LG,cs.CL,stat.ML
Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues
We develop assistive agents based on Large Language Models (LLMs) that aid interlocutors in business negotiations. Specifically, we simulate business negotiations by letting two LLM-based agents engage in role play. A third LLM acts as a remediator agent to rewrite utterances violating norms for improving negotiation outcomes. We introduce a simple tuning-free and label-free In-Context Learning (ICL) method to identify high-quality ICL exemplars for the remediator, where we propose a novel select criteria, called value impact, to measure the quality of the negotiation outcomes. We provide rich empirical evidence to demonstrate its effectiveness in negotiations across three different negotiation topics. The source code and the generated dataset will be publicly available upon acceptance.
Updated: 2024-06-18 13:10:16
标题: "社交意识谈判对话的辅助大型语言模型代理"
摘要: 我们开发了基于大型语言模型(LLM)的辅助代理,帮助商业谈判中的对话者。具体来说,我们让两个基于LLM的代理进行角色扮演来模拟商业谈判。第三个LLM充当修正代理(remediator),重写违反规范的话语,以改善谈判结果。我们引入了一种简单的、无需调参且无需标签的上下文学习(ICL)方法来为修正代理挑选高质量的ICL示例,并提出了一种称为价值影响(value impact)的新颖选择标准,用于衡量谈判结果的质量。我们提供了丰富的实证证据,证明其在三个不同谈判主题上的有效性。源代码和生成的数据集将在论文被接收后公开。
更新时间: 2024-06-18 13:10:16
领域: cs.CL,cs.AI,I.2.7
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).
Updated: 2024-06-18 13:08:24
标题: RoleLLM:大型语言模型角色扮演能力的基准测试、引出与增强
摘要: 大语言模型(LLMs)的出现为复杂任务如角色扮演铺平了道路,通过让模型模仿各种角色来增强用户互动。然而,最先进的LLMs的闭源特性以及它们的通用训练限制了角色扮演的优化。在本文中,我们介绍了RoleLLM,一个用于评估、引出和增强LLMs中角色扮演能力的框架。RoleLLM包括四个阶段:(1)构建100种角色的角色概况;(2)基于上下文的指导生成(Context-Instruct)用于角色特定知识的提取;(3)使用GPT进行角色提示(RoleGPT)以进行口语风格模仿;以及(4)角色条件指导调整(RoCIT)用于对开源模型进行微调并进行角色定制。通过Context-Instruct和RoleGPT,我们创建了RoleBench,这是第一个系统化和细粒度的角色级别基准数据集,包含168,093个样本用于角色扮演。此外,对RoleBench进行RoCIT生成了RoleLLaMA(英语)和RoleGLM(中文),显著增强了角色扮演能力,甚至取得了与使用GPT-4的RoleGPT相媲美的结果。
更新时间: 2024-06-18 13:08:24
领域: cs.CL,cs.AI
Training Diffusion Models with Federated Learning
The training of diffusion-based models for image generation is predominantly controlled by a select few Big Tech companies, raising concerns about privacy, copyright, and data authority due to their lack of transparency regarding training data. To address this issue, we propose a federated diffusion model scheme that enables the independent and collaborative training of diffusion models without exposing local data. Our approach adapts the Federated Averaging (FedAvg) algorithm to train a Denoising Diffusion Model (DDPM). Through a novel utilization of the underlying UNet backbone, we achieve a significant reduction of up to 74% in the number of parameters exchanged during training, compared to the naive FedAvg approach, whilst simultaneously maintaining image quality comparable to the centralized setting, as evaluated by the FID score.
Updated: 2024-06-18 13:02:48
标题: 使用联邦学习训练扩散模型
摘要: 图像生成的扩散模型训练主要由少数几家大型科技公司控制,由于它们在训练数据方面缺乏透明度,引发了有关隐私、版权和数据权威的担忧。为了解决这个问题,我们提出了一种联合扩散模型方案,可以实现独立和协作训练扩散模型,而不暴露本地数据。我们的方法将联邦平均(FedAvg)算法调整为训练去噪扩散模型(DDPM)。通过对底层UNet骨干的新颖利用,我们在训练过程中将参数交换的数量最多减少了74%,相比于朴素的FedAvg方法,同时通过FID评分评估,保持了与集中设置相当的图像质量。
更新时间: 2024-06-18 13:02:48
领域: cs.LG,cs.DC,I.2.11
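A minimal sketch of a FedAvg round adapted to diffusion training appears below; averaging is weighted by local dataset size, and the `shared_keys` filter stands in for the paper's partial exchange of UNet parameters, whose exact choice the abstract does not specify.

    import copy
    import torch

    def fedavg(states, weights):
        # Weighted average of client state dicts (floating-point tensors).
        total = sum(weights)
        return {k: sum(w * s[k] for s, w in zip(states, weights)) / total
                for k in states[0]}

    def federated_round(global_model, clients, local_train, shared_keys=None):
        states, weights = [], []
        for c in clients:
            local = copy.deepcopy(global_model)
            local_train(local, c.data)            # standard DDPM loss, locally
            sd = local.state_dict()
            if shared_keys is not None:           # exchange only a subset
                sd = {k: v for k, v in sd.items() if k in shared_keys}
            states.append(sd)
            weights.append(len(c.data))
        merged = {**global_model.state_dict(), **fedavg(states, weights)}
        global_model.load_state_dict(merged)
        return global_model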
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
We introduce Mathador-LM, a new benchmark for evaluating the mathematical reasoning on large language models (LLMs), combining ruleset interpretation, planning, and problem-solving. This benchmark is inspired by the Mathador game, where the objective is to reach a target number using basic arithmetic operations on a given set of base numbers, following a simple set of rules. We show that, across leading LLMs, we obtain stable average performance while generating benchmark instances dynamically, following a target difficulty level. Thus, our benchmark alleviates concerns about test-set leakage into training data, an issue that often undermines popular benchmarks. Additionally, we conduct a comprehensive evaluation of both open and closed-source state-of-the-art LLMs on Mathador-LM. Our findings reveal that contemporary models struggle with Mathador-LM, scoring significantly lower than average 5th graders. This stands in stark contrast to their strong performance on popular mathematical reasoning benchmarks.
Updated: 2024-06-18 13:02:12
标题: Mathador-LM:大规模语言模型上的数学推理动态基准
摘要: 我们介绍了Mathador-LM,这是一个用于评估大型语言模型(LLMs)数学推理能力的新基准,结合了规则集解释、规划和问题解决。这个基准受Mathador游戏启发,目标是使用给定的一组基本数字上的基本算术运算达到一个目标数字,遵循一组简单的规则。我们展示,在领先的LLMs中,我们在动态生成基准实例的同时获得稳定的平均性能,遵循一个目标难度级别。因此,我们的基准减轻了对测试集泄漏到训练数据的担忧,这经常会损害流行基准的可靠性。此外,我们对开源和闭源最新LLMs在Mathador-LM上进行了全面评估。我们的研究结果表明,当代模型在Mathador-LM上表现不佳,得分明显低于平均五年级学生。这与它们在流行的数学推理基准上的强劲表现形成鲜明对比。
更新时间: 2024-06-18 13:02:12
领域: cs.CL,cs.AI,cs.LG,I.2.7
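The game underlying the benchmark is easy to state in code. The brute-force checker below decides whether a target is reachable from a set of base numbers with +, -, *, / (each base number used at most once), which is the kind of primitive a dynamic instance generator needs; the benchmark's actual difficulty calibration is richer than this sketch.

    def reachable(nums, target, eps=1e-9):
        # True if `target` can be formed from `nums` with +, -, *, /,
        # using each number at most once (unused numbers are allowed).
        if any(abs(n - target) < eps for n in nums):
            return True
        for i in range(len(nums)):
            for j in range(len(nums)):
                if i == j:
                    continue
                rest = [nums[k] for k in range(len(nums)) if k not in (i, j)]
                a, b = nums[i], nums[j]
                candidates = [a + b, a - b, a * b]
                if abs(b) > eps:
                    candidates.append(a / b)
                if any(reachable(rest + [r], target, eps) for r in candidates):
                    return True
        return False

    print(reachable([3, 5, 7, 2], 41))  # True: 5 * 7 + 3 * 2 = 41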
MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs
Massive Over-activation Yielded Uplifts(MOYU) is an inherent property of large language models, and dynamic activation(DA) based on the MOYU property is a clever yet under-explored strategy designed to accelerate inference in these models. Existing methods that utilize MOYU often face a significant 'Impossible Trinity': struggling to simultaneously maintain model performance, enhance inference speed, and extend applicability across various architectures. Due to the theoretical ambiguities surrounding MOYU, this paper elucidates the root cause of the MOYU property and outlines the mechanisms behind two primary limitations encountered by current DA methods: 1) history-related activation uncertainty, and 2) semantic-irrelevant activation inertia. Our analysis not only underscores the limitations of current dynamic activation strategies within large-scale LLaMA models but also proposes opportunities for refining the design of future sparsity schemes.
Updated: 2024-06-18 12:57:33
标题: MOYU:关于大型语言模型中大量过度激活产生的提升的理论研究
摘要: 大量过度激活产生的提升(Massive Over-activation Yielded Uplifts, MOYU)是大型语言模型的一种固有属性,而基于MOYU属性的动态激活(DA)是一种旨在加速这些模型推理的巧妙但尚未被充分探索的策略。利用MOYU的现有方法往往面临一个重大的"不可能三角":难以同时维持模型性能、提升推理速度并扩展适用于各种架构。鉴于围绕MOYU的理论尚不明晰,本文阐明了MOYU属性的根本成因,并概述了当前DA方法所遇到的两个主要局限背后的机制:1)与历史相关的激活不确定性,以及2)与语义无关的激活惯性。我们的分析不仅强调了当前动态激活策略在大规模LLaMA模型中的局限性,也为改进未来稀疏方案的设计提出了机会。
更新时间: 2024-06-18 12:57:33
领域: cs.LG
Analysing India's Cyber Warfare Readiness and Developing a Defence Strategy
The demand for strong cyber defence measures grows, especially in countries such as India, where the rate of digitalization far exceeds cybersecurity developments. The increasing amount of cyber threats highlights the urgent need to strengthen cyber defences. The literature review reveals significant shortcomings in India's cyber defence readiness, especially in real-time threat detection and response capabilities. Through simulation models, the study explores network security behaviours and the impact of defences on network security. The next section of this study focuses on implementing a cyber threat detection system that uses machine learning to identify and categorise cyber threats in real time, followed by strategies to integrate it into India's present infrastructure. Also, the study proposes an educational framework for training cyber professionals. The study concludes with a reflection on the implemented defence strategies. It adds to the continuing discussion about national security by providing an in-depth investigation of cyber warfare preparation and recommending a systematic method to improving through both technological and educational solutions.
Updated: 2024-06-18 12:55:07
标题: 分析印度的网络战准备情况并制定防御战略
摘要: 对于强大的网络防御措施的需求正在增长,特别是在印度等国家,数字化程度远远超过了网络安全发展。越来越多的网络威胁突显了加强网络防御的迫切需要。文献综述揭示了印度网络防御准备方面的显著不足,特别是在实时威胁检测和响应能力方面。通过模拟模型,研究探讨了网络安全行为以及防御措施对网络安全的影响。本研究的下一部分侧重于实施一个利用机器学习实时识别和分类网络威胁的网络威胁检测系统,随后提出了将其整合到印度现有基础设施的策略。此外,研究提出了一个培训网络专业人员的教育框架。研究总结了实施的防御策略,并通过提供对网络战准备的深入调查和推荐通过技术和教育解决方案来改进的系统方法,为国家安全的持续讨论增添了内容。
更新时间: 2024-06-18 12:55:07
领域: cs.CR
Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters
Obtaining accurate labels for instance segmentation is particularly challenging due to the complex nature of the task. Each image necessitates multiple annotations, encompassing not only the object's class but also its precise spatial boundaries. These requirements elevate the likelihood of errors and inconsistencies in both manual and automated annotation processes. By simulating different noise conditions, we provide a realistic scenario for assessing the robustness and generalization capabilities of instance segmentation models in different segmentation tasks, introducing COCO-N and Cityscapes-N. We also propose a benchmark for weakly annotation noise, dubbed COCO-WAN, which utilizes foundation models and weak annotations to simulate semi-automated annotation tools and their noisy labels. This study sheds light on the quality of segmentation masks produced by various models and challenges the efficacy of popular methods designed to address learning with label noise.
Updated: 2024-06-18 12:54:48
标题: 实例分割中标签噪声的基准测试:空间噪声很重要
摘要: 由于任务的复杂性,为实例分割获取准确的标签尤其具有挑战性。每幅图像都需要多个标注,不仅包括对象的类别,还包括其精确的空间边界。这些要求提高了手动和自动标注过程中出现错误和不一致的可能性。通过模拟不同的噪声条件,我们为评估实例分割模型在不同分割任务中的鲁棒性和泛化能力提供了一个现实场景,并引入了COCO-N和Cityscapes-N。我们还提出了一个名为COCO-WAN的弱标注噪声基准,利用基础模型和弱标注来模拟半自动标注工具及其带噪标签。这项研究揭示了各种模型生成的分割掩码的质量,并对旨在应对标签噪声学习的流行方法的有效性提出了质疑。
更新时间: 2024-06-18 12:54:48
领域: cs.CV,cs.LG
MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction
Molecular property prediction (MPP) is a fundamental and crucial task in drug discovery. However, prior methods are limited by the requirement for a large number of labeled molecules and their restricted ability to generalize for unseen and new tasks, both of which are essential for real-world applications. To address these challenges, we present MolecularGPT for few-shot MPP. From a perspective on instruction tuning, we fine-tune large language models (LLMs) based on curated molecular instructions spanning over 1000 property prediction tasks. This enables building a versatile and specialized LLM that can be adapted to novel MPP tasks without any fine-tuning through zero- and few-shot in-context learning (ICL). MolecularGPT exhibits competitive in-context reasoning capabilities across 10 downstream evaluation datasets, setting new benchmarks for few-shot molecular prediction tasks. More importantly, with just two-shot examples, MolecularGPT can outperform standard supervised graph neural network methods on 4 out of 7 datasets. It also excels state-of-the-art LLM baselines by up to 16.6% increase on classification accuracy and decrease of 199.17 on regression metrics (e.g., RMSE) under zero-shot. This study demonstrates the potential of LLMs as effective few-shot molecular property predictors. The code is available at https://github.com/NYUSHCS/MolecularGPT.
Updated: 2024-06-18 12:54:47
标题: MolecularGPT:用于少样本分子性质预测的开放大型语言模型(LLM)
摘要: 分子性质预测(MPP)是药物发现中一项基础而关键的任务。然而,先前的方法受限于对大量标记分子的需求,以及对未见过的新任务有限的泛化能力,而这两点对实际应用都至关重要。为了应对这些挑战,我们提出了用于少样本MPP的MolecularGPT。从指令微调的角度出发,我们基于覆盖1000多个性质预测任务的精选分子指令对大型语言模型(LLMs)进行微调。这使得我们能够构建一个多功能且专业化的LLM,无需任何微调即可通过零样本和少样本上下文学习(ICL)适应新的MPP任务。MolecularGPT在10个下游评估数据集上展现了有竞争力的上下文推理能力,为少样本分子预测任务树立了新的基准。更重要的是,仅凭两个示例,MolecularGPT就能在7个数据集中的4个上超越标准的有监督图神经网络方法。在零样本条件下,它的分类准确率比最先进的LLM基线最多提高16.6%,回归指标(如RMSE)最多降低199.17。这项研究展示了LLMs作为有效的少样本分子性质预测器的潜力。代码可在https://github.com/NYUSHCS/MolecularGPT 上找到。
更新时间: 2024-06-18 12:54:47
领域: q-bio.QM,cs.AI,cs.CE,cs.CL,cs.LG
Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state of the art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large scale RNA sequencing data.
Updated: 2024-06-18 12:52:29
标题: 可扩展和灵活的因果发现:一种有效的邻接性测试
摘要: 为了在包含许多变量的系统中做出准确预测、理解机制并设计干预措施,我们希望从大规模数据中学习因果图。不幸的是,所有可能因果图构成的空间极其庞大,因此可扩展且准确地搜索最拟合数据的图是一个挑战。原则上,我们可以通过检验变量间的条件独立性来大幅缩小搜索空间,甚至完全学习出因果图。然而,判定两个变量在因果图中是否相邻可能需要指数数量的检验。在这里,我们构建了一种可扩展且灵活的方法来评估两个变量在因果图中是否相邻,即可微邻接检验(DAT)。DAT用一个可证明等价的松弛问题替换了指数数量的检验,然后通过训练两个神经网络来求解该问题。我们基于DAT构建了一种图学习方法DAT-Graph,它还能从带干预的数据中学习。DAT-Graph能够以最先进的准确性学习包含1000个变量的图。利用DAT-Graph学习到的图,我们还构建了能够更准确预测干预对大规模RNA测序数据影响的模型。
更新时间: 2024-06-18 12:52:29
领域: stat.ML,cs.LG
Improving global awareness of linkset predictions using Cross-Attentive Modulation tokens
Most multiple link prediction or graph generation techniques rely on the attention mechanism or on Graph Neural Networks (GNNs), which leverage node-level information exchanges in order to form proper link predictions. Such node-level interactions do not process nodes as an ordered sequence, which would imply some kind of natural ordering of the nodes: they are said to be permutation invariant mechanisms. They are well suited for graph problems, but struggle at providing a global orchestration of the predicted links, which can result in a loss of performance. Typical issues include the difficulty of ensuring high-level properties such as global connectedness or a fixed diameter, and of avoiding information bottleneck effects such as oversmoothing and oversquashing, which respectively consist of abundant smoothing in dense areas leading to a loss of information, and a tendency to exclude isolated nodes from the message passing scheme; both often result in irrelevant, unbalanced link predictions. To tackle this problem, we hereby present Cross-Attentive Modulation (CAM) tokens, which introduce cross-attentive units used to condition node and edge-level modulations in order to enable context-aware computations that improve the global consistency of the predicted links. We implement it on a few permutation invariant architectures, and showcase benchmarks that prove the merits of our work.
Updated: 2024-06-18 12:51:49
标题: 利用交叉注意调制令牌提升链接集预测的全局感知
摘要: 大多数多链接预测或图生成技术依赖于注意力机制或图神经网络(GNN),它们利用节点级信息交换来形成恰当的链接预测。此类节点级交互并不将节点作为有序序列处理(那将意味着节点存在某种自然排序):它们被称为置换不变机制。这类机制非常适合图问题,但难以对预测的链接进行全局协调,从而可能造成性能损失。典型的问题包括难以保证全局连通性、固定直径等高层属性,以及难以避免过度平滑和过度挤压等信息瓶颈效应:前者指密集区域中的过度平滑导致信息丢失,后者指孤立节点容易被排除在消息传递方案之外,二者常常导致无关且不平衡的链接预测。为了解决这一问题,我们提出交叉注意调制(CAM)令牌,引入交叉注意单元来调节节点级和边级的调制,从而实现上下文感知的计算,提升预测链接的全局一致性。我们将其实现在若干置换不变架构上,并通过基准测试展示了这项工作的优点。
更新时间: 2024-06-18 12:51:49
领域: cs.SI,cs.LG,I.2.6
Low-Resource Machine Translation through the Lens of Personalized Federated Learning
We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to Natural Language Tasks with heterogeneous data. We evaluate it on the Low-Resource Machine Translation task, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and the subset of Sami languages from the multilingual benchmark for Finno-Ugric languages. In addition to its effectiveness, MeritFed is also highly interpretable, as it can be applied to track the impact of each language used for training. Our analysis reveals that target dataset size affects weight distribution across auxiliary languages, that unrelated languages do not interfere with the training, and auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments at https://github.com/VityaVitalich/MeritFed
Updated: 2024-06-18 12:50:00
标题: 透过个性化联邦学习的视角看低资源机器翻译
摘要: 我们提出了一种基于个性化联邦学习算法MeritFed的新方法,可应用于数据异构的自然语言任务。我们在低资源机器翻译任务上对其进行评估,使用来自大规模多语言机器翻译共享任务(Small Track #2)的数据集,以及芬兰-乌戈尔语系多语言基准中的萨米语子集。除了有效性之外,MeritFed还具有高度可解释性,因为它可以用来追踪每种参与训练的语言的影响。我们的分析表明,目标数据集的大小会影响辅助语言之间的权重分配,不相关的语言不会干扰训练,并且辅助优化器参数的影响很小。我们的方法只需几行代码即可应用,我们在https://github.com/VityaVitalich/MeritFed 提供了用于复现实验的脚本。
更新时间: 2024-06-18 12:50:00
领域: cs.CL,cs.LG
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of complex file interactions. Also, recently, people have developed LLM agents that attempt to interact with repository code (e.g., compiling and evaluating its execution), prompting the need to evaluate their performance. These gaps have motivated our development of ML-Bench, a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. Addressing the need for LLMs to interpret long code contexts and translate instructions into precise, executable scripts, ML-Bench encompasses annotated 9,641 examples across 18 GitHub repositories, challenging LLMs to accommodate user-specified arguments and documentation intricacies effectively. To evaluate both LLMs and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment. Our findings indicate that while GPT-4o leads with a Pass@5 rate surpassing 50%, there remains significant scope for improvement, highlighted by issues such as hallucinated outputs and difficulties with bash script generation. Notably, in the more demanding ML-Agent-Bench, GPT-4o achieves a 76.47% success rate, reflecting the efficacy of iterative action and feedback in complex task resolution. Our code, dataset, and models are available at https://github.com/gersteinlab/ML-bench.
Updated: 2024-06-18 12:49:41
标题: ML-Bench:在仓库级代码上评估用于机器学习任务的大型语言模型与智能体
摘要: 尽管像GPT-4这样的大型语言模型在功能级代码生成方面取得了令人印象深刻的结果,但它们在理解存储库规模的代码(例如,为调用例程提供正确的参数)方面仍然存在困难,需要更深入地理解复杂文件交互。最近,人们开发了尝试与存储库代码交互(例如,编译和评估其执行)的LLM代理,促使我们评估它们的性能。这些差距促使我们开发了ML-Bench,这是一个基于真实世界编程应用的基准测试,利用现有代码存储库执行任务。ML-Bench满足了LLM解释长代码上下文并将指令转换为精确可执行脚本的需求,涵盖了18个GitHub存储库中的9,641个示例,并挑战LLM有效地适应用户指定的参数和文档细节。为了评估LLM和AI代理,采用了两种设置:ML-LLM-Bench用于评估LLM在预定义部署环境中的文本到代码转换,ML-Agent-Bench用于测试在Linux沙盒环境中进行端到端任务执行的自主代理。我们的研究结果表明,尽管GPT-4o的Pass@5率超过50%,但仍存在改进的重要空间,例如产生幻觉输出和生成bash脚本的困难。值得注意的是,在更具挑战性的ML-Agent-Bench中,GPT-4o实现了76.47%的成功率,反映了在复杂任务解决中迭代行动和反馈的有效性。我们的代码、数据集和模型可在https://github.com/gersteinlab/ML-bench上找到。
更新时间: 2024-06-18 12:49:41
领域: cs.CL,cs.AI
A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo
Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Turismo. However, this agent relied on global features that require instrumentation external to the car. This paper introduces, to the best of our knowledge, the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed from on-board the car, such as the car's velocity. By leveraging global features only at training time, the learned agent is able to outperform the best human drivers in time trial (one car on the track at a time) races using only local input features. The resulting agent is evaluated in Gran Turismo 7 on multiple tracks and cars. Detailed ablation experiments demonstrate the agent's strong reliance on visual inputs, making it the first vision-based super-human car racing agent.
Updated: 2024-06-18 12:49:27
标题: 在Gran Turismo中进行自主赛车的超人类水平基于视觉的强化学习智能体
摘要: 让自动驾驶赛车跑得比最优秀的人类车手更快,一直是人工智能和机器人领域长期存在的重大挑战。最近,一个端到端的深度强化学习智能体在高保真赛车模拟器Gran Turismo中应对了这一挑战。然而,该智能体依赖于需要车外仪器测量的全局特征。本文介绍了据我们所知第一个传感器输入完全局限于车辆本身的超人类赛车智能体,其输入为以自车为中心的摄像头视角像素,以及可以从车上感知的量(如车速)。通过仅在训练时利用全局特征,学习到的智能体仅凭局部输入特征便能在计时赛(赛道上每次只有一辆车)中胜过最优秀的人类车手。所得智能体在Gran Turismo 7的多条赛道和多款赛车上进行了评估。详细的消融实验展示了该智能体对视觉输入的强烈依赖,使其成为第一个基于视觉的超人类赛车智能体。
更新时间: 2024-06-18 12:49:27
领域: cs.LG,cs.CV,cs.RO
How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses
Automatically interpreting CT scans can ease the workload of radiologists. However, this is challenging mainly due to the scarcity of adequate datasets and reference standards for evaluation. This study aims to bridge this gap by introducing a novel evaluation framework, named ``GPTRadScore''. This framework assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, in generating descriptions for prospectively-identified findings. By employing a decomposition technique based on GPT-4, GPTRadScore compares these generated descriptions with gold-standard report sentences, analyzing their accuracy in terms of body part, location, and type of finding. Evaluations demonstrated a high correlation with clinician assessments and highlighted its potential over traditional metrics, such as BLEU, METEOR, and ROUGE. Furthermore, to contribute to future studies, we plan to release a benchmark dataset annotated by clinicians. Using GPTRadScore, we found that while GPT-4V and Gemini Pro Vision fare better, their performance revealed significant areas for improvement, primarily due to limitations in the dataset used for training these models. To demonstrate this potential, RadFM was fine-tuned and it resulted in significant accuracy improvements: location accuracy rose from 3.41\% to 12.8\%, body part accuracy from 29.12\% to 53\%, and type accuracy from 9.24\% to 30\%, thereby validating our hypothesis.
Updated: 2024-06-18 12:43:18
标题: 多模态LLMs在解释CT扫描中表现如何?一种用于分析的自动评估框架
摘要: 自动解释CT扫描可以减轻放射科医师的工作量。然而,由于缺乏足够的数据集和评估参考标准,这是具有挑战性的。本研究旨在通过引入一种名为“GPTRadScore”的新评估框架来弥补这一差距。该框架评估了多模式LLM(如具有视觉功能的GPT-4、Gemini Pro Vision、LLaVA-Med和RadFM)在生成预先识别的发现的描述方面的能力。通过基于GPT-4的分解技术,GPTRadScore将这些生成的描述与金标准报告句进行比较,分析其在身体部位、位置和发现类型方面的准确性。评估结果显示与临床评估之间存在高度相关性,并突显了其相对于传统指标(如BLEU、METEOR和ROUGE)的潜力。此外,为了为未来研究做出贡献,我们计划发布一份由临床医生注释的基准数据集。使用GPTRadScore,我们发现虽然GPT-4V和Gemini Pro Vision表现更好,但其性能显示出需要改进的显著领域,主要是由于用于训练这些模型的数据集的限制。为了展示这一潜力,我们对RadFM进行了微调,结果显示明显的准确性提高:位置准确性从3.41\%提高到12.8\%,身体部位准确性从29.12\%提高到53\%,发现类型准确性从9.24\%提高到30\%,从而验证了我们的假设。
更新时间: 2024-06-18 12:43:18
领域: cs.AI,cs.CL,cs.CV
A Rate-Distortion View of Uncertainty Quantification
In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching deep neural networks with this property. Building on prior information bottleneck approaches, our method learns a codebook that stores a compressed representation of all inputs seen during training. The distance of a new example from this codebook can serve as an uncertainty estimate for the example. The resulting model is simple to train and provides deterministic uncertainty estimates by a single forward pass. Finally, our method achieves better out-of-distribution (OOD) detection and misclassification prediction than prior methods, including expensive ensemble methods, deep kernel Gaussian Processes, and approaches based on the standard information bottleneck.
Updated: 2024-06-18 12:41:43
Domain: cs.LG,cs.AI,stat.ML
Bayesian Data Selection
A wide range of machine learning algorithms iteratively add data to the training sample; examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization. We embed this kind of data addition into decision theory by framing data selection as a decision problem, which paves the way for finding Bayes-optimal selections of data. For the illustrative case of self-training in semi-supervised learning, we derive the respective Bayes criterion. We further show that deploying this criterion mitigates the issue of confirmation bias, empirically assessing our method with generalized linear models, semi-parametric generalized additive models, and Bayesian neural networks on simulated and real-world data.
Updated: 2024-06-18 12:40:15
Domain: stat.ML,cs.AI,cs.LG,math.ST,stat.TH
Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching
Graph condensation aims to reduce the size of a large-scale graph dataset by synthesizing a compact counterpart without sacrificing the performance of Graph Neural Networks (GNNs) trained on it, which has shed light on reducing the computational cost for training GNNs. Nevertheless, existing methods often fall short of accurately replicating the original graph for certain datasets, thereby failing to achieve the objective of lossless condensation. To understand this phenomenon, we investigate the potential reasons and reveal that the previous state-of-the-art trajectory matching method provides biased and restricted supervision signals from the original graph when optimizing the condensed one. This significantly limits both the scale and efficacy of the condensed graph. In this paper, we make the first attempt toward lossless graph condensation by bridging the previously neglected supervision signals. Specifically, we employ a curriculum learning strategy to train expert trajectories with more diverse supervision signals from the original graph, and then effectively transfer the information into the condensed graph with expanding window matching. Moreover, we design a loss function to further extract knowledge from the expert trajectories. Theoretical analysis justifies the design of our method and extensive experiments verify its superiority across different datasets. Code is released at https://github.com/NUS-HPC-AI-Lab/GEOM.
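As a rough illustration of the expanding-window idea, the sketch below samples which segment of a pre-recorded expert trajectory the condensed-graph optimization should match, with the admissible starting window growing over the course of training. The window schedule, segment length, and function names are illustrative assumptions, not the paper's exact procedure.

```python
import random

def sample_match_segment(expert_len, step, total_steps, w0=5):
    """Pick (start, end) of the expert-trajectory segment to match at `step`.

    Early in training only early expert dynamics supervise the student; the
    admissible start window [0, w] then expands toward the full trajectory.
    The linear schedule and 10-step segment length are assumptions.
    """
    w = min(expert_len - 1, w0 + int((expert_len - w0) * step / total_steps))
    start = random.randint(0, w)
    return start, min(start + 10, expert_len)

for step in (0, 500, 1000):
    print(step, sample_match_segment(expert_len=100, step=step, total_steps=1000))
```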
Updated: 2024-06-18 12:38:51
Domain: cs.LG
Provable Guarantees for Model Performance via Mechanistic Interpretability
In this work, we propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
Updated: 2024-06-18 12:36:07
Domain: cs.LG,cs.LO
A generalizable framework for low-rank tensor completion with numerical priors
Low-rank tensor completion, which exploits the inherent structure of tensors, has been studied extensively as an effective approach to tensor completion. Whilst such methods have attained great success, none have systematically considered exploiting the numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. Despite the existence of some individual works that consider ad hoc numerical priors for specific tasks, no generalizable framework for incorporating numerical priors has appeared. We present the Generalized CP Decomposition Tensor Completion (GCDTC) framework, the first generalizable framework for low-rank tensor completion that takes numerical priors of the data into account. We test GCDTC by further proposing the Smooth Poisson Tensor Completion (SPTC) algorithm, an instantiation of the GCDTC framework, whose performance exceeds the current state of the art by considerable margins in the task of non-negative tensor completion, exemplifying GCDTC's effectiveness. Our code is open-source.
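A minimal sketch of a Poisson-likelihood CP completion in this spirit, assuming nonnegative factors via a softplus reparameterization and omitting the smoothness prior; the tensor sizes, rank, and optimizer are illustrative choices rather than the paper's configuration.

```python
import torch

torch.manual_seed(0)
I, J, K, rank = 20, 15, 10, 4
true = torch.poisson(torch.rand(I, J, K) * 3.0)   # synthetic count tensor
mask = torch.rand(I, J, K) < 0.3                  # 30% of entries observed

# Nonnegativity of CP factors enforced through a softplus reparameterization.
A = torch.randn(I, rank, requires_grad=True)
B = torch.randn(J, rank, requires_grad=True)
C = torch.randn(K, rank, requires_grad=True)
opt = torch.optim.Adam([A, B, C], lr=0.05)

def reconstruct():
    fa, fb, fc = map(torch.nn.functional.softplus, (A, B, C))
    return torch.einsum("ir,jr,kr->ijk", fa, fb, fc)

for step in range(500):
    opt.zero_grad()
    mu = reconstruct()[mask]
    x = true[mask]
    # Poisson negative log-likelihood on observed entries (up to log(x!)).
    loss = (mu - x * torch.log(mu + 1e-8)).mean()
    loss.backward()
    opt.step()

print("held-out MAE:", (reconstruct()[~mask] - true[~mask]).abs().mean().item())
```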
Updated: 2024-06-18 12:32:19
Domain: cs.CV,cs.AI,cs.LG,cs.NA,math.NA,stat.ML
Local Recovery of Two-layer Neural Networks at Overparameterization
Under mild assumptions, we investigate the structure of the loss landscape of two-layer neural networks near global minima, determine the set of parameters that recovers the target function, and characterize the gradient flows around it. With novel techniques, our work uncovers some simple aspects of the complicated loss landscape and reveals how the model, target function, samples, and initialization affect the training dynamics differently. These results show that two-layer neural networks can be recovered locally at overparameterization.
Updated: 2024-06-18 12:29:30
Domain: cs.LG,math.DS
Offline Imitation Learning with Model-based Reverse Augmentation
In offline Imitation Learning (IL), one of the main challenges is the covariate shift between the expert observations and the actual distribution encountered by the agent, because it is difficult to determine what action an agent should take outside the state distribution of the expert demonstrations. Recently, model-free solutions have introduced supplementary data and identified latent expert-similar samples to augment the set of reliable samples during learning. Model-based solutions build forward dynamics models with conservatism quantification and then generate additional trajectories in the neighborhood of expert demonstrations. However, without reward supervision, these methods are often over-conservative in the out-of-expert-support regions, because only in states close to expert-observed states can there be a preferred action that enables policy optimization. To encourage more exploration of expert-unobserved states, we propose a novel model-based framework, called offline Imitation Learning with Self-paced Reverse Augmentation (SRA). Specifically, we build a reverse dynamics model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states in a self-paced style. Then, we use a subsequent reinforcement learning method to learn from the augmented trajectories and transit from expert-unobserved states to expert-observed states. This framework not only explores the expert-unobserved states but also guides maximizing long-term returns on these states, ultimately enabling generalization beyond the expert data. Empirical results show that our proposal can effectively mitigate the covariate shift and achieve state-of-the-art performance on offline imitation learning benchmarks. Project website: https://www.lamda.nju.edu.cn/shaojj/KDD24_SRA/.
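The sketch below illustrates the reverse-augmentation idea under simplifying assumptions: a deterministic MLP reverse dynamics model is fit to predict the previous state from the next state and action, and backward rollouts from expert-observed states are then reversed into synthetic trajectories that end in expert states. The network sizes, random stand-in data, and random rollout policy are all assumptions.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
reverse_model = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, state_dim),              # predicts the *previous* state
)
opt = torch.optim.Adam(reverse_model.parameters(), lr=1e-3)

# Offline dataset of (s_t, a_t, s_{t+1}) transitions (random stand-ins here).
s_t = torch.randn(4096, state_dim)
a_t = torch.randn(4096, action_dim)
s_t1 = s_t + 0.1 * torch.randn_like(s_t)

for _ in range(200):                       # fit s_t ~ f(s_{t+1}, a_t)
    pred = reverse_model(torch.cat([s_t1, a_t], dim=-1))
    loss = ((pred - s_t) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def backward_rollout(expert_state, horizon=5):
    """Synthesize a trajectory that ends at an expert-observed state."""
    states, s = [expert_state], expert_state
    for _ in range(horizon):
        a = torch.randn(action_dim)        # stand-in for a rollout policy
        s = reverse_model(torch.cat([s, a])).detach()
        states.append(s)
    return list(reversed(states))          # ordered to *end* at the expert state

traj = backward_rollout(torch.randn(state_dim))
print(len(traj), "states, ending at the expert-observed state")
```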
Updated: 2024-06-18 12:27:02
Domain: cs.LG,cs.AI
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
Recent LLMs are able to generate high-quality multilingual texts that are indistinguishable to humans from authentic human-written ones. Research in machine-generated text detection is, however, mostly focused on the English language and longer texts, such as news articles, scientific papers, or student essays. Social-media texts are usually much shorter and often feature informal language, grammatical errors, or distinct linguistic items (e.g., emoticons, hashtags). There is a gap in studying the ability of existing methods to detect such texts, reflected also in the lack of multilingual benchmark datasets. To fill this gap we propose MultiSocial, the first multilingual (22 languages) and multi-platform (5 social media platforms) dataset for benchmarking machine-generated text detection in the social-media domain. It contains 472,097 texts, of which about 58k are human-written, with approximately the same amount generated by each of 7 multilingual LLMs. We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form. Our results indicate that fine-tuned detectors can be trained on social-media texts without difficulty and that the choice of platform for training matters.
Updated: 2024-06-18 12:26:09
Domain: cs.CL,cs.AI
Next Generation of Phishing Attacks using AI powered Browsers
The increase in the number of phishing attacks demands innovative solutions to safeguard users. This study explores the development and use of a real-time browser extension integrated with a machine learning model to improve the detection of phishing websites. The results showed that the model had an accuracy of 98.32%, precision of 98.62%, recall of 97.86%, and an F1-score of 98.24%. When compared to other algorithms like Support Vector Machine, Naïve Bayes, Decision Tree, XGBoost, and K Nearest Neighbor, the Random Forest algorithm stood out for its effectiveness in detecting phishing attacks. Zero-day phishing attack detection testing over a 15-day period revealed the model's capability to identify previously unseen threats, achieving an overall accuracy rate of 99.11%. Furthermore, the model showed better performance than conventional security measures such as Google Safe Browsing, successfully detecting phishing URLs that evaded Google Safe Browsing's detection. This research shows how machine learning in real-time browser extensions can defend against phishing attacks, offering useful insights for cybersecurity and helping make the internet safer for everyone.
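As a sketch of such a pipeline, the snippet below extracts a few hand-crafted URL features and trains a Random Forest, the algorithm the study found most effective; the feature set and toy data are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def url_features(url: str) -> list:
    return [
        len(url),                      # long URLs are a common phishing signal
        url.count("."),                # many subdomains
        url.count("-"),
        int("@" in url),               # '@' can hide the real host
        int(url.startswith("https")),
        sum(c.isdigit() for c in url),
    ]

urls = ["https://example.com/login", "http://paypa1-secure.tk/verify@acct",
        "https://github.com/user/repo", "http://192.168.2.1-bank.xyz/update"]
labels = [0, 1, 0, 1]                  # 0 = legitimate, 1 = phishing (toy labels)

X = np.array([url_features(u) for u in urls * 50])   # replicated toy data
y = np.array(labels * 50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```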
Updated: 2024-06-18 12:24:36
Domain: cs.CR
The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs
Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, for the first time, we transfer the prevailing concept of "one node one receptive field" to the heterophilic graph. By constructing a proxy label predictor, we enable each node to possess a latent prediction distribution, which assists connected nodes in determining whether they should aggregate their associated neighbors. Ultimately, every node can have its own unique aggregation hop and pattern, much like each snowflake is unique and possesses its own characteristics. Based on these observations, we innovatively introduce the Heterophily Snowflake Hypothesis and provide an effective solution to guide and facilitate research on heterophilic graphs and beyond. We conduct comprehensive experiments including (1) main results on 10 graphs with varying heterophily ratios across 10 backbones; (2) scalability on various deep GNN backbones (SGC, JKNet, etc.) across a wide range of depths (2, 4, 6, 8, 16, and 32 layers); (3) comparison with the conventional snowflake hypothesis; and (4) efficiency comparison with existing graph pruning algorithms. Our observations show that our framework acts as a versatile operator for diverse tasks. It can be integrated into various GNN frameworks, boosting performance in depth and offering an explainable approach to choosing the optimal network depth. The source code is available at https://github.com/bingreeky/HeteroSnoH.
Updated: 2024-06-18 12:16:00
Domain: cs.LG,cs.AI
Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment
The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread has woven its way into an unexpected domain, one not conventionally associated with pondering "what ought to be": the field of artificial intelligence (AI) research. Central to morality and AI, we find "alignment", a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects. More explicitly and with our current paradigm of AI development in mind, we can think of alignment as teaching human values to non-anthropomorphic entities trained through opaque, gradient-based learning techniques. This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development. To accomplish this, we propose two sets of necessary and sufficient conditions that, we argue, should be considered in any alignment process. While necessary conditions serve as metaphysical and metaethical roots that pertain to the permissibility of alignment, sufficient conditions establish a blueprint for aligning AI systems under a learning-based paradigm. After laying such foundations, we present implementations of this approach by using state-of-the-art techniques and methods for aligning general-purpose language systems. We call this framework Dynamic Normativity. Its central thesis is that any alignment process under a learning paradigm that cannot fulfill its necessary and sufficient conditions will fail in producing aligned systems.
Updated: 2024-06-18 12:15:06
Domain: cs.AI,cs.CY
Variational Distillation of Diffusion Policies into Mixture of Experts
This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion models are the current state of the art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows diffusion models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions, but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models and hence combines the expressiveness of diffusion models with the benefits of mixture models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows each expert to be trained separately, resulting in a robust optimization scheme for MoEs. Across nine complex behavior learning tasks, VDD demonstrates that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoEs.
Updated: 2024-06-18 12:15:05
Domain: cs.LG,cs.AI,cs.RO
Enhanced Gradient Boosting for Zero-Inflated Insurance Claims and Comparative Analysis of CatBoost, XGBoost, and LightGBM
The property and casualty (P&C) insurance industry faces challenges in developing claim predictive models due to the highly right-skewed distribution of positive claims with excess zeros. To address this, actuarial science researchers have employed "zero-inflated" models that combine a traditional count model and a binary model. This paper investigates the use of boosting algorithms to process insurance claim data, including zero-inflated telematics data, to construct claim frequency models. Three popular gradient boosting libraries - XGBoost, LightGBM, and CatBoost - are evaluated and compared to determine the most suitable library for training insurance claim data and fitting actuarial frequency models. Through a comprehensive analysis of two distinct datasets, it is determined that CatBoost is the best for developing auto claim frequency models based on predictive performance. Furthermore, we propose a new zero-inflated Poisson boosted tree model, with variation in the assumption about the relationship between inflation probability $p$ and distribution mean $\mu$, and find that it outperforms others depending on data characteristics. This model enables us to take advantage of particular CatBoost tools, which makes it easier and more convenient to investigate the effects and interactions of various risk features on the frequency model when using telematics data.
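For reference, the zero-inflated Poisson likelihood underlying such models mixes a point mass at zero with a Poisson count distribution. Below is a minimal sketch with the inflation probability p and mean mu held fixed, rather than predicted by boosted trees as in the paper.

```python
import numpy as np
from scipy.special import gammaln

def zip_nll(y, p, mu):
    """Negative log-likelihood of counts y under a zero-inflated Poisson.

    P(y=0) = p + (1 - p) * exp(-mu)
    P(y=k) = (1 - p) * Poisson(k; mu)   for k >= 1
    """
    y, p, mu = map(np.asarray, (y, p, mu))
    ll_zero = np.log(p + (1.0 - p) * np.exp(-mu))
    ll_pos = np.log1p(-p) - mu + y * np.log(mu) - gammaln(y + 1.0)
    return -np.where(y == 0, ll_zero, ll_pos).sum()

claims = np.array([0, 0, 0, 1, 0, 2, 0, 0])   # toy claim counts with excess zeros
print(zip_nll(claims, p=0.4, mu=0.8))         # lower is a better fit
```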
Updated: 2024-06-18 12:09:18
Domain: cs.LG
New Reservoir Computing Kernel Based on Chaotic Chua Circuit and Investigating Application to Post-Quantum Cryptography
The aim of this project was to develop a new Reservoir Computer implementation, based on a chaotic Chua circuit. In addition to suitable classification and regression benchmarks, the Reservoir Computer was applied to Post-Quantum Cryptography, with its suitability for this application investigated and assessed. The cryptographic algorithm utilised was the Learning with Errors problem, for both encryption and decryption. To achieve this, the Chua circuit was characterised, in simulation, and by physical circuit testing. The Reservoir Computer was designed and implemented using the results of the characterisation. As part of this development, noise was considered and mitigated. The benchmarks demonstrate that the Reservoir Computer can achieve current literature benchmarks with low error. However, the results with Learning with Errors suggest that a Chua-based Reservoir Computer is not sufficiently complex to tackle the high non-linearity in Post-Quantum Cryptography. Future work would involve researching the use of different combinations of multiple Chua Reservoir Computers in larger neural network architectures. Such architectures may produce the required high-dimensional behaviour to achieve the Learning with Errors problem. This project is believed to be only the second instance of a Chua-based Reservoir Computer in academia, and it is the first to be applied to challenging real-world tasks such as Post-Quantum Cryptography. It is also original by its investigation of hitherto unexplored parameters, and their impact on performance. It demonstrates a proof-of-concept for a mass-producible, inexpensive, low-power consumption hardware neural network. It also enables the next stages in research to occur, paving the road for using Chua-based Reservoir Computers across various applications.
Updated: 2024-06-18 12:07:59
Domain: cs.CR,cs.LG,nlin.CD,physics.app-ph,physics.class-ph
Equivariant Frames and the Impossibility of Continuous Canonicalization
Canonicalization provides an architecture-agnostic method for enforcing equivariance, with generalizations such as frame-averaging recently gaining prominence as a lightweight and flexible alternative to equivariant architectures. Recent works have found an empirical benefit to using probabilistic frames instead, which learn weighted distributions over group elements. In this work, we provide strong theoretical justification for this phenomenon: for commonly-used groups, there is no efficiently computable choice of frame that preserves continuity of the function being averaged. In other words, unweighted frame-averaging can turn a smooth, non-symmetric function into a discontinuous, symmetric function. To address this fundamental robustness problem, we formally define and construct weighted frames, which provably preserve continuity, and demonstrate their utility by constructing efficient and continuous weighted frames for the actions of $SO(2)$, $SO(3)$, and $S_n$ on point clouds.
Updated: 2024-06-18 12:07:34
Domain: cs.LG
TREE: Tree Regularization for Efficient Execution
The rise of machine learning methods on heavily resource constrained devices requires not only the choice of a suitable model architecture for the target platform, but also the optimization of the chosen model with regard to execution time consumption for inference in order to optimally utilize the available resources. Random forests and decision trees are shown to be a suitable model for such a scenario, since they are not only heavily tunable towards the total model size, but also offer a high potential for optimizing their executions according to the underlying memory architecture. In addition to the straightforward strategy of enforcing shorter paths through decision trees and hence reducing the execution time for inference, hardware-aware implementations can optimize the execution time in an orthogonal manner. One particular hardware-aware optimization is to lay out the memory of decision trees in such a way that paths with higher probability are less likely to be evicted from system caches. This works particularly well when splits within tree nodes are uneven and have a high probability to visit one of the child nodes. In this paper, we present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees, at the cost of a minimal accuracy degradation. Specifically, we regularize the impurity computation of the CART algorithm in order to favor not only low impurity but also highly asymmetric distributions for the evaluation of split criteria, and hence offer a high optimization potential for a memory-architecture-aware implementation. We show that, especially for binary classification data sets and data sets with many samples, this form of regularization can lead to a reduction of up to approximately four times in the execution time with a minimal accuracy degradation.
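A minimal sketch of the criterion change, assuming a Gini impurity and a simple asymmetry bonus with weight lambda (both illustrative; the paper regularizes the CART impurity computation): among splits of similar purity, the regularized score prefers the one that routes most samples to a single child.

```python
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2.0 * p1 * (1.0 - p1)

def regularized_split_score(left, right, lam=0.2):
    """Weighted child impurity minus an asymmetry bonus; lower is better."""
    n = len(left) + len(right)
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    asymmetry = abs(len(left) - len(right)) / n     # 0 = even split, -> 1 = uneven
    return impurity - lam * asymmetry

balanced = ([0] * 40 + [1] * 10, [0] * 10 + [1] * 40)   # 50/50 size split
skewed = ([0] * 88 + [1] * 2, [0] * 2 + [1] * 8)        # 90/10 size split
print(regularized_split_score(*balanced), regularized_split_score(*skewed))
```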
Updated: 2024-06-18 12:01:06
Domain: cs.LG,stat.ML
LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation
As the demand for more personalized recommendation grows and commercial scenarios boom dramatically, the study of multi-scenario recommendation (MSR) has attracted much attention; MSR uses the data from all scenarios to simultaneously improve their recommendation performance. However, existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, leading to suboptimal performance and inadequate interpretability. Meanwhile, though large language models (LLMs) have shown great capability for reasoning and capturing semantic information, their high inference latency and the high computation cost of tuning hinder their implementation in industrial recommender systems. To fill these gaps, we propose an effective, efficient, and interpretable LLM-enhanced paradigm, LLM4MSR, in this work. Specifically, we first leverage the LLM to uncover multi-level knowledge, including scenario correlations and users' cross-scenario interests, from designed scenario- and user-level prompts without fine-tuning the LLM, and then adopt hierarchical meta networks to generate multi-level meta layers that explicitly improve the scenario-aware and personalized recommendation capability. Our experiments on the KuaiSAR-small, KuaiSAR, and Amazon datasets validate three significant advantages of LLM4MSR: (i) effectiveness and compatibility with different multi-scenario backbone models (achieving 1.5%, 1%, and 40% AUC improvement on the three datasets), (ii) high efficiency and deployability on industrial recommender systems, and (iii) improved interpretability. The implemented code and data are available to ease reproduction.
Updated: 2024-06-18 11:59:36
Domain: cs.IR,cs.AI
HARE: HumAn pRiors, a key to small language model Efficiency
Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often diminishes the importance of human priors in data construction. Influenced by these trends, existing Small Language Models (SLMs) mainly rely on web-scraped large-scale training data, neglecting the proper incorporation of human priors. This oversight limits the training efficiency of language models in resource-constrained settings. In this paper, we propose a principle to leverage human priors for data construction. This principle emphasizes achieving high-performance SLMs by training on a concise dataset that accommodates both semantic diversity and data quality consistency, while avoiding benchmark data leakage. Following this principle, we train an SLM named HARE-1.1B. Extensive experiments on large-scale benchmark datasets demonstrate that HARE-1.1B performs favorably against state-of-the-art SLMs, validating the effectiveness of the proposed principle. Additionally, this provides new insights into efficient language model training in resource-constrained environments from the view of human priors.
Updated: 2024-06-18 11:59:03
Domain: cs.CL,cs.AI
ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset
Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, adding 35,705 new images that have appeared in PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models and for evaluation of deep learning models in multi-task learning.
Updated: 2024-06-18 11:58:39
Domain: eess.IV,cs.CV,cs.LG
Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask
Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations by users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during learning. Second, we propose a Three-level Time-aware Position Embedding (TTPE) to incorporate temporal information into the positional embedding and detect temporal context anomalies. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights to routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomalous behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.
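The sketch below illustrates a loss-guided dynamic mask in the spirit of LDMS: rather than masking behavior-sequence steps uniformly at random, mask positions are sampled in proportion to the autoencoder's current per-step reconstruction loss, so poorly learned (often infrequent) behaviors are masked more often. The mask ratio and the sampling rule are assumptions for illustration.

```python
import torch

def loss_guided_mask(per_step_loss: torch.Tensor, mask_ratio: float = 0.25):
    """per_step_loss: (batch, seq_len) reconstruction loss per time step.

    Returns a boolean mask (True = masked) sampled with probability
    proportional to the current loss at each step.
    """
    b, t = per_step_loss.shape
    k = max(1, int(mask_ratio * t))
    probs = per_step_loss / per_step_loss.sum(dim=1, keepdim=True)
    idx = torch.multinomial(probs, k, replacement=False)
    mask = torch.zeros(b, t, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

per_step_loss = torch.rand(2, 10) ** 2     # stand-in losses for two sequences
print(loss_guided_mask(per_step_loss))     # high-loss steps are masked more often
```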
Updated: 2024-06-18 11:52:26
Domain: cs.CR,cs.AI,cs.NI
Variational optimization of the amplitude of neural-network quantum many-body ground states
Neural-network quantum states (NQSs), variationally optimized by combining traditional methods and deep learning techniques, are a new way to find quantum many-body ground states and are gradually becoming a competitor to traditional variational methods. However, there are still some difficulties in the optimization of NQSs, such as local minima, slow convergence, and sign-structure optimization. Here, we split a quantum many-body variational wave function into the product of a real-valued amplitude neural network and a sign structure, and focus on the optimization of the amplitude network while keeping the sign structure fixed. The amplitude network is a convolutional neural network (CNN) with residual blocks, namely a ResNet. Our method is tested on three typical quantum many-body systems. The obtained ground-state energies are lower than or comparable to those from traditional variational Monte Carlo (VMC) methods and the density matrix renormalization group (DMRG). Surprisingly, for the frustrated Heisenberg $J_1$-$J_2$ model, our results are better than those of the complex-valued CNN in the literature, implying that the sign structure of the complex-valued NQS is difficult to optimize. We will study the optimization of the sign structure of NQSs in the future.
Updated: 2024-06-18 11:50:24
Domain: cond-mat.str-el,cond-mat.dis-nn,cs.LG,quant-ph
Update Selective Parameters: Federated Machine Unlearning Based on Model Explanation
Federated learning is a promising privacy-preserving paradigm for distributed machine learning. In this context, there is sometimes a need for a specialized process called machine unlearning, which is required when the effect of some specific training samples needs to be removed from a learning model due to privacy, security, usability, and/or legislative factors. However, problems arise when current centralized unlearning methods are applied to existing federated learning, in which the server aims to remove all information about a class from the global model. Centralized unlearning usually focuses on simple models or is premised on the ability to access all training data at a central node. However, training data cannot be accessed on the server under the federated learning paradigm, conflicting with the requirements of the centralized unlearning process. Additionally, there are high computation and communication costs associated with accessing clients' data, especially in scenarios involving numerous clients or complex global models. To address these concerns, we propose a more effective and efficient federated unlearning scheme based on the concept of model explanation. Model explanation involves understanding deep networks and individual channel importance, so that this understanding can be used to determine which model channels are critical for classes that need to be unlearned. We select the most influential channels within an already-trained model for the data that need to be unlearned and fine-tune only influential channels to remove the contribution made by those data. In this way, we can simultaneously avoid huge consumption costs and ensure that the unlearned model maintains good performance. Experiments with different training models on various datasets demonstrate the effectiveness of the proposed approach.
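A minimal single-client sketch of the selective-update idea: channels of a convolutional layer are ranked by a simple importance proxy for the class to be forgotten (gradient magnitude here, standing in for the paper's model-explanation scores), and a gradient hook restricts subsequent fine-tuning to the most influential channels. The model, data, and number of selected channels are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x_forget = torch.randn(8, 3, 32, 32)               # samples of the class to unlearn
y_forget = torch.full((8,), 3)                     # class id 3 (toy choice)

# Importance proxy: gradient magnitude on the forget-class data.
loss = nn.functional.cross_entropy(model(x_forget), y_forget)
loss.backward()

conv = model[0]
importance = conv.weight.grad.abs().sum(dim=(1, 2, 3))   # per output channel
top_channels = importance.topk(4).indices                # most influential channels

# Restrict fine-tuning to the influential channels by zeroing other gradients.
keep = torch.zeros(conv.out_channels, dtype=torch.bool)
keep[top_channels] = True
conv.weight.register_hook(lambda g: g * keep.view(-1, 1, 1, 1))
print("channels selected for the unlearning fine-tune:", top_channels.tolist())
```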
Updated: 2024-06-18 11:43:20
Domain: cs.CR,cs.DC,cs.LG
A Survey of Fragile Model Watermarking
Model fragile watermarking, inspired by both the field of adversarial attacks on neural networks and traditional multimedia fragile watermarking, has gradually emerged as a potent tool for detecting tampering, and has witnessed rapid development in recent years. Unlike robust watermarks, which are widely used for identifying model copyrights, fragile watermarks for models are designed to identify whether models have been subjected to unexpected alterations such as backdoors, poisoning, compression, among others. These alterations can pose unknown risks to model users, such as misidentifying stop signs as speed limit signs in classic autonomous driving scenarios. This paper provides an overview of the relevant work in the field of model fragile watermarking since its inception, categorizing them and revealing the developmental trajectory of the field, thus offering a comprehensive survey for future endeavors in model fragile watermarking.
Updated: 2024-06-18 11:42:03
Domain: cs.CR,cs.AI
Advances in 3D Neural Stylization: A Survey
Modern artificial intelligence offers a novel and transformative approach to creating digital art across diverse styles and modalities like images, videos and 3D data, unleashing the power of creativity and revolutionizing the way that we perceive and interact with visual content. This paper reports on recent advances in stylized 3D asset creation and manipulation with the expressive power of neural networks. We establish a taxonomy for neural stylization, considering crucial design choices such as scene representation, guidance data, optimization strategies, and output styles. Building on such taxonomy, our survey first revisits the background of neural stylization on 2D images, and then presents in-depth discussions on recent neural stylization methods for 3D data, accompanied by a mini-benchmark evaluating selected neural field stylization methods. Based on the insights gained from the survey, we highlight the practical significance, open challenges, future research, and potential impacts of neural stylization, which facilitates researchers and practitioners to navigate the rapidly evolving landscape of 3D content creation using modern artificial intelligence.
Updated: 2024-06-18 11:31:56
Domain: cs.CV,cs.AI,cs.GR
Can We Trust Large Language Models Generated Code? A Framework for In-Context Learning, Security Patterns, and Code Evaluations Across Diverse LLMs
Large Language Models (LLMs) such as ChatGPT and GitHub Copilot have revolutionized automated code generation in software engineering. However, as these models are increasingly utilized for software development, concerns have arisen regarding the security and quality of the generated code. These concerns stem from LLMs being primarily trained on publicly available code repositories and internet-based textual data, which may contain insecure code. This presents a significant risk of perpetuating vulnerabilities in the generated code, creating potential attack vectors for exploitation by malicious actors. Our research aims to tackle these issues by introducing a framework for secure behavioral learning of LLMs through In-Context Learning (ICL) patterns during the code generation process, followed by rigorous security evaluations. To achieve this, we selected four diverse LLMs for experimentation. We evaluated these coding LLMs across three programming languages and identified security vulnerabilities and code smells. The code is generated through ICL with curated problem sets and undergoes rigorous security testing to evaluate the overall quality and trustworthiness of the generated code. Our research indicates that ICL-driven one-shot and few-shot learning patterns can enhance code security, reducing vulnerabilities in various programming scenarios. Developers and researchers should be aware that LLMs have a limited understanding of security principles, which may lead to security breaches when the generated code is deployed in production systems. Our research highlights that LLMs are a potential source of new vulnerabilities in the software supply chain, which is important to consider when using LLMs for code generation. This article offers insights into improving LLM security and encourages proactive use of LLMs for code generation to ensure software system safety.
Updated: 2024-06-18 11:29:34
Domain: cs.CR
Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification
Explanation for Multivariate Time Series Classification (MTSC) is an important topic that is under explored. There are very few quantitative evaluation methodologies and even fewer examples of actionable explanation, where the explanation methods are shown to objectively improve specific computational tasks on time series data. In this paper we focus on analyzing InterpretTime, a recent evaluation methodology for attribution methods applied to MTSC. We reproduce the original paper results, showcase some significant weaknesses of the methodology and propose ideas to improve both its accuracy and efficiency. Unlike related work, we go beyond evaluation and also showcase the actionability of the produced explainer ranking, by using the best attribution methods for the task of channel selection in MTSC. We find that perturbation-based methods such as SHAP and Feature Ablation work well across a set of datasets, classifiers and tasks and outperform gradient-based methods. We apply the best ranked explainers to channel selection for MTSC and show significant data size reduction and improved classifier accuracy.
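A minimal sketch of attribution-driven channel selection using feature ablation (one of the methods found to work well): each channel is scored by the accuracy drop caused by zero-filling it, and only the top-ranked channels are kept. The toy classifier and data are illustrative assumptions; any fitted model with a predict method works.

```python
import numpy as np

def channel_importance(model, X, y):
    """X: (n_samples, n_channels, seq_len). Returns per-channel accuracy drop."""
    base = (model.predict(X) == y).mean()
    drops = []
    for c in range(X.shape[1]):
        X_ablate = X.copy()
        X_ablate[:, c, :] = 0.0                    # feature-ablation perturbation
        drops.append(base - (model.predict(X_ablate) == y).mean())
    return np.array(drops)

class MeanThresholdModel:                           # toy stand-in classifier
    def predict(self, X):
        return (X[:, 0, :].mean(axis=1) > 0).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5, 30))
y = (X[:, 0, :].mean(axis=1) > 0).astype(int)       # only channel 0 matters here
imp = channel_importance(MeanThresholdModel(), X, y)
print("keep channels:", np.argsort(imp)[::-1][:2])  # channel 0 should rank first
```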
Updated: 2024-06-18 11:18:46
Domain: cs.LG
Segmentation and Characterization of Macerated Fibers and Vessels Using Deep Learning
Wood comprises different cell types, such as fibers, tracheids, and vessels, which define its properties. Studying the shape, size, and arrangement of cells in microscopy images is crucial for understanding wood characteristics. Typically, this involves macerating (soaking) samples in a solution to separate the cells, then spreading them on slides for imaging with a microscope that covers a wide area, capturing thousands of cells. However, these cells often cluster and overlap in images, making segmentation difficult and time-consuming with standard image-processing methods. In this work, we developed an automatic deep learning segmentation approach that utilizes the one-stage YOLOv8 model for fast and accurate segmentation and characterization of macerated fibers and vessels from aspen trees in microscopy images. The model can analyze 32,640 x 25,920 pixel images and demonstrates effective cell detection and segmentation, achieving a mAP@0.5:0.95 of 78%. To assess the model's robustness, we examined fibers from a genetically modified tree line known for longer fibers. The outcomes were comparable to previous manual measurements. Additionally, we created a user-friendly web application for image analysis and provided the code for use on Google Colab. By leveraging YOLOv8's advances, this work provides a deep learning solution that enables efficient quantification and analysis of wood cells suitable for practical applications.
Updated: 2024-06-18 11:02:49
Domain: cs.CV,cs.LG,I.5.1
Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning
Purpose: Autonomous navigation of catheters and guidewires can enhance endovascular surgery safety and efficacy, reducing procedure times and operator radiation exposure. Integrating tele-operated robotics could widen access to time-sensitive emergency procedures like mechanical thrombectomy (MT). Reinforcement learning (RL) shows potential in endovascular navigation, yet its application encounters challenges without a reward signal. This study explores the viability of autonomous navigation in MT vasculature using inverse RL (IRL) to leverage expert demonstrations. Methods: This study established a simulation-based training and evaluation environment for MT navigation. We used IRL to infer reward functions from expert behaviour when navigating a guidewire and catheter. We utilized soft actor-critic to train models with various reward functions and compared their performance in silico. Results: We demonstrated feasibility of navigation using IRL. When evaluating single versus dual device (i.e. guidewire versus catheter and guidewire) tracking, both methods achieved high success rates of 95% and 96%, respectively. Dual-tracking, however, utilized both devices mimicking an expert. A success rate of 100% and procedure time of 22.6 s were obtained when training with a reward function obtained through reward shaping. This outperformed a dense reward function (96%, 24.9 s) and an IRL-derived reward function (48%, 59.2 s). Conclusions: We have contributed to the advancement of autonomous endovascular intervention navigation, particularly MT, by employing IRL. The results underscore the potential of using reward shaping to train models, offering a promising avenue for enhancing the accessibility and precision of MT. We envisage that future research can extend our methodology to diverse anatomical structures to enhance generalizability.
Updated: 2024-06-18 11:00:55
Domain: cs.LG,cs.AI,cs.RO
Retrieve to Explain: Evidence-driven Predictions with Language Models
Language models hold incredible promise for enabling scientific discovery by synthesizing massive research corpora. Many complex scientific research questions have multiple plausible answers, each supported by evidence of varying strength. However, existing language models lack the capability to quantitatively and faithfully compare answer plausibility in terms of supporting evidence. To address this issue, we introduce Retrieve to Explain (R2E), a retrieval-based language model. R2E scores and ranks all possible answers to a research question based on evidence retrieved from a document corpus. The architecture represents each answer only in terms of its supporting evidence, with the answer itself masked. This allows us to extend feature attribution methods, such as Shapley values, to transparently attribute each answer's score back to its supporting evidence at inference time. The architecture also allows R2E to incorporate new evidence without retraining, including non-textual data modalities templated into natural language. We assess on the challenging task of drug target identification from scientific literature, a human-in-the-loop process where failures are extremely costly and explainability is paramount. When predicting whether drug targets will subsequently be confirmed as efficacious in clinical trials, R2E not only matches non-explainable literature-based models but also surpasses a genetics-based target identification approach used throughout the pharmaceutical industry.
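The sketch below conveys the evidence-only scoring and attribution idea under strong simplifications: answers are scored solely by the similarity of retrieved evidence embeddings to the query, and a leave-one-out attribution (a simplification of the Shapley-value attribution described above) credits each piece of evidence. The embeddings are random stand-ins, not outputs of the actual R2E model.

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=16)
evidence = {                                    # evidence embeddings per answer
    "answer_A": rng.normal(size=(3, 16)),
    "answer_B": rng.normal(size=(5, 16)),
}

def score(query, ev):
    """Answer score from its evidence only (mean cosine similarity)."""
    sims = ev @ query / (np.linalg.norm(ev, axis=1) * np.linalg.norm(query))
    return sims.mean()

for ans, ev in evidence.items():
    s = score(query, ev)
    # Leave-one-out attribution: contribution of each evidence item.
    contrib = [s - score(query, np.delete(ev, i, axis=0)) for i in range(len(ev))]
    print(ans, round(s, 3), [round(c, 3) for c in contrib])
```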
Updated: 2024-06-18 10:42:54
Domain: cs.LG,cs.CL
Connected Speech-Based Cognitive Assessment in Chinese and English
We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment, as well as individuals with normal cognition. These data have been carefully matched by age and sex via propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment that generalize across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.
Updated: 2024-06-18 10:41:48
标题: 基于连续语音的中英文认知评估
摘要: 我们提供了一个新颖的基准数据集和预测任务,用于研究通过分析连续语音来评估认知功能的方法。该数据集包括讲普通话和英语的说话者的语音样本和临床信息,这些说话者具有不同程度的认知障碍,此外还包括认知正常的个体。这些数据经过年龄和性别的倾向性评分分析进行了仔细匹配,以确保在模型训练中的平衡性和代表性。预测任务涵盖轻度认知障碍诊断和认知测试分数预测。该框架旨在促进跨语言通用的基于语音的认知评估方法的发展。我们通过提供采用语言不可知和可比较特征进行诊断和认知测试分数预测的基线预测模型来说明这一点。这些模型在诊断中实现的未加权平均召回率为59.2%,在分数预测中的均方根误差为2.89。
更新时间: 2024-06-18 10:41:48
领域: cs.CL,cs.LG,cs.SD,eess.AS,J.3; I.5.4
Instruction Fine-Tuning: Does Prompt Loss Matter?
We present a novel study analyzing the effects of various prompt loss token weights (PLW) for supervised instruction fine-tuning (SIFT). While prompt-masking (PLW = 0) is common for SIFT, some fine-tuning APIs support fractional PLWs and suggest that using a small non-zero PLW can help stabilize learning when fine-tuning on short-completion data. However, there has never been a study confirming this claim, and OpenAI, a major cloud-based SIFT provider, recently removed this parameter from their fine-tuning API. We found that performance of models fine-tuned on short-completion data had a statistically-significant negative quadratic relationship with PLW. Using small values (0.01 - 0.5) of PLW produced better results on multiple-choice and short-generation benchmarks (outperforming models fine-tuned on long-completion data) while large values (~ 1.0) of PLW produced better results on long-generation benchmarks. We explained this effect and verified its importance through additional experiments. This research serves as a warning to API providers about the importance of providing a PLW parameter for SIFT.
Updated: 2024-06-18 10:37:08
标题: 指导微调:提示损失重要吗?
摘要: 我们提出了一项新颖的研究,分析了各种提示损失标记权重(PLW)对监督指导微调(SIFT)的影响。尽管对于SIFT来说,提示屏蔽(PLW = 0)是常见的,但一些微调API支持分数PLW,并建议在微调短完成数据时使用一个小的非零PLW可以帮助稳定学习。然而,从未有研究证实这一说法,而一家主要的基于云的SIFT提供商OpenAI最近从其微调API中移除了此参数。我们发现,在短完成数据上微调的模型的性能与PLW之间存在统计上显著的负二次关系。使用小值(0.01 - 0.5)的PLW在多项选择和短生成基准上产生了更好的结果(胜过在长完成数据上微调的模型),而大值(~1.0)的PLW在长生成基准上产生了更好的结果。我们解释了这种效应,并通过额外的实验验证了其重要性。这项研究提醒API提供商注意为SIFT提供PLW参数的重要性。
更新时间: 2024-06-18 10:37:08
领域: cs.LG,cs.AI,cs.CL
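As a concrete illustration of fractional prompt-loss weighting, here is a minimal sketch of a SIFT loss in PyTorch; the tensor layout, mask convention, and function name are our assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def sift_loss(logits, labels, prompt_mask, plw=0.1):
    """Causal-LM cross-entropy with prompt tokens down-weighted by a
    fractional prompt-loss weight (PLW); plw=0 recovers prompt masking,
    plw=1 weights prompt and completion tokens equally.

    logits:      (batch, seq, vocab); labels: (batch, seq) token ids
    prompt_mask: (batch, seq) bool, True where the token is prompt
    """
    # Shift so position t predicts token t+1, as in causal LMs.
    logits, targets, mask = logits[:, :-1], labels[:, 1:], prompt_mask[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1),
        reduction="none").view(targets.shape)
    weights = torch.where(mask, torch.full_like(per_token, plw),
                          torch.ones_like(per_token))
    return (weights * per_token).sum() / weights.sum().clamp(min=1e-8)
```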
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stance detection holds great potential for enhancing the quality of online political discussions, as it has been shown to be useful for summarizing discussions, detecting misinformation, and evaluating opinion distributions. Usually, transformer-based models are used directly for stance detection, which require large amounts of data. However, the broad range of debate questions in online political discussion creates a variety of possible scenarios that the model is faced with and thus makes data acquisition for model training difficult. In this work, we show how to leverage LLM-generated synthetic data to train and improve stance detection agents for online political discussions: (i) We generate synthetic data for specific debate questions by prompting a Mistral-7B model and show that fine-tuning with the generated synthetic data can substantially improve the performance of stance detection. (ii) We examine the impact of combining synthetic data with the most informative samples from an unlabelled dataset. First, we use the synthetic data to select the most informative samples, second, we combine both these samples and the synthetic data for fine-tuning. This approach reduces labelling effort and consistently surpasses the performance of the baseline model that is trained with fully labeled data. Overall, we show in comprehensive experiments that LLM-generated data greatly improves stance detection performance for online political discussions.
Updated: 2024-06-18 10:36:21
标题: LLM生成的合成数据在在线政治讨论中的立场检测中的力量
摘要: 立场检测对于提升在线政治讨论的质量具有巨大潜力,因为它已经被证明对于总结讨论、检测错误信息和评估意见分布非常有用。通常,基于transformer的模型直接用于立场检测,这需要大量的数据。然而,在在线政治讨论中广泛的辩论问题创造了模型面临的各种可能情景,从而使得模型训练的数据获取变得困难。在这项工作中,我们展示了如何利用LLM生成的合成数据来训练和改进在线政治讨论的立场检测代理:(i)我们通过促使Mistral-7B模型生成特定辩论问题的合成数据,并展示了使用生成的合成数据进行微调可以显著提高立场检测的性能。(ii)我们研究了将合成数据与未标记数据集中最具信息量的样本相结合的影响。首先,我们使用合成数据选择最具信息量的样本,其次,我们将这些样本和合成数据结合进行微调。这种方法减少了标注工作量,并始终超过了使用完全标记数据训练的基准模型的性能。总的来说,我们在全面的实验中展示了LLM生成的数据极大地提升了在线政治讨论的立场检测性能。
更新时间: 2024-06-18 10:36:21
领域: cs.CL,cs.LG
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
The remote sensing image (RSI) intelligence understanding model is undergoing a profound paradigm shift promoted by multi-modal large language models (MLLM): from the paradigm of learning a domain model (LaDM) to the paradigm of learning a pre-trained general foundation model followed by an adaptive domain model (LaGD). Under the new LaGD paradigm, the old datasets, which have led to advances in RSI intelligence understanding in the last decade, are no longer suitable for brand-new tasks. We argue that a new dataset must be designed to support these new tasks, with the following features: 1) Generalization: training the model to learn shared knowledge among tasks and to adapt to different tasks; 2) Understanding complex scenes: training the model to understand the fine-grained attributes of the objects of interest, and to be able to describe the scene with natural language; 3) Reasoning: training the model to be able to perform high-level visual reasoning. In this paper, we designed a high-quality, diversified, and unified multimodal instruction-following dataset for RSI understanding produced by GPT-4V and existing datasets, which we called RS-GPT4V. To achieve generalization, we used (Question, Answer) pairs deduced from GPT-4V via instruction-following to unify tasks such as captioning and localization; to achieve complex scene understanding, we proposed a hierarchical instruction description with a local strategy, in which the fine-grained attributes of the objects and their spatial relationships are described, and a global strategy, in which all the local information is integrated to yield detailed instruction descriptions; to achieve reasoning, we designed multi-turn QA pairs to provide the model with reasoning ability. The empirical results show that MLLMs fine-tuned on RS-GPT4V can describe fine-grained information. The dataset is available at: https://github.com/GeoX-Lab/RS-GPT4V.
Updated: 2024-06-18 10:34:28
标题: RS-GPT4V:用于遥感图像理解的统一多模态指令跟随数据集
摘要: 遥感图像智能理解模型正在经历一个由多模态大语言模型(MLLM)推动的新的深刻范式转变,即从学习领域模型(LaDM)的范式转变为学习预训练的通用基础模型,然后是自适应领域模型(LaGD)的范式。在新的LaGD范式下,过去十年中导致RSI智能理解取得进展的旧数据集不再适用于全新的任务。我们认为必须设计一个新数据集以轻松处理具有以下特征的任务:1)泛化:训练模型学习任务之间的共享知识并适应不同任务;2)理解复杂场景:训练模型理解感兴趣对象的细粒度属性,并能够用自然语言描述场景;3)推理:训练模型能够实现高级视觉推理。在本文中,我们设计了一个由GPT-4V和现有数据集生成的高质量、多样化和统一的遥感图像理解多模态指令遵循数据集,我们称之为RS-GPT4V。为了实现泛化,我们使用了从GPT-4V通过指令遵循推导出的(问题,答案),统一了诸如字幕和定位等任务;为了实现复杂场景,我们提出了一个包含局部策略的分层指令描述,描述了感兴趣对象的细粒度属性和它们的空间关系,并包含了全局策略,其中所有局部信息被整合以产生详细的指令描述;为了实现推理,我们设计了多轮问答对,为模型提供推理能力。实证结果表明,通过RS-GPT4V微调的MLLMs能够描述细粒度信息。该数据集可在以下网址获取:https://github.com/GeoX-Lab/RS-GPT4V。
更新时间: 2024-06-18 10:34:28
领域: cs.CV,cs.AI
Accelerating Depthwise Separable Convolutions on Ultra-Low-Power Devices
Depthwise separable convolutions are a fundamental component in efficient Deep Neural Networks, as they reduce the number of parameters and operations compared to traditional convolutions while maintaining comparable accuracy. However, their low data reuse opportunities make deploying them notoriously difficult. In this work, we perform an extensive exploration of alternatives to fuse the depthwise and pointwise kernels that constitute the separable convolutional block. Our approach aims to minimize time-consuming memory transfers by combining different data layouts. When targeting a commercial ultra-low-power device with a three-level memory hierarchy, the GreenWaves GAP8 SoC, we reduce the latency of end-to-end network execution by up to 11.40%. Furthermore, our kernels reduce activation data movements between L2 and L1 memories by up to 52.97%.
Updated: 2024-06-18 10:32:40
标题: 在超低功耗设备上加速深度可分离卷积
摘要: 深度可分离卷积是高效深度神经网络中的基本组件,与传统卷积相比,它们减少了参数和操作的数量,同时保持了可比较的准确性。然而,它们的数据重用机会较低,使得部署它们变得异常困难。在这项工作中,我们对融合构成可分离卷积块的深度和逐点核的替代方案进行了广泛探讨。我们的方法旨在通过组合不同的数据布局来最小化耗时的内存传输。当针对具有三级内存层次结构的商用超低功耗设备GreenWaves GAP8 SoC时,我们将端到端网络执行的延迟减少了多达11.40%。此外,我们的内核将L2和L1内存之间的激活数据移动减少了高达52.97%。
更新时间: 2024-06-18 10:32:40
领域: cs.LG,cs.DC
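For reference, the two kernels being fused are just the standard depthwise/pointwise pair; a minimal PyTorch version is below (the paper's contribution is the fused, layout-aware kernels for the GAP8, not this module):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 followed by pointwise 1x1 convolution. Fused
    implementations combine the two to avoid spilling the intermediate
    activation to slower memory levels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes the 3x3 convolution depthwise (one filter per channel).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```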
Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond
Thin-plate spline (TPS) is a principal warp that allows for representing elastic, nonlinear transformation with control point motions. With the increase of control points, the warp becomes increasingly flexible but usually encounters a bottleneck caused by undesired issues, e.g., content distortion. In this paper, we explore generic applications of TPS in single-image-based warping tasks, such as rotation correction, rectangling, and portrait correction. To break this bottleneck, we propose the coupled thin-plate spline model (CoupledTPS), which iteratively couples multiple TPS with limited control points into a more flexible and powerful transformation. Concretely, we first design an iterative search to predict new control points according to the current latent condition. Then, we present the warping flow as a bridge for the coupling of different TPS transformations, effectively eliminating interpolation errors caused by multiple warps. Besides, in light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data. It is formulated through dual transformation between the searched control points of unlabeled data and its graphic augmentation, yielding an implicit correction consistency constraint. Finally, we collect massive unlabeled data to exhibit the benefit of our semi-supervised scheme in rotation correction. Extensive experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art (SoTA) solutions for rotation correction and beyond. The code and data are available at https://github.com/nie-lang/CoupledTPS.
Updated: 2024-06-18 10:29:39
标题: 半监督耦合薄板样条旋转校正模型及其拓展
摘要: 薄板样条(TPS)是一种主要变形,允许用控制点运动表示弹性、非线性变换。随着控制点数量的增加,变形变得越来越灵活,但通常会遇到由不良问题引起的瓶颈,例如内容失真。在本文中,我们探讨了TPS在基于单幅图像的变形任务中的通用应用,例如旋转校正、矩形化和肖像校正。为了突破这一瓶颈,我们提出了耦合薄板样条模型(CoupledTPS),它将多个具有有限控制点的TPS迭代耦合成更灵活、更强大的转换。具体而言,我们首先设计了一个迭代搜索,根据当前的潜在条件来预测新的控制点。然后,我们将变形流作为不同TPS变换耦合的桥梁,有效消除由多次变形引起的插值错误。此外,考虑到繁重的注释成本,我们开发了一种半监督学习方案,通过利用未标记数据来提高变形质量。它通过未标记数据的搜索控制点和图形增强之间的双重转换来建立,产生隐式校正一致性约束。最后,我们收集了大量未标记数据,展示了我们半监督方案在旋转校正中的益处。大量实验表明,CoupledTPS相对于现有的旋转校正及其他方面的最新解决方案具有优越性和普适性。代码和数据可在https://github.com/nie-lang/CoupledTPS获得。
更新时间: 2024-06-18 10:29:39
领域: cs.CV,cs.AI
Adversarial Multi-dueling Bandits
We introduce the problem of regret minimization in adversarial multi-dueling bandits. While adversarial preferences have been studied in dueling bandits, they have not been explored in multi-dueling bandits. In this setting, the learner is required to select $m \geq 2$ arms at each round and observes as feedback the identity of the most preferred arm which is based on an arbitrary preference matrix chosen obliviously. We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback that is assumed to be generated from a pairwise-subset choice model. We prove that the expected cumulative $T$-round regret of MiDEX compared to a Borda-winner from a set of $K$ arms is upper bounded by $O((K \log K)^{1/3} T^{2/3})$. Moreover, we prove a lower bound of $\Omega(K^{1/3} T^{2/3})$ for the expected regret in this setting which demonstrates that our proposed algorithm is near-optimal.
Updated: 2024-06-18 10:28:12
标题: 对抗性多对决赌博算法
摘要: 我们介绍了在对抗性多路对战赌博中最小化遗憾的问题。虽然对抗性偏好已经在对战赌博中进行了研究,但在多路对战赌博中尚未探讨。在这种情况下,学习者需要在每一轮选择$m \geq 2$个臂,并观察基于任意偏好矩阵选择的最受偏好臂的身份作为反馈。我们引入了一种新颖的算法MiDEX (Multi Dueling EXP3),用于从假定由成对子集选择模型生成的偏好反馈中学习。我们证明了MiDEX相对于一组$K$个臂中的Borda获胜者的期望累积$T$轮遗憾上界为$O((K \log K)^{1/3} T^{2/3})$。此外,我们证明了在这种情况下期望遗憾的下界为$\Omega(K^{1/3} T^{2/3})$,这证明了我们提出的算法是接近最优的。
更新时间: 2024-06-18 10:28:12
领域: cs.LG
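For intuition, here is a rough EXP3-flavored sketch of learning from multi-dueling feedback; the update rule and tuning below are illustrative only and are not the paper's exact MiDEX algorithm or its regret-optimal parameters:

```python
import numpy as np

def exp3_multi_duel(K, m, T, duel, eta=0.01, gamma=0.1, seed=0):
    """Sample m distinct arms from an exponential-weights distribution
    mixed with uniform exploration, observe which of them is preferred,
    and boost the winner's log-weight with an importance correction."""
    rng = np.random.default_rng(seed)
    logw = np.zeros(K)
    for _ in range(T):
        q = np.exp(logw - logw.max()); q /= q.sum()
        p = (1 - gamma) * q + gamma / K      # uniform exploration, EXP3-style
        arms = rng.choice(K, size=m, replace=False, p=p)
        winner = arms[duel(arms)]            # duel returns an index into arms
        logw[winner] += eta / p[winner]      # importance-weighted win
    q = np.exp(logw - logw.max())
    return q / q.sum()

# Toy preference feedback: the arm with the largest hidden utility wins.
util = np.array([0.1, 0.9, 0.5, 0.3])
print(exp3_multi_duel(K=4, m=2, T=5000, duel=lambda arms: int(np.argmax(util[arms]))))
```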
Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities
The parametric knowledge memorized by large language models (LLMs) becomes outdated quickly. In-context editing (ICE) is currently the most effective method for updating the knowledge of LLMs. Recent advancements involve enhancing ICE by modifying the decoding strategy, obviating the need for altering internal model structures or adjusting external prompts. However, this enhancement operates across the entire sequence generation, encompassing a plethora of non-critical tokens. In this work, we introduce $\textbf{A}$daptive $\textbf{T}$oken $\textbf{Bias}$er ($\textbf{ATBias}$), a new decoding technique designed to enhance ICE. It focuses on the tokens that are mostly related to knowledge during decoding, biasing their logits by matching key entities related to new and parametric knowledge. Experimental results show that ATBias significantly enhances ICE performance, achieving up to a 32.3% improvement over state-of-the-art ICE methods while incurring only half the latency. ATBias not only improves the knowledge editing capabilities of ICE but can also be widely applied to LLMs with negligible cost.
Updated: 2024-06-18 10:18:06
标题: 自适应令牌偏置器:通过偏置关键实体进行知识编辑
摘要: 大型语言模型(LLMs)记忆的参数化知识很快就会过时。上下文内编辑(ICE)是目前更新LLMs知识最有效的方法。最近的进展涉及通过修改解码策略增强ICE,从而避免了改变内部模型结构或调整外部提示的需要。然而,这种增强操作涵盖了整个序列生成过程,包括大量非关键的标记。在这项工作中,我们介绍了一种新的解码技术$\textbf{A}$daptive $\textbf{T}$oken $\textbf{Bias}$er($\textbf{ATBias}$),旨在增强ICE。它专注于在解码过程中与知识相关性最高的标记,通过匹配与新的和参数化知识相关的关键实体来偏置它们的logits。实验结果显示,ATBias显著增强了ICE的性能,相比最先进的ICE方法,性能提升高达32.3%,而延迟仅为一半。ATBias不仅改善了ICE的知识编辑能力,而且可以广泛应用于LLMs,并且成本微不足道。
更新时间: 2024-06-18 10:18:06
领域: cs.CL,cs.AI
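The core decoding operation, adding a positive bias to the logits of tokens tied to key entities, is easy to sketch; the function below is a minimal illustration of logit biasing, not ATBias itself, which also matches entities and restricts biasing to knowledge-related decoding steps:

```python
import torch

def bias_entity_logits(logits, entity_token_ids, bias=4.0):
    """Up-weight next-token logits for token ids belonging to key
    entities before sampling.

    logits:           (vocab,) next-token logits
    entity_token_ids: iterable of token ids to bias upward
    """
    biased = logits.clone()
    idx = torch.tensor(sorted(entity_token_ids), dtype=torch.long)
    biased[idx] += bias
    return biased

logits = torch.randn(50257)
print(bias_entity_logits(logits, {42, 1337})[42] - logits[42])  # tensor(4.)
```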
RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes
In the realm of education, both independent learning and group learning are esteemed as the most classic paradigms. The former allows learners to self-direct their studies, while the latter is typically characterized by teacher-directed scenarios. Recent studies in the field of intelligent education have leveraged deep temporal models to trace the learning process, capturing the dynamics of students' knowledge states, and have achieved remarkable performance. However, existing approaches have primarily focused on modeling the independent learning process, with the group learning paradigm receiving less attention. Moreover, the reciprocal effect between the two learning processes, especially their combined potential to foster holistic student development, remains inadequately explored. To this end, in this paper, we propose RIGL, a unified Reciprocal model to trace knowledge states at both the individual and group levels, drawing from the Independent and Group Learning processes. Specifically, we first introduce a time frame-aware reciprocal embedding module to concurrently model both student and group response interactions across various time frames. Subsequently, we employ reciprocal enhanced learning modeling to fully exploit the comprehensive and complementary information between the two behaviors. Furthermore, we design a relation-guided temporal attentive network, comprised of dynamic graph modeling coupled with a temporal self-attention mechanism. It is used to delve into the dynamic influence of individual and group interactions throughout the learning processes. Conclusively, we introduce a bias-aware contrastive learning module to bolster the stability of the model's training. Extensive experiments on four real-world educational datasets clearly demonstrate the effectiveness of the proposed RIGL model.
Updated: 2024-06-18 10:16:18
标题: RIGL: 一种用于追踪独立和群体学习过程的统一互惠方法
摘要: 在教育领域中,独立学习和小组学习被视为最经典的范式。前者允许学习者自主指导他们的学习,而后者通常以教师指导的情景为特征。智能教育领域的最新研究利用深度时间模型来追踪学习过程,捕捉学生知识状态的动态变化,并取得了显著的表现。然而,现有方法主要集中在建模独立学习过程上,对小组学习范式的关注较少。此外,两种学习过程之间的相互作用,尤其是它们共同促进整体学生发展的潜力,仍未得到充分探讨。因此,在本文中,我们提出了RIGL,一个统一的相互模型,以追踪个体和小组两个层面的知识状态,借鉴独立学习和小组学习过程。具体而言,我们首先引入一个时间框架感知的相互嵌入模块,同时建模不同时间框架下的学生和小组响应交互。随后,我们采用相互增强学习建模,充分利用两种行为之间的全面和互补信息。此外,我们设计了一个关系引导的时间注意力网络,由动态图建模和时间自注意机制组成。它用于深入探讨学习过程中个人和小组互动的动态影响。最后,我们引入了一个偏差感知的对比学习模块,以增强模型训练的稳定性。对四个真实教育数据集的广泛实验清楚地证明了所提出的RIGL模型的有效性。
更新时间: 2024-06-18 10:16:18
领域: cs.CY,cs.AI,cs.IR
Privacy in Speech Technology
Speech technology for communication, accessing information and services has rapidly improved in quality. It is convenient and appealing because speech is the primary mode of communication for humans. Such technology however also presents proven threats to privacy. Speech is a tool for communication and it will thus inherently contain private information. Importantly, it however also contains a wealth of side information, such as information related to health, emotions, affiliations, and relationships, all of which are private. Exposing such private information can lead to serious threats such as price gouging, harassment, extortion, and stalking. This paper is a tutorial on privacy issues related to speech technology, modeling their threats, approaches for protecting users' privacy, measuring the performance of privacy-protecting methods, perception of privacy as well as societal and legal consequences. In addition to a tutorial overview, it also presents lines for further development where improvements are most urgently needed.
Updated: 2024-06-18 10:00:26
标题: 语音技术中的隐私保护
摘要: 语音技术在沟通、获取信息和服务方面的质量迅速提高。这种技术方便且吸引人,因为语音是人类的主要交流方式。然而,这种技术也带来了已被证实的隐私威胁。语音是一种交流工具,因此本质上包含私人信息。重要的是,它还包含大量的附加信息,如与健康、情绪、关联和关系相关的信息,所有这些都是私人的。暴露这些私人信息可能导致严重威胁,如哄抬价格、骚扰、勒索和跟踪。本文是关于语音技术相关隐私问题的教程,内容涵盖威胁建模、保护用户隐私的方法、隐私保护方法的性能度量、隐私感知以及社会和法律后果。除了教程概述外,本文还指出了最迫切需要改进之处,为进一步发展指明了方向。
更新时间: 2024-06-18 10:00:26
领域: eess.AS,cs.CR,cs.SD
A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints
The vehicle routing problem with two-dimensional loading constraints (2L-CVRP) and the last-in-first-out (LIFO) rule presents significant practical and algorithmic challenges. While numerous heuristic approaches have been proposed to address its complexity, stemming from two NP-hard problems: the vehicle routing problem (VRP) and the two-dimensional bin packing problem (2D-BPP), less attention has been paid to developing exact algorithms. Bridging this gap, this article presents an exact algorithm that integrates advanced machine learning techniques, specifically a novel combination of attention and recurrence mechanisms. This integration accelerates the state-of-the-art exact algorithm by a median of 29.79% across various problem instances. Moreover, the proposed algorithm successfully resolves an open instance in the standard test-bed, demonstrating significant improvements brought about by the incorporation of machine learning models. Code is available at https://github.com/xyfffff/NCG-for-2L-CVRP.
Updated: 2024-06-18 09:58:29
标题: 一种神经列生成方法用于具有二维装载和先进后出约束的车辆路径问题
摘要: 具有二维装载约束的车辆路径问题(2L-CVRP)和后进先出(LIFO)规则在实践和算法上都存在显著挑战。其复杂性源自两个NP难问题:车辆路径问题(VRP)和二维装箱问题(2D-BPP)。虽然已提出了许多启发式方法来应对这一复杂性,但对精确算法的开发关注较少。为弥合这一差距,本文提出了一种精确算法,该算法整合了先进的机器学习技术,特别是一种新颖的注意力和循环机制的组合。这种整合在各种问题实例上将最先进的精确算法加速了29.79%(中位数)。此外,所提出的算法成功解决了标准测试平台上的一个开放实例,展示了融入机器学习模型所带来的重大改进。代码可在https://github.com/xyfffff/NCG-for-2L-CVRP找到。
更新时间: 2024-06-18 09:58:29
领域: cs.AI
Fixed Design Analysis of Regularization-Based Continual Learning
We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an $\ell_2$-regularization penalizing its deviation from the first parameter, and outputs the second parameter. For this algorithm, we provide tight bounds on the average risk over the two tasks. Our risk bounds reveal a provable trade-off between forgetting and intransigence of the $\ell_2$-regularized CL algorithm: with a large regularization parameter, the algorithm output forgets less information about the first task but is intransigent to extract new information from the second task; and vice versa. Our results suggest that catastrophic forgetting could happen for CL with dissimilar tasks (under a precise similarity measurement) and that a well-tuned $\ell_2$-regularization can partially mitigate this issue by introducing intransigence.
Updated: 2024-06-18 09:57:26
标题: 基于正则化的连续学习固定设计分析
摘要: 我们考虑一个在固定设计环境中具有两个线性回归任务的持续学习(CL)问题,其中特征向量被假定为固定的,标签被假定为随机变量。我们考虑一个$\ell_2$正则化的CL算法,该算法计算一个普通最小二乘参数来拟合第一个数据集,然后在惩罚其与第一个参数偏差的$\ell_2$正则化下计算拟合第二个数据集的另一个参数,并输出该第二个参数。对于这个算法,我们提供了关于两个任务的平均风险的严格界限。我们的风险界限揭示了$\ell_2$正则化的CL算法在遗忘和不易让步之间的可证明权衡:当正则化参数较大时,算法输出对第一个任务的信息遗忘较少,但难以从第二个任务中提取新信息;反之亦然。我们的结果表明,对于任务(在精确的相似性度量下)不相似的CL,灾难性遗忘可能会发生,而经过良好调整的$\ell_2$正则化可以通过引入不易让步来部分缓解这个问题。
更新时间: 2024-06-18 09:57:26
领域: cs.LG
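In symbols, the two-stage procedure described in the abstract can be written as follows; the notation is ours, reconstructed from the abstract's description rather than quoted from the paper:

```latex
\hat{w}_1 = \arg\min_{w} \lVert X_1 w - y_1 \rVert_2^2, \qquad
\hat{w}_2 = \arg\min_{w} \lVert X_2 w - y_2 \rVert_2^2 + \lambda \lVert w - \hat{w}_1 \rVert_2^2 .
```

The algorithm outputs $\hat{w}_2$: a large $\lambda$ keeps $\hat{w}_2$ close to $\hat{w}_1$ (less forgetting, more intransigence), while a small $\lambda$ lets the second task dominate (more forgetting, less intransigence).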
Insect Identification in the Wild: The AMI Dataset
Insects represent half of all global biodiversity, yet many of the world's insects are disappearing, with severe implications for ecosystems and agriculture. Despite this crisis, data on insect diversity and abundance remain woefully inadequate, due to the scarcity of human experts and the lack of scalable tools for monitoring. Ecologists have started to adopt camera traps to record and study insects, and have proposed computer vision algorithms as an answer for scalable data processing. However, insect monitoring in the wild poses unique challenges that have not yet been addressed within computer vision, including the combination of long-tailed data, extremely similar classes, and significant distribution shifts. We provide the first large-scale machine learning benchmarks for fine-grained insect recognition, designed to match real-world tasks faced by ecologists. Our contributions include a curated dataset of images from citizen science platforms and museums, and an expert-annotated dataset drawn from automated camera traps across multiple continents, designed to test out-of-distribution generalization under field conditions. We train and evaluate a variety of baseline algorithms and introduce a combination of data augmentation techniques that enhance generalization across geographies and hardware setups. Code and datasets are made publicly available.
Updated: 2024-06-18 09:57:02
标题: 在野外的昆虫识别:AMI数据集
摘要: 昆虫代表了全球生物多样性的一半,然而许多世界上的昆虫正在消失,这对生态系统和农业产生了严重影响。尽管存在这一危机,由于人类专家稀缺和缺乏可扩展的监测工具,关于昆虫多样性和丰富度的数据仍然非常不足。生态学家已经开始采用摄像机陷阱来记录和研究昆虫,并提出计算机视觉算法作为可扩展数据处理的解决方案。然而,在野外监测昆虫面临着尚未在计算机视觉中解决的独特挑战,包括长尾数据的组合、极其相似的类别以及显著的分布变化。我们提供了第一个针对细粒度昆虫识别的大规模机器学习基准,旨在匹配生态学家面临的真实任务。我们的贡献包括来自公民科学平台和博物馆的图片数据集,以及来自多个大陆的自动摄像机陷阱的专家注释数据集,旨在在野外条件下测试超出分布的泛化能力。我们训练和评估了各种基线算法,并引入了一系列数据增强技术,以增强跨地理位置和硬件设置的泛化能力。代码和数据集已公开发布。
更新时间: 2024-06-18 09:57:02
领域: cs.CV,cs.AI,cs.LG
Retrieval-Augmented Generation for Generative Artificial Intelligence in Medicine
Generative artificial intelligence (AI) has brought revolutionary innovations in various fields, including medicine. However, it also exhibits limitations. In response, retrieval-augmented generation (RAG) provides a potential solution, enabling models to generate more accurate contents by leveraging the retrieval of external knowledge. With the rapid advancement of generative AI, RAG can pave the way for connecting this transformative technology with medical applications and is expected to bring innovations in equity, reliability, and personalization to health care.
Updated: 2024-06-18 09:53:37
标题: 检索增强生成:医学中的生成式人工智能
摘要: 生成人工智能(AI)在包括医学在内的各个领域带来了革命性的创新。然而,它也存在一些局限性。为了应对这些局限性,检索增强生成(RAG)提供了一个潜在的解决方案,通过利用外部知识的检索,使模型能够生成更准确的内容。随着生成AI的快速发展,RAG可以为连接这一变革性技术与医疗应用铺平道路,并有望为医疗保健带来公平性、可靠性和个性化创新。
更新时间: 2024-06-18 09:53:37
领域: cs.AI
Data Set Terminology of Deep Learning in Medicine: A Historical Review and Recommendation
Medicine and deep learning-based artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. With such history comes a set of terminology that has a specific way in which it is applied. However, when two distinct fields with overlapping terminology start to collaborate, miscommunication and misunderstandings can occur. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. Then the data sets used for AI evaluation are classified, namely random splitting, cross-validation, temporal, geographic, internal, and external sets. The accurate and standardized description of these data sets is crucial for demonstrating the robustness and generalizability of AI applications in medicine. This review clarifies existing literature to provide a comprehensive understanding of these classifications and their implications in AI evaluation. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion. Among these solutions are the use of standardized terminology such as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit definition of data set splitting terminologies in each medical AI research publication. This review aspires to enhance the precision of communication in medical AI, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
Updated: 2024-06-18 09:49:49
标题: 医学深度学习数据集术语:历史回顾和建议
摘要: 医学和基于深度学习的人工智能工程代表了两个具有数十年历史的不同领域。随着这样的历史,出现了一系列具有特定应用方式的术语。然而,当具有重叠术语的两个不同领域开始合作时,可能会产生沟通不畅和误解。本叙事评论旨在为这些术语提供历史背景,强调在医学人工智能语境中使用这些术语时保持清晰的重要性,并提供解决方案,以减少来自任一领域的读者的误解。通过对历史文件的检查,包括文章、写作指南和教科书,本评论追溯了数据集术语及其影响的分歧演变。首先,探讨了医学和人工智能环境中对“验证”一词的不一致解释。然后对用于人工智能评估的数据集进行了分类,即随机分割、交叉验证、时间、地理、内部和外部集。准确和标准化地描述这些数据集对于证明医学中人工智能应用的健壮性和普适性至关重要。本评论澄清了现有文献,以提供对这些分类及其在人工智能评估中的含义的全面理解。本评论随后确定了经常被误解的术语,并提出了缓解术语混淆的务实解决方案。这些解决方案包括使用标准术语,如“训练集”、“验证(或调整)集”和“测试集”,以及在每篇医学人工智能研究出版物中明确定义数据集分割术语。本评论旨在提高医学人工智能领域中的沟通精度,从而促进该跨学科领域中更有效和透明的研究方法。
更新时间: 2024-06-18 09:49:49
领域: cs.AI,cs.CV
Abstraction-of-Thought Makes Language Models Better Reasoners
Abstract reasoning, the ability to reason from the abstract essence of a problem, serves as a key to generalization in human reasoning. However, eliciting language models to perform reasoning with abstraction remains unexplored. This paper seeks to bridge this gap by introducing a novel structured reasoning format called Abstraction-of-Thought (AoT). The uniqueness of AoT lies in its explicit requirement for varying levels of abstraction within the reasoning process. This approach could elicit language models to first contemplate on the abstract level before incorporating concrete details, which is overlooked by the prevailing step-by-step Chain-of-Thought (CoT) method. To align models with the AoT format, we present AoT Collection, a generic finetuning dataset consisting of 348k high-quality samples with AoT reasoning processes, collected via an automated and scalable pipeline. We finetune a wide range of language models with AoT Collection and conduct extensive evaluations on 23 unseen tasks from the challenging benchmark Big-Bench Hard. Experimental results indicate that models aligned to AoT reasoning format substantially outperform those aligned to CoT in many reasoning tasks.
Updated: 2024-06-18 09:46:44
标题: 思维的抽象使语言模型更好地推理
摘要: 抽象推理是从问题的抽象本质进行推理的能力,在人类推理中起着泛化的关键作用。然而,激发语言模型进行抽象推理仍未被探索。本文旨在通过引入一种新颖的结构化推理格式——思维抽象(AoT),来弥补这一空白。AoT的独特之处在于其在推理过程中明确要求不同层次的抽象。这种方法可以促使语言模型在融入具体细节之前首先考虑抽象层面,这一点被普遍的逐步思维链(CoT)方法所忽视。为了使模型符合AoT格式,我们提出了AoT Collection,一个包含348k个高质量样本及AoT推理过程的通用微调数据集,通过自动化和可扩展的流水线收集而来。我们使用AoT Collection对各种语言模型进行微调,并在具有挑战性的基准测试Big-Bench Hard的23个未见任务上进行广泛评估。实验结果表明,符合AoT推理格式的模型在许多推理任务中明显优于符合CoT的模型。
更新时间: 2024-06-18 09:46:44
领域: cs.CL,cs.AI
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware Approach
Over the past years, Machine Learning-as-a-Service (MLaaS) has received a surging demand for supporting Machine Learning-driven services to offer revolutionized user experience across diverse application areas. MLaaS provides inference service with low inference latency based on an ML model trained using a dataset collected from numerous individual data owners. Recently, for the sake of data owners' privacy and to comply with the "right to be forgotten (RTBF)" as enacted by data protection legislation, many machine unlearning methods have been proposed to remove data owners' data from trained models upon their unlearning requests. However, despite their promising efficiency, almost all existing machine unlearning methods handle unlearning requests independently from inference requests, which unfortunately introduces a new security issue of inference service obsolescence and a privacy vulnerability of undesirable exposure for machine unlearning in MLaaS. In this paper, we propose the ERASER framework for machinE unleaRning in MLaAS via an inferencE seRving-aware approach. ERASER strategically choose appropriate unlearning execution timing to address the inference service obsolescence issue. A novel inference consistency certification mechanism is proposed to avoid the violation of RTBF principle caused by postponed unlearning executions, thereby mitigating the undesirable exposure vulnerability. ERASER offers three groups of design choices to allow for tailor-made variants that best suit the specific environments and preferences of various MLaaS systems. Extensive empirical evaluations across various settings confirm ERASER's effectiveness, e.g., it can effectively save up to 99% of inference latency and 31% of computation overhead over the inference-oblivion baseline.
Updated: 2024-06-18 09:46:06
标题: ERASER:通过推理服务感知方法在MLaaS中实现机器学习消除
摘要: 在过去几年中,机器学习即服务(MLaaS)已经受到了支持基于机器学习的服务以提供革命性用户体验的激增需求,覆盖了多个应用领域。MLaaS提供基于使用从众多个人数据所有者收集的数据集训练的ML模型的推理服务,具有低推理延迟。最近,为了保护数据所有者的隐私并遵守数据保护法规中规定的“被遗忘的权利(RTBF)”,许多机器解除学习方法已被提出,以在数据所有者要求解除学习时从经过训练的模型中删除其数据。然而,尽管它们具有很高的效率,几乎所有现有的机器解除学习方法都独立于推理请求处理解除学习请求,这不幸地引入了推理服务过时这一新的安全问题,以及MLaaS中机器解除学习遭受不良暴露的隐私漏洞。 在本文中,我们提出了ERASER框架,通过一种推理服务感知方法,用于MLaaS中的机器解除学习。ERASER策略性地选择适当的解除学习执行时间来解决推理服务过时问题。提出了一种新颖的推理一致性认证机制,以避免由于延迟解除学习执行而引起的违反RTBF原则,从而减轻不良暴露的漏洞。ERASER提供三组设计选择,允许定制出最适合各种MLaaS系统具体环境和偏好的变体。在各种设置下进行的广泛实证评估证实了ERASER的有效性,例如,相比对推理无感知的基线,它可以有效节省高达99%的推理延迟和31%的计算开销。
更新时间: 2024-06-18 09:46:06
领域: cs.CR,cs.AI
Cycle-Correspondence Loss: Learning Dense View-Invariant Visual Features from Unlabeled and Unordered RGB Images
Robot manipulation relying on learned object-centric descriptors became popular in recent years. Visual descriptors can easily describe manipulation task objectives, they can be learned efficiently using self-supervision, and they can encode actuated and even non-rigid objects. However, learning robust, view-invariant keypoints in a self-supervised approach requires a meticulous data collection approach involving precise calibration and expert supervision. In this paper we introduce Cycle-Correspondence Loss (CCL) for view-invariant dense descriptor learning, which adopts the concept of cycle-consistency, enabling a simple data collection pipeline and training on unpaired RGB camera views. The key idea is to autonomously detect valid pixel correspondences by attempting to use a prediction over a new image to predict the original pixel in the original image, while scaling error terms based on the estimated confidence. Our evaluation shows that we outperform other self-supervised RGB-only methods, and approach performance of supervised methods, both with respect to keypoint tracking as well as for a robot grasping downstream task.
Updated: 2024-06-18 09:44:56
标题: Cycle-Correspondence Loss: 从未标记和无序的RGB图像中学习稠密的视图不变特征
摘要: 在最近几年中,依赖于学习的物体中心描述符的机器人操作变得流行起来。视觉描述符可以轻松描述操作任务目标,它们可以通过自我监督有效地学习,并且可以编码可驱动的甚至非刚性的物体。然而,在自我监督方法中学习稳健的、视角不变的关键点需要一种谨慎的数据收集方法,涉及精确的校准和专家监督。在本文中,我们介绍了用于视角不变密集描述符学习的Cycle-Correspondence Loss(CCL),它采用了循环一致性的概念,实现了简单的数据收集流程,并在未配对的RGB摄像头视图上进行训练。关键思想是尝试利用在新图像上的预测来预测原始图像中的原始像素,从而自主检测有效的像素对应关系,同时根据估计的置信度缩放误差项。我们的评估显示,我们优于其他仅使用RGB的自监督方法,并且在关键点跟踪以及机器人抓取下游任务方面接近监督方法的表现。
更新时间: 2024-06-18 09:44:56
领域: cs.CV,cs.AI,cs.RO
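A minimal PyTorch sketch of the cycle idea, matching a pixel from view A into view B, matching back, and penalizing the confidence-weighted return error, is given below. The soft-argmax formulation, temperature, and tensor shapes are our assumptions; the paper's CCL may differ in detail:

```python
import torch

def cycle_correspondence_loss(desc_a, desc_b, coords_a, temperature=0.05):
    """Cycle-consistency sketch for dense descriptors of two unpaired views.

    desc_a, desc_b: (C, H, W) dense descriptor maps
    coords_a:       (N, 2) long tensor of (x, y) query pixels in view A
    """
    C, H, W = desc_a.shape
    flat_a = desc_a.view(C, -1)                       # (C, H*W)
    flat_b = desc_b.view(C, -1)
    q = desc_a[:, coords_a[:, 1], coords_a[:, 0]]     # (C, N) query descriptors

    # A -> B: soft correspondence as attention over all B pixels.
    p_ab = ((q.t() @ flat_b) / temperature).softmax(dim=-1)   # (N, H*W)
    d_b = p_ab @ flat_b.t()                                   # (N, C) soft B descriptors

    # B -> A: match the soft B descriptor back into view A.
    p_ba = ((d_b @ flat_a) / temperature).softmax(dim=-1)     # (N, H*W)

    # Expected returned (x, y) location in A.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1).float()  # (H*W, 2)
    returned = p_ba @ grid                                    # (N, 2)

    conf = p_ab.max(dim=-1).values.detach()                   # confidence weight
    err = (returned - coords_a.float()).norm(dim=-1)
    return (conf * err).mean()
```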
A data-centric approach for assessing progress of Graph Neural Networks
Graph Neural Networks (GNNs) have achieved state-of-the-art results in node classification tasks. However, most improvements are in multi-class classification, with less focus on the cases where each node could have multiple labels. The first challenge in studying multi-label node classification is the scarcity of publicly available datasets. To address this, we collected and released three real-world biological datasets and developed a multi-label graph generator with tunable properties. We also argue that traditional notions of homophily and heterophily do not apply well to multi-label scenarios. Therefore, we define homophily and Cross-Class Neighborhood Similarity for multi-label classification and investigate nine collected multi-label datasets. Lastly, we conducted a large-scale comparative study with eight methods across nine datasets to evaluate current progress in multi-label node classification. We release our code at https://github.com/Tianqi-py/MLGNC.
Updated: 2024-06-18 09:41:40
标题: 一种评估图神经网络进展的数据中心方法
摘要: 图神经网络(GNNs)在节点分类任务中取得了最先进的结果。然而,大多数改进都集中在多类分类上,对每个节点可能具有多个标签的情况关注较少。研究多标签节点分类的第一个挑战是公开可用数据集的稀缺性。为了解决这个问题,我们收集并发布了三个真实的生物数据集,并开发了一个具有可调属性的多标签图生成器。我们还认为传统的同质性和异质性概念不适用于多标签场景。因此,我们为多标签分类定义了同质性和跨类邻域相似性,并研究了9个收集的多标签数据集。最后,我们进行了一项大规模的对比研究,在9个数据集上比较了8种方法,以评估当前在多标签节点分类方面的进展。我们在https://github.com/Tianqi-py/MLGNC上发布了我们的代码。
更新时间: 2024-06-18 09:41:40
领域: cs.LG
Gaussian Process on the Product of Directional Manifolds
We present a principled study on defining Gaussian processes (GPs) with inputs on the product of directional manifolds. A circular kernel is first presented according to the von Mises distribution. Based thereon, the hypertoroidal von Mises (HvM) kernel is proposed to establish GPs on hypertori with consideration of correlated circular components. The proposed HvM kernel is demonstrated with multi-output GP regression for learning vector-valued functions on hypertori using the intrinsic coregionalization model. Analytic derivatives for hyperparameter optimization are provided for runtime-critical applications. For evaluation, we synthesize a ranging-based sensor network and employ the HvM-based GPs for data-driven recursive localization. Numerical results show that the HvM-based GP achieves superior tracking accuracy compared to parametric model and GPs of conventional kernel designs.
Updated: 2024-06-18 09:40:03
标题: 高斯过程在方向流形乘积上的应用
摘要: 我们提出了一个关于在方向流形的乘积上定义高斯过程(GPs)的原则性研究。首先根据von Mises分布提出了一个圆形核。基于此,提出了超环面von Mises(HvM)核,以在考虑相关圆形分量的情况下在超环面上建立GPs。我们通过多输出GP回归演示了所提出的HvM核,并利用内在共区域化模型(intrinsic coregionalization model)学习超环面上的向量值函数。提供了用于超参数优化的解析导数,适用于运行时关键应用。为了评估,我们合成了一个基于测距的传感器网络,并使用基于HvM的GPs进行数据驱动的递归定位。数值结果表明,与参数模型和传统核设计的GPs相比,基于HvM的GP实现了更好的跟踪精度。
更新时间: 2024-06-18 09:40:03
领域: cs.LG
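The circular building block is easy to illustrate: a von Mises-style kernel on angles, with a product over components giving a kernel on the torus. This is illustrative only; per the abstract, the actual HvM kernel additionally models correlation between the circular components, which a plain product ignores:

```python
import numpy as np

def von_mises_kernel(theta1, theta2, kappa=1.0):
    """Circular kernel on angles, k(a, b) = exp(kappa * (cos(a - b) - 1)),
    normalized so that k(a, a) = 1. Positive semi-definite on the circle."""
    d = np.subtract.outer(theta1, theta2)
    return np.exp(kappa * (np.cos(d) - 1.0))

def torus_kernel(x1, x2, kappas=(1.0, 1.0)):
    """Product of per-angle kernels on the 2-torus; x1: (n, 2), x2: (m, 2)."""
    k = np.ones((len(x1), len(x2)))
    for j, kap in enumerate(kappas):
        k *= von_mises_kernel(x1[:, j], x2[:, j], kap)
    return k

x = np.random.default_rng(0).uniform(0, 2 * np.pi, size=(5, 2))
K = torus_kernel(x, x)
print(np.all(np.linalg.eigvalsh(K) > -1e-9))  # PSD check: True
```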
Fuzzy Convolution Neural Networks for Tabular Data Classification
Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields, such as bioinformatics, finance, and medicine, where nonimage data are prevalent. Adaptation of CNNs to classify nonimage data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used a randomly selected seventy percent of the samples from each data set for training and thirty percent for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.
Updated: 2024-06-18 09:36:13
标题: 模糊卷积神经网络用于表格数据分类
摘要: 最近,卷积神经网络(CNN)由于在各个领域,特别是在图像和文本分类任务中表现出色,引起了广泛关注。然而,它们在表格数据分类方面的应用仍未被充分探索。在许多领域,如生物信息学、金融、医学等,非图像数据普遍存在。将CNN调整为用于分类非图像数据仍然具有极高的挑战性。本文研究了CNN在表格数据分类中的有效性,旨在弥合传统机器学习方法和深度学习技术之间的差距。我们提出了一个新颖的框架,即用于表格数据的模糊卷积神经网络(FCNN),专门设计以捕捉特征向量中的局部模式。在我们的方法中,我们将特征值映射到模糊隶属度。模糊隶属度向量被转换为图像,用于训练CNN模型。训练后的CNN模型用于分类未知的特征向量。为了验证我们的方法,我们生成了六个复杂的噪声数据集。我们从每个数据集中随机选择了70%的样本用于训练,30%用于测试。这些数据集还使用了决策树(DT)、支持向量机(SVM)、模糊神经网络(FNN)、贝叶斯分类器和随机森林(RF)等最先进的机器学习算法进行分类。实验结果表明,我们提出的模型可以有效地从表格数据中学习有意义的表示,实现了与现有方法相比具有竞争力或更好的性能。总的来说,我们的发现表明,提出的FCNN模型作为表格数据分类任务的可行替代方案具有潜力,为结构化数据分析提供了全新的前景,有可能开辟深度学习在结构化数据分析中的新机遇。
更新时间: 2024-06-18 09:36:13
领域: cs.LG,cs.AI,I.2.10,I.4.6
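A minimal sketch of the feature-to-image step: map each (scaled) feature value to Gaussian memberships in a few fuzzy sets and stack the rows into a small 2-D array. The membership functions, centers, and layout are our assumptions, not necessarily the paper's exact design:

```python
import numpy as np

def fuzzy_membership_image(x, centers=(0.0, 0.5, 1.0), sigma=0.2):
    """Map a feature vector to a 2-D fuzzy-membership 'image'.

    Each feature value (assumed scaled to [0, 1]) gets one row; each
    column is its Gaussian membership in one fuzzy set (e.g. low,
    medium, high)."""
    x = np.asarray(x, dtype=float)[:, None]          # (n_features, 1)
    c = np.asarray(centers, dtype=float)[None, :]    # (1, n_sets)
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

img = fuzzy_membership_image([0.1, 0.8, 0.45])
print(img.round(2))  # rows: features; cols: memberships in low/medium/high
```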
Federated Learning with Limited Node Labels
Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-pass global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of data and then propagates the federated feature to the local structures. To further improve the embedding representation of nodes with local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure with an innovation view that applies pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa.
Updated: 2024-06-18 09:30:10
标题: 有限节点标签的联邦学习
摘要: 子图联合学习(SFL)是一种研究方法,因其处理分布式图结构数据的潜力而受到重视。在SFL中,本地模型由部分图结构的图神经网络(GNNs)组成。然而,一些SFL模型忽视了缺失的跨子图边的重要性,这可能导致本地GNN无法将全局表示传递给其他方的GNN。此外,现有的SFL模型需要大量标记数据,这限制了它们的实际应用。为了克服这些限制,我们提出了一种名为FedMpa的新型SFL框架,旨在学习跨子图节点表示。FedMpa首先使用少量数据训练多层感知器(MLP)模型,然后将联合特征传播到本地结构。为了进一步改进具有本地子图的节点嵌入表示,我们引入了FedMpae方法,该方法使用创新视角重建本地图结构,将池化操作应用于形成超级节点。我们在六个图数据集上进行了大量实验,结果表明FedMpa在节点分类方面非常有效。此外,我们的消融实验验证了FedMpa的有效性。
更新时间: 2024-06-18 09:30:10
领域: cs.LG,cs.AI,cs.DC
The Influencer Next Door: How Misinformation Creators Use GenAI
Advances in generative AI (GenAI) have raised concerns about detecting and discerning AI-generated content from human-generated content. Most existing literature assumes a paradigm where 'expert' organized disinformation creators and flawed AI models deceive 'ordinary' users. Based on longitudinal ethnographic research with misinformation creators and consumers between 2022-2023, we instead find that GenAI supports bricolage work, where non-experts increasingly use GenAI to remix, repackage, and (re)produce content to meet their personal needs and desires. This research yielded four key findings: First, participants primarily used GenAI for creation, rather than truth-seeking. Second, a spreading 'influencer millionaire' narrative drove participants to become content creators, using GenAI as a productivity tool to generate a volume of (often misinformative) content. Third, GenAI lowered the barrier to entry for content creation across modalities, enticing consumers to become creators and significantly increasing existing creators' output. Finally, participants used Gen AI to learn and deploy marketing tactics to expand engagement and monetize their content. We argue for shifting analysis from the public as consumers of AI content to bricoleurs who use GenAI creatively, often without a detailed understanding of its underlying technology. We analyze how these understudied emergent uses of GenAI produce new or accelerated misinformation harms, and their implications for AI products, platforms and policies.
Updated: 2024-06-18 09:29:48
标题: 《邻居身边的影响者:误导信息创作者如何利用人工智能》
摘要: 人工智能生成技术的进步引发了对检测和区分人工智能生成内容和人类生成内容的担忧。大多数现有文献假定一个范式,即‘专家’组织的虚假信息制造者和有缺陷的人工智能模型欺骗‘普通’用户。基于2022-2023年间对虚假信息制造者和消费者进行的纵向民族志研究,我们发现人工智能生成技术支持拼凑工作,非专家越来越多地使用人工智能生成技术来重新混合、重新打包和(再)生产内容以满足他们的个人需求和欲望。这项研究得出了四个关键发现:首先,参与者主要使用人工智能生成技术进行创作,而不是寻求真相。其次,一种传播中的‘影响力百万富翁’叙事驱使参与者成为内容创作者,利用人工智能生成技术作为生产工具来生成大量(通常是误导性的)内容。第三,人工智能生成技术降低了跨模态内容创作的准入门槛,诱使消费者成为创作者,并显著增加现有创作者的产出。最后,参与者利用人工智能生成技术学习和应用营销策略来扩大参与度并将其内容变现。我们主张将分析重点从公众作为人工智能内容的消费者转变为创意使用人工智能生成技术的拼凑者,他们往往并不详细了解其基础技术。我们分析了这些鲜为人知的人工智能生成技术的新兴用途如何产生新的或加速的误导性伤害,以及它们对人工智能产品、平台和政策的影响。
更新时间: 2024-06-18 09:29:48
领域: cs.HC,cs.AI
Towards Audio Codec-based Speech Separation
Recent improvements in neural audio codec (NAC) models have generated interest in adopting pre-trained codecs for a variety of speech processing applications to take advantage of the efficiencies gained from high compression, but these have yet to be applied to the speech separation (SS) task. SS can benefit from high compression because the compute required for traditional SS models makes them impractical for many edge computing use cases. However, SS is a waveform-masking task where compression tends to introduce distortions that severely impact performance. Here we propose a novel task of Audio Codec-based SS, where SS is performed within the embedding space of a NAC, and propose a new model, Codecformer, to address this task. At inference, Codecformer achieves a 52x reduction in MAC while producing separation performance comparable to a cloud deployment of Sepformer. This method charts a new direction for performing efficient SS in practical scenarios.
Updated: 2024-06-18 09:29:24
标题: 朝向基于音频编解码器的语音分离
摘要: 最近改进的神经音频编解码器(NAC)模型引起了人们对采用预训练编解码器进行各种语音处理应用的兴趣,以利用高压缩带来的效率提升,但这些尚未应用于语音分离(SS)任务。SS可以从高压缩中受益,因为传统SS模型所需的计算使它们在许多边缘计算用例中变得不切实际。然而,SS是一项波形掩码任务,压缩往往会引入严重影响性能的失真。在这里,我们提出了一种新颖的基于音频编解码器的SS任务,其中SS在NAC的嵌入空间内执行,并提出了一个新模型Codecformer来解决这个任务。在推理阶段,Codecformer实现了52倍的MAC减少,同时产生了与Sepformer云部署相当的分离性能。这种方法为在实际场景中执行高效的SS开辟了新方向。
更新时间: 2024-06-18 09:29:24
领域: cs.SD,cs.LG,eess.AS
Exploring Sensing Devices for Heart and Lung Sound Monitoring
This paper presents a comprehensive review of cardiorespiratory auscultation sensing devices, useful both for understanding their theoretical aspects and as practical guidance for designing novel sensing devices. One way to design a stethoscope is to use electret condenser microphones (ECM). In this paper, we first introduce the acoustic properties of the heart and lungs, as well as a brief history of stethoscope evolution. Then, we discuss the basic concept of ECM sensors and a recent stethoscope based on this technology. In response to the limitations of ECM-based systems, we explore the potential of microelectromechanical systems (MEMS), particularly focusing on piezoelectric transducer (PZT) sensors. This paper comprehensively reviews sensing technologies, emphasizing innovative MEMS-based designs for wearable cardiopulmonary auscultation in the past decade. To our knowledge, this is the first paper to summarize ECM and MEMS applications for heart and lung sound analysis. Keywords: Micro-electro-mechanical Systems (MEMS); Electret Condenser Microphone (ECM); Wearable Sensing Devices; Cardiorespiratory Auscultation; Phonocardiography (PCG); Heart Sound; Lung Sound
Updated: 2024-06-18 09:28:23
标题: 探索用于心肺音监测的传感器装置
摘要: 本文对心肺听诊传感设备进行了全面综述,有助于理解传感设备的理论方面,并为设计新型传感设备提供实用指导。其中一种设计听诊器的方法是使用驻极体电容麦克风(ECM)。本文首先介绍了心脏和肺部的声学特性,以及听诊器演变的简要历史。然后,我们讨论了ECM传感器的基本概念以及基于这种技术的最新听诊器。针对基于ECM系统的局限性,我们探讨了微机电系统(MEMS)的潜力,特别关注压电传感器(PZT)。本文全面审查了传感技术,重点强调了过去十年中创新的MEMS设计,用于可穿戴心肺听诊。据我们所知,这是第一篇总结ECM和MEMS在心肺音分析中应用的论文。关键词:微机电系统(MEMS);驻极体电容麦克风(ECM);可穿戴传感设备;心肺听诊;心音图(PCG);心音;肺音
更新时间: 2024-06-18 09:28:23
领域: eess.SP,cs.AI,cs.LG,eess.AS
MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
This study explores the application of self-supervised learning techniques for event sequences. It is a key modality in various applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised learning for event sequences, and methods from other domains like images, texts, and speech may not easily transfer. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. We find that neither the contrastive nor generative method is superior. Our assessment includes classifying event sequences, predicting the next event, and evaluating embedding quality. These results further highlight the potential benefits of combining both methods. Given the lack of research on hybrid models in this domain, we initially adapt the baseline model from another domain. However, upon observing its underperformance, we develop a novel method called the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities, aligning their embeddings. The results of our study demonstrate that combining contrastive and generative approaches into one procedure with MLEM achieves superior performance across multiple metrics.
Updated: 2024-06-18 09:26:41
标题: MLEM: 生成式学习和对比学习作为事件序列的不同模态
摘要: 这项研究探讨了自监督学习技术在事件序列中的应用。它是银行、电子商务和医疗保健等各种应用中的一个关键模态。然而,关于事件序列的自监督学习的研究有限,而来自其他领域如图像、文本和语音的方法可能不容易转移。为了确定最适合的方法,我们对先前确定的表现最佳方法进行了详细的比较分析。我们发现对比方法和生成方法都没有优势。我们的评估包括对事件序列进行分类、预测下一个事件并评估嵌入质量。这些结果进一步突显了结合两种方法的潜在好处。鉴于在该领域缺乏混合模型的研究,我们最初从另一个领域改编了基线模型。然而,在观察到其表现不佳后,我们开发了一种名为多模态学习事件模型(MLEM)的新方法。MLEM将对比学习和生成建模视为不同但互补的模态,使它们的嵌入对齐。我们研究的结果表明,将对比和生成方法结合到一个过程中,使用MLEM在多个指标上实现了卓越的性能。
更新时间: 2024-06-18 09:26:41
领域: cs.LG,cs.AI
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, for a decision-making question $Q$, business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose Decision QA benchmark, DQA. It has two scenarios, Locating and Building, constructed from two video games (Europa Universalis IV and Victoria 3) that have almost the same goal as Decision QA. To address Decision QA effectively, we also propose a new RAG technique called the iterative plan-then-retrieval augmented generation (PlanRAG). Our PlanRAG-based LM generates the plan for decision making as the first step, and the retriever generates the queries for data analysis as the second step. The proposed method outperforms the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario, respectively. We release our code and benchmark at https://github.com/myeon9h/PlanRAG.
Updated: 2024-06-18 09:25:35
标题: PlanRAG:一种用于生成大型语言模型的计划-检索增强生成,作为决策者
摘要: 在这篇论文中,我们进行了一项研究,利用LLMs作为需要复杂数据分析的决策制定解决方案。我们将Decision QA定义为:给定决策制定问题$Q$、业务规则$R$和数据库$D$,回答最佳决策$d_{best}$的任务。由于没有可以检验Decision QA的基准,我们提出了Decision QA基准DQA。它包含两个场景(Locating和Building),由两个目标与Decision QA几乎相同的视频游戏(欧陆风云IV和维多利亚3)构建而成。为了有效解决Decision QA,我们还提出了一种新的RAG技术,称为迭代计划-检索增强生成(PlanRAG)。我们基于PlanRAG的LM第一步生成决策制定计划,第二步由检索器生成数据分析查询。所提出的方法在Locating场景中比最先进的迭代RAG方法高出15.8%,在Building场景中高出7.4%。我们在https://github.com/myeon9h/PlanRAG发布了我们的代码和基准。
更新时间: 2024-06-18 09:25:35
领域: cs.CL,cs.AI,cs.LG,I.2.7
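A minimal sketch of the plan-then-retrieval loop is below; `llm` and `retrieve` are hypothetical stand-ins for a language-model call and a data-analysis query executor, and the prompts and re-planning protocol are illustrative rather than the released PlanRAG code:

```python
def plan_rag(question, rules, llm, retrieve, max_iters=3):
    """Iterative plan-then-retrieval sketch: (1) ask the LM for an
    analysis plan, (2) ask it for concrete queries and execute them,
    (3) let it answer or revise the plan, and repeat."""
    plan = llm(f"Question: {question}\nRules: {rules}\n"
               "Write a step-by-step data-analysis plan.")
    evidence = []
    for _ in range(max_iters):
        queries = llm(f"Plan:\n{plan}\nEvidence so far:\n{evidence}\n"
                      "List the next queries to run, one per line.")
        evidence += [retrieve(q) for q in queries.splitlines() if q.strip()]
        step = llm(f"Plan:\n{plan}\nEvidence:\n{evidence}\n"
                   "Answer the question, or reply REPLAN: <new plan>.")
        if not step.startswith("REPLAN:"):
            return step                  # the decision d_best
        plan = step[len("REPLAN:"):]     # re-planning branch
    return llm(f"Evidence:\n{evidence}\nGive the best decision.")
```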
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
Updated: 2024-06-18 09:24:09
标题: 同质工具的自适应选择:在RAG场景中的具体实例化
摘要: 目前关于工具学习的研究主要集中在从各种选项中选择最有效的工具,通常忽视成本效益,这是人类问题解决中的一个关键因素。在本文中,我们通过预测同质工具的性能和完成给定任务所需的相关成本,来解决同质工具的选择。然后以一种成本效益的方式将查询分配给最佳工具。我们的实验结果表明,与强基线方法相比,我们的方法在更低的成本下实现了更高的性能。
更新时间: 2024-06-18 09:24:09
领域: cs.AI
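The assignment step admits a very small sketch: score each homogeneous tool by predicted performance minus weighted predicted cost and route the query to the best one. `perf_model`, `cost_model`, and `alpha` are illustrative assumptions, not the paper's interface:

```python
def assign_query(query, tools, perf_model, cost_model, alpha=0.5):
    """Assign a query to the tool with the best predicted
    performance-cost trade-off; alpha sets the trade-off."""
    return max(tools, key=lambda t: perf_model(query, t) - alpha * cost_model(query, t))

# Toy predictors: larger tools answer better but cost more.
tools = ["7b", "13b", "70b"]
perf = lambda q, t: {"7b": 0.60, "13b": 0.75, "70b": 0.80}[t]
cost = lambda q, t: {"7b": 0.10, "13b": 0.20, "70b": 1.00}[t]
print(assign_query("example query", tools, perf, cost))  # "13b"
```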
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Multimodal language models that process both text and speech have a potential for applications in spoken dialogue systems. However, current models face two major challenges in response generation latency: (1) generating a spoken response requires the prior generation of a written response, and (2) speech sequences are significantly longer than text sequences. This study addresses these issues by extending the input and output sequences of the language model to support the parallel generation of text and speech. Our experiments on spoken question answering tasks demonstrate that our approach improves latency while maintaining the quality of response content. Additionally, we show that latency can be further reduced by generating speech in multiple sequences. Demo samples are available at https://rinnakk.github.io/research/publications/PSLM.
Updated: 2024-06-18 09:23:54
标题: PSLM:用LLM并行生成文本和语音,用于低延迟口语对话系统
摘要: 处理文本和语音的多模态语言模型在口语对话系统中具有潜在的应用前景。然而,当前模型在响应生成延迟方面面临两个主要挑战:(1) 生成口头回应需要先生成书面回应,(2) 语音序列显著长于文本序列。本研究通过扩展语言模型的输入和输出序列,以支持文本和语音的并行生成,解决了这些问题。我们在口语问答任务上的实验表明,我们的方法改善了延迟,同时保持了响应内容的质量。此外,我们展示了通过生成多个序列可以进一步减少延迟。演示样本可在https://rinnakk.github.io/research/publications/PSLM上获得。
更新时间: 2024-06-18 09:23:54
领域: cs.CL,cs.AI,cs.LG,cs.SD,eess.AS
AutoFirm: Automatically Identifying Reused Libraries inside IoT Firmware at Large-Scale
The Internet of Things (IoT) has become indispensable to our daily lives and work. Unfortunately, developers often reuse software libraries in the IoT firmware, leading to a major security concern. If vulnerabilities or insecure versions of these libraries go unpatched, a massive number of IoT devices can be impacted. In this paper, we propose AutoFirm, an automated tool for detecting reused libraries in IoT firmware at a large scale. Specifically, AutoFirm leverages the syntax information (library name and version) to determine whether IoT firmware reuses the libraries. We conduct a large-scale empirical study of reused libraries of IoT firmware, investigating 6,900+ firmware images and 2,700+ distinct vulnerabilities affecting 11,300+ vulnerable versions from 349 open-source software libraries. Leveraging this diverse information set, we conduct a qualitative assessment of vulnerable library versions to understand security gaps and the misplaced trust of libraries in IoT firmware. Our research reveals that: manufacturers neglected to update outdated libraries for IoT firmware in 67.3% of cases; on average, outdated libraries persisted for over 1.34 years prior to remediation; vulnerabilities of software libraries have posed severe threats to widespread IoT devices.
Updated: 2024-06-18 09:22:32
标题: AutoFirm:在大规模IoT固件中自动识别重复使用的库
摘要: 物联网(IoT)已经成为我们日常生活和工作中不可或缺的一部分。不幸的是,开发人员经常在IoT固件中重复使用软件库,这导致了一个重大的安全问题。如果这些库的漏洞或不安全版本未能修补,大量的IoT设备可能会受到影响。在本文中,我们提出了AutoFirm,这是一个用于大规模检测IoT固件中重复使用库的自动化工具。具体来说,AutoFirm利用语法信息(库名称和版本)来确定IoT固件是否重复使用了这些库。我们进行了一项关于IoT固件中重复使用库的大规模经验研究,调查了超过6,900个固件和2,700个不同漏洞,影响了来自349个开源软件库的11,300个易受攻击版本。利用这一多样化信息集,我们对易受攻击的库版本进行了定性评估,以了解IoT固件中的安全漏洞和对库的错误信任。我们的研究表明:在67.3%的情况下,制造商忽视了更新过时的库用于IoT固件;平均而言,过时的库在进行修复前存在了超过1.34年;软件库的漏洞对广泛的IoT设备构成了严重威胁。
更新时间: 2024-06-18 09:22:32
领域: cs.CR,cs.SE
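A toy version of the syntax-based matching, scanning printable strings from a firmware image for '<library> <version>' patterns, looks like this; real firmware analysis requires unpacking and far more robust patterns than this illustrative regex:

```python
import re

def find_reused_libraries(strings, known_libs):
    """Scan extracted firmware strings for known library names followed
    by a dotted version number; returns {library: {versions seen}}."""
    found = {}
    for s in strings:
        for lib in known_libs:
            m = re.search(rf"{re.escape(lib)}[ /-]v?(\d+(?:\.\d+)+)", s, re.I)
            if m:
                found.setdefault(lib, set()).add(m.group(1))
    return found

strings = ["OpenSSL 1.0.2k  26 Jan 2017", "zlib/1.2.8", "busybox v1.22.1"]
print(find_reused_libraries(strings, ["OpenSSL", "zlib", "busybox"]))
```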
STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning
Accuracy and timeliness are indeed often conflicting goals in prediction tasks. Premature predictions may yield a higher rate of false alarms, whereas delaying predictions to gather more information can render them too late to be useful. In applications such as wildfires, crimes, and traffic jams, timely forecasting is vital for safeguarding human life and property. Consequently, finding a balance between accuracy and timeliness is crucial. In this paper, we propose an early spatio-temporal forecasting model based on Multi-Objective reinforcement learning that can either implement an optimal policy given a preference or infer the preference based on a small number of samples. The model addresses two primary challenges: 1) enhancing the accuracy of early forecasting and 2) providing the optimal policy for determining the most suitable prediction time for each area. Our method demonstrates superior performance on three large-scale real-world datasets, surpassing existing methods in early spatio-temporal forecasting tasks.
Updated: 2024-06-18 09:16:33
标题: STEMO:使用多目标强化学习进行早期时空预测
摘要: 准确性和及时性确实经常在预测任务中是相互冲突的目标。过早的预测可能会导致更高的误报率,而延迟预测以收集更多信息可能会使它们变得太晚而无用。在野火、犯罪和交通拥堵等应用中,及时的预测对于保障人类生命和财产至关重要。因此,在准确性和及时性之间找到平衡是至关重要的。在本文中,我们提出了一个基于多目标强化学习的早期时空预测模型,该模型可以根据偏好实施最佳策略,或者根据少量样本推断偏好。该模型解决了两个主要挑战:1)提高早期预测的准确性;2)为确定每个区域最适合的预测时间提供最佳策略。我们的方法在三个大规模真实数据集上表现出优异性能,在早期时空预测任务中超越现有方法。
更新时间: 2024-06-18 09:16:33
领域: cs.LG,cs.AI
Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models
Many of today's data is time-series data originating from various sources, such as sensors, transaction systems, or production systems. Major challenges with such data include privacy and business sensitivity. Generative time-series models have the potential to overcome these problems, allowing representative synthetic data, such as people's movement in cities, to be shared openly and be used to the benefit of society at large. However, contemporary approaches are limited to prohibitively short sequences and small scales. Aside from major memory limitations, the models generate less accurate and less representative samples the longer the sequences are. This issue is further exacerbated by the lack of a comprehensive and accessible benchmark. Furthermore, a common need in practical applications is what-if analysis and dynamic adaptation to data distribution changes, for usage in decision making and to manage a changing world: What if this road is temporarily blocked or another road is added? The focus of this paper is on mobility data, such as people's movement in cities, requiring all these issues to be addressed. To this end, we propose a transformer-based diffusion model, TDDPM, for time-series which outperforms and scales substantially better than state-of-the-art. This is evaluated in a new comprehensive benchmark across several sequence lengths, standard datasets, and evaluation measures. We also demonstrate how the model can be conditioned on a prior over spatial occupancy frequency information, allowing the model to generate mobility data for previously unseen environments and for hypothetical scenarios where the underlying road network and its usage changes. This is evaluated by training on mobility data from part of a city. Then, using only aggregate spatial information as prior, we demonstrate out-of-distribution generalization to the unobserved remainder of the city.
Updated: 2024-06-18 09:16:11
标题: 深度时间解聚:大规模时空生成模型
摘要: 今天许多数据都是来自各种传感器、交易系统或生产系统的时间序列数据。这类数据面临的主要挑战包括隐私和商业敏感性。生成式时间序列模型有潜力克服这些问题,允许代表性的合成数据(如城市中人们的移动)被公开共享,并为整个社会带来利益。然而,当代方法仅能处理短得令人难以接受的序列和很小的规模。除了主要的内存限制外,序列越长,模型生成的样本就越不准确、越不具代表性。缺乏全面且易于获取的基准进一步加剧了这一问题。此外,在实际应用中普遍需要进行假设分析和动态适应数据分布的变化,以用于决策和应对不断变化的世界:如果这条道路暂时封闭,或者新增了另一条道路会怎样?本文的重点是城市中的人员移动等移动数据,需要解决所有这些问题。为此,我们提出了一种基于Transformer的扩散模型TDDPM,用于时间序列,其性能大幅优于最先进的方法,并具有好得多的扩展性。通过在新的全面基准测试中评估多个序列长度、标准数据集和评估指标,我们展示了这一点。我们还展示了模型如何能够在空间占用频率信息的先验条件下进行条件化,使模型能够为以前未见的环境以及道路网络及其使用情况发生变化的假设情景生成移动数据。通过在城市部分地区的移动数据上进行训练,然后仅使用聚合空间信息作为先验,我们展示了对城市未观测其余部分的分布外泛化。
更新时间: 2024-06-18 09:16:11
领域: cs.LG
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey
Large Language Models (LLMs) have shown greatly enhanced performance in recent years, attributed to increased size and extensive training data. This advancement has led to widespread interest and adoption across industries and the public. However, training data memorization in Machine Learning models scales with model size, particularly concerning for LLMs. Memorized text sequences have the potential to be directly leaked from LLMs, posing a serious threat to data privacy. Various techniques have been developed to attack LLMs and extract their training data. As these models continue to grow, this issue becomes increasingly critical. To help researchers and policymakers understand the state of knowledge around privacy attacks and mitigations, including where more work is needed, we present the first SoK on data privacy for LLMs. We (i) identify a taxonomy of salient dimensions where attacks differ on LLMs, (ii) systematize existing attacks, using our taxonomy of dimensions to highlight key trends, (iii) survey existing mitigation strategies, highlighting their strengths and limitations, and (iv) identify key gaps, demonstrating open problems and areas for concern.
Updated: 2024-06-18 09:14:34
Fields: cs.CL,cs.AI
MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling
With the advancement of multimedia technologies, news documents and user-generated content are often represented as multiple modalities, making Multimedia Event Extraction (MEE) an increasingly important challenge. However, recent MEE methods employ weak alignment strategies and data augmentation with simple classification models, which ignore the capabilities of natural language-formulated event templates for the challenging Event Argument Extraction (EAE) task. In this work, we focus on EAE and address this issue by introducing a unified template filling model that connects the textual and visual modalities via textual prompts. This approach enables the exploitation of cross-ontology transfer and the incorporation of event-specific semantics. Experiments on the M2E2 benchmark demonstrate the effectiveness of our approach. Our system surpasses the current SOTA on textual EAE by +7% F1, and performs generally better than the second-best systems for multimedia EAE.
Updated: 2024-06-18 09:14:17
Fields: cs.CL,cs.LG
Prediction of the Realisation of an Information Need: An EEG Study
One of the foundational goals of Information Retrieval (IR) is to satisfy searchers' Information Needs (IN). Understanding how INs physically manifest has long been a complex and elusive process. However, recent studies utilising Electroencephalography (EEG) data have provided real-time insights into the neural processes associated with INs. Unfortunately, they have yet to demonstrate how this insight can practically benefit the search experience. As such, within this study, we explore the ability to predict the realisation of an IN within EEG data across 14 subjects as they partake in a Question-Answering (Q/A) task. Furthermore, we investigate the combinations of EEG features that yield optimal predictive performance, as well as identify regions within the Q/A queries where a subject's realisation of an IN is more pronounced. The findings from this work demonstrate that EEG data is sufficient for the real-time prediction of the realisation of an IN across all subjects with an accuracy of 73.5% (SD 2.6%) and on a per-subject basis with an accuracy of 90.1% (SD 22.1%). This work helps to close the gap between theoretical neuroscientific advancements and tangible improvements in information retrieval practices, paving the way for real-time prediction of the realisation of an IN.
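The abstract does not name the classifier; operationally, the per-subject evaluation it describes amounts to cross-validating a binary classifier on each subject's EEG feature vectors. A generic scikit-learn sketch with stand-in features (the actual EEG feature extraction is an assumption left out here):

```python
# Hypothetical per-subject evaluation of IN-realisation prediction from EEG features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
for subject in range(14):
    X = rng.normal(size=(200, 64))        # stand-in for per-epoch EEG features
    y = rng.integers(0, 2, size=200)      # 1 = information need realised
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"subject {subject}: acc = {scores.mean():.3f} (sd {scores.std():.3f})")
```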
Updated: 2024-06-18 09:13:04
Fields: cs.IR,cs.AI,cs.HC
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets and the factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned by various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, we reveal that the main cause of the model's failure to uphold factuality under a distribution shift is \textbf{under-alignment}, rather than \textbf{over-alignment}, by analyzing the token distribution shift of the models before and after tuning. Finally, we propose \textbf{APEFT} (\textbf{A}tomic \textbf{P}reference \textbf{E}nhanced \textbf{F}actuality \textbf{T}uning), a framework that enhances the model's awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance by an average of $\boldsymbol{3.45\%}$ on both ID and OOD datasets, demonstrating its effectiveness.
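The abstract does not state which preference-learning objective APEFT builds on; the widely used DPO loss over (preferred, dispreferred) pairs conveys the flavour. A sketch at whole-response granularity (APEFT's distinguishing move is to build such pairs per atomic fact rather than per response):

```python
# Sketch of a DPO-style preference loss; an atomic-fact variant would build
# the (preferred, dispreferred) pairs per individual fact instead.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """logp_*: summed log-probs of preferred (w) / dispreferred (l) responses."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up log-probabilities:
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
print(loss)
```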
Updated: 2024-06-18 09:07:30
Fields: cs.CL,cs.AI
MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation
We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt advanced tools to ensure the integrity of the extracted code and enrich the code with four different transformed representations. In total, MegaVul contains 17,380 vulnerabilities collected from 992 open-source repositories, spanning 169 different vulnerability types disclosed from January 2006 to October 2023. Thus, MegaVul can be used for a variety of software security-related tasks, including detecting vulnerabilities and assessing vulnerability severity. All information is stored in the JSON format for easy usage. MegaVul is publicly available on GitHub and will be continuously updated. It can be easily extended to other programming languages.
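Since the dataset ships as JSON, consuming it is a matter of loading and filtering. A sketch of that workflow; the field names below are illustrative placeholders, not MegaVul's documented schema:

```python
# Hypothetical sketch of loading a MegaVul-style JSON dump; field names are
# illustrative placeholders, not the dataset's actual schema.
import json
from collections import Counter

with open("megavul.json", encoding="utf-8") as f:
    entries = json.load(f)

by_type = Counter(e.get("cwe_id", "unknown") for e in entries)
print(f"{len(entries)} vulnerabilities across {len(by_type)} types")
for cwe, n in by_type.most_common(5):
    print(cwe, n)
```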
Updated: 2024-06-18 09:03:18
Fields: cs.CR,cs.SE
Cross-Problem Learning for Solving Vehicle Routing Problems
Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pre-train the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously. On the other hand, we only fine-tune small adapter networks along with the modules, keeping the backbone Transformer frozen. Extensive experiments on typical VRPs substantiate that 1) full fine-tuning achieves significantly better performance than a model trained from scratch, and 2) adapter-based fine-tuning also delivers comparable performance while being notably parameter-efficient. Furthermore, we empirically demonstrate the favorable effect of our method in terms of cross-distribution application and versatility.
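The adapter-based variant follows the standard bottleneck-adapter construction: the TSP-pretrained backbone is frozen and only small residual adapters (plus problem-specific modules) are trained. A minimal PyTorch sketch (layer sizes are placeholders):

```python
# Minimal bottleneck-adapter sketch: the pre-trained backbone stays frozen,
# only the small adapter (and any problem-specific modules) receive gradients.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=128, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))   # residual adapter

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
    num_layers=6,
)
for p in backbone.parameters():
    p.requires_grad = False                            # freeze TSP-pretrained weights

adapter = Adapter()
x = torch.randn(4, 50, 128)                            # node embeddings for a VRP instance
out = adapter(backbone(x))                             # only adapter params are trainable
```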
Updated: 2024-06-18 09:03:08
Fields: cs.AI
Pushing the Frontier on Approximate EFX Allocations
We study the problem of allocating a set of indivisible goods to a set of agents with additive valuation functions, aiming to achieve approximate envy-freeness up to any good ($\alpha$-EFX). The state-of-the-art results on the problem include that (exact) EFX allocations exist when (a) there are at most three agents, or (b) the agents' valuation functions can take at most two values, or (c) the agents' valuation functions can be represented via a graph. For $\alpha$-EFX, it is known that a $0.618$-EFX allocation exists for any number of agents with additive valuation functions. In this paper, we show that $2/3$-EFX allocations exist when (a) there are at most \emph{seven agents}, (b) the agents' valuation functions can take at most \emph{three values}, or (c) the agents' valuation functions can be represented via a \emph{multigraph}. Our results can be interpreted in two ways. First, by relaxing the notion of EFX to $2/3$-EFX, we obtain existence results for strict generalizations of the settings for which exact EFX allocations are known to exist. Secondly, by imposing restrictions on the setting, we manage to beat the barrier of $0.618$ and achieve an approximation guarantee of $2/3$. Therefore, our results push the \emph{frontier} of existence and computation of approximate EFX allocations, and provide insights into the challenges of settling the existence of exact EFX allocations.
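For reference, the $\alpha$-EFX condition being approximated can be stated as follows (the standard definition, not quoted from the paper):

```latex
% Standard definition of an alpha-EFX allocation (A_1, ..., A_n): every agent i
% values its own bundle at least an alpha fraction of any other agent's bundle
% after removing any single good from it.
\[
v_i(A_i) \;\ge\; \alpha \cdot v_i\big(A_j \setminus \{g\}\big)
\qquad \text{for all agents } i, j \text{ and all goods } g \in A_j .
\]
```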
Updated: 2024-06-18 09:01:37
Fields: cs.GT,cs.AI,cs.DM
A Novel Algorithm for Community Detection in Networks using Rough Sets and Consensus Clustering
Complex networks, such as those in social, biological, and technological systems, often present challenges to the task of community detection. Our research introduces a novel rough-clustering-based consensus community detection framework (RC-CCD) for effective structure identification of network communities. The RC-CCD method employs rough set theory to handle uncertainties within data and utilizes a consensus clustering approach to aggregate multiple clustering results, enhancing the reliability and accuracy of community detection. This integration allows RC-CCD to effectively manage overlapping communities, which are often present in complex networks. This approach excels at detecting overlapping communities, offering a detailed and accurate representation of network structures. Comprehensive testing on benchmark networks generated by the Lancichinetti-Fortunato-Radicchi method showcased the strength and adaptability of the new proposal to varying node degrees and community sizes. Cross-comparisons of RC-CCD's outcomes against those of other well-known detection algorithms highlighted its stability and adaptability.
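The consensus half of the method aggregates many base clusterings; a common realization is a co-association matrix, sketched below. The rough-set treatment of uncertain memberships is only hinted at in the comments (an assumption, not the paper's algorithm):

```python
# Consensus clustering via a co-association matrix: entry (i, j) counts how
# often nodes i and j land in the same community across base clusterings.
import numpy as np

def coassociation(labelings):
    labelings = np.asarray(labelings)          # shape: (n_runs, n_nodes)
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for run in labelings:
        C += (run[:, None] == run[None, :])
    return C / len(labelings)

runs = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 2, 2, 2]]
C = coassociation(runs)
# Pairs with intermediate co-association (neither ~0 nor ~1) are natural
# candidates for a rough-set boundary region (possible overlapping membership).
print(np.round(C, 2))
```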
Updated: 2024-06-18 09:01:21
Fields: cs.AI,cs.SI
TADM: Temporally-Aware Diffusion Model for Neurodegenerative Progression on Brain MRI
Generating realistic images to accurately predict changes in the structure of brain MRI is a crucial tool for clinicians. Such applications help assess patients' outcomes and analyze how diseases progress at the individual level. However, existing methods for this task present some limitations. Some approaches attempt to model the distribution of MRI scans directly by conditioning the model on patients' ages, but they fail to explicitly capture the relationship between structural changes in the brain and time intervals, especially on age-unbalanced datasets. Other approaches simply rely on interpolation between scans, which limits their clinical application as they do not predict future MRIs. To address these challenges, we propose a Temporally-Aware Diffusion Model (TADM), which introduces a novel approach to accurately infer progression in brain MRIs. TADM learns the distribution of structural changes in terms of intensity differences between scans and combines the prediction of these changes with the initial baseline scans to generate future MRIs. Furthermore, during training, we propose to leverage a pre-trained Brain-Age Estimator (BAE) to refine the model's training process, enhancing its ability to produce accurate MRIs that match the expected age gap between baseline and generated scans. Our assessment, conducted on the OASIS-3 dataset, uses similarity metrics and region sizes computed by comparing predicted and real follow-up scans on 3 relevant brain regions. TADM achieves large improvements over existing approaches, with an average decrease of 24% in region size error and an improvement of 4% in similarity metrics. These evaluations demonstrate the improvement of our model in mimicking temporal brain neurodegenerative progression compared to existing methods. Our approach will benefit applications, such as predicting patient outcomes or improving treatments for patients.
Updated: 2024-06-18 09:00:49
Fields: eess.IV,cs.CV,cs.LG
Translation Equivariant Transformer Neural Processes
The effectiveness of neural processes (NPs) in modelling posterior prediction maps -- the mapping from data to posterior predictive distributions -- has significantly improved since their inception. This improvement can be attributed to two principal factors: (1) advancements in the architecture of permutation invariant set functions, which are intrinsic to all NPs; and (2) leveraging symmetries present in the true posterior predictive map, which are problem dependent. Transformers are a notable development in permutation invariant set functions, and their utility within NPs has been demonstrated through the family of models we refer to as TNPs. Despite significant interest in TNPs, little attention has been given to incorporating symmetries. Notably, the posterior prediction maps for data that are stationary -- a common assumption in spatio-temporal modelling -- exhibit translation equivariance. In this paper, we introduce a new family of translation equivariant TNPs (TE-TNPs). Through an extensive range of experiments on synthetic and real-world spatio-temporal data, we demonstrate the effectiveness of TE-TNPs relative to their non-translation-equivariant counterparts and other NP baselines.
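Translation equivariance in an attention layer can be obtained by letting the weights depend only on pairwise differences of input locations, so that shifting every input leaves them unchanged. A toy numpy sketch of that construction (the paper's TE-TNP architecture is considerably more elaborate):

```python
# Toy translation-equivariant attention: weights depend only on pairwise
# differences of input locations, so shifting every x_i leaves them unchanged.
import numpy as np

def te_attention(x, v, scale=1.0):
    # x: (n, d) locations, v: (n, k) values
    diff = x[:, None, :] - x[None, :, :]            # pairwise differences
    logits = -np.sum(diff**2, axis=-1) / scale      # RBF-style compatibility
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
x, v = rng.normal(size=(6, 2)), rng.normal(size=(6, 3))
shift = np.array([10.0, -5.0])
print(np.allclose(te_attention(x, v), te_attention(x + shift, v)))  # True
```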
Updated: 2024-06-18 08:58:59
Fields: stat.ML,cs.LG
Preventing Model Collapse in Gaussian Process Latent Variable Models
Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the underlying data structure. This paper addresses these issues by, first, theoretically examining the impact of projection variance on model collapse through the lens of a linear GPLVM. Second, we tackle model collapse due to inadequate kernel flexibility by integrating the spectral mixture (SM) kernel and a differentiable random Fourier feature (RFF) kernel approximation, which ensures computational scalability and efficiency through off-the-shelf automatic differentiation tools for learning the kernel hyperparameters, projection variance, and latent representations within the variational inference framework. The proposed GPLVM, named advisedRFLVM, is evaluated across diverse datasets and consistently outperforms various salient competing models, including state-of-the-art variational autoencoders (VAEs) and other GPLVM variants, in terms of informative latent representations and missing data imputation.
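The differentiable RFF approximation the paper builds on is standard: a stationary kernel is approximated by explicit random features, keeping hyperparameters trainable via automatic differentiation. A numpy sketch for the RBF case (the spectral mixture kernel changes only the sampling distribution of the frequencies):

```python
# Random Fourier feature sketch: approximate an RBF kernel with explicit
# features phi(x), so kernel hyperparameters stay differentiable end to end.
import numpy as np

rng = np.random.default_rng(0)
d, D, lengthscale = 3, 500, 1.0

W = rng.normal(scale=1.0 / lengthscale, size=(D, d))   # spectral samples
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = rng.normal(size=(5, d))
K_approx = phi(X) @ phi(X).T
K_exact = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1) / lengthscale**2)
print(np.abs(K_approx - K_exact).max())                # small for large D
```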
Updated: 2024-06-18 08:56:13
Fields: stat.ML,cs.LG
Fast Rates for Bandit PAC Multiclass Classification
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem, with sample complexity of $O\big( (\operatorname{poly}(K) + 1 / \varepsilon^2) \log (|H| / \delta) \big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $\varepsilon$, this improves upon existing bounds for the problem, that are of the form $O(K/\varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log |H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011) who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $\Theta(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $\varepsilon \to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.
Updated: 2024-06-18 08:54:04
Fields: cs.LG,cs.AI,stat.ML
PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models
In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein client transmits perturbed prompts to the server's LLM for rationale generation. The generated rationales are then decoded by the client and used to enrich the training of task-specific small language model(SLM) within a multi-task learning paradigm. PDSS introduces two privacy protection strategies: the Exponential Mechanism Strategy and the Encoder-Decoder Strategy, balancing prompt privacy and rationale usability. Experiments demonstrate the effectiveness of PDSS in various text generation tasks, enabling the training of task-specific SLM with enhanced performance while prioritizing data privacy protection.
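The abstract names an Exponential Mechanism Strategy for perturbing prompts; the exponential mechanism itself is a standard differential-privacy primitive that samples a candidate with probability proportional to exp(eps * score / (2 * sensitivity)). A generic sketch (how PDSS scores candidate replacements is not specified in the abstract):

```python
# Generic exponential-mechanism sketch: privately select a candidate (e.g. a
# replacement token) with probability proportional to exp(eps * score / (2 * sens)).
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng):
    scores = np.asarray(scores, dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

rng = np.random.default_rng(0)
candidates = ["medication", "treatment", "drug", "therapy"]
similarity = [0.9, 0.7, 0.8, 0.6]               # utility of each replacement
idx = exponential_mechanism(similarity, eps=2.0, sensitivity=1.0, rng=rng)
print(candidates[idx])
```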
Updated: 2024-06-18 08:48:14
Fields: cs.CL,cs.AI
A Cutting-Edge Deep Learning Method For Enhancing IoT Security
The IoT presents significant challenges, with billions of heterogeneous devices and large amounts of data. This paper proposes an innovative design for an intrusion detection system (IDS) for Internet of Things (IoT) environments using deep learning, integrating Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Our model, based on the CICIDS2017 dataset, achieved an accuracy of 99.52% in classifying network traffic as either benign or malicious. The real-time processing capability, scalability, and low false alarm rate of our model surpass some traditional IDS approaches, making it well suited for application in today's IoT networks. We discuss the development and performance of the model, along with possible applications that may extend to other related fields of adaptive learning techniques and cross-domain applicability. This research on deep learning for IoT cybersecurity offers a potent solution for significantly improving network security.
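A minimal Keras sketch of a CNN-LSTM flow classifier of the kind described; the 78-feature input (typical of CICIDS2017 flow records) and all layer sizes are assumptions, not the paper's exact architecture:

```python
# Hypothetical CNN-LSTM intrusion detector; layer sizes are illustrative only.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, 3, activation="relu", input_shape=(78, 1)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # benign vs. malicious
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```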
Updated: 2024-06-18 08:42:51
Fields: cs.AI,cs.CR
Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: in the first stage, we model relations among the treatment and proxies; in the second stage, we use this model to learn the effect of treatment on the outcome, given the context provided by the proxies. PCL guarantees recovery of the true causal effect, subject to identifiability conditions. We propose a novel method for PCL, the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have nonlinear complex relationships, as represented by deep neural network features. We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance.
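The two-stage structure is easiest to see with linear stand-ins for the learned deep features: stage 1 regresses the outcome-side proxy on the treatment and the treatment-side proxy, and stage 2 regresses the outcome on the treatment and the stage-1 prediction. A toy sketch where the true effect is 2.0 (DFPV replaces these linear maps with neural features):

```python
# Toy two-stage proxy causal regression with linear models standing in for
# DFPV's deep feature maps.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)                 # unobserved confounder
z = u + rng.normal(size=n)             # treatment-side proxy
w = u + rng.normal(size=n)             # outcome-side proxy
a = u + rng.normal(size=n)             # treatment
y = 2.0 * a + 3.0 * u + rng.normal(size=n)

# Stage 1: model the outcome proxy from treatment and treatment proxy.
stage1 = LinearRegression().fit(np.c_[a, z], w)
w_hat = stage1.predict(np.c_[a, z])

# Stage 2: regress the outcome on treatment and the stage-1 prediction.
stage2 = LinearRegression().fit(np.c_[a, w_hat], y)
print("estimated treatment effect:", stage2.coef_[0])   # close to 2.0
```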
Updated: 2024-06-18 08:40:30
Fields: cs.LG,stat.ML
QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities
With the increasing role of Natural Language Processing (NLP) in various applications, challenges concerning bias and stereotype perpetuation are accentuated, which often leads to hate speech and harm. Despite existing studies on sexism and misogyny, issues like homophobia and transphobia remain underexplored and often adopt binary perspectives, putting the safety of LGBTQIA+ individuals at high risk in online spaces. In this paper, we assess the potential harm caused by sentence completions generated by English large language models (LLMs) concerning LGBTQIA+ individuals. This is achieved using QueerBench, our new assessment framework, which employs a template-based approach and a Masked Language Modeling (MLM) task. The analysis indicates that large language models tend to exhibit discriminatory behaviour more frequently towards individuals within the LGBTQIA+ community, with a gap of 7.2% in the QueerBench harmfulness score.
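The template-plus-MLM setup can be illustrated with the transformers fill-mask pipeline: instantiate a template around a subject term and inspect the model's completions. The template below is an illustrative placeholder, not one of QueerBench's actual items:

```python
# Illustrative template-based MLM probe; the template and any scoring of the
# completions are placeholders, not QueerBench's actual benchmark items.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
template = "The {subject} person is [MASK]."
for subject in ["queer", "straight"]:
    preds = fill(template.format(subject=subject), top_k=5)
    print(subject, [p["token_str"] for p in preds])
```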
Updated: 2024-06-18 08:40:29
Fields: cs.CL,cs.AI,cs.CY
Data Poisoning to Fake a Nash Equilibrium in Markov Games
We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside the set. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the unique Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
Updated: 2024-06-18 08:39:02
Fields: cs.MA,cs.AI,cs.CR,cs.GT,cs.LG
SNN4Agents: A Framework for Developing Energy-Efficient Embodied Spiking Neural Networks for Autonomous Agents
Recent trends have shown that autonomous agents, such as Autonomous Ground Vehicles (AGVs), Unmanned Aerial Vehicles (UAVs), and mobile robots, effectively improve human productivity in solving diverse tasks. However, since these agents are typically powered by portable batteries, they require extremely low power/energy consumption to operate over a long lifespan. To solve this challenge, neuromorphic computing has emerged as a promising solution, where bio-inspired Spiking Neural Networks (SNNs) use spikes from event-based cameras or data conversion pre-processing to perform sparse computations efficiently. However, the studies of SNN deployments for autonomous agents are still at an early stage. Hence, the optimization stages for enabling efficient embodied SNN deployments for autonomous agents have not been defined systematically. Toward this, we propose a novel framework called SNN4Agents that consists of a set of optimization techniques for designing energy-efficient embodied SNNs targeting autonomous agent applications. Our SNN4Agents employs weight quantization, timestep reduction, and attention window reduction to jointly improve energy efficiency, reduce the memory footprint, and optimize processing latency, while maintaining high accuracy. In the evaluation, we investigate use cases of event-based car recognition, and explore the trade-offs among accuracy, latency, memory, and energy consumption. The experimental results show that our proposed framework can maintain high accuracy (i.e., 84.12% accuracy) with 68.75% memory saving, 3.58x speed-up, and 4.03x energy efficiency improvement as compared to the state-of-the-art work for the NCARS dataset. In this manner, our SNN4Agents framework paves the way toward enabling energy-efficient embodied SNN deployments for autonomous agents.
Updated: 2024-06-18 08:36:11
Fields: cs.RO,cs.AI,cs.LG,cs.NE
Embeddings between Barron spaces with higher order activation functions
The approximation properties of infinitely wide shallow neural networks heavily depend on the choice of the activation function. To understand this influence, we study embeddings between Barron spaces with different activation functions. These embeddings are proven by providing push-forward maps on the measures $\mu$ used to represent functions $f$. An activation function of particular interest is the rectified power unit ($\operatorname{RePU}$) given by $\operatorname{RePU}_s(x)=\max(0,x)^s$. For many commonly used activation functions, the well-known Taylor remainder theorem can be used to construct a push-forward map, which allows us to prove the embedding of the associated Barron space into a Barron space with a $\operatorname{RePU}$ as activation function. Moreover, the Barron spaces associated with the $\operatorname{RePU}_s$ have a hierarchical structure similar to the Sobolev spaces $H^m$.
Updated: 2024-06-18 08:33:24
Fields: stat.ML,cs.LG,math.FA,46E35 (Primary) 46E15, 46G12 (Secondary),I.2.6; G.1.9
Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction
Fact extraction is pivotal for constructing knowledge graphs. Recently, the increasing demand for temporal facts in downstream tasks has led to the emergence of the task of temporal fact extraction. In this paper, we specifically address the extraction of temporal facts from natural language text. Previous studies fail to handle the challenge of establishing time-to-fact correspondences in complex sentences. To overcome this hurdle, we propose a timeline-based sentence decomposition strategy using large language models (LLMs) with in-context learning, ensuring a fine-grained understanding of the timeline associated with various facts. In addition, we evaluate the performance of LLMs for direct temporal fact extraction and get unsatisfactory results. To this end, we introduce TSDRE, a method that incorporates the decomposition capabilities of LLMs into the traditional fine-tuning of smaller pre-trained language models (PLMs). To support the evaluation, we construct ComplexTRED, a complex temporal fact extraction dataset. Our experiments show that TSDRE achieves state-of-the-art results on both HyperRED-Temporal and ComplexTRED datasets.
Updated: 2024-06-18 08:22:29
Fields: cs.CL,cs.AI
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models
Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bridge this gap, we propose FedMKT, a parameter-efficient federated mutual knowledge transfer framework for large and small language models. This framework is designed to adaptively transfer knowledge from the server's LLM to clients' SLMs while concurrently enriching the LLM with clients' unique domain insights. We facilitate token alignment using minimum edit distance (MinED) and then selective mutual knowledge transfer between client-side SLMs and a server-side LLM, aiming to collectively enhance their performance. Through extensive experiments across three distinct scenarios, we evaluate the effectiveness of FedMKT using various public LLMs and SLMs on a range of NLP text generation tasks. Empirical results demonstrate that FedMKT simultaneously boosts the performance of both LLMs and SLMs.
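Token alignment with minimum edit distance (MinED) can be sketched directly: map each token from one vocabulary to the closest token of the other, so per-token outputs can be transferred between mismatched tokenizers. A toy illustration (FedMKT's exact matching and transfer rules are not given in the abstract):

```python
# Toy MinED-style token alignment: map each SLM token to the LLM token with the
# smallest Levenshtein distance, so per-token distributions can be transferred.
def edit_distance(a: str, b: str) -> int:
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

slm_tokens = ["fed", "er", "ated", "learn", "ing"]
llm_tokens = ["feder", "ated", "learning"]
for t in slm_tokens:
    best = min(llm_tokens, key=lambda u: edit_distance(t, u))
    print(f"{t!r:10} -> {best!r}")
```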
Updated: 2024-06-18 08:17:00
Fields: cs.CL,cs.AI
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems
Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns by retaining training data on client devices and exclusively sharing locally-trained machine learning (ML) models with others. The primary distinction between the two lies in their approach to model aggregation: federated learning employs a centralized parameter server, whereas gossip learning adopts a fully decentralized mechanism, enabling direct model exchanges among nodes. This decentralized nature often positions gossip learning as less efficient compared to federated learning. Both methodologies involve a critical step: computing a representation of received ML models and integrating this representation into the existing model. Conventionally, this representation is derived by averaging the received models, exemplified by the FedAVG algorithm. Our findings suggest that this averaging approach inherently introduces a potential delay in model convergence. We identify the underlying cause and refer to it as the "vanishing variance" problem, where averaging across uncorrelated ML models undermines the optimal variance established by the Xavier weight initialization. Unlike federated learning where the central server ensures model correlation, and unlike traditional gossip learning which circumvents this problem through model partitioning and sampling, our research introduces a variance-corrected model averaging algorithm. This novel algorithm preserves the optimal variance needed during model averaging, irrespective of network topology or non-IID data distributions. Our extensive simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
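The vanishing-variance effect is easy to reproduce: averaging n uncorrelated weight matrices divides their variance by n, breaking the scale set by Xavier initialization, and a sqrt(n) rescaling restores it. A minimal numpy illustration of the problem and the correction the paper motivates (the actual algorithm is more involved):

```python
# Averaging n uncorrelated weight matrices divides their variance by n,
# destroying the scale set by Xavier initialization; a sqrt(n) rescaling
# of the average restores it (toy version of variance correction).
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out, n_models = 256, 256, 10
std = np.sqrt(2.0 / (fan_in + fan_out))               # Xavier/Glorot std

weights = rng.normal(0.0, std, size=(n_models, fan_in, fan_out))
avg = weights.mean(axis=0)

print("target var :", std**2)
print("avg var    :", avg.var())                      # ~ std**2 / n_models
corrected = avg * np.sqrt(n_models)                   # restore the variance
print("corrected  :", corrected.var())
```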
Updated: 2024-06-18 08:16:52
Fields: cs.LG,cs.DC
CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition
Driver action recognition has significantly advanced in enhancing driver-vehicle interactions and ensuring driving safety by integrating multiple modalities, such as infrared and depth. Nevertheless, compared to the RGB modality alone, it is always laborious and costly to collect extensive data for all types of non-RGB modalities in car cabin environments. Therefore, previous works have suggested independently learning each non-RGB modality by fine-tuning a model pre-trained on RGB videos, but these methods are less effective in extracting informative features when faced with newly-incoming modalities due to large domain gaps. In contrast, we propose a Continual Cross-Modal Mapping Network (CM2-Net) to continually learn each newly-incoming modality with instructive prompts from the previously-learned modalities. Specifically, we have developed Accumulative Cross-modal Mapping Prompting (ACMP) to map the discriminative and informative features learned from previous modalities into the feature space of newly-incoming modalities. Then, when faced with newly-incoming modalities, these mapped features are able to provide effective prompts for which features should be extracted and prioritized. These prompts accumulate throughout the continual learning process, thereby boosting further recognition performance. Extensive experiments conducted on the Drive&Act dataset demonstrate the performance superiority of CM2-Net on both uni- and multi-modal driver action recognition.
Updated: 2024-06-18 08:10:58
Fields: cs.CV,cs.LG
Foundation Models for Time Series Analysis: A Tutorial and Survey
Time series analysis stands as a focal point within the data mining community, serving as a cornerstone for extracting valuable insights crucial to a myriad of real-world applications. Recent advances in Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis, boosting various downstream tasks in practice. These innovative approaches often leverage pre-trained or fine-tuned FMs to harness generalized knowledge tailored for time series analysis. This survey aims to furnish a comprehensive and up-to-date overview of FMs for time series analysis. While prior surveys have predominantly focused on either application or pipeline aspects of FMs in time series analysis, they have often lacked an in-depth understanding of the underlying mechanisms that elucidate why and how FMs benefit time series analysis. To address this gap, our survey adopts a methodology-centric classification, delineating various pivotal elements of time-series FMs, including model architectures, pre-training techniques, adaptation methods, and data modalities. Overall, this survey serves to consolidate the latest advancements in FMs pertinent to time series analysis, accentuating their theoretical underpinnings, recent strides in development, and avenues for future exploration.
Updated: 2024-06-18 08:10:07
Fields: cs.LG
QOG:Question and Options Generation based on Language Model
Question-Options Generation (QOG) is a task that involves generating a set of question-options pairs given a context. This task has various applications, including fine-tuning large models, information retrieval, and automated multiple-choice question generation for education. In this paper, we develop QOG models using three different methods based on fine-tuning sequence-to-sequence language models (LMs). Experiments demonstrate that the end-to-end QOG model is computationally efficient and stable during both training and inference, outperforming other methods. Furthermore, our analysis indicates that our QOG models are competitive on the QOG task compared to the large language model Llama 3-8B.
Updated: 2024-06-18 08:09:58
Fields: cs.CL,cs.LG
A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction
We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and SO(3)-equivariant quantities and their representations. We first construct theoretical SO(3)-invariant quantities derived from the SO(3)-equivariant regression targets, and use these invariant quantities as supervisory labels to guide the learning of high-quality SO(3)-invariant features. Given that SO(3)-invariance is preserved under non-linear operations, the encoding process for invariant features can extensively utilize non-linear mappings, thereby fully capturing the non-linear patterns inherent in physical systems. Building on this foundation, we propose a gradient-based mechanism to induce SO(3)-equivariant encodings of various degrees from the learned SO(3)-invariant features. This mechanism can incorporate non-linear expressive capabilities into SO(3)-equivariant representations, while theoretically preserving their equivariant properties as we prove. We apply our theory and method to the electronic-structure Hamiltonian prediction tasks, experimental results on eight benchmark databases covering multiple types of elements and challenging scenarios show dramatic breakthroughs on the state-of-the-art prediction accuracy, with improvements of up to 40% in predicting Hamiltonians and up to 76% in predicting downstream physical quantities such as occupied orbital energy. Our approach goes beyond handling physical systems and offers a promising general solution to the critical dilemma between equivariance and non-linear expressiveness for the deep learning paradigm.
Updated: 2024-06-18 08:08:17
Fields: cs.LG,cond-mat.mtrl-sci,physics.chem-ph
A Survey on Large Language Models for Recommendation
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP) and have recently gained significant attention in the domain of Recommendation Systems (RS). These models, trained on massive amounts of data using self-supervised learning, have demonstrated remarkable success in learning universal representations and have the potential to enhance various aspects of recommendation systems through effective transfer techniques such as fine-tuning and prompt tuning. The crucial aspect of harnessing the power of language models in enhancing recommendation quality is the utilization of their high-quality representations of textual features and their extensive coverage of external knowledge to establish correlations between items and users. To provide a comprehensive understanding of the existing LLM-based recommendation systems, this survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec), with the latter being systematically sorted out for the first time. Furthermore, we systematically review and analyze existing LLM-based recommendation systems within each paradigm, providing insights into their methodologies, techniques, and performance. Additionally, we identify key challenges and several valuable findings to provide researchers and practitioners with inspiration. We have also created a GitHub repository to index relevant papers on LLMs for recommendation, https://github.com/WLiK/LLM4Rec.
Updated: 2024-06-18 08:07:01
Fields: cs.IR,cs.AI
Efficient mapping of phase diagrams with conditional normalizing flows
The accurate prediction of phase diagrams is of central importance for both the fundamental understanding of materials as well as for technological applications in material sciences. However, the computational prediction of the relative stability between phases based on their free energy is a daunting task, as traditional free energy estimators require a large amount of simulation data to obtain uncorrelated equilibrium samples over a grid of thermodynamic states. In this work, we develop deep generative machine learning models for entire phase diagrams, employing normalizing flows conditioned on the thermodynamic states, e.g., temperature and pressure, that they map to. By training a single normalizing flow to transform the equilibrium distribution sampled at only one reference thermodynamic state to a wide range of target temperatures and pressures, we can efficiently generate equilibrium samples across the entire phase diagram. Using a permutation-equivariant architecture allows us, thereby, to treat solid and liquid phases on the same footing. We demonstrate our approach by predicting the solid-liquid coexistence line for a Lennard-Jones system in excellent agreement with state-of-the-art free energy methods while significantly reducing the number of energy evaluations needed.
Updated: 2024-06-18 08:05:04
Fields: cond-mat.stat-mech,cs.LG
DCS Chain: A Flexible Private Blockchain System
Blockchain technology has seen tremendous development over the past few years. Despite the emergence of numerous blockchain systems, they all suffer from various limitations, which can all be attributed to the fundamental issue posed by the DCS trilemma. In light of this, this work introduces a novel private blockchain system named DCS Chain. The core idea is to quantify the DCS metrics and dynamically adjust the blockchain's performance across these three dimensions, to achieve theoretically optimal system performance. Overall, our system provides a comprehensive suite of blockchain essentials, including DCS quantification, consensus protocol adjustment, and communication network simulation.
Updated: 2024-06-18 08:04:24
Fields: cs.CR
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. Therefore, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW-MoE can mitigate the uncertainty problem and consistently improve performance in different tasks (text classification, question answering, summarization, code generation, and mathematical problem solving) and model sizes (650M and 8B parameters).
Updated: 2024-06-18 08:03:51
Fields: cs.LG,cs.AI
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
With the development of Multimodal Large Language Model (MLLM) technology, its general capabilities are increasingly powerful. To evaluate the various abilities of MLLMs, numerous evaluation systems have emerged. But there is still a lack of a comprehensive method to evaluate MLLMs on tasks related to flowcharts, which are very important in daily life and work. We propose the first comprehensive method, FlowCE, to assess MLLMs across various dimensions for tasks related to flowcharts. It encompasses evaluating MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. However, we find that even the GPT4o model achieves only a score of 56.63. Among open-source models, Phi-3-Vision obtained the highest score of 49.97. We hope that FlowCE can contribute to future research on MLLMs for tasks based on flowcharts. FlowCE is available at https://github.com/360AILAB-NLP/FlowCE.
Updated: 2024-06-18 08:03:31
Fields: cs.CV,cs.AI
Problem-Solving in Language Model Networks
To improve the reasoning and question-answering capabilities of Large Language Models (LLMs), several multi-agent approaches have been introduced. While these methods enhance performance, the application of collective intelligence-based approaches to complex network structures and the dynamics of agent interactions remain underexplored. This work extends the concept of multi-agent debate to more general network topologies, measuring the question-answering accuracy, influence, consensus, and the effects of bias on the collective. The results show that random networks perform similarly to fully connected networks despite using significantly fewer tokens. Furthermore, a strong consensus among agents correlates with correct answers, whereas divided responses typically indicate incorrect answers. Analysing the influence of the agents reveals a balance between self-reflection and interconnectedness; self-reflection aids when local interactions are incorrect, and local interactions aid when the agent itself is incorrect. Additionally, bias plays a strong role in system performance, with correctly biased hub nodes boosting performance. These insights suggest that using random networks or scale-free networks with knowledgeable agents placed in central positions can enhance the overall performance of multi-agent systems.
Updated: 2024-06-18 07:59:14
Fields: cs.AI,cs.SI
WebCanvas: Benchmarking Web Agents in Online Environments
For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions. WebCanvas contains three main components to facilitate realistic assessments: (1) a novel evaluation metric which reliably captures critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements; (2) a benchmark dataset called Mind2Web-Live, a refined version of the original Mind2Web static dataset containing 542 tasks with 2439 intermediate evaluation states; (3) lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain the high-quality, up-to-date dataset. Building on WebCanvas, we open-source an agent framework with extensible modules for reasoning, providing a foundation for the community to conduct online inference and evaluations. Our best-performing agent achieves a task success rate of 23.1% and a task completion rate of 48.8% on the Mind2Web-Live test set. Additionally, we analyze the performance discrepancies across various websites, domains, and experimental environments. We encourage the community to contribute further insights on online agent evaluation, thereby advancing this field of research.
Updated: 2024-06-18 07:58:33
标题: WebCanvas:在线环境中Web代理的基准测试
摘要: 为了使网络代理在实践中发挥作用,它们必须适应以用户界面和内容频繁更新为特征、不断演变的网络环境。然而,大多数现有的基准测试只捕捉网络的静态方面。为了弥合这一差距,我们引入了WebCanvas,这是一个创新的在线评估框架,有效地解决了网络交互的动态特性。WebCanvas包含三个主要组件,以促进真实的评估:(1)一种新颖的评估指标,能够可靠地捕捉完成任务所需的关键中间动作或状态,同时忽略由于不重要事件或改变的网页元素引起的噪音;(2)一个名为Mind2Web-Live的基准数据集,是原始Mind2Web静态数据集的精制版本,包含542个任务和2439个中间评估状态;(3)轻量级和通用的注释工具和测试管道,使社区能够收集和维护高质量、最新的数据集。基于WebCanvas,我们开源了一个代理框架,具有可扩展的推理模块,为社区进行在线推断和评估奠定了基础。我们的性能最佳的代理在Mind2Web-Live测试集上实现了23.1%的任务成功率和48.8%的任务完成率。此外,我们分析了各种网站、领域和实验环境之间的性能差异。我们鼓励社区进一步贡献关于在线代理评估的见解,从而推动这一研究领域的发展。
更新时间: 2024-06-18 07:58:33
领域: cs.CL,cs.AI,cs.LG,68T50,I.2.7
UAV-based Intelligent Information Systems on Winter Road Safety for Autonomous Vehicles
As autonomous vehicles continue to revolutionize transportation, addressing challenges posed by adverse weather conditions, particularly during winter, becomes paramount for ensuring safe and efficient operations. One of the most important aspects of a road safety inspection during adverse weather is that a limited lane width can reduce the capacity of the road and raise the risk of serious accidents involving autonomous vehicles. In this research, we propose a method for addressing driving challenges on roads in winter conditions, using a model that segments the road and estimates its width from the perspectives of uncrewed aerial vehicles and autonomous vehicles. The approach proposed in this article empowers self-driving cars with up-to-date and accurate insights, enhancing their adaptability and decision-making capabilities in winter landscapes.
Updated: 2024-06-18 07:53:37
标题: 面向自动驾驶车辆冬季道路安全的基于无人机的智能信息系统
摘要: 随着自动驾驶车辆继续改变交通运输方式,解决由恶劣天气条件带来的挑战,特别是在冬季期间,变得至关重要,以确保安全和高效的运营。在恶劣天气条件下进行道路安全检查的一个最重要方面是,当车道宽度受限时,可能会降低道路的容量并增加涉及自动驾驶车辆的严重事故的风险。在这项研究中,提出了一种改善冬季道路驾驶挑战的方法,该方法利用无人机和自动驾驶车辆的视角,分割和估计道路宽度。本文提出的方法旨在为自动驾驶汽车提供最新和准确的见解,增强它们在冬季景观中的适应性和决策能力。
更新时间: 2024-06-18 07:53:37
领域: cs.RO,cs.AI,cs.CV
Uncovering communities of pipelines in the task-fMRI analytical space
Analytical workflows in functional magnetic resonance imaging are highly flexible with limited best practices as to how to choose a pipeline. While it has been shown that the use of different pipelines might lead to different results, there is still a lack of understanding of the factors that drive these differences and of the stability of these differences across contexts. We use community detection algorithms to explore the pipeline space and assess the stability of pipeline relationships across different contexts. We show that there are subsets of pipelines that give similar results, especially those sharing specific parameters (e.g. number of motion regressors, software packages, etc.). Those pipeline-to-pipeline patterns are stable across groups of participants but not across different tasks. By visualizing the differences between communities, we show that the pipeline space is mainly driven by the size of the activation area in the brain and the scale of statistic values in statistic maps.
Updated: 2024-06-18 07:52:28
标题: 在任务fMRI分析空间中发现管道社区
摘要: 功能磁共振成像中的分析工作流程具有高度灵活性,但在选择流程时存在有限的最佳实践。虽然已经证明使用不同的流程可能导致不同的结果,但对于驱动这些差异的因素以及这些差异在不同情境下的稳定性仍缺乏理解。我们使用社区检测算法来探索流程空间,并评估在不同情境下流程关系的稳定性。我们发现有一些子集的流程会产生类似的结果,特别是那些共享特定参数(例如运动回归器的数量、软件包等)。这些流程之间的模式在参与者群体中是稳定的,但在不同任务之间不稳定。通过可视化社区之间的差异,我们发现流程空间主要受大脑激活区域的大小和统计图中统计值的规模驱动。
更新时间: 2024-06-18 07:52:28
领域: cs.AI
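To make the pipeline-community idea above concrete, here is a hedged sketch using networkx community detection on a pairwise pipeline-similarity matrix; the random similarity matrix stands in for real correlations between statistic maps and is not the paper's data.

import numpy as np
import networkx as nx
from networkx.algorithms import community

# Placeholder: symmetric pairwise similarities between 24 pipelines,
# e.g. correlations between their unthresholded statistic maps.
rng = np.random.default_rng(0)
sim = rng.uniform(size=(24, 24))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 0.0)

G = nx.from_numpy_array(sim)  # weighted graph over pipelines
for i, c in enumerate(community.greedy_modularity_communities(G, weight="weight")):
    print(f"community {i}: pipelines {sorted(c)}")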
Spatial-Temporal Large Language Model for Traffic Prediction
Traffic prediction, an essential component for intelligent transportation systems, endeavours to use historical data to foresee future traffic features at specific locations. Although existing traffic prediction models often emphasize developing complex neural network structures, their accuracy has not improved. Recently, large language models have shown outstanding capabilities in time series analysis. Differing from existing models, LLMs progress mainly through parameter expansion and extensive pretraining while maintaining their fundamental structures. Motivated by these developments, we propose a Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. In the ST-LLM, we define timesteps at each location as tokens and design a spatial-temporal embedding to learn the spatial location and global temporal patterns of these tokens. Additionally, we integrate these embeddings by a fusion convolution to each token for a unified spatial-temporal representation. Furthermore, we innovate a partially frozen attention strategy to adapt the LLM to capture global spatial-temporal dependencies for traffic prediction. Comprehensive experiments on real traffic datasets offer evidence that ST-LLM is a powerful spatial-temporal learner that outperforms state-of-the-art models. Notably, the ST-LLM also exhibits robust performance in both few-shot and zero-shot prediction scenarios. The code is publicly available at https://github.com/ChenxiLiu-HNU/ST-LLM.
Updated: 2024-06-18 07:50:31
标题: 空间-时间大型语言模型用于交通预测
摘要: 交通预测是智能交通系统的一个重要组成部分,旨在利用历史数据预测特定位置未来的交通特征。尽管现有的交通预测模型通常强调开发复杂的神经网络结构,但它们的准确性并未提高。最近,大型语言模型在时间序列分析中表现出出色的能力。与现有模型不同,LLM主要通过参数扩展和大量预训练来进展,同时保持其基本结构。受这些发展的启发,我们提出了一种用于交通预测的空间-时间大型语言模型(ST-LLM)。在ST-LLM中,我们将每个位置的时间步长定义为令牌,并设计了一个空间-时间嵌入来学习这些令牌的空间位置和全局时间模式。此外,我们通过融合卷积将这些嵌入集成到每个令牌中,以实现统一的空间-时间表示。此外,我们创新地采用了部分冻结的注意力策略,以适应LLM捕捉用于交通预测的全局空间-时间依赖关系。对真实交通数据集的全面实验证明了ST-LLM是一个强大的空间-时间学习器,优于最先进的模型。值得注意的是,ST-LLM在少样本和零样本预测场景中也表现出稳健的性能。代码可以在https://github.com/ChenxiLiu-HNU/ST-LLM上公开获取。
更新时间: 2024-06-18 07:50:31
领域: cs.LG,cs.CL
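A hedged sketch of the "partially frozen attention" idea above follows: freeze the pretrained LLM and re-enable gradients only for attention parameters in a few chosen layers. The module-name pattern is illustrative; real checkpoints (and the paper's exact layer choice) differ.

import torch.nn as nn

def partially_freeze(llm: nn.Module, trainable_layers=(10, 11)):
    # Freeze everything, then unfreeze attention in selected layers so the
    # model can adapt to spatial-temporal tokens at minimal tuning cost.
    for p in llm.parameters():
        p.requires_grad = False
    for name, p in llm.named_parameters():
        if any(f"layers.{i}.attn" in name for i in trainable_layers):
            p.requires_grad = True  # name pattern is an assumption
    return llm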
Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines
This paper investigates the efficacy of jointly optimizing content-specific post-processing filters to adapt a human oriented video/image codec into a codec suitable for machine vision tasks. By observing that artifacts produced by video/image codecs are content-dependent, we propose a novel training strategy based on competitive learning principles. This strategy assigns training samples to filters dynamically, in a fuzzy manner, which further optimizes the winning filter on the given sample. Inspired by simulated annealing optimization techniques, we employ a softmax function with a temperature variable as the weight allocation function to mitigate the effects of random initialization. Our evaluation, conducted on a system utilizing multiple post-processing filters within a Versatile Video Coding (VVC) codec framework, demonstrates the superiority of content-specific filters trained with our proposed strategies, specifically, when images are processed in blocks. Using VVC reference software VTM 12.0 as the anchor, experiments on the OpenImages dataset show an improvement in the BD-rate reduction from -41.3% and -44.6% to -42.3% and -44.7% for object detection and instance segmentation tasks, respectively, compared to independently trained filters. The statistics of the filter usage align with our hypothesis and underscore the importance of jointly optimizing filters for both content and reconstruction quality. Our findings pave the way for further improving the performance of video/image codecs.
Updated: 2024-06-18 07:45:57
标题: 通过竞争学习在面向机器的视频编码中实现特定内容滤波器
摘要: 这篇论文研究了共同优化特定内容后处理滤波器的有效性,将人类导向的视频/图像编解码器调整为适用于机器视觉任务的编解码器。通过观察视频/图像编解码器产生的伪影是与内容相关的,我们提出了一种基于竞争学习原则的新型训练策略。该策略以一种模糊的方式动态地将训练样本分配给滤波器,进一步优化给定样本上获胜的滤波器。受模拟退火优化技术的启发,我们采用softmax函数和温度变量作为权重分配函数,以减轻随机初始化的影响。我们在利用多个后处理滤波器的系统上进行评估,这些滤波器嵌入在通用视频编码(VVC)编解码器框架中,展示了经过我们提出的策略训练的特定内容滤波器的优越性,特别是在对图像进行分块处理时。使用VVC参考软件VTM 12.0作为锚点,在OpenImages数据集上进行的实验显示,与独立训练的滤波器相比,对于目标检测和实例分割任务,BD-rate的减少从-41.3%和-44.6%分别提高到-42.3%和-44.7%。滤波器使用统计数据与我们的假设一致,并强调了同时优化内容和重建质量的滤波器的重要性。我们的发现为进一步提高视频/图像编解码器的性能铺平了道路。
更新时间: 2024-06-18 07:45:57
领域: cs.CV,cs.LG,cs.MM
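The weight-allocation rule described above (a softmax with a temperature variable over per-filter performance) can be sketched as follows; the loss values are placeholders, and annealing the temperature toward zero recovers hard winner-take-all assignment.

import torch

def filter_weights(losses: torch.Tensor, temperature: float) -> torch.Tensor:
    # Soft (fuzzy) assignment of one training sample to competing filters:
    # lower loss -> larger weight; a high temperature spreads weight across
    # filters to mitigate the effects of random initialization.
    return torch.softmax(-losses / temperature, dim=0)

losses = torch.tensor([0.30, 0.25, 0.60])
print(filter_weights(losses, temperature=1.0))   # fuzzy assignment
print(filter_weights(losses, temperature=0.05))  # nearly winner-take-all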
Structured Prediction in Online Learning
We study a theoretical and algorithmic framework for structured prediction in the online learning setting. The problem of structured prediction, i.e., estimating functions whose output space lacks a vectorial structure, is well studied in the literature on supervised statistical learning. We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting, and achieves the same excess-risk upper bound even when data are not i.i.d. Moreover, we consider a second algorithm designed especially for non-stationary data distributions, including adversarial data. We bound its stochastic regret as a function of the variation of the data distributions.
Updated: 2024-06-18 07:45:02
标题: 在线学习中的结构化预测
摘要: 我们研究了在线学习环境中的结构化预测的理论和算法框架。结构化预测的问题,即估计输出空间缺乏向量结构的函数,在监督统计学习的文献中得到了很好的研究。我们表明,我们的算法是监督学习设置中最优算法的泛化,并且在数据不是独立同分布时也能实现相同的超额风险上界。此外,我们考虑了一种专为非稳态数据分布设计的第二种算法,包括对抗性数据。我们根据数据分布的变化来限制其随机遗憾。
更新时间: 2024-06-18 07:45:02
领域: cs.LG,math.ST,stat.ML,stat.TH
Non-autoregressive Generative Models for Reranking Recommendation
Contemporary recommendation systems are designed to meet users' needs by delivering tailored lists of items that align with their specific demands or interests. In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items. The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. The generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. To address these issues, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To tackle challenges such as sparse training samples and dynamic candidates, we introduce a matching model. Considering the diverse nature of user feedback, we employ a sequence-level unlikelihood training objective to differentiate feasible sequences from unfeasible ones. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments validate the superior performance of NAR4Rec over state-of-the-art reranking methods. Online A/B tests reveal that NAR4Rec significantly enhances the user experience. Furthermore, NAR4Rec has been fully deployed in a popular video app Kuaishou with over 300 million daily active users.
Updated: 2024-06-18 07:45:02
标题: 非自回归生成模型用于重新排序推荐
摘要: 当代推荐系统旨在通过提供与用户特定需求或兴趣相一致的定制列表来满足用户的需求。在多阶段推荐系统中,重新排序通过建模列表内项目之间的相关性发挥着关键作用。重新排序的关键挑战在于探索排列组合空间中的最佳序列。最近的研究提出了一种生成-评估学习范式,其中生成器生成多个可行序列,评估器基于估计的列表得分选择最佳序列。生成器至关重要,生成模型非常适合生成器功能。当前的生成模型采用自回归策略进行序列生成。然而,在实时工业系统中部署自回归模型具有挑战性。为了解决这些问题,我们提出了一种用于重新排序推荐的非自回归生成模型(NAR4Rec),旨在提高效率和有效性。为了解决稀疏训练样本和动态候选者等挑战,我们引入了一个匹配模型。考虑到用户反馈的多样性,我们采用了一个序列级不可能性训练目标,以区分可行序列和不可行序列。此外,为了克服非自回归模型中关于目标项的依赖建模不足的问题,我们引入对比解码以捕捉这些项之间的相关性。大量的离线实验验证了NAR4Rec相对于最先进的重新排序方法的出色性能。在线A/B测试显示,NAR4Rec显著提升了用户体验。此外,NAR4Rec已完全部署在一个拥有超过3亿日活跃用户的热门视频应用快手中。
更新时间: 2024-06-18 07:45:02
领域: cs.IR,cs.AI
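As a rough illustration of the sequence-level unlikelihood objective mentioned above, the sketch below raises the likelihood of feasible sequences while pushing down the probability of unfeasible ones; it is an assumption-laden reading of the abstract, not the NAR4Rec loss itself.

import torch

def sequence_unlikelihood_loss(log_p_feasible, log_p_unfeasible):
    # Inputs: log-probabilities the generator assigns to whole sequences.
    likelihood = -log_p_feasible.mean()
    p_bad = log_p_unfeasible.exp().clamp(max=1 - 1e-6)
    unlikelihood = -torch.log1p(-p_bad).mean()  # penalize mass on unfeasible sequences
    return likelihood + unlikelihood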
Certified ML Object Detection for Surveillance Missions
In this paper, we present a development process of a drone detection system involving a machine learning object detection component. The purpose is to reach acceptable performance objectives and provide sufficient evidences, required by the recommendations (soon to be published) of the ED 324 / ARP 6983 standard, to gain confidence in the dependability of the designed system.
Updated: 2024-06-18 07:42:22
标题: 监控任务的认证机器学习目标检测
摘要: 在本文中,我们介绍了一个涉及机器学习目标检测组件的无人机检测系统的开发过程。其目的是达到可接受的性能目标,并提供足够的证据,以符合ED 324 / ARP 6983标准(即将发布的推荐标准),从而增强对设计系统可靠性的信心。
更新时间: 2024-06-18 07:42:22
领域: cs.AI
UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models
Location-based services play a critical role in improving the quality of our daily lives. Despite the proliferation of numerous specialized AI models within the spatio-temporal context of location-based services, these models struggle to autonomously tackle problems regarding complex urban planning and management. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) designed to tackle diverse problems in urban scenarios. UrbanLLM functions as a problem-solver by decomposing urban-related queries into manageable sub-tasks, identifying suitable spatio-temporal AI models for each sub-task, and generating comprehensive responses to the given queries. Our experimental results indicate that UrbanLLM significantly outperforms other established LLMs, such as Llama and the GPT series, in handling problems concerning complex urban activity planning and management. UrbanLLM exhibits considerable potential in enhancing the effectiveness of solving problems in urban scenarios, reducing the workload of and reliance on human experts.
Updated: 2024-06-18 07:41:42
标题: UrbanLLM:具有大型语言模型的自主城市活动规划和管理
摘要: 基于位置的服务在改善我们日常生活质量方面发挥着至关重要的作用。尽管在基于位置的服务的时空背景下存在许多专门的人工智能模型,但这些模型很难自主解决复杂城市规划和管理方面的问题。为了弥合这一差距,我们引入了UrbanLLM,这是一个经过精细调整的大型语言模型(LLM),旨在解决城市场景中的各种问题。UrbanLLM通过将与城市相关的查询分解为可管理的子任务,识别适合每个子任务的时空人工智能模型,并为给定查询生成全面的响应来充当问题解决者。我们的实验结果表明,UrbanLLM在处理涉及复杂城市活动规划和管理问题方面明显优于其他已建立的LLM,如Llama和GPT系列。UrbanLLM在提高解决城市场景中问题的效率方面具有显著潜力,减轻了人类专家的工作量和依赖。
更新时间: 2024-06-18 07:41:42
领域: cs.LG
Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents
Fast adaptation to new tasks is extremely important for embodied agents in the real world. Meta-reinforcement learning (meta-RL) has emerged as an effective method to enable fast adaptation in unknown environments. Compared to on-policy meta-RL algorithms, off-policy algorithms rely heavily on efficient data sampling strategies to extract and represent the historical trajectories. However, little is known about how different data sampling methods impact the ability of meta-RL agents to represent unknown environments. Here, we investigate the impact of data sampling strategies on the exploration and adaptability of meta-RL agents. Specifically, we conducted experiments with two types of off-policy meta-RL algorithms, based on Thompson sampling and Bayes-optimality theories, in continuous control tasks within the MuJoCo environment and in sparse-reward navigation tasks. Our analysis revealed that the long-memory and short-memory sequence sampling strategies affect the representation and adaptive capabilities of meta-RL agents. We found that the algorithm based on Bayes-optimality theory exhibited more robust and better adaptability than the algorithm based on Thompson sampling, highlighting the importance of appropriate data sampling strategies for the agent's representation of an unknown environment, especially in the case of sparse rewards.
Updated: 2024-06-18 07:41:40
标题: 数据采样的记忆序列长度对元强化学习代理的适应性产生影响
摘要: 快速适应新任务对于现实世界中的具身智能体非常重要。元强化学习(meta-RL)已经成为在未知环境中实现快速适应的有效方法。与同策略(on-policy)元强化学习算法相比,异策略(off-policy)算法严重依赖高效的数据采样策略来提取和表示历史轨迹。然而,我们对不同的数据采样方法如何影响元强化学习代理表征未知环境的能力知之甚少。在这里,我们研究了数据采样策略对元强化学习代理探索和适应能力的影响。具体而言,我们在MuJoCo环境中的连续控制任务和稀疏奖励导航任务中,对基于Thompson抽样和贝叶斯最优理论的两类异策略元强化学习算法进行了实验。我们的分析揭示了长记忆和短记忆序列采样策略对元强化学习代理的表征和适应能力产生影响。我们发现基于贝叶斯最优理论的算法比基于Thompson抽样的算法表现更为稳健,适应能力更好,突显了适当的数据采样策略对代理表征未知环境的重要性,特别是在稀疏奖励的情况下。
更新时间: 2024-06-18 07:41:40
领域: cs.LG,cs.AI
CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning
Understanding the decision-making process of machine learning models is crucial for ensuring trustworthy machine learning. Data Shapley, a landmark study on data valuation, advances this understanding by assessing the contribution of each datum to model accuracy. However, the resource-intensive and time-consuming nature of multiple model retraining poses challenges for applying Data Shapley to large datasets. To address this, we propose the CHG (Conduct of Hardness and Gradient) score, which approximates the utility of each data subset on model accuracy during a single model training. By deriving the closed-form expression of the Shapley value for each data point under the CHG score utility function, we reduce the computational complexity to the equivalent of a single model retraining, an exponential improvement over existing methods. Additionally, we employ CHG Shapley for real-time data selection, demonstrating its effectiveness in identifying high-value and noisy data. CHG Shapley facilitates trustworthy model training through efficient data valuation, introducing a novel data-centric perspective on trustworthy machine learning.
Updated: 2024-06-18 07:38:31
标题: CHG Shapley:面向可信机器学习的高效数据估值和选择
摘要: 理解机器学习模型决策过程对确保可信赖的机器学习至关重要。数据Shapley是一项关于数据价值评估的里程碑研究,通过评估每个数据对模型准确性的贡献,推进了这种理解。然而,多次模型重新训练的资源密集和耗时特性给将Data Shapley应用于大型数据集带来了挑战。为了解决这个问题,我们提出了CHG(硬度和梯度)分数,该分数在单次模型训练期间近似计算出每个数据子集对模型准确性的效用。通过为CHG分数效用函数下的每个数据点推导出Shapley值的封闭形式表达式,我们将计算复杂性降低到与单次模型重新训练相当,相较于现有方法获得了指数级的改进。此外,我们利用CHG Shapley进行实时数据选择,展示其在识别高价值和嘈杂数据方面的有效性。CHG Shapley通过高效的数据估值促进了可信赖的模型训练,引入了一种新颖的数据中心的观点,助力可信赖的机器学习。
更新时间: 2024-06-18 07:38:31
领域: cs.GT,cs.LG
Top-Down Bayesian Posterior Sampling for Sum-Product Networks
Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning with time constraints or accuracy requirements and real-time systems. The structural constraints of SPNs supporting fast inference, however, lead to increased learning-time complexity and can be an obstacle to building highly expressive SPNs. This study aimed to develop a Bayesian learning approach that can be efficiently implemented on large-scale SPNs. We derived a new full conditional probability of Gibbs sampling by marginalizing multiple random variables to expeditiously obtain the posterior distribution. The complexity analysis revealed that our sampling algorithm works efficiently even for the largest possible SPN. Furthermore, we proposed a hyperparameter tuning method that balances the diversity of the prior distribution and optimization efficiency in large-scale SPNs. Our method has improved learning-time complexity and demonstrated computational speeds tens to more than one hundred times faster, as well as superior predictive performance, in numerical experiments on more than 20 datasets.
Updated: 2024-06-18 07:36:45
标题: 自顶向下的贝叶斯后验采样用于求和-乘积网络
摘要: Sum-product networks (SPNs)是一种以精确和快速评估基本概率操作为特征的概率模型。其优越的计算可操作性已经在许多领域得到应用,例如在具有时间约束或精度要求的机器学习和实时系统中。然而,支持快速推理的SPNs的结构约束导致了学习时间复杂性的增加,可能成为构建高度表现力SPNs的障碍。本研究旨在开发一种贝叶斯学习方法,可以有效地在大规模SPNs上实施。我们通过将多个随机变量边际化得到了一种新的吉布斯采样的全条件概率,以便迅速获得后验分布。复杂性分析表明,我们的采样算法即使对于最大可能的SPN也能有效工作。此外,我们提出了一种超参数调整方法,该方法在大规模SPNs中平衡了先验分布的多样性和优化效率。我们的方法改进了学习时间复杂性,并在20多个数据集的数值实验中表现出了比以前快数十到数百倍的计算速度和更优越的预测性能。
更新时间: 2024-06-18 07:36:45
领域: stat.ML,cs.LG
How Graph Neural Networks Learn: Lessons from Training Dynamics
A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, but whether GNNs will learn desired functions during the optimization process remains less clear. To fill this gap, we study their training dynamics in function space. In particular, we find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function, as can be quantified by a phenomenon which we call kernel-graph alignment. We provide theoretical explanations for the emergence of this phenomenon in the overparameterized regime and empirically validate it on real-world GNNs. This finding offers new interpretable insights into when and why the learned GNN functions generalize, highlighting their limitations in heterophilic graphs. Practically, we propose a parameter-free algorithm that directly uses a sparse matrix (i.e. graph adjacency) to update the learned function. We demonstrate that this embarrassingly simple approach can be as effective as GNNs while being orders-of-magnitude faster.
Updated: 2024-06-18 07:34:39
标题: 图神经网络是如何学习的:从训练动态中得到的启示
摘要: 深度学习中一个长期以来的目标是以更可解释的方式表征黑盒模型的学习行为。对于图神经网络(GNNs),在形式化它们可以表示的函数方面取得了相当大的进展,但在优化过程中GNNs是否会学习到期望的函数仍然不够清晰。为了填补这一空白,我们研究它们在函数空间中的训练动态。特别地,我们发现GNNs的梯度下降优化隐式地利用图结构来更新学到的函数,这一现象可以通过我们称之为“核-图对齐”来量化。我们提供了这一现象在过参数化范围内出现的理论解释,并在真实世界的GNNs上进行了实证验证。这一发现为我们提供了新的可解释的洞见,说明了学到的GNN函数何时以及为何泛化,突显了它们在异质图中的局限性。在实践中,我们提出了一种无参数的算法,直接使用稀疏矩阵(即图的邻接矩阵)来更新学到的函数。我们证明这种极其简单的方法可以与GNNs一样有效,而且速度快得多。
更新时间: 2024-06-18 07:34:39
领域: cs.LG
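The "embarrassingly simple" adjacency-based predictor above can be approximated by residual label propagation; the sketch below is one plausible instantiation under our own assumptions, not necessarily the paper's exact update rule.

import numpy as np
import scipy.sparse as sp

def propagate(A: sp.csr_matrix, Y: np.ndarray, train_mask, steps=10, alpha=0.5):
    # A: row-normalized sparse adjacency; Y: one-hot labels, zero on test rows.
    # Repeatedly mix each node's scores with its neighbors', clamping the
    # known training labels -- a parameter-free use of the graph structure.
    F = Y.astype(float).copy()
    for _ in range(steps):
        F = (1 - alpha) * F + alpha * (A @ F)
        F[train_mask] = Y[train_mask]
    return F.argmax(axis=1)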
Language-Driven Active Learning for Diverse Open-Set 3D Object Detection
Object detection is crucial for ensuring safe autonomous driving. However, data-driven approaches face challenges when encountering minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D Object Detection. Our method leverages active learning techniques to query diverse and informative data samples from an unlabeled pool, enhancing the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in both open-world exploring and closed-world mining settings. In open-world exploring, VisLED-Querying selects data points most novel relative to existing data, while in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset and demonstrate its efficiency compared to random sampling and entropy-querying methods. Our results show that VisLED-Querying consistently outperforms random sampling and offers competitive performance compared to entropy-querying despite the latter's model-optimality, highlighting the potential of VisLED for improving object detection in autonomous driving scenarios. We make our code publicly available at https://github.com/Bjork-crypto/VisLED-Querying
Updated: 2024-06-18 07:34:33
标题: 语言驱动的多样化开放式三维物体检测主动学习
摘要: 目标检测对确保安全的自动驾驶至关重要。然而,数据驱动方法在遇到3D驾驶场景中的少数或新颖对象时面临挑战。在本文中,我们提出了VisLED,一种面向多样化开放式3D目标检测的语言驱动主动学习框架。我们的方法利用主动学习技术从未标记的数据池中查询多样化和信息丰富的数据样本,增强模型检测不常见或新颖对象的能力。具体来说,我们引入了Vision-Language Embedding Diversity Querying (VisLED-Querying)算法,该算法在开放世界探索和封闭世界挖掘设置中运行。在开放世界探索中,VisLED-Querying选择相对于现有数据最新颖的数据点,而在封闭世界挖掘中,它挖掘已知类别的新颖实例。我们在nuScenes数据集上评估了我们的方法,并展示了与随机抽样和熵查询方法相比的效率。我们的结果表明,尽管后者具有模型最优性,但VisLED-Querying始终优于随机抽样,并在性能上与熵查询方法相媲美,突显了VisLED在改善自动驾驶场景中的目标检测中的潜力。我们在https://github.com/Bjork-crypto/VisLED-Querying上公开了我们的代码。
更新时间: 2024-06-18 07:34:33
领域: cs.CV,cs.AI,cs.LG
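The open-world exploring step of VisLED-Querying above (select points most novel relative to existing data) admits a short sketch; defining novelty as distance to the nearest already-labeled embedding is our assumption.

import numpy as np

def query_most_novel(unlabeled_emb, labeled_emb, k=5):
    # Novelty of each candidate = Euclidean distance to its nearest
    # labeled embedding; return the indices of the k most novel samples.
    d = np.linalg.norm(unlabeled_emb[:, None, :] - labeled_emb[None, :, :], axis=-1)
    novelty = d.min(axis=1)
    return np.argsort(-novelty)[:k]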
Understanding the Difficulty of Solving Cauchy Problems with PINNs
Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural networks. We show that minimizing the sum of $L^2$ residual and initial condition error is not sufficient to guarantee the true solution, as this loss function does not capture the underlying dynamics. Additionally, neural networks are not capable of capturing singularities in the solutions due to the non-compactness of their image sets. This, in turn, influences the existence of global minima and the regularity of the network. We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of achievable error in practice. We also present numerical experiments in support of our theoretical claims.
Updated: 2024-06-18 07:33:29
标题: 理解使用PINNs解决Cauchy问题的困难
摘要: 物理信息神经网络(PINNs)在科学计算领域近年来变得越来越受欢迎。然而,它们通常无法达到解决微分方程时经典方法的精度水平。在本文中,我们针对柯西问题中的两个问题源进行了识别:将$L^2$残差作为目标函数以及神经网络的逼近差距。我们表明,仅仅最小化$L^2$残差和初始条件误差的总和是不足以保证真实解的,因为这种损失函数无法捕捉到潜在的动态。此外,由于神经网络的像集非紧致,它们无法捕捉解中的奇点。这反过来影响了全局最小值的存在性和网络的正则性。我们证明,当全局最小值不存在时,机器精度在实践中成为可实现误差的主要来源。我们还展示了支持我们理论主张的数值实验。
更新时间: 2024-06-18 07:33:29
领域: cs.LG,cs.NA,math.NA
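The criticized objective above (sum of squared $L^2$ residual and initial-condition error) is easy to write down for a toy Cauchy problem such as $u_t + u_x = 0$; the network size and collocation sampling below are placeholders.

import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def pinn_loss(net, u0=torch.sin):
    xt = torch.rand(256, 2, requires_grad=True)          # collocation points (x, t)
    u = net(xt)
    g = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    residual = ((g[:, 1] + g[:, 0]) ** 2).mean()         # L2 residual of u_t + u_x = 0
    x0 = torch.rand(256, 1)
    u_init = net(torch.cat([x0, torch.zeros_like(x0)], dim=1))
    return residual + ((u_init - u0(x0)) ** 2).mean()    # + initial-condition error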
Effective Generation of Feasible Solutions for Integer Programming via Guided Diffusion
Feasible solutions are crucial for Integer Programming (IP) since they can substantially speed up the solving process. In many applications, similar IP instances often exhibit similar structures and shared solution distributions, which can be potentially modeled by deep learning methods. Unfortunately, existing deep-learning-based algorithms, such as Neural Diving and the Predict-and-search framework, are limited to generating only partial feasible solutions, and they must rely on solvers like SCIP and Gurobi to complete the solutions for a given IP problem. In this paper, we propose a novel framework that generates complete feasible solutions end-to-end. Our framework leverages contrastive learning to characterize the relationship between IP instances and solutions, and learns latent embeddings for both IP instances and their solutions. Further, the framework employs diffusion models to learn the distribution of solution embeddings conditioned on IP representations, with a dedicated guided sampling strategy that accounts for both constraints and objectives. We empirically evaluate our framework on four typical datasets of IP problems, and show that it effectively generates complete feasible solutions with high probability (> 89.7%) without relying on solvers, and that the quality of the solutions is comparable to the best heuristic solutions from Gurobi. Furthermore, by integrating our method's sampled partial solutions with the CompleteSol heuristic from SCIP, the resulting feasible solutions outperform those from state-of-the-art methods across all datasets, exhibiting a 3.7 to 33.7% improvement in the gap to optimal values, and maintaining a feasible ratio of over 99.7% for all datasets.
Updated: 2024-06-18 07:33:05
标题: 通过引导扩散有效生成整数规划的可行解
摘要: 可行解对于整数规划(IP)至关重要,因为它们可以大大加快求解过程。在许多应用中,相似的IP实例通常具有相似的结构和共享的解分布,这可能可以通过深度学习方法进行建模。然而,现有的基于深度学习的算法,如Neural Diving和Predict-and-search框架,仅限于生成部分可行解,并且它们必须依赖于像SCIP和Gurobi这样的求解器来完成给定IP问题的解决方案。在本文中,我们提出了一个新颖的框架,可以端到端生成完整的可行解。我们的框架利用对比学习来表征IP实例和解之间的关系,并学习IP实例和解的潜在嵌入。此外,该框架采用扩散模型来学习在IP表示条件下解嵌入的分布,具有专门的引导抽样策略,考虑了约束和目标。我们在四个典型IP问题数据集上对我们的框架进行了实证评估,并展示了它在不依赖求解器的情况下以高概率(> 89.7%)有效生成完整的可行解,解的质量与Gurobi的最佳启发式解相当。此外,通过将我们方法的抽样部分解与SCIP的CompleteSol启发式方法相结合,所得到的可行解在所有数据集上优于最先进的方法,使得与最优值之间的差距改善了3.7至33.7%,并且在所有数据集上保持了超过99.7%的可行比率。
更新时间: 2024-06-18 07:33:05
领域: math.OC,cs.LG
Never Gonna Give You Up: Exploring Deprecated NULL Ciphers in Commercial VoWiFi Deployments
In today's cellular network evolutions, such as 4G and 5G, the IMS (IP Multimedia Subsystem) serves as a crucial component in managing voice calls and handling short messages. Besides accessing the IMS over the traditional radio layer, many operators use Voice over Wi-Fi (VoWiFi) allowing customers to dial into their core network over the public Internet using an (insecure) Wi-Fi connection. To protect against malicious actors on the WiFi or Internet domain, the traffic is sent over a series of IPsec tunnels, ensuring confidentiality and integrity. Similar to other encrypted protocols (e.g. TLS), the client and server use a handshake protocol (i.e., IKEv2) to communicate their supported security configurations and to agree upon the used parameters (e.g., keys or an encryption algorithm) for the ongoing session. This however opens the door for security vulnerabilities introduced by misconfiguration. We want to analyze security configurations within commercial VoWiFi deployments, both on the client and server side, spotting deprecated configurations that undermine communication security.
Updated: 2024-06-18 07:32:38
标题: 永远不会放弃你:探索商业VoWiFi部署中被弃用的NULL密码
摘要: 在当今的蜂窝网络演进中,如4G和5G,IMS(IP多媒体子系统)是管理语音通话和处理短信的关键组件。除了通过传统的无线电层访问IMS外,许多运营商还使用Wi-Fi语音(VoWiFi),允许客户通过公共互联网使用(不安全的)Wi-Fi连接拨入其核心网络。 为了防止WiFi或互联网领域上的恶意行为者,流量被发送过一系列IPsec隧道,确保机密性和完整性。与其他加密协议(如TLS)类似,客户端和服务器使用握手协议(即IKEv2)来通信其支持的安全配置,并就正在进行的会话使用的参数(例如密钥或加密算法)达成一致。然而,这为由于错误配置而引入的安全漏洞打开了大门。 我们想要分析商业VoWiFi部署中的安全配置,包括客户端和服务器端,识别削弱通信安全性的过时配置。
更新时间: 2024-06-18 07:32:38
领域: cs.NI,cs.CR
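A scanner for the deprecated configurations discussed above reduces to checking advertised IKEv2 transforms against a deny-list. The sketch below uses IANA registry transform names; the observed proposal data is invented for illustration.

DEPRECATED = {"ENCR_NULL", "ENCR_DES", "PRF_HMAC_MD5", "AUTH_HMAC_MD5_96"}

# Placeholder for transforms a probe would extract from a gateway's
# IKE_SA_INIT responses.
observed_proposals = [
    {"encr": "ENCR_AES_CBC", "integ": "AUTH_HMAC_SHA2_256_128"},
    {"encr": "ENCR_NULL",    "integ": "AUTH_HMAC_SHA1_96"},
]

for i, prop in enumerate(observed_proposals):
    weak = DEPRECATED.intersection(prop.values())
    if weak:
        print(f"proposal {i}: deprecated transform(s) offered: {sorted(weak)}")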
Navigating Knowledge Management Implementation Success in Government Organizations: A type-2 fuzzy approach
Optimal information and knowledge management is crucial for organizations to achieve their objectives efficiently. As a rare and valuable resource, effective knowledge management provides a strategic advantage and has become a key determinant of organizational success. The study aims to identify critical success and failure factors for implementing knowledge management systems in government organizations. This research employs a descriptive survey methodology, collecting data through random interviews and questionnaires. The study highlights the critical success factors for knowledge management systems in government organizations, including cooperation, an open atmosphere, staff training, creativity and innovation, removal of organizational constraints, reward policies, role modeling, and focus. Conversely, failure to consider formality, staff participation, collaboration technologies, network and hardware infrastructure, complexity, IT staff, and trust can pose significant obstacles to successful implementation.
Updated: 2024-06-18 07:22:32
标题: 在政府组织中实现知识管理的成功导航:一种类型-2模糊方法
摘要: 最佳的信息和知识管理对于组织高效实现其目标至关重要。作为一种稀缺而宝贵的资源,有效的知识管理提供战略优势并成为组织成功的关键决定因素。该研究旨在确定政府组织实施知识管理系统的关键成功和失败因素。本研究采用描述性调查方法,通过随机访谈和问卷收集数据。研究重点突出了政府组织知识管理系统的关键成功因素,包括合作、开放氛围、员工培训、创造力和创新、消除组织约束、奖励政策、角色塑造和专注。相反,未考虑形式化、员工参与、协作技术、网络和硬件基础设施、复杂性、IT人员和信任可能会对成功实施构成重大障碍。
更新时间: 2024-06-18 07:22:32
领域: cs.CY,cs.AI
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLMs backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT's results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in quality report generation, zero-shot capabilities, and resilience to signal perturbation. These findings emphasize the efficacy of our MEIT framework and its potential for real-world clinical application.
Updated: 2024-06-18 07:15:09
标题: MEIT: 基于大型语言模型的多模式心电图指导调整用于报告生成
摘要: 心电图(ECG)是监测心脏状况的主要非侵入性诊断工具,对于协助临床医生至关重要。最近的研究集中于使用ECG数据对心脏状况进行分类,但忽略了ECG报告的生成,而这是耗时且需要临床专业知识的。为了自动化ECG报告生成并确保其多功能性,我们提出了多模态ECG指令调整(MEIT)框架,这是首次尝试使用LLMs和多模态指令来处理ECG报告生成。为了促进未来的研究,我们建立了一个基准,评估了MEIT在两个大规模ECG数据集上使用各种LLMs主干的效果。我们的方法独特地对齐了ECG信号和报告的表示,并使用超过80万份ECG报告、以九个开源LLMs对MEIT进行了大量基准实验。结果凸显了指令调整LLMs的卓越性能,展示了它们在生成高质量报告、零样本能力和对信号扰动的韧性方面的熟练性。这些发现强调了我们MEIT框架的有效性及其在现实临床应用中的潜力。
更新时间: 2024-06-18 07:15:09
领域: cs.CL,cs.LG,eess.SP
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world settings. Our benchmarks test VLMs' consistency in understanding concepts across semantic granularity levels and their response to varying text specificity. Findings show that VLMs favor moderately fine-grained concepts and struggle with specificity, often misjudging texts that differ from their training data. Extensive evaluations reveal limitations in current VLMs, particularly in distinguishing between correct and subtly incorrect descriptions. While fine-tuning offers some improvements, it doesn't fully address these issues, highlighting the need for VLMs with enhanced generalization capabilities for real-world applications. This study provides insights into VLM limitations and suggests directions for developing more robust models.
Updated: 2024-06-18 07:12:47
标题: 用视觉-语言模型进行零样本识别的基准测试:在粒度和特异性上的挑战
摘要: 本文提出了用于评估视觉语言模型(VLMs)在零样本识别中的新型基准,重点关注细粒度和特异性。虽然VLMs在诸如图像字幕等任务中表现出色,但它们在开放世界环境中面临挑战。我们的基准测试了VLMs在不同语义粒度水平上理解概念的一致性,以及它们对变化的文本特异性的反应。研究结果显示,VLMs更偏向于适度细粒度的概念,并且在特异性方面存在困难,经常误判与其训练数据不同的文本。广泛的评估揭示了当前VLMs的局限性,特别是在区分正确和微妙不正确描述方面。虽然微调提供了一些改进,但并未完全解决这些问题,突显了对具有增强泛化能力的VLMs的需求,以适用于真实世界的应用。这项研究揭示了VLM的局限性,并提出了开发更强大模型的方向。
更新时间: 2024-06-18 07:12:47
领域: cs.CV,cs.AI
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Despite significant progress in generative AI, comprehensive evaluation remains challenging because of the lack of effective metrics and standardized benchmarks. For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations. One reason is that text encoders of CLIP can notoriously act as a "bag of words", conflating prompts such as "the horse is eating the grass" with "the grass is eating the horse". To address this, we introduce the VQAScore, which uses a visual-question-answering (VQA) model to produce an alignment score by computing the probability of a "Yes" answer to a simple "Does this figure show '{text}'?" question. Though simpler than prior art, VQAScore computed with off-the-shelf models produces state-of-the-art results across many (8) image-text alignment benchmarks. We also compute VQAScore with an in-house model that follows best practices in the literature. For example, we use a bidirectional image-question encoder that allows image embeddings to depend on the question being asked (and vice versa). Our in-house model, CLIP-FlanT5, outperforms even the strongest baselines that make use of the proprietary GPT-4V. Interestingly, although we train with only images, VQAScore can also align text with video and 3D models. VQAScore allows researchers to benchmark text-to-visual generation using complex texts that capture the compositional structure of real-world prompts. We introduce GenAI-Bench, a more challenging benchmark with 1,600 compositional text prompts that require parsing scenes, objects, attributes, relationships, and high-order reasoning like comparison and logic. GenAI-Bench also offers over 15,000 human ratings for leading image and video generation models such as Stable Diffusion, DALL-E 3, and Gen2.
Updated: 2024-06-18 07:09:55
标题: 评估文本到视觉生成与图像到文本生成
摘要: 尽管生成式人工智能取得了显著进展,但由于缺乏有效的指标和标准化基准,全面评估仍然具有挑战性。例如,广泛使用的CLIPScore衡量了(生成的)图像与文本提示之间的对齐情况,但在涉及对象、属性和关系组合的复杂提示时,它无法产生可靠的分数。一个原因是CLIP的文本编码器可能臭名昭著地表现得像一个"词袋",混淆诸如"马正在吃草"和"草正在吃马"这样的提示。为了解决这个问题,我们引入了VQAScore,它使用一个视觉问答(VQA)模型,通过计算对"这个图是否显示'{文本}'?"这一简单问题回答"Yes"的概率来产生对齐分数。尽管比先前的方法更简单,但使用现成模型计算的VQAScore在许多(8个)图像文本对齐基准测试中产生了最先进的结果。我们还使用符合文献最佳实践的内部模型计算VQAScore。例如,我们使用一个双向图像-问题编码器,允许图像嵌入取决于被问的问题(反之亦然)。我们的内部模型CLIP-FlanT5甚至超越了使用专有GPT-4V的最强基线。有趣的是,尽管我们只用图像训练,VQAScore也可以将文本与视频和3D模型对齐。VQAScore允许研究人员使用捕捉现实世界提示的组合结构的复杂文本来评估文本到视觉生成。我们引入了GenAI-Bench,一个更具挑战性的基准,包括1,600个需要解析场景、对象、属性、关系和高阶推理(如比较和逻辑)的组合文本提示。GenAI-Bench还为领先的图像和视频生成模型(如Stable Diffusion、DALL-E 3和Gen2)提供了超过15,000个人类评分。
更新时间: 2024-06-18 07:09:55
领域: cs.CV,cs.AI,cs.CL,cs.LG,cs.MM
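The core of VQAScore above is a single probability. A hedged sketch follows; vqa_model is a hypothetical callable returning logits for the answer tokens ("Yes", "No"), standing in for an off-the-shelf VQA model or CLIP-FlanT5.

import torch

def vqa_score(vqa_model, image, text: str) -> float:
    # P("Yes" | image, question) for the templated alignment question.
    question = f'Does this figure show "{text}"? Please answer yes or no.'
    yes_no_logits = vqa_model(image, question)  # hypothetical interface, shape (2,)
    return torch.softmax(yes_no_logits, dim=-1)[0].item()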
PARAFAC2-based Coupled Matrix and Tensor Factorizations with Constraints
Data fusion models based on Coupled Matrix and Tensor Factorizations (CMTF) have been effective tools for joint analysis of data from multiple sources. While the vast majority of CMTF models are based on the strictly multilinear CANDECOMP/PARAFAC (CP) tensor model, recently also the more flexible PARAFAC2 model has been integrated into CMTF models. PARAFAC2 tensor models can handle irregular/ragged tensors and have shown to be especially useful for modelling dynamic data with unaligned or irregular time profiles. However, existing PARAFAC2-based CMTF models have limitations in terms of possible regularizations on the factors and/or types of coupling between datasets. To address these limitations, in this paper we introduce a flexible algorithmic framework that fits PARAFAC2-based CMTF models using Alternating Optimization (AO) and the Alternating Direction Method of Multipliers (ADMM). The proposed framework allows to impose various constraints on all modes and linear couplings to other matrix-, CP- or PARAFAC2-models. Experiments on various simulated and a real dataset demonstrate the utility and versatility of the proposed framework as well as its benefits in terms of accuracy and efficiency in comparison with state-of-the-art methods.
Updated: 2024-06-18 07:05:31
标题: 基于PARAFAC2的耦合矩阵和张量分解与约束
摘要: 基于耦合矩阵和张量分解(CMTF)的数据融合模型已成为联合分析来自多个来源的数据的有效工具。尽管绝大多数CMTF模型基于严格的多线性CANDECOMP/PARAFAC(CP)张量模型,但最近更灵活的PARAFAC2模型也被整合到CMTF模型中。PARAFAC2张量模型可以处理不规则/不整齐的张量,并且已被证明特别适用于建模具有不对齐或不规则时间轮廓的动态数据。然而,现有基于PARAFAC2的CMTF模型在因子上的可能正规化和/或数据集之间的耦合类型方面存在局限性。为了解决这些限制,本文介绍了一个灵活的算法框架,利用交替优化(AO)和交替方向乘法器方法(ADMM)来拟合基于PARAFAC2的CMTF模型。所提出的框架允许对所有模式和线性耦合到其他矩阵、CP或PARAFAC2模型施加各种约束。对各种模拟和真实数据集的实验证明了所提出框架的实用性和多功能性,以及与最先进方法相比在准确性和效率方面的优势。
更新时间: 2024-06-18 07:05:31
领域: cs.LG
A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain
A plethora of sentence embedding models makes it challenging to choose one, especially for domains such as telecom, rich with specialized vocabulary. We evaluate multiple embeddings obtained from publicly available models and their domain-adapted variants, on both point retrieval accuracies as well as their (95%) confidence intervals. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. We observe that fine-tuning improves mean bootstrapped accuracies as well as tightens confidence intervals. The pre-training combined with fine-tuning makes confidence intervals even tighter. To understand these variations, we analyse and report significant correlations between the distributional overlap between top-$K$, correct and random sentence similarities with retrieval accuracies and similarity thresholds. Following current literature, we analyze if retrieval accuracy variations can be attributed to isotropy of embeddings. Our conclusion is that better retrieval performance cannot be attributed to the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions). However, domain adaptation which improves retrieval accuracies also improves isotropy. We establish that domain adaptation moves domain specific embeddings further away from general domain embeddings.
Updated: 2024-06-18 07:03:34
标题: 一个用于在电信领域中导航句子嵌入世界的指南
摘要: 大量的句子嵌入模型使得选择其中一个变得具有挑战性,特别是对于像电信这样富含专业词汇的领域。我们评估了从公开可用模型及其领域自适应变体中获取的多个嵌入,包括点检索准确性以及它们的(95%)置信区间。我们建立了一种系统方法来获取不同嵌入的相似性分数阈值。我们观察到微调提高了平均自举准确性并缩小了置信区间。预训练结合微调使置信区间变得更加紧密。为了理解这些变化,我们分析并报告了top-K、正确和随机句子相似度之间的分布重叠与检索准确性和相似性阈值之间的显著相关性。根据当前文献,我们分析了检索准确性的变化是否可以归因于嵌入的各向同性。我们的结论是,更好的检索性能不能归因于嵌入的各向同性(根据两种独立的最先进的各向同性度量定义)。然而,改进检索准确性的领域自适应也改进了各向同性。我们确定,领域自适应使特定领域的嵌入进一步远离通用领域的嵌入。
更新时间: 2024-06-18 07:03:34
领域: cs.CL,cs.LG,68T50,I.2.7
AIM: Attributing, Interpreting, Mitigating Data Unfairness
Data collected in the real world often encapsulates historical discrimination against disadvantaged groups and individuals. Existing fair machine learning (FairML) research has predominantly focused on mitigating discriminative bias in the model prediction, with far less effort dedicated towards exploring how to trace biases present in the data, despite its importance for the transparency and interpretability of FairML. To fill this gap, we investigate a novel research problem: discovering samples that reflect biases/prejudices from the training data. Grounding on the existing fairness notions, we lay out a sample bias criterion and propose practical algorithms for measuring and countering sample bias. The derived bias score provides intuitive sample-level attribution and explanation of historical bias in data. On this basis, we further design two FairML strategies via sample-bias-informed minimal data editing. They can mitigate both group and individual unfairness at the cost of minimal or zero predictive utility loss. Extensive experiments and analyses on multiple real-world datasets demonstrate the effectiveness of our methods in explaining and mitigating unfairness. Code is available at https://github.com/ZhiningLiu1998/AIM.
Updated: 2024-06-18 07:02:45
标题: AIM:归因、解释、缓解数据的不公平性
摘要: 在现实世界中收集的数据往往包含针对弱势群体和个人的历史歧视。现有的公平机器学习(FairML)研究主要集中在减少模型预测中的歧视性偏见,对探索如何追踪数据中存在的偏见的努力相对较少,尽管这对于FairML的透明度和可解释性非常重要。为了填补这一空白,我们研究了一个新颖的研究问题:发现训练数据中反映偏见/成见的样本。基于现有的公平性概念,我们提出了一个样本偏见标准,并提出了用于衡量和对抗样本偏见的实用算法。得出的偏见分数提供了历史偏见在数据中的直观样本级归因和解释。基于此基础,我们进一步设计了两种通过样本偏见引导的最小数据编辑的FairML策略。它们可以在最小或零预测效用损失的情况下减轻组和个人的不公平。对多个真实世界数据集进行的广泛实验和分析表明,我们的方法在解释和减轻不公平方面的有效性。代码可以在https://github.com/ZhiningLiu1998/AIM 上找到。
更新时间: 2024-06-18 07:02:45
领域: cs.LG,cs.AI,stat.ML
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Scaling the context size of large language models (LLMs) enables them to perform various new tasks, e.g., book summarization. However, the memory cost of the Key and Value (KV) cache in attention significantly limits the practical applications of LLMs. Recent works have explored token pruning for KV cache reduction in LLMs, relying solely on attention scores as a token importance indicator. However, our investigation into value vector norms revealed a notably non-uniform pattern questioning their reliance only on attention scores. Inspired by this, we propose a new method: Value-Aware Token Pruning (VATP) which uses both attention scores and the $\ell_1$ norm of value vectors to evaluate token importance. Extensive experiments on LLaMA2-7B-chat and Vicuna-v1.5-7B across 16 LongBench tasks demonstrate VATP's superior performance.
Updated: 2024-06-18 07:01:11
标题: 注意力分数并非KV缓存缩减中令牌重要性的唯一指标:值向量同样重要
摘要: 将大型语言模型(LLMs)的上下文大小进行扩展使其能够执行各种新任务,例如书籍摘要。然而,注意力机制中的关键和值(KV)缓存的内存成本显著限制了LLMs的实际应用。最近的研究探索了在LLMs中使用令牌修剪以减少KV缓存,仅依赖于注意力分数作为令牌重要性指标。然而,我们对值向量范数的调查揭示了一个明显的非均匀模式,对仅依赖于注意力分数的依赖性提出了质疑。受此启发,我们提出了一种新方法:Value-Aware Token Pruning(VATP),它使用注意力分数和值向量的$ \ell_{1} $范数来评估令牌的重要性。在16个LongBench任务上对LLaMA2-7B-chat和Vicuna-v1.5-7B进行的大量实验表明VATP具有卓越的性能。
更新时间: 2024-06-18 07:01:11
领域: cs.CL,cs.LG
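The VATP scoring rule above admits a compact sketch: weight each token's received attention by the $\ell_1$ norm of its cached value vector, then keep the top-scoring tokens. The exact aggregation of attention scores in the paper may differ.

import torch

def vatp_scores(attn: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    # attn: (num_tokens,) attention each token receives;
    # values: (num_tokens, head_dim) cached value vectors.
    # Low-norm tokens can be pruned even when heavily attended.
    return attn * values.abs().sum(dim=-1)

attn = torch.tensor([0.6, 0.3, 0.1])
values = torch.randn(3, 64)
print(vatp_scores(attn, values).topk(k=2).indices)  # tokens to keep in the KV cache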
A Flexible Cryptographic Infrastructure for High-security SDR-based Systems
Military software defined radio (SDR) systems are a major factor in future network-centric operations due to their flexibility and support for more capable radio communications systems. The inherent nature of software-based systems requires a more complex auxiliary infrastructure and multiple independent levels of security compared with typical systems: secure booting of the SDR device, cryptographically signed software, the real-time operating platform software, as well as the radio applications. This technology raises new management challenges. The largest impact on SDR deployments is due to the auxiliary cryptographic infrastructure for the security of the software life cycle and the cyclic update of the keys. Compared to conventional radio devices, the SDR system with the cryptographic infrastructure described in this paper reaches a higher security level and is more flexible. The advantage is the possibility to deploy trunked radio systems and further waveforms, such as coalition wideband, which will be standardized in the future. It is also possible to update cryptographic mechanisms. In this work, we analyze the requirements for a highly secure SDR deployment and model the life cycle of the components of a deployed SDR node based on the Joint Program Executive Office (JPEO) Software Communication Architecture (SCA).
Updated: 2024-06-18 07:00:50
标题: 一个灵活的用于高安全SDR系统的加密基础设施
摘要: 军用软件定义无线电(SDR)系统是未来网络中心化作战的重要因素,因其灵活性和对更强大的无线电通信系统的支持。基于软件的系统的固有性质要求比典型系统更复杂的辅助基础设施和多个独立的安全级别:SDR设备的安全引导、加密签名软件、实时操作平台软件以及无线电应用。这项技术带来了管理方面的新挑战。对SDR部署的最大影响是由于用于软件生命周期安全和密钥循环更新的辅助加密基础设施。与传统无线电设备相比,本文描述的带有加密基础设施的SDR系统达到了更高的安全级别并且更加灵活。其优势在于可以部署干线无线电系统和进一步的波形,如未来将标准化的联合宽带波形。同时也可以更新加密机制。在这项工作中,我们分析了高度安全的SDR部署的要求,并根据联合项目执行办公室(JPEO)软件通信架构(SCA)对部署的SDR节点组件的生命周期进行建模。
更新时间: 2024-06-18 07:00:50
领域: cs.CR,cs.DC,cs.NI,cs.SY,eess.SY,94A60,E.3; C.3; H.4
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Large Language Models (LLMs) changed the way we design and interact with software systems. Their ability to process and extract information from text has drastically improved productivity in a number of routine tasks. Developers that want to include these models in their software stack, however, face a dreadful challenge: debugging their inconsistent behavior across minor variations of the prompt. We therefore introduce two metrics for classification tasks, namely sensitivity and consistency, which are complementary to task performance. First, sensitivity measures changes of predictions across rephrasings of the prompt, and does not require access to ground truth labels. Instead, consistency measures how predictions vary across rephrasings for elements of the same class. We perform an empirical comparison of these metrics on text classification tasks, using them as guideline for understanding failure modes of the LLM. Our hope is that sensitivity and consistency will be powerful allies in automatic prompt engineering frameworks to obtain LLMs that balance robustness with performance.
Updated: 2024-06-18 06:59:24
标题: 我做错了什么?量化LLMs对提示工程的敏感性和一致性
摘要: 大型语言模型(LLMs)改变了我们设计和与软件系统交互的方式。它们处理和提取文本信息的能力显著提高了许多例行任务的生产力。然而,希望在其软件堆栈中包含这些模型的开发人员面临着一个可怕的挑战:调试它们在提示的微小变化下的不一致行为。因此,我们引入了两个用于分类任务的指标,即敏感性和一致性,这些指标是对任务性能的补充。首先,敏感性衡量提示重新表述时预测的变化,且无需访问真实标签(ground truth)。而一致性衡量同一类别的元素在提示重新表述时预测变化的程度。我们在文本分类任务上对这些指标进行了实证比较,将它们作为理解LLM失败模式的指导。我们希望敏感性和一致性将成为自动提示工程框架中的强大盟友,以获得在鲁棒性和性能之间取得平衡的LLMs。
更新时间: 2024-06-18 06:59:24
领域: cs.LG,cs.SE
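One plausible reading of the two metrics above is sketched below: sensitivity as the fraction of rephrasing pairs on which the prediction flips (no labels needed), and consistency as the average within-class agreement. The exact definitions in the paper may be normalized differently.

from itertools import combinations

def sensitivity(preds):
    # preds: predictions for rephrasings of a single input.
    pairs = list(combinations(preds, 2))
    return sum(a != b for a, b in pairs) / len(pairs)

def consistency(preds_per_example, gold_labels):
    # Average agreement across rephrasings, grouped by gold class.
    per_class = {}
    for preds, y in zip(preds_per_example, gold_labels):
        per_class.setdefault(y, []).append(1 - sensitivity(preds))
    return {y: sum(v) / len(v) for y, v in per_class.items()}

print(sensitivity(["pos", "pos", "neg", "pos"]))  # 0.5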
Can LLMs Learn New Concepts Incrementally without Forgetting?
Large Language Models (LLMs) have achieved remarkable success across various tasks, yet their ability to learn incrementally without forgetting remains underexplored. Incremental learning (IL) is crucial as it enables models to acquire new knowledge while retaining previously learned information, akin to human learning. Existing benchmarks for IL are insufficient due to data leakage issues and the overqualification of LLMs. To address these challenges, we introduce Concept-1K, a novel dataset comprising 1,023 recently emerged concepts across diverse domains. The concepts in Concept-1K are discrete, interpretable units of knowledge that allow for fine-grained analysis of learning and forgetting processes. Using Concept-1K as a testbed, we aim to answer the question: ``Can LLMs learn new concepts incrementally without forgetting like humans?'' Our investigation reveals that LLMs still suffer from catastrophic forgetting and that LoRA, despite fine-tuning fewer parameters, may lead to more forgetting on training data. Additionally, we explore the roles of in-context learning, model scale, buffer size, and pretraining in IL performance. These findings highlight the strengths and limitations of LLMs in IL scenarios and provide a robust benchmark for future research.
Updated: 2024-06-18 06:56:44
标题: LLM是否能够在不遗忘的情况下逐步学习新概念?
摘要: 大型语言模型(LLMs)在各种任务上取得了显著的成功,但它们具有增量学习而不会遗忘的能力仍然未被充分探索。增量学习(IL)是至关重要的,因为它使模型能够获取新知识同时保留先前学到的信息,类似于人类学习。由于数据泄漏问题和LLMs的过度合格化,现有的IL基准不足。为了解决这些挑战,我们引入了Concept-1K,这是一个新颖的数据集,包括来自不同领域的1,023个最近出现的概念。Concept-1K中的概念是离散的、可解释的知识单元,可以对学习和遗忘过程进行细粒度分析。利用Concept-1K作为测试平台,我们旨在回答一个问题:“LLMs能否像人类一样增量学习新概念而不会遗忘?”我们的调查表明,LLMs仍然受到灾难性遗忘的影响,尽管LoRA调整的参数较少,但在训练数据上可能导致更多的遗忘。此外,我们探讨了上下文学习、模型规模、缓冲区大小和预训练在IL性能中的作用。这些发现突出了LLMs在IL场景中的优势和局限性,并为未来研究提供了一个强大的基准。
更新时间: 2024-06-18 06:56:44
领域: cs.LG,cs.CL
Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding
Current Large Language Models (LLMs) face inherent limitations due to their pre-defined context lengths, which impede their capacity for multi-hop reasoning within extensive textual contexts. While existing techniques like Retrieval-Augmented Generation (RAG) have attempted to bridge this gap by sourcing external information, they fall short when direct answers are not readily available. We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing, inspired by recent breakthroughs in knowledge editing. By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information, thereby enabling LLMs to perform sophisticated reasoning steps. Experimental results demonstrate that our method effectively empowers context-limited LLMs, such as Llama2, to engage in multi-hop reasoning with improved performance, which outperforms state-of-the-art context window extrapolation methods and even compares favorably to more advanced commercial long-context models. Our interactive method not only enhances reasoning capabilities but also mitigates the associated training and computational costs, making it a pragmatic solution for enhancing LLMs' reasoning within expansive contexts.
Updated: 2024-06-18 06:54:28
标题: 检索遇见推理:动态上下文编辑用于长文本理解
摘要: 目前的大型语言模型(LLMs)面临固有的限制,由于它们预定义的上下文长度,这阻碍了它们在广泛的文本上下文中进行多跳推理的能力。虽然现有的技术如检索增强生成(RAG)已经尝试通过采集外部信息来弥合这一差距,但当直接答案不容易获得时,它们就会不足。我们引入了一种新颖的方法,通过动态上下文编辑重新构想信息检索,受到最近知识编辑突破的启发。通过将冗长的上下文视为可塑的外部知识,我们的方法交互式地收集和整合相关信息,从而使LLMs能够进行复杂的推理步骤。实验结果表明,我们的方法有效地增强了受限上下文的LLMs,如Llama2,在多跳推理中表现出改进的性能,超过了最先进的上下文窗口外推方法,甚至与更先进的商用长上下文模型相比也表现出色。我们的交互式方法不仅增强了推理能力,还减轻了相关的训练和计算成本,使其成为在广阔上下文中增强LLMs推理的实用解决方案。
更新时间: 2024-06-18 06:54:28
领域: cs.CL,cs.AI
Security and Privacy of 6G Federated Learning-enabled Dynamic Spectrum Sharing
Spectrum sharing is increasingly vital in 6G wireless communication, facilitating dynamic access to unused spectrum holes. Recently, there has been a significant shift towards employing machine learning (ML) techniques for sensing spectrum holes. In this context, federated learning (FL)-enabled spectrum sensing technology has garnered wide attention, allowing for the construction of an aggregated ML model without disclosing the private spectrum sensing information of wireless user devices. However, the integrity of collaborative training and the privacy of spectrum information from local users have remained largely unexplored. This article first examines the latest developments in FL-enabled spectrum sharing for prospective 6G scenarios. It then identifies practical attack vectors in 6G to illustrate potential AI-powered security and privacy threats in these contexts. Finally, the study outlines future directions, including practical defense challenges and guidelines.
Updated: 2024-06-18 06:54:15
标题: 6G联邦学习启用的动态频谱共享的安全性和隐私保护
摘要: 频谱共享在6G无线通信中变得越来越重要,有助于动态访问未使用的频谱空隙。最近,越来越多地采用机器学习(ML)技术来感知频谱空隙。在这种情况下,启用联邦学习(FL)的频谱感知技术引起了广泛关注,可以构建一个聚合的ML模型,而不泄露无线用户设备的私有频谱感知信息。然而,合作训练的完整性和来自本地用户的频谱信息的隐私性仍然大部分未被探讨。本文首先考察了FL启用的频谱共享在未来6G场景中的最新发展。然后识别了6G中的实际攻击向量,以说明这些情境中潜在的人工智能驱动的安全和隐私威胁。最后,研究概述了未来的方向,包括实际的防御挑战和指导方针。
更新时间: 2024-06-18 06:54:15
领域: cs.CR,cs.DC,cs.ET,cs.LG,cs.NI
Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature
In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make little to no use of this available information. We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation from existing target density evaluations, with no further model calls. Within this framework, we introduce Variational Sparse Bayesian Quadrature (VSBQ), a method for post-process approximate inference for models with black-box and potentially noisy likelihoods. VSBQ reuses existing target density evaluations to build a sparse Gaussian process (GP) surrogate model of the log posterior density function. Subsequently, we leverage sparse-GP Bayesian quadrature combined with variational inference to achieve fast approximate posterior inference over the surrogate. We validate our method on challenging synthetic scenarios and real-world applications from computational neuroscience. The experiments show that VSBQ builds high-quality posterior approximations by post-processing existing optimization traces, with no further model evaluations.
Updated: 2024-06-18 06:53:56
标题: 用变分稀疏贝叶斯积分进行快速后处理贝叶斯推断
摘要: 在应用贝叶斯推断的场景中,用户可能可以访问大量已有的模型评估,例如来自最大后验(MAP)优化运行。然而,传统的近似推断技术很少或根本不利用这些可用信息。我们提出了后处理贝叶斯推断框架作为一种从现有目标密度评估中快速获得后验近似的方法,而无需进行更多的模型调用。在这个框架内,我们引入了变分稀疏贝叶斯积分(VSBQ)方法,用于黑盒模型和潜在嘈杂似然函数的后处理近似推断。VSBQ重用现有的目标密度评估来构建对数后验密度函数的稀疏高斯过程(GP)代理模型。随后,我们利用稀疏GP贝叶斯积分结合变分推断来实现对代理模型的快速近似后验推断。我们在具有挑战性的合成场景和来自计算神经科学的真实应用中验证了我们的方法。实验证明,VSBQ通过后处理现有的优化轨迹构建高质量的后验近似,而无需进行更多的模型评估。
更新时间: 2024-06-18 06:53:56
领域: stat.ML,cs.LG,stat.CO,stat.ME
Toward Exploring the Code Understanding Capabilities of Pre-trained Code Generation Models
Recently, large code generation models trained in a self-supervised manner on extensive unlabeled programming language data have achieved remarkable success. While these models acquire vast amounts of code knowledge, they perform poorly on code understanding tasks, such as code search and clone detection, as they are specifically trained for generation. Pre-training a larger encoder-only architecture model from scratch on massive code data can improve understanding performance. However, this approach is costly and time-consuming, making it suboptimal. In this paper, we pioneer the transfer of knowledge from pre-trained code generation models to code understanding tasks, significantly reducing training costs. We examine effective strategies for enabling decoder-only models to acquire robust code representations. Furthermore, we introduce CL4D, a contrastive learning method designed to enhance the representation capabilities of decoder-only models. Comprehensive experiments demonstrate that our approach achieves state-of-the-art performance in understanding tasks such as code search and clone detection. Our analysis shows that our method effectively reduces the distance between semantically identical samples in the representation space. These findings suggest the potential for unifying code understanding and generation tasks using a decoder-only structured model.
Updated: 2024-06-18 06:52:14
标题: 朝向探索预训练代码生成模型的代码理解能力
摘要: 最近,在大量未标记的编程语言数据上以自监督方式训练的大型代码生成模型取得了显著成功。虽然这些模型获得了大量的代码知识,但它们在代码理解任务(如代码搜索和克隆检测)上表现不佳,因为它们是专门针对生成而训练的。从头开始在大量代码数据上预训练一个更大的仅编码器架构模型可以提高理解性能。然而,这种方法成本高且耗时,使其不够优化。在本文中,我们开创性地将知识从预训练的代码生成模型转移到代码理解任务中,显著降低训练成本。我们研究了使仅解码器模型获得强大代码表示的有效策略。此外,我们引入了CL4D,一种旨在增强仅解码器模型表示能力的对比学习方法。全面的实验表明,我们的方法在代码搜索和克隆检测等理解任务中取得了最先进的性能。我们的分析显示,我们的方法有效地减少了表示空间中语义相同样本之间的距离。这些发现表明,使用仅解码器结构模型可能统一代码理解和生成任务的潜力。
更新时间: 2024-06-18 06:52:14
领域: cs.SE,cs.AI
Growth in products of matrices: fastest, average, and generic
The problems that we consider in this paper are as follows. Let A and B be 2x2 matrices (over reals). Let w(A, B) be a word of length n. After evaluating w(A, B) as a product of matrices, we get a 2x2 matrix, call it W. What is the largest (by the absolute value) possible entry of W, over all w(A, B) of length n, as a function of n? What is the expected absolute value of the largest (by the absolute value) entry in a random product of n matrices, where each matrix is A or B with probability 0.5? What is the Lyapunov exponent for a random matrix product like that? We give partial answer to the first of these questions and an essentially complete answer to the second question. For the third question (the most difficult of the three), we offer a very simple method to produce an upper bound on the Lyapunov exponent in the case where all entries of the matrices A and B are nonnegative.
Updated: 2024-06-18 06:49:09
标题: 矩阵乘积的增长:最快、平均和一般情况
摘要: 本文考虑的问题如下。设A和B是实数域上的2x2矩阵。设w(A, B)是长度为n的一个字。将w(A, B)作为矩阵乘积计算后,我们得到一个2x2矩阵,称之为W。在所有长度为n的w(A, B)中,W中(按绝对值)最大的元素可能是多少(作为n的函数)?在n个矩阵的随机乘积中(每个矩阵以0.5的概率取A或B),(按绝对值)最大元素的绝对值的期望是多少?这样的随机矩阵乘积的Lyapunov指数是多少?我们对第一个问题给出了部分答案,并对第二个问题给出了基本完整的答案。对于第三个问题(三者中最困难的问题),在矩阵A和B的所有元素均为非负数的情况下,我们提供了一种非常简单的方法来给出Lyapunov指数的上界。
更新时间: 2024-06-18 06:49:09
领域: math.GR,cs.CR,math.CO,math.DS,math.PR
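For the third question above, the Lyapunov exponent can at least be estimated numerically. A minimal Monte Carlo sketch follows (the specific A and B are examples of our choosing); renormalizing at each step avoids overflow while accumulating the log growth rate.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0]])

def lyapunov_estimate(n_steps=100_000):
    # Estimate lim (1/n) E[log ||W_n v||] for a random product of A and B,
    # each factor chosen with probability 0.5; a single generic starting
    # vector suffices for the top exponent (Furstenberg's theorem).
    v = np.array([1.0, 0.0])
    log_growth = 0.0
    for _ in range(n_steps):
        v = (A if rng.random() < 0.5 else B) @ v
        s = np.linalg.norm(v)
        log_growth += np.log(s)
        v /= s
    return log_growth / n_steps

print(lyapunov_estimate())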
Automatic benchmarking of large multimodal models via iterative experiment programming
Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation process tedious and costly. In this paper, we present APEx, Automatic Programming of Experiments, the first framework for automatic benchmarking of LMMs. Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand, and progressively compile a scientific report. The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions. Finally, the LLM refines the report, presenting the results to the user in natural language. Thanks to its modularity, our framework is flexible and extensible as new tools become available. Empirically, APEx reproduces the findings of existing studies while allowing for arbitrary analyses and hypothesis testing.
Updated: 2024-06-18 06:43:46
标题: 大型多模态模型的自动基准测试:通过迭代式实验编程
摘要: 评估大型多模态模型(LMMs)的能力通常需要创建临时评估。目前,构建新的基准需要为每个特定分析进行大量的手动工作。这使得评估过程变得繁琐且昂贵。在本文中,我们介绍了APEx,即自动实验编程,这是用于自动基准测试LMMs的第一个框架。给定用自然语言表达的研究问题,APEx利用大型语言模型(LLM)和预先指定工具库生成适用于当前模型的一组实验,并逐步编制科学报告。报告驱动测试过程:根据调查的当前状态,APEx选择要执行的实验以及结果是否足以得出结论。最后,LLM完善报告,以自然语言向用户呈现结果。由于其模块化,我们的框架具有灵活性和可扩展性,随着新工具的推出而变得更加灵活。经验上,APEx重现了现有研究的发现,同时允许进行任意分析和假设检验。
更新时间: 2024-06-18 06:43:46
领域: cs.AI,cs.CL,cs.CV
Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning
Visible-Infrared Person Re-identification (VI ReID) aims to match visible and infrared images of the same pedestrians across non-overlapped camera views. These two input modalities contain both invariant information, such as shape, and modality-specific details, such as color. An ideal model should utilize valuable information from both modalities during training for enhanced representational capability. However, the gap caused by modality-specific information poses substantial challenges for the VI ReID model to handle distinct modality inputs simultaneously. To address this, we introduce the Modality-aware and Instance-aware Visual Prompts (MIP) network, designed to effectively utilize both invariant and specific information for identification. Specifically, our MIP model is built on the transformer architecture. In this model, we design a series of modality-specific prompts, which enable our model to adapt to and make use of the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and achieving better identification. In addition, we employ each pedestrian feature to construct a group of instance-specific prompts. These customized prompts are responsible for guiding our model to adapt to each pedestrian instance dynamically, thereby capturing identity-level discriminative clues for identification. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the effectiveness of both of our designed modules. Additionally, our proposed MIP performs better than most state-of-the-art methods.
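As a concrete illustration of the modality-specific prompt idea, the sketch below prepends a separate set of learnable prompt tokens per modality to a transformer's patch-token sequence. Dimensions, prompt counts, and the two-modality indexing are assumptions for the sketch, not details from the paper.

```python
import torch
import torch.nn as nn

class ModalityPrompts(nn.Module):
    """Learnable prompt tokens, one set per modality, prepended to the
    token sequence before it enters the transformer blocks."""
    def __init__(self, num_prompts=4, dim=768, num_modalities=2):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_modalities, num_prompts, dim) * 0.02)

    def forward(self, tokens, modality):                    # tokens: (B, N, dim)
        B = tokens.size(0)
        p = self.prompts[modality].unsqueeze(0).expand(B, -1, -1)
        return torch.cat([p, tokens], dim=1)                # (B, num_prompts + N, dim)

mp = ModalityPrompts()
visible_tokens = torch.randn(2, 196, 768)
out = mp(visible_tokens, modality=0)    # assume 0 = visible, 1 = infrared
```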
Updated: 2024-06-18 06:39:03
标题: 通过模态和实例感知视觉提示学习增强可见-红外人员再识别
摘要: 可见红外人物再识别(VI ReID)旨在匹配不重叠摄像头视图中相同行人的可见和红外图像。这两种输入模态包含不变信息,如形状,以及模态特定的细节,如颜色。理想的模型应该在训练过程中利用来自两种模态的宝贵信息,以增强表征能力。然而,由模态特定信息引起的差距给VI ReID模型同时处理不同模态输入带来了重大挑战。为了解决这个问题,我们在我们的工作中引入了Modality-aware and Instance-aware Visual Prompts (MIP)网络,旨在有效利用不变和特定信息进行识别。具体来说,我们的MIP模型是基于变压器架构构建的。在这个模型中,我们设计了一系列模态特定提示,这些提示可以使我们的模型适应并利用不同模态输入中固有的特定信息,从而减少由模态差距引起的干扰并实现更好的识别。此外,我们还利用每个行人特征构建一组实例特定提示。这些定制的提示负责指导我们的模型动态适应每个行人实例,从而捕捉身份级别的辨识线索。通过对SYSU-MM01和RegDB数据集进行大量实验,评估了我们设计的两个模块的有效性。此外,我们提出的MIP比大多数最先进的方法表现更好。
更新时间: 2024-06-18 06:39:03
领域: cs.CV,cs.AI,cs.MM
PruningBench: A Comprehensive Benchmark of Structural Pruning
Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving progress in this area only partially understood. To fill this gap, we present the first comprehensive benchmark, termed PruningBench, for structural pruning. PruningBench showcases the following three characteristics: 1) PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques; 2) PruningBench systematically evaluates 16 existing pruning methods, encompassing a wide array of models (e.g., CNNs and ViTs) and tasks (e.g., classification and detection); 3) PruningBench provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables subsequent researchers to incorporate their work into our leaderboards. We provide an online pruning platform http://pruning.vipazoo.cn for customizing pruning tasks and reproducing all results in this paper. Codes will be made publicly available.
Updated: 2024-06-18 06:37:26
标题: PruningBench:结构修剪的全面基准
摘要: 结构剪枝已成为生产更高效模型的一种有前途的方法。然而,社区在缺乏标准化基准和指标方面遇到困难,导致该领域的进展尚未完全被理解。为填补这一空白,我们提出了第一个全面的基准,称为\textit{PruningBench},用于结构剪枝。PruningBench展示了以下三个特点:1)PruningBench采用统一和一致的框架评估不同结构剪枝技术的有效性;2)PruningBench系统评估了16种现有的剪枝方法,涵盖了各种模型(如CNNs和ViTs)和任务(如分类和检测);3)PruningBench提供易于实施的接口,以促进未来剪枝方法的实施,并使随后的研究人员将他们的工作纳入我们的排行榜。我们提供一个在线剪枝平台http://pruning.vipazoo.cn,用于定制剪枝任务并重现本文中的所有结果。代码将公开发布。
更新时间: 2024-06-18 06:37:26
领域: cs.AI
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
Binarization, which converts weight parameters to binary values, has emerged as an effective strategy to reduce the size of large language models (LLMs). However, typical binarization techniques significantly diminish the linguistic effectiveness of LLMs. To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS). Unlike conventional methods, BinaryMoS employs multiple scaling experts for binary weights, dynamically merging these experts for each token to adaptively generate scaling factors. This token-adaptive approach boosts the representational power of binarized LLMs by enabling contextual adjustments to the values of binary weights. Moreover, because this adaptive process only involves the scaling factors rather than the entire weight matrix, BinaryMoS maintains compression efficiency similar to traditional static binarization methods. Our experimental results reveal that BinaryMoS surpasses conventional binarization techniques in various natural language processing tasks and even outperforms 2-bit quantization methods, all while maintaining similar model size to static binarization techniques.
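To make the "mixture of scaling experts" concrete, here is a hedged sketch of a binarized linear layer whose per-token output scale is a router-weighted blend of several learned scale vectors; the router design and shapes are assumptions for the sketch, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryMoSLinear(nn.Module):
    """Binary weight matrix plus K scaling experts merged per token."""
    def __init__(self, in_features, out_features, num_experts=4):
        super().__init__()
        w = torch.randn(out_features, in_features)
        self.register_buffer("binary_weight", torch.sign(w))        # +1/-1 entries
        self.expert_scales = nn.Parameter(torch.ones(num_experts, out_features))
        self.router = nn.Linear(in_features, num_experts)           # lightweight gating

    def forward(self, x):                                  # x: (B, T, in_features)
        gates = F.softmax(self.router(x), dim=-1)          # (B, T, K), token-adaptive
        scale = gates @ self.expert_scales                 # (B, T, out_features)
        return F.linear(x, self.binary_weight) * scale

y = BinaryMoSLinear(64, 128)(torch.randn(2, 10, 64))       # -> (2, 10, 128)
```

Note how only the scale vectors and the small router add parameters; the weight matrix itself stays one bit per entry, which is why compression stays close to static binarization.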
Updated: 2024-06-18 06:32:23
标题: 混合尺度:用于大型语言模型的记忆高效的令牌自适应二值化
摘要: 二值化将权重参数转换为二进制值,已经成为减小大型语言模型(LLMs)大小的有效策略。然而,典型的二值化技术显著降低了LLMs的语言效果。为了解决这个问题,我们引入了一种新颖的二值化技术,称为混合尺度(BinaryMoS)。与传统方法不同,BinaryMoS利用多个缩放专家来处理二进制权重,动态地合并这些专家以适应性地生成缩放因子。这种基于标记的方法通过使二值化的LLMs能够进行上下文调整来提升表示能力。此外,由于这种自适应过程仅涉及缩放因子而不是整个权重矩阵,BinaryMoS保持了类似于传统静态二值化方法的压缩效率。我们的实验结果显示,BinaryMoS在各种自然语言处理任务中超越了传统的二值化技术,甚至优于2位量化方法,同时保持了与静态二值化技术相似的模型大小。
更新时间: 2024-06-18 06:32:23
领域: cs.LG
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs
Self-supervised approaches for electroencephalography (EEG) representation learning face three specific challenges inherent to EEG data: (1) the low signal-to-noise ratio, which limits the quality of the learned representations; (2) the wide range of amplitudes, from very small to relatively large due to factors such as inter-subject variability, which risks models being dominated by higher-amplitude ranges; and (3) the absence of explicit segmentation in the continuous-valued sequences, which can result in less informative representations. To address these challenges, we introduce EEG2Rep, a self-prediction approach for self-supervised representation learning from EEG. Two core novel components of EEG2Rep are as follows: 1) instead of learning to predict the masked input from raw EEG, EEG2Rep learns to predict the masked input in latent representation space, and 2) instead of conventional masking methods, EEG2Rep uses a new semantic subsequence preserving (SSP) method, which provides informative masked inputs to guide EEG2Rep to generate rich semantic representations. In experiments on 6 diverse EEG tasks with subject variability, EEG2Rep significantly outperforms state-of-the-art methods. We show that our semantic subsequence preserving improves on the existing masking methods in the self-prediction literature, and we find that preserving 50% of EEG recordings yields the most accurate results on all 6 tasks on average. Finally, we show that EEG2Rep is robust to noise, addressing a significant challenge in EEG data. Models and code are available at https://github.com/Navidfoumani/EEG2Rep
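The abstract reports that preserving about 50% of a recording works best on average. The sketch below shows one plausible reading of a subsequence-preserving mask (keep one contiguous block visible, mask the rest); the exact SSP procedure in the paper may differ, so treat this as an assumption-laden illustration.

```python
import numpy as np

def ssp_mask(seq_len, preserve_ratio=0.5, seed=0):
    """Keep one contiguous subsequence visible; mark the rest as masked."""
    rng = np.random.default_rng(seed)
    keep = int(seq_len * preserve_ratio)
    start = rng.integers(0, seq_len - keep + 1)
    mask = np.ones(seq_len, dtype=bool)        # True = masked position
    mask[start:start + keep] = False           # preserved subsequence
    return mask

mask = ssp_mask(seq_len=1000)
# Training then predicts latent features at masked positions, schematically:
#   loss = mse(predictor(context_encoder(x[~mask])), target_encoder(x)[mask])
```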
Updated: 2024-06-18 06:31:49
标题: EEG2Rep: 通过信息蒙版输入增强自监督EEG表示
摘要: 自监督的脑电图(EEG)表示学习面临着三个特定的挑战:(1)信噪比低,挑战了学到的表示的质量,(2)振幅范围广,从非常小到相对较大,由于诸如被试间变异等因素,风险是模型会被较高振幅范围所主导,以及(3)在连续值序列中缺乏明确的分割,这可能导致信息较少的表示。为了解决这些挑战,我们引入了\textit{EEG2Rep},这是一种从EEG中进行自监督表示学习的自我预测方法。EEG2Rep的两个核心创新组件如下:1)与从原始EEG中学习预测掩码输入不同,EEG2Rep学习在潜在表示空间中预测掩码输入,以及2)与传统的掩码方法不同,EEG2Rep使用一种新的保留语义子序列(SSP)方法,提供信息丰富的掩码输入以指导EEG2Rep生成丰富的语义表示。在涉及受试者变异性的6个不同EEG任务的实验中,EEG2Rep明显优于最先进的方法。我们表明我们的语义子序列保留改进了自我预测文献中的现有掩码方法,并发现保留50%的EEG记录将导致所有6个任务的平均最准确结果。最后,我们展示EEG2Rep对噪音是稳健的,解决了EEG数据中存在的重大挑战。模型和代码可在以下网址找到:\url{https://github.com/Navidfoumani/EEG2Rep}
更新时间: 2024-06-18 06:31:49
领域: eess.SP,cs.LG
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
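The core amortization step, fitting a cheap quadratic surrogate of each task metric as a function of the merging coefficients, can be sketched with ordinary least squares; the toy data and function names below are illustrative only.

```python
import numpy as np

def fit_quadratic_surrogate(C, y):
    """Fit metric(c) ~ quadratic in the scaling coefficients c.
    C: (n_points, k) evaluated coefficient vectors; y: (n_points,) metrics."""
    def features(C):
        n, k = C.shape
        quad = np.stack([C[:, i] * C[:, j]
                         for i in range(k) for j in range(i, k)], axis=1)
        return np.hstack([np.ones((n, 1)), C, quad])
    coef, *_ = np.linalg.lstsq(features(C), y, rcond=None)
    return lambda c: features(np.atleast_2d(np.asarray(c, float))) @ coef

# Toy usage: one task metric over two merging coefficients. After fitting,
# the surrogate is queried freely (amortized) to trace out a Pareto front.
rng = np.random.default_rng(0)
C = rng.uniform(0, 1, size=(25, 2))
y = 1 - (C[:, 0] - 0.8) ** 2 - 0.2 * C[:, 1] + 0.01 * rng.standard_normal(25)
surrogate = fit_quadratic_surrogate(C, y)
print(surrogate([0.8, 0.1]))
```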
Updated: 2024-06-18 06:24:11
标题: MAP:通过二次近似实现低计算模型合并的摊销帕累托前沿
摘要: 模型合并已经成为一种有效的方法,将从同一预训练模型微调的多个单任务模型合并成一个多任务模型。这个过程通常涉及计算模型参数的加权平均,无需额外训练。现有的模型合并方法侧重于提高平均任务准确度。然而,不同任务目标之间的干扰和冲突可能导致模型合并过程中的权衡。在现实应用中,一组具有各种权衡的解决方案可能更具信息性,有助于从不同偏好中做出决策。在本文中,我们介绍了一种新的低计算算法,称为Amortized Pareto Front(MAP)的模型合并。MAP识别了一组用于合并多个模型的比例系数的帕累托集,以反映权衡。MAP的核心组件是使用从预先选择的比例系数集导出的二次逼近替代模型来近似各种任务的评估指标,实现分期推断。视觉和自然语言处理任务的实验结果显示,MAP能够准确识别帕累托前沿。为了进一步减少MAP所需的计算量,我们提出了(1)一种贝叶斯自适应抽样算法和(2)一个具有多个阶段的嵌套合并方案。
更新时间: 2024-06-18 06:24:11
领域: cs.LG
Investigating Annotator Bias in Large Language Models for Hate Speech Detection
Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs), like ChatGPT, presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs, specifically GPT-3.5 and GPT-4o, when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we conduct a comprehensive examination of potential factors contributing to these biases by scrutinizing the annotated data. We introduce our custom hate speech detection dataset, HateSpeechCorpus, to conduct this research. Additionally, we perform the same experiments on the ETHOS (Mollas et al., 2022) dataset for comparative analysis. This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus
Updated: 2024-06-18 06:21:16
标题: 调查大型语言模型中标注者偏见对仇恨言论检测的影响
摘要: 数据标注是将描述性标签分配给原始数据的实践,在优化机器学习模型的性能方面至关重要。然而,这是一个资源密集型的过程,容易受到注释者引入的偏见的影响。像ChatGPT这样的复杂大语言模型(LLMs)的出现为现代化和简化这一复杂过程提供了独特的机会。虽然现有研究广泛评估了LLMs作为注释者的有效性,但本文着重讨论了LLMs(特别是GPT-3.5和GPT-4o)在注释仇恨言论数据时存在的偏见。我们的研究有助于了解LLMs中存在的偏见,特别是在性别、种族、宗教和残疾等四个关键类别中针对高度脆弱群体的注释者偏见。此外,我们通过审查注释数据,对导致这些偏见的潜在因素进行了全面的研究。我们引入了我们自定义的仇恨言论检测数据集HateSpeechCorpus来进行这项研究。此外,我们还对ETHOS(Mollas等,2022年)数据集进行了相同的实验,以进行比较分析。本文作为一个关键资源,指导研究人员和从业者利用LLMs的潜力进行数据标注,从而促进这一关键领域的进步。HateSpeechCorpus数据集可在此处获得:https://github.com/AmitDasRup123/HateSpeechCorpus
更新时间: 2024-06-18 06:21:16
领域: cs.CL,cs.AI,cs.LG
Exploiting and Securing ML Solutions in Near-RT RIC: A Perspective of an xApp
Open Radio Access Networks (O-RAN) are emerging as a disruptive technology, revolutionising traditional mobile network architecture and deployments in the current 5G and the upcoming 6G era. Disaggregation of network architecture, inherent support for AI/ML workflows, cloud-native principles, scalability, and interoperability make O-RAN attractive to network providers for beyond-5G and 6G deployments. Notably, the ability to deploy custom applications, including Machine Learning (ML) solutions as xApps or rApps on the RAN Intelligent Controllers (RICs), has immense potential for network function and resource optimisation. However, the openness, nascent standards, and distributed architecture of O-RAN and RICs introduce numerous vulnerabilities exploitable through multiple attack vectors, which have not yet been fully explored. To address this gap and ensure robust systems before large-scale deployments, this work analyses the security of ML-based applications deployed on the RIC platform. We focus on potential attacks, defence mechanisms, and pave the way for future research towards a more robust RIC platform.
Updated: 2024-06-18 06:12:57
标题: 在近实时RIC中利用和保护ML解决方案:一个xApp的视角
摘要: 开放式无线接入网络(O-RAN)作为一种颠覆性技术正在崛起,颠覆传统移动网络架构和5G时代以及即将到来的6G时代的部署。网络架构的解耦、对AI/ML工作流的固有支持、云原生原则、可扩展性和互操作性使得O-RAN对于网络提供商在超越5G和6G部署中具有吸引力。值得注意的是,将自定义应用程序,包括将机器学习(ML)解决方案作为xApps或rApps部署在无线接入网络智能控制器(RICs)上的能力对于网络功能和资源优化具有巨大潜力。然而,O-RAN和RICs的开放性、初期标准和分布式架构引入了许多可通过多种攻击向量利用的漏洞,这些漏洞尚未完全被探索。为了填补这一空白并确保在大规模部署之前具有强大的系统,本研究分析了部署在RIC平台上的基于ML的应用程序的安全性。我们关注潜在的攻击、防御机制,并为未来研究铺平道路,以建立更加强大的RIC平台。
更新时间: 2024-06-18 06:12:57
领域: cs.CR,cs.NI,cs.SY,eess.SY
LSKNet: A Foundation Lightweight Backbone for Remote Sensing
Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote sensing objects may be mistakenly recognized without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not been previously explored in remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection and semantic segmentation benchmarks. Our comprehensive analysis further validated the significance of the identified priors and the effectiveness of LSKNet. The code is available at https://github.com/zcablii/LSKNet.
Updated: 2024-06-18 06:08:24
标题: LSKNet:遥感领域的基础轻量级骨干网络
摘要: 遥感图像由于其固有的复杂性,对下游任务提出了独特的挑战。虽然已经有大量的研究致力于遥感分类、目标检测和语义分割,但大多数研究都忽视了遥感场景中蕴含的宝贵先验知识。这种先验知识可能很有用,因为在没有参考足够长距离背景的情况下,遥感对象可能会被错误地识别,而不同对象的背景范围可能会有所不同。本文考虑了这些先验知识,并提出了一种轻量级的大型选择性核网络(LSKNet)骨干。LSKNet可以动态调整其大的空间感受野,更好地对遥感场景中各种对象的距离背景进行建模。据我们所知,大型和选择性核机制在遥感图像中尚未被探索。没有花哨的设计,我们的轻量级LSKNet在标准遥感分类、目标检测和语义分割基准上取得了新的最先进成绩。我们的全面分析进一步验证了确定的先验知识的重要性以及LSKNet的有效性。代码可在https://github.com/zcablii/LSKNet找到。
更新时间: 2024-06-18 06:08:24
领域: cs.CV,cs.LG
Research on Dangerous Flight Weather Prediction based on Machine Learning
With the continuous expansion of the scale of air transport, the demand for aviation meteorological support also continues to grow. The impact of hazardous weather on flight safety is critical. How to effectively use meteorological data to improve early warning of dangerous flight weather and ensure the safe flight of aircraft is the primary task of aviation meteorological services. In this work, support vector machine (SVM) models are used to predict hazardous flight weather, especially meteorological conditions with high uncertainty such as storms and turbulence. SVM is a supervised learning method that distinguishes between different classes of data by finding optimal decision boundaries in a high-dimensional space. To meet the needs of this study, we chose the radial basis function (RBF) as the kernel function, which helps to deal with nonlinear problems and enables the model to better capture complex meteorological data structures. During the model training phase, we used historical meteorological observations from multiple weather stations, including temperature, humidity, wind speed, wind direction, and other meteorological indicators closely related to flight safety. From this data, the SVM model learns how to distinguish between normal and dangerous flight weather conditions.
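Since the abstract names the exact model family and kernel, a minimal end-to-end version is easy to sketch with scikit-learn; the synthetic data below merely stands in for the stations' historical observations.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: columns = temperature, humidity, wind speed, wind direction;
# label 1 = hazardous weather, 0 = normal (real labels come from records).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 2] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1).astype(int)

# RBF-kernel SVM: feature scaling matters because the kernel is distance
# based; class_weight guards against rare hazardous-weather labels.
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", C=1.0, gamma="scale",
                          class_weight="balanced"))
model.fit(X[:400], y[:400])
print("held-out accuracy:", model.score(X[400:], y[400:]))
```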
Updated: 2024-06-18 06:08:15
标题: 基于机器学习的危险飞行天气预测研究
摘要: 随着航空运输规模的不断扩大,对航空气象支持的需求也在不断增长。恶劣天气对飞行安全的影响至关重要。如何有效利用气象数据提高对飞行危险天气的预警能力,确保飞机的安全飞行是航空气象服务的首要任务。本研究中,利用支持向量机(SVM)模型来预测危险的飞行天气,特别是对于暴风雨和湍流等存在高度不确定性的气象条件。SVM 是一种监督学习方法,通过在高维空间中找到最优决策边界来区分不同类别的数据。为了满足本研究的需要,我们选择径向基函数(RBF)作为核函数,这有助于处理非线性问题,并使模型更好地捕捉复杂的气象数据结构。在模型训练阶段,我们利用来自多个气象站的历史气象观测数据,包括温度、湿度、风速、风向等与飞行安全密切相关的气象指标。通过这些数据,SVM 模型学习如何区分正常和危险的飞行天气条件。
更新时间: 2024-06-18 06:08:15
领域: cs.AI,physics.ao-ph
Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System
Density peaks clustering (DP) has the ability to detect clusters of arbitrary shape and to cluster non-Euclidean space data, but its quadratic complexity in both computation and storage makes it difficult to scale to big data. Various approaches have been proposed in this regard, including MapReduce-based distributed computing, multi-core parallelism, representation transformation (e.g., kd-tree, Z-value), granular computing, and so forth. However, most of these existing methods face two limitations. One is that their target datasets are mostly constrained to be in Euclidean space; the other is that they emphasize only local neighbors while ignoring the global data distribution, owing to their restriction to a cut-off kernel when computing density. To address these two issues, we present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy. The method is implemented on a message passing interface (MPI) system. Extensive experiments showed that our method is capable of clustering non-Euclidean data, such as in community detection, while outperforming state-of-the-art counterpart methods in accuracy when clustering large Euclidean data. Our code is publicly available at https://github.com/alanxuji/FaithPDP.
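For readers unfamiliar with density peaks, the two quantities the method revolves around (the local density rho and the distance delta to the nearest denser point) reduce to matrix operations, which is what makes an MPI formulation natural. Below is a serial, single-process sketch of those quantities only; the paper's MPI partitioning and inverse leading-node-finding policy are not reproduced.

```python
import numpy as np

def density_peaks_quantities(X, dc=0.5):
    """Gaussian-kernel density rho and distance delta to the nearest
    higher-density point (Rodriguez & Laio), in plain matrix form."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # (n, n)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0               # exclude self
    order = np.argsort(-rho)                                     # densest first
    delta = np.full(len(X), D.max())                             # densest point keeps max
    for rank, i in enumerate(order[1:], start=1):
        higher = order[:rank]                                    # all denser points
        delta[i] = D[i, higher].min()
    return rho, delta

X = np.random.default_rng(0).normal(size=(200, 2))
rho, delta = density_peaks_quantities(X)
# Cluster centers are the points with anomalously large rho * delta.
```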
Updated: 2024-06-18 06:05:45
标题: 忠实的密度峰聚类:基于MPI并行化系统的矩阵计算
摘要: 密度峰值聚类(DP)具有检测任意形状簇和对非欧几里德空间数据进行聚类的能力,但其在计算和存储方面的二次复杂度使其难以扩展到大数据规模。在此方面已经提出了各种方法,包括基于MapReduce的分布式计算、多核并行、表示转换(如kd树、Z值)、粒度计算等。然而,大多数现有方法面临两个限制。一是它们的目标数据集主要限制为欧几里德空间,另一个是它们仅强调局部邻居,而忽略了全局数据分布,因为在计算密度时限制了截止核。为了解决这两个问题,我们提出了一种忠实且并行的DP方法,利用两种类似向量的距离矩阵和一个反向主节点查找策略。该方法在消息传递接口(MPI)系统上实现。广泛的实验表明,我们的方法能够对社区检测等非欧几里德数据进行聚类,并在聚类大型欧几里德数据时优于最先进的对应方法。我们的代码可在https://github.com/alanxuji/FaithPDP 上公开获取。
更新时间: 2024-06-18 06:05:45
领域: cs.LG,cs.AI
A Deep Dive Into the Factors Influencing Financial Success: A Machine Learning Approach
This paper explores various socioeconomic factors that contribute to individual financial success using machine learning algorithms and approaches. Financial success, a critical aspect of individual well-being, is a complex concept influenced by a plethora of different factors. This study aims to understand the true determinants of financial success. It examines survey data from the National Longitudinal Survey of Youth 1997 by the Bureau of Labor Statistics [1], consisting of longitudinal data on a sample of 8,984 individuals followed over many years. The dataset comprises income variables and a large set of socioeconomic variables for these individuals. An in-depth analysis demonstrates the effectiveness of machine learning algorithms in financial success research, highlights the potential of leveraging longitudinal data to enhance prediction accuracy, and provides valuable insights into how various socioeconomic factors influence financial success. The findings underscore the significant influence of highest education degree, occupation, and gender as the top three determinants of individual income among the socioeconomic factors examined. Yearly working hours, age, and work tenure emerge as three secondary influencing factors, and all other factors, including parental household income, industry, and parents' highest grade, are identified as tertiary factors. These insights allow researchers to better understand the complex nature of financial success and enable policymakers to grasp the underlying dynamics shaping aspirations, decision-making, and the broader socio-economic fabric of society. This comprehension is crucial for fostering financial success among individuals and advancing broader societal well-being.
Updated: 2024-06-18 06:04:42
标题: 深入探讨影响财务成功的因素:机器学习方法
摘要: 本文利用机器学习算法和方法探讨了各种社会经济因素对个人财务成功的贡献。财务成功是每个人幸福的重要方面,是一个受到众多不同因素影响的复杂概念。本研究旨在了解财务成功的真正决定因素。它通过劳工统计局的1997年国家青年纵向调查数据进行了研究,样本包括8,984个个体多年的纵向数据。数据集包括个人的收入变量和一系列社会经济变量。深入分析展示了机器学习算法在财务成功研究中的有效性,突显了利用纵向数据提高预测准确性的潜力,并提供了有关各种社会经济因素如何影响财务成功的宝贵见解。研究结果强调了最高学历、职业和性别作为所考察的社会经济因素中个人收入的前三大决定因素的重要影响。年度工作小时、年龄和工作年限成为三个次要影响因素,其他所有因素包括父母家庭收入、行业、父母最高学历等被确定为三级因素。这些见解让研究人员更好地理解财务成功的复杂性,并让决策者掌握塑造愿望、决策制定和社会经济结构的基本动态。这种理解对于促进个人财务成功和推动更广泛的社会福祉至关重要。
更新时间: 2024-06-18 06:04:42
领域: cs.LG
Integrated Planning in Hospitals: A Review
Efficient planning of scarce resources in hospitals is a challenging task for which a large variety of Operations Research and Management Science approaches have been developed since the 1950s. While efficient planning of single resources such as operating rooms, beds, or specific types of staff can already lead to enormous efficiency gains, integrated planning of several resources has been shown to hold even greater potential, and a large number of integrated planning approaches have been presented in the literature over the past decades. This paper provides the first literature review that focuses specifically on the Operations Research and Management Science literature related to integrated planning of different resources in hospitals. We collect the relevant literature and analyze it regarding different aspects such as uncertainty modeling and the use of real-life data. Several cross comparisons reveal interesting insights concerning, e.g., relations between the modeling and solution methods used and the practical implementation of the approaches developed. Moreover, we provide a high-level taxonomy for classifying different resource-focused integration approaches and point out gaps in the literature as well as promising directions for future research.
Updated: 2024-06-18 06:02:43
标题: 医院综合规划:一项回顾
摘要: 医院稀缺资源的高效规划是一项具有挑战性的任务,自上世纪50年代以来,已经开发了各种运筹学和管理科学方法。尽管对单一资源如手术室、床位或特定类型的员工进行高效规划已经可以带来巨大的效率提升,但整合规划多种资源已被证明具有更大的潜力,文献中在过去几十年中提出了大量整合规划方法。本文提供了第一篇专注于医院不同资源整合规划的运筹学和管理科学文献综述。我们收集相关文献,并对其进行分析,涉及不同方面,如不确定性建模和使用真实数据。几个跨领域比较揭示了有趣的见解,例如,建模和解决方法之间的关系以及所开发方法的实际实施。此外,我们提供了一个高级分类法,用于对不同资源集成方法进行分类,并指出文献中的空白以及未来研究的有希望的方向。
更新时间: 2024-06-18 06:02:43
领域: cs.AI,cs.DM,math.OC
Generative Artificial Intelligence-Guided User Studies: An Application for Air Taxi Services
User studies are crucial for meeting user needs. In user studies, real experimental scenarios and participants are constructed and recruited. However, emerging and unfamiliar studies face limitations, including safety concerns and iterative efficiency. To address these challenges, this study utilizes a large language model (LLM) to create generative AI virtual scenarios for user experience. By recruiting real users to evaluate this experience, we can collect feedback that enables rapid iteration in the early design phase. The air taxi is particularly representative of these challenges and has been chosen as the case study for this research. The key contribution was designing a virtual ATJ using OpenAI's GPT-4 model and AI image and video generators. Based on the LLM-generated scripts, key visuals were created for the air taxi, and the ATJ was evaluated by 72 participants. Furthermore, the LLM demonstrated the ability to identify and suggest environments that significantly improve participants' attitudes toward air taxis. Education level and gender significantly influenced participants' attitudes and their satisfaction with the ATJ. Our study confirms the capability of generative AI to support user studies, providing a feasible approach and valuable insights for designing air taxi user experiences in the early design phase.
Updated: 2024-06-18 06:00:18
标题: 生成人工智能引导的用户研究:用于空中出租车服务的应用
摘要: 用户研究对满足用户需求至关重要。在用户研究中,需要构建和招募真实的实验场景和参与者。然而,新兴和不熟悉的研究面临一些限制,包括安全担忧和迭代效率。为了解决这些挑战,本研究利用大型语言模型(LLM)为用户体验创建生成式人工智能虚拟场景。通过招募真实用户来评估这种体验,我们可以收集反馈,从而在早期设计阶段进行快速迭代。空中出租车特别代表了这些挑战,并被选为本研究的案例研究。关键贡献是利用OpenAI的GPT-4模型和人工智能图像和视频生成器设计虚拟ATJ。基于LLM生成的脚本,为空中出租车创建了关键视觉效果,并由72名参与者对ATJ进行了评估。此外,LLM展示了识别和建议显著改善参与者对空中出租车态度的环境的能力。教育水平和性别显著影响了参与者对ATJ的态度和满意度。我们的研究证实了生成式人工智能支持用户研究的能力,为在早期设计阶段设计空中出租车用户体验提供了可行的方法和宝贵的见解。
更新时间: 2024-06-18 06:00:18
领域: cs.HC,cs.AI
Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense
All current backdoor attacks on deep learning (DL) models fall under the category of a vertical class backdoor (VCB) -- class-dependent. In VCB attacks, any sample from a class activates the implanted backdoor when the secret trigger is present. Existing defense strategies overwhelmingly focus on countering VCB attacks, especially those that are source-class-agnostic. This narrow focus neglects the potential threat of other simpler yet general backdoor types, creating a false sense of security. This study introduces a new, simple, and general type of backdoor attack coined as the horizontal class backdoor (HCB) that trivially breaches the class-dependence characteristic of the VCB, bringing a fresh perspective to the community. HCB is activated when the trigger is presented together with an innocuous feature, regardless of class. For example, a facial recognition model misclassifies any person who wears sunglasses (the trigger) while smiling (the innocuous feature) as the targeted identity, such as an administrator, regardless of who the person is. The key is that these innocuous features are horizontally shared among classes but are only exhibited by partial samples per class. Extensive experiments on attacking performance across various tasks, including MNIST, facial recognition, traffic sign recognition, object detection, and medical diagnosis, confirm the high efficiency and effectiveness of the HCB. We rigorously evaluated the evasiveness of the HCB against a series of eleven representative countermeasures, including Fine-Pruning (RAID 18'), STRIP (ACSAC 19'), Neural Cleanse (Oakland 19'), ABS (CCS 19'), Februus (ACSAC 20'), NAD (ICLR 21'), MNTD (Oakland 21'), SCAn (USENIX SEC 21'), MOTH (Oakland 22'), Beatrix (NDSS 23'), and MM-BD (Oakland 24'). None of these countermeasures proves robust, even against a simplistic trigger such as a small, static white-square patch.
Updated: 2024-06-18 05:59:02
标题: 小心!简单水平类后门可以轻松躲避防御
摘要: 目前对深度学习(DL)模型的所有后门攻击都属于垂直类别后门(VCB) - 类别相关。在VCB攻击中,当存在秘密触发器时,来自某个类别的任何样本都会激活植入的后门。现有的防御策略主要集中在对抗VCB攻击,特别是那些不受源类别影响的攻击。这种狭窄的关注忽视了其他更简单但更普遍的后门类型可能构成的潜在威胁,导致了虚假的安全影响。本研究引入了一种新的、简单且普遍的后门攻击类型,被称为水平类别后门(HCB),它轻松地突破了VCB的类别依赖特征,为社区带来了新的视角。当触发器与无害特征一起呈现时,HCB现在被激活,而不管类别如何。例如,面部识别模型会将戴着太阳镜和微笑无害特征的人误分类为目标人员,比如管理员,而不管是哪个人。关键是这些无害特征在各类别之间是水平共享的,但只有部分样本在每个类别中展示。对包括MNIST、面部识别、交通标志识别、物体检测和医学诊断在内的各种任务的攻击性能进行了广泛实验,证实了HCB的高效性和有效性。我们对HCB对十一种代表性对抗措施的回避性进行了严格评估,包括Fine-Pruning(RAID 18')、STRIP(ACSAC 19')、Neural Cleanse(Oakland 19')、ABS(CCS 19')、Februus(ACSAC 20')、NAD(ICLR 21')、MNTD(Oakland 21')、SCAn(USENIX SEC 21')、MOTH(Oakland 22')、Beatrix(NDSS 23')和MM-BD(Oakland 24')。即使使用简单的触发器,如一个小而静态的白色方块补丁,这些对抗措施都未能证明其稳健性。
更新时间: 2024-06-18 05:59:02
领域: cs.CR,cs.LG
HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction
Histopathology serves as the gold standard in cancer diagnosis, with clinical reports being vital in interpreting and understanding this process, guiding cancer treatment and patient care. The automation of histopathology report generation with deep learning stands to significantly enhance clinical efficiency and lessen the labor-intensive, time-consuming burden on pathologists in report writing. In pursuit of this advancement, we introduce HistGen, a multiple instance learning-empowered framework for histopathology report generation together with the first benchmark dataset for evaluation. Inspired by diagnostic and report-writing workflows, HistGen features two delicately designed modules, aiming to boost report generation by aligning whole slide images (WSIs) and diagnostic reports from local and global granularity. To achieve this, a local-global hierarchical encoder is developed for efficient visual feature aggregation from a region-to-slide perspective. Meanwhile, a cross-modal context module is proposed to explicitly facilitate alignment and interaction between distinct modalities, effectively bridging the gap between the extensive visual sequences of WSIs and corresponding highly summarized reports. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin. Moreover, the results of fine-tuning our model on cancer subtyping and survival analysis tasks further demonstrate superior performance compared to SOTA methods, showcasing strong transfer learning capability. Dataset, model weights, and source code are available in https://github.com/dddavid4real/HistGen.
Updated: 2024-06-18 05:58:43
标题: HistGen:通过局部-全局特征编码和跨模态上下文交互生成组织病理学报告
摘要: 组织病理学在癌症诊断中被视为黄金标准,临床报告对解释和理解这一过程至关重要,指导癌症治疗和患者护理。深度学习在组织病理学报告生成方面的自动化将显著提高临床效率,减轻病理学家在撰写报告过程中的劳动强度和耗时负担。为了追求这一进展,我们引入了HistGen,这是一个多实例学习增强的组织病理学报告生成框架,同时提供了第一个用于评估的基准数据集。受诊断和报告撰写工作流程的启发,HistGen具有两个精心设计的模块,旨在通过将全切片图像(WSIs)和来自局部和全局粒度的诊断报告进行对齐来促进报告生成。为了实现这一目标,开发了一个局部-全局分层编码器,用于从区域到切片的视觉特征聚合。同时,提出了一个跨模态上下文模块,明确促进不同模态之间的对齐和交互,有效地弥合了WSIs的大量视觉序列和相应高度总结的报告之间的差距。关于WSI报告生成的实验结果显示,所提出的模型在性能上远远优于最先进的模型。此外,我们对模型在癌症亚型和生存分析任务上进行微调的结果进一步展示了与最先进方法相比的出色性能,展示了强大的迁移学习能力。数据集、模型权重和源代码可在https://github.com/dddavid4real/HistGen上获得。
更新时间: 2024-06-18 05:58:43
领域: cs.CV,cs.AI,cs.CL,cs.LG
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation
Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called SwinGNN, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, i.e., randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.
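The post-processing trick at the end of the abstract is simple enough to state in full: push a uniformly random node permutation through each generated adjacency matrix, which symmetrizes the output distribution over node orderings.

```python
import numpy as np

def permute_graph(adj, rng):
    """Apply one uniformly random node relabeling to an adjacency matrix;
    doing this to every sample makes any graph generator's output
    distribution permutation-invariant."""
    perm = rng.permutation(adj.shape[0])
    return adj[np.ix_(perm, perm)]          # permute rows and columns together

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
print(permute_graph(adj, np.random.default_rng(7)))
```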
Updated: 2024-06-18 05:55:32
标题: SwinGNN:重新思考扩散模型中图生成的置换不变性
摘要: 基于置换等变网络的扩散模型可以学习图数据的置换不变分布。然而,与非不变对应物相比,我们发现这些不变模型遇到更大的学习挑战,因为1)它们的有效目标分布展示更多模式;2)它们的最佳一步去噪分数是具有更多组分的高斯混合的分数函数。受这一分析的启发,我们提出了一种非不变的扩散模型,称为$\textit{SwinGNN}$,该模型采用高效的边对边2-WL消息传递网络,并利用受SwinTransformers启发的基于移位窗口的自注意机制。此外,通过系统的消融实验,我们确定了几种关键的训练和采样技术,显著提高了图生成样本的质量。最后,我们介绍了一个简单的后处理技巧,即随机排列生成的图,可以将任何图生成模型转换为置换不变模型。在合成和真实的蛋白质和分子数据集上进行的广泛实验表明,我们的SwinGNN实现了最先进的性能。我们的代码已发布在https://github.com/qiyan98/SwinGNN。
更新时间: 2024-06-18 05:55:32
领域: cs.LG,cs.AI
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning
Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from a two-minute reference music and generate a new piece of music conforming to the concept. We achieve this by fine-tuning a pretrained text-to-music model using the reference music. However, directly fine-tuning all parameters leads to overfitting issues. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for the new task. Our proposed Jen1-DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.
Updated: 2024-06-18 05:54:11
标题: JEN-1 DreamStyler: 通过关键参数调整实现定制音乐概念学习
摘要: 大型文本到音乐生成模型已经取得了显著进展,促进了从提供的文本提示生成高质量和多样化音乐作品的创作。然而,输入文本提示可能无法准确捕捉用户需求,特别是当目标是生成体现从指定参考集合中得出的特定概念的音乐时。在本文中,我们提出了一种新颖的定制文本到音乐生成方法,可以从两分钟的参考音乐中捕捉概念并生成符合该概念的新音乐作品。我们通过使用参考音乐对预训练的文本到音乐模型进行微调来实现这一目标。然而,直接微调所有参数会导致过拟合问题。为了解决这个问题,我们提出了一个关键参数调整方法,使模型能够吸收新概念同时保留其原始的生成能力。此外,当向预训练模型引入多个概念时,我们发现了潜在的概念冲突。我们提出了一个概念增强策略来区分多个概念,使微调模型能够同时生成包含单个或多个概念的音乐。由于我们是第一个研究定制音乐生成任务的团队,我们还为新任务引入了一个新数据集和评估协议。我们提出的Jen1-DreamStyler在定性和定量评估中均优于几个基准模型。演示将在https://www.jenmusic.ai/research#DreamStyler 上提供。
更新时间: 2024-06-18 05:54:11
领域: cs.SD,cs.AI,eess.AS
Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models
Causal graph recovery is traditionally done using statistical estimation-based methods or based on individuals' knowledge about the variables of interest. These approaches often suffer from data collection biases and the limitations of individuals' knowledge. The advance of large language models (LLMs) provides opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal relationships in general causal graph recovery tasks. This method leverages knowledge compressed in LLMs, knowledge LLMs extract from a scientific publication database, and experimental data about the factors of interest to achieve this goal. Our method gives a prompting strategy for extracting associational relationships among those factors and a mechanism for performing causality verification on these associations. Compared to other LLM-based methods that directly instruct LLMs to do the highly complex causal reasoning, our method shows a clear advantage in causal graph quality on benchmark datasets. More importantly, as the causality among some factors may change as new research results emerge, our method shows sensitivity to new evidence in the literature and can provide useful information for updating causal graphs accordingly.
Updated: 2024-06-18 05:51:50
标题: 用基于检索增强生成的大型语言模型进行因果图发现
摘要: 因果图恢复传统上是使用基于统计估计的方法或基于个体对感兴趣变量的知识。它们经常受到数据收集偏见和个体知识的限制。大型语言模型(LLMs)的进步为解决这些问题提供了机会。我们提出了一种新颖的方法,利用LLMs来推断一般因果图恢复任务中的因果关系。该方法利用LLMs中压缩的知识以及从科学出版物数据库中提取的LLMs知识以及关于感兴趣因素的实验数据来实现这一目标。我们的方法提供了一种提示策略,来提取这些因素之间的关联关系,并提供一种机制来验证这些关联的因果关系。与其他直接指导LLMs进行高度复杂因果推理的LLM基础方法相比,我们的方法在基准数据集上显示了明显优势。更重要的是,由于一些因素之间的因果关系可能会随着新的研究结果的出现而改变,我们的方法对文献中的新证据显示出敏感性,并可以提供有用的信息,以相应地更新因果图。
更新时间: 2024-06-18 05:51:50
领域: cs.CL,cs.LG,stat.ME
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS
One of the primary challenges impeding the progress of Neural Architecture Search (NAS) is its extensive reliance on exorbitant computational resources. NAS benchmarks aim to simulate runs of NAS experiments at zero cost, obviating the need for extensive compute. However, existing NAS benchmarks use synthetic datasets and model proxies that make simplified assumptions about the characteristics of these datasets and models, leading to unrealistic evaluations. We present a technique that allows searching for training proxies that reduce the cost of benchmark construction by significant margins, making it possible to construct realistic NAS benchmarks for large-scale datasets. Using this technique, we construct an open-source bi-objective NAS benchmark for the ImageNet2012 dataset combined with the on-device performance of accelerators, including GPUs, TPUs, and FPGAs. Through extensive experimentation with various NAS optimizers and hardware platforms, we show that the benchmark is accurate and allows searching for state-of-the-art hardware-aware models at zero cost.
Updated: 2024-06-18 05:51:50
标题: Accel-NASBench: 可持续的加速器感知NAS基准测试
摘要: 神经架构搜索(NAS)进展的一个主要挑战是其对巨大计算资源的广泛依赖。NAS基准旨在模拟NAS实验的运行,无需额外成本,减少了对大量计算资源的需求。然而,现有的NAS基准使用合成数据集和模型代理,对这些数据集和模型的特征做出简化假设,导致了不切实际的评估。我们提出了一种技术,允许搜索降低基准构建成本的训练代理,从而使得可能为大规模数据集构建实际的NAS基准。利用这种技术,我们为ImageNet2012数据集结合加速器的现场性能(包括GPU、TPU和FPGA)构建了一个开源的双目标NAS基准。通过与各种NAS优化器和硬件平台的广泛实验,我们展示了该基准的准确性,并允许在零成本下搜索最先进的硬件感知模型。
更新时间: 2024-06-18 05:51:50
领域: cs.LG,eess.IV
Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss
Machine learning models are susceptible to membership inference attacks (MIAs), which aim to infer whether a sample is in the training set. Existing work utilizes gradient ascent to enlarge the loss variance of training data, alleviating the privacy risk. However, optimizing toward a reverse direction may cause the model parameters to oscillate near local minima, leading to instability and suboptimal performance. In this work, we propose a novel method -- Convex-Concave Loss (CCL), which enables a high variance of the training loss distribution through gradient descent. Our method is motivated by the theoretical analysis that convex losses tend to decrease the loss variance during training. Thus, our key idea behind CCL is to reduce the convexity of loss functions with a concave term. Trained with CCL, neural networks produce losses with high variance for training data, reinforcing the defense against MIAs. Extensive experiments demonstrate the superiority of CCL, achieving a state-of-the-art balance in the privacy-utility trade-off.
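The abstract fixes the idea (convex cross-entropy plus a convexity-reducing concave term) but not the exact term, so the sketch below picks one plausible concave-in-the-loss choice purely for illustration; lam and the quadratic form are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def convex_concave_loss(logits, targets, lam=0.5):
    """Per-sample cross-entropy plus a concave term -lam * ce**2 (concave in
    the per-sample CE value), which flattens the objective for high-loss
    samples and so preserves a wider spread of training losses than
    plain convex CE."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (ce - lam * ce ** 2).mean()

logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
convex_concave_loss(logits, targets).backward()
```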
Updated: 2024-06-18 05:51:47
标题: 通过凸凹损失减轻成员推断中的隐私风险
摘要: 机器学习模型容易受到成员推断攻击(MIAs)的影响,这种攻击旨在推断样本是否在训练集中。现有工作利用梯度上升来扩大训练数据的损失方差,从而减轻隐私风险。然而,朝相反方向优化可能会导致模型参数在局部最小值附近振荡,导致不稳定和次优性能。在这项工作中,我们提出了一种新颖的方法 - 凸凹损失(Convex-Concave Loss),通过梯度下降实现训练损失分布的高方差。我们的方法受到理论分析的启发,即凸损失在训练过程中往往会减少损失方差。因此,我们CCL背后的关键思想是通过凹项来降低损失函数的凸性。经过CCL训练,神经网络为训练数据产生具有高方差的损失,加强了对MIAs的防御。大量实验显示了CCL的优越性,实现了隐私-效用权衡的最新平衡。
更新时间: 2024-06-18 05:51:47
领域: cs.LG,cs.CR
Stability of Data-Dependent Ridge-Regularization for Inverse Problems
Theoretical guarantees for the robust solution of inverse problems have important implications for applications. To achieve both guarantees and high reconstruction quality, we propose to learn a pixel-based ridge regularizer with a data-dependent and spatially-varying regularization strength. For this architecture, we establish the existence of solutions to the associated variational problem and the stability of its solution operator. Further, we prove that the reconstruction forms a maximum-a-posteriori approach. Simulations for biomedical imaging and material sciences demonstrate that the approach yields high-quality reconstructions even if only a small instance-specific training set is available.
Updated: 2024-06-18 05:49:54
标题: 数据相关的岭正则化在逆问题中的稳定性
摘要: 逆问题的稳健解决方案的理论保证对应用具有重要意义。为了实现保证和高质量的重建,我们提出学习基于像素的岭正则化器,其具有数据相关和空间变化的正则化强度。对于这种架构,我们建立了相关变分问题的解的存在性和其解算子的稳定性。此外,我们证明重建形成了最大后验方法。生物医学成像和材料科学的模拟表明,即使只有一个小的特定实例的训练集可用,该方法也能产生高质量的重建。
更新时间: 2024-06-18 05:49:54
领域: math.OC,cs.LG
Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality
Leveraging the development of the structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionist causal analysis encounters theoretical adaptation limitations. Accordingly, the cross-world counterfactual approach extends our understanding of causality beyond observed data, enabling hypothetical reasoning about alternative scenarios. However, the joint involvement of cross-world variables, encompassing counterfactual variables and real-world variables, challenges the construction of the graphical model. The twin network is a subtle attempt, establishing a symbiotic relationship, to bridge the gap between graphical modeling and the introduction of counterfactuals, albeit with room for improvement in generalization. In this regard, we demonstrate the theoretical breakdowns of twin networks in certain cross-world counterfactual scenarios. To this end, we propose a novel teleporter theory to establish a general and simple graphical representation of counterfactuals, which provides criteria for determining teleporter variables to connect multiple worlds. In theoretical application, we determine that introducing the proposed teleporter theory can directly obtain the conditional independence between counterfactual variables and real-world variables from the cross-world SCM without requiring complex algebraic derivations. Accordingly, we can further identify counterfactual causal effects through cross-world symbolic derivation. We demonstrate the generality of the teleporter theory in practical applications. Adhering to the proposed theory, we build a plug-and-play module whose effectiveness is substantiated by experiments on benchmarks.
Updated: 2024-06-18 05:49:27
标题: 瞬间移动理论:一种通用且简单的建模跨世界反事实因果关系的方法
摘要: 利用结构因果模型(SCM)的发展,研究人员可以建立用于探索机器学习技术背后因果机制的图形模型。随着机器学习应用复杂性的增加,单一世界干预主义因果分析遇到了理论适应性限制。因此,跨世界反事实方法扩展了我们对因果关系的理解,使我们能够对替代场景进行假设推理。然而,跨世界变量的共同参与,包括反事实变量和真实世界变量,挑战了图形模型的构建。双胞胎网络是一个微妙的尝试,建立了一种共生关系,以弥合图形建模和引入反事实之间的差距,尽管在概括方面还有改进的空间。在这方面,我们展示了在某些跨世界反事实场景中双胞胎网络的理论破裂。为此,我们提出了一种新颖的传送门理论,以建立反事实的一般和简单的图形表示,为确定连接多个世界的传送门变量提供标准。在理论应用中,我们确定引入所提出的传送门理论可以直接从跨世界SCM中获得反事实变量与真实世界变量之间的条件独立性,而无需复杂的代数推导。因此,我们可以通过跨世界符号推导进一步确定反事实因果效应。我们展示了传送门理论对实际应用的一般性。遵循所提出的理论,我们构建了一个即插即用模块,并通过对基准测试的实验证明了其有效性。
更新时间: 2024-06-18 05:49:27
领域: cs.LG,cs.AI,stat.ME
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs
Large language models (LLMs) have shown strong arithmetic reasoning capabilities when prompted with Chain-of-Thought (CoT) prompts. However, we have only a limited understanding of how these prompts are processed by LLMs. To demystify this, prior work has primarily focused on ablating different components of the CoT prompt and empirically observing the resulting change in LLM performance. Yet, the reason why these components are important to LLM reasoning has not been explored. To fill this gap, in this work we investigate "neuron activation" as a lens to provide a unified explanation of the observations made by prior work. Specifically, we look into neurons within the feed-forward layers of LLMs that may have activated their arithmetic reasoning capabilities, using Llama2 as an example. To facilitate this investigation, we also propose an approach based on GPT-4 to automatically identify neurons that imply arithmetic reasoning. Our analyses reveal that the activation of reasoning neurons in the feed-forward layers of an LLM can explain the importance of various components in a CoT prompt, and future research can extend this toward a more complete understanding.
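A common way to run this kind of analysis is to hook the post-nonlinearity feed-forward activations of each layer. The sketch below does so for a small open Llama-architecture model as a stand-in (module names follow Hugging Face's Llama layout); the prompt, model choice, and storage scheme are assumptions, not the paper's released code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # stand-in Llama-style model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

activations = {}
def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output of mlp.act_fn: (batch, seq_len, intermediate_size)
        activations[layer_idx] = output.detach()
    return hook

handles = [layer.mlp.act_fn.register_forward_hook(make_hook(i))
           for i, layer in enumerate(model.model.layers)]

prompt = "Q: 17 + 25 = ?\nA: Let's think step by step."
with torch.no_grad():
    model(**tok(prompt, return_tensors="pt"))
for h in handles:
    h.remove()
# activations[i] now holds layer i's FFN activations for inspection.
```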
Updated: 2024-06-18 05:49:24
标题: 对LLMs的连续思维引发算术推理的解释:神经元激活的研究
摘要: 大型语言模型(LLMs)在接收链式思维(CoT)提示时展现出强大的算术推理能力。然而,我们对它们在LLMs中是如何处理的仅有有限的理解。为了揭示其神秘性,先前的研究主要集中在消除CoT提示中的不同组成部分,并经验性地观察它们对LLM性能的影响。然而,这些组件对LLM推理的重要性的原因尚未被探讨。为了填补这一空白,本研究通过“神经元激活”作为一个视角,提供了先前研究观察到的现象的统一解释。具体地,我们研究了可能已经激活其算术推理能力的LLMs前馈层内的神经元,以Llama2为例。为了促进这一调查,我们还提出了一种基于GPT-4的方法,以自动识别暗示算术推理的神经元。我们的分析表明,在LLM的前馈层中激活推理神经元可以解释CoT提示中各个组件的重要性,未来的研究可以扩展这一理解以获得更全面的理解。
更新时间: 2024-06-18 05:49:24
领域: cs.AI,I.2.7
Incentive-Aware Recommender Systems in Two-Sided Markets
Online platforms in the Internet Economy commonly incorporate recommender systems that recommend products (or "arms") to users (or "agents"). A key challenge in this domain arises from myopic agents who are naturally incentivized to exploit by choosing the optimal arm based on current information, rather than exploring various alternatives to gather information that benefits the collective. We propose a novel recommender system that aligns with agents' incentives while achieving asymptotically optimal performance, as measured by regret in repeated interactions. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets, where the interactions of agents and arms are facilitated by recommender systems on online platforms. This model incorporates incentive constraints induced by agents' opportunity costs. In scenarios where opportunity costs are known to the platform, we show the existence of an incentive-compatible recommendation algorithm. This algorithm pools recommendations between a genuinely good arm and an unknown arm using a randomized and adaptive strategy. Moreover, when these opportunity costs are unknown, we introduce an algorithm that randomly pools recommendations across all arms, utilizing the cumulative loss from each arm as feedback for strategic exploration. We demonstrate that both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation. All code for using the proposed algorithms and reproducing results is made available on GitHub.
Updated: 2024-06-18 05:45:41
标题: 双边市场中的激励感知推荐系统
摘要: 在线平台在互联网经济中通常会整合推荐系统,向用户推荐产品(或“臂”)。这个领域的一个关键挑战来自于那些有短视的代理人,他们在选择最佳臂时往往会基于当前信息而不是探索各种替代方案以收集对整体有益的信息,这在自然情况下是有动机的。我们提出了一种新颖的推荐系统,它与代理人的激励相一致,同时实现了在重复交互中通过遗憾来衡量的渐近最佳性能。我们的框架将这种激励感知系统建模为双边市场中的多代理人赌博问题,代理人和臂之间的交互由在线平台上的推荐系统促进。这个模型包含了由代理人的机会成本引发的激励约束。在平台已知机会成本的情况下,我们展示了存在一种激励兼容的推荐算法。这种算法使用随机化和自适应策略在一个真正好的臂和一个未知臂之间池化推荐。此外,当这些机会成本未知时,我们引入了一种算法,通过随机池化推荐跨所有臂,利用每个臂的累积损失作为战略探索的反馈。我们证明这两种算法都满足事后公平标准,保护代理人免受过度开采。所有使用提出的算法和重现结果的代码都可以在GitHub上找到。
更新时间: 2024-06-18 05:45:41
领域: cs.IR,cs.LG,stat.ML
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Recent advancements on Large Language Models (LLMs) enable AI Agents to automatically generate and execute multi-step plans to solve complex tasks. However, since LLM's content generation process is hardly controllable, current LLM-based agents frequently generate invalid or non-executable plans, which jeopardizes the performance of the generated plans and corrupts users' trust in LLM-based agents. In response, this paper proposes a novel "Formal-LLM" framework for LLM-based agents by integrating the expressiveness of natural language and the precision of formal language. Specifically, the framework allows human users to express their requirements or constraints for the planning process as an automaton. A stack-based LLM plan generation process is then conducted under the supervision of the automaton to ensure that the generated plan satisfies the constraints, making the planning process controllable. We conduct experiments on both benchmark tasks and practical real-life tasks, and our framework achieves over 50% overall performance increase, which validates the feasibility and effectiveness of employing Formal-LLM to guide the plan generation of agents, preventing the agents from generating invalid and unsuccessful plans. Further, more controllable LLM-based agents can facilitate the broader utilization of LLM in application scenarios where high validity of planning is essential. The work is open-sourced at https://github.com/agiresearch/Formal-LLM.
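The paper uses a stack-based (pushdown) supervisor; the toy below flattens that to a plain finite automaton to show the controllability mechanism in miniature: the LLM only ever chooses among transitions the automaton allows, so every finished plan is valid by construction. The DFA, tool names, and stubbed chooser are all illustrative.

```python
# Transition table of a toy automaton over tool names.
dfa = {
    ("start", "search_docs"): "retrieved",
    ("retrieved", "summarize"): "summarized",
    ("summarized", "answer"): "done",
}
accepting = {"done"}

def allowed_actions(state):
    return [a for (s, a) in dfa if s == state]

def llm_choose(state, candidates):
    return candidates[0]        # stand-in for an LLM ranking the legal actions

state, plan = "start", []
while state not in accepting:
    candidates = allowed_actions(state)
    action = llm_choose(state, candidates)   # the LLM only sees legal moves
    plan.append(action)
    state = dfa[(state, action)]
print(plan)                      # ['search_docs', 'summarize', 'answer']
```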
Updated: 2024-06-18 05:44:03
标题: Formal-LLM:将形式语言和自然语言整合为可控的基于LLM的代理程序
摘要: 最近关于大规模语言模型(LLMs)的进展使得人工智能代理能够自动生成和执行多步计划以解决复杂任务。然而,由于LLM的内容生成过程几乎不可控,目前基于LLM的代理经常生成无效或不可执行的计划,这危及了生成计划的性能并破坏了用户对LLM代理的信任。为此,本文提出了一种新颖的“Formal-LLM”框架,通过整合自然语言的表达能力和形式语言的精确性,为基于LLM的代理提供了解决方案。具体来说,该框架允许人类用户将他们对规划过程的需求或约束表达为自动机。然后,在自动机的监督下进行基于栈的LLM计划生成过程,以确保生成的计划满足约束条件,使规划过程可控。我们在基准任务和实际生活任务上进行实验,我们的框架实现了超过50%的整体性能提升,验证了采用Formal-LLM来引导代理计划生成的可行性和有效性,防止代理生成无效和失败的计划。此外,更可控的基于LLM的代理可以促进LLM在需要规划的高度有效性的应用场景中的广泛利用。该工作已在https://github.com/agiresearch/Formal-LLM上开源。
更新时间: 2024-06-18 05:44:03
领域: cs.LG,cs.AI,cs.CL,cs.FL
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.
Updated: 2024-06-18 05:37:06
标题: 大型语言模型的独特安全和隐私威胁:一项全面调查
摘要: 随着人工智能的快速发展,大型语言模型(LLMs)在自然语言处理方面取得了显著进展。这些模型经过大量数据集的训练,展示了强大的语言理解和生成能力,广泛应用于机器翻译、聊天机器人和代理等各种应用中。然而,LLMs在其生命周期中揭示了各种隐私和安全问题,吸引了学术界和工业界的重视。此外,LLMs面临的风险与传统语言模型所遇到的风险有很大不同。鉴于当前的调查缺乏清晰的针对不同场景的独特威胁模型分类,我们强调与五个特定场景相关的独特隐私和安全威胁:预训练、微调、检索增强生成系统、部署以及基于LLM的代理。通过对每种风险的特征进行讨论,本调查概述了潜在的威胁和对策。对攻击和防御情况的研究可以提供可行的研究方向,使更多领域受益于LLMs。
更新时间: 2024-06-18 05:37:06
领域: cs.CR
Efficient algorithms for implementing incremental proximal-point methods
Model training algorithms that observe a small portion of the training set in each computational step are ubiquitous in practical machine learning, and include both stochastic and online optimization methods. In the vast majority of cases, such algorithms observe the training samples via the gradients of the cost functions the samples incur; that is, they exploit the slope of the cost functions via their first-order approximations. To address limitations of gradient-based methods, such as sensitivity to step-size choice in the stochastic setting, or the inability to exploit small function variability in the online setting, several streams of research attempt to exploit more information about the cost functions than just their gradients, via the well-known proximal operators. However, implementing such methods in practice poses a challenge, since each iteration step boils down to computing a proximal operator, which may not be easy. In this work we devise a novel algorithmic framework, which exploits convex duality theory to achieve both algorithmic efficiency and software modularity of proximal operator implementations, in order to make experimentation with incremental proximal optimization algorithms accessible to a larger audience of researchers and practitioners, by reducing the gap between their theoretical description in research papers and their use in practice. We provide a reference Python implementation of the framework developed in this paper as an open-source library at https://github.com/alexshtf/inc_prox_pt/releases/tag/prox_pt_paper, along with examples that demonstrate our implementation on a variety of problems and reproduce the numerical experiments in this paper. The pure Python reference implementation is not necessarily the most efficient, but it is a basis for creating efficient implementations by combining Python with a native backend.
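For intuition about why proximal steps can be cheap when closed forms exist, here is a self-contained incremental proximal-point step for the squared loss; the library linked above handles general losses via duality, which this sketch does not attempt.

```python
import numpy as np

def prox_step_squared(w, x, y, eta):
    """One incremental proximal-point step for f_i(w) = 0.5*(x@w - y)**2:
    argmin_u f_i(u) + (1/(2*eta))*||u - w||^2 has the closed form
    u = w - eta*(x@w - y)/(1 + eta*x@x) * x (a rank-one update)."""
    return w - (eta * (x @ w - y) / (1.0 + eta * (x @ x))) * x

# Toy online least squares, one (x_t, y_t) pair per step.
rng = np.random.default_rng(0)
w_true, w = np.array([2.0, -1.0, 0.5]), np.zeros(3)
for t in range(1, 2001):
    x = rng.normal(size=3)
    y = x @ w_true + 0.01 * rng.normal()
    w = prox_step_squared(w, x, y, eta=1.0 / np.sqrt(t))
print(w)   # approaches w_true
```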
Updated: 2024-06-18 05:35:53
标题: 高效算法实现增量近端点方法
摘要: 训练算法观察每个计算步骤中训练集的一小部分,在实际机器学习中是普遍存在的,包括随机和在线优化方法。在绝大多数情况下,这些算法通常通过训练样本所产生的成本函数的梯度来观察训练样本。因此,这些方法通过它们的一阶近似利用成本函数的斜率。为了解决基于梯度的方法的局限性,比如在随机设置中对步长选择的敏感性,或者在线设置中无法利用小函数变化性,有几个研究领域试图通过众所周知的近端算子来利用有关成本函数的更多信息,而不仅仅是它们的梯度。然而,在实践中实现这些方法是一个挑战,因为每个迭代步骤归结为计算近端算子,这可能并不容易。在这项工作中,我们设计了一个新颖的算法框架,利用凸对偶理论实现算法效率和近端算子实现的软件模块化,以使更多研究人员和从业者能够实验增量近端优化算法,缩小研究论文中的理论描述与实践中的应用之间的差距。我们在https://github.com/alexshtf/inc_prox_pt/releases/tag/prox_pt_paper上提供了本文开发的框架的参考Python实现作为开源库,并提供了演示我们在各种问题上实现的示例,并重现本文中的数值实验。纯Python参考实现未必是最有效的,但是可以通过将Python与本地后端结合来创建有效的实现的基础。
更新时间: 2024-06-18 05:35:53
领域: cs.LG,math.OC
Accelerating optimization over the space of probability measures
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach analogous to momentum-based approaches in Euclidean space. We demonstrate that, in the continuous-time setting, algorithms based on this approach can achieve convergence rates of arbitrarily high order. We complement our findings with numerical examples.
Updated: 2024-06-18 05:33:01
标题: 加速在概率测度空间上的优化
摘要: 梯度优化方法的加速是一个具有重要实际和理论意义的主题,特别是在机器学习应用中。虽然人们在优化欧几里得空间方面投入了大量注意力,但在机器学习中需要优化概率测度空间的需求也促使了对这一领域中加速梯度方法的探索。为此,我们引入了类似于欧几里得空间中基于动量的方法的哈密顿流方法。我们证明,在连续时间设置中,基于这种方法的算法可以实现任意高阶的收敛速度。我们用数值示例来补充我们的发现。
更新时间: 2024-06-18 05:33:01
领域: math.OC,cs.LG
VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation
Designing for manufacturing poses significant challenges, in part due to the computational bottleneck of Computer-Aided Manufacturing (CAM) simulations. Although deep learning offers fast inference as an alternative, its performance is bounded by the need for abundant training data. Representation learning, particularly through pre-training, offers promise for few-shot learning, aiding manufacturability tasks where data can be limited. This work introduces VIRL, a Volume-Informed Representation Learning approach to pre-train a 3D geometric encoder. The pretrained model is evaluated across four manufacturability indicators obtained from CAM simulations: subtractive machining (SM) time, additive manufacturing (AM) time, residual von Mises stress, and blade collisions during the Laser Powder Bed Fusion process. Across all case studies, the model pre-trained by VIRL shows substantial enhancements, demonstrating improved generalizability with limited data and superior performance with larger datasets. Regarding deployment strategy, a case-specific phenomenon exists where fine-tuning VIRL-pretrained models adversely affects AM tasks with limited data but benefits SM time prediction. Moreover, the efficacy of Low-rank adaptation (LoRA), which balances between probing and fine-tuning, is explored. LoRA shows stable performance akin to probing with limited data, while achieving a higher upper bound than probing as data size increases, without the computational costs of fine-tuning. Furthermore, static normalization of manufacturing indicators consistently performs well across tasks, while dynamic normalization enhances performance when a reliable task-dependent input is available.
Updated: 2024-06-18 05:30:26
标题: VIRL:基于体积信息的表示学习,用于少样本制造可行性估计
摘要: 制造设计面临重大挑战,部分原因是由于计算机辅助制造(CAM)模拟的计算瓶颈。尽管深度学习作为一种替代方案提供了快速推断,但其性能受制于对丰富训练数据的需求。表示学习,特别是通过预训练,为少样本学习提供了希望,有助于制造性任务,其中数据可能有限。本文介绍了VIRL,一种基于体积信息的表示学习方法,用于预训练3D几何编码器。评估了从CAM模拟获得的四个可制造性指标:去除加工(SM)时间、增材制造(AM)时间、残余von Mises应力以及激光功率熔融过程中的刀片碰撞。在所有案例研究中,由VIRL预训练的模型显示出在有限数据下展示改进的泛化能力和在更大数据集下表现出优越性能。关于部署策略,存在特定案例现象,即微调VIRL预训练模型对有限数据的AM任务产生不利影响,但有益于SM时间预测。此外,探索了低秩适应(LoRA)的有效性,其在有限数据下表现稳定,同时在数据量增加时实现比探测更高的上限,而无需微调的计算成本。此外,制造指标的静态归一化在各项任务中始终表现良好,而当可靠的任务依赖输入可用时,动态归一化可以提高性能。
更新时间: 2024-06-18 05:30:26
领域: cs.LG,cs.CV
DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection
The detection of small objects in aerial images is a fundamental task in the field of computer vision. Moving objects in aerial photography suffer from problems such as varying shapes and sizes, dense overlap, occlusion by the background, and object blur; however, the original YOLO algorithm has low overall detection accuracy due to its weak ability to perceive targets of different scales. In order to improve the detection accuracy of densely overlapping small targets and fuzzy targets, this paper proposes a dynamic-attention scale-sequence fusion algorithm (DASSF) for small target detection in aerial images. First, we propose a dynamic scale sequence feature fusion (DSSFF) module that improves the up-sampling mechanism and reduces computational load. Secondly, an x-small object detection head is specially added to enhance the detection capability of small targets. Finally, in order to improve the expressive ability of targets of different types and sizes, we use the dynamic head (DyHead). The model we propose solves the problem of small target detection in aerial images and can be applied to multiple different versions of the YOLO algorithm, making it universal. Experimental results show that when the DASSF method is applied to YOLOv8, compared to YOLOv8n, on the VisDrone-2019 and DIOR datasets, the model shows an increase of 9.2% and 2.4% in the mean average precision (mAP), respectively, and outperforms the current mainstream methods.
Updated: 2024-06-18 05:26:44
标题: DASSF: 用于航空目标检测的动态注意力尺度序列融合
摘要: 在航空图像中检测小物体是计算机视觉领域的一个基本任务。航空摄影中的移动物体存在形状和大小不同、密集重叠、被背景遮挡和模糊等问题,然而,原始的YOLO算法由于对不同尺度目标的感知能力较弱,导致整体检测准确度较低。为了提高密集重叠小目标和模糊目标的检测准确度,本文提出了一种用于航空图像中小目标检测的动态注意力尺度序列融合算法(DASSF)。首先,我们提出了一个动态尺度序列特征融合(DSSFF)模块,改进了上采样机制并减少了计算负载。其次,特别添加了一个超小(x-small)目标检测头,以增强对小目标的检测能力。最后,为了提高不同类型和大小目标的表达能力,我们使用了动态头(DyHead)。我们提出的模型解决了航空图像中小目标检测的问题,并可应用于多个不同版本的YOLO算法,具有通用性。实验结果表明,当将DASSF方法应用于YOLOv8时,在VisDrone-2019和DIOR数据集上,与YOLOv8n相比,模型的平均精度(mAP)分别提高了9.2%和2.4%,优于当前主流方法。
更新时间: 2024-06-18 05:26:44
领域: cs.CV,cs.AI
Demystifying the Recency Heuristic in Temporal-Difference Learning
The recency heuristic in reinforcement learning is the assumption that stimuli that occurred closer in time to an acquired reward should be more heavily reinforced. The recency heuristic is one of the key assumptions made by TD($\lambda$), which reinforces recent experiences according to an exponentially decaying weighting. In fact, all other widely used return estimators for TD learning, such as $n$-step returns, satisfy a weaker (i.e., non-monotonic) recency heuristic. Why is the recency heuristic effective for temporal credit assignment? What happens when credit is assigned in a way that violates this heuristic? In this paper, we analyze the specific mathematical implications of adopting the recency heuristic in TD learning. We prove that any return estimator satisfying this heuristic: 1) is guaranteed to converge to the correct value function, 2) has a relatively fast contraction rate, and 3) has a long window of effective credit assignment, yet bounded worst-case variance. We also give a counterexample where on-policy, tabular TD methods violating the recency heuristic diverge. Our results offer some of the first theoretical evidence that credit assignment based on the recency heuristic facilitates learning.
Updated: 2024-06-18 05:23:29
标题: 解密时间差分学习中的最近启发式
摘要: 强化学习中的最近性启发式是这样一种假设:在时间上更接近所获奖励的刺激应该受到更强的强化。最近性启发式是TD($\lambda$)所作的关键假设之一,它根据指数衰减的权重强化最近的经验。事实上,所有其他广泛使用的TD学习回报估计器,如$n$步回报,都满足一个较弱(即非单调)的最近性启发式。为什么最近性启发式对时间信用分配有效?当信用分配以违反这一启发式的方式进行时会发生什么?在本文中,我们分析了在TD学习中采用最近性启发式的具体数学含义。我们证明,任何满足这一启发式的回报估计器:1) 保证收敛到正确的价值函数,2) 具有相对较快的收缩速率,3) 具有较长的有效信用分配窗口,同时最坏情况方差有界。我们还给出了一个反例:违反最近性启发式的同策略(on-policy)表格型TD方法会发散。我们的结果为基于最近性启发式的信用分配有助于学习提供了最早的一批理论证据。
更新时间: 2024-06-18 05:23:29
领域: cs.LG,cs.AI
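For reference, the exponentially decaying weighting the abstract refers to is easy to state concretely. The sketch below computes the lambda-return weights over n-step returns and runs tabular TD(λ) with accumulating eligibility traces; hyperparameters and the toy episode are illustrative.

```python
import numpy as np

def td_lambda_weights(lam, horizon):
    """Weights that the lambda-return places on the n-step returns:
    G^lam = (1 - lam) * sum_{n>=1} lam**(n-1) * G^(n). The weights decay
    exponentially and monotonically -- the recency heuristic."""
    return (1 - lam) * lam ** np.arange(horizon)

def tabular_td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces: each trace
    decays by gamma * lam per step, so a TD error reinforces recently
    visited states more heavily than older ones."""
    v = np.zeros(n_states)
    for episode in episodes:  # episode: list of (state, reward, next_state, done)
        e = np.zeros(n_states)
        for s, r, s_next, done in episode:
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
            e *= gamma * lam   # exponentially decaying credit
            e[s] += 1.0        # refresh the current state's trace
            v += alpha * delta * e
    return v

# Tiny demo: chain 0 -> 1 -> terminal with reward 1 on the final step.
episodes = [[(0, 0.0, 1, False), (1, 1.0, 2, True)]] * 500
print(td_lambda_weights(0.9, 5))          # monotone: [0.1, 0.09, 0.081, ...]
print(tabular_td_lambda(episodes, n_states=3))
```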
SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting
Time series forecasting is essential for our daily activities, and precise modeling of the complex correlations and shared patterns among multiple time series is crucial for improving forecasting performance. Spatial-Temporal Graph Neural Networks (STGNNs) are widely used in multivariate time series forecasting tasks and have achieved promising performance on multiple real-world datasets for their ability to model the underlying complex spatial and temporal dependencies. However, existing studies have mainly focused on datasets comprising only a few hundred sensors due to the heavy computational cost and memory cost of spatial-temporal GNNs. When applied to larger datasets, these methods fail to capture the underlying complex spatial dependencies and exhibit limited scalability and performance. To this end, we present a Scalable Adaptive Graph Diffusion Forecasting Network (SAGDFN) to capture complex spatial-temporal correlation for large-scale multivariate time series, thereby achieving exceptional performance in multivariate time series forecasting tasks. The proposed SAGDFN is scalable to datasets of thousands of nodes without the need of prior knowledge of spatial correlation. Extensive experiments demonstrate that SAGDFN achieves comparable performance with state-of-the-art baselines on one real-world dataset of 207 nodes and outperforms all state-of-the-art baselines by a significant margin on three real-world datasets of 2000 nodes.
Updated: 2024-06-18 05:19:51
标题: SAGDFN:一种用于多变量时间序列预测的可扩展自适应图扩散预测网络
摘要: 时间序列预测对我们的日常活动至关重要,精确建模多个时间序列之间的复杂相关性和共享模式对于提高预测性能至关重要。空间-时间图神经网络(STGNNs)被广泛应用于多变量时间序列预测任务,并在多个真实世界数据集上取得了令人期待的表现,因为它们能够模拟潜在的复杂空间和时间依赖关系。然而,现有研究主要集中在仅包含几百个传感器的数据集上,因为空间-时间GNN的计算成本和内存成本较高。当应用于更大的数据集时,这些方法无法捕捉潜在的复杂空间依赖关系,且具有有限的可扩展性和性能。因此,我们提出了一种可扩展的自适应图扩散预测网络(SAGDFN),以捕捉大规模多变量时间序列的复杂空间-时间相关性,从而在多变量时间序列预测任务中表现出卓越性能。所提出的SAGDFN可扩展到数千个节点的数据集,无需事先了解空间相关性。广泛的实验表明,SAGDFN在一个由207个节点组成的真实世界数据集上实现了与最先进基线相当的性能,并在三个由2000个节点组成的真实世界数据集上明显优于所有最先进的基线。
更新时间: 2024-06-18 05:19:51
领域: cs.LG
How Susceptible are Large Language Models to Ideological Manipulation?
Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideologies within these models can be easily manipulated. In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. Our findings reveal a concerning vulnerability: exposure to only a small amount of ideologically driven samples significantly alters the ideology of LLMs. Notably, LLMs demonstrate a startling ability to absorb ideology from one topic and generalize it to even unrelated ones. The ease with which LLMs' ideologies can be skewed underscores the risks associated with intentionally poisoned training data by malicious actors or inadvertently introduced biases by data annotators. It also emphasizes the imperative for robust safeguards to mitigate the influence of ideological manipulations on LLMs.
Updated: 2024-06-18 05:14:02
标题: 大型语言模型对意识形态操纵有多容易受影响?
摘要: 大型语言模型(LLMs)具有对公众看法和信息互动产生重大影响的潜力。如果这些模型中的意识形态能够被轻易操纵,可能带来的社会影响令人担忧。在这项工作中,我们研究了LLMs从其指令微调数据中学习并泛化意识形态偏见的有效程度。我们的发现揭示了一个令人担忧的脆弱性:仅暴露于少量具有意识形态倾向的样本,就会显著改变LLMs的意识形态。值得注意的是,LLMs表现出了惊人的能力,能够从一个主题吸收意识形态,并将其推广到甚至不相关的主题。LLMs意识形态可以被扭曲的轻松程度,突显了恶意行为者故意投毒训练数据或数据标注者无意引入偏见所带来的风险。这也强调了迫切需要强有力的保障措施,以减轻意识形态操纵对LLMs的影响。
更新时间: 2024-06-18 05:14:02
领域: cs.CL,cs.CR,cs.CY
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
We present CodeNav, an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require "registration" of all relevant tools via manual descriptions within the LLM context, CodeNav automatically indexes and searches over code blocks in the target codebase, finds relevant code snippets, imports them, and uses them to iteratively generate a solution with execution feedback. To highlight the core capabilities of CodeNav, we first showcase three case studies where we use CodeNav for solving complex user queries using three diverse codebases. Next, on three benchmarks, we quantitatively compare the effectiveness of code-use (which only has access to the target codebase) to tool-use (which has privileged access to all tool names and descriptions). Finally, we study the effect of varying kinds of tool and library descriptions on code-use performance, as well as investigate the advantage of the agent seeing source code as opposed to natural descriptions of code. All code will be made open source under a permissive license.
Updated: 2024-06-18 05:10:38
标题: CodeNav:超越工具使用,使用带有LLM代理的真实代码库
摘要: 我们提出了CodeNav,这是一个LLM代理,可以导航和利用先前未见过的代码库来解决用户查询。与需要通过LLM上下文中的手动描述"注册"所有相关工具的工具使用型LLM代理不同,CodeNav自动索引和搜索目标代码库中的代码块,找到相关的代码片段,导入它们,并利用执行反馈迭代地生成解决方案。为了突出CodeNav的核心能力,我们首先展示了三个案例研究,使用CodeNav在三个不同的代码库上解决复杂的用户查询。接下来,在三个基准测试中,我们定量比较了代码使用(仅能访问目标代码库)与工具使用(可特权访问所有工具名称和描述)的有效性。最后,我们研究了不同种类的工具和库描述对代码使用性能的影响,并研究了代理查看源代码相对于代码的自然语言描述所带来的优势。所有代码将在宽松许可证下开源。
更新时间: 2024-06-18 05:10:38
领域: cs.AI,cs.CL,cs.SE
Uncertainty Quantification on Clinical Trial Outcome Prediction
The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction uncertainty can help provide deeper understanding and confidence for researchers and practitioners. This is especially critical in medical diagnosis and drug discovery areas, where reliable predictions directly impact research quality and patient health. In this paper, we proposed incorporating uncertainty quantification into clinical trial outcome predictions. Our main goal is to enhance the model's ability to discern nuanced differences, thereby significantly improving its overall performance. We have adopted a selective classification approach to fulfill our objective, integrating it seamlessly with the Hierarchical Interaction Network (HINT), which is at the forefront of clinical trial prediction modeling. Selective classification, encompassing a spectrum of methods for uncertainty quantification, empowers the model to withhold decision-making in the face of samples marked by ambiguity or low confidence, thereby amplifying the accuracy of predictions for the instances it chooses to classify. A series of comprehensive experiments demonstrate that incorporating selective classification into clinical trial predictions markedly enhances the model's performance, as evidenced by significant upticks in pivotal metrics such as PR-AUC, F1, ROC-AUC, and overall accuracy. Specifically, the proposed method achieved 32.37\%, 21.43\%, and 13.27\% relative improvement on PR-AUC over the base model (HINT) in phase I, II, and III trial outcome prediction, respectively. When predicting phase III, our method reaches 0.9022 PR-AUC scores. These findings illustrate the robustness and prospective utility of this strategy within the area of clinical trial predictions, potentially setting a new benchmark in the field.
Updated: 2024-06-18 05:09:46
标题: 临床试验结果预测的不确定性量化
摘要: 不确定性量化的重要性在机器学习的各个领域中日益受到认可。准确评估模型预测的不确定性可以帮助研究人员和从业者更深入地理解和信任。这在医学诊断和药物发现领域尤为关键,可靠的预测直接影响研究质量和患者健康。 本文提出将不确定性量化纳入临床试验结果预测中。我们的主要目标是增强模型辨别微妙差异的能力,从而显著提高其整体性能。 我们采用了选择性分类方法来实现我们的目标,将其与层次交互网络(HINT)无缝集成,后者处于临床试验预测建模的前沿。选择性分类涵盖了一系列用于不确定性量化的方法,使模型能够在面对模棱两可或低置信度的样本时暂缓决策,从而增强对其选择分类的预测准确性。一系列综合实验表明,将选择性分类纳入临床试验预测显著提高了模型的性能,如PR-AUC、F1、ROC-AUC和整体准确率等关键指标明显增加。 具体而言,所提出的方法在I、II、III期试验结果预测的PR-AUC上相对于基础模型(HINT)分别实现了32.37%、21.43%和13.27%的改善。在预测第III期时,我们的方法获得了0.9022的PR-AUC分数。 这些发现展示了在临床试验预测领域中这一策略的稳健性和潜在实用性,可能在该领域设立新的基准。
更新时间: 2024-06-18 05:09:46
领域: cs.LG,stat.ML
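For context, the core of selective classification that the abstract builds on is a simple abstain-below-threshold rule. The sketch below shows it on top of any probabilistic classifier; the threshold and interface are assumptions, not the paper's HINT integration.

```python
import numpy as np

def selective_predict(probs, threshold=0.8):
    """Predict the argmax class, but abstain (return -1) whenever the
    top-class probability falls below `threshold`."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    return np.where(confidence >= threshold, labels, -1)

def selective_metrics(probs, y_true, threshold=0.8):
    """Accuracy on the accepted subset, plus coverage (fraction answered)."""
    preds = selective_predict(probs, threshold)
    accepted = preds != -1
    coverage = accepted.mean()
    accuracy = (preds[accepted] == y_true[accepted]).mean() if accepted.any() else float("nan")
    return accuracy, coverage

probs = np.array([[0.95, 0.05], [0.55, 0.45]])
print(selective_predict(probs))  # [0, -1]: the second, ambiguous sample is withheld
```

Raising the threshold trades coverage for accuracy on the answered samples, which is the mechanism behind the reported PR-AUC gains.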
Slot State Space Models
Recent State Space Models (SSMs) such as S4, S5, and Mamba have shown remarkable computational benefits in long-range temporal dependency modeling. However, in many sequence modeling problems, the underlying process is inherently modular and it is of interest to have inductive biases that mimic this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating independent mechanisms into SSMs to preserve or encourage separation of information. Unlike conventional SSMs that maintain a monolithic state vector, SlotSSMs maintains the state as a collection of multiple vectors called slots. Crucially, the state transitions are performed independently per slot with sparse interactions across slots implemented via the bottleneck of self-attention. In experiments, we evaluate our model in object-centric video understanding, 3D visual reasoning, and video prediction tasks, which involve modeling multiple objects and their long-range temporal dependencies. We find that our proposed design offers substantial performance gains over existing sequence modeling methods.
Updated: 2024-06-18 04:59:14
标题: "槽状态空间模型"
摘要: 最近的状态空间模型(SSMs)如S4、S5和Mamba在建模长程时间依赖性方面显示出了显著的计算优势。然而,在许多序列建模问题中,底层过程本质上是模块化的,因此我们希望模型具备模仿这种模块化结构的归纳偏置。在本文中,我们介绍了SlotSSMs,这是一个将独立机制融入SSMs、以保持或鼓励信息分离的新框架。与维护单一整体状态向量的传统SSMs不同,SlotSSMs将状态保持为称为槽(slot)的多个向量的集合。至关重要的是,状态转换是针对每个槽独立执行的,槽与槽之间的稀疏交互通过自注意力瓶颈实现。在实验中,我们评估了我们的模型在以对象为中心的视频理解、3D视觉推理和视频预测任务中的表现,这些任务涉及建模多个对象及其长程时间依赖性。我们发现我们提出的设计相比现有序列建模方法带来了显著的性能增益。
更新时间: 2024-06-18 04:59:14
领域: cs.AI
A Comprehensive Survey on AI-based Methods for Patents
Recent advancements in Artificial Intelligence (AI) and machine learning have demonstrated transformative capabilities across diverse domains. This progress extends to the field of patent analysis and innovation, where AI-based tools present opportunities to streamline and enhance important tasks in the patent cycle such as classification, retrieval, and valuation prediction. This not only accelerates the efficiency of patent researchers and applicants but also opens new avenues for technological innovation and discovery. Our survey provides a comprehensive summary of recent AI tools in patent analysis from more than 40 papers from 26 venues between 2017 and 2023. Unlike existing surveys, we include methods that work for patent image and text data. Furthermore, we introduce a novel taxonomy for the categorization based on the tasks in the patent life cycle as well as the specifics of the AI methods. This interdisciplinary survey aims to serve as a resource for researchers and practitioners who are working at the intersection of AI and patent analysis as well as the patent offices that are aiming to build efficient patent systems.
Updated: 2024-06-18 04:58:56
标题: 基于人工智能方法的专利综合调查
摘要: 人工智能(AI)和机器学习的最新进展已经在各个领域展示出了变革性的能力。这一进步延伸到专利分析和创新领域,AI工具为专利周期中的分类、检索和估值预测等重要任务提供了优化和增强的机会。这不仅加快了专利研究人员和申请人的效率,还为技术创新和发现开辟了新途径。我们的调查总结了2017年至2023年间来自26个会议的40多篇关于专利分析中最新AI工具的论文。与现有调查不同的是,我们包括适用于专利图像和文本数据的方法。此外,我们还引入了一种基于专利生命周期任务和AI方法特点的新型分类法。这一跨学科调查旨在为在AI和专利分析交叉领域工作的研究人员和从业者以及致力于构建高效专利系统的专利办公室提供资源。
更新时间: 2024-06-18 04:58:56
领域: cs.IR,cs.AI
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
This study reports an unintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of time indices on input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the data order. By contrast, RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant/unnecessary. Nonetheless, investigations through synthetic benchmarks reveal an advantage of coupling positional encoding and RNNs, especially for handling a large vocabulary that yields low-frequency tokens. Further scrutinization unveils that these low-frequency tokens destabilize the gradients of vanilla RNNs, and the positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
Updated: 2024-06-18 04:53:53
标题: 位置编码有助于循环神经网络处理大词汇量
摘要: 这项研究报告了一个令人意外的发现,即位置编码增强了循环神经网络(RNNs)的学习能力。位置编码是对输入数据中时间索引的高维表示。最著名的是,位置编码补充了Transformer神经网络的能力,后者缺乏表示数据顺序的内在机制。相比之下,RNNs可以自己编码数据点的时间信息,使得它们使用位置编码似乎多余/不必要。然而,通过合成基准测试的调查揭示了位置编码与RNNs耦合的优势,特别是在处理会产生低频令牌的大词汇量时。进一步的审查揭示,这些低频令牌会使普通RNN的梯度变得不稳定,而位置编码解决了这种不稳定性。这些结果为位置编码的实用性提供了新的视角,超越了其作为Transformer计时器的传统角色。
更新时间: 2024-06-18 04:53:53
领域: cs.LG,cs.NE
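For reference, the positional encoding the study couples with RNNs is, in its standard sinusoidal form, straightforward to compute and concatenate to the RNN inputs. A minimal sketch (assuming an even dimension; the concatenation scheme is one common choice, not necessarily the paper's exact setup):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, dim):
    """Transformer-style encoding of time indices (dim assumed even):
    PE[t, 2i] = sin(t * w_i), PE[t, 2i+1] = cos(t * w_i),
    with frequencies w_i = 10000**(-2i/dim)."""
    positions = np.arange(seq_len)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions * freqs)
    pe[:, 1::2] = np.cos(positions * freqs)
    return pe

# The encoding is then concatenated to (or added to) each token embedding
# before the RNN step, e.g.:
#   rnn_input[t] = np.concatenate([embedding[token_ids[t]], pe[t]])
pe = sinusoidal_positional_encoding(seq_len=4, dim=8)
print(pe.shape)  # (4, 8)
```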
Byzantine-Robust Decentralized Federated Learning
Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and trust dependency issues. To address these challenges, the decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.
Updated: 2024-06-18 04:46:08
标题: 拜占庭-鲁棒的去中心化联邦学习
摘要: 联邦学习(FL)使多个客户端能够在不透露其私有训练数据的情况下协作训练机器学习模型。在传统的FL中,系统遵循服务器辅助架构(服务器辅助FL),其中训练过程由中央服务器协调。然而,服务器辅助FL框架由于服务器上的通信瓶颈和信任依赖问题而存在扩展性差的问题。为了解决这些挑战,提出了去中心化的联邦学习(DFL)架构,允许客户端以无服务器和点对点的方式协作训练模型。然而,由于其完全去中心化的特性,DFL极易受到投毒攻击的影响,恶意客户端可以通过向其邻近客户端发送精心制作的本地模型来操纵系统。到目前为止,只有有限数量的拜占庭鲁棒DFL方法被提出,其中大部分要么通信效率低,要么仍然容易受到高级投毒攻击的影响。在本文中,我们提出了一种名为BALANCE(通过去中心化中的本地相似性实现拜占庭鲁棒平均)的新算法,用于抵御DFL中的投毒攻击。在BALANCE中,每个客户端利用自己的本地模型作为相似性参考,来判断接收到的模型是恶意的还是良性的。我们在强凸和非凸设置下建立了BALANCE在投毒攻击下的理论收敛保证。此外,BALANCE在投毒攻击下的收敛速度与无拜占庭攻击设置下最先进方法的收敛速度相匹配。广泛的实验也证明,BALANCE优于现有的DFL方法,并能有效抵御投毒攻击。
更新时间: 2024-06-18 04:46:08
领域: cs.CR,cs.DC,cs.LG
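The acceptance test at the heart of BALANCE can be sketched in a few lines. The exact tolerance schedule below (a ball around the local model that shrinks over training, since honest models should agree more as training converges) is a plausible reading of the abstract, with gamma, kappa, and alpha as hypothetical hyperparameters rather than the paper's values.

```python
import numpy as np

def balance_filter(local, received, gamma=0.3, kappa=1.0, t=0, t_max=100):
    """Keep only neighbor models lying within a tolerance ball around the
    local model; the tolerance shrinks exponentially with training time t."""
    tol = gamma * np.exp(-kappa * t / t_max) * np.linalg.norm(local)
    return [w for w in received if np.linalg.norm(w - local) <= tol]

def robust_aggregate(local, received, alpha=0.5, **kwargs):
    """Mix the local model with the mean of the accepted neighbor models."""
    accepted = balance_filter(local, received, **kwargs)
    if not accepted:
        return local
    return alpha * local + (1 - alpha) * np.mean(accepted, axis=0)

local = np.ones(4)
neighbors = [np.ones(4) * 1.01, np.ones(4) * 50.0]  # one honest, one poisoned
print(robust_aggregate(local, neighbors))            # the outlier is rejected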
Projection Methods for Operator Learning and Universal Approximation
We obtain a new universal approximation theorem for continuous operators on arbitrary Banach spaces using the Leray-Schauder mapping. Moreover, we introduce and study a method for operator learning in Banach spaces $L^p$ of functions with multiple variables, based on orthogonal projections on polynomial bases. We derive a universal approximation result for operators where we learn a linear projection and a finite dimensional mapping under some additional assumptions. For the case of $p=2$, we give some sufficient conditions for the approximation results to hold. This article serves as the theoretical framework for a deep learning methodology whose implementation will be provided in subsequent work.
Updated: 2024-06-18 04:44:05
标题: 运算学习和通用逼近的投影方法
摘要: 我们利用Leray-Schauder映射获得了一个新的针对任意Banach空间上连续算子的通用逼近定理。此外,我们引入并研究了一种在多变量函数的Banach空间$L^p$上进行算子学习的方法,该方法基于到多项式基上的正交投影。在一些额外假设下,我们推导出了一个算子的通用逼近结果,其中我们学习一个线性投影和一个有限维映射。对于$p=2$的情况,我们给出了逼近结果成立的一些充分条件。本文为一种深度学习方法提供了理论框架,其实现将在后续工作中给出。
更新时间: 2024-06-18 04:44:05
领域: math.NA,cs.AI,cs.LG,cs.NA
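For reference, the orthogonal projection onto the span of the first $N$ basis polynomials presumably takes the standard form, written here for an orthonormal basis $p_1,\dots,p_N$ of $L^2(\Omega)$ (our reading for context, not notation from the paper):

```latex
\[
  P_N f \;=\; \sum_{k=1}^{N} \langle f, p_k \rangle \, p_k,
  \qquad
  \langle f, p_k \rangle \;=\; \int_{\Omega} f(x)\, p_k(x)\, \mathrm{d}x,
\]
```

so that the finite-dimensional mapping to be learned acts on the coefficient vector $(\langle f, p_1\rangle, \dots, \langle f, p_N\rangle)$; the case $p=2$ is exactly where these projections are orthogonal.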
CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control
Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator selection as a second policy to be learned, concurrently being updated with the original signal-controlling policy. Specifically, the selection policy in real-time adaptively selects the best teammates according to phase- and intersection-level features. Empirical results on both synthetic and real-world datasets provide robust validation for the superiority of our approach, offering significant improvements over existing state-of-the-art methods. The code is available at https://github.com/bonaldli/CoSLight.
Updated: 2024-06-18 04:43:01
标题: CoSLight:协同优化协作者选择和决策,以增强交通信号控制
摘要: 有效的多路口协作对于基于强化学习的交通信号控制至关重要,以减轻拥堵。现有研究主要选择邻近路口作为合作者。然而,相当数量的拥堵,甚至一些广泛范围的拥堵,是由非邻近路口未能协作造成的。为了解决这些问题,我们提出将合作者选择作为第二个需要学习的策略来独立进行,同时与原始信号控制策略一同更新。具体而言,实时选择策略根据阶段和路口级特征自适应地选择最佳合作者。对合成和真实世界数据集的实证结果为我们的方法的优越性提供了坚实的验证,相较于现有最先进的方法,我们的方法提供了显著的改进。代码可在https://github.com/bonaldli/CoSLight 找到。
更新时间: 2024-06-18 04:43:01
领域: cs.MA,cs.AI
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this contrast is due to LLMs' bias in evaluating their own output. In this paper, we formally define LLM's self-bias - the tendency to favor its own generation - using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias.
Updated: 2024-06-18 04:41:07
标题: 《傲慢与偏见:LLM在自我完善中加剧自我偏见》
摘要: 最近的研究表明,大型语言模型(LLMs)通过对某些任务进行自我反馈来提高性能,但在其他任务上性能下降。我们发现,这种反差是由于LLM在评估自己的输出时存在偏见。在本文中,我们使用两个统计量正式定义了LLM的自我偏见,即偏爱自身生成内容的倾向。我们分析了六个LLM(GPT-4、GPT-3.5、Gemini、LLaMA2、Mixtral和DeepSeek)在翻译、受限文本生成和数学推理任务上的表现。我们发现,在所有被检验的LLM中,自我偏见在多种语言和任务中普遍存在。我们的分析揭示,自我完善流程虽然提高了模型输出的流畅性和可理解性,但进一步加剧了自我偏见。为了减轻这种偏见,我们发现更大的模型规模和带有准确评估的外部反馈可以显著减少自我完善流程中的偏见,从而带来下游任务的实际性能改善。代码和数据发布在https://github.com/xu1998hz/llm_self_bias。
更新时间: 2024-06-18 04:41:07
领域: cs.CL,cs.AI
Investigating Data Usage for Inductive Conformal Predictors
Inductive conformal predictors (ICPs) are algorithms that are able to generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level, only assuming exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing development data into three parts: training, calibration and test. With access to limited or expensive development data, it is an open question regarding the most efficient way to divide the data. This study provides several experiments to explore this question and consider the case for allowing overlap of examples between training and calibration sets. Conclusions are drawn that will be of value to academics and practitioners planning to use ICPs.
Updated: 2024-06-18 04:35:35
标题: 研究归纳一致性预测器的数据使用情况
摘要: 归纳一致性预测器(ICPs)是能够生成预测集而不是点预测的算法,这些预测集在用户定义的置信水平下有效,仅假设可交换性。这些算法对可靠的机器学习非常有用,并且越来越受欢迎。ICP的开发过程涉及将开发数据分为三部分:训练、校准和测试。在有限或昂贵的开发数据的情况下,如何高效地划分数据是一个开放的问题。本研究提供了几个实验来探讨这个问题,并考虑允许训练集和校准集之间的示例重叠的情况。得出的结论对计划使用ICPs的学术界和从业者将具有价值。
更新时间: 2024-06-18 04:35:35
领域: cs.LG,stat.ML
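For context, the ICP pipeline referred to above (fit a model on the training split, then calibrate on a held-out split) can be sketched in a few lines on top of any probabilistic classifier. The nonconformity score below (one minus the true-class probability) is one common choice, not necessarily the paper's.

```python
import numpy as np

def icp_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split/inductive conformal prediction for classification.

    cal_probs/test_probs: class-probability arrays from a model fit on the
    (disjoint) training part of the data. The returned sets contain the
    true label with frequency >= 1 - alpha, assuming only exchangeability
    of calibration and test examples.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

Allowing training and calibration examples to overlap, the question the paper studies, breaks the exchangeability argument behind this guarantee, which is why its practical effect has to be measured empirically.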
Can LLMs Recognize Toxicity? Definition-Based Toxicity Metric
In the pursuit of developing Large Language Models (LLMs) that adhere to societal standards, it is imperative to detect the toxicity in the generated text. The majority of existing toxicity metrics rely on encoder models trained on specific toxicity datasets, which are susceptible to out-of-distribution (OOD) problems and depend on the dataset's definition of toxicity. In this paper, we introduce a robust metric grounded on LLMs to flexibly measure toxicity according to the given definition. We first analyze the toxicity factors, followed by an examination of the intrinsic toxic attributes of LLMs to ascertain their suitability as evaluators. Finally, we evaluate the performance of our metric with detailed analysis. Our empirical results demonstrate outstanding performance in measuring toxicity within verified factors, improving on conventional metrics by 12 points in the F1 score. Our findings also indicate that upstream toxicity significantly influences downstream metrics, suggesting that LLMs are unsuitable for toxicity evaluations within unverified factors.
Updated: 2024-06-18 04:35:12
标题: LLMs能识别毒性吗?基于定义的毒性度量
摘要: 在开发符合社会标准的大型语言模型(LLMs)的过程中,检测生成文本中的毒性是至关重要的。现有大多数毒性度量标准依赖于在特定毒性数据集上训练的编码器模型,这些模型容易受到分布外(OOD)问题的影响,并且依赖于数据集对毒性的定义。在本文中,我们引入了一种基于LLMs的鲁棒度量标准,灵活地根据给定定义来衡量毒性。我们首先分析毒性因素,然后检查LLMs的内在毒性属性,以确定它们作为评估者的适用性。最后,我们通过详细分析评估了我们的度量标准的性能。我们的实证结果表明,在验证因素内度量毒性的性能卓越,F1分数比传统度量标准提高了12个百分点。我们的研究结果还表明,上游毒性显著影响下游度量标准,表明LLMs不适合在未验证因素内进行毒性评估。
更新时间: 2024-06-18 04:35:12
领域: cs.CL,cs.AI
Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To ensure the success of value iteration, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. We study the problem of learning an optimal policy under Bellman completeness in the online model of RL with linear function approximation. In the linear setting, while statistically efficient algorithms are known under Bellman completeness (e.g., Jiang et al. (2017); Zanette et al. (2020)), these algorithms all rely on the principle of global optimism which requires solving a nonconvex optimization problem. In particular, it has remained open as to whether computationally efficient algorithms exist. In this paper we give the first polynomial-time algorithm for RL under linear Bellman completeness when the number of actions is any constant.
Updated: 2024-06-18 04:27:49
标题: 线性Bellman完整性足以实现具有少量动作的高效在线强化学习
摘要: 强化学习(RL)中最自然的函数逼近方法之一是值迭代,它通过求解一系列回归问题来归纳地生成最优值函数的近似。为了确保值迭代的成功,通常假定贝尔曼完备性成立,这确保了这些回归问题是良定义的。我们研究在具有线性函数逼近的在线RL模型中、在贝尔曼完备性条件下学习最优策略的问题。在线性设置中,虽然在贝尔曼完备性下已知统计上有效的算法(例如Jiang等人(2017);Zanette等人(2020)),但这些算法都依赖于全局乐观原则,需要求解非凸优化问题。特别地,是否存在计算上有效的算法一直是一个悬而未决的问题。在本文中,我们给出了当动作数量为任意常数时、线性贝尔曼完备性下RL的第一个多项式时间算法。
更新时间: 2024-06-18 04:27:49
领域: cs.LG,cs.AI,stat.ML
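For reference, the linear Bellman completeness condition discussed above is standardly stated as follows, with $\phi$ the feature map: the Bellman backup of every linear value function must again be linear in the features,

```latex
\[
  \forall\,\theta\ \exists\,\theta' :\quad
  \langle \phi(s,a), \theta' \rangle
  \;=\;
  r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}
  \Big[\max_{a'} \langle \phi(s',a'), \theta \rangle\Big]
  \quad \text{for all } (s,a).
\]
```

This is exactly the well-specification requirement that makes each regression problem in value iteration solvable within the linear class.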
Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation
Continuous efforts are being made to advance anomaly detection in various manufacturing processes to increase the productivity and safety of industrial sites. Deep learning replaced rule-based methods and recently emerged as a promising method for anomaly detection in diverse industries. However, in the real world, the scarcity of abnormal data and difficulties in obtaining labeled data create limitations in the training of detection models. In this study, we addressed these shortcomings by proposing a learnable data augmentation-based time-series anomaly detection (LATAD) technique that is trained in a self-supervised manner. LATAD extracts discriminative features from time-series data through contrastive learning. At the same time, learnable data augmentation produces challenging negative samples to enhance learning efficiency. We measured anomaly scores of the proposed technique based on latent feature similarities. As per the results, LATAD exhibited comparable or improved performance to the state-of-the-art anomaly detection assessments on several benchmark datasets and provided a gradient-based diagnosis technique to help identify root causes.
Updated: 2024-06-18 04:25:56
标题: 利用可学习的数据增强进行自监督时间序列异常检测
摘要: 不断努力推动各种制造过程中的异常检测,以提高工业场所的生产效率和安全性。深度学习取代了基于规则的方法,并最近成为多样行业异常检测的一种有前途的方法。然而,在现实世界中,异常数据的稀缺和获取标记数据的困难限制了检测模型的训练。在本研究中,我们提出了一种基于可学习数据增强的时间序列异常检测(LATAD)技术,通过自监督方式进行训练来解决这些缺点。LATAD通过对比学习从时间序列数据中提取判别特征。同时,可学习的数据增强产生具有挑战性的负样本,以增强学习效率。我们基于潜在特征相似性测量了所提出技术的异常分数。根据结果,LATAD在几个基准数据集上表现出与最先进的异常检测评估相当或更好的性能,并提供了一种基于梯度的诊断技术,帮助识别根本原因。
更新时间: 2024-06-18 04:25:56
领域: cs.LG,cs.AI
Adversarial Attacks on Large Language Models in Medicine
The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks. Utilizing real-world patient data, we demonstrate that both open-source and proprietary LLMs are susceptible to manipulation across multiple tasks. This research further reveals that domain-specific tasks demand more adversarial data in model fine-tuning than general domain tasks for effective attack execution, especially for more capable models. We discover that while integrating adversarial data does not markedly degrade overall model performance on medical benchmarks, it does lead to noticeable shifts in fine-tuned model weights, suggesting a potential pathway for detecting and countering model attacks. This research highlights the urgent need for robust security measures and the development of defensive mechanisms to safeguard LLMs in medical applications, to ensure their safe and effective deployment in healthcare settings.
Updated: 2024-06-18 04:24:30
标题: 医学领域中对大型语言模型的对抗攻击
摘要: 将大型语言模型(LLMs)整合到医疗应用中,可以在医疗诊断、治疗建议和患者护理方面带来有希望的进展。然而,LLMs容易受到对抗攻击的影响,可能导致在敏感的医疗环境中产生有害结果。本研究调查了LLMs在三个医疗任务中对两种类型对抗攻击的脆弱性。利用真实世界的患者数据,我们展示了开源和专有LLMs在多个任务中都容易受到操纵。这项研究进一步揭示,领域特定任务需要更多的对抗数据进行模型微调,以有效执行攻击,尤其是对更有能力的模型而言。我们发现,虽然整合对抗数据并没有显著降低医疗基准模型的整体性能,但它确实会导致微调模型权重的明显变化,这表明了一种检测和抵御模型攻击的潜在途径。这项研究突显了在医疗应用中迫切需要强有力的安全措施和防御机制的发展,以确保LLMs在医疗环境中的安全有效部署。
更新时间: 2024-06-18 04:24:30
领域: cs.AI
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
In this paper, we study the offline RL problem with linear function approximation. Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy. This assumption is natural in that it is essentially the minimal assumption required for value iteration to succeed. We give a computationally efficient algorithm which succeeds under a single-policy coverage condition on the dataset, namely which outputs a policy whose value is at least that of any policy which is well-covered by the dataset. Even in the setting when the inherent Bellman error is 0 (termed linear Bellman completeness), our algorithm yields the first known guarantee under single-policy coverage. In the setting of positive inherent Bellman error ${\varepsilon_{\mathrm{BE}}} > 0$, we show that the suboptimality error of our algorithm scales with $\sqrt{\varepsilon_{\mathrm{BE}}}$. Furthermore, we prove that the scaling of the suboptimality with $\sqrt{\varepsilon_{\mathrm{BE}}}$ cannot be improved for any algorithm. Our lower bound stands in contrast to many other settings in reinforcement learning with misspecification, where one can typically obtain performance that degrades linearly with the misspecification error.
Updated: 2024-06-18 04:23:39
标题: 线性函数逼近下离线强化学习中固有贝尔曼误差的作用
摘要: 在这篇论文中,我们研究了具有线性函数逼近的离线RL问题。我们的主要结构性假设是MDP具有较低的固有贝尔曼误差,即线性值函数关于贪婪策略具有线性的贝尔曼备份。这个假设是自然的,因为它基本上是值迭代成功所需的最小假设。我们给出了一个计算上有效的算法,该算法在数据集满足单策略覆盖条件时成功,即输出一个其价值至少不低于任何被数据集充分覆盖的策略的策略。即使在固有贝尔曼误差为0的情况下(称为线性贝尔曼完备性),我们的算法也给出了单策略覆盖下的第一个已知保证。 在固有贝尔曼误差为正(${\varepsilon_{\mathrm{BE}}} > 0$)的情况下,我们表明我们算法的次优性误差与$\sqrt{\varepsilon_{\mathrm{BE}}}$成比例。此外,我们证明了对于任何算法,次优性随$\sqrt{\varepsilon_{\mathrm{BE}}}$的这种缩放都无法改进。我们的下界与强化学习中许多其他存在模型误设(misspecification)的设定形成对比:在那些设定中,通常可以获得仅随误设误差线性退化的性能。
更新时间: 2024-06-18 04:23:39
领域: cs.LG,cs.AI,stat.ML
Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey
Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at https://github.com/bowen-upenn/MMMA_Rationality.
Updated: 2024-06-18 04:22:39
标题: 多模式和多代理系统遇见理性:一项调查
摘要: 理性是受理性引导的品质,其特征是符合证据和逻辑规则的逻辑思维与决策。这种品质对于有效解决问题至关重要,因为它确保解决方案有充分依据并经过系统推导。尽管大型语言模型(LLMs)在生成类似人类文本方面取得了显著的准确性,但它们存在着从训练数据中继承的偏见、在不同上下文中的不一致性,以及难以理解涉及多层上下文的复杂场景。因此,最近的研究试图利用多个代理协作工作以及各种类型的数据和工具的优势,以增强一致性和可靠性。为此,本文旨在通过调查最新的研究成果、识别多模态和多代理系统在理性方面相对于单一代理和单一模态系统的进展、并讨论开放问题和未来方向,来理解这些系统是否正在朝着理性前进。我们在https://github.com/bowen-upenn/MMMA_Rationality 上维护一个开放的存储库。
更新时间: 2024-06-18 04:22:39
领域: cs.AI,cs.CL,cs.CV,cs.MA
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
The rapid development of the autonomous driving industry has led to a significant accumulation of autonomous driving data. Consequently, there comes a growing demand for retrieving data to provide specialized optimization. However, directly applying previous image retrieval methods faces several challenges, such as the lack of global feature representation and inadequate text retrieval ability for complex driving scenes. To address these issues, firstly, we propose the BEV-TSR framework which leverages descriptive text as an input to retrieve corresponding scenes in the Bird's Eye View (BEV) space. Then to facilitate complex scene retrieval with extensive text descriptions, we employ a large language model (LLM) to extract the semantic features of the text inputs and incorporate knowledge graph embeddings to enhance the semantic richness of the language embedding. To achieve feature alignment between the BEV feature and language embedding, we propose Shared Cross-modal Embedding with a set of shared learnable embeddings to bridge the gap between these two modalities, and employ a caption generation task to further enhance the alignment. Furthermore, there lack of well-formed retrieval datasets for effective evaluation. To this end, we establish a multi-level retrieval dataset, nuScenes-Retrieval, based on the widely adopted nuScenes dataset. Experimental results on the multi-level nuScenes-Retrieval show that BEV-TSR achieves state-of-the-art performance, e.g., 85.78% and 87.66% top-1 accuracy on scene-to-text and text-to-scene retrieval respectively. Codes and datasets will be available.
Updated: 2024-06-18 04:20:51
标题: BEV-TSR:自动驾驶中基于BEV空间的文本场景检索
摘要: 自动驾驶行业的快速发展导致了大量的自动驾驶数据积累。因此,对于检索数据以提供专门的优化需求不断增长。然而,直接应用先前的图像检索方法面临一些挑战,例如全局特征表征的缺乏和对复杂驾驶场景的文本检索能力不足。为了解决这些问题,首先,我们提出了BEV-TSR框架,利用描述性文本作为输入,在鸟瞰图空间中检索相应的场景。然后,为了便于用广泛的文本描述进行复杂场景检索,我们采用了一个大型语言模型(LLM)来提取文本输入的语义特征,并结合知识图嵌入来增强语言嵌入的语义丰富性。为了实现BEV特征和语言嵌入之间的特征对齐,我们提出了具有一组共享可学习嵌入的共享跨模态嵌入,以弥合这两种模态之间的差距,并采用生成标题的任务进一步增强对齐。此外,缺乏完善的检索数据集以进行有效评估。为此,我们建立了一个基于广泛采用的nuScenes数据集的多级检索数据集nuScenes-Retrieval。多级nuScenes-Retrieval的实验结果表明,BEV-TSR实现了最先进的性能,例如,对于场景到文本和文本到场景检索,分别达到了85.78%和87.66%的top-1准确率。代码和数据集将会提供。
更新时间: 2024-06-18 04:20:51
领域: cs.CV,cs.AI
A Survey on In-context Learning
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.
Updated: 2024-06-18 04:19:31
标题: “关于上下文学习的调查”
摘要: 随着大型语言模型(LLMs)的能力不断增强,基于上下文学习(ICL)已经成为自然语言处理(NLP)的一种新范式,LLMs根据增加了一些示例的上下文进行预测。探索ICL以评估和推断LLMs的能力已经成为一个重要趋势。本文旨在调查和总结ICL的进展和挑战。首先,我们提出了ICL的正式定义,并阐明了它与相关研究的关联。然后,我们组织和讨论了包括训练策略、提示设计策略和相关分析在内的先进技术。此外,我们探索了各种ICL应用场景,如数据工程和知识更新。最后,我们讨论了ICL的挑战,并提出了进一步研究的潜在方向。我们希望我们的工作可以鼓励更多研究揭示ICL的工作原理,并改进ICL。
更新时间: 2024-06-18 04:19:31
领域: cs.CL,cs.AI
Is Efficient PAC Learning Possible with an Oracle That Responds 'Yes' or 'No'?
The empirical risk minimization (ERM) principle has been highly impactful in machine learning, leading both to near-optimal theoretical guarantees for ERM-based learning algorithms as well as driving many of the recent empirical successes in deep learning. In this paper, we investigate the question of whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning: in particular, is there a weaker oracle than ERM which can nevertheless enable learnability? We answer this question affirmatively, showing that in the realizable setting of PAC learning for binary classification, a concept class can be learned using an oracle which only returns a single bit indicating whether a given dataset is realizable by some concept in the class. The sample complexity and oracle complexity of our algorithm depend polynomially on the VC dimension of the hypothesis class, thus showing that there is only a polynomial price to pay for use of our weaker oracle. Our results extend to the agnostic learning setting with a slight strengthening of the oracle, as well as to the partial concept, multiclass and real-valued learning settings. In the setting of partial concept classes, prior to our work no oracle-efficient algorithms were known, even with a standard ERM oracle. Thus, our results address a question of Alon et al. (2021) who asked whether there are algorithmic principles which enable efficient learnability in this setting.
Updated: 2024-06-18 04:18:17
标题: 在一个只回答“是”或“否”的Oracle的帮助下,是否可能进行高效的PAC学习?
摘要: 经验风险最小化(ERM)原则在机器学习中产生了极大影响,不仅为基于ERM的学习算法提供了近乎最优的理论保证,同时也推动了深度学习中许多最近的实证成功。在本文中,我们探讨这样一个问题:执行ERM(即在给定数据集上计算最小化经验风险的假设)的能力,对有效学习是否是必要的;特别地,是否存在比ERM更弱、却仍能实现可学习性的神谕?我们肯定地回答了这个问题:在可实现的PAC二元分类学习设置中,一个概念类可以仅使用这样一个神谕来学习,该神谕只返回一个比特,指示给定数据集是否可由类中某个概念实现。我们算法的样本复杂度和神谕复杂度对假设类的VC维度呈多项式依赖,这表明使用我们更弱的神谕只需付出多项式代价。在对神谕稍作增强后,我们的结果可扩展到不可知(agnostic)学习设置,以及部分概念、多类别和实值学习设置。在部分概念类设置中,在我们的工作之前,即使使用标准的ERM神谕,也没有已知的神谕高效算法。因此,我们的结果回答了Alon等人(2021)提出的一个问题,即是否存在使这一设置中的高效可学习性成为可能的算法原则。
更新时间: 2024-06-18 04:18:17
领域: cs.LG,cs.AI,stat.ML
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop a novel inference time defense, named CleanGen, to mitigate backdoor attacks for generation tasks in LLMs. CleanGen is a lightweight and effective decoding strategy that is compatible with the state-of-the-art (SOTA) LLMs. Our insight behind CleanGen is that compared to other LLMs, backdoored LLMs assign significantly higher probabilities to tokens representing the attacker-desired contents. These discrepancies in token probabilities enable CleanGen to identify suspicious tokens favored by the attacker and replace them with tokens generated by another LLM that is not compromised by the same attacker, thereby avoiding generation of attacker-desired content. We evaluate CleanGen against five SOTA backdoor attacks. Our results show that CleanGen achieves lower attack success rates (ASR) compared to five SOTA baseline defenses for all five backdoor attacks. Moreover, LLMs deploying CleanGen maintain helpfulness in their responses when serving benign user queries with minimal added computational overhead.
Updated: 2024-06-18 04:10:38
标题: CleanGen:在大型语言模型中减轻生成任务的后门攻击
摘要: 大型语言模型(LLMs)在生成任务中的出色表现使从业者能够利用公开可用的模型来支持定制应用程序,如聊天机器人和虚拟助手。然而,用于训练或微调这些LLMs的数据通常是不公开的,这使得攻击者可以破坏数据并向模型中注入后门。在本文中,我们开发了一种新颖的推理时防御机制,名为CleanGen,用于减轻LLMs中生成任务的后门攻击。CleanGen是一种轻量且有效的解码策略,与最先进的LLMs兼容。我们对CleanGen的见解是,与其他LLMs相比,带后门的LLMs会将显著更高的概率分配给代表攻击者所需内容的令牌。这些令牌概率上的差异使得CleanGen能够识别攻击者偏爱的可疑令牌,并用另一个未被同一攻击者攻破的LLM生成的令牌替换它们,从而避免生成攻击者所需的内容。我们针对五种最先进的后门攻击对CleanGen进行了评估。我们的结果显示,对于所有五种后门攻击,与五种最先进的基线防御相比,CleanGen实现了更低的攻击成功率(ASR)。此外,部署CleanGen的LLM在响应良性用户查询时仍能保持回答的有用性,且增加的计算开销极小。
更新时间: 2024-06-18 04:10:38
领域: cs.AI,cs.CR
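A schematic of the token-probability comparison that CleanGen's decoding relies on, as described in the abstract. The ratio threshold and the per-step interface below are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def suspicion_guided_token(target_probs, reference_probs, ratio=4.0):
    """One decoding step in the spirit of the abstract: if the (possibly
    backdoored) target model prefers its top token far more strongly than
    a clean reference model does, treat that token as suspicious and fall
    back to the reference model's choice. `ratio` is a hypothetical
    threshold, not the paper's rule."""
    token = int(np.argmax(target_probs))
    if target_probs[token] > ratio * reference_probs[token]:
        token = int(np.argmax(reference_probs))
    return token

target = np.array([0.90, 0.05, 0.05])     # backdoored model pushes token 0
reference = np.array([0.10, 0.60, 0.30])  # clean reference model disagrees
print(suspicion_guided_token(target, reference))  # 1: reference token used
```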
A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning
Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1) For zero-shot CoT, why does prompting the model with "let's think step by step" significantly impact its outputs? (2) For few-shot CoT, why does providing examples before questioning the model could substantially improve its reasoning ability? To answer these questions, we conduct a top-down explainable analysis from the Hopfieldian view and propose a Read-and-Control approach for controlling the accuracy of CoT. Through extensive experiments on seven datasets for three different tasks, we demonstrate that our framework can decipher the inner workings of CoT, provide reasoning error localization, and control to come up with the correct reasoning path.
Updated: 2024-06-18 04:07:13
标题: 基于霍普菲尔德观点的链式思维推理解释
摘要: Chain-of-Thought(CoT)在增强大型语言模型(LLMs)的推理性能方面发挥着重要作用。虽然一些研究侧重于通过诸如检索增强等方法来提高CoT的准确性,但是为什么CoT取得如此成功的严格解释仍不清楚。在本文中,我们通过以下两种不同的设置分析CoT方法,提出以下问题:(1)对于零-shot CoT,为什么提示模型“让我们逐步思考”会显著影响其输出?(2)对于少量-shot CoT,为什么在向模型提问之前提供示例能够显著提高其推理能力?为了回答这些问题,我们从Hopfieldian视角进行自顶向下的可解释性分析,并提出了一个读取和控制方法来控制CoT的准确性。通过对三种不同任务的七个数据集进行广泛实验,我们证明了我们的框架可以解密CoT的内部工作原理,提供推理错误定位,并控制以提出正确的推理路径。
更新时间: 2024-06-18 04:07:13
领域: cs.CL,cs.AI,cs.HC,cs.LG
DELRec: Distilling Sequential Pattern to Enhance LLM-based Recommendation
Sequential recommendation (SR) tasks enhance recommendation accuracy by capturing the connection between users' past interactions and their changing preferences. Conventional models often focus solely on capturing sequential patterns within the training data, neglecting the broader context and semantic information embedded in item titles from external sources. This limits their predictive power and adaptability. Recently, large language models (LLMs) have shown promise in SR tasks due to their advanced understanding capabilities and strong generalization abilities. Researchers have attempted to enhance LLMs' recommendation performance by incorporating information from SR models. However, previous approaches have encountered problems such as 1) only influencing LLMs at the result level; 2) the increased complexity of LLM-based recommendation methods leading to reduced interpretability; 3) incomplete understanding and utilization of SR models' information by LLMs. To address these problems, we propose a novel framework, DELRec, which aims to extract knowledge from SR models and enable LLMs to easily comprehend and utilize this supplementary information for more effective sequential recommendations. DELRec consists of two main stages: 1) SR Models Pattern Distilling, focusing on extracting behavioral patterns exhibited by SR models using soft prompts through two well-designed strategies; 2) LLMs-based Sequential Recommendation, aiming to fine-tune LLMs to effectively use the distilled auxiliary information to perform SR tasks. Extensive experimental results conducted on three real datasets validate the effectiveness of the DELRec framework.
Updated: 2024-06-18 04:00:59
标题: DELRec:提炼序列模式以增强基于LLM的推荐
摘要: 顺序推荐(SR)任务通过捕捉用户过去的互动和他们不断变化的偏好之间的联系,提高了推荐准确性。传统模型通常仅关注于捕捉训练数据中的顺序模式,忽视了来自外部来源的项目标题中嵌入的更广泛的上下文和语义信息。这限制了它们的预测能力和适应性。最近,由于其先进的理解能力和强大的泛化能力,大型语言模型(LLMs)在SR任务中显示出潜力。研究人员已尝试通过将SR模型的信息纳入来提高LLMs的推荐性能。然而,先前的方法遇到了问题,如:1)仅在结果级别影响LLMs;2)LLMs推荐方法复杂性增加导致可解释性降低;3)LLMs对SR模型信息的理解和利用不完整。 为解决这些问题,我们提出了一个新颖的框架DELRec,旨在从SR模型中提取知识,并使LLMs能够轻松理解和利用这些补充信息,以实现更有效的顺序推荐。DELRec包括两个主要阶段:1)SR模型模式提取,专注于使用两种精心设计的策略通过软提示提取SR模型展示的行为模式;2)基于LLMs的顺序推荐,旨在微调LLMs以有效利用提取的辅助信息来执行SR任务。在三个真实数据集上进行的大量实验结果验证了DELRec框架的有效性。
更新时间: 2024-06-18 04:00:59
领域: cs.IR,cs.AI
Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning
Lifelong prompt tuning has significantly advanced parameter-efficient lifelong learning with its efficiency and minimal storage demands on various tasks. Our empirical studies, however, highlight certain transferability constraints in the current methodologies: a universal algorithm that guarantees consistent positive transfer across all tasks is currently unattainable, especially when dealing with dissimilar tasks that may engender negative transfer. Identifying the misalignment between algorithm selection and task specificity as the primary cause of negative transfer, we present the Similarity Heuristic Lifelong Prompt Tuning (SHLPT) framework. This innovative strategy partitions tasks into two distinct subsets by harnessing a learnable similarity metric, thereby facilitating fruitful transfer from tasks regardless of their similarity or dissimilarity. Additionally, SHLPT incorporates a parameter pool to combat catastrophic forgetting effectively. Our experiments show that SHLPT outperforms state-of-the-art techniques in lifelong learning benchmarks and demonstrates robustness against negative transfer in diverse task sequences.
Updated: 2024-06-18 03:57:49
标题: 利用相似性启发式终身提示调整缓解负向转移
摘要: 终身提示调整凭借其高效性和在各种任务上最小的存储需求,显著推进了参数高效的终身学习。然而,我们的实证研究突出了当前方法中存在的一些可迁移性约束:目前无法获得一种保证在所有任务中都能实现一致正向转移的通用算法,特别是在处理可能导致负向转移的不相似任务时。我们确定算法选择与任务特异性之间的不匹配是负向转移的主要原因,因此提出了相似性启发式终身提示调整(SHLPT)框架。这种创新策略通过利用可学习的相似度度量将任务划分为两个不同的子集,从而无论任务相似与否,都能促进任务之间的有益转移。此外,SHLPT还包括一个参数池,以有效对抗灾难性遗忘。我们的实验表明,SHLPT在终身学习基准测试中优于最先进的技术,并且在不同任务序列中展现了对负向转移的稳健性。
更新时间: 2024-06-18 03:57:49
领域: cs.CL,cs.AI,cs.LG
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark
Fair graph learning plays a pivotal role in numerous practical applications. Recently, many fair graph learning methods have been proposed; however, their evaluation often relies on poorly constructed semi-synthetic datasets or substandard real-world datasets. In such cases, even a basic Multilayer Perceptron (MLP) can outperform Graph Neural Networks (GNNs) in both utility and fairness. In this work, we illustrate that many datasets fail to provide meaningful information in the edges, which may challenge the necessity of using graph structures in these problems. To address these issues, we develop and introduce a collection of synthetic, semi-synthetic, and real-world datasets that fulfill a broad spectrum of requirements. These datasets are thoughtfully designed to include relevant graph structures and bias information crucial for the fair evaluation of models. The proposed synthetic and semi-synthetic datasets offer the flexibility to create data with controllable bias parameters, thereby enabling the generation of desired datasets with user-defined bias values with ease. Moreover, we conduct systematic evaluations of these proposed datasets and establish a unified evaluation approach for fair graph learning models. Our extensive experimental results with fair graph learning methods across our datasets demonstrate their effectiveness in benchmarking the performance of these methods. Our datasets and the code for reproducing our experiments are available at https://github.com/XweiQ/Benchmark-GraphFairness.
Updated: 2024-06-18 03:55:04
标题: 解决公平图学习数据集的不足之处:迈向新的基准
摘要: 公平图学习在许多实际应用中起着关键作用。最近,提出了许多公平图学习方法;然而,它们的评估往往依赖于构造不佳的半合成数据集或次标准的真实世界数据集。在这种情况下,即使基本的多层感知器(MLP)也可以在效用和公平性方面胜过图神经网络(GNN)。在这项工作中,我们说明许多数据集未能提供有意义的边缘信息,这可能挑战在这些问题中使用图结构的必要性。为了解决这些问题,我们开发并引入了一系列合成、半合成和真实世界数据集,满足各种要求。这些数据集经过精心设计,包括了对于模型公平评估至关重要的相关图结构和偏见信息。所提出的合成和半合成数据集提供了灵活性,可以创建具有可控偏见参数的数据,从而轻松生成具有用户定义偏见值的所需数据集。此外,我们对这些提出的数据集进行系统评估,并建立了公平图学习模型的统一评估方法。我们在我们的数据集中对公平图学习方法进行了广泛的实验结果,展示了这些方法在基准性能测试中的有效性。我们的数据集和复制实验的代码可在https://github.com/XweiQ/Benchmark-GraphFairness 上找到。
更新时间: 2024-06-18 03:55:04
领域: cs.LG,cs.CY
TroL: Traversal of Layers for Large Language and Vision Models
Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with scaling them up directly, these models enable LLVMs to showcase powerful vision language (VL) performances by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparably to closed-source LLVMs such as GPT-4V are often considered too large (e.g., 26B, 34B, and 110B parameters), having a larger number of layers. These large models demand costly, high-end resources for both training and inference. To address this issue, we present a new efficient LLVM family with 1.8B, 3.8B, and 7B LLM model sizes, Traversal of Layers (TroL), which enables the reuse of layers in a token-wise manner. This layer traversing technique simulates the effect of looking back and retracing the answering stream while increasing the number of forward propagation layers without physically adding more layers. We demonstrate that TroL employs a simple layer traversing approach yet efficiently outperforms the open-source LLVMs with larger model sizes and rivals the performances of the closed-source LLVMs with substantial sizes.
Updated: 2024-06-18 03:42:00
标题: TroL:大型语言和视觉模型的层遍历
摘要: 大型语言和视觉模型(LLVMs)受到大型语言模型(LLMs)的泛化能力以及视觉指令微调的推动。在直接扩大模型规模的同时,这些模型使LLVMs能够通过自然语言指令涵盖多样的任务,展示强大的视觉语言(VL)性能。然而,与GPT-4V等闭源LLVMs表现相当的现有开源LLVMs通常被认为过大(例如26B、34B和110B参数),层数也更多。这些大型模型在训练和推断时都需要昂贵的高端资源。为了解决这个问题,我们提出了一个新的高效LLVM家族,包括1.8B、3.8B和7B的LLM模型大小,即层遍历(TroL),它可以以令牌为单位重复使用层。这种层遍历技术模拟了回顾并重溯回答流的效果,在不实际添加更多层的情况下增加前向传播的层数。我们展示了TroL采用简单的层遍历方法,却能有效胜过模型规模更大的开源LLVMs,并可与规模庞大的闭源LLVMs的性能相媲美。
更新时间: 2024-06-18 03:42:00
领域: cs.LG,cs.CL,cs.CV
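Schematically, layer traversal re-applies existing layers along a reuse plan instead of stacking new ones. The sketch below uses one fixed plan for the whole input, whereas TroL itself decides reuse token-wise with learned mixing, so this is only an illustration of the idea.

```python
def traverse_layers(x, layers, plan):
    """Run the input through `layers` following `plan`, which may revisit
    indices, e.g. plan=[0, 1, 0, 1, 2] doubles the effective depth of the
    first two layers without adding any parameters."""
    for idx in plan:
        x = layers[idx](x)
    return x

# Toy usage with three "layers":
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(traverse_layers(1, layers, plan=[0, 1, 0, 1, 2]))  # ((1+1)*2 + 1)*2 - 3 = 7
```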
TabularFM: An Open Framework For Tabular Foundational Models
Foundational models (FMs), pretrained on extensive datasets using self-supervised techniques, are capable of learning generalized patterns from large amounts of data. This reduces the need for extensive labeled datasets for each new task, saving both time and resources by leveraging the broad knowledge base established during pretraining. Most research on FMs has primarily focused on unstructured data, such as text and images, or semi-structured data, like time-series. However, there has been limited attention to structured data, such as tabular data, which, despite its prevalence, remains under-studied due to a lack of clean datasets and insufficient research on the transferability of FMs for various tabular data tasks. In response to this gap, we introduce a framework called TabularFM, which incorporates state-of-the-art methods for developing FMs specifically for tabular data. This includes variations of neural architectures such as GANs, VAEs, and Transformers. We have curated a million of tabular datasets and released cleaned versions to facilitate the development of tabular FMs. We pretrained FMs on this curated data, benchmarked various learning methods on these datasets, and released the pretrained models along with leaderboards for future comparative studies. Our fully open-sourced system provides a comprehensive analysis of the transferability of tabular FMs. By releasing these datasets, pretrained models, and leaderboards, we aim to enhance the validity and usability of tabular FMs in the near future.
Updated: 2024-06-18 03:36:03
标题: TabularFM:一个用于表格基础模型的开放框架
摘要: 基于广泛数据集使用自监督技术预训练的基础模型(FMs)能够从大量数据中学习广义模式。这降低了每个新任务所需的大量标记数据集,通过利用预训练期间建立的广泛知识基础,节省了时间和资源。大多数关于FMs的研究主要集中在非结构化数据,如文本和图像,或半结构化数据,如时间序列上。然而,对于结构化数据(如表格数据),尽管其普遍存在,由于缺乏干净数据集和关于FMs在各种表格数据任务中可迁移性的研究不足,一直被忽视。为了填补这一空白,我们引入了一个名为TabularFM的框架,该框架结合了专门针对表格数据开发FMs的最先进方法,包括GANs、VAEs和Transformers等神经结构的变体。我们搜集了数百万个表格数据集,并发布了清理过的版本,以促进表格FMs的开发。我们在这些搜集的数据上对FMs进行了预训练,对这些数据集上的各种学习方法进行了基准测试,并发布了预训练模型以及未来比较研究的排行榜。我们的完全开源系统提供了对表格FMs可迁移性的全面分析。通过发布这些数据集、预训练模型和排行榜,我们旨在提升表格FMs在不久的将来的有效性和可用性。
更新时间: 2024-06-18 03:36:03
领域: cs.LG
Privacy-Preserved Neural Graph Databases
In the era of large language models (LLMs), efficient and accurate data retrieval has become increasingly crucial for the use of domain-specific or private data in retrieval augmented generation (RAG). Neural graph databases (NGDBs) have emerged as a powerful paradigm that combines the strengths of graph databases (GDBs) and neural networks to enable efficient storage, retrieval, and analysis of graph-structured data which can be adaptively trained with LLMs. The usage of neural embedding storage and Complex neural logical Query Answering (CQA) provides NGDBs with generalization ability. When the graph is incomplete, by extracting latent patterns and representations, neural graph databases can fill gaps in the graph structure, revealing hidden relationships and enabling accurate query answering. Nevertheless, this capability comes with inherent trade-offs, as it introduces additional privacy risks to the domain-specific or private databases. Malicious attackers can infer more sensitive information in the database using well-designed queries: for example, from the answer set of the query "where did Turing Award winners born after 1940 and before 1950 live", the living place of Turing Award winner Hinton is probably exposed, even though the living places may have been deleted in the training stage due to privacy concerns. In this work, we propose a privacy-preserved neural graph database (P-NGDB) framework to alleviate the risks of privacy leakage in NGDBs. We introduce adversarial training techniques in the training stage to enforce the NGDBs to generate indistinguishable answers when queried with private information, enhancing the difficulty of inferring sensitive information through combinations of multiple innocuous queries.
Updated: 2024-06-18 03:35:51
标题: 隐私保护的神经图数据库
摘要: 在大语言模型(LLMs)时代,高效准确的数据检索对于在检索增强生成(RAG)中使用领域特定或私有数据变得越来越关键。神经图数据库(NGDBs)已经成为一种强大的范式,它结合了图数据库(GDBs)和神经网络的优势,实现了图结构数据的高效存储、检索和分析,这些数据可以与LLMs自适应训练。神经嵌入存储和复杂神经逻辑查询回答(CQA)的使用为NGDBs提供了泛化能力。当图不完整时,通过提取潜在模式和表示,神经图数据库可以填补图结构中的空白,揭示隐藏关系并实现准确的查询回答。然而,这种能力伴随着固有的折衷,因为它给领域特定或私有数据库引入了额外的隐私风险。恶意攻击者可以使用精心设计的查询推断数据库中更多的敏感信息:例如,从"出生于1940年之后、1950年之前的图灵奖得主居住在哪里"这一查询的答案集中,图灵奖得主Hinton的居住地很可能被暴露,尽管出于隐私考虑,这些居住地信息可能已在训练阶段被删除。在这项工作中,我们提出了一个隐私保护的神经图数据库(P-NGDB)框架,以减轻NGDB中隐私泄露的风险。我们在训练阶段引入对抗训练技术,强制NGDB在被查询私有信息时生成不可区分的答案,从而增加通过组合多个无害查询来推断敏感信息的难度。
更新时间: 2024-06-18 03:35:51
领域: cs.DB,cs.CR,cs.LG
CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework
Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regressive generation of LLMs leads to slow inference speeds, making them less effective in real-time recommendations. To address this, we propose a framework for news recommendation using LLMs, named CherryRec, which ensures the quality of recommendations while accelerating the recommendation process. Specifically, we employ a Knowledge-aware News Rapid Selector to retrieve candidate options based on the user's interaction history. The history and retrieved items are then input as text into a fine-tuned LLM, the Content-aware News Llm Evaluator, designed to enhance news recommendation capabilities. Finally, the Value-aware News Scorer integrates the scores to compute the CherryRec Score, which serves as the basis for the final recommendation. We validate the effectiveness of the proposed framework by comparing it with state-of-the-art baseline methods on benchmark datasets. Our experimental results consistently show that CherryRec outperforms the baselines in both recommendation performance and efficiency. The project resource can be accessed at: https://github.com/xxxxxx
Updated: 2024-06-18 03:33:38
Domains: cs.IR,cs.AI
GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting
Time series forecasts at different temporal granularities are widely used in real-world applications, e.g., sales predicted in days and in weeks for making different inventory plans. However, these tasks are usually solved separately without ensuring coherence, which is crucial for aligning downstream decisions. Previous works mainly focus on ensuring coherence with some straightforward methods, e.g., aggregation from the forecasts of fine granularity to the coarse ones, and allocation from the coarse granularity to the fine ones. These methods merely exploit the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy. In this paper, we propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance, together with an adaptive reconciliation (AR) strategy that maintains coherence without performance loss. Furthermore, we introduce an optimization module to achieve task-based targets while adhering to more real-world constraints. Experiments on real-world datasets demonstrate that our framework (GMP-AR) achieves superior performance on temporal hierarchical forecasting tasks compared to state-of-the-art methods. In addition, our framework has been successfully applied to a real-world task of payment traffic management in Alipay by integrating it with the task-based optimization module.
Updated: 2024-06-18 03:33:03
Domains: cs.LG,cs.AI
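For intuition, the straightforward coherence baselines the paper improves on can be written in a few lines. The blending weight `w` and the proportional allocation rule below are illustrative assumptions; GMP-AR learns such adjustments adaptively rather than fixing them.

```python
import numpy as np

def reconcile_week(daily_fcst, weekly_fcst, w=0.5):
    """Toy bottom-up/top-down blend illustrating temporal coherence.

    daily_fcst:  (7,) fine-granularity forecasts (assumed nonzero total)
    weekly_fcst: scalar coarse-granularity forecast
    Returns adjusted daily forecasts whose sum equals the blended weekly
    value, so plans at both granularities agree.
    """
    bottom_up = daily_fcst.sum()
    target = w * bottom_up + (1 - w) * weekly_fcst  # coherent weekly total
    shares = daily_fcst / bottom_up                 # allocate back to days
    return target * shares

daily = np.array([10., 12., 9., 11., 14., 20., 18.])
print(reconcile_week(daily, weekly_fcst=100.0))  # sums to the blended total
```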
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and do not generalize to deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDPs) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better than other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.
Updated: 2024-06-18 03:32:10
Domains: cs.LG,cs.AI
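As a reference point, the most basic sampler mentioned above, unadjusted Langevin Monte Carlo, draws an approximate posterior sample of value-function parameters via noisy gradient steps. This sketch assumes a generic loss gradient and a fixed step size.

```python
import numpy as np

def langevin_sample(grad_loss, theta, step=0.01, n_steps=1000, rng=None):
    """Unadjusted Langevin Monte Carlo: noisy gradient descent whose
    stationary distribution approximates exp(-loss), yielding a
    randomized (Thompson-style) parameter sample for exploration."""
    rng = rng if rng is not None else np.random.default_rng(0)
    for _ in range(n_steps):
        noise = rng.normal(size=theta.shape)
        theta = theta - step * grad_loss(theta) + np.sqrt(2.0 * step) * noise
    return theta

# Example: the loss 0.5 * ||theta - mu||^2 yields samples from roughly N(mu, I)
mu = np.array([1.0, -2.0])
sample = langevin_sample(lambda t: t - mu, theta=np.zeros(2))
```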
Self-supervised Graph Neural Network for Mechanical CAD Retrieval
CAD (Computer-Aided Design) plays a crucial role in the mechanical industry, where large numbers of similar-shaped CAD parts are often created. Efficiently reusing these parts is key to reducing design and production costs for enterprises. Retrieval systems are vital for achieving CAD reuse, but the complex shapes of CAD models are difficult to describe accurately using text or keywords, making traditional retrieval methods ineffective. While representation learning approaches have been developed for CAD, manually labeling similar samples for these methods is expensive. Additionally, the unique parameterized data structure of CAD models presents challenges for directly applying existing 3D shape representation learning techniques. In this work, we propose GC-CAD, a self-supervised contrastive graph neural network-based method for mechanical CAD retrieval that directly models parameterized CAD raw files. GC-CAD consists of two key modules: structure-aware representation learning and a contrastive graph learning framework. The method leverages graph neural networks to extract both geometric and topological information from CAD models, generating feature representations. We then introduce a simple yet effective contrastive graph learning framework, enabling the model to train without manual labels and generate retrieval-ready representations. Experimental results on four datasets, including human evaluation, demonstrate that the proposed method achieves significant accuracy improvements and up to 100 times efficiency improvement over the baseline methods.
Updated: 2024-06-18 03:29:12
Domains: cs.IR,cs.AI,cs.CV
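A minimal sketch of the label-free contrastive idea, assuming an InfoNCE-style loss over two augmented views of the same batch of CAD graph embeddings; the exact objective in GC-CAD may differ.

```python
import numpy as np

def info_nce(z1, z2, tau=0.2):
    """Contrastive loss over two augmented views of the same batch of
    CAD graphs: matched rows are positives, all other rows negatives,
    so no manual similarity labels are needed.
    z1, z2: (B, D) L2-normalized graph embeddings.
    """
    logits = z1 @ z2.T / tau                       # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # pull diagonal pairs together
```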
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes: visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities for the cost of a single forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold.
Updated: 2024-06-18 03:14:22
Domains: cs.AI,cs.CL,cs.CV
Segment Anything Model is a Good Teacher for Local Feature Learning
Local feature detection and description play an important role in many computer vision tasks, and are designed to detect and describe keypoints in "any scene" and for "any downstream task". Data-driven local feature learning methods need to rely on pixel-level correspondence for training, which is challenging to acquire at scale, thus hindering further improvements in performance. In this paper, we propose SAMFeat to introduce SAM (Segment Anything Model), a foundation model trained on 11 million images, as a teacher to guide local feature learning and thus inspire higher performance on limited datasets. To do so, first, we construct an auxiliary task of Attention-weighted Semantic Relation Distillation (ASRD), which distills feature relations with category-agnostic semantic information learned by the SAM encoder into a local feature learning network, to improve local feature description using semantic discrimination. Second, we develop a technique called Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), which utilizes semantic groupings derived from SAM as weakly supervised signals to optimize the metric space of local descriptors. Third, we design an Edge Attention Guidance (EAG) to further improve the accuracy of local feature detection and description by prompting the network to pay more attention to the edge regions highlighted by SAM. SAMFeat's performance on various tasks, such as image matching on HPatches and long-term visual localization on Aachen Day-Night, showcases its superiority over previous local features. The code is available at https://github.com/vignywang/SAMFeat.
Updated: 2024-06-18 03:11:59
Domains: cs.CV,cs.LG
"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations
Social science research has shown that candidates with names indicative of certain races or genders often face discrimination in employment practices. Similarly, Large Language Models (LLMs) have demonstrated racial and gender biases in various applications. In this study, we utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with 320 first names that strongly signal their race and gender, across over 750,000 prompts. Our empirical results indicate a preference among these models for hiring candidates with White female-sounding names over other demographic groups across 40 occupations. Additionally, even among candidates with identical qualifications, salary recommendations vary by as much as 5% between different subgroups. A comparison with real-world labor data reveals inconsistent alignment with U.S. labor market characteristics, underscoring the necessity of risk investigation of LLM-powered systems.
Updated: 2024-06-18 03:11:43
Domains: cs.AI,cs.CL
PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs
While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models. One potential approach to enhance the domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets. However, this method can be both resource- and time-intensive, and is not applicable to closed-source commercial LLMs. In this paper, we propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA), a method designed to augment the domain-specific capabilities of LLMs by leveraging insights from the response preferences of expert models, without requiring fine-tuning. Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks. Moreover, an LLM with PANDA even outperforms the expert model it learns from on 4 tasks of ScienceWorld. This finding highlights the potential of exploring tuning-free approaches to achieve weak-to-strong generalization.
Updated: 2024-06-18 03:08:37
Domains: cs.CL,cs.AI
MCSD: An Efficient Language Model with Diverse Fusion
Transformers excel in Natural Language Processing (NLP) due to their prowess in capturing long-term dependencies but suffer from exponential resource consumption with increasing sequence lengths. To address these challenges, we propose MCSD model, an efficient language model with linear scaling and fast inference speed. MCSD model leverages diverse feature fusion, primarily through the multi-channel slope and decay (MCSD) block, to robustly represent features. This block comprises slope and decay sections that extract features across diverse temporal receptive fields, facilitating capture of both local and global information. In addition, MCSD block conducts element-wise fusion of diverse features to further enhance the delicate feature extraction capability. For inference, we formulate the inference process into a recurrent representation, slashing space complexity to $O(1)$ and time complexity to $O(N)$ respectively. Our experiments show that MCSD attains higher throughput and lower GPU memory consumption compared to Transformers, while maintaining comparable performance to larger-scale language learning models on benchmark tests. These attributes position MCSD as a promising base for edge deployment and embodied intelligence.
Updated: 2024-06-18 03:08:01
Domains: cs.CL,cs.AI
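To make the complexity claim concrete, here is a toy per-channel decay recurrence of the kind such a recurrent reformulation relies on: the state is a fixed-size summary of the history, so generation costs O(1) space and O(N) time. The update rule and shapes are illustrative, not the exact MCSD block.

```python
import numpy as np

def decay_step(state, x, decay):
    """One recurrent step of a decay-style section: `state` summarizes
    the entire history in a fixed-size vector, so each new token costs
    O(1) space and a length-N sequence costs O(N) time."""
    return decay * state + (1.0 - decay) * x

D = 8
state = np.zeros(D)
decay = np.full(D, 0.9)                       # per-channel decay factors in (0, 1)
tokens = np.random.default_rng(0).normal(size=(16, D))
for x in tokens:                              # stream tokens one at a time
    state = decay_step(state, x, decay)
```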
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise micro-level operations and strategic macro awareness. Previous works, such as AlphaStar and SCC, achieve impressive performance on StarCraft II, but still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyage and MetaGPT, present immense potential for solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex RTS game. To conveniently take full advantage of LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which an LLM agent can interact. Secondly, we propose a Chain of Summarization method, including single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Our experiment consists of two parts: first, an evaluation by human experts, which includes assessing the LLMs' mastery of StarCraft II knowledge and the performance of LLM agents in the game; second, the in-game performance of LLM agents, encompassing aspects like win rate and the impact of Chain of Summarization. Experiment results demonstrate that: 1. LLMs possess the relevant knowledge and complex planning abilities needed to address StarCraft II scenarios; 2. Human experts consider the performance of LLM agents to be close to that of an average player who has played StarCraft II for eight years; 3. LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level. We have open-sourced the code and released demo videos of the LLM agent playing StarCraft II.
Updated: 2024-06-18 03:07:37
Domains: cs.AI
Spatially Resolved Gene Expression Prediction from Histology via Multi-view Graph Contrastive Learning with HSIC-bottleneck Regularization
The rapid development of spatial transcriptomics (ST) enables the measurement of gene expression at spatial resolution, making it possible to simultaneously profile gene expression, the spatial locations of spots, and the matched histopathological images. However, the cost of collecting ST data is much higher than that of acquiring histopathological images, and thus several studies attempt to predict gene expression on ST by leveraging the corresponding histopathological images. Most existing image-based gene prediction models treat the prediction task on each spot of ST data independently, which ignores the spatial dependency among spots. In addition, while the histology images share phenotypic characteristics with the ST data, it is still challenging to extract such common information to help align paired image and expression representations. To address these issues, we propose a Multi-view Graph Contrastive Learning framework with HSIC-bottleneck Regularization (ST-GCHB) that aims to learn a shared representation to help impute the gene expression of the queried imaging spots by considering their spatial dependency.
Updated: 2024-06-18 03:07:25
Domains: cs.AI,cs.LG
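For reference, the HSIC quantity behind such a regularizer has a standard biased empirical estimator, sketched below; the RBF kernel and bandwidth are common illustrative choices, and how ST-GCHB weights this term is defined in the paper.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Pairwise RBF (Gaussian) kernel matrix for rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate between paired samples X and Y
    (e.g., image-spot and expression embeddings). An HSIC-bottleneck
    regularizer penalizes dependence on nuisance signals while
    rewarding dependence between the paired modalities."""
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```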
Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication
Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7% improvement in reasoning efficiency for different LLMs, and up to a 72.7% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution towards efficient, structured communication in agent communication. Our code is released at https://github.com/thunlp/AutoForm.
Updated: 2024-06-18 03:06:39
Domains: cs.CL,cs.AI
Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector
Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we begin by examining this phenomenon, focusing on knowledge understanding and instruction following, with the latter identified as the main contributor to forgetting during fine-tuning. Consequently, we propose the Instruction Vector (IV) framework to capture model representations highly related to specific instruction-following capabilities, thereby making it possible to understand model-intrinsic forgetting. Through the analysis of IV dynamics pre- and post-training, we suggest that fine-tuning mostly adds specialized reasoning patterns instead of erasing previous skills, which may appear as forgetting. Building on this insight, we develop IV-guided training, which aims to preserve the original computation graph, thereby mitigating catastrophic forgetting. Empirical tests on three benchmarks confirm the efficacy of this new approach, supporting the relationship between IVs and forgetting. Our code will be made available soon.
Updated: 2024-06-18 03:05:08
Domains: cs.AI
PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer
Amidst the surge in deep learning-based password guessing models, challenges of generating high-quality passwords and reducing duplicate passwords persist. To address these challenges, we present PagPassGPT, a password guessing model constructed on Generative Pretrained Transformer (GPT). It can perform pattern guided guessing by incorporating pattern structure information as background knowledge, resulting in a significant increase in the hit rate. Furthermore, we propose D&C-GEN to reduce the repeat rate of generated passwords, which adopts the concept of a divide-and-conquer approach. The primary task of guessing passwords is recursively divided into non-overlapping subtasks. Each subtask inherits the knowledge from the parent task and predicts succeeding tokens. In comparison to the state-of-the-art model, our proposed scheme exhibits the capability to correctly guess 12% more passwords while producing 25% fewer duplicates.
Updated: 2024-06-18 03:05:01
Domains: cs.CR,cs.AI
Towards Goal-oriented Prompt Engineering for Large Language Models: A Survey
Large Language Models (LLMs) have shown prominent performance in various downstream tasks, and prompt engineering plays a pivotal role in optimizing LLMs' performance. This paper not only provides an overview of current prompt engineering methods but also aims to highlight the limitations of designing prompts based on an anthropomorphic assumption that expects LLMs to think like humans. From our review of 36 representative studies, we demonstrate that a goal-oriented prompt formulation, which guides LLMs to follow established human logical thinking, significantly improves the performance of LLMs. Furthermore, we introduce a novel taxonomy that categorizes goal-oriented prompting methods into five interconnected stages, and we demonstrate the broad applicability of our framework. With four future directions proposed, we hope to further emphasize the power and potential of goal-oriented prompt engineering in all fields.
Updated: 2024-06-18 02:58:37
Domains: cs.CL,cs.AI
Edge Classification on Graphs: New Directions in Topological Imbalance
Recent years have witnessed the remarkable success of applying graph machine learning (GML) to node/graph classification and link prediction. However, edge classification, a task with numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. We identify a novel 'Topological Imbalance Issue', which arises from the skewed distribution of edges across different classes, affecting the local subgraph of each edge and harming the performance of edge classification. Inspired by recent studies in node classification showing that performance discrepancies exist across varying local structural patterns, we aim to investigate whether the performance discrepancy in topologically imbalanced edge classification can also be mitigated by characterizing the local class distribution variance. To overcome this challenge, we introduce Topological Entropy (TE), a novel topology-based metric that measures the topological imbalance for each edge. Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance. Based on this, we develop two strategies, Topological Reweighting and TE Wedge-based Mixup, to focus training on (synthetic) edges based on their TEs. While topological reweighting directly manipulates training edge weights according to TE, our wedge-based mixup interpolates synthetic edges between high-TE wedges. Ultimately, we integrate these strategies into a novel topological imbalance strategy for edge classification: TopoEdge. Through extensive experiments, we demonstrate the efficacy of our proposed strategies on newly curated datasets and thus establish a new benchmark for (imbalanced) edge classification.
Updated: 2024-06-18 02:49:25
Domains: cs.LG,cs.SI
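One plausible reading of the Topological Entropy idea, sketched below: score each edge by the entropy of the class distribution over edges sharing one of its endpoints, so locally imbalanced neighborhoods get high values. The exact definition of TE follows the paper; this helper and its data layout are illustrative.

```python
import numpy as np
from collections import Counter

def local_edge_entropy(edge, edge_class, adj_edges):
    """Entropy of the class distribution over edges adjacent to `edge`
    (i.e., sharing one of its endpoints). High entropy indicates a
    locally class-imbalanced neighborhood that is harder to classify.

    edge:       (u, v) tuple
    edge_class: dict mapping each edge tuple to its class label
    adj_edges:  dict mapping each node to the set of edges incident to it
    """
    u, v = edge
    neighbors = (adj_edges[u] | adj_edges[v]) - {edge}
    if not neighbors:
        return 0.0
    counts = np.array(list(Counter(edge_class[e] for e in neighbors).values()),
                      dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```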
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
Binary code analysis is the foundation of crucial tasks in the security domain; thus building effective binary analysis techniques is more important than ever. Large language models (LLMs) although have brought impressive improvement to source code tasks, do not directly generalize to assembly code due to the unique challenges of assembly: (1) the low information density of assembly and (2) the diverse optimizations in assembly code. To overcome these challenges, this work proposes a hierarchical attention mechanism that builds attention summaries to capture the semantics more effectively, and designs contrastive learning objectives to train LLMs to learn assembly optimization. Equipped with these techniques, this work develops Nova, a generative LLM for assembly code. Nova outperforms existing techniques on binary code decompilation by up to 146.54%, and outperforms the latest binary code similarity detection techniques by up to 6.17%, showing promising abilities on both assembly generation and understanding tasks.
Updated: 2024-06-18 02:48:16
Domains: cs.SE,cs.AI
Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities
Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing information from document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle to notice and utilize scattered key information, a problem known as the "lost-in-the-middle" syndrome. Therefore, we typically need to restructure the content for the LLM to recognize the key information. We propose Refiner, an end-to-end extract-and-restructure paradigm that operates in the post-retrieval process of RAG. Refiner leverages a single decoder-only LLM to adaptively extract query-relevant contents verbatim along with the necessary context, and to section them based on their interconnectedness, thereby highlighting information distinctions and aligning downstream LLMs with the original context effectively. Experiments show that a trained Refiner (with 7B parameters) yields significant gains for downstream LLMs in answer accuracy, and outperforms other state-of-the-art advanced RAG and concurrent compressing approaches in various single-hop and multi-hop QA tasks. Notably, Refiner achieves an 80.5% token reduction and a 1.6-7.0% improvement margin in multi-hop tasks compared to the next best solution. Refiner is a plug-and-play solution that can be seamlessly integrated with RAG systems, facilitating its application across diverse open-source frameworks.
Updated: 2024-06-18 02:44:27
Domains: cs.CL,cs.AI
BadSampler: Harnessing the Power of Catastrophic Forgetting to Poison Byzantine-robust Federated Learning
Federated Learning (FL) is susceptible to poisoning attacks, wherein compromised clients manipulate the global model by modifying local datasets or sending manipulated model updates. Experienced defenders can readily detect and mitigate the poisoning effects of malicious behaviors using Byzantine-robust aggregation rules. However, the exploration of poisoning attacks in scenarios where such behaviors are absent remains largely unexplored for Byzantine-robust FL. This paper addresses the challenging problem of poisoning Byzantine-robust FL by introducing catastrophic forgetting. To fill this gap, we first formally define generalization error and establish its connection to catastrophic forgetting, paving the way for the development of a clean-label data poisoning attack named BadSampler. This attack leverages only clean-label data (i.e., without poisoned data) to poison Byzantine-robust FL and requires the adversary to selectively sample training data with high loss to feed model training and maximize the model's generalization error. We formulate the attack as an optimization problem and present two elegant adversarial sampling strategies, Top-$\kappa$ sampling, and meta-sampling, to approximately solve it. Additionally, our formal error upper bound and time complexity analysis demonstrate that our design can preserve attack utility with high efficiency. Extensive evaluations on two real-world datasets illustrate the effectiveness and performance of our proposed attacks.
Updated: 2024-06-18 02:43:56
Domains: cs.CR,cs.AI,cs.LG
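The Top-$\kappa$ strategy reduces to a one-line selection rule once per-example losses under the current global model are available; the batch size and the toy loss distribution here are placeholders.

```python
import numpy as np

def top_kappa_batch(losses, kappa):
    """Top-kappa adversarial sampling: a compromised client trains only
    on its kappa highest-loss *clean* examples, steering the global
    model toward catastrophic forgetting without any poisoned data.
    losses: per-example losses under the current global model.
    """
    return np.argsort(losses)[-kappa:]  # indices of the hardest examples

losses = np.random.default_rng(0).exponential(size=100)
batch_idx = top_kappa_batch(losses, kappa=16)
```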
Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
Transformers have established themselves as the leading neural network model in natural language processing and are increasingly foundational in various domains. In vision, the MLP-Mixer model has demonstrated competitive performance, suggesting that attention mechanisms might not be indispensable. Inspired by this, recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers. However, the theoretical framework for these models remains underdeveloped. This paper proposes a novel perspective by integrating Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the entire Transformer block, encompassing token-/channel-mixing modules, layer normalization, and skip connections, as a single Hopfield network. This approach yields a parallelized MLP-Mixer derived from a three-layer Hopfield network, which naturally incorporates symmetric token-/channel-mixing modules and layer normalization. Empirical studies reveal that symmetric interaction matrices in the model hinder performance in image recognition tasks. Introducing symmetry-breaking effects transitions the performance of the symmetric parallelized MLP-Mixer to that of the vanilla MLP-Mixer. This indicates that during standard training, weight matrices of the vanilla MLP-Mixer spontaneously acquire a symmetry-breaking configuration, enhancing their effectiveness. These findings offer insights into the intrinsic properties of Transformers and MLP-Mixers and their theoretical underpinnings, providing a robust framework for future model design and optimization.
Updated: 2024-06-18 02:42:19
Domains: cs.LG,cond-mat.dis-nn,cs.CV,cs.NE,stat.ML
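A toy illustration of the symmetry-breaking experiment, under simplifying assumptions (a single tanh token-mixing step and a fixed perturbation scale): a Hopfield-style energy implies a symmetric mixing matrix, and adding an asymmetric perturbation emulates the configuration the paper suggests vanilla MLP-Mixers acquire during training.

```python
import numpy as np

def token_mix(X, W):
    """One token-mixing step of a Mixer-style block: mix information
    across the token axis of X (tokens, channels) with weight W."""
    return np.tanh(W @ X)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_sym = 0.5 * (W + W.T)                             # symmetric mixing matrix
W_broken = W_sym + 0.1 * rng.normal(size=W.shape)   # symmetry-breaking perturbation

X = rng.normal(size=(64, 32))
out_sym = token_mix(X, W_sym)        # the variant found to underperform
out_broken = token_mix(X, W_broken)  # behaves like a vanilla MLP-Mixer layer
```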
DefSent+: Improving sentence embeddings of language models by projecting definition sentences into a quasi-isotropic or isotropic vector space of unlimited dictionary entries
This paper presents a significant improvement over the previous conference paper known as DefSent. The prior study seeks to improve sentence embeddings of language models by projecting definition sentences into the vector space of dictionary entries. We discover that this approach is not fully explored due to the methodological limitation of using word embeddings of language models to represent dictionary entries. This leads to two hindrances. First, dictionary entries are constrained by the single-word vocabulary, and thus cannot be fully exploited. Second, semantic representations of language models are known to be anisotropic, but pre-processing word embeddings for DefSent is not allowed because its weight is frozen during training and tied to the prediction layer. In this paper, we propose a novel method to progressively build entry embeddings not subject to these limitations. As a result, definition sentences can be projected into a quasi-isotropic or isotropic vector space of unlimited dictionary entries, so that sentence embeddings of noticeably better quality are attainable. We abbreviate our approach as DefSent+ (a plus version of DefSent), which has the following strengths: 1) task performance on measuring sentence similarities is significantly improved compared to DefSent; 2) when DefSent+ is used to further train data-augmented models like SIMCSE, SNCSE, and SynCSE, state-of-the-art performance on measuring sentence similarities can be achieved among approaches that do not use manually labeled datasets; 3) DefSent+ is also competitive in feature-based transfer for NLP downstream tasks.
Updated: 2024-06-18 02:40:11
Domains: cs.CL,cs.AI,cs.LG
Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions
Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions. This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information. Utilizing the HEXACO personality framework, our study examines the consistency of LLMs in recovering and predicting underlying (latent) personality dimensions from simple descriptions. Our experiments reveal a significant degree of consistency in personality reconstruction, although some inconsistencies and biases, such as a tendency to default to positive traits in the absence of explicit information, are also observed. Additionally, socio-demographic factors like age and number of children were found to influence the reconstructed personality dimensions. These findings have implications for building sophisticated agent-based simulacra using LLMs and highlight the need for further research on robust personality generation in LLMs.
Updated: 2024-06-18 02:32:57
Domains: cs.CL,cs.AI
LLM-Oracle Machines
Contemporary AI applications leverage large language models (LLMs) for their knowledge and inference capabilities in natural language processing tasks. This approach aligns with the concept of oracle Turing machines (OTMs). To capture the essence of these computations, including those desired but not yet in practice, we extend the notion of OTMs by employing a cluster of LLMs as the oracle. We present four variants: basic, augmented, fault-avoidance, and $\epsilon$-fault. The first two variants are commonly observed, whereas the latter two are specifically designed to ensure reliable outcomes by addressing LLM hallucinations, biases, and inconsistencies.
Updated: 2024-06-18 02:25:33
Domains: cs.CL,cs.AI,cs.FL,F.1.1; F.4.1; I.2.0
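A minimal sketch of how an $\epsilon$-fault variant might behave, assuming each cluster member is a text-in/text-out callable: answer only when a $(1-\epsilon)$ majority of the cluster agrees, otherwise abstain. The interface and acceptance rule are assumptions, not the paper's formal definition.

```python
from collections import Counter

def epsilon_fault_oracle(query, llms, epsilon=0.2):
    """Query every LLM in the cluster and accept the majority answer
    only if its support reaches (1 - epsilon) of the cluster, guarding
    against hallucinations and inconsistencies of individual members.
    `llms`: list of callables mapping a prompt string to an answer string.
    """
    answers = [llm(query) for llm in llms]
    best, votes = Counter(answers).most_common(1)[0]
    return best if votes >= (1 - epsilon) * len(llms) else None  # None = abstain
```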
Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization
Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees demonstrate communication inefficiency due to the following issues: (1) They suffer from huge communication overhead in securely splitting a dataset with continuous attributes. (2) They suffer from huge communication overhead due to performing almost all the computations on a large ring to accommodate the secure computations for the splitting criterion. In this paper, we are motivated to present an efficient three-party training framework, namely Ents, for decision trees by communication optimization. For the first issue, we present a series of training protocols based on the secure radix sort protocols to efficiently and securely split a dataset with continuous attributes. For the second issue, we propose an efficient share conversion protocol to convert shares between a small ring and a large ring to reduce the communication overhead incurred by performing almost all the computations on a large ring. Experimental results from eight widely used datasets show that Ents outperforms state-of-the-art frameworks by $5.5\times \sim 9.3\times$ in communication sizes and $3.9\times \sim 5.3\times$ in communication rounds. In terms of training time, Ents yields an improvement of $3.5\times \sim 6.7\times$. To demonstrate its practicality, Ents requires less than three hours to securely train a decision tree on a widely used real-world dataset (Skin Segmentation) with more than 245,000 samples in the WAN setting.
Updated: 2024-06-18 02:24:16
Domains: cs.CR,cs.AI
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
Large Language Models (LLMs) and Multi-Modal LLMs (MLLMs) have played a critical role in numerous applications. However, current LLMs are vulnerable to prompt-based attacks, with jailbreaking attacks enabling LLMs to generate harmful content, while hijacking attacks manipulate the model to perform unintended tasks, underscoring the necessity for detection methods. Unfortunately, existing detecting approaches are usually tailored to specific attacks, resulting in poor generalization in detecting various attacks across different modalities. To address it, we propose JailGuard, a universal detection framework for jailbreaking and hijacking attacks across LLMs and MLLMs. JailGuard operates on the principle that attacks are inherently less robust than benign ones, regardless of method or modality. Specifically, JailGuard mutates untrusted inputs to generate variants and leverages the discrepancy of the variants' responses on the model to distinguish attack samples from benign samples. We implement 18 mutators for text and image inputs and design a mutator combination policy to further improve detection generalization. To evaluate the effectiveness of JailGuard, we build the first comprehensive multi-modal attack dataset, containing 11,000 data items across 15 known attack types. The evaluation suggests that JailGuard achieves the best detection accuracy of 86.14%/82.90% on text and image inputs, outperforming state-of-the-art methods by 11.81%-25.73% and 12.20%-21.40%.
Updated: 2024-06-18 02:21:02
Domains: cs.CR
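A stripped-down version of the detection principle, with simplifying assumptions: word-set Jaccard disagreement stands in for the paper's divergence measure, and `model` and `mutators` are hypothetical callables.

```python
import random

def jailguard_style_detect(prompt, model, mutators, threshold=0.5, k=8):
    """Attacks are less robust than benign inputs, so responses to
    mutated variants of an attack prompt should scatter more. `model`
    maps text to text; each mutator perturbs text (e.g., paraphrase,
    token drop); high disagreement flags the input as an attack."""
    variants = [random.choice(mutators)(prompt) for _ in range(k)]
    responses = [model(v) for v in variants]

    def disagreement(rs):
        sets = [set(r.lower().split()) for r in rs]
        pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
        jac = [len(a & b) / max(1, len(a | b)) for a, b in pairs]
        return 1.0 - sum(jac) / max(1, len(jac))

    return disagreement(responses) > threshold  # True means flag as attack
```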
Bayesian Networks and Machine Learning for COVID-19 Severity Explanation and Demographic Symptom Classification
Despite the prevailing efforts to combat the coronavirus disease 2019 (COVID-19) pandemic, there are still unknowns about its spread, future impact, and resurgence. In this paper, we present a three-stage data-driven approach to distill the hidden information about COVID-19. The first stage employs a Bayesian network structure learning method to identify the causal relationships among COVID-19 symptoms and their intrinsic demographic variables. In the second stage, the output from the Bayesian network structure learning serves as a useful guide to train an unsupervised machine learning (ML) algorithm that uncovers the similarities in patients' symptoms through clustering. The final stage then leverages the labels obtained from clustering to train a demographic symptom identification (DSID) model which predicts a patient's symptom class and the corresponding demographic probability distribution. We applied our method to the COVID-19 dataset obtained from the Centers for Disease Control and Prevention (CDC) in the United States. Results from the experiments show a testing accuracy of 99.99%, against the 41.15% accuracy of a heuristic ML method. This strongly demonstrates the viability of our Bayesian network and ML approach in understanding the relationships between virus symptoms and providing insights on patient stratification towards reducing the severity of the virus.
Updated: 2024-06-18 02:20:19
Domains: stat.ML,cs.AI,cs.LG,stat.AP
When Graph Neural Network Meets Causality: Opportunities, Methodologies and An Outlook
Graph Neural Networks (GNNs) have emerged as powerful representation learning tools for capturing complex dependencies within diverse graph-structured data. Despite their success in a wide range of graph mining tasks, GNNs have raised serious concerns regarding their trustworthiness, including susceptibility to distribution shift, biases towards certain populations, and lack of explainability. Recently, integrating causal learning techniques into GNNs has sparked numerous ground-breaking studies since many GNN trustworthiness issues can be alleviated by capturing the underlying data causality rather than superficial correlations. In this survey, we comprehensively review recent research efforts on Causality-Inspired GNNs (CIGNNs). Specifically, we first employ causal tools to analyze the primary trustworthiness risks of existing GNNs, underscoring the necessity for GNNs to comprehend the causal mechanisms within graph data. Moreover, we introduce a taxonomy of CIGNNs based on the type of causal learning capability they are equipped with, i.e., causal reasoning and causal representation learning. Besides, we systematically introduce typical methods within each category and discuss how they mitigate trustworthiness risks. Finally, we summarize useful resources and discuss several future directions, hoping to shed light on new research opportunities in this emerging field. The representative papers, along with open-source data and codes, are available in https://github.com/usail-hkust/Causality-Inspired-GNNs.
Updated: 2024-06-18 02:19:31
Domains: cs.LG,cs.AI,stat.ME
Jacobian-Enhanced Neural Networks
Jacobian-Enhanced Neural Networks (JENN) are densely connected multi-layer perceptrons, whose training process is modified to predict partial derivatives accurately. Their main benefit is better accuracy with fewer training points compared to standard neural networks. These attributes are particularly desirable in the field of computer-aided design, where there is often the need to replace computationally expensive, physics-based models with fast running approximations, known as surrogate models or meta-models. Since a surrogate emulates the original model accurately in near-real time, it yields a speed benefit that can be used to carry out orders of magnitude more function calls quickly. However, in the special case of gradient-enhanced methods, there is the additional value proposition that partial derivatives are accurate, which is a critical property for one important use-case: surrogate-based optimization. This work derives the complete theory and exemplifies its superiority over standard neural nets for surrogate-based optimization.
Updated: 2024-06-18 02:15:18
Domains: cs.LG
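The gradient-enhanced objective is easy to state concretely. This sketch assumes a scalar-output PyTorch model and a weighting factor `alpha`, both illustrative choices rather than the paper's exact setup.

```python
import torch

def jenn_loss(model, x, y, dy_dx, alpha=1.0):
    """Gradient-enhanced training objective: fit the function values and
    their partial derivatives, the property that matters most for
    surrogate-based optimization.

    model: any torch.nn.Module mapping (B, n) inputs to (B, 1) outputs
    x:     (B, n) inputs;  y: (B, 1) targets
    dy_dx: (B, n) target Jacobian rows from the physics-based model
    """
    x = x.requires_grad_(True)
    pred = model(x)
    # d(pred_i)/d(x_i) for each sample, kept differentiable for backprop
    grad = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
    return ((pred - y) ** 2).mean() + alpha * ((grad - dy_dx) ** 2).mean()
```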
Knowledge Fusion By Evolving Weights of Language Models
Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at https://github.com/duguodong7/model-evolution.
Updated: 2024-06-18 02:12:34
Domains: cs.CL,cs.AI,cs.CV,cs.NE
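A minimal sketch of one training-free generation, assuming flattened weight vectors and a hypothetical `fitness` evaluator on a development set; Evolver's actual mutation, crossover, and selection rules follow the paper.

```python
import numpy as np

def evolve_weights(parents, fitness, n_children=4, sigma=0.01, rng=None):
    """One generation of training-free weight evolution: crossover by a
    random convex combination of two parent weight vectors, Gaussian
    mutation, then keep children only if they beat both parents on the
    dev set. `parents`: list of flattened model weight vectors;
    `fitness`: callable mapping a weight vector to a dev-set score.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    pool = list(parents)
    for _ in range(n_children):
        a, b = rng.choice(len(parents), size=2, replace=False)
        lam = rng.uniform()
        child = lam * parents[a] + (1 - lam) * parents[b]    # crossover
        child += rng.normal(scale=sigma, size=child.shape)   # mutation
        if fitness(child) > max(fitness(parents[a]), fitness(parents[b])):
            pool.append(child)                               # selection
    return pool
```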
AutoSurvey: Large Language Models Can Automatically Write Surveys
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness. We open our resources at https://github.com/AutoSurveys/AutoSurvey.
Updated: 2024-06-18 02:11:31
Domains: cs.IR,cs.AI,cs.CL
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.
Updated: 2024-06-18 02:10:15
标题: 利用检索增强的大型语言模型,通过检索与自我反思改进医学推理
摘要: 最近,一些专有的大型语言模型(LLMs),如GPT-4,已经在解决生物医学领域的各种挑战方面取得了里程碑式的成就,从多项选择题到长篇生成。为了解决LLMs编码知识仍无法处理的挑战,人们开发了各种检索增强生成(RAG)方法,通过从知识语料库中搜索文档并无条件或有选择地将其附加到LLMs的输入中进行生成。然而,将现有方法应用于不同领域特定问题时,普遍存在泛化能力较差的问题,导致获取不正确的文档或做出不准确的判断。在本文中,我们介绍了Self-BioRAG,这是一个可靠的生物医学文本框架,专门用于生成解释,检索领域特定文档和自我反思生成的响应。我们利用84k个经过滤的生物医学指令集来训练Self-BioRAG,该模型可以使用定制的反射标记评估其生成的解释。我们的工作证明了,遵循领域相关指令需要领域特定的组件,如检索器、与领域相关的文档语料库和指令集。使用三个主要的医学问答基准数据集,Self-BioRAG的实验结果表明,在参数大小为7B或更小的情况下,平均性能较最先进的开放式基础模型绝对提高了7.2%。总体而言,我们分析认为,Self-BioRAG在问题中找到线索,如有需要检索相关文档,并像医学专家一样理解如何借助来自检索文档和编码知识的信息来回答。我们发布了用于训练框架组件的数据和代码以及模型权重(7B和13B),以增强在生物医学和临床领域的能力。
更新时间: 2024-06-18 02:10:15
领域: cs.CL,cs.AI,cs.IR
Generative Pretrained Hierarchical Transformer for Time Series Forecasting
Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. Firstly, these methods often rely on a single dataset for training, limiting the model's generalizability due to the restricted scale of the training data. Secondly, the one-step generation schema is widely followed, which necessitates a customized forecasting head and overlooks the temporal dependencies in the output series, and also leads to increased training costs under different horizon length settings. To address these issues, we propose a novel generative pretrained hierarchical transformer architecture for forecasting, named \textbf{GPHT}. There are two aspects of key designs in GPHT. On the one hand, we advocate for constructing a mixed dataset under the channel-independent assumption for pretraining our model, comprising various datasets from diverse data scenarios. This approach significantly expands the scale of training data, allowing our model to uncover commonalities in time series data and facilitating improved transfer to specific datasets. On the other hand, GPHT employs an auto-regressive forecasting approach, effectively modeling temporal dependencies in the output series. Importantly, no customized forecasting head is required, enabling \textit{a single model to forecast at arbitrary horizon settings.} We conduct sufficient experiments on eight datasets with mainstream self-supervised pretraining models and supervised models. The results demonstrated that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task. We make our codes publicly available\footnote{https://github.com/icantnamemyself/GPHT}.
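To make the horizon-agnostic claim concrete, here is a minimal sketch of autoregressive rollout forecasting, not the authors' code: a single next-patch predictor is applied repeatedly and its outputs are fed back in, so one model serves any horizon without a dedicated forecasting head. The `model` callable and the patch length are illustrative assumptions.

```python
import numpy as np

def autoregressive_forecast(model, history, horizon, patch_len=4):
    """Roll a next-patch predictor forward to an arbitrary horizon.

    `model` maps a 1-D context array to the next `patch_len` values;
    predictions are appended to the context and reused, so no
    horizon-specific head is ever needed."""
    context = list(history)
    out = []
    while len(out) < horizon:
        patch = model(np.asarray(context))[:patch_len]
        out.extend(patch)          # collect predictions
        context.extend(patch)      # feed them back as new context
    return np.asarray(out[:horizon])

# toy stand-in "model": repeat the last observed patch
toy = lambda ctx: ctx[-4:]
print(autoregressive_forecast(toy, np.arange(16.0), horizon=10))
```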
Updated: 2024-06-18 02:09:45
标题: 基于生成式预训练分层Transformer的时间序列预测
摘要: 最近的努力致力于通过引入先进的网络架构和自监督预训练策略来提高时间序列预测的准确性。然而,现有方法仍然存在两个关键缺点。首先,这些方法通常依赖于单一数据集进行训练,由于训练数据的规模受限,限制了模型的泛化能力。其次,广泛采用一步生成模式,这需要一个定制的预测头,忽略了输出序列中的时间依赖性,并且在不同的时间跨度设置下导致了增加的训练成本。 为了解决这些问题,我们提出了一种新颖的用于预测的生成式预训练分层transformer架构,称为\textbf{GPHT}。GPHT的关键设计有两个方面。一方面,我们主张在通道独立假设下构建混合数据集来预训练我们的模型,包括来自不同数据场景的各种数据集。这种方法显着扩展了训练数据的规模,使我们的模型能够发现时间序列数据中的共性,并有助于更好地转移到特定数据集。另一方面,GPHT采用自回归预测方法,有效地建模输出序列中的时间依赖性。重要的是,不需要定制的预测头,使得\textit{一个单一模型可以在任意时间跨度设置下进行预测。}我们在八个数据集上进行了充分的实验,使用主流的自监督预训练模型和监督模型。结果表明,在传统的长期预测任务中,GPHT在各种微调和零/少样本学习设置下都超过了基线模型。我们将我们的代码公开发布\footnote{https://github.com/icantnamemyself/GPHT}。
更新时间: 2024-06-18 02:09:45
领域: cs.LG
On the Empirical Complexity of Reasoning and Planning in LLMs
Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLM in solving reasoning tasks in practice.
Updated: 2024-06-18 02:03:35
标题: 关于LLMs中推理和规划的经验复杂性
摘要: 思维链(CoT)、思维树(ToT)及相关技术在实践中对一些复杂的推理任务与大型语言模型(LLMs)表现出惊人的效果,但为什么呢?本研究通过进行实验案例研究,将表现优势与机器学习中已建立的样本复杂性和计算复杂性原则联系起来,以寻找潜在原因。我们对6个推理任务进行了实验,涵盖了从小学数学、航空旅行规划,到Blocksworld。结果表明,(i)CoT和ToT都受益于任务分解,将复杂的推理任务分解为一系列具有低样本复杂性的步骤,并明确勾画出推理结构;(ii)对于计算困难的推理任务,ToT更复杂的树结构优于CoT的线性结构。这些发现为在实践中利用LLM解决推理任务提供了有用的指导。
更新时间: 2024-06-18 02:03:35
领域: cs.AI,cs.LG
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback
We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the {\em simple regret}. We propose an algorithm, \underline{RL} with \underline{L}ocally \underline{O}ptimal \underline{W}eights or {\sc RL-LOW}, which yields a simple regret of $\exp ( - \Omega(n/H) )$ where $n$ is the number of data samples and $H$ denotes an instance-dependent hardness quantity that depends explicitly on the suboptimality gap of each action. Furthermore, we derive a first-of-its-kind instance-dependent lower bound in offline RL with preference feedback. Interestingly, we observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of {\sc RL-LOW}. In view of privacy considerations in practical applications, we also extend {\sc RL-LOW} to the setting of $(\varepsilon,\delta)$-differential privacy and show, somewhat surprisingly, that the hardness parameter $H$ is unchanged in the asymptotic regime as $n$ tends to infinity; this underscores the inherent efficiency of {\sc RL-LOW} in terms of preserving the privacy of the observed rewards. Given our focus on establishing instance-dependent bounds, our work stands in stark contrast to previous works that focus on establishing worst-case regrets for offline RL with preference feedback.
Updated: 2024-06-18 02:03:12
标题: 具有偏好反馈的离线强化学习的按阶最优实例相关界限
摘要: 我们考虑具有偏好反馈的离线强化学习(RL),其中隐含奖励是一个未知参数的线性函数。给定一个离线数据集,我们的目标是确定每个状态的最佳动作,最终目标是最小化{\em 简单遗憾}。我们提出了一种算法,即具有局部最优权重的RL({\sc RL-LOW}),该算法产生的简单遗憾为$\exp(-\Omega(n/H))$,其中$n$是数据样本的数量,$H$表示一个依赖于实例的难度量,它明确取决于每个动作的次优性差距。此外,我们推导了首个具有偏好反馈的离线RL的实例相关下界。有趣的是,我们观察到简单遗憾的上下界在指数上按阶匹配,表明{\sc RL-LOW}在阶的意义上是最优的。考虑到实际应用中的隐私问题,我们还将{\sc RL-LOW}扩展到$(\varepsilon,\delta)$-差分隐私设置,并且有些令人惊讶地发现,在$n$趋于无穷的渐近情形下,难度参数$H$保持不变;这突显了{\sc RL-LOW}在保护所观察奖励的隐私方面的内在效率。鉴于我们专注于建立实例相关界限,我们的工作与先前专注于建立具有偏好反馈的离线RL最坏情况遗憾的工作形成鲜明对比。
更新时间: 2024-06-18 02:03:12
领域: cs.LG,cs.AI,cs.IT,math.IT,math.ST,stat.ML,stat.TH
An Optimal Transport Approach for Network Regression
We study the problem of network regression, where one is interested in how the topology of a network changes as a function of Euclidean covariates. We build upon recent developments in generalized regression models on metric spaces based on Fr\'echet means and propose a network regression method using the Wasserstein metric. We show that when representing graphs as multivariate Gaussian distributions, the network regression problem requires the computation of a Riemannian center of mass (i.e., Fr\'echet means). Fr\'echet means with non-negative weights translates into a barycenter problem and can be efficiently computed using fixed point iterations. Although the convergence guarantees of fixed-point iterations for the computation of Wasserstein affine averages remain an open problem, we provide evidence of convergence in a large number of synthetic and real-data scenarios. Extensive numerical results show that the proposed approach improves existing procedures by accurately accounting for graph size, topology, and sparsity in synthetic experiments. Additionally, real-world experiments using the proposed approach result in higher Coefficient of Determination ($R^{2}$) values and lower mean squared prediction error (MSPE), cementing improved prediction capabilities in practice.
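To show the computational core the abstract alludes to, the sketch below implements the standard fixed-point iteration from the Gaussian optimal-transport literature for the 2-Wasserstein barycenter covariance of centered Gaussians; it is the textbook iteration, not the authors' release, and the initialization and tolerance are our choices.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_barycenter(covs, weights, iters=200, tol=1e-10):
    """Fixed-point map S <- S^{-1/2} (sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2})^2 S^{-1/2}."""
    S = np.mean(covs, axis=0)                      # init: Euclidean average
    for _ in range(iters):
        R = np.real(sqrtm(S))                      # S^{1/2}
        Rinv = np.linalg.inv(R)
        T = sum(w * np.real(sqrtm(R @ C @ R)) for w, C in zip(weights, covs))
        S_next = Rinv @ T @ T @ Rinv
        if np.linalg.norm(S_next - S) < tol:
            return S_next
        S = S_next
    return S

print(bures_barycenter([np.diag([1.0, 2.0]), np.diag([3.0, 0.5])], [0.5, 0.5]))
```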
Updated: 2024-06-18 02:03:07
标题: 一种网络回归的最优输运方法
摘要: 我们研究了网络回归问题,其中我们关心的是网络的拓扑结构如何随欧氏协变量的变化而变化。我们建立在最近在基于Fr\'echet均值的度量空间上的广义回归模型的发展基础上,提出了一种使用Wasserstein度量的网络回归方法。我们展示了当将图表示为多变量高斯分布时,网络回归问题需要计算黎曼质心(即Fr\'echet均值)。带有非负权重的Fr\'echet均值转化为一个重心问题,并且可以通过固定点迭代有效地计算。尽管用于计算Wasserstein仿射平均的固定点迭代的收敛性保证仍然是一个未解决的问题,但我们提供了大量合成和真实数据场景中收敛的证据。广泛的数值结果表明,所提出的方法通过准确考虑合成实验中的图大小、拓扑和稀疏性改进了现有程序。此外,使用所提出的方法进行真实世界实验导致更高的确定系数($R^{2}$)值和更低的均方预测误差(MSPE),巩固了在实践中改进的预测能力。
更新时间: 2024-06-18 02:03:07
领域: stat.ML,cs.LG,math.OC
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Large language models (LLMs) have demonstrated the potential to mimic human social intelligence. However, most studies focus on simplistic and static self-report or performance-based tests, which limits the depth and validity of the analysis. In this paper, we developed a novel framework, InterIntent, to assess LLMs' social intelligence by mapping their ability to understand and manage intentions in a game setting. We focus on four dimensions of social intelligence: situational awareness, self-regulation, self-awareness, and theory of mind. Each dimension is linked to a specific game task: intention selection, intention following, intention summarization, and intention guessing. Our findings indicate that while LLMs exhibit high proficiency in selecting intentions, achieving an accuracy of 88\%, their ability to infer the intentions of others is significantly weaker, trailing human performance by 20\%. Additionally, game performance correlates with intention understanding, highlighting the importance of the four components towards success in this game. These findings underline the crucial role of intention understanding in evaluating LLMs' social intelligence and highlight the potential of using social deduction games as a complex testbed to enhance LLM evaluation. InterIntent contributes a structured approach to bridging the evaluation gap in social intelligence within multiplayer games.
Updated: 2024-06-18 02:02:15
标题: InterIntent:在交互式游戏情境中通过意图理解考察LLMs的社交智能
摘要: 大型语言模型(LLMs)已经展示了模仿人类社交智能的潜力。然而,大多数研究集中在简单和静态的自我报告或基于表现的测试上,这限制了分析的深度和有效性。在本文中,我们开发了一个新颖的框架InterIntent,通过在游戏设置中映射它们理解和管理意图的能力来评估LLMs的社交智能。我们关注社交智能的四个维度:情境意识、自我调节、自我意识和心智理论。每个维度都与特定的游戏任务相关联:意图选择、意图跟随、意图总结和意图猜测。我们的发现表明,虽然LLMs在选择意图方面表现出高水平的熟练度,准确率达到88%,但他们推断他人意图的能力明显较弱,比人类表现低20%。此外,游戏表现与意图理解相关,突显了这四个组成部分对游戏成功的重要性。这些发现强调了意图理解在评估LLMs社交智能中的至关重要性,并强调了使用社交推理游戏作为增强LLM评估的复杂试验平台的潜力。InterIntent提供了一个结构化方法,用于在多人游戏中填补社交智能评估的差距。
更新时间: 2024-06-18 02:02:15
领域: cs.AI
SFedCA: Credit Assignment-Based Active Client Selection Strategy for Spiking Federated Learning
Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient processing of multimedia data. However, existing spiking federated learning methods employ a random selection approach for client aggregation, assuming unbiased client participation. This neglect of statistical heterogeneity significantly affects the convergence and accuracy of the global model. In our work, we propose a credit assignment-based active client selection strategy, SFedCA, to judiciously aggregate clients that contribute to the global sample distribution balance. Specifically, the client credits are assigned by the firing intensity state before and after local model training, which reflects the local data distribution difference from the global model. Comprehensive experiments are conducted on various non-IID (non-independent and identically distributed) scenarios. The experimental results demonstrate that SFedCA outperforms the existing state-of-the-art spiking federated learning methods, and requires fewer communication rounds.
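A hedged sketch of the selection idea as we read it; the credit formula below is a stand-in (the paper derives credits from firing-intensity states before and after local training, whose exact form we do not reproduce):

```python
import numpy as np

def select_clients(firing_before, firing_after, k):
    """Credit each client by how much its mean firing intensity shifts
    over local training (a proxy for how far its data sits from the
    global distribution), then pick the top-k instead of sampling
    uniformly at random."""
    credits = np.abs(np.asarray(firing_after) - np.asarray(firing_before))
    return np.argsort(-credits)[:k]

# e.g. 5 clients, aggregate the 2 with the largest activity shift
print(select_clients([0.30, 0.32, 0.28, 0.31, 0.29],
                     [0.31, 0.45, 0.29, 0.38, 0.30], k=2))   # -> [1 3]
```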
Updated: 2024-06-18 01:56:22
标题: SFedCA:基于信用分配的脉冲联邦学习主动客户选择策略
摘要: 脉冲联邦学习是一种新兴的分布式学习范式,允许资源受限的设备在低功耗的情况下协作训练,而无需交换本地数据。它同时利用了联邦学习(FL)中的隐私计算特性和脉冲神经网络(SNN)的能源效率。因此,它极有可能革新多媒体数据的高效处理。然而,现有的脉冲联邦学习方法采用随机选择方法进行客户端聚合,假设客户端参与是无偏的。这种对统计异质性的忽视会显著影响全局模型的收敛和准确性。在我们的工作中,我们提出了一种基于信用分配的主动客户端选择策略,即SFedCA,以审慎地聚合对全局样本分布平衡有贡献的客户端。具体而言,客户端信用是根据本地模型训练前后的放电强度状态分配的,这反映了本地数据分布与全局模型的差异。我们在各种非独立同分布(non-IID)场景下进行了全面的实验。实验结果表明,SFedCA优于现有最先进的脉冲联邦学习方法,并且需要更少的通信轮次。
更新时间: 2024-06-18 01:56:22
领域: cs.LG,cs.DC,cs.ET,cs.MM,cs.NE
Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers
Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and by challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning models, including LSTM and transformer-based architectures, for predicting heart rate time series from the MIT-BIH Database. Results demonstrate that deep learning models, particularly PatchTST, significantly outperform traditional models across multiple metrics, capturing complex patterns and dependencies more effectively. This research underscores the potential of deep learning to enhance patient monitoring and CVD management, suggesting substantial clinical benefits. Future work should extend these findings to larger, more diverse datasets and real-world clinical applications to further validate and optimize model performance.
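As a concrete reference point for the traditional baselines, a minimal ARIMA forecast with statsmodels looks as follows; the synthetic series and the manual (p, d, q) choice are illustrative and also show the hand-tuning burden the study highlights.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
hr = 70 + np.cumsum(rng.normal(0, 0.5, size=300))   # synthetic heart-rate-like series

train, test = hr[:280], hr[280:]
fit = ARIMA(train, order=(2, 1, 2)).fit()           # (p, d, q) chosen by hand
pred = fit.forecast(steps=len(test))
print("MAE:", np.mean(np.abs(pred - test)))
```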
Updated: 2024-06-18 01:55:37
标题: 时间序列建模用于心率预测:从ARIMA到Transformers
摘要: 心血管疾病(CVD)是全球死亡的主要原因,需要精确的预测模型来监测心率、血压和心电图等生命体征。传统模型,如ARIMA和Prophet,受限于需要手动调参以及处理嘈杂、稀疏和高度变化的医疗数据的挑战。本研究调查了包括LSTM和基于transformer的架构在内的先进深度学习模型,用于从MIT-BIH数据库预测心率时间序列。结果表明,深度学习模型,特别是PatchTST,在多个指标上明显优于传统模型,更有效地捕捉复杂的模式和依赖关系。这项研究强调了深度学习提升患者监测和CVD管理的潜力,暗示了重大的临床益处。未来的工作应该将这些发现扩展到更大、更多样化的数据集和真实世界的临床应用,以进一步验证和优化模型性能。
更新时间: 2024-06-18 01:55:37
领域: cs.LG,cs.AI
Debate as Optimization: Adaptive Conformal Prediction and Diverse Retrieval for Event Extraction
We propose a multi-agent debate as optimization (DAO) system for event extraction, where the primary objective is to iteratively refine the large language models (LLMs) outputs through debating without parameter tuning. In DAO, we introduce two novel modules: the Diverse-RAG (DRAG) module and the Adaptive Conformal Prediction (AdaCP) module. DRAG systematically retrieves supporting information that best fits the debate discussion, while AdaCP enhances the accuracy and reliability of event extraction by effectively rejecting less promising answers. Experimental results demonstrate significant reductions in the performance gap between supervised approaches and tuning-free LLM-based methods: 18.1% and 17.8% on ACE05, and 17.9% and 15.2% on CASIE, for event detection and argument extraction respectively.
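The abstract does not publish AdaCP's internals, so the following is only a generic split-conformal sketch of the kind of rejection step such a module builds on: calibrate a threshold on held-out nonconformity scores, then reject candidate answers that score above it.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: with n calibration scores, the
    ceil((n + 1) * (1 - alpha)) / n empirical quantile gives roughly
    (1 - alpha) coverage for exchangeable data."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level)

rng = np.random.default_rng(0)
cal = rng.exponential(size=200)              # held-out nonconformity scores
tau = conformal_threshold(cal, alpha=0.1)
candidates = {"ans_a": 0.4, "ans_b": 3.5}    # lower score = more conforming
print(tau, {a: s for a, s in candidates.items() if s <= tau})  # ans_b rejected
```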
Updated: 2024-06-18 01:53:49
标题: 辩论作为优化:自适应一致预测和多样化检索用于事件提取
摘要: 我们提出了一种多代理辩论作为优化(DAO)系统,用于事件提取,其主要目标是在不进行参数调整的情况下,通过辩论迭代地改进大型语言模型(LLMs)的输出。在DAO中,我们引入了两个新颖的模块:多样化检索增强生成(Diverse-RAG,DRAG)模块和自适应一致预测(AdaCP)模块。DRAG系统地检索最适合辩论讨论的支持信息,而AdaCP通过有效地拒绝不太有希望的答案,增强了事件提取的准确性和可靠性。实验结果表明,在事件检测和论元抽取任务上,监督方法与免调参的基于LLM的方法之间的性能差距分别在ACE05上缩小了18.1%和17.8%,在CASIE上缩小了17.9%和15.2%。
更新时间: 2024-06-18 01:53:49
领域: cs.CL,cs.AI
Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration
Machine learning models traditionally assume that training and test data are independently and identically distributed. However, in real-world applications, the test distribution often differs from training. This problem, known as out-of-distribution (OOD) generalization, challenges conventional models. Invariant Risk Minimization (IRM) emerges as a solution that aims to identify invariant features across different environments to enhance OOD robustness. However, IRM's complexity, particularly its bi-level optimization, has led to the development of various approximate methods. Our study investigates these approximate IRM techniques, using the consistency and variance of calibration across environments as metrics to measure the invariance aimed for by IRM. Calibration, which measures the reliability of model prediction, serves as an indicator of whether models effectively capture environment-invariant features by showing how uniformly over-confident the model remains across varied environments. Through a comparative analysis of datasets with distributional shifts, we observe that Information Bottleneck-based IRM achieves consistent calibration across different environments. This observation suggests that information compression techniques, such as IB, are potentially effective in achieving model invariance. Furthermore, our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated. This demonstrates that invariance and cross-environment calibration are empirically equivalent. Additionally, we underscore the necessity for a systematic approach to evaluating OOD generalization. This approach should move beyond traditional metrics, such as accuracy and F1 scores, which fail to account for the model's degree of over-confidence, and instead focus on the nuanced interplay between accuracy, calibration, and model invariance.
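Since the argument rests on measuring calibration per environment, here is a compact sketch of expected calibration error (ECE) together with a cross-environment consistency proxy of the kind the study relies on; the equal-width binning and the variance proxy are our simplifications.

```python
import numpy as np

def ece(conf, correct, n_bins=10):
    """Expected calibration error: the confidence-vs-accuracy gap,
    weighted and summed over equal-width confidence bins."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            gap += m.mean() * abs(conf[m].mean() - correct[m].mean())
    return gap

def cross_env_calibration_variance(envs):
    """Variance of per-environment ECE over (confidence, correctness)
    pairs; near-zero variance is the 'consistent calibration' the
    paper links to invariance."""
    return np.var([ece(c, y) for c, y in envs])
```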
Updated: 2024-06-18 01:49:58
标题: 从校准的视角理解不变风险最小化的各种变体
摘要: 传统的机器学习模型通常假设训练和测试数据是独立且同分布的。然而,在实际应用中,测试分布往往与训练不同。这个问题被称为超出分布(OOD)泛化,挑战了传统模型。不变风险最小化(IRM)出现为一种解决方案,旨在识别跨不同环境的不变特征,以增强OOD稳健性。然而,IRM的复杂性,特别是其双层优化,导致了各种近似方法的发展。我们的研究调查了这些近似IRM技术,使用跨环境一致性和校准方差作为衡量IRM旨在实现的不变性的度量标准。校准度量模型预测的可靠性,作为模型是否有效捕获环境不变特征的指标,显示模型在各种环境中保持过度自信的一致程度。通过对具有分布偏移的数据集进行比较分析,我们观察到基于信息瓶颈的IRM实现了在不同环境中的一致校准。这一观察表明信息压缩技术,如IB,在实现模型不变性方面可能有效。此外,我们的实证证据表明,在不同环境中展现出一致校准的模型也是校准良好的。这表明,不变性和跨环境校准在经验上是等价的。此外,我们强调了对评估OOD泛化的系统性方法的必要性。这种方法应该超越传统的度量标准,如准确率和F1分数,这些标准未能考虑模型过度自信的程度,而应该专注于准确性、校准度和模型不变性之间微妙的相互作用。
更新时间: 2024-06-18 01:49:58
领域: cs.LG
Quantum Compiling with Reinforcement Learning on a Superconducting Processor
To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those by the conventional method. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.
Updated: 2024-06-18 01:49:48
标题: 超导处理器上的强化学习量子编译
摘要: 在现代量子技术中,有效地在噪声中间规模量子(NISQ)处理器上实施量子算法是一个核心任务。NISQ处理器具有几十到几百个嘈杂的量子比特,具有有限的相干时间和具有错误的门操作,因此NISQ算法自然需要通过量子编译使用短长度的电路。在这里,我们为一个超导处理器开发了一种基于强化学习(RL)的量子编译器,并展示了它发现新颖且适合硬件的短长度电路的能力。我们展示了对于三比特量子傅里叶变换,可以实现仅使用七个CZ门并具有单位电路保真度的编译电路。该编译器还能够在设备拓扑约束下找到最佳电路,其长度明显比传统方法短。我们的研究示范了软件与硬件协同设计以实现高效的量子编译,为基于强化学习的编译器的进步提供了宝贵的见解。
更新时间: 2024-06-18 01:49:48
领域: quant-ph,cs.LG
Review and Prospect of Algebraic Research in Equivalent Framework between Statistical Mechanics and Machine Learning Theory
Mathematical equivalence between statistical mechanics and machine learning theory has been known since the 20th century, and research based on such equivalence has provided novel methodology in both theoretical physics and statistical learning theory. For example, algebraic approaches in statistical mechanics, such as operator algebras, enable us to analyze phase transition phenomena mathematically. In this paper, for theoretical physicists who are interested in artificial intelligence, we review and offer prospects for algebraic research in machine learning theory. If a learning machine has hierarchical structure or latent variables, then the random Hamiltonian cannot be expressed by any quadratic perturbation because it has singularities. To study an equilibrium state defined by such a singular random Hamiltonian, an algebraic approach is necessary to derive the asymptotic form of the free energy and the generalization error. We also introduce the most recent advance: a theoretical foundation for the alignment of artificial intelligence is now being constructed based on algebraic learning theory. This paper is devoted to the memory of Professor Huzihiro Araki, a pioneering founder of algebraic research in both statistical mechanics and quantum field theory.
Updated: 2024-06-18 01:49:17
标题: 《统计力学和机器学习理论等价框架中代数研究的回顾与展望》
摘要: 20世纪以来,统计力学和机器学习理论之间的数学等价性已经被认识到,基于这种等价性的研究在理论物理和统计学习理论中提供了新的方法论。例如,统计力学中的代数方法,如算子代数,使我们能够数学地分析相变现象。本文针对对人工智能感兴趣的理论物理学家,回顾并展望机器学习理论中的代数研究。如果一个学习机器具有分层结构或潜在变量,那么随机哈密顿量无法通过任何二次扰动来表示,因为它具有奇点。为了研究由这种奇异随机哈密顿量定义的平衡态,需要采用代数方法来推导自由能和泛化误差的渐近形式。我们还介绍了最近的进展:事实上,人工智能对齐的理论基础目前正基于代数学习理论构建。本文致力于纪念荒木不二洋(Huzihiro Araki)教授,他是统计力学和量子场论中代数研究的先驱创始人。
更新时间: 2024-06-18 01:49:17
领域: cond-mat.stat-mech,cs.LG,math.ST,stat.ML,stat.TH
Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection
Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet uncorrelated features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.
Updated: 2024-06-18 01:47:38
标题: 自适应协同相关性学习的半监督多标签特征选择
摘要: 最近,半监督多标签特征选择方法已经被发展出来,用于解决高维多标签数据中存在某些样本缺失标签的维度灾难问题。尽管已经做出了许多努力,大多数现有方法使用预定义的图方法来捕获样本相似性或标签相关性。在这种情况下,原始特征空间中的噪声和异常值可能会削弱生成的样本相似性图的可靠性。由于存在未知标签,它也无法准确描述标签相关性。此外,这些方法只考虑选定特征的区分能力,而忽略了它们的冗余性。在本文中,我们提出了一种基于自适应协作相关性学习的半监督多标签特征选择(Access-MFS)方法来解决这些问题。具体地,引入了一个配备扩展非相关约束的广义回归模型,以选择具有区分性但不相关的特征,并同时在标记数据中保持预测和地面真实标签之间的一致性。然后,将实例相关性和标签相关性整合到所提出的回归模型中,以自适应地学习样本相似性图和标签相似性图,从而相互增强特征选择性能。大量实验结果表明,所提出的Access-MFS方法优于其他最先进的方法。
更新时间: 2024-06-18 01:47:38
领域: cs.LG
Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems where an agent aims to maximize the entropy regularized value function. Despite its empirical success, there have been limited theoretical studies of soft Q-learning to date. This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms. We focus on two types of soft Q-learning algorithms: one utilizing the log-sum-exp operator and the other employing the Boltzmann operator. By using dynamical switching system models, we derive novel finite-time error bounds for both soft Q-learning algorithms. We hope that our analysis will deepen the current understanding of soft Q-learning by establishing connections with switching system models and may even pave the way for new frameworks in the finite-time analysis of other reinforcement learning algorithms.
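For orientation, the log-sum-exp variant the paper analyzes replaces the hard max in the Q-learning target with a soft maximum; a tabular sketch with illustrative hyperparameters:

```python
import numpy as np

def soft_backup(q_values, tau=1.0):
    """Entropy-regularized value tau * log sum_a exp(Q(s, a) / tau);
    as tau -> 0 this recovers the hard max of standard Q-learning."""
    return tau * np.log(np.sum(np.exp(q_values / tau)))

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, tau=1.0):
    target = r + gamma * soft_backup(Q[s_next], tau)   # soft Bellman target
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((4, 2))                                    # 4 states, 2 actions
Q = soft_q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])
```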
Updated: 2024-06-18 01:45:03
标题: Soft Q-Learning的有限时间误差分析:切换系统方法
摘要: 软Q-learning是Q-learning的一种变体,旨在解决熵正则化马尔可夫决策问题,其中代理人的目标是最大化熵正则化值函数。尽管软Q-learning在实证研究中取得了成功,但迄今为止对软Q-learning的理论研究有限。本文旨在提供软Q-learning算法的一种新颖且统一的有限时间控制理论分析。我们重点关注两种软Q-learning算法:一种利用对数求和指数运算符,另一种利用Boltzmann运算符。通过使用动态切换系统模型,我们为两种软Q-learning算法推导了新颖的有限时间误差界限。我们希望我们的分析将通过与切换系统模型建立联系,深化对软Q-learning的当前理解,并甚至为其他强化学习算法的有限时间分析铺平道路。
更新时间: 2024-06-18 01:45:03
领域: cs.LG
Attack on Scene Flow using Point Clouds
Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. The robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To address this problem, the proposed approach aims to bridge this gap by introducing adversarial white-box attacks specifically tailored for scene flow networks. Experimental results show that the generated adversarial examples obtain up to 33.7 relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on average end-point error. Analyzing the success and failure of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks.
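As a hedged illustration of the single-dimension setting (the paper's exact attack is not reproduced here), a PGD-style white-box sketch that perturbs only one coordinate channel of the first point cloud; `model` stands for any differentiable scene-flow estimator mapping (pc1, pc2) to a flow field.

```python
import torch

def pgd_one_dim(model, pc1, pc2, flow_gt, dim=2, eps=0.05, steps=20, lr=0.01):
    """Maximize end-point error while perturbing a single coordinate
    (here the z-axis) within an L-infinity ball of radius eps."""
    delta = torch.zeros_like(pc1, requires_grad=True)
    mask = torch.zeros(pc1.shape[-1])
    mask[dim] = 1.0                                   # restrict to one dimension
    for _ in range(steps):
        flow = model(pc1 + delta * mask, pc2)
        loss = torch.nn.functional.mse_loss(flow, flow_gt)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()           # ascend: grow the error
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (pc1 + delta * mask).detach()
```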
Updated: 2024-06-18 01:40:23
标题: 使用点云的场景流攻击
摘要: 深度神经网络在使用点云准确估计场景流方面取得了重大进展,这对于许多应用程序如视频分析、动作识别和导航至关重要。然而,这些技术的鲁棒性仍然是一个问题,特别是面对已被证明能够欺骗最先进的深度神经网络的对抗性攻击时。令人惊讶的是,场景流网络对此类攻击的鲁棒性尚未得到彻底调查。为了解决这个问题,提出的方法旨在通过引入专门针对场景流网络的白盒对抗攻击来填补这一空白。实验结果显示,生成的对抗性示例在KITTI和FlyingThings3D数据集上的平均端点误差相对下降高达33.7%。研究还揭示了仅针对点云中的一个维度或颜色通道的攻击对平均端点误差的显著影响。对这些攻击在场景流网络及其2D光流网络变体上的成败分析表明,光流网络更容易受到攻击。
更新时间: 2024-06-18 01:40:23
领域: cs.CV,cs.LG,cs.MM
AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval
Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain. During optimization, we design a comparator module to iteratively provide insightful and holistic prompts to the LLM agent via reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information. We find AvaTaR consistently outperforms state-of-the-art approaches across all four challenging tasks and exhibits strong generalization ability when applied to novel cases, achieving an average relative improvement of 14% on the Hit@1 metric. Code and dataset are available at https://github.com/zou-group/avatar.
Updated: 2024-06-18 01:39:57
标题: AvaTaR:为工具辅助知识检索优化LLM代理
摘要: 大型语言模型(LLM)代理已经展示出在利用外部工具和知识来提高准确性并减少幻觉方面的令人印象深刻的能力。然而,开发使LLM代理能够有效利用外部工具和知识的提示技术是一项启发式和繁琐的任务。在这里,我们介绍了AvaTaR,一个新颖且自动化的框架,可以优化LLM代理以有效地使用提供的工具并提高其在给定任务/领域上的性能。在优化过程中,我们设计了一个比较模块,通过在训练数据中从正负例中采样进行推理,以迭代地为LLM代理提供富有洞见和全面的提示。我们在四个包含文本、视觉和关系信息的复杂多模态检索数据集上展示了AvaTaR。我们发现AvaTaR在所有四个具有挑战性的任务中一直优于最先进的方法,并在应用于新案例时表现出强大的泛化能力,使Hit@1指标的平均相对改进达到14%。代码和数据集可在https://github.com/zou-group/avatar 上找到。
更新时间: 2024-06-18 01:39:57
领域: cs.LG,cs.CL
Multi-Sender Persuasion: A Computational Perspective
We consider the multi-sender persuasion problem: multiple players with informational advantage signal to convince a single self-interested actor to take certain actions. This problem generalizes the seminal Bayesian Persuasion framework and is ubiquitous in computational economics, multi-agent learning, and machine learning with multiple objectives. The core solution concept here is the Nash equilibrium of senders' signaling policies. Theoretically, we prove that finding an equilibrium in general is PPAD-Hard; in fact, even computing a sender's best response is NP-Hard. Given these intrinsic difficulties, we turn to finding local Nash equilibria. We propose a novel differentiable neural network to approximate this game's non-linear and discontinuous utilities. Complementing this with the extra-gradient algorithm, we discover local equilibria that Pareto-dominate full-revelation equilibria and those found by existing neural networks. Broadly, our theoretical and empirical contributions are of interest to a large class of economic problems.
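Since the local search relies on the extra-gradient algorithm, a minimal sketch on a bilinear toy game shows the mechanism: evaluate the game operator at a look-ahead point, then step from the original iterate. In the paper the utilities come from a learned neural surrogate; the toy operator here is ours.

```python
import numpy as np

def extragradient(F, x0, gamma=0.1, iters=2000):
    """Extra-gradient: an extrapolation step followed by an update
    evaluated at the extrapolated point, the standard fix for the
    cycling of plain gradient descent-ascent in games."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_half = x - gamma * F(x)       # look-ahead
        x = x - gamma * F(x_half)       # update with look-ahead gradient
    return x

# bilinear saddle f(u, v) = u * v, operator F(u, v) = (v, -u)
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, [1.0, 1.0]))     # converges toward the equilibrium (0, 0)
```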
Updated: 2024-06-18 01:37:14
标题: 多发信者说服:一个计算视角
摘要: 我们考虑多发信者说服问题:多个具有信息优势的玩家发出信号,说服单个自利的行为者采取某些行动。这个问题推广了开创性的贝叶斯说服框架,在计算经济学、多智能体学习和多目标机器学习中普遍存在。这里的核心解概念是发信者信号策略的纳什均衡。从理论上讲,我们证明了一般情况下寻找均衡是PPAD难的;事实上,甚至计算发信者的最佳响应都是NP难的。鉴于这些固有困难,我们转而寻找局部纳什均衡。我们提出了一种新颖的可微神经网络来近似该博弈的非线性且不连续的效用。结合额外梯度算法,我们发现了在帕累托意义上支配完全揭示均衡以及现有神经网络所找到均衡的局部均衡。总的来说,我们的理论和实证贡献对一大类经济问题都具有重要意义。
更新时间: 2024-06-18 01:37:14
领域: cs.AI,cs.GT
Aquila-Med LLM: Pioneering Full-Process Open-Source Medical Language Models
Recently, both closed-source LLMs and open-source communities have made significant strides, outperforming humans in various general domains. However, their performance in specific professional fields such as medicine, especially within the open-source community, remains suboptimal due to the complexity of medical knowledge. We propose Aquila-Med, a bilingual medical LLM based on Aquila, addressing these challenges through continued pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). We construct a large-scale Chinese and English medical dataset for continued pre-training and a high-quality SFT dataset, covering extensive medical specialties. Additionally, we develop a high-quality Direct Preference Optimization (DPO) dataset for further alignment. Aquila-Med achieves notable results across single-turn and multi-turn dialogues and medical multiple-choice questions, demonstrating the effectiveness of our approach. We open-source the datasets and the entire training process, contributing valuable resources to the research community. Our models and datasets will be released at https://huggingface.co/BAAI/AquilaMed-RL.
Updated: 2024-06-18 01:30:07
标题: Aquila-Med LLM:开创性的全流程开源医学语言模型
摘要: 最近,闭源LLM和开源社区都取得了显著进展,在各种通用领域中表现出优于人类的能力。然而,它们在特定专业领域,特别是医学领域的表现仍然不尽人意,尤其是在开源社区中,这是由于医学知识的复杂性。我们提出了基于Aquila的双语医学LLM——Aquila-Med,通过持续的预训练、监督微调(SFT)和从人类反馈中学习的强化学习(RLHF)来解决这些挑战。我们构建了一个大规模的中英文医学数据集用于持续预训练和高质量的SFT数据集,涵盖广泛的医学专业。此外,我们开发了一个高质量的直接偏好优化(DPO)数据集以进一步对齐。Aquila-Med在单轮、多轮对话和医学多项选择题上取得了显著成果,展示了我们方法的有效性。我们开源了数据集和整个训练过程,为研究社区提供了宝贵的资源。我们的模型和数据集将在https://huggingface.co/BAAI/AquilaMed-RL上发布。
更新时间: 2024-06-18 01:30:07
领域: cs.CL,cs.AI
Efficient Prompting for LLM-based Generative Internet of Things
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting. However, open-source LLMs usually have more performance limitations, such as weaker arithmetic and reasoning capacities, and practical systems for applying LLMs to IoT have yet to be well explored. Therefore, in this study we propose a text-based generative IoT (GIoT) system deployed in a local network setting. To alleviate the limitations of LLMs and provide service with competitive performance, we apply prompt engineering methods to enhance the capacities of the open-source LLMs, and design a Prompt Management Module and a Post-processing Module to manage the tailored prompts for different tasks and process the results generated by the LLMs. To demonstrate the effectiveness of the proposed system, we discuss a challenging Table Question Answering (Table-QA) task as a case study, as tabular data is usually more challenging than plain text because of its complex structure, heterogeneous data types, and sometimes huge size. We conduct comprehensive experiments on two popular Table-QA datasets, and the results show that our proposal achieves performance competitive with state-of-the-art LLMs, demonstrating that the proposed LLM-based GIoT system can provide competitive performance with tailored prompting methods and is easily extensible to new tasks without training.
Updated: 2024-06-18 01:26:33
标题: 基于LLM的生成式物联网的高效提示
摘要: 大型语言模型(LLMs)在各种任务上展现出了非凡的能力,将LLMs的能力整合到物联网(IoT)应用程序中近期引起了很多研究关注。由于安全问题,许多机构避免访问最先进的商业LLM服务,要求在本地网络环境中部署和利用开源LLMs。然而,开源LLMs通常在性能方面存在更多限制,比如较弱的算术计算和推理能力,将LLMs应用于IoT的实际系统也尚未得到很好的探索。因此,在本研究中我们提出了一个部署在本地网络环境中的基于文本生成的物联网(GIoT)系统。为了减轻LLMs的限制并提供有竞争力的服务,我们应用提示工程方法来增强开源LLMs的能力,设计了一个提示管理模块和一个后处理模块,用于管理针对不同任务定制的提示并处理LLMs生成的结果。为了展示所提出系统的有效性,我们以具有挑战性的表格问答(Table-QA)任务作为案例研究,因为表格数据通常比纯文本更具挑战性,原因在于其复杂的结构、异构的数据类型以及有时巨大的规模。我们对两个流行的Table-QA数据集进行了全面实验,结果显示我们的方法可以取得与最先进的LLMs相当的性能,证明了所提出的基于LLMs的GIoT系统可以通过定制提示方法提供有竞争力的性能,并且无需训练即可轻松扩展到新任务。
更新时间: 2024-06-18 01:26:33
领域: cs.AI,cs.CL
Embodied Question Answering via Multi-LLM Systems
Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language models (LLM) based agents independently answering queries about a household environment. To generate one answer for each query, we use the individual responses to train a Central Answer Model (CAM) that aggregates responses for a robust answer. Using CAM, we observe a $50\%$ higher EQA accuracy when compared against aggregation methods for ensemble LLM, such as voting schemes and debates. CAM does not require any form of agent communication, alleviating it from the associated costs. We ablate CAM with various nonlinear (neural network, random forest, decision tree, XGBoost) and linear (logistic regression classifier, SVM) algorithms. Finally, we present a feature importance analysis for CAM via permutation feature importance (PFI), quantifying CAMs reliance on each independent agent and query context.
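A toy sketch of the CAM idea under stated assumptions (three simulated agents with hypothetical accuracies; the paper trains on real LLM responses): fit a classifier on the agents' answers so that it learns whom to trust, then compare against simple majority voting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=500)                    # ground-truth yes/no answers
# agent 0 is reliable, agents 1-2 are noisier (assumed accuracies)
agents = np.stack([np.where(rng.random(500) < acc, truth, 1 - truth)
                   for acc in (0.9, 0.7, 0.6)], axis=1)

cam = LogisticRegression().fit(agents[:400], truth[:400])  # the "CAM"
print("CAM accuracy:     ", cam.score(agents[400:], truth[400:]))
print("majority-vote acc:", ((agents[400:].mean(1) > 0.5) == truth[400:]).mean())
```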
Updated: 2024-06-18 01:18:46
标题: 通过多个LLM系统实现具身问答
摘要: 具身问答(EQA)是一个重要的问题,涉及代理探索环境以回答用户查询。在现有文献中,EQA已经在单一代理情景中研究过,其中探索可能耗时且昂贵。在这项工作中,我们考虑在一个多代理框架中进行EQA,涉及多个基于大型语言模型(LLM)的代理独立回答有关家庭环境的查询。为了为每个查询生成一个答案,我们使用个体响应来训练一个中央答案模型(CAM),该模型聚合响应以获得强大答案。使用CAM,与集合LLM的聚合方法(如投票方案和辩论)相比,我们观察到50%更高的EQA准确性。CAM不需要任何形式的代理通信,从而减轻了相关成本。我们使用各种非线性(神经网络,随机森林,决策树,XGBoost)和线性(逻辑回归分类器,支持向量机)算法对CAM进行消融实验。最后,我们通过置换特征重要性(PFI)提供了CAM的特征重要性分析,量化CAM对每个独立代理和查询上下文的依赖。
更新时间: 2024-06-18 01:18:46
领域: cs.LG,cs.AI,cs.CL
Location-based Radiology Report-Guided Semi-supervised Learning for Prostate Cancer Detection
Prostate cancer is one of the most prevalent malignancies in the world. While deep learning has potential to further improve computer-aided prostate cancer detection on MRI, its efficacy hinges on the exhaustive curation of manually annotated images. We propose a novel methodology of semisupervised learning (SSL) guided by automatically extracted clinical information, specifically the lesion locations in radiology reports, allowing for use of unannotated images to reduce the annotation burden. By leveraging lesion locations, we refined pseudo labels, which were then used to train our location-based SSL model. We show that our SSL method can improve prostate lesion detection by utilizing unannotated images, with more substantial impacts being observed when larger proportions of unannotated images are used.
Updated: 2024-06-18 01:08:42
标题: 基于位置的放射学报告引导的半监督学习在前列腺癌检测中的应用
摘要: 前列腺癌是世界上最普遍的恶性肿瘤之一。虽然深度学习有潜力进一步改善基于MRI的计算机辅助前列腺癌检测,但其有效性取决于手动注释图像的详尽整理。我们提出了一种新颖的半监督学习方法,该方法由自动提取的临床信息指导,特别是放射学报告中的病变位置,允许使用未注释的图像来减少注释负担。通过利用病变位置,我们改进了伪标签,然后用于训练基于位置的半监督学习模型。我们展示了我们的半监督学习方法可以通过利用未经注释的图像来改善前列腺病变检测,当使用较大比例的未注释图像时,观察到更为显著的影响。
更新时间: 2024-06-18 01:08:42
领域: cs.CV,cs.LG
Convergence of Kinetic Langevin Monte Carlo on Lie groups
Explicit, momentum-based dynamics for optimizing functions defined on Lie groups was recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into sampling dynamics, leveraging the advantageous feature that the trivialized momentum variable is Euclidean despite the fact that the potential function lives on a manifold. We then propose a Lie-group MCMC sampler, by delicately discretizing the resulting kinetic-Langevin-type sampling dynamics. The Lie group structure is exactly preserved by this discretization. Exponential convergence with explicit convergence rate for both the continuous dynamics and the discrete sampler are then proved under the $W_2$ distance. Only compactness of the Lie group and geodesic $L$-smoothness of the potential function are needed. To the best of our knowledge, this is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires no convexity or, at least not explicitly, any common relaxation such as isoperimetry.
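For context, the Euclidean prototype of the kinetic (underdamped) Langevin dynamics being transported to the Lie-group setting is

$$ \mathrm{d}x_t = v_t\,\mathrm{d}t, \qquad \mathrm{d}v_t = -\gamma v_t\,\mathrm{d}t - \nabla U(x_t)\,\mathrm{d}t + \sqrt{2\gamma}\,\mathrm{d}B_t, $$

and the point of left trivialization is that $x_t$ may live on the group while the momentum stays in a fixed Euclidean space (the Lie algebra), so the noise term remains standard; the precise group version and its structure-preserving discretization are what the paper constructs.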
Updated: 2024-06-18 01:08:24
标题: 李群上的动力学Langevin Monte Carlo的收敛性
摘要: 最近,基于变分优化和左平凡化等技术,构建了一种用于优化定义在李群上的函数的显式、基于动量的动力学。我们适当地向优化动力学中添加可处理的噪声,将其转变为采样动力学,利用了平凡化后的动量变量是欧几里得的这一有利特性,尽管势函数定义在流形上。然后,我们提出了一种李群MCMC采样器,通过精细地离散化所得到的动能-朗之万型采样动力学。这种离散化能够精确地保持李群结构。在$W_2$距离下,对于连续动力学和离散采样器,我们证明了具有显式收敛速率的指数收敛。只需要李群的紧致性和势函数的测地$L$-光滑性。据我们所知,这是动能朗之万在弯曲空间上的第一个收敛结果,也是第一个不需要凸性、或至少不明确需要等周不等式等常见松弛条件的定量结果。
更新时间: 2024-06-18 01:08:24
领域: math.ST,cs.LG,cs.NA,math.NA,math.PR,stat.ML,stat.TH
Pseudorandom Error-Correcting Codes
We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key. We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density. As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors. Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors.
Updated: 2024-06-18 01:00:58
标题: 伪随机纠错码
摘要: 我们构建了伪随机纠错码(或简称为伪随机码),这些纠错码具有任意多个码字对于任何计算上受限的对手都是伪随机的特性。通过解码密钥可以有效地解码损坏的码字。 我们构建了对替换和删除错误具有鲁棒性的伪随机码,其中伪随机性基于标准的密码学假设。具体而言,伪随机性基于LPN的$2^{O(\sqrt{n})}$难度,或者是LPN和低密度植入XOR问题的多项式难度。 作为我们对伪随机码的主要应用,我们提出了一种适用于语言模型输出的不可检测水印方案,能够抵抗裁剪和恒定速率的随机替换和删除。水印是不可检测的,即任意数量的水印文本样本在计算上无法区分是否是由原始模型输出的文本。这是第一个能够容忍恒定错误率的不可检测水印方案。 我们的第二个应用是隐写术,其中秘密消息被隐藏在看似无害的内容中。我们提出了一种恒定速率无状态隐写术方案,具有对恒定速率替换的鲁棒性。这是第一个具有可证隐写安全性和对错误具有任何鲁棒性的无状态隐写术方案。
更新时间: 2024-06-18 01:00:58
领域: cs.CR,cs.AI,cs.LG
GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis across various data modalities, we discover that the downstream model generated by synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Also, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or synthetic data. The code is available at https://github.com/AvestimehrResearchGroup/GPT-FL.
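A runnable toy of the two-stage recipe under our own simplifications (a linear model, Gaussian data standing in for generator output, plain FedAvg): warm-start on synthetic data at the server, then fine-tune with standard federated averaging over private client data.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

def make_data(n, shift):                      # client data with distribution shift
    X = rng.normal(shift, 1.0, size=(n, 2))
    return X, X @ w_true + rng.normal(0, 0.1, n)

def local_sgd(w, X, y, lr=0.01, epochs=5):    # least-squares gradient steps
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

# stage 1: server warm start on "synthetic" data (stand-in for generator output)
Xs, ys = make_data(1000, shift=0.0)
w = local_sgd(np.zeros(2), Xs, ys, epochs=50)

# stage 2: standard FedAvg fine-tuning; private data never leaves the clients
clients = [make_data(200, shift=s) for s in (-1.0, 0.5, 2.0)]
for _ in range(20):
    w = np.mean([local_sgd(w, X, y) for X, y in clients], axis=0)
print(w)                                      # approaches w_true
```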
Updated: 2024-06-18 01:00:10
标题: GPT-FL: 生成式预训练模型辅助联邦学习
摘要: 在这项工作中,我们提出了GPT-FL,一个生成式预训练模型辅助的联邦学习(FL)框架。在其核心,GPT-FL利用生成式预训练模型生成多样化的合成数据。这些生成的数据用于在服务器上训练下游模型,然后在标准FL框架下使用私有客户数据进行微调。我们展示了GPT-FL在模型测试准确性、通信效率和客户采样效率方面始终优于最先进的FL方法。通过对各种数据模态进行全面的消融分析,我们发现由合成数据生成的下游模型在控制FL训练过程中梯度多样性方面起着至关重要的作用,这提高了收敛速度并有助于观察到的GPT-FL显著准确性提升。此外,无论目标数据是否在预训练生成模型的领域内或外,GPT-FL始终实现显著的性能提升,超过了仅使用FL或合成数据训练的模型所获得的结果。该代码可在https://github.com/AvestimehrResearchGroup/GPT-FL找到。
更新时间: 2024-06-18 01:00:10
领域: cs.LG,cs.DC
General Distribution Learning: A theoretical framework for Deep Learning
There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures in solving physical problems. This paper introduces General Distribution Learning (GD Learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from traditional statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in classical statistical learning framework, is divided into fitting errors due to models and algorithms, as well as sampling errors introduced by limited sampling data. The framework significantly incorporates prior knowledge, especially in scenarios characterized by data scarcity, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solutions in non-convex optimization can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the development of the gradient structure control algorithm. GD Learning also offers fresh insights into the questions on deep learning, including overparameterization and non-convex optimization, bias-variance trade-off, and the mechanism of flat minima.
Updated: 2024-06-18 00:54:46
标题: 一般分布学习:深度学习的理论框架
摘要: 在经典学习理论框架内,深度学习(DL)仍然存在许多未解答的研究问题。这些问题包括过参数化神经网络(NNs)的显著泛化能力,尽管目标函数非凸但优化性能高效,泛化中平坦极小值的机制,以及深度架构在解决物理问题方面的出色性能。本文介绍了一种新颖的理论学习框架General Distribution Learning(GD Learning),旨在解决一系列机器学习和统计任务,包括分类、回归和参数估计。与传统的统计机器学习不同,GD Learning关注真实的底层分布。在GD Learning中,学习误差,对应于经典统计学习框架中的期望误差,被分为由模型和算法引起的拟合误差,以及由有限采样数据引入的采样误差。该框架显著地整合了先验知识,特别是在数据稀缺的情况下,从而提高性能。在GD Learning框架内,我们展示了通过最小化梯度范数和模型雅可比矩阵特征值的非一致性,可以逼近非凸优化中的全局最优解。这一见解导致了梯度结构控制算法的发展。GD Learning还为深度学习中的问题提供了新的见解,包括过参数化和非凸优化、偏差-方差权衡,以及平坦极小值的机制。
更新时间: 2024-06-18 00:54:46
领域: cs.LG,cs.IR,stat.ML
CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer
Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge possessed by generative large language models (LLMs). In this study, we propose cold-start mitigation in knowledge tracing by aligning a generative language model as a students' knowledge tracer (CLST) as a framework that utilizes a generative LLM as a knowledge tracer. Upon collecting data from math, social studies, and science subjects, we framed the KT task as a natural language processing task, wherein problem-solving data are expressed in natural language, and fine-tuned the generative LLM using the formatted KT dataset. Subsequently, we evaluated the performance of the CLST in situations of data scarcity using various baseline models for comparison. The results indicate that the CLST significantly enhanced performance with a dataset of fewer than 100 students in terms of prediction, reliability, and cross-domain generalization.
Updated: 2024-06-18 00:53:50
标题: CLST:通过将生成式语言模型对齐为学生知识跟踪器,减轻知识跟踪中的冷启动问题
摘要: 知识追踪(KT)是指利用学生的问题解决历史来估计他们当前知识水平的方法,引起了研究人员的极大兴趣。然而,大多数现有的KT模型是基于ID的范式开发的,这在冷启动性能方面存在局限性。这些限制可以通过利用生成性大语言模型(LLMs)所拥有的大量外部知识来缓解。在这项研究中,我们提出通过将生成语言模型作为学生知识追踪器(CLST)的框架来对知识追踪进行冷启动缓解,该框架利用生成性LLM作为知识追踪器。在从数学、社会学和科学学科收集数据后,我们将KT任务构建为自然语言处理任务,其中问题解决数据用自然语言表达,并使用格式化的KT数据集对生成性LLM进行微调。随后,我们评估了CLST在数据稀缺情况下的性能,使用各种基准模型进行比较。结果表明,CLST在少于100名学生的数据集中在预测、可靠性和跨领域泛化方面显着提升了性能。
更新时间: 2024-06-18 00:53:50
领域: cs.CL,cs.AI,cs.CY
Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems
Recently, Large Language Models (LLMs) attained impressive performance in math and reasoning benchmarks. However, they still often struggle with logic problems and puzzles that are relatively easy for humans. To further investigate this, we introduce a new benchmark, SearchBench, containing 11 unique search problem types, each equipped with automated pipelines to generate an arbitrary number of instances and analyze the feasibility, correctness, and optimality of LLM-generated solutions. We show that even the most advanced LLMs fail to solve these problems end-to-end in text, e.g. GPT4 solves only 1.4%. SearchBench problems require considering multiple pathways to the solution as well as backtracking, posing a significant challenge to auto-regressive models. Instructing LLMs to generate code that solves the problem helps, but only slightly, e.g., GPT4's performance rises to 11.7%. In this work, we show that in-context learning with A* algorithm implementations enhances performance. The full potential of this promising approach emerges when combined with our proposed Multi-Stage-Multi-Try method, which breaks down the algorithm implementation into two stages and verifies the first stage against unit tests, raising GPT-4's performance above 57%.
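The in-context learning result is easiest to picture with the kind of compact A* implementation that fits in a prompt; the sketch below is generic (the benchmark's actual exemplars may differ).

```python
import heapq

def astar(start, goal, neighbors, h):
    """Textbook A*: expand nodes in order of g + h, return the path."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if best_g.get(node, float("inf")) <= g:
            continue                              # already expanded cheaper
        best_g[node] = g
        for nxt, cost in neighbors(node):
            heapq.heappush(frontier,
                           (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None

# toy line world: from 0 reach 7 by steps of +1 or +2, unit cost each
print(astar(0, 7, lambda n: [(n + 1, 1), (n + 2, 1)], lambda n: abs(7 - n) / 2))
```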
Updated: 2024-06-18 00:44:58
标题: "穿越迷宫:评估和增强LLMs推理搜索问题的能力"
摘要: 最近,大型语言模型(LLMs)在数学和推理基准测试中取得了令人印象深刻的表现。然而,它们仍然经常在对人类相对容易的逻辑问题和谜题上遇到困难。为了进一步研究这一问题,我们引入了一个新的基准测试,名为SearchBench,包含11种独特的搜索问题类型,每种类型都配备了自动化流水线,用于生成任意数量的实例并分析LLM生成的解决方案的可行性、正确性和最优性。我们展示,即使是最先进的LLMs也无法通过文本端到端解决这些问题,例如,GPT4仅解决了1.4%。SearchBench问题需要考虑到解决方案的多条路径以及回溯,对自回归模型构成了重大挑战。指示LLMs生成解决问题的代码有所帮助,但仅有轻微提升,例如,GPT4的性能提高到11.7%。在这项工作中,我们展示了通过A*算法实现的上下文学习如何增强性能。当结合我们提出的多阶段多尝试方法时,这种有前景的方法的全部潜力得以显现,该方法将算法实现分为两个阶段,并用单元测试验证第一阶段,将GPT-4的性能提升至57%以上。
更新时间: 2024-06-18 00:44:58
领域: cs.AI
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of online training. Specifically, we identify that the learned LLM should adhere to the proximity of the behavior LLM, which collects the training samples. To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment. We conduct extensive experiments to validate the effectiveness and applicability of our approach by integrating it with various DAP methods, resulting in significant performance improvements across a wide range of tasks when training with the same amount of preference data. Even when only introducing one additional data collection phase, our online BPO improves its offline DAP baseline from 72.0% to 80.2% on TL;DR and from 82.2% to 89.1% on Anthropic Helpfulness in terms of win rate against human reference text.
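The abstract does not spell out the training loss, so the following is only our reading, sketched in the DAP family's usual form: a DPO-style objective in which the reference model is the behavior LLM that collected the current preference pairs, which is what keeps updates inside its trust region.

```python
import torch
import torch.nn.functional as F

def dap_loss_near_behavior(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style preference loss with the *behavior* model as reference.

    logp_*: policy log-probs of the chosen (w) / rejected (l) responses;
    ref_logp_*: the same under the behavior LLM. Larger beta penalizes
    drifting away from the behavior model more strongly."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# toy per-example sequence log-probabilities (illustrative numbers)
lw, ll = torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -6.5])
print(dap_loss_near_behavior(lw, ll, lw - 0.1, ll + 0.2))
```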
Updated: 2024-06-18 00:41:40
标题: BPO:通过贴近行为LLM来强化在线偏好学习
摘要: 基于偏好的直接对齐(DAP)已成为一种有前景的范式,用于利用预先收集的离线偏好数据集将大型语言模型(LLMs)与人类期望对齐。尽管最近的研究表明,现有的离线DAP方法可以直接受益于在线训练样本,但我们强调需要开发专门的在线DAP算法,以充分发挥在线训练的威力。具体而言,我们指出学习得到的LLM应当贴近收集训练样本的行为LLM。为此,我们提出了贴近行为LLM的在线偏好优化(BPO),强调为LLM对齐构建合适信任区域的重要性。 我们进行了广泛的实验,通过将其与各种DAP方法集成,验证了我们方法的有效性和适用性;在使用相同数量偏好数据进行训练时,该方法在一系列任务上都带来了显著的性能提升。即使只引入一个额外的数据收集阶段,在与人类参考文本的胜率方面,我们的在线BPO也将其离线DAP基线在TL;DR上从72.0%提升到80.2%,在Anthropic Helpfulness上从82.2%提升到89.1%。
更新时间: 2024-06-18 00:41:40
领域: cs.LG,cs.AI,cs.CL
A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis
Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised from the Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). This paradigm introduces an additional task: a more detailed wavelet spectrogram, which, like the post-processing network, takes as input the Mel spectrogram output by the decoder. We choose Tacotron2 and Fastspeech2 for experimental validation in order to test autoregressive (AR) and non-autoregressive (NAR) speech systems, respectively. The experimental results demonstrate that the speech synthesised using the model with the Mel spectrogram enhancement paradigm exhibits higher MOS, with improvements of 0.14 and 0.09 over the baseline models, respectively. These findings provide some validation for the universality of the enhancement paradigm, as they demonstrate the success of the paradigm in different architectures.
Updated: 2024-06-18 00:34:44
标题: 基于CWT的语音合成中的Mel频谱图增强范式
摘要: 声学特征在提高合成语音质量方面起着重要作用。目前,Mel频谱图是大多数声学模型中广泛采用的声学特征。然而,由于其傅里叶变换过程引起的细粒度损失,Mel频谱图合成的语音清晰度在变异信号中受到影响。为了获得更详细的Mel频谱图,我们提出了一种基于连续小波变换(CWT)的Mel频谱图增强范式。该范式引入了一个额外的任务:更详细的小波谱图,它与后处理网络一样,以解码器输出的Mel频谱图作为输入。我们选择Tacotron2和Fastspeech2进行实验验证,以分别测试自回归(AR)和非自回归(NAR)语音系统。实验结果表明,使用Mel频谱图增强范式模型合成的语音具有更高的MOS,分别比基准模型提高了0.14和0.09。这些发现为增强范式的普适性提供了一些验证,因为它们展示了该范式在不同架构中的成功。
更新时间: 2024-06-18 00:34:44
领域: cs.SD,cs.AI,eess.AS
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status. There is abundant work evaluating ChatGPT on benchmarks. However, existing benchmarks encounter two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features. In this paper, we construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March 2023 to now. We conduct a comprehensive performance evaluation and find that most of ChatGPT's capabilities improve over time, though some do not, and that ChatGPT evolves in a step-wise pattern. We further analyze the inherent characteristics of ChatGPT by extracting its knowledge and linguistic features. We find some stable features that stay unchanged and apply them to the detection of ChatGPT-generated texts to improve the robustness of cross-version detection. We will continuously maintain our project at \url{https://github.com/THU-KEG/ChatLog/}.
Updated: 2024-06-18 00:33:25
标题: ChatLog: 仔细评估ChatGPT随时间演变
摘要: ChatGPT已取得巨大成功,并可被视为已获得基础设施地位。关于在基准测试中评估ChatGPT的作品丰富。然而,现有的基准测试遇到两个挑战:(1)忽视定期评估和(2)缺乏细粒度特征。本文构建了ChatLog,这是一个不断更新的数据集,包含了从2023年3月至今的大量记录,涵盖了各种形式的ChatGPT响应,用于21个NLP基准测试。我们进行了全面的性能评估,发现ChatGPT的大多数能力随时间改善,除了一些能力外,ChatGPT存在一种逐步演变的模式。我们进一步通过提取知识和语言特征来分析ChatGPT的固有特性。我们发现一些稳定的特征保持不变,并将它们应用于检测由ChatGPT生成的文本,以提高跨版本检测的鲁棒性。我们将持续维护我们的项目\url{https://github.com/THU-KEG/ChatLog/}。
更新时间: 2024-06-18 00:33:25
领域: cs.CL,cs.AI
Discussion Graph Semantics of First-Order Logic with Equality for Reasoning about Discussion and Argumentation
We formulate discussion graph semantics of first-order logic with equality for reasoning about discussion and argumentation as naturally as we would reason about sentences. While there are a few existing proposals to use a formal logic for reasoning about argumentation, they are constructed bottom-up and specialised to the argumentation model by Dung. There is indeed a conspicuous lack of a formal reasoning framework for handling general discussion and argumentation models. We achieve the generality through a top-down formulation of the semantics of first-order logic (with equality) formulas, addressing the current shortage.
Updated: 2024-06-18 00:32:00
标题: 一阶逻辑带有相等性的讨论图语义用于推理讨论与论证
摘要: 我们制定了一阶逻辑与等式讨论图语义,以便像我们处理句子一样自然地推理讨论和辩论。虽然有一些现有的提议使用形式逻辑来推理论证,但它们是自下而上构建的,并且专门针对Dung的论证模型。事实上,目前缺乏一个处理一般讨论和论证模型的形式推理框架。通过自上而下制定一阶逻辑(带等式)公式的语义,我们实现了这种广泛性,解决了当前的短缺问题。
更新时间: 2024-06-18 00:32:00
领域: cs.AI,cs.LO
Understanding Help-Seeking and Help-Giving on Social Media for Image-Based Sexual Abuse
Image-based sexual abuse (IBSA), like other forms of technology-facilitated abuse, is a growing threat to people's digital safety. Attacks include unwanted solicitations for sexually explicit images, extorting people under threat of leaking their images, or purposefully leaking images to enact revenge or exert control. In this paper, we explore how people seek and receive help for IBSA on social media. Specifically, we identify over 100,000 Reddit posts that engage relationship and advice communities for help related to IBSA. We draw on a stratified sample of 261 posts to qualitatively examine how various types of IBSA unfold, including the mapping of gender, relationship dynamics, and technology involvement to different types of IBSA. We also explore the support needs of victim-survivors experiencing IBSA and how communities help victim-survivors navigate their abuse through technical, emotional, and relationship advice. Finally, we highlight sociotechnical gaps in connecting victim-survivors with important care, regardless of whom they turn to for help.
Updated: 2024-06-18 00:23:00
标题: 理解社交媒体上基于图像的性虐待的求助和援助
摘要: 基于图像的性虐待(IBSA),就像其他形式的科技助推滥用一样,正日益成为人们数字安全的威胁。攻击包括对性爱图像的不受欢迎的索取,在威胁泄露他们的图像的情况下勒索人们,或者故意泄露图像来报复或施加控制。在本文中,我们探讨了人们如何在社交媒体上寻求和获得有关IBSA的帮助。具体而言,我们确定了超过10万个Reddit帖子,其中参与了关系和咨询社区,寻求与IBSA相关的帮助。我们依靠对261个帖子的分层样本进行定性分析,探讨了各种类型的IBSA是如何展开的,包括将性别、关系动态和技术参与映射到不同类型的IBSA。我们还探讨了经历IBSA的受害者幸存者的支持需求,以及社区如何通过技术、情感和关系建议帮助受害者幸存者应对滥用。最后,我们强调了在连接受害者幸存者与重要关怀方面的社会技术差距,无论他们向谁寻求帮助。
更新时间: 2024-06-18 00:23:00
领域: cs.CY,cs.CR,cs.HC,cs.SI,K.4.2; H.4.3; J.4
Block Circulant Codes with Application to Decentralized Systems
The structure of linear dependence relations between coded symbols of a linear code, irrespective of specific coefficients involved, is referred to as the {\em topology} of the code. The specification of coefficients is referred to as an {\em instantiation} of the topology. In this paper, we propose a new block circulant topology $T_{[\mu,\lambda,\omega]}(\rho)$ parameterized by integers $\rho \geq 2$, $\omega \geq 1$, $\lambda \geq 2$, and $\mu$ a multiple of $\lambda$. In this topology, the code has $\mu$ local codes with $\rho$ parity-check (p-c) constraints and a total of $\mu\rho$ p-c equations fully define the code. Next, we construct a class of block circulant (BC) codes ${\cal C}_{\text{BC}}[\mu,\lambda,\omega,\rho]$ with blocklength $n=\mu(\rho+\omega)$, dimension $k=\mu\omega$ that instantiate $T_{[\mu,\lambda,\omega]}(\rho)$. Every local code of ${\cal C}_{\text{BC}}[\mu,\lambda,\omega,\rho]$ is a $[\rho+\lambda\omega,\lambda\omega,\rho+1]$ generalized Reed-Solomon (RS) code. The overlap between supports of local codes helps to enhance the minimum distance $\rho+1$ to $2\rho+1$, without compromising much on the rate. We provide an efficient, parallelizable decoding algorithm to correct $2\rho$ erasures when $\lambda=2$. Finally, we illustrate that the BC codes serve as a viable alternative to 2D RS codes in protocols designed to tackle blockchain networks' data availability (DA) problem. In these protocols, every node in a network of light nodes randomly queries symbols from a codeword stored in full nodes and verifies them using a cryptographic commitment scheme. For the same performance in tackling the DA problem, the BC code requires querying a smaller number of symbols than a comparable 2D RS code for a fixed high rate. Furthermore, the number of local codes in the BC code is typically smaller, yielding a reduction in the complexity of realizing the commitment scheme.
Updated: 2024-06-18 00:22:20
Domains: cs.IT,cs.CR,math.IT
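To make the parameter bookkeeping concrete, here is a minimal sketch that computes the quantities stated in the abstract for given $(\mu, \lambda, \omega, \rho)$; it covers only the arithmetic of the topology, not the generalized RS instantiation or the erasure decoder.

def bc_code_params(mu: int, lam: int, omega: int, rho: int) -> dict:
    # Parameter constraints from the topology T_[mu,lambda,omega](rho).
    assert rho >= 2 and omega >= 1 and lam >= 2 and mu % lam == 0
    n = mu * (rho + omega)  # blocklength
    k = mu * omega          # dimension
    return {
        "n": n,
        "k": k,
        "rate": k / n,
        "pc_equations": mu * rho,                                 # p-c equations fully defining the code
        "local_code": (rho + lam * omega, lam * omega, rho + 1),  # [n, k, d] of each local generalized RS code
        "min_distance": 2 * rho + 1,                              # enhanced by overlapping local supports
    }

# Example: bc_code_params(mu=4, lam=2, omega=8, rho=2) gives a rate-0.8 code
# with n=40, k=32, [18,16,3] local codes, and minimum distance 5.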
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance
It is generally thought that transformer-based large language models benefit from pre-training by learning generic linguistic knowledge that can be focused on a specific task during fine-tuning. However, we propose that much of the benefit of pre-training may be captured by geometric characteristics of the latent space representations, divorced from any specific linguistic knowledge. In this work we explore the relationship between GLUE benchmark task performance and a variety of measures applied to the latent space produced by BERT-type contextual language models. We find a strong linear relationship between a measure of quantized cell density and average GLUE performance, and that these measures may be predictive of otherwise surprising GLUE performance for several non-standard BERT-type models from the literature. These results suggest a strategy for decreasing pre-training requirements, wherein model initialization can be informed by the geometric characteristics of the model's latent space.
Updated: 2024-06-18 00:17:30
Domains: cs.CL,cs.LG
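The abstract does not spell out its density measure, so the following is a hedged sketch of one plausible "quantized cell density" statistic: bucket each latent vector into a per-dimension grid and report the mean number of vectors per occupied cell. The bin count and the normalization are assumptions of this example, and in practice one might first project the embeddings to a few principal components so the grid is not trivially sparse.

import numpy as np

def quantized_cell_density(latents: np.ndarray, bins: int = 8) -> float:
    # latents: (num_vectors, hidden_dim) array of contextual embeddings.
    lo, hi = latents.min(axis=0), latents.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero on flat dims
    cells = np.floor((latents - lo) / span * (bins - 1)).astype(np.int64)
    occupied = len({tuple(row) for row in cells})   # number of distinct occupied cells
    return latents.shape[0] / occupied              # mean vectors per occupied cell

# Hypothetical usage: density = quantized_cell_density(hidden_states_from_bert)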
LLMs Are Prone to Fallacies in Causal Inference
Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear whether this success is limited to explicitly mentioned causal facts in the pretraining data that the model can memorize. This work therefore investigates: can LLMs infer causal relations from other relational data in text? To disentangle the role of memorized causal facts versus inferred causal relations, we finetune LLMs on synthetic data containing temporal, spatial, and counterfactual relations, and measure whether the LLM can then infer causal relations. We find that: (a) LLMs are susceptible to inferring causal relations from the order of two entity mentions in text (e.g., X mentioned before Y implies X causes Y); (b) if the order is randomized, LLMs still suffer from the post hoc fallacy, i.e., X occurs before Y (a temporal relation) implies X causes Y. We also find that while LLMs can correctly deduce the absence of causal relations from temporal and spatial relations, they have difficulty inferring causal relations from counterfactuals, calling their understanding of causality into question.
Updated: 2024-06-18 00:14:07
Domains: cs.CL,cs.AI
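As an illustration of the kind of synthetic probe described above, the sketch below generates temporal-relation sentences, optionally randomizing the mention order, each paired with a causal question whose correct answer cannot be inferred from temporal order alone. The templates and entity names are made up for this example and are not the paper's data.

import random

def make_probe(rng: random.Random) -> dict:
    # Temporal fact: x occurs before y; temporal order alone does not license causation.
    x, y = rng.sample(["the alarm", "the outage", "the reboot", "the spike"], 2)
    randomized = rng.random() < 0.5
    if randomized:
        text = f"{y.capitalize()} happened after {x}."   # y mentioned first, x still earlier in time
    else:
        text = f"{x.capitalize()} happened before {y}."  # x mentioned first and earlier in time
    return {
        "text": text,
        "question": f"Does {x} cause {y}?",
        "label": "unknown",
        "mention_order_randomized": randomized,
    }

rng = random.Random(0)
probes = [make_probe(rng) for _ in range(1000)]
# A model that answers "yes" on these probes exhibits the post hoc fallacy.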